GPU compute · mum-1a
VRAM by the hour, not by the contract.
Dedicated Nvidia RTX Pro Blackwell cards with 32, 48, or 96 GiB of VRAM, on instances backed by NVMe storage. Built for LLMs, GPU inference, and professional visualization. Billed hourly and on demand: you're renting silicon, not signing a multi-year commit.
$ exc compute create \
--name infer-1 \
--image_id 1 \
--instance_type nv2a.xlarge \
--subnet_id 1 \
--ssh_pubkey my-key \
--wait
✓ instance infer-1 running
The rate readout
Three cards. Three numbers.
Every type ships a whole, dedicated GPU with an EBS disk, metered hourly in Mumbai (mum-1a). The card is in the family prefix: an nv1a.4xlarge ships an RTX 6000 Pro Blackwell.
| Instance | GPU | vCPU | RAM | VRAM | Rate |
|---|---|---|---|---|---|
| nv2a.xlarge | Nvidia RTX 4500 Pro Blackwell | 4 | 16 GiB | 32 GiB | ₹44.554/hr |
| nv3a.2xlarge | Nvidia RTX 5000 Pro Blackwell | 8 | 32 GiB | 48 GiB | ₹63.849/hr |
| nv1a.4xlarge | Nvidia RTX 6000 Pro Blackwell | 16 | 64 GiB | 96 GiB | ₹126.784/hr |
Network uses the same flat rate card as everything else: egress ₹1/GiB, ingress free, public IPv4 ₹0.3/hr, IPv6 free. Full details on the GPU pricing page.
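To turn those numbers into a bill, here is a rough sketch in Python. The rates are copied from the table and the network line above; the function name `estimate_cost` is made up for illustration. It assumes hours can be fractional and leaves out taxes, disk charges, and any billing rounding, because this page does not state them.

```python
# Rates in INR, copied from the table and network line on this page.
GPU_RATE_PER_HOUR = {
    "nv2a.xlarge": 44.554,
    "nv3a.2xlarge": 63.849,
    "nv1a.4xlarge": 126.784,
}
EGRESS_PER_GIB = 1.0        # ingress is free
PUBLIC_IPV4_PER_HOUR = 0.3  # IPv6 is free


def estimate_cost(instance_type, hours, egress_gib=0.0, public_ipv4=True):
    """Rough INR estimate (hypothetical helper).

    Assumption: ignores taxes, disk charges, and billing rounding,
    none of which are stated on this page.
    """
    compute = GPU_RATE_PER_HOUR[instance_type] * hours
    ipv4 = PUBLIC_IPV4_PER_HOUR * hours if public_ipv4 else 0.0
    egress = EGRESS_PER_GIB * egress_gib
    return round(compute + ipv4 + egress, 2)


# Six hours on the 32 GiB card with a public IPv4 and 20 GiB of egress.
print(estimate_cost("nv2a.xlarge", hours=6, egress_gib=20))  # 289.12
```

Most of that total is the card itself: the IPv4 address and egress add ₹21.80 to ₹267.32 of compute.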
The honest part
GPU access is quota-gated.
You can't launch a GPU instance on a fresh account: GPU types require a quota increase first. Email support@excloud.dev and tell us what you're running. A human reads it and raises your quota. We'd rather say this plainly than have you find out at create time.
Request a GPU quota
One email, no form. Once granted, GPU instances behave like any other compute instance — created, stopped, and deleted from the same CLI and console.
Email support@excloud.dev
What you actually get
A whole card on an ordinary VM.
Dedicated, not sliced
Every GPU instance ships the entire card. Your VRAM is yours — 32, 48, or 96 GiB of it — with NVMe-backed storage underneath.
Just a compute instance
GPU types are regular compute instances: same images, same subnets, same instance lifecycle. If you can run a VM here, you can run a GPU.
Built for inference and pixels
Sized for LLMs, GPU inference, and professional visualization workloads — pick the card by how much model (or scene) you need resident in VRAM.
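One way to make that choice concrete is a back-of-envelope sizing sketch. It estimates the weights alone as parameters times bits per parameter, then pads them by an `overhead` factor for KV cache and activations. The 1.2 default is a guess, not a measured figure, and `smallest_fit` is a hypothetical helper; real headroom depends on context length, batch size, and the serving stack.

```python
GIB = 1024 ** 3

# (instance type, VRAM in GiB) from the table on this page, smallest first.
CARDS = [("nv2a.xlarge", 32), ("nv3a.2xlarge", 48), ("nv1a.4xlarge", 96)]


def weights_gib(params_billion, bits_per_param):
    """Size of the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_param / 8 / GIB


def smallest_fit(params_billion, bits_per_param, overhead=1.2):
    """Return the smallest instance type whose VRAM covers weights * overhead.

    Assumption: overhead is a rough allowance for KV cache and activations.
    """
    needed = weights_gib(params_billion, bits_per_param) * overhead
    for instance_type, vram_gib in CARDS:
        if needed <= vram_gib:
            return instance_type
    return None  # does not fit on a single card under these assumptions


print(smallest_fit(27, 16))  # nv1a.4xlarge
print(smallest_fit(27, 4))   # nv2a.xlarge
```

Under these assumptions a 27B-parameter model at 16 bits needs about 60 GiB and lands on the 96 GiB card, while the same model quantized to 4 bits needs about 15 GiB and fits on the 32 GiB card.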
On-demand, metered hourly
No reservations, no commitments. Spin one up for an afternoon of fine-tuning, delete it, and pay for the hours it existed.
Want tokens instead of CUDA? We also host Qwen3.6-27B inference at ₹20/1M input and ₹60/1M output tokens — no GPU, no quota email.
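For the hosted route, the arithmetic is simpler still. A minimal sketch using the two per-million rates quoted above; `token_cost` is a hypothetical helper, and it assumes tokens are billed pro rata with no minimum charge, which this page does not confirm.

```python
# INR per one million tokens, as quoted on this page.
INPUT_PER_MILLION = 20.0
OUTPUT_PER_MILLION = 60.0


def token_cost(input_tokens, output_tokens):
    """INR cost of a workload, assuming pro rata billing with no minimum."""
    total = input_tokens * INPUT_PER_MILLION + output_tokens * OUTPUT_PER_MILLION
    return total / 1_000_000


# 2M input tokens and 500k output tokens: 40 + 30 rupees.
print(token_cost(2_000_000, 500_000))  # 70.0
```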
The cards are in the racks.
Get your quota raised, run one exc compute create, and start pushing
tensors from Mumbai.