GPU compute · mum-1a

VRAM by the hour, not by the contract.

Dedicated Nvidia RTX Pro Blackwell cards with 32, 48, or 96 GiB of VRAM, on instances backed by NVMe storage. Built for LLMs, GPU inference, and professional visualization. Billed hourly, on-demand: you're renting silicon, not signing a multi-year commitment.

mum-1a · zsh nv2a.xlarge
$ exc compute create \
    --name infer-1 \
    --image_id 1 \
    --instance_type nv2a.xlarge \
    --subnet_id 1 \
    --ssh_pubkey my-key \
    --wait
✓ instance infer-1 running
₹44.554/hr: smallest card, nv2a.xlarge
96 GiB: max VRAM, RTX 6000 Pro Blackwell
1/1: one whole card per instance, dedicated
mum-1a: Mumbai. One region, no asterisks.

The rate readout

Three cards. Three numbers.

Every type ships a whole, dedicated GPU with an EBS disk, metered hourly in Mumbai (mum-1a). The instance type tells you the card: an nv1a.4xlarge ships an RTX 6000 Pro Blackwell.

GPU instance types and hourly rates

Instance      GPU                            vCPU  RAM     VRAM    Rate
nv2a.xlarge   Nvidia RTX 4500 Pro Blackwell  4     16 GiB  32 GiB  ₹44.554/hr
nv3a.2xlarge  Nvidia RTX 5000 Pro Blackwell  8     32 GiB  48 GiB  ₹63.849/hr
nv1a.4xlarge  Nvidia RTX 6000 Pro Blackwell  16    64 GiB  96 GiB  ₹126.784/hr

Network pricing uses the same flat rate card as everything else: egress ₹1/GiB, ingress free, public IPv4 ₹0.3/hr, IPv6 free. Full details on the GPU pricing page.
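
For a back-of-the-envelope bill, the rates above can be combined as in the Python sketch below. The numbers are copied from the table and the network line. The helper name `estimate_cost` is hypothetical, and the sketch assumes these are the only charges that apply (no separate disk or tax line) and takes billable hours as an input, since how partial hours are metered is not stated here.

```python
from decimal import Decimal

# Hourly rates in INR, copied from the rate table above.
INSTANCE_RATES = {
    "nv2a.xlarge": Decimal("44.554"),
    "nv3a.2xlarge": Decimal("63.849"),
    "nv1a.4xlarge": Decimal("126.784"),
}
EGRESS_PER_GIB = Decimal("1")          # ingress is free
PUBLIC_IPV4_PER_HOUR = Decimal("0.3")  # IPv6 is free


def estimate_cost(instance_type, hours, egress_gib=0, public_ipv4=True):
    """Hypothetical helper: estimated INR for one instance over `hours` billable hours.

    Assumes the listed rates are the only charges; check the pricing page for the rest.
    """
    hours = Decimal(str(hours))
    total = INSTANCE_RATES[instance_type] * hours
    total += EGRESS_PER_GIB * Decimal(str(egress_gib))
    if public_ipv4:
        total += PUBLIC_IPV4_PER_HOUR * hours
    return total


# Six hours on the smallest card, 20 GiB of egress, one public IPv4 address.
print(estimate_cost("nv2a.xlarge", hours=6, egress_gib=20))  # 289.124
```

Decimal arithmetic is used so the three-decimal rupee rates do not pick up floating-point rounding noise.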

The honest part

GPU access is quota-gated.

You can't launch a GPU instance on a fresh account: GPU types require a quota increase first. Email support@excloud.dev and tell us what you're running — a human reads it and raises your quota. We'd rather say this plainly than have you find out at create time.

Request a GPU quota

One email, no form. Once granted, GPU instances behave like any other compute instance — created, stopped, and deleted from the same CLI and console.

Email support@excloud.dev

What you actually get

A whole card on an ordinary VM.

Dedicated, not sliced

Every GPU instance ships the entire card. Your VRAM is yours — 32, 48, or 96 GiB of it — with NVMe-backed storage underneath.

Just a compute instance

GPU types are regular compute instances: same images, same subnets, same instance lifecycle. If you can run a VM here, you can run a GPU instance.

Built for inference and pixels

Sized for LLMs, GPU inference, and professional visualization workloads — pick the card by how much model (or scene) you need resident in VRAM.
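
One rough way to pick a card by resident VRAM is sketched below in Python. The VRAM figures come from the rate table on this page. The helper names and the 20% headroom for KV cache and activations are illustrative assumptions, not a sizing rule from this page, and real memory use depends on the runtime, context length, and batch size.

```python
# VRAM per instance type in GiB, copied from the rate table on this page.
VRAM_GIB = {
    "nv2a.xlarge": 32,
    "nv3a.2xlarge": 48,
    "nv1a.4xlarge": 96,
}

GIB = 1024 ** 3


def weights_gib(params_billion, bytes_per_param):
    """Approximate size of the model weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / GIB


def smallest_fit(required_gib, headroom=1.2):
    """Hypothetical helper: smallest instance type whose VRAM covers
    required_gib * headroom, or None if no single card is large enough.

    The default headroom of 1.2 is an assumed margin, not a measured one.
    """
    needed = required_gib * headroom
    for name, vram in sorted(VRAM_GIB.items(), key=lambda kv: kv[1]):
        if vram >= needed:
            return name
    return None


# A 27B-parameter model: 16-bit weights (2 bytes each) vs 8-bit weights (1 byte each).
print(smallest_fit(weights_gib(27, 2)))  # nv1a.4xlarge
print(smallest_fit(weights_gib(27, 1)))  # nv2a.xlarge
```

A result of `None` means the workload does not fit on one card under these assumptions, which matters here because each instance carries exactly one GPU.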

On-demand, metered hourly

No reservations, no commitments. Spin one up for an afternoon of fine-tuning, delete it, and pay for the hours it existed.

Want tokens instead of CUDA? We also host Qwen3.6-27B inference at ₹20/1M input and ₹60/1M output tokens — no GPU, no quota email.
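
The token pricing works out as in the short Python sketch below. The two rates are the ones quoted above; the function name `token_cost` is hypothetical, and the sketch assumes billing is strictly proportional to token counts with no minimums or rounding.

```python
from decimal import Decimal

# INR per one million tokens, as quoted above.
INPUT_PER_MILLION = Decimal("20")
OUTPUT_PER_MILLION = Decimal("60")
MILLION = Decimal(1_000_000)


def token_cost(input_tokens, output_tokens):
    """Hypothetical helper: INR cost for the given token counts, assuming pro-rata billing."""
    return (INPUT_PER_MILLION * input_tokens
            + OUTPUT_PER_MILLION * output_tokens) / MILLION


# 2M input tokens and 500k output tokens.
print(token_cost(2_000_000, 500_000))  # 70
```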

The cards are in the racks.

Get your quota raised, run one exc compute create, and start pushing tensors from Mumbai.