GPU compute · mum-1a

VRAM by the hour, not by the contract.

Dedicated Nvidia RTX Pro Blackwell cards with 32, 48, or 96 GiB of VRAM, on instances backed by NVMe storage. Built for LLMs, GPU inference, and professional visualization. Billed hourly and on demand: you're renting silicon, not signing a multi-year commitment.

Open the console · GPU pricing

mum-1a · zsh nv2a.xlarge

$ exc compute create \
    --name infer-1 \
    --image_id 1 \
    --instance_type nv2a.xlarge \
    --subnet_id 1 \
    --ssh_pubkey my-key \
    --wait
✓ instance infer-1 running

₹44.554/hr · smallest card, nv2a.xlarge

96 GiB · max VRAM, RTX 6000 Pro Blackwell

1/1 · one whole card per instance, dedicated

mum-1a · Mumbai. One region, no asterisks.

The rate readout

Three cards. Three numbers.

Every type ships a whole, dedicated GPU with an EBS disk, metered hourly in Mumbai (mum-1a). The type name tells you the card: an nv1a.4xlarge ships an RTX 6000 Pro Blackwell.

GPU instance types and hourly rates
Instance	GPU	vCPU	RAM	VRAM	Rate
nv2a.xlarge	Nvidia RTX 4500 Pro Blackwell	4	16 GiB	32 GiB	₹44.554/hr
nv3a.2xlarge	Nvidia RTX 5000 Pro Blackwell	8	32 GiB	48 GiB	₹63.849/hr
nv1a.4xlarge	Nvidia RTX 6000 Pro Blackwell	16	64 GiB	96 GiB	₹126.784/hr

Network pricing is the same flat rate card as everything else: egress ₹1/GiB, ingress free, public IPv4 ₹0.3/hr, IPv6 free. Full details on the GPU pricing page.
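As a rough illustration of how these rates add up, the sketch below estimates a bill from the instance table and the network rates listed here. The helper name `estimate_cost_inr` is hypothetical, and it assumes a public IPv4 address is held for exactly as long as the instance runs; actual billing granularity and taxes are not covered.

```python
from decimal import Decimal

# Hourly rates in INR, copied from the instance table.
HOURLY_RATE_INR = {
    "nv2a.xlarge": Decimal("44.554"),
    "nv3a.2xlarge": Decimal("63.849"),
    "nv1a.4xlarge": Decimal("126.784"),
}
EGRESS_INR_PER_GIB = Decimal("1")    # ingress is free
IPV4_INR_PER_HOUR = Decimal("0.3")   # IPv6 is free


def estimate_cost_inr(instance_type, hours, egress_gib=0, public_ipv4=True):
    """Hypothetical helper: compute + egress + optional public IPv4."""
    hours = Decimal(str(hours))
    total = HOURLY_RATE_INR[instance_type] * hours
    total += EGRESS_INR_PER_GIB * Decimal(str(egress_gib))
    if public_ipv4:
        # Assumption: the address is attached for the instance's whole lifetime.
        total += IPV4_INR_PER_HOUR * hours
    return total


# Six hours on the smallest card, sending 10 GiB out, with a public IPv4.
print(estimate_cost_inr("nv2a.xlarge", hours=6, egress_gib=10))  # 279.124
```

Decimal arithmetic is used so the three-decimal rates are not distorted by binary floating point.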

The honest part

GPU access is quota-gated.

You can't launch a GPU instance on a fresh account: GPU types require a quota increase first. Email support@excloud.dev and tell us what you're running — a human reads it and raises your quota. We'd rather say this plainly than have you find out at create time.

Request a GPU quota

One email, no form. Once granted, GPU instances behave like any other compute instance — created, stopped, and deleted from the same CLI and console.

Email support@excloud.dev

What you actually get

A whole card on an ordinary VM.

Dedicated, not sliced

Every GPU instance ships the entire card. Your VRAM is yours — 32, 48, or 96 GiB of it — with NVMe-backed storage underneath.

Just a compute instance

GPU types are regular compute instances: same images, same subnets, same instance lifecycle. If you can run a VM here, you can run a GPU.

Built for inference and pixels

Sized for LLMs, GPU inference, and professional visualization workloads — pick the card by how much model (or scene) you need resident in VRAM.
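One way to make "pick the card by VRAM" concrete is a back-of-the-envelope check: model weights take roughly the parameter count times bytes per parameter, plus headroom for KV cache and activations. The sketch below applies that rule of thumb to the three instance types. The 20% headroom factor and the helper names are assumptions for illustration; real usage depends on context length, batch size, and the serving runtime.

```python
# VRAM per instance type in GiB, from the pricing table.
VRAM_GIB = {
    "nv2a.xlarge": 32,
    "nv3a.2xlarge": 48,
    "nv1a.4xlarge": 96,
}


def weights_gib(params_billions, bytes_per_param):
    """Approximate size of the model weights alone, in GiB."""
    return params_billions * 1e9 * bytes_per_param / 2**30


def smallest_fit(params_billions, bytes_per_param, headroom=1.2):
    """Return the smallest instance type whose VRAM covers weights * headroom.

    headroom=1.2 is an assumed allowance for KV cache and activations,
    not a measured figure. Returns None if no single card is large enough.
    """
    need = weights_gib(params_billions, bytes_per_param) * headroom
    for name, vram in sorted(VRAM_GIB.items(), key=lambda kv: kv[1]):
        if vram >= need:
            return name
    return None


# A 27B-parameter model with 16-bit (2 bytes) vs 4-bit (0.5 bytes) weights.
print(smallest_fit(27, 2))    # nv1a.4xlarge
print(smallest_fit(27, 0.5))  # nv2a.xlarge
```

Treat the result as a starting point: if the estimate lands close to a card's limit, the next size up is the safer choice.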

On-demand, metered hourly

No reservations, no commitments. Spin one up for an afternoon of fine-tuning, delete it, and pay for the hours it existed.

Want tokens instead of CUDA? We also host Qwen3.6-27B inference at ₹20/1M input and ₹60/1M output tokens — no GPU, no quota email.
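For a sense of scale, the token rates quoted here reduce to simple arithmetic. The sketch below is illustrative only; the function name is hypothetical, and it assumes input and output tokens are billed linearly at the quoted rates with no minimums or rounding.

```python
from decimal import Decimal

# Rates in INR per one million tokens, as quoted for hosted inference.
INPUT_INR_PER_MILLION = Decimal("20")
OUTPUT_INR_PER_MILLION = Decimal("60")
MILLION = Decimal(1_000_000)


def token_cost_inr(input_tokens, output_tokens):
    """Hypothetical helper: linear cost of a workload in INR."""
    return (INPUT_INR_PER_MILLION * input_tokens
            + OUTPUT_INR_PER_MILLION * output_tokens) / MILLION


# 2M input tokens and 500k output tokens.
print(token_cost_inr(2_000_000, 500_000))  # 70
```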

The cards are in the racks.

Get your quota raised, run one exc compute create, and start pushing tensors from Mumbai.

Open the console · Request GPU quota