Pass-through token rates
You pay what the provider charges, per million tokens — input and output — with no platform margin added on top.
- No per-call or per-seat fee
- Live per-model rates in the catalog
- Self-hosted models route at $0
The Platform is pure usage. You pay each provider's published token rate — with no Cudator markup — and routing, load-balancing, and failover come included. No seats, no monthly minimum.
Fund a wallet, generate a key, and start routing. Every prompt and response is metered at provider rates and drawn down from prepaid or invoiced credit.
You pay what the provider charges, per million tokens — input and output — with no platform margin added on top.
Policy routing, weighted load-balancing, automatic failover, and residency enforcement are part of the platform — not a line item.
Every provider's spend across every key, team, and country settles to one Cudator wallet — prepaid or invoiced, in the currency you choose.
Rates are pass-through provider prices per million tokens. See every model's live rate in the catalog →
Caps are enforced before a request ever leaves, so a runaway loop can't run up the bill. Set limits anywhere in the hierarchy.
Hard and soft spend limits per API key, workspace, and legal entity — with alerts before you hit them.
Every call logged with model, provider, region, and cost — drill down by key or roll up to one invoice.
Top up a wallet as you go, or settle monthly on invoice with terms — multi-currency either way.
Run AI across dozens of subsidiaries, currencies, and tax regimes. Cudator meters every entity locally, handles FX and tax, and rolls it into a single consolidated bill — settled in the currency your finance team chooses.
No. You pay each provider's published token rate. Routing, failover, and residency enforcement are included — there's no per-call or per-seat charge today.
Not for the Platform. It's usage-only for now — fund a wallet and pay per token. Seat and volume plans are on the roadmap. (Cudator Chat, the team product, is priced per seat — see seat pricing.)
Requests routed to your own vLLM or VPC-hosted models incur no token charge from Cudator — you're running the compute. They still appear in the usage ledger for a complete audit trail.
Yes. Set hard and soft caps per key, workspace, and entity. Caps are enforced before a request leaves, so usage can't run past your limit. Top up prepaid or settle on invoice.
Generate a key, fund a wallet, and point your SDK at the gateway. No card required to start.
Prepaid or invoiced · multi-currency · cancel anytime