One API · every model · your infrastructure

Route every model.
Keep every byte.

The Cudator Platform is the routing layer for everything you build: one OpenAI-compatible endpoint that picks the best model per request, keeps sovereign workloads on ground you control, and meters spend to a single wallet.

OpenAI-compatible · Zero data retention · One wallet, any currency
POST  /v1/chat/completions
app → OpenAI · Claude · Gemini · vLLM (routed in 38ms)

Powering AI at teams that can't afford to get infrastructure wrong

What the Platform does

One gateway between your app and every model you'll ever use.

Stop wiring keys, quotas, and failover logic per vendor. Point your SDK at Cudator once and let the platform handle routing, residency, and money.

Model routing

Send one request; Cudator picks the right model on price, latency, and quality — then load-balances and fails over across a pool of credentials.

  • Policy-based routing by cost, latency or task
  • Weighted load-balancing & automatic failover
  • One OpenAI-compatible endpoint for all vendors
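
As a rough illustration of what policy-based routing by cost, latency, or quality could look like, the sketch below picks one candidate from a small pool. The `Candidate` fields, the policy names, and every number are invented for this example; they are not Cudator's actual routing logic or model catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # All fields are hypothetical and exist only for this sketch.
    name: str
    usd_per_1k_tokens: float
    p50_latency_ms: float
    quality: float  # higher is better, arbitrary 0..1 scale


def pick(candidates, policy):
    """Return the candidate that best satisfies a simple named policy."""
    if not candidates:
        raise ValueError("no candidates in the pool")
    if policy == "cheapest":
        return min(candidates, key=lambda c: c.usd_per_1k_tokens)
    if policy == "fastest":
        return min(candidates, key=lambda c: c.p50_latency_ms)
    if policy == "best-quality":
        return max(candidates, key=lambda c: c.quality)
    raise ValueError(f"unknown policy: {policy}")


# Made-up pool used only to exercise the three policies.
POOL = [
    Candidate("model-a", 0.50, 900.0, 0.95),
    Candidate("model-b", 0.10, 400.0, 0.80),
    Candidate("model-c", 0.20, 250.0, 0.70),
]
```

Calling `pick(POOL, "cheapest")` selects the lowest-priced entry, while `"fastest"` and `"best-quality"` rank the same pool on a different field. A production router would likely blend these signals rather than rank on one at a time.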

Sovereign private routing

Pin sensitive traffic to providers and regions you approve — or to your own self-hosted models. Data never leaves the ground you control.

  • Route to on-prem & VPC-hosted models (vLLM)
  • Region & residency pinning, per workspace
  • Zero retention, full request-level audit trail

Wallet & payments

Every provider's spend, in every country, rolls into one wallet. Set caps per key, team, and legal entity — then settle in the currency you choose.

  • One wallet across subsidiaries & currencies
  • Spend limits per key, user & workspace
  • Itemised usage settlement & export

How it works

From request to settled invoice, in one hop.

01

Point your SDK

Swap your base URL and key. Cudator speaks the OpenAI API, so existing code keeps working unchanged.

02

Apply a routing policy

Choose the cheapest, fastest, or highest-quality path — and pin sensitive workloads to approved regions or self-hosted models.

03

Cudator routes & fails over

Requests load-balance across a credential pool. If a provider errors or rate-limits, traffic shifts automatically.
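
The failover behaviour in this step can be pictured with a small sketch. It assumes a hypothetical `send` callable and a hypothetical `ProviderError`; the credential names and weights are invented, and a real gateway would also need backoff and health tracking.

```python
import random


class ProviderError(Exception):
    """Hypothetical error standing in for a provider failure or rate limit."""


def weighted_order(pool, rng):
    """Order credentials by weighted random draw without replacement."""
    remaining = dict(pool)  # credential name -> positive weight
    order = []
    while remaining:
        names = list(remaining)
        weights = [remaining[n] for n in names]
        choice = rng.choices(names, weights=weights, k=1)[0]
        order.append(choice)
        del remaining[choice]
    return order


def call_with_failover(pool, send, rng=None):
    """Try credentials in weighted order and move on when one fails."""
    rng = rng or random.Random()
    errors = []
    for cred in weighted_order(pool, rng):
        try:
            return cred, send(cred)
        except ProviderError as exc:
            errors.append((cred, str(exc)))
    raise ProviderError(f"all credentials failed: {errors}")
```

Heavier-weighted credentials tend to be tried first, and a failing credential is skipped for the remainder of that request only.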

04

Spend settles to the wallet

Every token is metered, attributed, and deducted from one wallet — with caps enforced before the call ever leaves.
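
One way to picture a cap being enforced before the call leaves is an authorize-then-settle check. This is only a sketch: the `Wallet` class, its method names, and the single per-key cap are assumptions made for illustration, not Cudator's metering implementation.

```python
from decimal import Decimal


class CapExceeded(Exception):
    """Raised when a request would exceed a cap or the wallet balance."""


class Wallet:
    # Hypothetical in-memory wallet; all amounts share one currency.
    def __init__(self, balance, key_caps):
        self.balance = Decimal(balance)
        self.key_caps = {k: Decimal(v) for k, v in key_caps.items()}
        self.spent = {k: Decimal("0") for k in key_caps}

    def authorize(self, key, estimated_cost):
        """Check cap and balance before any provider is called."""
        cost = Decimal(estimated_cost)
        if key not in self.key_caps:
            raise KeyError(f"unknown key: {key}")
        if self.spent[key] + cost > self.key_caps[key]:
            raise CapExceeded(f"{key} would exceed its cap")
        if cost > self.balance:
            raise CapExceeded("wallet balance too low")

    def settle(self, key, actual_cost):
        """Deduct the metered cost once the call has completed."""
        cost = Decimal(actual_cost)
        self.balance -= cost
        self.spent[key] += cost
```

`Decimal` is used instead of floats so that very small per-request charges add up exactly.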

Drop-in API

Already built? Change two lines.

Cudator speaks the OpenAI API. Point the base URL at the gateway, use a Cudator key, and routing, residency, and global billing come along for free.

  • Drop-in OpenAI SDK compatibility — streaming, tools & embeddings
  • One policy header to pin region or self-hosted models
  • See the full reference in the Platform docs
route.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cudator.ai/v1",
    api_key="cud_live_••••••••",
)

# route by policy — Cudator picks the model
resp = client.chat.completions.create(
    model="auto",
    extra_headers={"X-Cudator-Policy": "sovereign-eu"},
    messages=[{"role": "user",
               "content": "Summarise this record."}],
)

# → served by self-hosted vLLM · eu-west-1
# → $0.0004 metered to wallet · 38ms

Sovereign by design

Pick the provider, region, and terms — and prove it on every call.

For teams in finance, health, and the public sector, "where does this data go — and who can compel it?" isn't a footnote. Cudator turns provider choice, region, and data terms into a routing rule: enforced before the request leaves, proven after it lands.

  • Approved-pool routing

    Route only to the providers, regions, and data-processing terms you've cleared. Out-of-policy credentials aren't in the pool, so sensitive traffic can't slip out the wrong door by accident.

  • Region & jurisdiction aware

    Pin each workspace to approved regions, and watch for silent out-of-region fallback. Cudator surfaces the legal jurisdiction behind every provider — because EU-region hosting isn't the same as EU jurisdiction.

  • Zero retention, request-level proof

    Payloads are never stored, and routing respects each provider's no-retention terms. Every call is logged with model, provider, region, and cost — export the trail for your auditors.
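
To make the request-level trail concrete, here is a sketch of what one exported record might contain. The model, provider, region, and cost fields follow the text; the `AuditRecord` name, the extra `request_id` and `timestamp` fields, and the CSV layout are assumptions, not a documented export format. No prompt or completion text appears in the record.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class AuditRecord:
    request_id: str  # hypothetical identifier
    timestamp: str   # hypothetical, ISO 8601 string
    model: str
    provider: str
    region: str
    cost_usd: str


def export_csv(records):
    """Serialize audit records to CSV text with a header row."""
    buf = io.StringIO()
    names = [f.name for f in fields(AuditRecord)]
    writer = csv.DictWriter(buf, fieldnames=names)
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buf.getvalue()
```

Because the record type has no payload field, an export built this way cannot leak request content by construction.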

workspace · meridian-health
residency policy · active
Allowed providers: Azure OpenAI (EU) · Mistral
Allowed region: eu-west-1
Out-of-region fallback: blocked
Unapproved providers: blocked
Payload retention: none
HIPAA · GDPR · ISO 27001 · Data residency
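
The meridian-health policy card above can be read as a filter over routing targets. The sketch below encodes the same values (two allowed providers, one allowed region, no fallback) in a hypothetical `ResidencyPolicy` structure; the class, function, and field names are illustrative and not Cudator's configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidencyPolicy:
    allowed_providers: frozenset
    allowed_regions: frozenset


def is_allowed(policy, provider, region):
    """A target is routable only if provider and region are both approved."""
    return (
        provider in policy.allowed_providers
        and region in policy.allowed_regions
    )


def routable(policy, targets):
    """Keep only in-policy (provider, region) pairs, with no fallback."""
    return [(p, r) for p, r in targets if is_allowed(policy, p, r)]


# Values copied from the policy card; the structure itself is assumed.
MERIDIAN_HEALTH = ResidencyPolicy(
    allowed_providers=frozenset({"Azure OpenAI (EU)", "Mistral"}),
    allowed_regions=frozenset({"eu-west-1"}),
)
```

If `routable` returns an empty list, the request should fail rather than fall back, which is the behaviour the "blocked" rows describe.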

Models

Bring the frontier, the open, and your own.

Hosted frontier labs, embedding specialists, and OpenAI-compatible self-hosted clusters — managed as one routable pool.

OpenAI: chat · embeddings
Anthropic: chat
Google: chat
Mistral: chat
Cohere: embeddings
Voyage AI: embeddings
Llama · vLLM: self-hosted
DeepSeek: self-hosted

Compare every model (context, price, and sovereignty) in the full model catalog.
Browse the catalog

"We moved seven AI products onto Cudator in a quarter. Our auditors got a request-level trail, our finance team got one invoice, and engineering stopped babysitting provider keys."
Mara Köhler
VP Platform Engineering · EclipseDC

8+ providers, one pool
99.98% routing uptime
<40ms routing overhead
0 payloads retained

Start building

Ship AI on ground you control.

Create a workspace, drop in your base URL, and route your first request in minutes. No card required to start.

Prepaid or invoiced · multi-currency · cancel anytime