Models
Call any model directly or let Sansa auto-route for you
Sansa supports two ways to pick a model on /v1/chat/completions: auto-routing with sansa-auto (Sansa picks the best model for each request) and direct model calls (you choose). Both go through the same endpoint and return the same response shape, plus a top-level sansa metadata object.
The model field
The model field controls how the request is dispatched.
| Value | Behavior |
|---|---|
| "sansa-auto", null, or omitted | Auto-route. Sansa's router picks the best underlying model for this request. |
| Any model ID from the catalog (e.g. "openai/gpt-5.4", "anthropic/claude-sonnet-4.6") | Direct gateway call. Sansa proxies the request to the specified model with no routing. |
| Unknown model ID | Returns 400 with code invalid_model. |
Pass the model ID exactly as listed on the Models page in your dashboard. IDs follow the provider/model convention (e.g. openai/gpt-5.4, google/gemini-3.1-pro-preview, anthropic/claude-sonnet-4.6).
Auto-routing with sansa-auto
Use sansa-auto when you want Sansa to choose the best model for each request. The router looks at the conversation content, tools, reasoning configuration, and input modalities, then picks a model that balances quality and cost.
- The response's model field returns the actual model that was used (e.g. "openai/gpt-5.4").
- The top-level sansa object reports routing metadata (see below).
- You are charged at the per-token rate of the selected model.
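A minimal auto-routed request might look like the sketch below. The base URL is a placeholder and the buildAutoRoutedBody helper is illustrative, not part of the documented API; only the model value and the sansa response field come from this page.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Passing "sansa-auto" (or omitting model entirely) triggers auto-routing.
function buildAutoRoutedBody(messages: ChatMessage[]) {
  return { model: "sansa-auto", messages };
}

async function autoRoutedCompletion(apiKey: string, messages: ChatMessage[]) {
  // Placeholder base URL — substitute your actual Sansa endpoint.
  const res = await fetch("https://api.sansa.example/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildAutoRoutedBody(messages)),
  });
  const data = await res.json();
  // data.model echoes the model the router actually used;
  // data.sansa carries the routing metadata described below.
  console.log(data.model, data.sansa?.routing_latency_ms);
  return data;
}
```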
When to use auto-routing
- You want price and quality handled for you across varied workloads.
- Some requests need deep reasoning, others are simple lookups, and you don't want to build that switch yourself.
- You're migrating an app that currently hard-codes a model — swap in "sansa-auto" and let Sansa take over without touching the rest of your code.
Calling a specific model (gateway)
When you pass a concrete model ID, Sansa proxies the request directly to that model. No router runs, and sansa.routed is false in the response.
Use direct model calls when:
- You need a specific model's capabilities or behavior.
- You want deterministic routing (the same model on every request).
- You're A/B testing or benchmarking a specific model against sansa-auto.
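A direct call is the same request with a concrete model ID. This is a sketch under the same placeholder base URL as above; the isDirectResponse helper is illustrative, while the sansa.routed behavior it checks is documented on this page.

```typescript
// For direct gateway calls the router is bypassed, so sansa.routed is false.
function isDirectResponse(sansa: { routed: boolean }): boolean {
  return sansa.routed === false;
}

async function directCompletion(apiKey: string, model: string, prompt: string) {
  // Placeholder base URL — substitute your actual Sansa endpoint.
  const res = await fetch("https://api.sansa.example/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model, // e.g. "anthropic/claude-sonnet-4.6"
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  if (data.sansa && !isDirectResponse(data.sansa)) {
    console.warn("expected a direct (non-routed) response");
  }
  return data;
}
```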
Provider failover
Direct gateway calls get automatic provider failover. If the primary provider for the selected model is rate-limited, down, or returns a transient error, Sansa retries the same model on another available provider. The request is only considered failed after every configured provider has been tried.
Failover is transparent to your client — you see a single successful response, and usage reflects the tokens reported by whichever provider actually answered.
Unknown model
If the model ID doesn't match any entry in the catalog, the request fails immediately with 400 invalid_model:
```json
{
  "error": {
    "code": "invalid_model",
    "message": "Model 'not-a-real-model' is not a valid model."
  }
}
```

The canonical list of supported models and their token prices lives on the Models page in your dashboard.
The sansa response object
Every completion response includes a top-level sansa object alongside the standard model, choices, and usage fields.
```typescript
interface SansaCompletionExtension {
  // True for sansa-auto / null model requests.
  // False for direct model requests.
  routed: boolean;
  // The model the router selected. Only set when routed is true.
  routed_model: string | null;
  // Encoder routing latency in milliseconds. Only set when routed is true.
  routing_latency_ms: number | null;
}
```

Auto-routed response

```json
{
  "sansa": {
    "routed": true,
    "routed_model": "anthropic/claude-sonnet-4.6",
    "routing_latency_ms": 312
  }
}
```

Direct model response

```json
{
  "sansa": {
    "routed": false
  }
}
```

In streaming responses
When streaming, the sansa object appears on the first chunk only (the chunk where choices[0].delta.role is set, before content starts). Subsequent chunks omit it. The final chunk still carries usage as usual.
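Consuming this in a client can be as simple as grabbing sansa off the first chunk while accumulating content. The chunk shape below follows the standard streaming format plus the sansa field described above; the extractSansa helper itself is an illustrative sketch.

```typescript
interface StreamChunk {
  choices: { delta: { role?: string; content?: string } }[];
  sansa?: { routed: boolean; routed_model: string | null };
}

// Walk a finished stream of chunks, capturing the sansa object from the
// first chunk (the only one that carries it) and joining the content deltas.
function extractSansa(chunks: Iterable<StreamChunk>) {
  let sansa: StreamChunk["sansa"] = undefined;
  let text = "";
  for (const chunk of chunks) {
    if (chunk.sansa && !sansa) sansa = chunk.sansa;
    text += chunk.choices[0]?.delta.content ?? "";
  }
  return { sansa, text };
}
```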
Backward compatibility
Existing v1 integrations that send "sansa-auto" (or no model field) and ignore the response's model value keep working unchanged. Two subtle differences from v1:
- The response's model field now returns the actual routed model ID (e.g. "openai/gpt-5.4") instead of the literal string "sansa-auto". If your code asserts response.model === "sansa-auto", update it.
- The sansa object is new. OpenAI SDKs surface it as an extra field; most clients ignore unknown keys automatically.
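If your v1 code asserted on the literal model string, one way to migrate is to assert on the sansa metadata instead. A sketch, with illustrative names:

```typescript
// Old v1-era check — breaks now that `model` echoes the routed model ID:
//   if (response.model !== "sansa-auto") throw new Error("unexpected model");
// New check — rely on sansa.routed instead:
function assertRouted(response: { model: string; sansa?: { routed: boolean } }) {
  if (response.sansa && !response.sansa.routed) {
    throw new Error("expected an auto-routed response");
  }
}
```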
Pricing
Pricing is per-model, per-million tokens, billed separately for input and output.
- Auto-routed requests are billed at the rate of the model the router selected.
- Direct model requests are billed at the rate of the model you specified.
- Reasoning tokens are included in completion_tokens (per the OpenAI/OpenRouter spec) and billed at the output rate.
Token rates are listed on the Models page. See Reasoning for how reasoning tokens are counted and Completions for the full billing flow.
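The billing rules above reduce to simple arithmetic. A sketch — the rate numbers in the usage note are placeholders, not real Sansa prices:

```typescript
interface TokenRates {
  inputPerMillion: number;  // USD per 1M prompt tokens
  outputPerMillion: number; // USD per 1M completion tokens
}

// Cost of one request at the selected model's rates. Reasoning tokens
// are already inside completion_tokens, so there is no extra term.
function requestCost(
  usage: { prompt_tokens: number; completion_tokens: number },
  rates: TokenRates,
): number {
  return (
    (usage.prompt_tokens / 1_000_000) * rates.inputPerMillion +
    (usage.completion_tokens / 1_000_000) * rates.outputPerMillion
  );
}
```

For example, at placeholder rates of $3/M input and $15/M output, a request with 1,000,000 prompt tokens and 500,000 completion tokens would cost $3 + $7.50 = $10.50.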
Errors
| HTTP Status | Code | When |
|---|---|---|
| 400 | invalid_model | The model value isn't "sansa-auto", null, or a known model ID |
| 400 | unsupported_parameter | See Completions for the full list |
| 402 | insufficient_credits | Account balance can't cover the reserved cost |
| 500 | provider_unavailable | All configured providers for the selected model failed after failover |
See the Errors docs for full error handling guidance.
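A client-side sketch of how the table above might map to a retry decision — the policy here is an illustration, not a documented recommendation:

```typescript
function shouldRetry(status: number, code: string): boolean {
  // provider_unavailable means every provider already failed during
  // failover; a later retry may succeed once a provider recovers.
  if (status === 500 && code === "provider_unavailable") return true;
  // invalid_model / unsupported_parameter (400) and insufficient_credits
  // (402) are not transient — fix the request or top up credits instead.
  return false;
}
```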