Models
Call any model directly or let Sansa auto-route for you
Sansa supports two ways to pick a model on /v1/chat/completions: auto-routing with sansa-auto (Sansa picks the best model for each request) and direct model calls (you choose). Both go through the same endpoint and return the same response shape, plus a top-level sansa metadata object.
The model field
The model field controls how the request is dispatched.
| Value | Behavior |
|---|---|
| "sansa-auto", null, or omitted | Auto-route. Sansa's router picks the best underlying model for this request. |
| Any model ID from the catalog (e.g. "openai/gpt-5.4", "anthropic/claude-sonnet-4.6") | Direct gateway call. Sansa proxies the request to the specified model with no routing. |
| Unknown model ID | Returns 400 with code invalid_model. |
Pass the model ID exactly as listed on the Models page in your dashboard. IDs follow the provider/model convention (e.g. openai/gpt-5.4, google/gemini-3.1-pro-preview, anthropic/claude-sonnet-4.6).
Auto-routing with sansa-auto
Use sansa-auto when you want Sansa to choose the best model for each request. The router looks at the conversation content, tools, reasoning configuration, and input modalities, then picks a model that balances quality and cost.
- The response's model field returns the actual model that was used (e.g. "openai/gpt-5.4").
- The top-level sansa object reports routing metadata (see below).
- You are charged at the per-token rate of the selected model.
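A minimal auto-routed request might look like the sketch below. The base URL is a placeholder and the buildAutoRoutedBody helper is illustrative, not part of the documented API; only the model value and the sansa response field come from this page.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Passing "sansa-auto" (or omitting model entirely) triggers auto-routing.
function buildAutoRoutedBody(messages: ChatMessage[]) {
  return { model: "sansa-auto", messages };
}

async function autoRoutedCompletion(apiKey: string, messages: ChatMessage[]) {
  // Placeholder base URL — substitute your actual Sansa endpoint.
  const res = await fetch("https://api.sansa.example/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildAutoRoutedBody(messages)),
  });
  const data = await res.json();
  // data.model echoes the model the router actually used;
  // data.sansa carries the routing metadata described below.
  console.log(data.model, data.sansa?.routing_latency_ms);
  return data;
}
```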
When to use auto-routing
- You want price and quality handled for you across varied workloads.
- Some requests need deep reasoning, others are simple lookups, and you don't want to build that switch yourself.
- You're migrating an app that currently hard-codes a model — swap in "sansa-auto" and let Sansa take over without touching the rest of your code.
Calling a specific model (gateway)
When you pass a concrete model ID, Sansa proxies the request directly to that model. No router runs, and sansa.routed is false in the response.
Use direct model calls when:
- You need a specific model's capabilities or behavior.
- You want deterministic routing (the same model on every request).
- You're A/B testing or benchmarking a specific model against sansa-auto.
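A direct call is the same request with a concrete model ID. This is a sketch under the same placeholder base URL as above; the isDirectResponse helper is illustrative, while the sansa.routed behavior it checks is documented on this page.

```typescript
// For direct gateway calls the router is bypassed, so sansa.routed is false.
function isDirectResponse(sansa: { routed: boolean }): boolean {
  return sansa.routed === false;
}

async function directCompletion(apiKey: string, model: string, prompt: string) {
  // Placeholder base URL — substitute your actual Sansa endpoint.
  const res = await fetch("https://api.sansa.example/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model, // e.g. "anthropic/claude-sonnet-4.6"
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  if (data.sansa && !isDirectResponse(data.sansa)) {
    console.warn("expected a direct (non-routed) response");
  }
  return data;
}
```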
Provider failover
Direct gateway calls get automatic provider failover. If the primary provider for the selected model is rate-limited, down, or returns a transient error, Sansa retries the same model on another available provider. The request is only considered failed after every configured provider has been tried.
Failover is transparent to your client — you see a single successful response, and usage reflects the tokens reported by whichever provider actually answered.
Unknown model
If the model ID doesn't match any entry in the catalog, the request fails immediately with 400 invalid_model:
```json
{
  "error": {
    "code": "invalid_model",
    "message": "Model 'not-a-real-model' is not a valid model."
  }
}
```

The canonical list of supported models and their token prices lives on the Models page in your dashboard.
The sansa response object
Every completion response includes a top-level sansa object alongside the standard model, choices, and usage fields.
```typescript
interface SansaCompletionExtension {
  // True for sansa-auto / null model requests.
  // False for direct model requests.
  routed: boolean;
  // The model the router selected. Only set when routed is true.
  routed_model: string | null;
  // Encoder routing latency in milliseconds. Only set when routed is true.
  routing_latency_ms: number | null;
}
```

Auto-routed response

```json
{
  "sansa": {
    "routed": true,
    "routed_model": "anthropic/claude-sonnet-4.6",
    "routing_latency_ms": 312
  }
}
```

Direct model response

```json
{
  "sansa": {
    "routed": false
  }
}
```

In streaming responses
When streaming, the sansa object appears on the first chunk only (the chunk where choices[0].delta.role is set, before content starts). Subsequent chunks omit it. The final chunk still carries usage as usual.
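Consuming this in a client can be as simple as grabbing sansa off the first chunk while accumulating content. The chunk shape below follows the standard streaming format plus the sansa field described above; the extractSansa helper itself is an illustrative sketch.

```typescript
interface StreamChunk {
  choices: { delta: { role?: string; content?: string } }[];
  sansa?: { routed: boolean; routed_model: string | null };
}

// Walk a finished stream of chunks, capturing the sansa object from the
// first chunk (the only one that carries it) and joining the content deltas.
function extractSansa(chunks: Iterable<StreamChunk>) {
  let sansa: StreamChunk["sansa"] = undefined;
  let text = "";
  for (const chunk of chunks) {
    if (chunk.sansa && !sansa) sansa = chunk.sansa;
    text += chunk.choices[0]?.delta.content ?? "";
  }
  return { sansa, text };
}
```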
Backward compatibility
Existing v1 integrations that send "sansa-auto" (or no model field) and ignore the response's model value keep working unchanged. Two subtle differences from v1:
- The response's model field now returns the actual routed model ID (e.g. "openai/gpt-5.4") instead of the literal string "sansa-auto". If your code asserts response.model === "sansa-auto", update it.
- The sansa object is new. OpenAI SDKs surface it as an extra field; most clients ignore unknown keys automatically.
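If your v1 code asserted on the literal model string, one way to migrate is to assert on the sansa metadata instead. A sketch, with illustrative names:

```typescript
// Old v1-era check — breaks now that `model` echoes the routed model ID:
//   if (response.model !== "sansa-auto") throw new Error("unexpected model");
// New check — rely on sansa.routed instead:
function assertRouted(response: { model: string; sansa?: { routed: boolean } }) {
  if (response.sansa && !response.sansa.routed) {
    throw new Error("expected an auto-routed response");
  }
}
```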
Pricing
Pricing is per-model, per-million tokens, billed separately for input and output.
- Auto-routed requests are billed at the rate of the model the router selected.
- Direct model requests are billed at the rate of the model you specified.
- Reasoning tokens are included in completion_tokens (per the OpenAI/OpenRouter spec) and billed at the output rate.
Token rates are listed on the Models page. See Reasoning for how reasoning tokens are counted and Completions for the full billing flow.
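The billing rules above reduce to simple arithmetic. A sketch — the rate numbers in the usage note are placeholders, not real Sansa prices:

```typescript
interface TokenRates {
  inputPerMillion: number;  // USD per 1M prompt tokens
  outputPerMillion: number; // USD per 1M completion tokens
}

// Cost of one request at the selected model's rates. Reasoning tokens
// are already inside completion_tokens, so there is no extra term.
function requestCost(
  usage: { prompt_tokens: number; completion_tokens: number },
  rates: TokenRates,
): number {
  return (
    (usage.prompt_tokens / 1_000_000) * rates.inputPerMillion +
    (usage.completion_tokens / 1_000_000) * rates.outputPerMillion
  );
}
```

For example, at placeholder rates of $3/M input and $15/M output, a request with 1,000,000 prompt tokens and 500,000 completion tokens would cost $3 + $7.50 = $10.50.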
Errors
| HTTP Status | Code | When |
|---|---|---|
| 400 | invalid_model | The model value isn't "sansa-auto", null, or a known model ID |
| 400 | unsupported_parameter | See Completions for the full list |
| 402 | insufficient_credits | Account balance can't cover the reserved cost |
| 500 | provider_unavailable | All configured providers for the selected model failed after failover |
See the Errors docs for full error handling guidance.
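A client-side sketch of how the table above might map to a retry decision — the policy here is an illustration, not a documented recommendation:

```typescript
function shouldRetry(status: number, code: string): boolean {
  // provider_unavailable means every provider already failed during
  // failover; a later retry may succeed once a provider recovers.
  if (status === 500 && code === "provider_unavailable") return true;
  // invalid_model / unsupported_parameter (400) and insufficient_credits
  // (402) are not transient — fix the request or top up credits instead.
  return false;
}
```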