Streaming
Server-sent events streaming for chat completions
Overview
- Set stream: true in the request
- Response is Server-Sent Events (SSE)
- Compatible with OpenAI SDK streaming, Vercel AI SDK, etc.
- Usage data included in the final chunk automatically
- For sansa-auto requests, the sansa routing metadata appears on the first chunk only (see Models)
Quick Example
See the code panel for streaming examples in Python, TypeScript, curl, and raw fetch with SSE parsing.
SSE Format
Each event is a data: line containing a JSON object, followed by a blank line.
Note: Line breaks added for readability. Actual events are single-line.
data: {
"id": "...",
"object": "chat.completion.chunk",
"created": 1700000000,
"model": "openai/gpt-5.4-mini",
"choices": [{
"index": 0,
"delta": { "content": "Hello" },
"finish_reason": null
}]
}
Stream ends with:
data: [DONE]
The first chunk of a sansa-auto stream additionally carries the sansa routing object:
{
"id": "...",
"object": "chat.completion.chunk",
"created": 1700000000,
"model": "openai/gpt-5.4-mini",
"choices": [{
"index": 0,
"delta": { "role": "assistant" },
"finish_reason": null
}],
"sansa": {
"routed": true,
"routed_model": "openai/gpt-5.4-mini",
"routing_latency_ms": 287
}
}
The final chunk before [DONE] includes usage:
{
"id": "...",
"model": "openai/gpt-5.4-mini",
"choices": [{
"index": 0,
"delta": {},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 84,
"total_tokens": 96
}
}
Streaming Chunk Shape
interface ChatCompletionChunk {
id: string;
object: "chat.completion.chunk";
created: number;
// The underlying model that served the request.
model: string;
choices: {
index: number;
delta: {
// Present in first chunk only.
role?: "assistant";
// Incremental text content.
content?: string;
// Incremental tool call data.
tool_calls?: ToolCallChunk[];
// Reasoning blocks (when enabled).
reasoning_details?: ReasoningDetail[];
};
// Reason for stopping: "stop", "length", "tool_calls", or "error".
finish_reason: string | null;
}[];
// Present in final chunk only.
usage?: {
prompt_tokens: number;
completion_tokens: number;
total_tokens: number;
};
// Present on the first chunk of a sansa-auto stream.
// Omitted on direct model requests and on subsequent chunks.
sansa?: {
routed: boolean;
routed_model: string | null;
routing_latency_ms: number | null;
};
}
Sansa always includes usage in the final streaming chunk. The stream_options parameter is accepted for compatibility but ignored. See Models for the full shape of the sansa object.
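As a rough illustration of how these fields fit together, the sketch below parses an SSE body and collects the text, usage, and sansa routing metadata without an SDK. It assumes the whole body has already been read into a string; parseSseEvents and collectStream are hypothetical helper names, not part of any Sansa library.

```typescript
// Subset of the chunk fields used by this sketch.
interface Chunk {
  choices: { delta: { content?: string }; finish_reason: string | null }[];
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
  sansa?: { routed: boolean; routed_model: string | null; routing_latency_ms: number | null };
}

// Hypothetical helper: splits a complete SSE body into parsed chunks.
// Skips lines that are not data lines and stops at the [DONE] sentinel.
function parseSseEvents(body: string): Chunk[] {
  const chunks: Chunk[] = [];
  for (const line of body.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break;
    chunks.push(JSON.parse(payload) as Chunk);
  }
  return chunks;
}

// Hypothetical helper: concatenates content deltas and keeps the
// usage (final chunk) and sansa (first chunk) objects when present.
function collectStream(body: string) {
  let text = "";
  let usage: Chunk["usage"] = undefined;
  let sansa: Chunk["sansa"] = undefined;
  let finishReason: string | null = null;
  for (const chunk of parseSseEvents(body)) {
    if (chunk.sansa) sansa = chunk.sansa;
    if (chunk.usage) usage = chunk.usage;
    for (const choice of chunk.choices ?? []) {
      text += choice.delta?.content ?? "";
      if (choice.finish_reason) finishReason = choice.finish_reason;
    }
  }
  return { text, usage, sansa, finishReason };
}
```

When reading from a live connection, a single network read may end in the middle of an event, so a real client would buffer incoming text and only parse up to the last blank line.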
Error Handling
Pre-Stream Errors
If validation fails before streaming starts (bad API key, invalid params, insufficient credits), a standard JSON error response is returned:
{
"error": {
"code": "insufficient_credits",
"message": "Insufficient credits. Please add credits to continue."
}
}
Mid-Stream Errors
If an error occurs after streaming has begun (HTTP 200 already sent), the error is sent as an SSE event:
data: {
"error": {
"code": "provider_error",
"message": "Provider disconnected"
},
"choices": [{
"finish_reason": "error"
}]
}
data: [DONE]
Mid-stream errors always have finish_reason: "error" and are followed by [DONE].
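One way a client might handle this is sketched below: it walks the data payloads in order, keeps the text received so far, and stops at the first event carrying an error object. consumePayloads is a hypothetical helper name, and the input is assumed to be the strings that follow each data: prefix.

```typescript
// Error object shape, following the mid-stream error example above.
interface StreamError {
  code: string;
  message: string;
}

// Subset of event fields this sketch reads.
interface StreamEvent {
  error?: StreamError;
  choices?: { finish_reason?: string | null; delta?: { content?: string } }[];
}

// Hypothetical helper: returns the partial text plus the error, if one arrived.
function consumePayloads(payloads: string[]): { text: string; error: StreamError | null } {
  let text = "";
  for (const payload of payloads) {
    if (payload === "[DONE]") break;
    const event = JSON.parse(payload) as StreamEvent;
    if (event.error) {
      // Keep what was already streamed so the caller can decide whether to retry.
      return { text, error: event.error };
    }
    for (const choice of event.choices ?? []) {
      text += choice.delta?.content ?? "";
    }
  }
  return { text, error: null };
}
```

Returning the partial text alongside the error, rather than throwing, is a design choice of this sketch; it lets the caller show or discard the incomplete output.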
Streaming with Tool Calls
When the model makes a tool call during streaming:
- delta.tool_calls appears incrementally
- id and function.name arrive in the first tool call chunk
- function.arguments streams as partial JSON fragments (must be accumulated)
- finish_reason is "tool_calls" in the final choice chunk
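The accumulation step can be sketched as follows. This page references ToolCallChunk without defining it, so the shape below (an index field plus optional id and function parts, as in OpenAI-compatible APIs) is an assumption, and accumulateToolCalls is a hypothetical helper name.

```typescript
// Assumed shape: modeled on the OpenAI-compatible layout, not taken from this page.
interface ToolCallChunk {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // Raw JSON text; parse only after the stream finishes.
}

// Hypothetical helper: merges the delta.tool_calls arrays from successive chunks.
function accumulateToolCalls(deltas: ToolCallChunk[][]): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const fragments of deltas) {
    for (const frag of fragments) {
      let call = calls[frag.index];
      if (!call) {
        call = { id: "", name: "", arguments: "" };
        calls[frag.index] = call;
      }
      // id and name arrive once; arguments arrive as string fragments to append.
      if (frag.id) call.id = frag.id;
      if (frag.function?.name) call.name = frag.function.name;
      call.arguments += frag.function?.arguments ?? "";
    }
  }
  return calls;
}
```

The accumulated arguments string is only safe to pass to JSON.parse once finish_reason is "tool_calls", since earlier fragments are incomplete JSON.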
Streaming with Reasoning
When reasoning tokens are enabled:
- delta.reasoning_details appears before delta.content
- Reasoning blocks stream incrementally
- Content starts streaming after reasoning completes
Note: The OpenAI SDK does not currently auto-accumulate reasoning_details from streaming deltas, so reasoning continuity across turns is not available when streaming. If reasoning continuity matters (e.g., multi-turn tool calling), use non-streaming requests. See the Reasoning docs for details.
Streaming with Structured Outputs
Structured outputs (response_format: { type: "json_schema" }) work with streaming. The model streams the response as partial JSON fragments; once concatenated, they form a complete, valid JSON response when the stream finishes.
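A minimal sketch of the client side, assuming the JSON text arrives in delta.content like ordinary text and that the fragments have already been collected in order. parseStructuredOutput is a hypothetical helper name.

```typescript
// Hypothetical helper: joins content fragments and parses them once the stream is complete.
// Intermediate prefixes are not valid JSON, so parsing is deferred until the end.
function parseStructuredOutput<T>(fragments: string[]): T {
  const json = fragments.join("");
  return JSON.parse(json) as T;
}

// Example with made-up fragments for a schema with name and age fields.
interface Person {
  name: string;
  age: number;
}

const person = parseStructuredOutput<Person>(['{"name":', '"Ada","age":', "36}"]);
console.log(person.name, person.age);
```

Calling JSON.parse on an incomplete prefix throws a SyntaxError, which is why the sketch waits for the full stream rather than parsing each delta.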