Sail exposes three inference API surfaces, plus a batch endpoint. All of them accept the same models and completion windows.
| API | Endpoint | Maturity |
|---|---|---|
| Responses | POST /v1/responses | Stable |
| Chat Completions | POST /v1/chat/completions | Alpha |
| Messages | POST /v1/messages | Alpha |
| Batch | POST /v1/batches | Alpha |
## Responses API
This is Sail’s primary API surface and has the broadest feature support.
### Supported features
| Feature | Details |
|---|---|
| Core parameters | model, input (string or message array), max_output_tokens, temperature, top_p, user |
| Structured outputs | text.format with type: "text" or type: "json_schema" |
| Reasoning | reasoning.effort (low / medium / high), reasoning.generate_summary (auto / concise / detailed) |
| Function tools | tools with type: "function" — client-side function calling with name, description, parameters, strict |
| Custom tools | tools with type: "custom" |
| Tool choice | tool_choice: "none", "auto", "required", or a specific function/custom tool |
| Background mode | background: true returns 202 immediately; poll with GET /v1/responses/{id} |
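The background-mode flow can be sketched as follows. Only the endpoint paths, the background flag, and the 202 behavior come from the table above; the host name, status values, and polling cadence are assumptions for illustration.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.sailresearch.com"  # assumed host
API_KEY = "YOUR_SAIL_API_KEY"


def with_background(payload):
    """Return a copy of a Responses request body with background mode on."""
    return {**payload, "background": True}


def _call(method, path, payload=None):
    """Minimal JSON-over-HTTP helper using only the standard library."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode() if payload is not None else None,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def run_in_background(payload, poll_seconds=5.0):
    """POST with background: true (server replies 202), then poll by id."""
    created = _call("POST", "/v1/responses", with_background(payload))
    while True:
        latest = _call("GET", f"/v1/responses/{created['id']}")
        if latest.get("status") not in ("queued", "in_progress"):  # assumed states
            return latest
        time.sleep(poll_seconds)
```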
### Not yet supported
| Feature | Notes |
|---|---|
| Streaming | stream: true is rejected. All responses are returned as a single JSON object. |
| Instructions | instructions is not supported. Include system messages directly in input. |
| Conversation chaining | previous_response_id and conversation are not supported. Send the full input each request. |
| Prompt templates | The prompt parameter is not supported. |
| Server-side tools | web_search, file_search, code_interpreter, computer_use, mcp, image_generation, shell, apply_patch are not supported. |
| Multimodal input | Image, audio, and file input blocks are not supported. Text only. |
| Include | Accepted for compatibility when it is an array of strings. Requests that include reasoning.encrypted_content, web_search_call.action.sources, code_interpreter_call.outputs, computer_call_output.output.image_url, file_search_call.results, message.input_image.image_url, or message.output_text.logprobs are rejected. |
| Truncation | Only "disabled" is accepted. Custom truncation strategies are not supported. |
| Parallel tool calls | parallel_tool_calls is not supported. |
| json_object format | text.format.type: "json_object" is not supported. Use "json_schema" instead. |
| Delete / cancel | DELETE /v1/responses/{id} and cancel endpoints are not implemented. |
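Putting the two tables together, here is a sketch of a Responses request body that uses the supported json_schema output format. The model name is a placeholder, and the nested field names inside text.format (name, schema, strict) follow the OpenAI Responses convention, which is an assumption here. The system message goes in input because instructions is unsupported.

```python
def build_structured_request(model, prompt, schema):
    """Responses body requesting schema-constrained JSON output."""
    return {
        "model": model,
        "input": [
            # `instructions` is unsupported, so system guidance goes in input.
            {"role": "system", "content": "Reply with JSON only."},
            {"role": "user", "content": prompt},
        ],
        "max_output_tokens": 512,
        "text": {
            "format": {
                "type": "json_schema",
                "name": "extraction",
                "schema": schema,
                "strict": True,
            }
        },
    }
```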
## Chat Completions API
Alpha — Chat Completions support is still in active development. Behavior
and supported fields may change without notice. We recommend the Responses API
for production workloads.
### Supported features
| Feature | Details |
|---|---|
| Core parameters | model, messages, max_completion_tokens, temperature, top_p, user |
| Message roles | system, user, assistant, tool, function (deprecated), developer |
| Structured outputs | response_format with type: "text", "json_object", or "json_schema" |
| Reasoning | reasoning_effort (low / medium / high) |
| Function tools | tools with type: "function" — standard {type, function: {name, description, parameters, strict}} format |
| Custom tools | tools with type: "custom" |
| Tool choice | tool_choice: "none", "auto", "required", or a specific function/custom tool |
| Parallel tool calls | parallel_tool_calls is passed through |
| Metadata | metadata with string key-value pairs, including completion_window and completion_webhook |
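A sketch of a Chat Completions body exercising the fields above. The weather tool is a hypothetical example; note the use of max_completion_tokens rather than the rejected max_tokens.

```python
def build_tool_request(model, messages):
    """Chat Completions body with one client-side function tool."""
    return {
        "model": model,
        "messages": messages,
        "max_completion_tokens": 256,  # `max_tokens` is rejected
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                    "strict": True,
                },
            }
        ],
        "tool_choice": "auto",
        "parallel_tool_calls": True,  # passed through
    }
```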
### Not yet supported
| Feature | Notes |
|---|---|
| Streaming | stream: true is rejected. |
| Multiple choices | n must be 1. |
| Multimodal content | Image (image_url) and audio (input_audio) content parts are rejected. Text only. |
| Sampling controls | frequency_penalty, presence_penalty, logit_bias, stop, seed, top_logprobs, logprobs are not supported. |
| Audio modality | audio and modalities: ["audio"] are not supported. |
| Predicted output | prediction is not supported. |
| Web search | web_search_options is not supported. |
| Service tier | Only "auto" is accepted. |
| CRUD endpoints | GET, POST, DELETE on stored completions are not implemented. |
| Deprecated fields | max_tokens, functions, function_call are rejected. Use their modern replacements. |
### Response notes
- Responses always contain exactly one choice (n=1).
- finish_reason is either "stop" or "tool_calls". Other values such as "length" and "content_filter" are not returned.
- system_fingerprint and service_tier are not included in responses.
- logprobs is always null.
## Anthropic Messages API
Alpha — Messages API support is still in active development. Behavior and
supported fields may change without notice. We recommend the Responses API for
production workloads.
### Supported features
| Feature | Details |
|---|---|
| Core parameters | model, max_tokens, messages |
| Sampling | temperature (0–1), top_p (0–1) |
| Structured outputs | output_config.format with type: "json_schema" |
| Metadata | metadata with string key-value pairs, including completion_window and completion_webhook |
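A minimal Messages request under these constraints (placeholder model name). Because the system parameter is unsupported, any steering text has to live in the user turn.

```python
def build_messages_request(model, user_text):
    """Anthropic-style body for POST /v1/messages under Sail's constraints."""
    return {
        "model": model,
        "max_tokens": 1024,
        # No `system` parameter: fold any steering text into the user turn.
        "messages": [{"role": "user", "content": user_text}],
        "temperature": 0.7,  # must stay within 0-1
        "metadata": {"completion_window": "asap"},
    }
```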
### Not yet supported
| Feature | Notes |
|---|---|
| Streaming | stream: true is rejected. |
| System prompt | The system parameter is not supported. |
| Extended thinking | thinking is not supported. |
| Tools | tools and tool_choice are not supported. |
| Stop sequences | stop_sequences is not supported. |
| Top-K sampling | top_k is not supported. |
| Multimodal content | Image, document, and tool result content blocks are rejected. Text only. |
| Service tier | service_tier is not supported. |
| Inference geo | inference_geo is not supported. |
| Count tokens | POST /v1/messages/count_tokens is not implemented. |
| Batches | POST /v1/messages/batches and related endpoints are not implemented. |
### Response notes
- stop_reason is always "end_turn". Other values such as "max_tokens", "tool_use", and "stop_sequence" are not returned.
- Response content always contains a single text block. Thinking blocks and tool-use blocks are not returned.
- Cache-related usage fields (cache_creation_input_tokens, cache_read_input_tokens) are not included.
### Compatibility notes
- Sail uses Authorization: Bearer <key> for authentication. The Anthropic x-api-key header is not supported. When using the Anthropic SDK, pass your key via auth_token instead of api_key:

  ```python
  from anthropic import Anthropic

  client = Anthropic(
      auth_token="YOUR_SAIL_API_KEY",
      base_url="https://api.sailresearch.com",
  )
  ```

- The anthropic-version header is not required or checked.
- Error responses use the OpenAI-style error envelope format.
## Cross-API behavior
These behaviors apply to all three API surfaces:
- Text only — no multimodal input or output is supported today.
- No streaming — all responses are returned as a single JSON payload. Use background: true (Responses API) with polling or webhooks for long-running requests.
- Completion windows — set metadata.completion_window to "asap", "15m", or "24h" to control scheduling. See Completion windows.
- Webhooks — set metadata.completion_webhook to receive a POST when processing finishes. See Webhooks.
- Responses always stored — store: false is not supported. All responses are persisted.
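Since the scheduling metadata is shared across surfaces, it can be sketched as a small helper that attaches it to any request body. The helper name and payload shape are ours; the window values and metadata keys come from the bullets above.

```python
ALLOWED_WINDOWS = ("asap", "15m", "24h")


def with_scheduling(payload, window="asap", webhook=None):
    """Attach completion_window / completion_webhook metadata to a request body."""
    if window not in ALLOWED_WINDOWS:
        raise ValueError(f"completion_window must be one of {ALLOWED_WINDOWS}")
    metadata = dict(payload.get("metadata", {}))
    metadata["completion_window"] = window
    if webhook is not None:
        metadata["completion_webhook"] = webhook
    return {**payload, "metadata": metadata}
```

Because the same metadata keys are accepted by all three surfaces, the helper works on a Responses, Chat Completions, or Messages body alike.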