Sail exposes three inference API surfaces, plus a batch endpoint. All of them accept the same models and completion windows.
| API | Endpoint | Maturity |
|---|---|---|
| Responses | POST /v1/responses | Stable |
| Chat Completions | POST /v1/chat/completions | Alpha |
| Messages | POST /v1/messages | Alpha |
| Batch | POST /v1/batches | Alpha |
## Responses API
This is Sail’s primary API surface and has the broadest feature support.
### Supported features
| Feature | Details |
|---|---|
| Core parameters | model, input (string or message array), max_output_tokens, temperature, top_p, user |
| Structured outputs | text.format with type: "text" or type: "json_schema" |
| Reasoning | reasoning.effort (low / medium / high), reasoning.generate_summary (auto / concise / detailed) |
| Function tools | tools with type: "function" — client-side function calling with name, description, parameters, strict |
| Custom tools | tools with type: "custom" |
| Tool choice | tool_choice: "none", "auto", "required", or a specific function/custom tool |
| Background mode | background: true returns 202 immediately; poll with GET /v1/responses/{id} |
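The background-mode flow can be sketched as follows. Only the endpoint paths, the background flag, and the 202 behavior come from the table above; the host name, status values, and polling cadence are assumptions for illustration.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.sailresearch.com"  # assumed host
API_KEY = "YOUR_SAIL_API_KEY"


def with_background(payload):
    """Return a copy of a Responses request body with background mode on."""
    return {**payload, "background": True}


def _call(method, path, payload=None):
    """Minimal JSON-over-HTTP helper using only the standard library."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode() if payload is not None else None,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def run_in_background(payload, poll_seconds=5.0):
    """POST with background: true (server replies 202), then poll by id."""
    created = _call("POST", "/v1/responses", with_background(payload))
    while True:
        latest = _call("GET", f"/v1/responses/{created['id']}")
        if latest.get("status") not in ("queued", "in_progress"):  # assumed states
            return latest
        time.sleep(poll_seconds)
```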
### Not yet supported
| Feature | Notes |
|---|---|
| Streaming | stream: true is rejected. All responses are returned as a single JSON object. |
| Instructions | instructions is not supported. Include system messages directly in input. |
| Conversation chaining | previous_response_id and conversation are not supported. Send the full input each request. |
| Prompt templates | The prompt parameter is not supported. |
| Server-side tools | web_search, file_search, code_interpreter, computer_use, mcp, image_generation, shell, apply_patch are not supported. |
| Multimodal input | Image, audio, and file input blocks are not supported. Text only. |
| Include | Accepted for compatibility when it is an array of strings. Requests that include reasoning.encrypted_content, web_search_call.action.sources, code_interpreter_call.outputs, computer_call_output.output.image_url, file_search_call.results, message.input_image.image_url, or message.output_text.logprobs are rejected. |
| Truncation | Only "disabled" is accepted. Custom truncation strategies are not supported. |
| Parallel tool calls | parallel_tool_calls is not supported. |
| json_object format | text.format.type: "json_object" is not supported. Use "json_schema" instead. |
| Delete / cancel | DELETE /v1/responses/{id} and cancel endpoints are not implemented. |
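Putting the two tables together, here is a sketch of a Responses request body that uses the supported json_schema output format. The model name is a placeholder, and the nested field names inside text.format (name, schema, strict) follow the OpenAI Responses convention, which is an assumption here. The system message goes in input because instructions is unsupported.

```python
def build_structured_request(model, prompt, schema):
    """Responses body requesting schema-constrained JSON output."""
    return {
        "model": model,
        "input": [
            # `instructions` is unsupported, so system guidance goes in input.
            {"role": "system", "content": "Reply with JSON only."},
            {"role": "user", "content": prompt},
        ],
        "max_output_tokens": 512,
        "text": {
            "format": {
                "type": "json_schema",
                "name": "extraction",
                "schema": schema,
                "strict": True,
            }
        },
    }
```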
## Chat Completions API
Alpha — Chat Completions support is still in active development. Behavior
and supported fields may change without notice. We recommend the Responses API
for production workloads.
### Supported features
| Feature | Details |
|---|---|
| Core parameters | model, messages, max_completion_tokens, temperature, top_p, user |
| Message roles | system, user, assistant, tool, function (deprecated), developer |
| Structured outputs | response_format with type: "text", "json_object", or "json_schema" |
| Reasoning | reasoning_effort (low / medium / high) |
| Function tools | tools with type: "function" — standard {type, function: {name, description, parameters, strict}} format |
| Custom tools | tools with type: "custom" |
| Tool choice | tool_choice: "none", "auto", "required", or a specific function/custom tool |
| Parallel tool calls | parallel_tool_calls is passed through |
| Metadata | metadata with string key-value pairs, including completion_window and completion_webhook |
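A sketch of a Chat Completions body exercising the fields above. The weather tool is a hypothetical example; note the use of max_completion_tokens rather than the rejected max_tokens.

```python
def build_tool_request(model, messages):
    """Chat Completions body with one client-side function tool."""
    return {
        "model": model,
        "messages": messages,
        "max_completion_tokens": 256,  # `max_tokens` is rejected
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                    "strict": True,
                },
            }
        ],
        "tool_choice": "auto",
        "parallel_tool_calls": True,  # passed through
    }
```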
### Not yet supported
| Feature | Notes |
|---|---|
| Streaming | stream: true is rejected. |
| Multiple choices | n must be 1. |
| Multimodal content | Image (image_url) and audio (input_audio) content parts are rejected. Text only. |
| Sampling controls | frequency_penalty, presence_penalty, logit_bias, stop, seed, top_logprobs, logprobs are not supported. |
| Audio modality | audio and modalities: ["audio"] are not supported. |
| Predicted output | prediction is not supported. |
| Web search | web_search_options is not supported. |
| Service tier | Only "auto" is accepted. |
| CRUD endpoints | GET, POST, DELETE on stored completions are not implemented. |
| Deprecated fields | max_tokens, functions, function_call are rejected. Use their modern replacements. |
### Response notes
- Responses always contain exactly one choice (n=1).
- finish_reason is either "stop" or "tool_calls". Other values such as "length" and "content_filter" are not returned.
- system_fingerprint and service_tier are not included in responses.
- logprobs is always null.
## Anthropic Messages API
Alpha — Messages API support is still in active development. Behavior and
supported fields may change without notice. We recommend the Responses API for
production workloads.
### Supported features
| Feature | Details |
|---|---|
| Core parameters | model, max_tokens, messages |
| Sampling | temperature (0–1), top_p (0–1) |
| Structured outputs | output_config.format with type: "json_schema" |
| Metadata | metadata with string key-value pairs, including completion_window and completion_webhook |
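A minimal Messages request under these constraints (placeholder model name). Because the system parameter is unsupported, any steering text has to live in the user turn.

```python
def build_messages_request(model, user_text):
    """Anthropic-style body for POST /v1/messages under Sail's constraints."""
    return {
        "model": model,
        "max_tokens": 1024,
        # No `system` parameter: fold any steering text into the user turn.
        "messages": [{"role": "user", "content": user_text}],
        "temperature": 0.7,  # must stay within 0-1
        "metadata": {"completion_window": "asap"},
    }
```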
### Not yet supported
| Feature | Notes |
|---|---|
| Streaming | stream: true is rejected. |
| System prompt | The system parameter is not supported. |
| Extended thinking | thinking is not supported. |
| Tools | tools and tool_choice are not supported. |
| Stop sequences | stop_sequences is not supported. |
| Top-K sampling | top_k is not supported. |
| Multimodal content | Image, document, and tool result content blocks are rejected. Text only. |
| Service tier | service_tier is not supported. |
| Inference geo | inference_geo is not supported. |
| Count tokens | POST /v1/messages/count_tokens is not implemented. |
| Batches | POST /v1/messages/batches and related endpoints are not implemented. |
### Response notes
- stop_reason is always "end_turn". Other values such as "max_tokens", "tool_use", and "stop_sequence" are not returned.
- Response content always contains a single text block. Thinking blocks and tool-use blocks are not returned.
- Cache-related usage fields (cache_creation_input_tokens, cache_read_input_tokens) are not included.
### Compatibility notes
- Sail uses Authorization: Bearer <key> for authentication. The Anthropic x-api-key header is not supported. When using the Anthropic SDK, pass your key via auth_token instead of api_key:

  ```python
  from anthropic import Anthropic

  client = Anthropic(
      auth_token="YOUR_SAIL_API_KEY",
      base_url="https://api.sailresearch.com",
  )
  ```

- The anthropic-version header is not required or checked.
- Error responses use the OpenAI-style error envelope format.
## Cross-API behavior
These behaviors apply to all three API surfaces:
- Text only — no multimodal input or output is supported today.
- No streaming — all responses are returned as a single JSON payload. Use background: true (Responses API) with polling or webhooks for long-running requests.
- Completion windows — set metadata.completion_window to "asap", "15m", or "24h" to control scheduling. See Completion windows.
- Webhooks — set metadata.completion_webhook to receive a POST when processing finishes. See Webhooks.
- Responses always stored — store: false is not supported. All responses are persisted.
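Since the scheduling metadata is shared across surfaces, it can be sketched as a small helper that attaches it to any request body. The helper name and payload shape are ours; the window values and metadata keys come from the bullets above.

```python
ALLOWED_WINDOWS = ("asap", "15m", "24h")


def with_scheduling(payload, window="asap", webhook=None):
    """Attach completion_window / completion_webhook metadata to a request body."""
    if window not in ALLOWED_WINDOWS:
        raise ValueError(f"completion_window must be one of {ALLOWED_WINDOWS}")
    metadata = dict(payload.get("metadata", {}))
    metadata["completion_window"] = window
    if webhook is not None:
        metadata["completion_webhook"] = webhook
    return {**payload, "metadata": metadata}
```

Because the same metadata keys are accepted by all three surfaces, the helper works on a Responses, Chat Completions, or Messages body alike.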