Skip to main content
Sail exposes three inference API surfaces. All APIs accept the same models and completion windows.
APIEndpointMaturity
ResponsesPOST /v1/responsesStable
Chat CompletionsPOST /v1/chat/completionsAlpha
MessagesPOST /v1/messagesAlpha
BatchPOST /v1/batchesAlpha

Responses API

This is Sail’s primary API surface and has the broadest feature support.

Supported features

FeatureDetails
Core parametersmodel, input (string or message array), max_output_tokens, temperature, top_p, user
Structured outputstext.format with type: "text" or type: "json_schema"
Reasoningreasoning.effort (low / medium / high), reasoning.generate_summary (auto / concise / detailed)
Function toolstools with type: "function" — client-side function calling with name, description, parameters, strict
Custom toolstools with type: "custom"
Tool choicetool_choice: "none", "auto", "required", or a specific function/custom tool
Background modebackground: true returns 202 immediately; poll with GET /v1/responses/{id}

Not yet supported

FeatureNotes
Streamingstream: true is rejected. All responses are returned as a single JSON object.
Instructionsinstructions is not supported. Include system messages directly in input.
Conversation chainingprevious_response_id and conversation are not supported. Send the full input each request.
Prompt templatesThe prompt parameter is not supported.
Server-side toolsweb_search, file_search, code_interpreter, computer_use, mcp, image_generation, shell, apply_patch are not supported.
Multimodal inputImage, audio, and file input blocks are not supported. Text only.
IncludeAccepted for compatibility when it is an array of strings. Requests that include reasoning.encrypted_content, web_search_call.action.sources, code_interpreter_call.outputs, computer_call_output.output.image_url, file_search_call.results, message.input_image.image_url, or message.output_text.logprobs are rejected.
TruncationOnly "disabled" is accepted. Custom truncation strategies are not supported.
Parallel tool callsparallel_tool_calls is not supported.
json_object formattext.format.type: "json_object" is not supported. Use "json_schema" instead.
Delete / cancelDELETE /v1/responses/{id} and cancel endpoints are not implemented.

Chat Completions API

Alpha — Chat Completions support is still in active development. Behavior and supported fields may change without notice. We recommend the Responses API for production workloads.

Supported features

FeatureDetails
Core parametersmodel, messages, max_completion_tokens, temperature, top_p, user
Message rolessystem, user, assistant, tool, function (deprecated), developer
Structured outputsresponse_format with type: "text", "json_object", or "json_schema"
Reasoningreasoning_effort (low / medium / high)
Function toolstools with type: "function" — standard {type, function: {name, description, parameters, strict}} format
Custom toolstools with type: "custom"
Tool choicetool_choice: "none", "auto", "required", or a specific function/custom tool
Parallel tool callsparallel_tool_calls is passed through
Metadatametadata with string key-value pairs, including completion_window and completion_webhook

Not yet supported

FeatureNotes
Streamingstream: true is rejected.
Multiple choicesn must be 1.
Multimodal contentImage (image_url) and audio (input_audio) content parts are rejected. Text only.
Sampling controlsfrequency_penalty, presence_penalty, logit_bias, stop, seed, top_logprobs, logprobs are not supported.
Audio modalityaudio and modalities: ["audio"] are not supported.
Predicted outputprediction is not supported.
Web searchweb_search_options is not supported.
Service tierOnly "auto" is accepted.
CRUD endpointsGET, POST, DELETE on stored completions are not implemented.
Deprecated fieldsmax_tokens, functions, function_call are rejected. Use their modern replacements.

Response notes

  • Responses always contain exactly one choice (n=1).
  • finish_reason is either "stop" or "tool_calls". Other values like "length" and "content_filter" are not returned.
  • system_fingerprint and service_tier are not included in responses.
  • logprobs is always null.

Anthropic Messages API

Alpha — Messages API support is still in active development. Behavior and supported fields may change without notice. We recommend the Responses API for production workloads.

Supported features

FeatureDetails
Core parametersmodel, max_tokens, messages
Samplingtemperature (0–1), top_p (0–1)
Structured outputsoutput_config.format with type: "json_schema"
Metadatametadata with string key-value pairs, including completion_window and completion_webhook

Not yet supported

FeatureNotes
Streamingstream: true is rejected.
System promptThe system parameter is not supported.
Extended thinkingthinking is not supported.
Toolstools and tool_choice are not supported.
Stop sequencesstop_sequences is not supported.
Top-K samplingtop_k is not supported.
Multimodal contentImage, document, and tool result content blocks are rejected. Text only.
Service tierservice_tier is not supported.
Inference geoinference_geo is not supported.
Count tokensPOST /v1/messages/count_tokens is not implemented.
BatchesPOST /v1/messages/batches and related endpoints are not implemented.

Response notes

  • stop_reason is always "end_turn". Other values like "max_tokens", "tool_use", and "stop_sequence" are not returned.
  • Response content always contains a single text block. Thinking blocks and tool-use blocks are not returned.
  • Cache-related usage fields (cache_creation_input_tokens, cache_read_input_tokens) are not included.

Compatibility notes

  • Sail uses Authorization: Bearer <key> for authentication. The Anthropic x-api-key header is not supported. When using the Anthropic SDK, pass your key via auth_token instead of api_key:
from anthropic import Anthropic

client = Anthropic(
    auth_token="YOUR_SAIL_API_KEY",
    base_url="https://api.sailresearch.com",
)
  • The anthropic-version header is not required or checked.
  • Error responses use the OpenAI-style error envelope format.

Cross-API behavior

These behaviors apply to all three API surfaces:
  • Text only — no multimodal input or output is supported today.
  • No streaming — all responses are returned as a single JSON payload. Use background: true (Responses API) with polling or webhooks for long-running requests.
  • Completion windows — set metadata.completion_window to "asap", "15m", or "24h" to control scheduling. See Completion windows.
  • Webhooks — set metadata.completion_webhook to receive a POST when processing finishes. See Webhooks.
  • Responses always storedstore: false is not supported. All responses are persisted.