Sail supports tool calling through the Responses API, letting you build agents that call external tools and reason over the results across multiple turns.Documentation Index
Fetch the complete documentation index at: https://docs.sailresearch.com/llms.txt
Use this file to discover all available pages before exploring further.
How it works
- Send a user message along with tool definitions to
/v1/responses. - The model may return one or more
function_callitems instead of (or alongside) text. - Execute the tools locally, then send the results back as
function_call_outputitems in a new request — together with the full conversation history. - Repeat until the model responds with text only.
response.output items directly to your conversation list — they are valid input items with no conversion needed — then append the function_call_output results.
Full example: multi-turn weather agent
This example usesmoonshotai/Kimi-K2.5 to build a two-turn conversation where the model calls a weather tool and then answers a follow-up question using context from the first turn.
What happens under the hood
-
Turn 1 — the model receives the user question plus the tool definition. It calls
get_weatherfor San Francisco. After we send the tool result back, a second request is made and the model produces a text summary. -
Turn 2 — the full conversation (including Turn 1’s tool call and result) is sent again. The model calls
get_weatherfor New York, gets the result, and compares it with the San Francisco data it already has in context.
Tips
- Choose a completion window for your agent loop. The default
standardwindow gives a good balance of cost and trajectory time for most agent workloads; reach forprioritywhen individual turns are latency-sensitive, orflexfor background batches where hours-scale queueing is fine. See Completion windows for response time distributions and pricing. background=Trueis recommended for long-running agents. Sail is throughput-optimized, so requests may take longer than a typical low-latency API. Background mode avoids HTTP timeouts and lets you poll for completion.- Send the full conversation in each request. Include all prior messages,
response.outputitems, and tool results. Output items from previous responses can be appended directly — no serialization or conversion is needed. strict: trueon tool parameters enables structured output guarantees — the model’sargumentsJSON will always conform to your schema.- Parallel tool calls are supported by default. The model may return multiple
function_callitems in a single response. - Add a per-request
Idempotency-Keyheader so retries will use the stored response instead of re-running inference and double charging. See Idempotent Requests.