Sail optimizes for throughput and cost, not single-turn latency. For agentic workloads, what matters is how long the full trajectory takes from start to finish, not how fast any one request returns. Select a completion window based on the trajectory wall-clock time your workload can tolerate.

Completion windows at a glance

| Window | Avg. turn time | Typical use case | Price vs asap |
| --- | --- | --- | --- |
| asap | Immediate | Interactive UIs, human-in-the-loop | (baseline) |
| priority | ~1 min | Latency-sensitive agent loops | ~50% lower |
| standard | ~5 min | Cost-optimized agents | ~65% lower |
| flex | best-effort | Batch processing, evals, offline | ~80% lower |

Actual turn times will vary based on your specific workload and chosen model.
Prices for all models and windows are listed on the Pricing page.
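
As a rough illustration of how per-turn targets add up over a trajectory, the sketch below multiplies the approximate average turn times from the table by a turn count and applies the approximate discounts. The function names and the 20-turn, 10.00-cost scenario are hypothetical, and the numbers are the table's rounded targets, not guarantees.

```python
# Approximate figures copied from the table above; real turn times and
# prices vary by workload and model. asap and flex have no numeric
# turn-time target, so they are left out of AVG_TURN_MINUTES.
AVG_TURN_MINUTES = {"priority": 1.0, "standard": 5.0}
PRICE_VS_ASAP = {"asap": 1.00, "priority": 0.50, "standard": 0.35, "flex": 0.20}


def estimate_trajectory_minutes(window: str, turns: int) -> float:
    """Rough wall-clock estimate for a trajectory of sequential turns."""
    if window not in AVG_TURN_MINUTES:
        raise ValueError(f"no numeric turn-time target for window {window!r}")
    return AVG_TURN_MINUTES[window] * turns


def relative_cost(window: str, asap_cost: float) -> float:
    """Scale a hypothetical asap-priced cost by the window's approximate discount."""
    return asap_cost * PRICE_VS_ASAP[window]


if __name__ == "__main__":
    for window in ("priority", "standard"):
        minutes = estimate_trajectory_minutes(window, turns=20)
        cost = relative_cost(window, asap_cost=10.0)
        print(f"{window}: ~{minutes:.0f} min for 20 turns, ~{cost:.2f} vs 10.00 at asap")
```

Under these assumptions a 20-turn agent loop takes roughly 20 minutes at priority and roughly 100 minutes at standard, which is the trade-off to weigh against the price difference.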

Completion window details

standard

The default completion window for models that support it. Uses Sail’s max-efficiency serving stack and targets an average turn time of roughly five minutes on a balanced workload. Most of Sail’s prices are quoted at this completion window.

priority

Sail’s tightest scheduled tier. Targets a shorter average turn time than standard (~1 min vs ~5 min on a balanced workload) in exchange for higher per-token pricing than standard. Use priority for latency-sensitive agent loops where each turn directly feeds the next and several extra minutes per turn would meaningfully drag out the trajectory.

flex

Schedules work when compute is cheapest, e.g. overnight or off-peak, and does not target a specific response time. Best for batch, non-agentic workloads. Requires background=True.

asap

Runs immediately on the fastest hardware we have, in a latency-optimized config. Sail is not particularly optimized for this tier; it is provided as a convenience and priced similarly to other model API providers.

How to set completion windows

Set metadata.completion_window on your request:
# Assumes `client` is an already-configured API client (setup not shown here).
response = client.responses.create(
    model="moonshotai/Kimi-K2.5",
    input="Explain the key ideas behind transformers.",
    background=True,
    metadata={
        "completion_window": "priority"
    }
)
Supported values: "asap", "priority", "standard", and "flex".
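
A small client-side check can catch typos and the flex/background mismatch before a request is sent. The following is a minimal sketch; `build_request_kwargs` is a hypothetical helper, not part of any SDK, and it only encodes the rules stated on this page (the four supported values, and flex requiring background=True).

```python
from typing import Any

# The four values listed above.
SUPPORTED_WINDOWS = ("asap", "priority", "standard", "flex")


def build_request_kwargs(model: str, prompt: str, window: str,
                         background: bool = False) -> dict[str, Any]:
    """Assemble keyword arguments for a request with a validated window."""
    if window not in SUPPORTED_WINDOWS:
        raise ValueError(
            f"unknown completion_window {window!r}; expected one of {SUPPORTED_WINDOWS}"
        )
    # The flex section states that flex requires background=True.
    if window == "flex" and not background:
        raise ValueError("completion_window 'flex' requires background=True")
    return {
        "model": model,
        "input": prompt,
        "background": background,
        "metadata": {"completion_window": window},
    }
```

The result can be unpacked into the call shown above, for example `client.responses.create(**build_request_kwargs("moonshotai/Kimi-K2.5", "Summarize this log.", "flex", background=True))`. Per-model support is still decided by the server, so this check does not replace handling a rejected request.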

Default behavior

Requests that omit completion_window default to standard. If the standard completion window is not available for the model, the request uses the flex completion window if background=True, or the asap completion window if background=False.

If you set completion_window explicitly to a window that the model does not support, the request is rejected with 400 invalid_request_error. The error message lists the completion windows the model actually supports. You can also check the Pricing page for the up-to-date support matrix.
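
The defaulting rules can be modeled as a small function, which may help when reasoning about which window a request will land in. This is only a sketch of the behavior described here, not Sail's implementation. `resolve_completion_window` and its parameters are hypothetical, and it assumes the fallback is decided by the background flag (flex when background=True, asap otherwise), an assumption suggested by flex requiring background=True.

```python
from typing import Optional, Set


def resolve_completion_window(requested: Optional[str],
                              supported: Set[str],
                              background: bool) -> str:
    """Model which window a request would use for a given model.

    `supported` is the set of windows the model supports (see the Pricing page).
    """
    if requested is not None:
        # An explicit but unsupported window is rejected, not silently replaced.
        if requested not in supported:
            raise ValueError(
                "400 invalid_request_error: supported completion windows are "
                f"{sorted(supported)}"
            )
        return requested
    # Omitted window: standard is the default when the model supports it.
    if "standard" in supported:
        return "standard"
    # Assumed fallback rule: flex for background requests, asap otherwise.
    return "flex" if background else "asap"
```

For a model that supports only asap and flex, an omitted window resolves to flex for a background request and to asap otherwise, while an explicit "priority" raises the error.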