Sail optimizes for throughput and cost, not single-turn latency. For agentic workloads, what matters is how long the full trajectory takes from start to finish, not how fast any one request returns. Select a completion window based on the trajectory wall-clock time your workload can tolerate.
Completion windows at a glance
| Window | Avg. turn time | Typical use case | Price vs. asap |
|---|---|---|---|
| asap | Immediate | Interactive UIs, human-in-the-loop | Baseline |
| priority | ~1 min | Latency-sensitive agent loops | ~50% lower |
| standard | ~5 min | Cost-optimized agents | ~65% lower |
| flex | Best-effort | Batch processing, evals, offline | ~80% lower |
Actual turn times will vary based on your specific workload and chosen model.
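A rough estimate of trajectory time is the number of sequential turns multiplied by the average turn time. The sketch below uses that estimate in a hypothetical `pick_window` helper that chooses the cheapest scheduled window fitting a wall-clock budget. It assumes turns run strictly one after another, ignores time your own code spends between turns, and hard-codes the approximate figures from the table. Treat its output as a starting point, not a guarantee.

```python
from typing import Dict

# Rough average turn times in minutes, taken from the table above.
# asap ("Immediate") and flex (best-effort) have no fixed target, so they are omitted.
AVG_TURN_MINUTES: Dict[str, float] = {"priority": 1.0, "standard": 5.0}

# Approximate price relative to asap (1.0), from the "Price vs. asap" column.
RELATIVE_PRICE: Dict[str, float] = {
    "asap": 1.0,
    "priority": 0.50,
    "standard": 0.35,
    "flex": 0.20,
}


def estimate_trajectory_minutes(turns: int, window: str) -> float:
    """Estimate wall-clock minutes for `turns` sequential turns on a scheduled window."""
    return turns * AVG_TURN_MINUTES[window]


def pick_window(turns: int, budget_minutes: float) -> str:
    """Hypothetical helper: cheapest scheduled window whose estimate fits the budget."""
    # Try scheduled windows from cheapest to most expensive.
    for window in sorted(AVG_TURN_MINUTES, key=RELATIVE_PRICE.__getitem__):
        if estimate_trajectory_minutes(turns, window) <= budget_minutes:
            return window
    # No scheduled window fits the budget, so fall back to the immediate tier.
    return "asap"


# A 20-turn agent loop: roughly 100 min on standard, roughly 20 min on priority.
print(pick_window(20, budget_minutes=120))  # standard
print(pick_window(20, budget_minutes=60))   # priority
print(pick_window(20, budget_minutes=10))   # asap
```

flex is excluded on purpose. It does not target a response time and is best suited to batch, non-agentic workloads.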
Completion window details
standard
The default completion window for models that support it. Uses Sail’s
max-efficiency serving stack and targets an average turn time of roughly five
minutes on a balanced workload. Most of Sail’s prices are quoted at this
completion window.
priority
Sail’s tightest scheduled tier. Targets a shorter average turn time than standard (~1 min vs ~5 min on a balanced workload) in exchange for higher per-token pricing. Use priority for latency-sensitive agent loops where each turn directly feeds the next and several extra minutes per turn would meaningfully drag out the trajectory.
flex
Schedules work when compute is cheapest, e.g. overnight or off-peak, and does not target a specific response time. Best for batch, non-agentic workloads. Requires background=True.
asap
Runs immediately on the fastest hardware we have, in a latency-optimized config. Sail is not specifically optimized for this tier. It is offered as a convenience and priced similarly to other model API providers.
How to set completion windows
Set `metadata.completion_window` on your request to one of `"asap"`, `"priority"`, `"standard"`, or `"flex"`.
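The following is a minimal sketch of a hypothetical helper that builds only the completion-window fields of a request body. It checks the value against the four documented windows and places it under `metadata.completion_window`. For flex, which requires it, the helper also adds `background=True`. How these fields are merged into a full request, and which client or endpoint sends it, is left to your integration and is not specified here.

```python
from typing import Any, Dict

# The four completion windows documented above.
VALID_WINDOWS = ("asap", "priority", "standard", "flex")


def completion_window_params(window: str) -> Dict[str, Any]:
    """Return request fields that select a completion window (hypothetical helper)."""
    if window not in VALID_WINDOWS:
        raise ValueError(
            f"unknown completion window {window!r}; expected one of {VALID_WINDOWS}"
        )
    params: Dict[str, Any] = {"metadata": {"completion_window": window}}
    if window == "flex":
        # flex requests must be submitted with background=True.
        params["background"] = True
    return params


print(completion_window_params("priority"))
# {'metadata': {'completion_window': 'priority'}}
print(completion_window_params("flex"))
# {'metadata': {'completion_window': 'flex'}, 'background': True}
```

If your request already carries other `metadata` keys, merge `completion_window` into that dict rather than replacing it.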
Default behavior
Requests that omit `completion_window` default to standard. If the standard completion window is not available
for the model, the request will use the flex completion window if , or the asap completion window if .
If you explicitly set `completion_window` to a window the model does not support, the request is rejected with `400 invalid_request_error`, and the error
message lists the completion windows that model does support. You can also check the
Pricing page for the up-to-date support matrix.
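An unsupported explicit window is rejected with a 400 error. You may prefer to catch that mistake before sending the request. The sketch below assumes you keep your own per-model support table copied from the Pricing page. The model names and windows in it are placeholders, not Sail's actual matrix. Leaving the window unset (`None`) passes the choice to the server's default behavior described above.

```python
from typing import Dict, FrozenSet, Optional

# Placeholder support matrix: replace with the real entries from the Pricing page.
SUPPORTED_WINDOWS: Dict[str, FrozenSet[str]] = {
    "example-model-a": frozenset({"asap", "priority", "standard", "flex"}),
    "example-model-b": frozenset({"asap", "flex"}),
}


def check_window(model: str, window: Optional[str]) -> Optional[str]:
    """Raise early if `window` is not supported for `model` (hypothetical helper).

    None is returned unchanged so the server applies its default window.
    """
    if window is None:
        return None
    supported = SUPPORTED_WINDOWS.get(model)
    if supported is None:
        raise KeyError(f"no local support entry for model {model!r}")
    if window not in supported:
        raise ValueError(
            f"{model!r} does not support {window!r}; "
            f"supported windows: {sorted(supported)}"
        )
    return window


print(check_window("example-model-a", "priority"))  # priority
```

A local table can go stale, so the server's error message remains the authoritative list of supported windows.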