Sail supports three completion windows you can set in metadata.completion_window:
  • "asap"
  • "15m" (default)
  • "24h"
Use completion windows to trade off latency vs. scheduling flexibility.

Window options

asap

Runs immediately on the fastest hardware we have, in a latency-optimized configuration. Sail is not particularly optimized for this tier; it is provided as a convenience and priced similarly to other model API providers.

15m

Uses Sail’s max-efficiency serving stack, targeting completion within 15 minutes of the request being sent. This window covers our worst-case SLA, when our GPU fleet needs to expand to absorb a large surge of traffic. Typical response times, especially when continuing a conversation, are under 5 minutes. Most of Sail’s prices are quoted at this completion window.

24h

Waits for compute to be extra-cheap, e.g. overnight. Best for batch, non-agentic workloads. Priced at a 50% discount relative to the 15m completion window.
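
As a rough illustration of the trade-off described above, the following Python sketch picks a window from a caller's latency budget and applies the 50% discount for 24h. The function names and the threshold logic are assumptions made for this example, not part of Sail's API; asap pricing is not computed because no figure is given for it.

```python
# Hypothetical helpers: names and thresholds are illustrative assumptions.
FIFTEEN_MINUTES = 15 * 60
TWENTY_FOUR_HOURS = 24 * 60 * 60


def pick_completion_window(max_wait_seconds: float) -> str:
    """Return the cheapest window whose target fits the caller's budget."""
    if max_wait_seconds >= TWENTY_FOUR_HOURS:
        return "24h"   # batch work that can wait for cheap compute
    if max_wait_seconds >= FIFTEEN_MINUTES:
        return "15m"   # default tier, worst-case target of 15 minutes
    return "asap"      # anything tighter needs the latency-optimized tier


def price_at_24h(price_at_15m: float) -> float:
    """Apply the 50% discount of the 24h window to a 15m-quoted price."""
    return price_at_15m * 0.5
```

For instance, a nightly batch job that can wait a full day would get "24h" and pay half of the price quoted at 15m, while an interactive request with a 30-second budget would fall through to "asap".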

How to set completion windows

Set metadata.completion_window on supported requests.
{
  "metadata": {
    "completion_window": "15m"
  }
}
Supported values are exactly "asap", "15m", and "24h".
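
Because only these three exact strings are accepted, it can help to validate the value on the client before sending. The sketch below is one possible way to do that in Python; the helper name with_completion_window and the example "model" field are hypothetical, and the rest of the request body depends on the endpoint being called.

```python
import json

# The three values listed above; "15m" is the documented default.
ALLOWED_WINDOWS = ("asap", "15m", "24h")
DEFAULT_WINDOW = "15m"


def with_completion_window(body: dict, window: str = DEFAULT_WINDOW) -> dict:
    """Return a copy of a request body with metadata.completion_window set.

    Hypothetical helper: rejects anything that is not an exact match,
    including different casing such as "15M".
    """
    if window not in ALLOWED_WINDOWS:
        raise ValueError(
            f"completion_window must be one of {ALLOWED_WINDOWS}, got {window!r}"
        )
    merged = dict(body)
    # Preserve any metadata keys the caller already set.
    metadata = dict(merged.get("metadata") or {})
    metadata["completion_window"] = window
    merged["metadata"] = metadata
    return merged


if __name__ == "__main__":
    # "model" is a placeholder field used only for illustration.
    request_body = with_completion_window({"model": "example-model"}, "24h")
    print(json.dumps(request_body, indent=2))
```

The helper copies the body rather than mutating it, so the same base request can be reused with different windows.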