metadata.completion_window:
"asap""15m"(default)"24h"
Window options
asap
Runs immediately on the fastest hardware we have, in a latency-optimized config. Sail is not particularly optimized for this tier, it is provided as a convenience feature and priced similarly to other model API providers.
15m
Uses Sail’s max-efficiency serving stack, targeting completion within 15-minutes of the request being sent.
This window covers our worst case SLA, when our GPU fleet needs to expand to cover a big surge of traffic. Typical response times, especially continuing a conversation, are under 5min.
Most of Sail’s prices are quoted at this completion window.
24h
Waits for compute to be extra-cheap, e.g. overnight. Best for batch, non-agentic workloads. 50% discount over 15m completion window.
How to set completion windows
Setmetadata.completion_window on supported requests.
"asap", "15m", and "24h".