  • Sail is currently best optimized for the 15m completion window. More windows, such as 3m and 1h, will be added soon.
  • You must set background=True for all 15m and 24h requests. Requests with background=False, or a missing background field, will be treated as asap requests.
  • The asap window is provided mostly for convenience. Sail is at parity with other providers, but doesn’t specialize in this tier.
  • Prompt caching is implicit, based on prefix matching.
  • Each model row links the exact Hugging Face checkpoint Sail currently serves. If we offer multiple quantizations, we list them as separate model IDs.
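The background rule in the notes above can be sketched as a small helper. This is only an illustration of the stated fallback behavior: `effective_window` and its `requested_window` argument are hypothetical names, and the actual request field that selects a completion window is not specified here.

```python
from typing import Optional

# Windows that, per the notes above, require background=True.
BACKGROUND_WINDOWS = {"15m", "24h"}


def effective_window(requested_window: str, background: Optional[bool]) -> str:
    """Return the window a request would be treated as, per the rule above.

    `requested_window` is a hypothetical label used for illustration only.
    `background=None` models a request with no background field.
    """
    if requested_window in BACKGROUND_WINDOWS and background is True:
        return requested_window
    # background=False, or a missing background field, falls back to asap.
    return "asap"


print(effective_window("15m", True))
print(effective_window("24h", None))
print(effective_window("15m", False))
```

Only the first call keeps its requested window; the other two are treated as asap and would be billed at asap prices.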
All prices are listed in USD per 1M tokens (MTok).
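| Model ID | 15 min In | 15 min Cached | 15 min Out | 24 hour In | 24 hour Cached | 24 hour Out | ASAP In | ASAP Cached | ASAP Out |
|---|---|---|---|---|---|---|---|---|---|
| moonshotai/Kimi-K2.5 | $0.10 | $0.03 | $0.80 | $0.05 | $0.015 | $0.40 | $0.60 | $0.10 | $3.00 |
| zai-org/GLM-4.7 | $0.12 | $0.05 | $0.60 | $0.06 | $0.025 | $0.30 | $0.60 | $0.30 | $2.20 |
| zai-org/GLM-5 | $0.16 | $0.06 | $0.85 | $0.08 | $0.03 | $0.40 | $1.00 | $0.20 | $3.20 |
| zai-org/GLM-5.1-FP8 | $0.25 | $0.12 | $2.00 | $0.125 | $0.06 | $1.00 | $1.40 | $0.26 | $4.40 |
| deepseek-ai/DeepSeek-V3.2 | $0.08 | $0.02 | $0.50 | $0.04 | $0.01 | $0.25 | $0.56 | $0.28 | $1.68 |
| openai/gpt-oss-20b | $0.01 | $0.005 | $0.05 | $0.005 | $0.001 | $0.02 | $0.06 | $0.03 | $0.30 |
| openai/gpt-oss-120b | $0.015 | $0.006 | $0.07 | $0.007 | $0.001 | $0.025 | $0.08 | $0.04 | $0.40 |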
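As a rough illustration of how per-MTok prices translate into a request cost, the sketch below estimates a bill from token counts. It assumes that cached tokens are a subset of input tokens and are billed at the Cached rate instead of the In rate; that billing rule is an assumption, not something stated on this page. `estimate_cost` and `PRICES` are hypothetical names, and only one model's prices are copied from the table.

```python
from decimal import Decimal

# (In, Cached, Out) prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    ("deepseek-ai/DeepSeek-V3.2", "15m"): (Decimal("0.08"), Decimal("0.02"), Decimal("0.50")),
    ("deepseek-ai/DeepSeek-V3.2", "24h"): (Decimal("0.04"), Decimal("0.01"), Decimal("0.25")),
    ("deepseek-ai/DeepSeek-V3.2", "asap"): (Decimal("0.56"), Decimal("0.28"), Decimal("1.68")),
}

MTOK = Decimal(1_000_000)


def estimate_cost(model, window, input_tokens, cached_tokens, output_tokens):
    """Estimate the cost in USD for one request.

    Assumption: `cached_tokens` is the part of `input_tokens` that hit the
    prefix cache, and it is billed at the Cached rate rather than the In rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    price_in, price_cached, price_out = PRICES[(model, window)]
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * price_in
        + cached_tokens * price_cached
        + output_tokens * price_out
    )
    return total / MTOK


# 2M input tokens (500k of them cached) and 100k output tokens.
for window in ("15m", "24h", "asap"):
    cost = estimate_cost("deepseek-ai/DeepSeek-V3.2", window, 2_000_000, 500_000, 100_000)
    print(window, cost)
```

Under these assumptions the same workload costs $0.18 in the 15m window and $0.09 in the 24h window. `Decimal` is used so that small per-token prices do not pick up floating-point rounding noise.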
New self-serve accounts without purchased credits are limited to 10 requests per minute. Self-serve accounts with a low remaining credit balance may be temporarily rate limited; enabling auto-recharge with a $5 threshold avoids this.
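One way to stay under the 10 requests per minute limit is to throttle on the client side. The sketch below is a generic sliding-window limiter, not part of any Sail SDK. How the server actually counts requests (sliding or fixed window) is not stated here, so treat this as a conservative approximation. `MinuteRateLimiter` is a hypothetical name.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Client-side limiter: at most `limit` calls in any `period` seconds."""

    def __init__(self, limit=10, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.period = period
        self.clock = clock    # injectable for testing
        self.sleep = sleep    # injectable for testing
        self.sent = deque()   # timestamps of requests still inside the window

    def acquire(self):
        """Block until another request may be sent, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window.
            while self.sent and now - self.sent[0] >= self.period:
                self.sent.popleft()
            if len(self.sent) < self.limit:
                self.sent.append(now)
                return
            # Window is full: wait until the oldest request leaves it.
            self.sleep(self.period - (now - self.sent[0]))
```

Call `limiter.acquire()` immediately before each API request. The first ten calls return at once, and the eleventh waits until the oldest request is a full minute old.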
Use GET /v1/models in your target environment to confirm runtime availability for your API key.
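A minimal sketch of that availability check, using only the Python standard library. The base URL is deliberately left as a parameter because it is not given on this page. Bearer-token authentication and an OpenAI-style response body of the form `{"data": [{"id": ...}]}` are both assumptions, and the function names are hypothetical.

```python
import json
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET /v1/models request. Bearer auth is an assumption."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def available_model_ids(payload: dict) -> set:
    """Extract model IDs, assuming an OpenAI-style {"data": [{"id": ...}]} body."""
    return {entry["id"] for entry in payload.get("data", [])}


def fetch_model_ids(base_url: str, api_key: str) -> set:
    """Perform the request and return the set of model IDs visible to this key."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return available_model_ids(json.load(response))
```

With a real base URL and key, a check such as `"zai-org/GLM-5" in fetch_model_ids(base_url, api_key)` would confirm availability before sending requests, provided the response has the assumed shape.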