Who should use Sail?
If:- You’re building an agent that largely acts on its own
- You’re already running agents but face painful token costs, rate limits, or poor reliability
- You’re looking for the most cost-efficient access to AI models
- You have a token intensive, latency insensitive AI workload
Quickstart
Get an API key and make your first requests for free in minutes
The Sail difference
Our job is to dramatically increase intelligence per dollar, and make sure no compute goes to waste. We optimize the whole stack for throughput rather than single-turn latency. This tradeoff fits agentic work, delivering tokens reliably at a price that doesn’t limit the size of the tasks your agents can tackle. By leveraging Sail’s completion window options, you can get 2–5x more tokens per dollar compared to other open model APIs, and up to 10x more tokens per dollar compared to frontier model APIs.API compatibility
Sail is OpenAI and Anthropic-compatible, with support for the OpenAI Chat Completions API, Open AI Responses API, and Anthropic Messages API. Migrating from a different provider is often as simple as specifying the Sail base URL, an API key, and a model name:Maximize cost efficiency with Sail’s completion windows
Every request runs under a completion window, which specifies how long the workload can wait for a turn to finish. A longer window means a lower price for the same model and the same output. This gives you the flexibility to maximize cost efficiency in a way that’s appropriate for your workload.| Window | Avg. turn time | Price vs asap |
|---|---|---|
asap | Immediate | baseline |
priority | ~1 min | ~30-50% lower |
standard | ~5 min | ~45-65% lower |
flex | best-effort | ~60-80% lower |
Window availability varies by model. The Pricing page has the full
support matrix.
- Compared to other open-model providers:
asapmatches typical list prices, and the windows below it typically cost 2 to 5x less on supported models, with turnaround in minutes rather than a batch API’s 24-hour window. - Compared to a frontier API (GPT or Claude): a frontier-class open model at
standardorflextypically costs 5 to 10x less for agentic workloads.
Rate limiting
Sail does not publish rate-limit tiers, and there is no limit-increase process to go through before scaling up. The service is designed to absorb large bursts of background traffic. In particular, choosing theflex completion window gives you access to the most relaxed rate limits available.
Next steps
Quickstart
Make your first API request!
Migrate to Sail
The three-line diff, or a prompt for your coding agent.
Models
Browse the catalog and pick your model.
Pricing
Per-token rates across every window.