Skip to main content
Sail is a drop-in replacement for OpenAI-compatible inference providers. It serves the OpenAI Responses (/v1/responses) and Chat Completions (/v1/chat/completions) APIs, so switching from OpenAI — or any OpenAI-compatible provider — is just a base URL, API key, and model name change.

1. Get your API key

Sign up at the Sail dashboard and create an API key. Export it where your agent and app can read it:
export SAIL_API_KEY="YOUR_SAIL_API_KEY"

2. See what changes

Already calling the OpenAI Responses API? Then it’s three lines — base URL, key, and a model from the catalog. The request and response are identical:
Sail Responses API
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.your-provider.com/v1",  
    base_url="https://api.sailresearch.com/v1",  
    api_key=os.environ["PROVIDER_API_KEY"],  
    api_key=os.environ["SAIL_API_KEY"],  
)

response = client.responses.create(
    model="<your-current-model>",  
    model="<sail-model>",  
    input="Explain the key ideas behind transformers.",
)
print(response.output_text)

3. Let your coding agent do it

Across a real codebase the same change touches every call site, env var, and doc reference so let your coding agent run the migration. Open your project and give it this prompt:
Migrate this codebase from its current LLM provider to Sail.

Reference docs: https://docs.sailresearch.com/llms-full.txt

Sail is drop-in compatible with the OpenAI Responses and Chat Completions
APIs, so keep whichever request shape this code already uses. Make these
changes wherever the LLM client is configured or called:

1. Point the client at Sail:
   - base_url: https://api.sailresearch.com/v1
   - api_key: read from the SAIL_API_KEY environment variable

2. Replace model names with a Sail-supported model. Check
   https://docs.sailresearch.com/models and pick the closest equivalent
   (for example, zai-org/GLM-5.1-FP8).

3. Update env var names and any docs/README references to match.
Review the diff it proposes, then run your tests.

Notes

  • Synchronous by default. responses.create blocks and returns the completed response, exactly like OpenAI. Sail is throughput-optimized, so requests can run longer than a typical low-latency API — for very long jobs you can optionally pass background=True to get an ID back immediately and poll, avoiding HTTP timeouts. See the Quickstart.
  • Pick a completion window to trade off cost against turnaround. The default standard window suits most workloads; reach for priority when latency matters or flex for cheap background batches. See Completion windows.

Next steps

Quickstart

Make your first request against Sail.

Models

Browse supported models and pick a replacement.

Pricing

Per-token rates by model and completion window.

Support

Email us if you hit anything unexpected.