Migrate to Sail

Sail is a drop-in replacement for OpenAI-compatible inference providers, supporting the OpenAI Responses (/v1/responses) and Chat Completions (/v1/chat/completions) APIs. Switching from OpenAI, or any OpenAI-compatible provider, is just a configuration change.

Want an agent to do it? Copy a migration prompt, paste it into your coding agent, and then use the guide below to review the changes.

Using Claude Code or Codex? Install the Sail skills instead and ask your agent to “Migrate this app to Sail”. The sail-migrate skill guides the full migration, including moving sandboxed execution to Sail.

Migration prompt

Migrate this project's LLM inference to Sail (https://sailresearch.com).

Sail docs — consult these as you work:
- MCP server: https://docs.sailresearch.com/mcp (connect to it if you support MCP)
- Full docs as plain text: https://docs.sailresearch.com/llms-full.txt
- Key pages: https://docs.sailresearch.com/models (catalog),
  https://docs.sailresearch.com/pricing (per-window rates),
  https://docs.sailresearch.com/completion-windows,
  https://docs.sailresearch.com/support (API feature matrix)

Sail is drop-in compatible with the OpenAI Responses and Chat Completions
APIs and the Anthropic Messages API, all served from
https://api.sailresearch.com — keep whichever request shape this code
already uses. Sail's Messages API supports system prompts, tool calling, and
streaming for agentic use; one caveat is that prompt caching (`cache_control`)
is accepted but not yet applied (see /support), so if an Anthropic call site
relies on cache hits, expect full-price reads until that lands. The exact base
URL differs by SDK (see step 4).

If you can ask the user questions, ask whenever a step below is ambiguous
instead of guessing. If you can't, make the best-supported choice and flag
it in your final report.

1. Survey the current setup. Find every place this project calls an LLM:
   SDK clients, raw HTTP calls, framework configs, env vars, and docs. For
   each call site record the provider, API shape, model, and features used
   (streaming, tool calls, structured outputs, images).

2. Choose replacement model(s). For each model currently in use, pick the
   closest match from https://docs.sailresearch.com/models, comparing
   capability tags, context window, and what the model is known to be good
   at. If a current model is not one Sail serves, research it (web search
   if available) to understand its strengths before choosing. If multiple
   Sail models are plausible, ask the user — otherwise pick the best fit
   and explain the choice in your report. Check the /support page for any
   features this code uses that Sail doesn't serve, and flag them.

3. Choose a completion window per call site — this is where Sail savings
   come from. The live https://docs.sailresearch.com/completion-windows and
   https://docs.sailresearch.com/pricing pages are the source of truth for
   the available windows and their turn times; read them. If you can't fetch
   them, fall back to this summary — decide how long each workload can wait
   for a turn:
   - asap: a human is actively waiting on each response (interactive UI)
   - priority (~1 min/turn): latency-sensitive agent loops
   - standard (~5 min/turn): autonomous agents and pipelines — the default
     and the right answer for most agentic workloads
   - flex (best-effort): batch jobs, evals, offline processing; requires
     background=True on the Responses API
   If the workload's latency tolerance isn't obvious from the code, ask
   the user. Set metadata.completion_window explicitly on every call site
   even when choosing the default, and confirm the chosen window is
   available for the chosen model on https://docs.sailresearch.com/pricing.

4. Make the changes wherever the client is configured or called:
   - base URL: use https://api.sailresearch.com/v1 for OpenAI-compatible
     clients (Responses and Chat Completions). For the Anthropic SDK, use
     the bare host https://api.sailresearch.com (e.g. set
     ANTHROPIC_BASE_URL=https://api.sailresearch.com) — the SDK appends
     /v1/messages itself, so a /v1 base URL would resolve to /v1/v1/messages
     and 404
   - API key: read from the SAIL_API_KEY environment variable — never
     hardcode a key or paste a literal key value into the code
     (if using the Anthropic SDK, pass it as auth_token, not api_key)
   - model: the Sail model(s) chosen in step 2
   - metadata.completion_window: the window(s) chosen in step 3
   - add background=True for flex or very long-running requests
   Update env var names, .env.example files, config templates, and any
   README/docs references. Do not change prompts, tools, or business logic.

5. Estimate the savings. Compare the published per-1M-token list prices
   (input, cached input, and output) of the previous model(s) against the
   chosen Sail model(s) at the chosen completion window(s) from
   https://docs.sailresearch.com/pricing — research current provider list
   prices if you don't know them. State the comparison as a simple table
   and an approximate overall multiplier (e.g. "roughly 6x cheaper per
   token"). Do not present this as a precise bill forecast.

6. Verify. Run the project's tests. Then make one real smoke request through
   the new configuration: if SAIL_API_KEY is already set, use it; if not,
   walk the user through creating a key at
   https://app.sailresearch.com/api-keys and setting SAIL_API_KEY, then run
   the smoke request once they have. Don't ask them to paste the key to you —
   have them export it in their own shell.

Finish with a short migration report: call sites changed; model mapping
with rationale; completion window(s) with rationale; the price comparison
from step 5; and anything that needs human follow-up (unsupported
features, ambiguous choices, untested paths). Close by telling the user
exactly where to set SAIL_API_KEY for their setup — locally and in their
production/deployment environment — so the migrated code can authenticate.

1. Get your API key

Create a key at https://app.sailresearch.com/api-keys, then export it in your shell:

export SAIL_API_KEY="YOUR_SAIL_API_KEY"
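
If you want a misconfigured environment to surface before the first request, a fail-fast check along these lines may help. This is a minimal sketch: the helper name `require_sail_api_key` is hypothetical, and only the `SAIL_API_KEY` variable name comes from this guide.

```python
import os


def require_sail_api_key(env=None):
    """Return the Sail API key from the environment or raise a clear error.

    `require_sail_api_key` is an illustrative helper, not part of any SDK.
    """
    env = os.environ if env is None else env
    key = env.get("SAIL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SAIL_API_KEY is not set; export it in your shell before running."
        )
    return key
```

Calling `require_sail_api_key()` once at startup keeps the key out of the code and gives a readable error instead of an authentication failure later on.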

2. See what changes

Already calling the OpenAI Responses API? The request and response shapes are identical; only the base URL, API key, and model name change:

Sail Responses API

import os

from openai import OpenAI

client = OpenAI(
    # Was: base_url="https://api.your-provider.com/v1"
    base_url="https://api.sailresearch.com/v1",
    # Was: api_key=os.environ["PROVIDER_API_KEY"]
    api_key=os.environ["SAIL_API_KEY"],
)

response = client.responses.create(
    # Was: model="<your-current-model>"
    model="<sail-model>",
    input="Explain the key ideas behind transformers.",
)
print(response.output_text)
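
The migration prompt earlier on this page notes that the base URL differs by SDK: OpenAI-compatible clients use the `/v1` URL, while the Anthropic SDK takes the bare host and receives the key as `auth_token`. The mapping can be sketched as a plain function. The name `sail_client_kwargs` and the `sdk` labels are hypothetical, chosen only for illustration.

```python
SAIL_HOST = "https://api.sailresearch.com"


def sail_client_kwargs(sdk, api_key):
    """Build constructor kwargs for an SDK client pointed at Sail.

    Hypothetical helper: `sdk` is either "openai" or "anthropic".
    """
    if sdk == "openai":
        # OpenAI-compatible clients expect the /v1 prefix in the base URL.
        return {"base_url": f"{SAIL_HOST}/v1", "api_key": api_key}
    if sdk == "anthropic":
        # The Anthropic SDK appends /v1/messages itself, so pass the bare
        # host; this guide says to send the key as auth_token, not api_key.
        return {"base_url": SAIL_HOST, "auth_token": api_key}
    raise ValueError(f"unsupported sdk: {sdk!r}")
```

With the `openai` package installed, this would be used as `OpenAI(**sail_client_kwargs("openai", os.environ["SAIL_API_KEY"]))`. The same pattern should apply to `anthropic.Anthropic(**sail_client_kwargs("anthropic", ...))`.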

Notes

Synchronous by default. responses.create blocks and returns the completed response, exactly like OpenAI. Sail is throughput-optimized, so requests can run longer than a typical low-latency API — for very long jobs you can optionally pass background=True to get an ID back immediately and poll, avoiding HTTP timeouts. See the Quickstart.
Pick a completion window to trade off cost against turnaround. The default standard window suits most workloads; reach for priority when latency matters or flex for cheap background batches. See Completion windows.
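
The two notes above can be sketched together: build the per-request options for a chosen completion window, and poll a background request until it finishes. This is a sketch under assumptions, not Sail's reference client. The helper names are hypothetical, the window names come from the summary in the migration prompt (the Completion windows page is the source of truth), and the `queued` and `in_progress` status names are assumed to follow the OpenAI Responses API.

```python
import time

# Window names as summarized in the migration prompt on this page.
WINDOWS = ("asap", "priority", "standard", "flex")


def request_options(window="standard", long_running=False):
    """Extra kwargs for responses.create for a given completion window."""
    if window not in WINDOWS:
        raise ValueError(f"unknown completion window: {window!r}")
    options = {"metadata": {"completion_window": window}}
    # flex requires background mode; it is optional for very long jobs.
    if window == "flex" or long_running:
        options["background"] = True
    return options


def wait_for_response(retrieve, response_id, interval=5.0, timeout=3600.0,
                      sleep=time.sleep):
    """Poll until the response leaves the queued/in-progress states.

    `retrieve` is any callable mapping an ID to an object with a `.status`
    attribute, for example `client.responses.retrieve`.
    """
    waited = 0.0
    while True:
        response = retrieve(response_id)
        if response.status not in ("queued", "in_progress"):
            return response
        if waited >= timeout:
            raise TimeoutError(f"{response_id} still {response.status}")
        sleep(interval)
        waited += interval
```

Assuming a recent `openai` SDK that accepts `background` and `metadata` on `responses.create`, a flex call would look like `job = client.responses.create(model="<sail-model>", input="...", **request_options("flex"))`, followed by `done = wait_for_response(client.responses.retrieve, job.id)`. Passing `retrieve` as a callable keeps the polling loop testable without network access.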

Next steps

AI Quickstart

Set your coding agent up with Sail’s docs and skills.

Quickstart

Make your first request against Sail.

Models

Browse supported models and pick a replacement.

Completion windows

How the latency-for-price tradeoff works.

Pricing

Per-token rates by model and completion window.

Cost calculator

Estimate the cost of running your agent on Sail vs other providers.

Support

Email us if you hit anything unexpected.
