Overview - Sail Research

Sail serves trillions of tokens, with support for the best open-source models and your own LoRA fine-tunes. To achieve maximum efficiency for long-horizon agents, we serve traffic at higher latencies in tiers of service called completion windows.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.sailresearch.com/v1",
    api_key="YOUR_SAIL_API_KEY",
)

completion = client.chat.completions.create(
    model="zai-org/GLM-5.2-FP8",
    messages=[{"role": "user", "content": "What are the top 3 things to do in San Francisco?"}],
)

print(completion.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.sailresearch.com/v1",
  apiKey: process.env.SAIL_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "zai-org/GLM-5.2-FP8",
  messages: [
    {
      role: "user",
      content: "What are the top 3 things to do in San Francisco?",
    },
  ],
});

console.log(completion.choices[0].message.content);

curl https://api.sailresearch.com/v1/chat/completions \
  -H "Authorization: Bearer $SAIL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-5.2-FP8",
    "messages": [
      {
        "role": "user",
        "content": "What are the top 3 things to do in San Francisco?"
      }
    ]
  }'

Run an AI model: Run leading open-source AI models with our OpenAI-compatible inference API.
Create a Sailbox: Give long-horizon agents persistent compute that can run indefinitely.
Send requests at scale: Use completion windows and background requests for large workloads.
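For larger workloads, one simple starting point is to fan requests out from the client. The sketch below reuses the base URL, model name, and `SAIL_API_KEY` variable from the examples above. It assumes the response follows the usual OpenAI-compatible shape (`choices[0].message.content`). The helper names `build_payload`, `send_chat`, and `run_batch` are hypothetical and not part of any Sail SDK. It does not use completion windows or background requests, because their request parameters are not shown on this page.

```python
import json
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "https://api.sailresearch.com/v1"  # taken from the examples above
MODEL = "zai-org/GLM-5.2-FP8"  # taken from the examples above


def build_payload(prompt, model=MODEL):
    """Build a chat completions request body for a single user prompt."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def send_chat(payload, api_key=None):
    """POST one payload to /chat/completions and return the reply text.

    Assumes an OpenAI-compatible response body.
    """
    api_key = api_key or os.environ["SAIL_API_KEY"]
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


def run_batch(prompts, send=send_chat, max_workers=8):
    """Send prompts concurrently; results are returned in input order."""
    payloads = [build_payload(prompt) for prompt in prompts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, payloads))


if __name__ == "__main__":
    # Hypothetical prompts, for illustration only.
    answers = run_batch(
        ["Summarize the history of Redis.", "Explain LoRA fine-tuning briefly."]
    )
    for answer in answers:
        print(answer)
```

Passing `send` as a parameter keeps the fan-out logic testable without network access. In practice `max_workers` would need tuning against whatever rate limits apply to your account.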

Intelligence at scale

More agents thinking longer and harder, with space to act and explore, can do incredible things:

Detail uses Sail inference to deeply scan codebases for their most consequential yet hard-to-catch bugs
Jack & Jill runs large-scale deep research with Sail inference, matching job seekers’ resumes with job descriptions from thousands of employers
We won Browsecomp-Plus, the AI deep research benchmark, using open models running on Sail inference
We built Redis in Rust with a swarm of 4 long-horizon coding agents running on Sailboxes with Sail inference over 27 hours
