sail.inference provides thin wrappers over Sail’s hosted inference endpoints. They POST the JSON payload as given and return the raw JSON response as a dict. When a Voyage is active, the wrappers attach correlation headers so the model call shows up on the Voyage timeline, scoped to the active span and agent.
import sail

resp = sail.inference.responses.create(
    model="zai-org/GLM-5",
    input="Say hello in one sentence.",
)

chat = sail.inference.chat.completions.create(
    model="zai-org/GLM-5",
    messages=[{"role": "user", "content": "hello"}],
)

responses.create

def create(
    *,
    voyage: Voyage | None = None,
    headers: Mapping[str, str] | None = None,
    timeout: float | None = None,
    **payload,
) -> dict
POSTs payload to /v1/responses and returns the raw JSON response dict.
Parameter   Default   Description
**payload   -         The request body (e.g. model, input). Sent as-is.
voyage      None      Correlate with an explicit Voyage. Defaults to the current Voyage.
headers     None      Extra request headers to merge.
timeout     None      Request timeout in seconds; defaults to a bounded 600s.

chat.completions.create

def create(
    *,
    voyage: Voyage | None = None,
    headers: Mapping[str, str] | None = None,
    timeout: float | None = None,
    **payload,
) -> dict
POSTs payload to /v1/chat/completions and returns the raw JSON response dict. Same parameters as responses.create.
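To make "thin wrapper" concrete, the sketch below builds (but does not send) the kind of request described above, using only the standard library. It is not Sail's implementation: build_request, DEFAULT_TIMEOUT_S, the placeholder key, the X-Request-Tag header, and the header merge order are all assumptions made for illustration. Only the endpoint path, the pass-through payload, and the 600 s default come from this page.

```python
import json
import urllib.request
from typing import Any, Mapping, Optional, Tuple

DEFAULT_TIMEOUT_S = 600.0  # bounded default described for the timeout parameter


def build_request(
    base_url: str,
    path: str,
    api_key: str,
    *,
    headers: Optional[Mapping[str, str]] = None,
    timeout: Optional[float] = None,
    **payload: Any,
) -> Tuple[urllib.request.Request, float]:
    """Return a POST request plus the timeout to use when sending it."""
    merged = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    # Assumption: caller-supplied headers are merged last, so they win.
    merged.update(headers or {})
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode(),  # payload is forwarded unchanged
        headers=merged,
        method="POST",
    )
    return req, DEFAULT_TIMEOUT_S if timeout is None else timeout


req, timeout = build_request(
    "https://api.sailresearch.com",
    "/v1/chat/completions",
    "example-key",  # placeholder, not a real key
    headers={"X-Request-Tag": "docs"},  # hypothetical extra header
    model="zai-org/GLM-5",
    messages=[{"role": "user", "content": "hello"}],
)
```

Because the keyword arguments are serialized without inspection, any field the endpoint accepts can be passed through the same way.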

Voyage correlation

If a current Voyage exists (or you pass voyage=), the wrappers add the X-Sail-Voyage-Id header plus the active span/agent context so the dashboard attributes the model call to the right place on the timeline. Pass voyage= to correlate with a specific Voyage, or call inference with no active Voyage for ordinary uncorrelated inference.
with voyage.agent("Reviewer", role="reviewer"):
    with voyage.span("draft"):
        # Auto-attributed to this agent/span.
        sail.inference.responses.create(model="zai-org/GLM-5", input="...")
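The precedence described above (an explicit voyage=, otherwise the current Voyage, otherwise no correlation) can be sketched as a small resolver. The names set_current_voyage and correlation_headers are hypothetical, and Voyages are modeled as plain string IDs. Only the X-Sail-Voyage-Id header is named on this page; the span and agent headers are left out because their names are not given here.

```python
from typing import Dict, Optional

# Stand-in for the process-global current Voyage (hypothetical; IDs are plain strings).
_current_voyage_id: Optional[str] = None


def set_current_voyage(voyage_id: Optional[str]) -> None:
    global _current_voyage_id
    _current_voyage_id = voyage_id


def correlation_headers(voyage_id: Optional[str] = None) -> Dict[str, str]:
    """An explicit voyage wins, then the current one; otherwise no header."""
    resolved = voyage_id if voyage_id is not None else _current_voyage_id
    if resolved is None:
        return {}  # ordinary, uncorrelated inference
    return {"X-Sail-Voyage-Id": resolved}


no_voyage = correlation_headers()                 # nothing active
set_current_voyage("voy_current")                 # placeholder id
from_current = correlation_headers()              # falls back to the current Voyage
from_explicit = correlation_headers("voy_pinned") # explicit argument overrides it
```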
Auto-spans: when a wrapper call is made with no active span, the SDK synthesizes a real, timed span around it, named after the calling function when that can be derived, so the model call lands in a scoped span instead of under “Missing span”. Synthesized spans carry an _auto marker and render with an “auto” chip in the cockpit. Explicit spans always win: synthesis happens only where you declared no span. Set SAIL_VOYAGE_AUTO_SPANS=0 to disable it.

Streaming is not supported

The Sail inference API does not support streaming responses. Passing stream=True raises sail.InferenceError, because the API itself rejects streamed requests.

Raw HTTP / OpenAI clients

For an OpenAI-style client pointed at Sail’s API, wrap it once and every call attributes itself. Headers are computed at call time, so the construction-time snapshot trap (headers captured once when the client is built and never refreshed) cannot occur, and un-spanned calls get the same synthesized auto-spans as the sail.inference wrappers:
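import os

from openai import OpenAI
import sail

# Base URL of the Sail API; override with SAIL_API_URL if needed.
api_url = os.environ.get("SAIL_API_URL", "").strip() or "https://api.sailresearch.com"

client = sail.voyage.wrap_openai(
    OpenAI(base_url=api_url.rstrip("/") + "/v1", api_key=os.environ["SAIL_API_KEY"])
)

with sail.voyage.agent("Reviewer", role="reviewer"):
    client.responses.create(model="zai-org/GLM-5", input="...")  # auto-attributed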
wrap_openai wraps responses.create, responses.retrieve, and chat.completions.create in place (whichever exist) and is idempotent. It follows the process-global current Voyage on each call; pass voyage= to pin one.

For any other HTTP client, call the endpoint directly and attach the attribution headers yourself with sail.voyage.headers(). The helper carries the full context (the Voyage ID plus the span and agent active at call time), so compute it per request, never once at client construction:
import json, os, urllib.request
import sail

sail.voyage.create(name="raw-client")

api_urls = {
    "prod": "https://api.sailresearch.com",
    "dev": "https://dev.sailresearch.com",
    "staging": "https://staging.sailresearch.com",
}
mode = os.environ.get("SAIL_MODE", "prod").strip().lower() or "prod"
api_url = os.environ.get("SAIL_API_URL", "").strip() or api_urls[mode]

headers = sail.voyage.headers({"Content-Type": "application/json"})
headers["Authorization"] = "Bearer " + os.environ["SAIL_API_KEY"]

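req = urllib.request.Request(
    api_url.rstrip("/") + "/v1/responses",
    data=json.dumps({"model": "zai-org/GLM-5", "input": "hello"}).encode(),
    headers=headers,
    method="POST",
)

# Send the request and decode the JSON response body.
with urllib.request.urlopen(req, timeout=600) as resp:
    result = json.load(resp)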

Voyage limitations

The v0 Voyage SDK does not include native async helpers, an agent framework, orchestration, swarms, memory graphs, tool abstractions, an OpenAI client factory, or Sailbox auto-binding.

Errors

Inference wrappers raise sail.InferenceError (e.g. for stream=True or a missing API key) and sail.InferenceHTTPError for non-2xx responses. See Errors.
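These exception classes apply to the sail.inference wrappers. If you instead send a raw request with urllib.request.urlopen, as in the raw HTTP section, 4xx and 5xx responses surface as the standard library's urllib.error.HTTPError. A minimal sketch of reading such an error follows. summarize_http_error is a hypothetical helper, and the shape of the error body is an assumption, which is why the code falls back to the raw text when the body is not JSON.

```python
import io
import json
import urllib.error
from email.message import Message


def summarize_http_error(err: urllib.error.HTTPError) -> dict:
    """Extract the status code and a best-effort body from an HTTPError."""
    raw = err.read().decode("utf-8", errors="replace")
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        body = raw  # body was not JSON; keep the text
    return {"status": err.code, "reason": err.reason, "body": body}


# Simulated 429 so the helper can be exercised without a network call.
fake = urllib.error.HTTPError(
    url="https://api.sailresearch.com/v1/responses",
    code=429,
    msg="Too Many Requests",
    hdrs=Message(),
    fp=io.BytesIO(b'{"error": "rate limited"}'),  # made-up body for the demo
)
summary = summarize_http_error(fake)
```

In real use, the helper would be called from an except urllib.error.HTTPError block around the urlopen call.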