sail.inference provides thin wrappers over Sail’s hosted inference endpoints.
They POST the JSON payload as given and return the raw JSON response as a
dict. When a Voyage is active, the wrappers attach
correlation headers so the model call shows up on the Voyage timeline, scoped
to the active span and agent.
responses.create
responses.create POSTs the payload to /v1/responses and returns the raw JSON response dict.
| Parameter | Default | Description |
|---|---|---|
| **payload | — | The request body (e.g. model, input). Sent as-is. |
| voyage | None | Correlate with an explicit Voyage. Defaults to the current Voyage. |
| headers | None | Extra request headers to merge. |
| timeout | None | Request timeout in seconds; defaults to a bounded 600s. |
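To make the parameter table concrete, here is a minimal sketch of how a wrapper of this kind could assemble its request. It is not Sail's implementation. The helper name `build_request`, the placeholder host, the placeholder model name, and the `Authorization: Bearer` scheme are all assumptions for illustration. Only the `/v1/responses` path, the as-is JSON body, the header merge, and the 600 s fallback come from the description above.

```python
import json
import urllib.request

DEFAULT_TIMEOUT_S = 600  # documented fallback when timeout is None


def build_request(base_url, api_key, payload, headers=None, timeout=None):
    """Assemble (Request, timeout) for a POST to {base_url}/v1/responses.

    Hypothetical helper: it mirrors the documented behaviour only.
    """
    merged = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
    }
    merged.update(headers or {})  # extra headers are merged in
    body = json.dumps(payload).encode("utf-8")  # payload is sent as given
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/responses",
        data=body,
        headers=merged,
        method="POST",
    )
    effective_timeout = DEFAULT_TIMEOUT_S if timeout is None else timeout
    return request, effective_timeout


# Nothing is sent here; the request is only constructed for inspection.
request, timeout = build_request(
    "https://api.example.invalid",  # placeholder host
    "sk-placeholder",               # placeholder key
    {"model": "example-model", "input": "Hello"},  # placeholder payload
    headers={"X-Request-Tag": "demo"},
)
```

Passing the result to `urllib.request.urlopen(request, timeout=timeout)` would perform the call. The sketch stops short of that so it can run without network access.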
chat.completions.create
chat.completions.create POSTs the payload to /v1/chat/completions and returns the raw JSON response
dict. Same parameters as responses.create.
Voyage correlation
If a current Voyage exists (or you pass voyage=), the wrappers add the
X-Sail-Voyage-Id header plus the active span/agent context so the dashboard
attributes the model call to the right place on the timeline. Pass voyage=
to correlate with a specific Voyage, or call inference with no active Voyage
for ordinary uncorrelated inference.
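The resolution order described above (explicit `voyage=` first, then the current Voyage, otherwise no correlation) can be sketched as follows. The `Voyage` dataclass and the `correlation_headers` function are stand-ins invented for this example, not SDK names. Only the `X-Sail-Voyage-Id` header name is taken from the text. The span/agent context is left out because its wire format is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Voyage:
    """Stand-in for the SDK's Voyage object (assumed to expose an id)."""
    id: str


def correlation_headers(
    explicit: Optional[Voyage], current: Optional[Voyage]
) -> Dict[str, str]:
    """Return the correlation headers for one call.

    An explicit voyage wins; otherwise the current Voyage is used;
    with neither, the call is ordinary uncorrelated inference.
    """
    voyage = explicit if explicit is not None else current
    if voyage is None:
        return {}
    return {"X-Sail-Voyage-Id": voyage.id}


current = Voyage(id="voy-current")
pinned = Voyage(id="voy-pinned")

default_headers = correlation_headers(None, current)   # follows the current Voyage
pinned_headers = correlation_headers(pinned, current)  # explicit voyage= wins
plain_headers = correlation_headers(None, None)        # no Voyage: no headers
```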
Un-spanned calls get synthesized auto-spans, which carry an _auto marker and render with
an “auto” chip in the cockpit. Explicit spans always win (synthesis only
happens where you declared nothing); set SAIL_VOYAGE_AUTO_SPANS=0 to
disable.
Streaming is not supported
The Sail inference API does not support streaming responses, so passing stream=True raises sail.InferenceError (the API itself rejects it).
Raw HTTP / OpenAI clients
For an OpenAI-style client pointed at Sail’s API, wrap it once with wrap_openai and every call attributes itself. Headers are computed at call time (so the construction-time snapshot trap is impossible), and un-spanned calls get the same synthesized auto-spans as the sail.inference wrappers.
wrap_openai wraps responses.create, responses.retrieve, and
chat.completions.create in place (whichever exist), is idempotent, and
follows the process-global current Voyage per call — pass voyage= to pin
one. For any other HTTP client, call the endpoint directly and attach the
attribution headers yourself with
sail.voyage.headers(). The helper carries the full
context — voyage id plus the span/agent active at call time — so compute it
per request, never once at client construction.
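The snapshot trap is easiest to see side by side. The sketch below uses no Sail code: `voyage_headers` is a local stand-in for `sail.voyage.headers()`, the `current` dict stands in for the process-global context, and `X-Example-Span` is a made-up header name, since only `X-Sail-Voyage-Id` is documented above.

```python
# Stand-in for the process-global Voyage context.
current = {"voyage_id": "voy-1", "span": "plan"}


def voyage_headers():
    """Stand-in for sail.voyage.headers(): reads the context at call time."""
    return {
        "X-Sail-Voyage-Id": current["voyage_id"],
        "X-Example-Span": current["span"],  # hypothetical header name
    }


class SnapshotClient:
    """Anti-pattern: headers are captured once, at construction."""

    def __init__(self):
        self._headers = voyage_headers()

    def request_headers(self):
        return dict(self._headers)


class PerRequestClient:
    """Recommended: headers are recomputed for every request."""

    def request_headers(self):
        return voyage_headers()


stale = SnapshotClient()
fresh = PerRequestClient()

current["span"] = "execute"  # the active span changes after construction

stale_span = stale.request_headers()["X-Example-Span"]  # still the old span
fresh_span = fresh.request_headers()["X-Example-Span"]  # the span active now
```

The snapshot client keeps attributing calls to the span that was active when it was built, which is why the headers should be computed inside the request path.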
Voyage limitations
The v0 Voyage SDK does not include native async helpers, an agent framework, orchestration, swarms, memory graphs, tool abstractions, an OpenAI client factory, or Sailbox auto-binding.
Errors
Inference wrappers raise sail.InferenceError (e.g. for stream=True or a
missing API key) and sail.InferenceHTTPError for non-2xx responses. See
Errors.
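A handling pattern for the two error types might look like the sketch below. The exception classes are local stand-ins for `sail.InferenceError` and `sail.InferenceHTTPError`, the `status` attribute is an assumption, and `create` is a toy function that only reproduces the documented failure modes. Whether the HTTP error subclasses `InferenceError` is not stated here, so the more specific handler comes first, which is correct either way.

```python
class InferenceError(Exception):
    """Stand-in for sail.InferenceError."""


class InferenceHTTPError(Exception):
    """Stand-in for sail.InferenceHTTPError; the status attribute is assumed."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def create(payload, fake_status=200):
    """Toy wrapper: fake_status simulates the server's response code."""
    if payload.get("stream"):
        raise InferenceError("streaming is not supported")
    if not 200 <= fake_status < 300:
        raise InferenceHTTPError(fake_status)
    return {"output": "ok"}


def describe(payload, fake_status=200):
    """Map each outcome to a short label."""
    try:
        create(payload, fake_status)
    except InferenceHTTPError as exc:  # specific handler first
        return f"http error {exc.status}"
    except InferenceError as exc:
        return f"client error: {exc}"
    return "ok"


streaming_result = describe({"model": "example-model", "stream": True})
server_result = describe({"model": "example-model"}, fake_status=503)
success_result = describe({"model": "example-model"})
```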