Tinker

The sail.tinker helpers let you drive Sail inference from a Tinker RL or training loop. They bridge Tinker’s token-level sampling interface to Sail’s raw-token Responses path, so rollouts run against Sail-hosted models (optionally with a LoRA adapter) while logprobs flow back into your training code.

These helpers require tinker-cookbook to be installed alongside sail. Constructing a SailTokenCompleter without it raises sail.InferenceError.

sail.SailTokenCompleter

A Tinker TokenCompleter backed by Sail’s raw-token Responses API: each call sends the prompt token ids via the raw_prompt_tokens request parameter, which skips server-side chat templating and tokenization and forwards the ids verbatim to the model. Construct one with a model and sampling settings, then await it on tokenized prompts to get sampled tokens and their logprobs.

import asyncio

import sail
from tinker import types


async def main() -> None:
    completer = sail.SailTokenCompleter(
        model="moonshotai/Kimi-K2.6",
        max_tokens=256,
        temperature=0.7,
        completion_window="priority",
    )

    # Prompt token ids from your tokenizer, wrapped in Tinker's ModelInput.
    model_input = types.ModelInput.from_ints(tokens=[9906, 11, 1917, 0])

    result = await completer(model_input)
    print(result.tokens)         # list[int] of sampled token ids
    print(result.maybe_logprobs) # list[float] | None
    print(result.stop_reason)    # e.g. "stop", "length"


asyncio.run(main())

Constructor

SailTokenCompleter(
    *,
    model: str,
    max_tokens: int,
    temperature: float = 1.0,
    top_p: float = 1.0,
    completion_window: str = "priority",
    lora: str | None = None,
    tinker_lora_signed_url: str | None = None,
    adapter_config: Mapping[str, Any] | str | None = None,
    tinker_lora_name: str | None = None,
    metadata: Mapping[str, str] | None = None,
    timeout: float | None = None,
    poll_timeout: float | None = 1200.0,
    voyage: sail.Voyage | None = None,
    request_logprobs: bool = True,
)

Parameter	Description
`model`	Required. Model id to sample from. Raises `ValueError` if empty.
`max_tokens`	Required. Max tokens to generate; must be `> 0` (`ValueError` otherwise).
`temperature`	Sampling temperature. Default `1.0`.
`top_p`	Nucleus sampling cutoff. Default `1.0`.
`completion_window`	Completion window for each request. Default `"priority"`. `"asap"` is not supported by `SailTokenCompleter`; use `"priority"`, `"standard"`, or `"flex"` where the model supports them.
`lora`	Name of a Sail-registered LoRA adapter to apply. Mutually exclusive with `tinker_lora_signed_url`.
`tinker_lora_signed_url`	Signed URL to a Tinker checkpoint archive (see `get_tinker_checkpoint_signed_url_async`). Requires `adapter_config`. Mutually exclusive with `lora`.
`adapter_config`	LoRA adapter config as a mapping or JSON string. Required when `tinker_lora_signed_url` is set.
`tinker_lora_name`	Optional human-readable name attached to the Tinker LoRA.
`metadata`	Extra string metadata forwarded on each request.
`timeout`	Per-request timeout in seconds.
`poll_timeout`	Seconds to wait for a sampled result before giving up. Default `1200` (20 minutes); `None` waits indefinitely.
`voyage`	A `sail.Voyage` to attribute requests to a voyage.
`request_logprobs`	Whether to request logprobs from the server. Default `True`.

Passing both lora and tinker_lora_signed_url, or setting tinker_lora_signed_url without adapter_config, raises ValueError.
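The two adapter rules above can be written out as a small pre-flight check, for example to validate configuration before building a completer. This is a sketch of the documented rules, not the library's own implementation: check_adapter_args is a hypothetical helper, and treating None as "not set" is an assumption.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def check_adapter_args(
    lora: str | None = None,
    tinker_lora_signed_url: str | None = None,
    adapter_config: Mapping[str, Any] | str | None = None,
) -> None:
    """Raise ValueError for the adapter combinations the docs reject."""
    # Rule 1: a registered LoRA and a Tinker checkpoint URL are mutually exclusive.
    if lora is not None and tinker_lora_signed_url is not None:
        raise ValueError("pass either lora or tinker_lora_signed_url, not both")
    # Rule 2: a Tinker checkpoint URL needs an adapter config alongside it.
    if tinker_lora_signed_url is not None and adapter_config is None:
        raise ValueError("tinker_lora_signed_url requires adapter_config")
```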

`async __call__(model_input, stop=None)`

async def __call__(model_input, stop=None) -> TokensWithLogprobs

model_input: must expose a callable .to_ints() returning the prompt token ids (this is Tinker’s ModelInput). A non-callable to_ints, a non-integer token, or an empty prompt raises TypeError/ValueError.
stop: optional stop condition. An int is wrapped as a single-element list; a tuple is converted to a list; other values pass through unchanged.
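The stop handling described above amounts to a small normalization step. The function below is a minimal sketch of that behavior as documented; normalize_stop is a hypothetical name and the library's internal code may differ.

```python
from typing import Any


def normalize_stop(stop: Any) -> Any:
    """Normalize a stop condition the way the parameter description states."""
    if isinstance(stop, int):
        # A single token id becomes a one-element list.
        return [stop]
    if isinstance(stop, tuple):
        # Tuples are converted to lists.
        return list(stop)
    # Everything else (None, lists, strings, ...) passes through unchanged.
    return stop
```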

Returns a Tinker TokensWithLogprobs:

Field	Description
`tokens`	`list[int]`: the sampled token ids.
`maybe_logprobs`	`list[float]` or `None`: per-token logprobs, or `None` when the response carried none.
`stop_reason`	The model’s stop reason (e.g. `"stop"`, `"length"`), falling back to the response `status` or `"unknown"`.
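Because maybe_logprobs can be None, training code that consumes the result should guard for that case before using the values. The sketch below sums per-token logprobs into a sequence logprob; sequence_logprob and FakeResult are hypothetical names used only for illustration, with FakeResult standing in for Tinker's TokensWithLogprobs so the example runs without tinker installed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeResult:
    """Stand-in exposing the three fields listed in the table above."""

    tokens: list[int]
    maybe_logprobs: list[float] | None
    stop_reason: str


def sequence_logprob(result) -> float | None:
    """Sum per-token logprobs, or return None when the response carried none."""
    if result.maybe_logprobs is None:
        return None
    return sum(result.maybe_logprobs)
```

Returning None rather than 0.0 keeps a missing-logprobs response distinguishable from a sequence whose logprobs genuinely sum to zero.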

If the Sail response is malformed (missing or non-integer token data, or mismatched token and logprob lengths), a sail.InferenceError is raised with the offending response attached as exc.response.

get_tinker_checkpoint_signed_url_async

await get_tinker_checkpoint_signed_url_async(
    service_client,
    tinker_path: str,
    *,
    ttl_seconds: int | None = None,
) -> str

Resolves a Tinker checkpoint path against the Tinker service client and returns a signed archive URL, suitable for passing as tinker_lora_signed_url to SailTokenCompleter. This helper is async-only and requires a Tinker service client with async checkpoint-URL support.

Parameter	Description
`service_client`	A Tinker `ServiceClient` (exposes `create_rest_client()`).
`tinker_path`	The Tinker checkpoint path to resolve.
`ttl_seconds`	Optional checkpoint TTL to set or extend before resolving the signed URL.

Raises sail.InferenceError if the Tinker client does not provide async checkpoint URL methods, or if the response does not contain a URL.

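import sail


async def build_completer(service_client, tinker_path: str) -> sail.SailTokenCompleter:
    # Resolve the checkpoint path to a signed archive URL,
    # setting or extending the checkpoint TTL to 3600 seconds first.
    signed_url = await sail.get_tinker_checkpoint_signed_url_async(
        service_client,
        tinker_path,
        ttl_seconds=3600,
    )

    # adapter_config is required whenever tinker_lora_signed_url is set.
    return sail.SailTokenCompleter(
        model="moonshotai/Kimi-K2.6",
        max_tokens=256,
        completion_window="priority",
        tinker_lora_signed_url=signed_url,
        adapter_config={"r": 16, "alpha": 32},
    )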
