Tinker

Tinker is a training API for fine-tuning open-weight models with LoRA. Sail can run the sampling side of your Tinker training loop: sail.SailTokenCompleter is a drop-in tinker-cookbook TokenCompleter that samples from your Tinker checkpoints on Sail, with no manual adapter upload step.

Token IDs go in and token IDs come out. The completer sends your prompt token IDs to Sail verbatim and returns sampled token IDs with per-token logprobs, so there is no chat-template or re-tokenization drift between training and sampling.

Each call creates a background Responses API request for the completion window you choose (priority by default) and polls it to completion, retrying transient failures with exponential backoff.

We also have a guide on how to use sail.SailTokenCompleter in a GRPO-style Tinker training loop.
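The poll-and-retry behavior mentioned above can be pictured with a small, generic sketch. This is not the SDK's implementation: `poll_until_done`, `fetch_status`, `TransientError`, the `"status"` values, and the delay settings are all hypothetical names chosen for illustration, and the completer's actual retry policy may differ.

```python
import asyncio
import random


class TransientError(Exception):
    """Stand-in for a retryable failure (for example, a dropped connection)."""


async def poll_until_done(fetch_status, *, base_delay=0.01, max_delay=0.1, max_retries=5):
    """Poll `fetch_status` until it reports completion.

    `fetch_status` is a hypothetical async callable that returns a dict with
    a "status" key. Transient failures are retried with exponential backoff
    plus jitter; any other exception propagates to the caller.
    """
    failures = 0
    while True:
        try:
            response = await fetch_status()
        except TransientError:
            failures += 1
            if failures > max_retries:
                raise
            # Double the wait after each consecutive failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (failures - 1))
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
            continue
        failures = 0  # a successful call resets the backoff
        if response["status"] == "completed":
            return response
        await asyncio.sleep(base_delay)  # still running: wait, then poll again
```

The jitter spreads retries out so that many concurrent rollouts do not all retry at the same instant.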

Install

pip install sail tinker tinker-cookbook
export SAIL_API_KEY=sk_your_key_here
export TINKER_API_KEY=your_tinker_key

Sample on Sail

SailTokenCompleter works anywhere tinker-cookbook expects a TokenCompleter (for example, RL rollouts, evals, or direct calls):

import sail
from tinker import types

completer = sail.SailTokenCompleter(
    model="moonshotai/Kimi-K2.6",
    max_tokens=256,
    temperature=0.7,
    completion_window="priority",
)

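# `tokenizer` is the tokenizer for the base model, obtained separately (not shown).
# Run the `await` below inside an async function.
prompt = types.ModelInput.from_ints(tokens=tokenizer.encode("Question: 2+2?\nAnswer:"))
result = await completer(prompt, stop=["\n"])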

result.tokens          # sampled token IDs
result.maybe_logprobs  # per-token logprobs (None when request_logprobs=False)
result.stop_reason
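
In an RL loop the per-token logprobs are often reduced to a single sequence-level value. The following is a minimal sketch of that reduction which also handles the `None` case. `FakeResult` is a hypothetical stand-in that exposes only the `tokens` and `maybe_logprobs` fields shown above, and `sequence_logprob` is an illustrative helper, not part of the sail SDK.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeResult:
    """Hypothetical stand-in exposing the two result fields used here."""
    tokens: Sequence[int]
    maybe_logprobs: Optional[Sequence[float]]


def sequence_logprob(result) -> Optional[float]:
    """Sum per-token logprobs, or return None if they were not requested."""
    if result.maybe_logprobs is None:
        return None
    if len(result.maybe_logprobs) != len(result.tokens):
        raise ValueError("expected one logprob per sampled token")
    return sum(result.maybe_logprobs)
```

Returning `None` instead of `0.0` keeps a disabled `request_logprobs` from being mistaken for a certain (probability 1) sample.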

Parameters

Parameter	Default	Description
`model`	(required)	Sail model ID. Must support LoRA serving when a LoRA source is set — see LoRAs.
`max_tokens`	(required)	Maximum sampled tokens per call.
`temperature`	`1.0`	Sampling temperature.
`top_p`	`1.0`	Nucleus sampling threshold.
`completion_window`	`"priority"`	Completion window for each request. LoRA requests cannot use `asap`; the selected window must be supported by the model.
`lora`	`None`	Name or ID of a LoRA uploaded to Sail.
`tinker_lora_signed_url`	`None`	Signed Tinker checkpoint archive URL. Mutually exclusive with `lora`.
`adapter_config`	`None`	PEFT `adapter_config.json` contents (dict or JSON string). Required with `tinker_lora_signed_url`.
`tinker_lora_name`	`None`	Optional label for the Tinker checkpoint.
`metadata`	`None`	Extra request metadata merged into each request.
`timeout`	`None`	Per-HTTP-call timeout in seconds.
`request_logprobs`	`True`	Request per-token logprobs with each sample.

The stop argument on the call itself accepts a string, a list of strings, or token IDs, matching the tinker-cookbook TokenCompleter contract.

Sample from a Tinker checkpoint

To sample from a LoRA you are training in Tinker, save sampler weights, resolve a signed archive URL, and pass both the URL and the adapter’s PEFT config to the completer. Sail downloads the checkpoint archive and loads the adapter for your requests.

import sail

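# Assumes `training_client` and `service_client` are your existing Tinker clients
# and that this code runs inside an async function.
# 1. Save sampler weights for the current step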
save_future = await training_client.save_weights_for_sampler_async(
    "rl-step-7",
    ttl_seconds=3600,
)
save_result = await save_future
tinker_path = save_result.path  # tinker://<run-id>/sampler_weights/rl-step-7

# 2. Resolve a signed checkpoint archive URL
signed_url = await sail.get_tinker_checkpoint_signed_url_async(
    service_client,
    tinker_path,
    ttl_seconds=3600,  # optional: set/extend the checkpoint TTL
)

# 3. Sample from the checkpoint on Sail
completer = sail.SailTokenCompleter(
    model="moonshotai/Kimi-K2.6",
    max_tokens=256,
    completion_window="priority",
    tinker_lora_signed_url=signed_url,
    adapter_config=adapter_config,  # contents of the PEFT adapter_config.json
    tinker_lora_name="rl-step-7",
)

adapter_config is the PEFT adapter config for the LoRA Tinker is training. The same compatibility rules apply as for uploaded LoRAs: the base model must match model, and the rank must be within the base model’s limit. When ttl_seconds is passed to get_tinker_checkpoint_signed_url_async, the helper sets the Tinker checkpoint’s TTL before resolving the URL, so per-step RL sampler checkpoints are cleaned up automatically instead of accumulating in your Tinker account.
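
Before passing the config to the completer, it can help to check the two compatibility rules locally. The sketch below reads the standard PEFT fields `base_model_name_or_path` and `r`. The helper name `check_adapter_config`, the `max_rank` value you pass in, and the assumption that the base model name should match the Sail model ID exactly are illustrative; the rank limit for a given base model is not stated here, and Sail's own validation may compare names differently.

```python
import json


def check_adapter_config(adapter_config, *, expected_base_model, max_rank):
    """Return a list of problems found in a PEFT adapter config.

    `adapter_config` may be a dict or a JSON string. `expected_base_model`
    and `max_rank` are supplied by the caller (hypothetical values).
    """
    if isinstance(adapter_config, str):
        config = json.loads(adapter_config)
    else:
        config = dict(adapter_config)

    problems = []
    base = config.get("base_model_name_or_path")
    if base != expected_base_model:
        problems.append(f"base model {base!r} does not match {expected_base_model!r}")

    rank = config.get("r")
    if not isinstance(rank, int) or rank <= 0:
        problems.append(f"missing or invalid LoRA rank: {rank!r}")
    elif rank > max_rank:
        problems.append(f"rank {rank} exceeds limit {max_rank}")
    return problems
```

An empty list means neither check failed; it does not guarantee that Sail will accept the adapter.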

Using an uploaded LoRA instead

If you have already uploaded a LoRA to Sail, pass its name or ID as lora instead of a signed URL:

completer = sail.SailTokenCompleter(
    model="moonshotai/Kimi-K2.6",
    max_tokens=256,
    completion_window="priority",
    lora="funnier-v1",
)

Constraints

Tinker checkpoints only apply through SailTokenCompleter. The adapter is loaded on Sail’s raw-token sampling path, so a plain text Responses or Chat Completions request that happens to carry Tinker checkpoint metadata is served by the base model.
lora and tinker_lora_signed_url are mutually exclusive. Pass one LoRA source per completer.
adapter_config is required with tinker_lora_signed_url. Sail needs the PEFT config to load the checkpoint weights.
model must support LoRA serving when a LoRA source is set (see supported base models).
LoRA requests cannot use the asap completion window. Set completion_window to priority (the default), standard, or flex when the selected model supports that window (see Completion Windows).
Signed checkpoint URLs expire. Resolve a fresh URL for each new checkpoint, and re-resolve if a long-running loop reuses an old one.
tinker-cookbook must be installed. Constructing a SailTokenCompleter without it raises an error; the rest of the sail SDK works without Tinker packages.
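
Several of these constraints depend only on the arguments, so they can be checked before constructing a completer. This is a minimal sketch under that assumption: `validate_lora_options` is a hypothetical helper, not an SDK function, and the SDK may perform its own checks with different error types or messages. It does not cover model support, completion window support per model, or URL expiry.

```python
def validate_lora_options(*, completion_window="priority", lora=None,
                          tinker_lora_signed_url=None, adapter_config=None):
    """Raise ValueError if the arguments break a constraint listed above."""
    if lora is not None and tinker_lora_signed_url is not None:
        raise ValueError("pass either lora or tinker_lora_signed_url, not both")
    if tinker_lora_signed_url is not None and adapter_config is None:
        raise ValueError("adapter_config is required with tinker_lora_signed_url")
    has_lora = lora is not None or tinker_lora_signed_url is not None
    if has_lora and completion_window == "asap":
        raise ValueError("LoRA requests cannot use the asap completion window")
```

For example, `validate_lora_options(lora="funnier-v1", completion_window="asap")` raises, while the same call with the default window passes.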
