Tinker is a training API for fine-tuning open-weight models with LoRA. Sail can run the sampling side of your Tinker training loop: sail.SailTokenCompleter is a drop-in tinker-cookbook TokenCompleter that samples from your Tinker checkpoints on Sail, with no manual adapter upload step.

Token IDs go in and token IDs come out. The completer sends your prompt token IDs to Sail verbatim and returns sampled token IDs with per-token logprobs, so there is no chat-template or re-tokenization drift between training and sampling.

Each call creates a background Responses request on the completion window you choose (priority by default) and polls it to completion, retrying transient failures with exponential backoff.

For a walkthrough of using sail.SailTokenCompleter in a GRPO-style Tinker training loop, see the following guide.
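The request lifecycle described above (create a background request, poll it, retry transient failures with exponential backoff) can be sketched roughly as follows. This is an illustration of the general pattern, not Sail's implementation: fetch_status, TransientError, the status strings, and every delay value are assumptions made for the example.

```python
import asyncio


class TransientError(Exception):
    """Stand-in for a retryable failure such as a timeout (hypothetical)."""


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=5):
    """Return the capped exponential delay, in seconds, before each retry."""
    return [min(cap, base * factor**i) for i in range(attempts)]


async def poll_until_done(fetch_status, *, base=0.5, factor=2.0, cap=30.0,
                          max_retries=5, poll_interval=1.0,
                          sleep=asyncio.sleep):
    """Poll until the request reaches a terminal state.

    `fetch_status` is a hypothetical async callable returning a dict with a
    "status" key; it is not part of the Sail SDK. The terminal status names
    below are assumed as well.
    """
    failures = 0
    while True:
        try:
            response = await fetch_status()
        except TransientError:
            if failures >= max_retries:
                raise  # give up after too many consecutive failures
            # Wait longer after each consecutive failure, up to `cap`.
            await sleep(min(cap, base * factor**failures))
            failures += 1
            continue
        failures = 0  # a successful poll resets the backoff
        if response["status"] in ("completed", "failed", "cancelled"):
            return response
        await sleep(poll_interval)
```

Resetting the failure counter after each successful poll keeps one early hiccup from inflating the delay for the rest of a long-running request.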

Install

pip install sail-sdk tinker tinker-cookbook
export SAIL_API_KEY=sk_your_key_here
export TINKER_API_KEY=your_tinker_key

Sample on Sail

SailTokenCompleter works anywhere tinker-cookbook expects a TokenCompleter (e.g. RL rollouts, evals, or direct calls):
import sail
from tinker import types

completer = sail.SailTokenCompleter(
    model="moonshotai/Kimi-K2.6",
    max_tokens=256,
    temperature=0.7,
    completion_window="priority",
)

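# `tokenizer` is the tokenizer for the base model, loaded elsewhere.
prompt = types.ModelInput.from_ints(tokens=tokenizer.encode("Question: 2+2?\nAnswer:"))
# Run this inside an async function (or a REPL that allows top-level await).
result = await completer(prompt, stop=["\n"])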

result.tokens          # sampled token IDs
result.maybe_logprobs  # per-token logprobs (None when request_logprobs=False)
result.stop_reason
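
Because maybe_logprobs is None when request_logprobs=False, downstream code has to handle both cases. A minimal sketch of summing per-token logprobs into a sequence logprob is shown here; SampleResult is a stand-in that mirrors only the three fields listed above, not the real result class, and the one-logprob-per-token check is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SampleResult:
    """Stand-in with the three fields shown above (hypothetical class)."""
    tokens: Sequence[int]
    maybe_logprobs: Optional[Sequence[float]]
    stop_reason: str


def sequence_logprob(result):
    """Sum per-token logprobs, or return None if none were requested."""
    if result.maybe_logprobs is None:
        return None
    # Assumption: exactly one logprob is returned per sampled token.
    if len(result.maybe_logprobs) != len(result.tokens):
        raise ValueError("expected one logprob per sampled token")
    return sum(result.maybe_logprobs)
```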

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| model | (required) | Sail model ID. Must support LoRA serving when a LoRA source is set (see LoRAs). |
| max_tokens | (required) | Maximum sampled tokens per call. |
| temperature | 1.0 | Sampling temperature. |
| top_p | 1.0 | Nucleus sampling threshold. |
| completion_window | "priority" | Completion window for each request. LoRA requests cannot use asap; the selected window must be supported by the model. |
| lora | None | Name or ID of a LoRA uploaded to Sail. |
| tinker_lora_signed_url | None | Signed Tinker checkpoint archive URL. Mutually exclusive with lora. |
| adapter_config | None | PEFT adapter_config.json contents (dict or JSON string). Required with tinker_lora_signed_url. |
| tinker_lora_name | None | Optional label for the Tinker checkpoint. |
| metadata | None | Extra request metadata merged into each request. |
| timeout | None | Per-HTTP-call timeout in seconds. |
| request_logprobs | True | Request per-token logprobs with each sample. |

The stop argument on the call itself accepts a string, a list of strings, or token IDs, matching the tinker-cookbook TokenCompleter contract.
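
To make the three accepted shapes of stop concrete, here is a small caller-side sketch that classifies a value before passing it on. classify_stop is a hypothetical helper written for this page; it does not describe how the SDK itself inspects the argument, and treating None or an empty list as "no stop condition" is an assumption.

```python
def classify_stop(stop):
    """Return (kind, values) for a stop argument.

    kind is "strings", "token_ids", or "none". Hypothetical helper for
    illustration only.
    """
    if stop is None:
        return ("none", [])
    if isinstance(stop, str):
        return ("strings", [stop])  # a single string becomes a 1-item list
    items = list(stop)
    if not items:
        return ("none", [])
    if all(isinstance(item, str) for item in items):
        return ("strings", items)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(item, int) and not isinstance(item, bool) for item in items):
        return ("token_ids", items)
    raise TypeError("stop must be a string, a list of strings, or token IDs")
```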

Sample from a Tinker checkpoint

To sample from a LoRA you are training in Tinker, save sampler weights, resolve a signed archive URL, and pass both the URL and the adapter’s PEFT config to the completer. Sail downloads the checkpoint archive and loads the adapter for your requests.
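import sail

# Assumes `training_client` and `service_client` are existing Tinker clients,
# and `adapter_config` holds the PEFT adapter_config.json contents.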

# 1. Save sampler weights for the current step
save_future = await training_client.save_weights_for_sampler_async(
    "rl-step-7",
    ttl_seconds=3600,
)
save_result = await save_future
tinker_path = save_result.path  # tinker://<run-id>/sampler_weights/rl-step-7

# 2. Resolve a signed checkpoint archive URL
signed_url = await sail.get_tinker_checkpoint_signed_url_async(
    service_client,
    tinker_path,
    ttl_seconds=3600,  # optional: set/extend the checkpoint TTL
)

# 3. Sample from the checkpoint on Sail
completer = sail.SailTokenCompleter(
    model="moonshotai/Kimi-K2.6",
    max_tokens=256,
    completion_window="priority",
    tinker_lora_signed_url=signed_url,
    adapter_config=adapter_config,  # contents of the PEFT adapter_config.json
    tinker_lora_name="rl-step-7",
)
adapter_config is the PEFT adapter config for the LoRA that Tinker is training. The same compatibility rules apply as for uploaded LoRAs: the adapter’s base model must match the model argument, and its rank must be within the base model’s limit.

When ttl_seconds is passed to get_tinker_checkpoint_signed_url_async, the helper sets the Tinker checkpoint’s TTL before resolving the URL, so per-step RL sampler checkpoints are cleaned up automatically instead of accumulating in your Tinker account.
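
The compatibility rules above can be checked locally before constructing the completer. The sketch below reads the standard PEFT fields base_model_name_or_path and r from a config passed as a dict or JSON string. check_adapter_config and max_rank are hypothetical: the real rank limit comes from Sail's LoRA documentation, and whether Sail compares this exact base-model field is an assumption.

```python
import json


def check_adapter_config(adapter_config, model, max_rank):
    """Parse a PEFT adapter config and check the base model and rank.

    Hypothetical pre-flight check; `max_rank` must be supplied by the
    caller from the base model's documented limit.
    """
    if isinstance(adapter_config, str):
        config = json.loads(adapter_config)
    else:
        config = dict(adapter_config)

    base = config.get("base_model_name_or_path")
    if base != model:
        raise ValueError(f"adapter base model {base!r} does not match {model!r}")

    rank = config.get("r")
    if not isinstance(rank, int) or rank < 1:
        raise ValueError("adapter_config needs a positive integer rank 'r'")
    if rank > max_rank:
        raise ValueError(f"rank {rank} exceeds the limit of {max_rank}")
    return config
```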

Using an uploaded LoRA instead

If you have already uploaded a LoRA to Sail, pass its name or ID as lora instead of a signed URL:
completer = sail.SailTokenCompleter(
    model="moonshotai/Kimi-K2.6",
    max_tokens=256,
    completion_window="priority",
    lora="funnier-v1",
)

Constraints

  • Tinker checkpoints only apply through SailTokenCompleter. The adapter is loaded on Sail’s raw-token sampling path, so a plain-text Responses or Chat Completions request that happens to carry Tinker checkpoint metadata is served by the base model.
  • lora and tinker_lora_signed_url are mutually exclusive. Pass one LoRA source per completer.
  • adapter_config is required with tinker_lora_signed_url. Sail needs the PEFT config to load the checkpoint weights.
  • model must support LoRA serving when a LoRA source is set (see supported base models).
  • LoRA requests cannot use the asap completion window. Set completion_window to priority (the default), standard, or flex when the selected model supports that window (see Completion Windows).
  • Signed checkpoint URLs expire. Resolve a fresh URL for each new checkpoint, and re-resolve if a long-running loop reuses an old one.
  • tinker-cookbook must be installed. Constructing a SailTokenCompleter without it raises an error; the rest of the sail SDK works without Tinker packages.
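
Several of the constraints above concern argument combinations, which can be checked before any request is made. The following is a rough sketch of such a pre-flight check. validate_lora_args is a hypothetical helper, not part of the sail SDK, and it does not cover model-specific rules such as which windows or LoRAs a given model supports.

```python
def validate_lora_args(*, completion_window="priority", lora=None,
                       tinker_lora_signed_url=None, adapter_config=None):
    """Return a list of constraint violations; an empty list means OK.

    Hypothetical helper mirroring the argument rules listed above.
    """
    problems = []
    if lora is not None and tinker_lora_signed_url is not None:
        problems.append("lora and tinker_lora_signed_url are mutually exclusive")
    if tinker_lora_signed_url is not None and adapter_config is None:
        problems.append("adapter_config is required with tinker_lora_signed_url")
    has_lora_source = lora is not None or tinker_lora_signed_url is not None
    if has_lora_source and completion_window == "asap":
        problems.append("LoRA requests cannot use the asap completion window")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every misconfiguration at once.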