Upload PEFT-trained LoRAs for supported base models. After upload, give the LoRA a name and pass that name or ID in request metadata. If you train LoRAs with Tinker, you can also sample directly from Tinker checkpoints without uploading — see Tinker.

Supported base models

LoRA serving is currently in a limited pilot with select partners. If you’d like to fine-tune and serve LoRAs on Sail, get in touch.
LoRA serving is available for:
Base model              Max rank   Target modules
moonshotai/Kimi-K2.6    32         all
If a LoRA is incompatible with the base model or exceeds the rank limit, validation or inference fails.

Adapter requirements

Train with PEFT and export the two standard files:
  • adapter_config.json
  • adapter_model.safetensors
Use adapter config values compatible with the base model:
  • base_model_name_or_path should identify the base model you register in supported_models (e.g. moonshotai/Kimi-K2.6).
  • peft_type should be "LORA".
  • task_type should be "CAUSAL_LM".
  • r (rank) must be no more than the base model’s max rank.
  • target_modules can include any modules supported by the base model and runtime. Sail does not restrict Kimi K2.6 LoRAs to a fixed target-module allowlist.
Other adapter-config fields (lora_alpha, lora_dropout, bias settings, etc.) are preserved as-is. The adapter_config.json and adapter_model.safetensors files can each be up to 100 GiB.
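The requirements above can be checked locally before uploading. The following is a minimal sketch, not part of any Sail SDK: check_adapter_config, check_adapter_dir, and MAX_RANK are hypothetical names, and the strict equality check on base_model_name_or_path is an assumption that is tighter than the "should identify" wording above.

```python
import json
from pathlib import Path

# Max rank per base model, copied from the supported-models table.
MAX_RANK = {"moonshotai/Kimi-K2.6": 32}


def check_adapter_config(config: dict, base_model: str) -> list:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    max_rank = MAX_RANK.get(base_model)
    if max_rank is None:
        problems.append(f"{base_model} is not a supported base model")
    # Assumption: exact match. The service may accept other identifying values.
    if config.get("base_model_name_or_path") != base_model:
        problems.append("base_model_name_or_path does not match the base model")
    if config.get("peft_type") != "LORA":
        problems.append('peft_type should be "LORA"')
    if config.get("task_type") != "CAUSAL_LM":
        problems.append('task_type should be "CAUSAL_LM"')
    rank = config.get("r")
    if not isinstance(rank, int) or rank < 1:
        problems.append("r must be a positive integer")
    elif max_rank is not None and rank > max_rank:
        problems.append(f"r={rank} exceeds the max rank of {max_rank}")
    return problems


def check_adapter_dir(adapter_dir: str, base_model: str) -> list:
    """Load adapter_config.json from a PEFT export directory and check it."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text())
    return check_adapter_config(config, base_model)
```

For example, check_adapter_dir("my-adapter", "moonshotai/Kimi-K2.6") returns an empty list when nothing looks wrong. A clean result here does not guarantee that server-side validation will succeed.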

Add a LoRA

Creating a LoRA takes three API calls: upload the config file, upload the weights file, then call POST /v1/loras. The response includes validation records for each requested model.

1. Upload the two adapter files

Use POST /v1/files (multipart):
from openai import OpenAI

client = OpenAI(base_url="https://api.sailresearch.com/v1", api_key="YOUR_KEY")

with open("my-adapter/adapter_config.json", "rb") as f:
    cfg = client.files.create(file=f, purpose="lora")

with open("my-adapter/adapter_model.safetensors", "rb") as f:
    wts = client.files.create(file=f, purpose="lora")

2. Create the LoRA

import requests

resp = requests.post(
    "https://api.sailresearch.com/v1/loras",
    headers={"Authorization": "Bearer YOUR_KEY"},
    json={
        "name": "funnier-v1",
        "supported_models": ["moonshotai/Kimi-K2.6"],
        "config_file_id": cfg.id,
        "weights_file_id": wts.id,
        # optional
        "display_name": "Funnier v1",
        "description": "Fine-tuned on a standup-comedy corpus to write punchier, joke-forward replies.",
    },
    timeout=30,
)
resp.raise_for_status()
lora = resp.json()
print(lora["id"], lora["status"])  # e.g. 3fa85f64-... pending_validation
Each supported_models entry must be a known Sail model ID.
Naming rules for the name field:
  • 2–64 characters
  • lowercase alphanumeric or dashes ([a-z0-9-])
  • must start and end with an alphanumeric character
  • unique within your organization (duplicate → 409)
The file IDs you pass must belong to the same organization as your API key.
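The character and length rules for name can be checked client-side before calling POST /v1/loras. This is a sketch under the rules listed above; is_valid_lora_name is a hypothetical helper, and uniqueness within the organization can only be confirmed by the API (the 409 response).

```python
import re

# First and last characters are alphanumeric; the middle allows dashes.
# 1 + (0..62) + 1 characters gives a total length of 2 to 64.
NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{0,62}[a-z0-9]")


def is_valid_lora_name(name: str) -> bool:
    """Check the length and character rules for a LoRA name."""
    return NAME_RE.fullmatch(name) is not None
```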

Validation flow

POST /v1/loras creates one validation record per supported_models entry. Each record appears in the response under validations:
{
  "id": "3fa85f64-...",
  "name": "funnier-v1",
  "status": "pending_validation",
  "supported_models": ["moonshotai/Kimi-K2.6"],
  "validations": [
    {
      "model": "moonshotai/Kimi-K2.6",
      "status": "pending",
      "response_id": "resp_..."
    }
  ]
}
Sail validates the LoRA on each model in supported_models. Poll GET /v1/loras/{name} until validation finishes for the model you want to use.
Validation status   Meaning
pending             The validation task has been created.
running             The validation request is in progress.
succeeded           The LoRA loaded and completed a validation request for that model.
failed              The LoRA is not usable for that model. result_code and result_message describe the failure.
unverified          Sail could not complete validation automatically.
Read validations for model-specific status. Requests using a LoRA are rejected only when the latest validation for that model is failed; pending, running, and unverified records remain usable. If you later add a model with PATCH /v1/loras/{name}, Sail creates validation records for newly added models that have not already succeeded validation.
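The polling described above might look like the following sketch. The helper names (latest_validation, wait_for_validation), the polling interval, and the timeout are illustrative choices. Two assumptions are not confirmed by this page: the last record for a model in validations is treated as the latest one, and unverified is treated as a final state alongside succeeded and failed.

```python
import time
from typing import Callable, Optional

# Assumption: these three statuses will not change without a new validation.
FINAL_STATUSES = {"succeeded", "failed", "unverified"}


def latest_validation(lora: dict, model: str) -> Optional[dict]:
    """Return the last validation record for `model`, or None if there is none."""
    records = [v for v in lora.get("validations", []) if v.get("model") == model]
    return records[-1] if records else None


def wait_for_validation(
    fetch_lora: Callable[[], dict],
    model: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_lora() repeatedly until the model's validation is final."""
    deadline = time.monotonic() + timeout
    while True:
        record = latest_validation(fetch_lora(), model)
        if record is not None and record.get("status") in FINAL_STATUSES:
            return record
        if time.monotonic() >= deadline:
            raise TimeoutError(f"validation for {model} did not finish in time")
        sleep(interval)
```

Here fetch_lora is any zero-argument callable that returns the parsed body of GET /v1/loras/{name}, for example lambda: requests.get(url, headers=headers, timeout=30).json(). Passing it in keeps the loop independent of the HTTP client and easy to test.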

3. Fetch LoRAs

# by name
curl -H "Authorization: Bearer $SAIL_API_KEY" https://api.sailresearch.com/v1/loras/funnier-v1

# by id
curl -H "Authorization: Bearer $SAIL_API_KEY" https://api.sailresearch.com/v1/loras/3fa85f64-...

# list all loras for your org
curl -H "Authorization: Bearer $SAIL_API_KEY" https://api.sailresearch.com/v1/loras

Use a LoRA

Pass the LoRA’s name (or its UUID) as metadata.lora on any Responses, Chat Completions, or Messages request. model must be one of the LoRA’s supported_models:
response = client.responses.create(
    model="moonshotai/Kimi-K2.6",
    input=[{"role": "user", "content": "Write a one-liner about a centrifuge that's having a bad day."}],
    metadata={
        "lora": "funnier-v1",
        "completion_window": "priority",
    },
    background=True,
)
Chat Completions:
response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.6",
    messages=[{"role": "user", "content": "Tell me a joke about Go channels."}],
    extra_body={"metadata": {"lora": "funnier-v1", "completion_window": "priority"}},
)

Constraints on LoRA requests

  • completion_window cannot be asap. Use priority, standard, or flex — see Completion Windows for what each means.
  • model must be in the LoRA’s supported_models list. Requesting a different base model returns 400.
  • A failed model validation blocks that model. GET /v1/loras/{name} includes validations[].result_message when validation fails.
  • The LoRA must belong to your organization. Names are scoped per-org; two orgs can independently own a LoRA called funnier-v1.
  • You can use either the LoRA’s name or its UUID in metadata.lora.
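Some of these constraints can be caught before a request is sent. The sketch below takes the parsed GET /v1/loras/{name} response and reports likely rejections. preflight_errors is a hypothetical helper, and it assumes the request always sets completion_window explicitly and that the last validation record for a model is the latest one. Organization ownership cannot be checked locally.

```python
# Completion windows this page lists as allowed for LoRA requests.
ALLOWED_WINDOWS = {"priority", "standard", "flex"}


def preflight_errors(lora: dict, model: str, completion_window: str) -> list:
    """Return reasons the request would likely be rejected; empty if none found."""
    errors = []
    if completion_window not in ALLOWED_WINDOWS:
        errors.append(
            f"completion_window {completion_window!r} is not allowed; "
            "use priority, standard, or flex"
        )
    if model not in lora.get("supported_models", []):
        errors.append(f"{model} is not in the LoRA's supported_models")
    # Only a failed validation blocks a model; pending, running, and
    # unverified records remain usable.
    records = [v for v in lora.get("validations", []) if v.get("model") == model]
    if records and records[-1].get("status") == "failed":
        message = records[-1].get("result_message", "no result_message provided")
        errors.append(f"validation failed for {model}: {message}")
    return errors
```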