When a POST request is sent with an Idempotency-Key header, Sail stores a reservation for that key alongside the resulting response record. Retrying the same request with the same key returns the previously reserved response; Sail does not re-run inference. This makes client retries safe across transient network failures, timeouts, and ambiguous 5xx responses.

Sending an idempotency key

Generate a key per logical request (UUIDs or any unique string of at most 255 characters work well) and attach it to the first attempt and every retry.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_SAIL_API_KEY",
    base_url="https://api.sailresearch.com/v1",
)

response = client.responses.create(
    model="openai/gpt-oss-20b",
    input="Summarize this document.",
    extra_headers={"Idempotency-Key": "order-9f8e7d6c"},
)
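If a retry may happen after a process restart, a randomly generated key can be lost before the retry is sent. One option, sketched below, is to derive the key deterministically from an identifier your application already has. The namespace string and the `idempotency_key_for` helper are hypothetical names chosen for illustration; they are not part of Sail's API.

```python
import uuid

# Hypothetical namespace for this application; any fixed UUID would do.
KEY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "idempotency.example.com")


def idempotency_key_for(logical_request_id: str) -> str:
    """Return the same key every time for the same logical request."""
    key = str(uuid.uuid5(KEY_NAMESPACE, logical_request_id))
    # A UUID string is 36 characters, well inside the 255-character limit.
    assert len(key) <= 255
    return key
```

The result can be passed as the `Idempotency-Key` value in `extra_headers`. A derived key only helps if the request body is also identical on every attempt, since a changed body under the same key is rejected rather than replayed.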

Behavior

  • Scope. Reservations are keyed by (organization, API key, idempotency key). The same key sent under a different API key is a distinct reservation: rotating keys does not break replay, but sharing an idempotency key across API keys won’t dedupe.
  • Body fingerprint. Sail fingerprints the request body with SHA-256. Reusing a key with a materially different body returns 400 idempotency_error rather than replaying the earlier response. This prevents a retry from silently returning the wrong answer if the client changed the request.
  • Validation failures don’t consume the key. A request that fails validation (any 4xx before the reservation is written) does not reserve the key. You may retry with a corrected body using the same key.
  • Replays reflect current state. A replay reads the current state of the underlying response record, not a frozen snapshot. For background requests, the returned status reflects the task’s most recent transition (e.g. queued → in_progress → completed).
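The rules above can be modeled with a small in-memory store. This is a toy sketch for reasoning about client behavior, not Sail's implementation: the `ReservationStore` class, the canonical-JSON fingerprint input, the 422 validation status, and the record shape are all assumptions made for illustration.

```python
import hashlib
import json


class ReservationStore:
    """Toy in-memory model of the behavior rules; not Sail's real server."""

    def __init__(self):
        # (organization, api_key, idempotency_key) -> (fingerprint, record)
        self._reservations = {}
        self.inference_runs = 0

    @staticmethod
    def fingerprint(body: dict) -> str:
        # Assumption: the hash is taken over canonical JSON with sorted keys.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def post(self, organization, api_key, idempotency_key, body):
        if "model" not in body:
            # Validation fails before a reservation is written (422 is assumed).
            return 422, {"error": "validation_error"}
        scope = (organization, api_key, idempotency_key)
        fp = self.fingerprint(body)
        if scope in self._reservations:
            stored_fp, record = self._reservations[scope]
            if stored_fp != fp:
                return 400, {"error": "idempotency_error"}
            # Replay: return the live record without running inference again.
            return 200, record
        self.inference_runs += 1
        record = {"id": f"resp_{self.inference_runs}", "status": "queued"}
        self._reservations[scope] = (fp, record)
        return 200, record
```

Because the replay returns the live record rather than a copy, a status change made after the first call is visible on the next replay, which mirrors the "current state, not a frozen snapshot" rule.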

Retrying with an idempotency key

The key only pays off when you actually retry. The first attempt reserves the key. Subsequent attempts hit the same reservation and get the stored response back without re-running inference.

When to retry

| Signal | What it means | What to do |
| --- | --- | --- |
| Network error / timeout | Ambiguous: the server may or may not have received the request | Retry with the same idempotency key |
| 5xx on a POST | Transient server-side failure | Retry with the same idempotency key |
| 429 + Retry-After | Rate limit | Wait the Retry-After value, then retry |
| body.status: "failed" | Inference genuinely failed | Investigate the cause; do not blind-retry |
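The table can be encoded as a small decision function so that retry policy lives in one place. This is a sketch: the `retry_action` name and the returned action strings are hypothetical, and the final `"no_retry"` branch (success, or a 4xx other than 429) is an assumption about cases the table does not list.

```python
from typing import Optional


def retry_action(network_error: bool = False,
                 status_code: Optional[int] = None,
                 body_status: Optional[str] = None) -> str:
    """Map an observed signal to one of the actions in the table."""
    if network_error:
        # Ambiguous outcome: the server may or may not have seen the request.
        return "retry_same_key"
    if status_code == 429:
        return "wait_retry_after_then_retry"
    if status_code is not None and status_code >= 500:
        return "retry_same_key"
    if body_status == "failed":
        # Inference genuinely failed; retrying blindly will not help.
        return "investigate"
    # Assumption: anything else is a success or a non-retryable client error.
    return "no_retry"
```

For example, `retry_action(status_code=503)` selects a retry with the same key, while `retry_action(status_code=200, body_status="failed")` asks for investigation instead.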

How to retry

import random
import time
import uuid
import requests

API_KEY = "YOUR_SAIL_API_KEY"
BASE_URL = "https://api.sailresearch.com/v1"

# One key, reused across every retry.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "Idempotency-Key": str(uuid.uuid4()),
}
body = {"model": "openai/gpt-oss-20b", "input": "Summarize this document."}

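for attempt in range(5):
    try:
        resp = requests.post(f"{BASE_URL}/responses", headers=headers, json=body, timeout=30)
    except (requests.ConnectionError, requests.Timeout):
        # Ambiguous network failure: safe to retry, the key dedupes server-side.
        delay = random.uniform(0, 2**attempt)
    else:
        if resp.status_code == 429:
            # Rate limited: wait the Retry-After value (assumed to be in seconds).
            delay = float(resp.headers.get("Retry-After", 2**attempt))
        elif resp.status_code >= 500:
            # Transient server-side failure: back off with jitter and retry.
            delay = random.uniform(0, 2**attempt)
        else:
            # Success, or a 4xx that a retry will not fix (raises immediately).
            resp.raise_for_status()
            break
    time.sleep(delay)
else:
    raise RuntimeError("all retries exhausted")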

print(resp.json()["id"])
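For the 429 case, the HTTP specification allows Retry-After to carry either a number of seconds or an HTTP date. Which form Sail sends is not stated here, so the helper below is a sketch that accepts both. The `retry_after_seconds` name and the `default` fallback are hypothetical.

```python
import time
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str],
                        default: float = 1.0,
                        now: Optional[float] = None) -> float:
    """Convert a Retry-After header value into a non-negative delay in seconds."""
    if not header_value:
        return default
    value = header_value.strip()
    if value.isdigit():
        # delay-seconds form, e.g. "120"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable value: fall back to the caller's default.
        return default
    current = time.time() if now is None else now
    return max(0.0, when.timestamp() - current)
```

In a retry loop, the result would be passed to `time.sleep` before the next attempt, for instance `time.sleep(retry_after_seconds(resp.headers.get("Retry-After")))`.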

See also

  • Sending Requests at Scale — batch and background patterns where idempotent retries are most useful.
  • Webhooks — pair idempotent retries with webhook delivery so clients can re-drive submission without re-running inference.