Idempotent Requests - Sail Research

When a POST request is sent with an idempotency key header, Sail stores the reservation alongside the resulting response record. Retrying the same request with the same key returns the previously reserved response — Sail does not re-run inference. This makes client retries safe across transient network failures, timeouts, and ambiguous 5xx responses.

Sending an idempotency key

Generate a key per logical request (UUIDs or any unique string ≤ 255 chars work well) and attach it to the first attempt and every retry.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_SAIL_API_KEY",
    base_url="https://api.sailresearch.com/v1",
)

response = client.responses.create(
    model="zai-org/GLM-5.1-FP8",
    input="Summarize this document.",
    extra_headers={"Idempotency-Key": "order-9f8e7d6c"},
)
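
A key has to be unique per logical request yet stable across retries, so one reasonable approach is to create it once, up front, and pass it to whatever performs the call. The helper below is a minimal sketch of that idea; `new_idempotency_key` and its `prefix` parameter are hypothetical names used for illustration, not part of any Sail SDK, and the 255-character limit is taken from the guidance above.

```python
import uuid

MAX_KEY_LENGTH = 255  # upper bound mentioned in the guidance above


def new_idempotency_key(prefix: str = "") -> str:
    """Return a fresh key such as 'order-<uuid4>' (hypothetical helper)."""
    key = f"{prefix}-{uuid.uuid4()}" if prefix else str(uuid.uuid4())
    if len(key) > MAX_KEY_LENGTH:
        raise ValueError(f"idempotency key exceeds {MAX_KEY_LENGTH} characters")
    return key


# Create the key once per logical request, then reuse it for every retry.
key = new_idempotency_key("order")
extra_headers = {"Idempotency-Key": key}
```

The resulting `extra_headers` dict can be passed to `client.responses.create(...)` in place of the hard-coded header shown above.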

Behavior

Scope. Reservations are keyed by (organization, API key, idempotency key). The same key sent under a different API key is a distinct reservation — rotating keys does not break replay, but sharing an idempotency key across keys won’t dedupe.
Body fingerprint. Sail fingerprints the request body with SHA-256. Reusing a key with a materially different body returns 400 idempotency_error rather than replaying the earlier response. This prevents a retry from silently returning the wrong answer if the client changed the request.
Validation failures don’t consume the key. A request that fails validation (any 4xx before the reservation is written) does not reserve the key. You may retry with a corrected body using the same key.
Replays reflect current state. A replay reads the current state of the underlying response record, not a frozen snapshot. For background requests, the returned status reflects the task’s most recent transition (e.g. queued → in_progress → completed).
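
To make these four rules concrete, here is a toy in-memory model of them. It is only a sketch for reasoning about client behavior, not Sail's implementation: `ToyReservationStore`, `IdempotencyError`, and `set_status` are hypothetical names, the "validation" is a single stand-in check, and the canonical-JSON fingerprint is an assumption, since the text above says only that the body is fingerprinted with SHA-256.

```python
import hashlib
import json


class IdempotencyError(Exception):
    """Stands in for the 400 idempotency_error response."""


class ToyReservationStore:
    """In-memory model of the rules above; not Sail's implementation."""

    def __init__(self):
        self._reservations = {}  # (org, api_key, idem_key) -> (fingerprint, response_id)
        self._responses = {}     # response_id -> mutable response record
        self.inference_runs = 0

    @staticmethod
    def _fingerprint(body: dict) -> str:
        # Assumption: hash a canonical JSON encoding of the body.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def set_status(self, response_id: str, status: str) -> None:
        # Simulates a background task moving to a new state.
        self._responses[response_id]["status"] = status

    def post(self, org: str, api_key: str, idem_key: str, body: dict) -> dict:
        if "model" not in body:
            # Validation fails before any reservation is written.
            raise ValueError("validation error: missing 'model'")
        scope = (org, api_key, idem_key)
        fingerprint = self._fingerprint(body)
        if scope in self._reservations:
            stored_fingerprint, response_id = self._reservations[scope]
            if stored_fingerprint != fingerprint:
                raise IdempotencyError("key reused with a different body")
            # Replay: return the record's current state without a new run.
            return dict(self._responses[response_id])
        self.inference_runs += 1
        response_id = f"resp_{self.inference_runs}"
        self._responses[response_id] = {"id": response_id, "status": "queued"}
        self._reservations[scope] = (fingerprint, response_id)
        return dict(self._responses[response_id])
```

In this model, posting the same body twice under the same scope leaves `inference_runs` at 1, posting under a second API key creates a second reservation, and a request rejected by validation leaves its key free for a corrected retry.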

Retrying with an idempotency key

The key only pays off when you actually retry. The first attempt reserves the key. Subsequent attempts hit the same reservation and get the stored response back without re-running inference.

When to retry

| Signal | What it means | What to do |
| --- | --- | --- |
| Network error / timeout | Ambiguous: the server may or may not have received the request | Retry with the same idempotency key |
| `5xx` on a POST | Transient server-side failure | Retry with the same idempotency key |
| `429` + `Retry-After` | Rate limit | Wait for the interval given by `Retry-After`, then retry |
| `body.status: "failed"` | Inference genuinely failed | Investigate the cause; do not blind-retry |
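
The table can be encoded as a small decision function, which keeps the retry policy in one place. This is a sketch under stated assumptions: `Action` and `next_action` are hypothetical names, a `status_code` of `None` stands for "no response arrived", and treating other `4xx` codes as non-retryable is an assumption the table does not spell out.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    RETRY = "retry with the same idempotency key"
    WAIT_THEN_RETRY = "wait for Retry-After, then retry"
    INVESTIGATE = "investigate; do not blind-retry"
    DONE = "no retry needed"


def next_action(status_code: Optional[int], body_status: Optional[str] = None) -> Action:
    """Map one attempt's outcome to an action (hypothetical helper).

    status_code is None when no response arrived (network error or timeout).
    """
    if status_code is None:
        return Action.RETRY
    if status_code == 429:
        return Action.WAIT_THEN_RETRY
    if status_code >= 500:
        return Action.RETRY
    # Assumption: any other 4xx needs a fix on the client side, not a retry.
    if status_code >= 400 or body_status == "failed":
        return Action.INVESTIGATE
    return Action.DONE
```

For example, `next_action(None)` and `next_action(503)` both yield `Action.RETRY`, while `next_action(200, "failed")` yields `Action.INVESTIGATE`.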

How to retry

import random
import time
import uuid
import requests

API_KEY = "YOUR_SAIL_API_KEY"
BASE_URL = "https://api.sailresearch.com/v1"

# One key, reused across every retry.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "Idempotency-Key": str(uuid.uuid4()),
}
body = {"model": "zai-org/GLM-5.1-FP8", "input": "Summarize this document."}

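MAX_ATTEMPTS = 5

for attempt in range(MAX_ATTEMPTS):
    try:
        resp = requests.post(f"{BASE_URL}/responses", headers=headers, json=body, timeout=30)
    except (requests.ConnectionError, requests.Timeout):
        # Ambiguous network failure: safe to retry, the key dedupes server-side.
        delay = random.uniform(0, 2**attempt)
    else:
        if resp.status_code == 429:
            # Rate limited: honor Retry-After when it is given in whole seconds.
            retry_after = resp.headers.get("Retry-After", "")
            delay = float(retry_after) if retry_after.isdigit() else random.uniform(0, 2**attempt)
        elif resp.status_code >= 500:
            # Transient server-side failure: retry with the same key.
            delay = random.uniform(0, 2**attempt)
        else:
            # Success, or a 4xx that a retry will not fix (raises HTTPError).
            resp.raise_for_status()
            break
    if attempt < MAX_ATTEMPTS - 1:
        time.sleep(delay)  # no point sleeping after the final attempt
else:
    raise RuntimeError("all retries exhausted")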

print(resp.json()["id"])
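
A retry loop like the one above needs a delay for each attempt. The helper below is a sketch of one way to compute it: it prefers the server's `Retry-After` value when one is supplied and otherwise falls back to capped exponential backoff with full jitter. `retry_delay`, `cap`, and `now` are hypothetical names. HTTP allows `Retry-After` to be either a number of seconds or an HTTP date; which form Sail sends is not stated here, so both are handled.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    cap: float = 60.0,
    now: Optional[datetime] = None,
) -> float:
    """Return seconds to wait before the next attempt (hypothetical helper)."""
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            # delay-seconds form, e.g. "Retry-After: 7"
            return float(value)
        try:
            # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # unparseable header: fall through to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return max(0.0, (when - current).total_seconds())
    # Full jitter: uniform between 0 and the capped exponential bound.
    return random.uniform(0, min(cap, 2 ** attempt))
```

Inside a retry loop this would be called as `time.sleep(retry_delay(attempt, resp.headers.get("Retry-After")))`. The cap keeps late attempts from sleeping for minutes, and the jitter spreads out clients that failed at the same moment.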
