Completion windows - Sail Research

Sail optimizes for throughput and cost, not single-turn latency. For agentic workloads, what matters is how long the full trajectory takes from start to finish, not how fast any one request returns. Select a completion window based on the trajectory wall-clock time your workload can tolerate.

Completion windows at a glance

Window	Avg. turn time	Typical use case	Price vs `asap`
`asap`	Immediate	Interactive UIs, human-in-the-loop	—
`priority`	~1 min	Latency-sensitive agent loops	~30-50% lower
`standard`	~5 min	Cost-optimized agents	~45-65% lower
`flex`	best-effort	Batch processing, evals, offline	~60-80% lower

Actual turn times will vary based on your specific workload and chosen model. Exact discounts vary by model and by token type (input, output, and cached tokens are discounted differently).
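As a rough illustration of the discount column, the sketch below turns a cost at `asap` into an estimated cost range for another window. The ranges are copied from the table above; the function name `estimated_cost_range` and the flat blended discount are assumptions for illustration, since real discounts differ by model and token type.

```python
# Discount ranges copied from the table above, as fractions of the `asap` price.
# `asap` is the baseline, so its discount is zero.
DISCOUNT_RANGES = {
    "asap": (0.0, 0.0),
    "priority": (0.30, 0.50),
    "standard": (0.45, 0.65),
    "flex": (0.60, 0.80),
}


def estimated_cost_range(asap_cost: float, window: str) -> tuple[float, float]:
    """Return (low, high) estimated cost for `window`, given the cost at `asap`.

    Hypothetical helper: applies one blended discount to the whole bill,
    which ignores per-model and per-token-type differences.
    """
    smallest, largest = DISCOUNT_RANGES[window]
    # The largest discount gives the lowest cost, and the smallest the highest.
    return asap_cost * (1 - largest), asap_cost * (1 - smallest)


low, high = estimated_cost_range(100.0, "standard")
print(f"standard: {low:.2f} to {high:.2f}")
```

Under these assumptions, a workload that costs 100 at `asap` lands somewhere between about 35 and 55 at `standard`.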

For what these rates add up to on a real workload, use the Agent Cost Calculator.

Completion window details

`standard`

The default completion window for models that support it. Uses Sail’s max-efficiency serving stack and targets an average turn time of roughly five minutes on a balanced workload. Most of Sail’s prices are quoted at this completion window.

`priority`

Sail’s tightest scheduled tier. Targets a shorter average turn time than standard (~1 min vs ~5 min on a balanced workload) in exchange for higher per-token pricing. Use priority for latency-sensitive agent loops where each turn directly feeds the next and several extra minutes per turn would meaningfully drag out the trajectory.
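To make the trade-off concrete, here is a back-of-the-envelope estimate of trajectory wall-clock time. It assumes strictly sequential turns that each take the approximate average turn time quoted above; the helper name is hypothetical, and real turn times vary with workload and model.

```python
# Approximate average turn times (minutes) quoted for the two scheduled tiers.
AVG_TURN_MINUTES = {"priority": 1.0, "standard": 5.0}


def estimated_trajectory_minutes(turns: int, window: str) -> float:
    """Estimate wall-clock minutes for a trajectory of sequential turns.

    Hypothetical helper: assumes every turn takes the average turn time
    and that turns never overlap.
    """
    return turns * AVG_TURN_MINUTES[window]


for window in ("priority", "standard"):
    minutes = estimated_trajectory_minutes(40, window)
    print(f"{window}: about {minutes:.0f} minutes for 40 turns")
```

For a 40-turn trajectory this gives about 40 minutes at `priority` versus about 200 minutes at `standard`, which is the kind of gap to weigh against the price difference.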

`flex`

Schedules work when compute is cheapest, e.g. overnight or off-peak, and does not target a specific response time. Best for batch, non-agentic workloads. Requests in this window must set background=True.

`asap`

Runs immediately on the fastest hardware we have, in a latency-optimized config.

How to set completion windows

Set metadata.completion_window on your request:

# `client` is assumed to be an API client already configured for Sail.
response = client.responses.create(
    model="zai-org/GLM-5.2-FP8",
    input="Explain the key ideas behind transformers.",
    background=True,
    metadata={
        "completion_window": "priority"
    }
)

Supported values: "asap", "priority", "standard", and "flex".
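One way to catch mistakes before a request leaves your code is to validate the window client-side. The sketch below builds the keyword arguments for `client.responses.create`; the helper `build_request_kwargs` is hypothetical, and the local `flex` check only mirrors the `background=True` requirement described above. It is not part of any SDK.

```python
SUPPORTED_WINDOWS = ("asap", "priority", "standard", "flex")


def build_request_kwargs(
    model: str, input_text: str, window: str, background: bool = False
) -> dict:
    """Build keyword arguments for a request with a completion window.

    Hypothetical helper: validates locally, so the server may still reject
    a window that the chosen model does not support.
    """
    if window not in SUPPORTED_WINDOWS:
        raise ValueError(
            f"unknown completion window {window!r}; expected one of {SUPPORTED_WINDOWS}"
        )
    if window == "flex" and not background:
        raise ValueError("the flex completion window requires background=True")

    kwargs = {
        "model": model,
        "input": input_text,
        "metadata": {"completion_window": window},
    }
    if background:
        # Only send the flag when it is set, leaving the server default otherwise.
        kwargs["background"] = True
    return kwargs


kwargs = build_request_kwargs(
    "zai-org/GLM-5.2-FP8", "Summarize this log file.", "flex", background=True
)
print(kwargs)
```

The resulting dictionary can then be passed along as `client.responses.create(**kwargs)`.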

Default behavior

Requests that omit completion_window default to standard. If the standard completion window is not available for the model, requests use the flex completion window when it is available, and otherwise fall back to asap. If you explicitly set completion_window to a window that the model does not support, the request is rejected with 400 invalid_request_error. The error message lists the completion windows the model actually supports. You can also check the pricing page for the up-to-date support matrix.
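The fallback rules can be modeled locally as a small function. This is a sketch of one reading of the behavior described above (standard, then flex, then asap when the window is omitted); the function name, the exception class, and the `supported` set are hypothetical stand-ins, and the server remains the source of truth.

```python
from typing import Optional


class UnsupportedWindowError(ValueError):
    """Local stand-in for the server's 400 invalid_request_error."""


# Order tried when a request omits completion_window.
FALLBACK_ORDER = ("standard", "flex", "asap")


def resolve_completion_window(requested: Optional[str], supported: set) -> str:
    """Return the window a request would run in, given a model's supported set."""
    if requested is not None:
        # An explicit choice is never silently swapped for another window.
        if requested not in supported:
            raise UnsupportedWindowError(
                f"{requested!r} is not supported; supported windows: {sorted(supported)}"
            )
        return requested
    for candidate in FALLBACK_ORDER:
        if candidate in supported:
            return candidate
    raise UnsupportedWindowError("model supports none of the fallback windows")


print(resolve_completion_window(None, {"flex", "asap"}))
```

With a model that supports only `flex` and `asap`, an omitted window resolves to `flex`, while an explicit `"priority"` raises the error instead of falling back.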

