Estimate the inference cost of running your agent by picking a workload profile below or by setting your own per-turn token mix.

The interactive calculator requires JavaScript. Without it, here is a representative worked example: a deep-research agent that runs 50 turns and, across the whole run, totals 4.0M cached input reads, 1.5M fresh input tokens, and 250K output tokens. The run is priced on Sail’s zai-org/GLM-5.1-FP8 and on generic frontier-class tiers at list prices (Sonnet-class: $3 input / $15 output / $0.30 cache reads per 1M tokens; Opus-class: $5 / $25 / $0.50).

| Where it runs | Cost | vs Sail standard |
| --- | --- | --- |
| Sail flex | $1.37 | 0.7x |
| Sail standard | $1.86 | 1x |
| Sail priority | $2.52 | 1.4x |
| Sail asap | $4.24 | 2.3x |
| Sonnet-class API, batch tier (24h window) | $4.73 | 2.5x |
| Sonnet-class API | $9.45 | 5.1x |
| Opus-class API | $15.75 | 8.5x |
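
The "vs Sail standard" column appears to be each row's cost divided by the Sail standard cost, rounded to one decimal place. The sketch below checks that reading against the figures in the table; the dictionary and variable names are illustrative and are not part of the calculator.

```python
# Per-run costs in USD, copied from the table above.
costs = {
    "Sail flex": 1.37,
    "Sail standard": 1.86,
    "Sail priority": 2.52,
    "Sail asap": 4.24,
    "Sonnet-class API, batch tier": 4.73,
    "Sonnet-class API": 9.45,
    "Opus-class API": 15.75,
}

# Assumption: the ratio column is cost / Sail standard cost, to one decimal.
baseline = costs["Sail standard"]
ratios = {name: round(cost / baseline, 1) for name, cost in costs.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```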

How the math works

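cost per run = turns × (fresh × P_input + cached × P_cached + output × P_output) / 1,000,000
Here fresh, cached, and output are per-turn token counts, and P_input, P_cached, and P_output are prices in dollars per 1M tokens.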
  • Fresh input: tokens the model reads for the first time each turn (new tool results, search snippets, file contents).
  • Cached input: tokens reread from prompt cache (the growing conversation history). Cache reads are billed at the cached rate.
  • Output: tokens the model writes (reasoning and answers).
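
A minimal sketch of the formula above in Python. The function and parameter names are illustrative. The per-turn split (30K fresh, 80K cached, 5K output) is an assumption made only so that 50 turns add up to the worked example's totals; in a real run the cached portion grows from turn to turn.

```python
def cost_per_run(turns, fresh, cached, output, p_input, p_cached, p_output):
    """Cost of one run in USD.

    fresh, cached, output: per-turn token counts.
    p_input, p_cached, p_output: prices in USD per 1M tokens.
    """
    per_turn = fresh * p_input + cached * p_cached + output * p_output
    return turns * per_turn / 1_000_000


# Worked example spread evenly over 50 turns (an assumed, simplified split),
# priced at the Sonnet-class list prices quoted in this page.
sonnet_cost = cost_per_run(
    turns=50,
    fresh=30_000,    # 1.5M fresh input tokens / 50 turns
    cached=80_000,   # 4.0M cached reads / 50 turns
    output=5_000,    # 250K output tokens / 50 turns
    p_input=3.0,
    p_cached=0.30,
    p_output=15.0,
)
print(f"${sonnet_cost:.2f}")
```

With these inputs the result matches the Sonnet-class API row of the worked example ($9.45).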
Sail rates come from the Pricing page for the selected model and completion window. Daily figures multiply the cost per run by runs per day; monthly figures multiply the daily figure by a 30-day month.
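
The daily and monthly scaling can be sketched as follows. The helper name and the figure of 100 runs per day are hypothetical, chosen only to illustrate the arithmetic; the per-run cost is the Sonnet-class figure from the worked example.

```python
def project_costs(cost_per_run, runs_per_day, days_per_month=30):
    """Return (daily, monthly) cost in USD, assuming a fixed number of runs per day."""
    daily = cost_per_run * runs_per_day
    monthly = daily * days_per_month
    return daily, monthly


# Hypothetical workload: 100 runs per day at $9.45 per run.
daily, monthly = project_costs(cost_per_run=9.45, runs_per_day=100)
print(f"daily: ${daily:,.2f}, monthly: ${monthly:,.2f}")
```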

Assumptions and caveats

  • Frontier tiers are generic list prices. “Sonnet-class” is $3 input / $0.30 cache reads / $15 output per 1M tokens (batch tier = 50% off inside a 24-hour window); “Opus-class” is $5 / $0.50 / $25.
  • The open-model provider pricing row is fetched live and uses the current OpenRouter list price.
  • The model has to do the job. For simplicity, the math assumes a single frontier-class open model handles the whole task. In practice, we often see hybrid approaches that combine frontier closed models with open models, or that mix several open models.
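
As a rough check on the frontier-tier assumptions above, the sketch below applies the quoted list prices to the worked example's run totals. Applying the 50% batch discount uniformly to input, cache reads, and output is an assumption, though it is consistent with the batch-tier row in the worked example. The names are illustrative, and `Decimal` is used so that half-cent values round the way a price display would.

```python
from decimal import Decimal, ROUND_HALF_UP

# List prices in USD per 1M tokens, as quoted in the assumptions above.
TIERS = {
    "Sonnet-class": {"input": Decimal("3"), "cached": Decimal("0.30"), "output": Decimal("15")},
    "Opus-class": {"input": Decimal("5"), "cached": Decimal("0.50"), "output": Decimal("25")},
}
BATCH_DISCOUNT = Decimal("0.5")  # assumed to apply to the whole bill

# Worked-example run totals, in millions of tokens.
TOTALS_M = {"input": Decimal("1.5"), "cached": Decimal("4.0"), "output": Decimal("0.25")}


def run_cost(prices, totals_m=TOTALS_M):
    """Cost in USD for one run, given per-1M prices and totals in millions of tokens."""
    return sum(totals_m[kind] * prices[kind] for kind in totals_m)


def to_cents(amount):
    """Round to two decimal places, half up."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


sonnet = to_cents(run_cost(TIERS["Sonnet-class"]))
sonnet_batch = to_cents(run_cost(TIERS["Sonnet-class"]) * BATCH_DISCOUNT)
opus = to_cents(run_cost(TIERS["Opus-class"]))
print(sonnet, sonnet_batch, opus)
```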