Usage API endpoints

All endpoints live under the base URL below and require a Bearer API key (the same key used for the inference API). The org is derived from the key.

https://api.sailresearch.com

curl -s -H "Authorization: Bearer $SAIL_API_KEY" \
  "https://api.sailresearch.com/v2/usage/summary?range=30d" | jq

Routes fall into two families: Billing & cost (spend, balance, and token counters) and Operational activity & latency (request counts, tasks, and latency distributions).

Common parameters

Most routes accept a rolling-window range and, for the operational routes, an environment filter.

Parameter	Values	Description
`range`	`1h`, `6h`, `24h`, `7d`, `30d`, `period`	Rolling window, or the current billing `period`. The default varies per route. Unknown values fall back to the default rather than erroring.
`environment`	`all`, `dev`, `prod`, `beta`, or a comma-separated list	Customer-facing environment label (operational routes). Combine with a comma, e.g. `dev,prod`. Default `all`.

`range` never returns a 400 for an unrecognized value; it silently falls back to the endpoint’s default window. Invalid `environment`, `date`, `start`, `end`, `bucket_size`, `status`, `kind`, or `after` values do return 400, as does an unsupported `sla` on `/tasks` and `/latency/timeseries`. On `GET /v2/usage`, an unrecognized `sla` or `model` is not rejected; it is applied as a filter that matches nothing.
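Because an unrecognized range is accepted silently, a client may prefer to catch typos before sending the request. The following is a minimal sketch, not part of the API: `build_usage_url` is a hypothetical helper, the allowed values are copied from the table above, and the function only builds a URL without sending anything. Routes with their own extra values (such as `day` on `/v2/usage/breakdown`) would need a wider set.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.sailresearch.com"
# Values copied from the common-parameters table; individual routes may differ.
KNOWN_RANGES = {"1h", "6h", "24h", "7d", "30d", "period"}
KNOWN_ENVIRONMENTS = {"all", "dev", "prod", "beta"}


def build_usage_url(path, range_=None, environments=None):
    """Build a usage URL, rejecting values locally instead of relying on the server."""
    params = {}
    if range_ is not None:
        if range_ not in KNOWN_RANGES:
            # The server would not reject this; it would use the route default.
            raise ValueError(f"unknown range {range_!r}")
        params["range"] = range_
    if environments:
        unknown = set(environments) - KNOWN_ENVIRONMENTS
        if unknown:
            raise ValueError(f"unknown environment(s): {sorted(unknown)}")
        # Several environments are combined with a comma, e.g. dev,prod.
        params["environment"] = ",".join(environments)
    query = urlencode(params, safe=",")
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"
```

For example, `build_usage_url("/v2/usage/activity", environments=["dev", "prod"])` yields a URL ending in `?environment=dev,prod`.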

Monetary fields (`balance`, `period_spend`, `burn_rate`, `avg_cost_per_day`, `product_spend`, `sailbox_spend`, and breakdown `total`) are fractional USD cents expressed as floats. For example, `73795.09` is roughly $737.95. Token fields on the public endpoints are raw inference token counts.
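Converting fractional cents to a display amount is a divide-by-100 plus rounding. A small sketch, assuming half-up rounding to whole cents is acceptable for display; the helper names are illustrative and not part of the API.

```python
from decimal import Decimal, ROUND_HALF_UP


def cents_to_usd(cents):
    """Convert fractional USD cents (a float from the API) to dollars, rounded to 2 places."""
    dollars = Decimal(str(cents)) / Decimal(100)
    return dollars.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def format_usd(cents):
    """Format fractional cents as a dollar string with thousands separators."""
    return f"${cents_to_usd(cents):,.2f}"
```

Going through `Decimal(str(...))` avoids printing binary floating-point noise in the formatted amount.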

Product accounting

Billing responses follow these guarantees:

`period_spend` and each breakdown bucket’s `total` include all positive Inference and Sailbox charges.
`product_spend` always contains exactly two product families, `inference` and `sailboxes`. It never emits an `other` family, and the two values add up to the corresponding combined total.
`sailbox_spend` reports the currently itemized Sailbox charges. Sailbox products without a line item still count toward `product_spend.sailboxes`, so the line-item fields may sum to less than it.
Token, model, completion-window, request, and latency metrics are inference-only. Sailbox quantities and identifiers are not represented as tokens, models, or completion windows.
Only base input, output, and cached-input token products add token counts. Surcharges add inference spend without duplicating token quantities. Cached input is included in `input`, so `total = input + output` and cached tokens must not be added to `total` again.
Model and completion-window breakdowns include only explicitly attributed inference spend. Missing attribution stays unassigned instead of creating an `other` model or completion window.

To calculate product percentages, divide each product amount by the combined total. When the combined total is zero, both percentages are zero.

inference percentage = product_spend.inference / period_spend
sailboxes percentage = product_spend.sailboxes / period_spend
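The division and its zero guard can be sketched as follows. The function name is illustrative; the result is a fraction between 0 and 1, so multiply by 100 if a display percentage is wanted.

```python
def product_shares(period_spend, product_spend):
    """Return each product family's share of combined spend as a fraction."""
    if period_spend == 0:
        # When the combined total is zero, both shares are zero.
        return {"inference": 0.0, "sailboxes": 0.0}
    return {
        "inference": product_spend["inference"] / period_spend,
        "sailboxes": product_spend["sailboxes"] / period_spend,
    }
```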

Billing & cost

GET /v2/usage/summary

Combined Inference and Sailbox spend, credit balance, burn rate, days remaining, inference token totals, inference SLA spend mix, and a prior-period comparison.

Parameter	Values	Default
`range`	`1h`, `6h`, `24h`, `7d`, `30d`, `period`	`30d`

{
  "object": "usage.summary",
  "available": true,
  "empty": false,
  "has_metronome_customer": true,
  "range": "30d",
  "balance": 73795.09,
  "balance_unavailable": false,
  "period_spend": 105392.94,
  "product_spend": {
    "inference": 87200.31,
    "sailboxes": 18192.63
  },
  "sailbox_spend": {
    "active_vcpu": 9240,
    "active_memory": 4720,
    "active_disk": 3210,
    "creation_s": 522.63,
    "creation_m": 500
  },
  "burn_rate": 3513.1,
  "days_remaining": 21,
  "model_count": 14,
  "total_requests": 139197,
  "tokens": {
    "total": 2637290778,
    "input": 2562190467,
    "output": 75100311,
    "cached": 2133342376
  },
  "avg_cost_per_day": 3513.1,
  "sla_mix": {
    "priority": 0.78,
    "asap": 0.12,
    "standard": 0.09,
    "flex": 0.01
  },
  "prior_period": {
    "period_spend": 57216.45,
    "model_count": 13,
    "tokens": {
      "total": 1472254959,
      "input": 1402406455,
      "output": 69848504,
      "cached": 945778881
    },
    "avg_cost_per_day": 1907.21
  }
}

Field	Type	Description
`empty`	boolean	`true` when the org has a billing account but no positive billed Inference or Sailbox spend in the window.
`has_metronome_customer`	boolean	`false` (with zeroed fields) when the org has no billing account.
`balance`	number	Net credit balance, in fractional cents.
`balance_unavailable`	boolean	`true` when the balance lookup failed; other fields are still returned.
`period_spend`	number	Combined Inference and Sailbox spend over the range, in fractional cents.
`product_spend`	object	Combined spend split into exactly `inference` and `sailboxes`. The values add up to `period_spend`.
`product_spend.inference`	number	Positive Inference charges, including surcharges, in fractional cents.
`product_spend.sailboxes`	number	All positive Sailbox charges, including charges without a visible `sailbox_spend` field, in fractional cents.
`sailbox_spend`	object	Currently itemized Sailbox charges. These fields may sum to less than `product_spend.sailboxes`.
`sailbox_spend.active_vcpu`	number	Active Sailbox vCPU-hour spend, in fractional cents.
`sailbox_spend.active_memory`	number	Active Sailbox memory GiB-hour spend, in fractional cents.
`sailbox_spend.active_disk`	number	Active Sailbox disk GiB-hour spend across supported architectures, in fractional cents.
`sailbox_spend.creation_s`	number	Sailbox S creation spend, in fractional cents.
`sailbox_spend.creation_m`	number	Sailbox M creation spend, in fractional cents.
`burn_rate`	number	Combined spend per day over the range, in fractional cents.
`days_remaining`	number \| null	`balance / burn_rate`, or `null` when unknown or above 365.
`model_count`	number	Number of explicitly attributed inference models in the range.
`total_requests`	number \| null	Total billed inference requests over the range, when available.
`tokens`	object	`total`, `input`, `output`, and `cached` raw inference token counts. Cached input is a subset of input, and Sailbox quantities are excluded.
`sla_mix`	object	Fraction of explicitly attributed inference spend per completion window. Unattributed inference spend is outside the denominator and does not create a synthetic tier.
`prior_period`	object	Prior-window combined `period_spend` and `avg_cost_per_day`, plus inference-only `model_count` and token totals.
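The relationships described in this table can be checked on a parsed response. A hedged sketch: `summary_problems` is a hypothetical helper, and the 0.01-cent tolerance is an assumption chosen to absorb float rounding, not a documented guarantee.

```python
import math


def summary_problems(summary, tolerance=0.01):
    """Return a list of human-readable inconsistencies found in a usage.summary payload."""
    problems = []
    products = summary["product_spend"]
    combined = products["inference"] + products["sailboxes"]
    if not math.isclose(combined, summary["period_spend"], abs_tol=tolerance):
        problems.append("product_spend does not add up to period_spend")
    # Itemized Sailbox charges may sum to less than product_spend.sailboxes, never more.
    itemized = sum(summary["sailbox_spend"].values())
    if itemized > products["sailboxes"] + tolerance:
        problems.append("sailbox_spend exceeds product_spend.sailboxes")
    tokens = summary["tokens"]
    if tokens["total"] != tokens["input"] + tokens["output"]:
        problems.append("tokens.total is not input + output")
    # Cached input is a subset of input.
    if tokens["cached"] > tokens["input"]:
        problems.append("tokens.cached exceeds tokens.input")
    return problems
```

Run against the sample response above, the function returns an empty list.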

Plan tier is not exposed via the API; it is shown only on the dashboard.

GET /v2/usage/breakdown

Combined spend and inference token breakdowns per time bucket, plus inference model cost rankings. Use range=day with date=YYYY-MM-DD to drill into a single day at hourly granularity.

Parameter	Values	Default	Description
`range`	`1h`, `6h`, `24h`, `7d`, `30d`, `period`, `day`	`30d`	`1h`/`6h`/`24h`/`day` return hourly granularity.
`date`	`YYYY-MM-DD`	none	Required when `range=day`.

{
  "object": "usage.breakdown",
  "available": true,
  "range": "7d",
  "granularity": "day",
  "data": [
    {
      "timestamp": "2026-06-30",
      "total": 10984.59,
      "product_spend": {
        "inference": 10800,
        "sailboxes": 184.59
      },
      "models": {
        "zai-org/GLM-5.2-FP8": {
          "total": 10657.96,
          "tokens": 439626559,
          "input_tokens": 432900739,
          "output_tokens": 6725820,
          "cached_tokens": 419001344
        },
        "moonshotai/Kimi-K2.6": {
          "total": 22.33,
          "tokens": 283710,
          "input_tokens": 250852,
          "output_tokens": 32858,
          "cached_tokens": 194632
        }
      },
      "slas": { "priority": 10432.31, "standard": 366.8 }
    }
  ],
  "models": [
    {
      "model": "zai-org/GLM-5.2-FP8",
      "total": 10657.96,
      "tokens": 439626559,
      "input_tokens": 432900739,
      "output_tokens": 6725820,
      "cached_tokens": 419001344,
      "slas": { "priority": 10432.31 },
      "percentage": 0.998
    }
  ]
}

Field	Type	Description
`data[]`	array	One entry per time bucket. `total` includes Inference and Sailboxes.
`data[].product_spend`	object	The bucket’s `inference` and `sailboxes` spend, in fractional cents. The values add up to `data[].total`.
`data[].models`	object	Inference spend and token counts keyed by explicitly attributed model ID. Unattributed spend does not create a synthetic model.
`data[].slas`	object	Inference spend keyed by explicitly attributed completion window. Unattributed spend does not create a synthetic completion window.
`models[]`	array	Per-model inference rankings across the whole range, sorted by `total` spend descending.
`models[].percentage`	number	The model’s fraction of explicitly model-attributed inference spend. Sailbox and unattributed inference spend are outside the denominator.

Inference spend without a completion-window attribution is the remainder computed below. It is plain inference spend, not an `other` product or completion window.

unattributed inference = max(0, data[].product_spend.inference - sum(data[].slas values))
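Applied to one `data[]` entry, the remainder could be computed like this. A minimal sketch with an illustrative function name; a bucket without a `slas` object is treated as having no attributed spend, which is an assumption.

```python
def unattributed_inference(bucket):
    """Inference spend in one data[] bucket that has no completion-window attribution."""
    attributed = sum(bucket.get("slas", {}).values())
    # Clamp at zero so float rounding never yields a negative remainder.
    return max(0.0, bucket["product_spend"]["inference"] - attributed)
```

For the sample bucket above this is 10800 minus (10432.31 + 366.8), about 0.89 fractional cents.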

GET /v2/usage/api-keys

Per-API-key usage, broken down by (`api_key_id`, `model`, `sla`), with a windowed time series. `display_name` and `display_prefix` are populated for keys that still exist; deleted keys return `null` for both (only the stable `api_key_id` remains).

This endpoint reports inference usage only. Sailbox charges accrue over a Sailbox’s lifetime and do not map reliably to a single API key, so Sailbox usage is excluded from the per-key rows. Use /v2/usage/summary or /v2/usage/breakdown for product-aware spend.

Parameter	Values	Default
`range`	`1h`, `6h`, `24h`, `7d`, `30d`, `period`	`24h`
`window`	`hour`, `day`	derived from range

{
  "object": "usage.api_keys",
  "available": true,
  "range": "7d",
  "granularity": "day",
  "keys": [
    {
      "key_identity": "id:key_abc123",
      "api_key_id": "key_abc123",
      "display_name": "Production",
      "display_prefix": "sk_QpCO",
      "request_count": 4210,
      "total_tokens": 31000000,
      "input_tokens": 24000000,
      "output_tokens": 7000000,
      "cached_tokens": 1200000,
      "model": "moonshotai/Kimi-K2.6",
      "sla": "priority"
    }
  ],
  "time_series": [
    {
      "time_bucket": "2026-06-30T00:00:00Z",
      "key_identity": "id:key_abc123",
      "api_key_id": "key_abc123",
      "display_name": "Production",
      "display_prefix": "sk_QpCO",
      "request_count": 600,
      "total_tokens": 4400000,
      "input_tokens": 3400000,
      "output_tokens": 1000000,
      "cached_tokens": 150000,
      "model": "moonshotai/Kimi-K2.6",
      "sla": "priority"
    }
  ]
}

Field	Type	Description
`key_identity`	string	Stable identity used to correlate `keys` with `time_series` rows.
`display_name`	string \| null	`null` for deleted keys.
`display_prefix`	string \| null	Non-secret key prefix (e.g. `sk_QpCO`); `null` for deleted keys.
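Joining the two arrays on `key_identity` might look like the sketch below. Both helper names are illustrative, and the "(deleted)" label is only a display choice based on the null display fields described above.

```python
from collections import defaultdict


def series_by_key(payload):
    """Group time_series rows under the key_identity they share with the keys rows."""
    grouped = defaultdict(list)
    for row in payload.get("time_series", []):
        grouped[row["key_identity"]].append(row)
    return dict(grouped)


def key_label(row):
    """Readable label for a row; deleted keys only have api_key_id left."""
    if row.get("display_name"):
        return f"{row['display_name']} ({row.get('display_prefix') or '?'})"
    return f"{row['api_key_id']} (deleted)"
```

Since rows are split by model and SLA, one `key_identity` can map to several `keys` rows as well as several time-series rows.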

GET /v2/usage/tokens

Inference token counters (input/output/cached) over the range, as raw counts. Only base token products contribute quantities. Sailbox usage and inference surcharges do not add tokens. Cached input is already included in input.

Parameter	Values	Default
`range`	`1h`, `6h`, `24h`, `7d`, `30d`, `period`	`24h`

{
  "object": "usage.tokens",
  "available": true,
  "range": "24h",
  "tokens": {
    "total": 6159231,
    "input": 5582132,
    "output": 577099,
    "cached": 2050614
  }
}
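Since cached input is already part of `input`, derived figures should subtract it, not add it. A short sketch; `token_stats` and the "cached share" figure are illustrative and not fields the API returns.

```python
def token_stats(tokens):
    """Derive uncached input and the cached share of input from a tokens object."""
    uncached_input = tokens["input"] - tokens["cached"]
    cached_share = tokens["cached"] / tokens["input"] if tokens["input"] else 0.0
    return {
        "uncached_input": uncached_input,
        "cached_share": cached_share,
        # total counts input + output only; cached is not added again.
        "total_consistent": tokens["total"] == tokens["input"] + tokens["output"],
    }
```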

GET /v2/usage/tokens/timeseries

Raw inference token counts per time bucket, broken down by input/output/cached. The same base-product and cached-input rules as /tokens apply.

Parameter	Values	Default
`range`	`1h`, `6h`, `24h`, `7d`, `30d`, `period`	`24h`

{
  "object": "usage.tokens.timeseries",
  "available": true,
  "range": "7d",
  "series": [
    {
      "time_bucket": "2026-06-30T00:00:00Z",
      "total": 4400000,
      "input": 3400000,
      "output": 1000000,
      "cached": 150000
    }
  ]
}

Operational activity & latency

These routes report request counts, task activity, and latency. They all accept the environment filter.

GET /v2/usage/activity

Consolidated completed-request counts and average latency over rolling windows.

Parameter	Values	Default
`environment`	`all`, `dev`, `prod`, `beta`, or a comma-separated list	`all`

{
  "object": "usage.activity",
  "available": true,
  "requests": {
    "last_1m": 1,
    "last_1h": 385,
    "last_24h": 3251,
    "last_7d": 55592
  },
  "latency": { "avg_1m_ms": 6114, "avg_1h_ms": 16494 }
}

GET /v2/usage/activity/timeseries

Per-model completed-request counts over time.

Parameter	Values	Default
`range`	`1h`, `6h`, `24h`, `7d`, `30d`, `period`	`24h`
`environment`	`all`, `dev`, `prod`, `beta`, or a comma-separated list	`all`

{
  "object": "usage.activity.timeseries",
  "available": true,
  "metric": "requests",
  "range": "24h",
  "series": [
    {
      "time_bucket": "2026-07-01T00:00:00Z",
      "model": "openai/gpt-oss-120b",
      "count": 420
    }
  ]
}

GET /v2/usage/recent

The most recent requests, including active and finished requests when available.

Parameter	Values	Default
`limit`	`1`–`50`	`10`
`environment`	`all`, `dev`, `prod`, `beta`, or a comma-separated list	`all`

{
  "object": "usage.activity.recent",
  "available": true,
  "recent_requests": [
    {
      "response_id": "resp_019f383e-efed-78c0-8f06-f14e80dd7a0d",
      "model": "zai-org/GLM-5.2-FP8",
      "sla": "standard",
      "status": "queued",
      "created_at": "2026-07-06T16:24:36.589Z",
      "updated_at": "2026-07-06T16:24:36.589Z"
    }
  ]
}

`sla` is `null` when the request has no resolved completion window.

GET /v2/usage/tasks

Paginated task activity with per-task token breakdowns, spanning both active and finished requests.

Parameter	Values	Default
`range`	`1h`, `24h`	`24h`
`status`	`queued`, `in_progress`, `completed`, `failed`, `cancelled`	all
`model`	model ID	all
`sla`	`asap`, `priority`, `standard`, `flex` (legacy: `15m`, `15min`, `24h`, `24hr`)	all
`sort`	`created`, `latency`	`created`
`limit`	`1`–`100`	`25`
`after`	cursor from `next_cursor`	none
`environment`	`all`, `dev`, `prod`, `beta`, or a comma-separated list	`all`

Only `1h` and `24h` are valid for `range`; any other value resolves to `24h` (reported back as `effective_range`).

{
  "object": "usage.activity.tasks",
  "available": true,
  "active_tasks_available": true,
  "tasks": [
    {
      "response_id": "resp_019f383e-c2a9-74c6-bff5-2627e18abc4b",
      "model": "openai/gpt-oss-120b",
      "status": "completed",
      "sla": "asap",
      "created_at": "2026-07-06T16:24:25.001Z",
      "updated_at": "2026-07-06T16:24:31.114Z",
      "completed_at": "2026-07-06T16:24:31.114Z",
      "duration_ms": 6114,
      "input_tokens": 71,
      "cached_input_tokens": 70,
      "output_tokens": 39,
      "reasoning_tokens": 0,
      "total_tokens": 110
    }
  ],
  "next_cursor": "eyJjcmVhdGVkQXQiOiIyMDI2LTA3LTA2VD...",
  "effective_range": "24h",
  "legacy_sla_filters": []
}

Field	Type	Description
`active_tasks_available`	boolean	`true` when in-flight (active) tasks could be merged into the page.
`sla`	string \| null	Completion window; `null` when unresolved.
`completed_at`	string \| null	`null` for tasks that have not finished.
`duration_ms`	number \| null	End-to-end duration; `null` until the task finishes.
`next_cursor`	string \| null	Pass as `after` to fetch the next page; `null` on the last page.
`effective_range`	string	The range actually applied (`1h` or `24h`).
`legacy_sla_filters`	array	Legacy SLA filter descriptors, when the org still has tasks under retired SLA labels.
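Cursor pagination with `next_cursor` and `after` can be sketched as a generator. Here `fetch_page` is a hypothetical callable supplied by the caller: it is assumed to issue `GET /v2/usage/tasks` with the given query parameters and return the parsed JSON body.

```python
def iter_tasks(fetch_page, **filters):
    """Yield every task, following next_cursor until the server returns null."""
    cursor = None
    while True:
        params = dict(filters)
        if cursor is not None:
            # Pass the previous page's next_cursor back as `after`.
            params["after"] = cursor
        page = fetch_page(params)
        yield from page.get("tasks", [])
        cursor = page.get("next_cursor")
        if cursor is None:
            break  # last page
```

Keeping the HTTP call outside the generator lets the same loop work with any client library and makes it easy to exercise with canned pages.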

GET /v2/usage/latency/turn

Latency distribution for turns, where a turn is a single request-to-response time. Reports p50/p95/p99 in aggregate and per SLA.

Parameter	Values	Default
`range`	`1h`, `6h`, `24h`, `7d`, `30d`, `period`	`24h`
`environment`	`all`, `dev`, `prod`, `beta`, or a comma-separated list	`all`

{
  "object": "usage.latency.turn",
  "available": true,
  "aggregate": {
    "n": 14210,
    "avg_ms": 2050,
    "p50_ms": 1800,
    "p95_ms": 4200,
    "p99_ms": 6100,
    "buckets": [
      { "bucket_lo_ms": 0, "bucket_hi_ms": 1000, "count": 3200 },
      { "bucket_lo_ms": 8000, "bucket_hi_ms": null, "count": 140 }
    ],
    "approx": false
  },
  "by_sla": {
    "priority": {
      "n": 9000,
      "avg_ms": 1700,
      "p50_ms": 1500,
      "p95_ms": 3800,
      "p99_ms": 5400,
      "buckets": [],
      "approx": false
    }
  }
}

Field	Type	Description
`buckets[].bucket_hi_ms`	number \| null	Upper edge of the histogram bucket; `null` for the open-ended top bucket.
`percentile_mode`	string	Optional: how percentiles were derived, when the server reports it.
`approx`	boolean	`true` when percentiles were sampled rather than computed exactly (omitted when false in the time series).
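Code that reads the histogram has to handle the open-ended top bucket, whose `bucket_hi_ms` is `null`. A minimal sketch with illustrative helper names; whether bucket edges are inclusive or exclusive is not stated here, so the labels are approximate.

```python
def bucket_label(bucket):
    """Human-readable label for one histogram bucket."""
    lo, hi = bucket["bucket_lo_ms"], bucket["bucket_hi_ms"]
    # A null upper edge marks the open-ended top bucket.
    return f">= {lo} ms" if hi is None else f"{lo}-{hi} ms"


def share_at_or_above(buckets, threshold_ms):
    """Fraction of samples in buckets whose lower edge is at or above threshold_ms."""
    total = sum(b["count"] for b in buckets)
    if total == 0:
        return 0.0
    slow = sum(b["count"] for b in buckets if b["bucket_lo_ms"] >= threshold_ms)
    return slow / total
```

The second helper works at bucket resolution only: a threshold that falls inside a bucket leaves that whole bucket out of the count.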

GET /v2/usage/latency/trajectory

Latency distribution for trajectories (multi-turn conversations), with the same response shape as `/latency/turn`. Trajectory responses additionally carry `avg_turns_per_trajectory`, and may set `approx: true` when percentiles are sampled.

Parameter	Values	Default
`range`	`1h`, `6h`, `24h`, `7d`, `30d`, `period`	`24h`
`environment`	`all`, `dev`, `prod`, `beta`, or a comma-separated list	`all`

{
  "object": "usage.latency.trajectory",
  "available": true,
  "aggregate": {
    "n": 3200,
    "avg_ms": 9400,
    "p50_ms": 7600,
    "p95_ms": 21000,
    "p99_ms": 34000,
    "buckets": [{ "bucket_lo_ms": 0, "bucket_hi_ms": 5000, "count": 900 }],
    "approx": false,
    "avg_turns_per_trajectory": 4.7
  },
  "by_sla": {}
}

GET /v2/usage/latency/timeseries

Latency percentiles over time for turns or trajectories.

Parameter	Values	Default
`kind`	`turn`, `trajectory`	`turn`
`range`	`1h`, `6h`, `24h`, `7d`, `30d`, `period`	`24h`
`model`	model ID (only with `kind=turn`)	none
`sla`	completion window (only with `kind=turn`)	none
`environment`	`all`, `dev`, `prod`, `beta`, or a comma-separated list	`all`

The `model` and `sla` filters are only supported for `kind=turn`. Passing either with `kind=trajectory` returns a 400.

{
  "object": "usage.latency.timeseries",
  "available": true,
  "kind": "turn",
  "range": "24h",
  "series": [
    {
      "time_bucket": "2026-07-06T10:00:00Z",
      "count": 420,
      "avg_ms": 2000,
      "p50_ms": 1800,
      "p95_ms": 4100,
      "p99_ms": 6000
    }
  ]
}

Series points include `approx` and `avg_turns_per_trajectory` only when relevant (both are omitted otherwise).

GET /v2/usage

Completed-request latency and counts in fixed time buckets, with one row per environment per bucket. Unlike the rolling-window routes above, this endpoint takes an explicit start/end/bucket_size window, making it suited to charting a specific time range at a fixed resolution.

Parameter	Values	Default	Description
`start`	RFC3339 timestamp	24h ago	Inclusive window start.
`end`	RFC3339 timestamp	now	Exclusive window end. When omitted, aligned to the bucket boundary.
`bucket_size`	`1m`, `1h`	auto	Defaults to `1m` for windows up to 6 hours, otherwise `1h`.
`environment`	`all`, `dev`, `prod`, `beta`, or a comma-separated list	`all`	Customer-facing environment label.
`model`	model ID	none	Optional exact model filter. Unrecognized values match nothing.
`sla`	`asap`, `priority`, `standard`, `flex`	none	Optional completion-window filter. Unrecognized values match nothing rather than returning a 400.

A window that would produce more than 10,000 buckets for the chosen bucket_size returns a 400. Widen bucket_size or shorten the window.
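A client can estimate the bucket count before sending the request. This is a sketch under assumptions: the helper names are illustrative, and the server's exact counting (for example how it aligns `end` to a bucket boundary) may differ by a bucket, so treat the result as an estimate.

```python
import math
from datetime import datetime, timedelta, timezone

BUCKET_SECONDS = {"1m": 60, "1h": 3600}
MAX_BUCKETS = 10_000


def default_bucket_size(start, end):
    """Mirror the documented default: 1m for windows up to 6 hours, otherwise 1h."""
    return "1m" if end - start <= timedelta(hours=6) else "1h"


def bucket_count(start, end, bucket_size):
    """Estimated number of buckets covering [start, end)."""
    seconds = (end - start).total_seconds()
    return math.ceil(seconds / BUCKET_SECONDS[bucket_size])


def fits_bucket_limit(start, end, bucket_size):
    """True when the estimated bucket count stays within the 10,000 limit."""
    return bucket_count(start, end, bucket_size) <= MAX_BUCKETS
```

With these numbers, `1m` buckets fit windows of a little under seven days, while `1h` buckets fit windows of more than a year.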

{
  "object": "usage",
  "available": true,
  "start": "2026-07-05T16:00:00Z",
  "end": "2026-07-06T16:00:00Z",
  "bucket_size": "1h",
  "buckets": [
    {
      "environment": "prod",
      "bucket_start": "2026-07-05T16:00:00Z",
      "bucket_end": "2026-07-05T17:00:00Z",
      "completed_count": 12,
      "latency_sum_ms": 24000,
      "latency_count": 12,
      "avg_latency_ms": 2000
    }
  ]
}

Empty buckets are included (with zeroed counts) so a series has no gaps.
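When combining buckets into one average, weighting by sample count avoids giving empty or sparse buckets the same influence as busy ones. A small sketch, assuming `avg_latency_ms` equals `latency_sum_ms / latency_count` as the sample row suggests (24000 / 12 = 2000); the function name is illustrative.

```python
def overall_avg_latency_ms(buckets):
    """Sample-weighted average latency across buckets, or None when there are no samples."""
    total_ms = sum(b["latency_sum_ms"] for b in buckets)
    samples = sum(b["latency_count"] for b in buckets)
    # Zeroed buckets contribute nothing to either sum.
    return total_ms / samples if samples else None
```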

Errors

Errors use the same envelope as the inference API. For usage-API errors, `code` is set to the same string as `type`, and `param` is always `null`:

{
  "error": {
    "message": "status: must be one of: queued, in_progress, completed, failed, cancelled",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_request_error"
  }
}

Authentication and rate-limit failures are produced by the shared API middleware. Their `code` may differ from `type` (for example `invalid_api_key`) or be `null` (for a missing `Authorization` header).

HTTP status	`type`	`code`	When
400	`invalid_request_error`	`invalid_request_error`	Invalid `date`, `start`, `end`, `bucket_size`, `environment`, `status`, `kind`, or `after` cursor; an unsupported `sla` on `/tasks` or `/latency/timeseries`; a `model`/`sla` filter with `kind=trajectory`; or a window exceeding 10,000 buckets on `GET /v2/usage`. An unknown `range` is not rejected. It falls back to the default.
401	`authentication_error`	`null` \| `invalid_api_key`	Missing (`code` null), invalid, or expired credentials.
402	`billing_error`	`credits_exhausted`	API key disabled due to insufficient credits. The error also includes a `billing_url`.
403	`missing_org_identity`	`missing_org_identity`	API key is not associated with an organization.
429	`rate_limit_error`	`rate_limited`	Too many concurrent requests.
500	`api_error`	`api_error`	Internal server error while fetching usage data.
503	`usage_unavailable`	`usage_unavailable`	Usage data is temporarily unavailable for the requested query.
504	`usage_query_timeout`	`usage_query_timeout`	Query timed out. The error object includes `"retryable": true` and the response sets a `Retry-After: 2` header.
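A retry decision based on the 504 row could be sketched as below. The function name is illustrative, only the documented `retryable` timeout is retried, and the 2-second fallback mirrors the `Retry-After: 2` value mentioned above. How to treat 429 or 503 is left to the caller, since no retry guidance is given for them here.

```python
def retry_delay_seconds(status, headers, body):
    """Return seconds to wait before retrying, or None when the error should not be retried."""
    error = (body or {}).get("error") or {}
    if status != 504 or not error.get("retryable"):
        return None
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    try:
        return float(normalized.get("retry-after", 2))
    except ValueError:
        return 2.0  # assumed fallback when the header is not a plain number
```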

Endpoints return an empty/zeroed payload with HTTP 200 (not an error) when the org has no billing account or no usage in the requested window.
