The Usage API lets you query your Sail usage metrics from scripts, CLIs, or dashboards. It uses the same API key you use for api.sailresearch.com.
Base URL
https://usage.sailresearch.com
Authentication
All requests require a Bearer API key in the Authorization header.
This is the same API key you use for the inference API at
api.sailresearch.com. Create or manage keys from the Sail
dashboard.
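import os
import requests

# Read the key from the environment instead of hard-coding it in the script.
headers = {"Authorization": f"Bearer {os.environ['SAIL_API_KEY']}"}
resp = requests.get("https://usage.sailresearch.com/v1/usage", headers=headers, timeout=10)
resp.raise_for_status()  # raise on 4xx/5xx rather than printing an error body
print(resp.json())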
GET /v1/usage
Returns a spending and balance summary for the requested time range.
| Parameter | Values | Default | Description |
|---|---|---|---|
| range | 24h, 7d, 30d, period | 30d | Time window. period = current billing period. |
curl -s -H "Authorization: Bearer $SAIL_API_KEY" \
"https://usage.sailresearch.com/v1/usage?range=7d" | jq
{
"object": "usage.summary",
"range": "7d",
"period_spend": 5432,
"burn_rate": 776,
"balance": 10000,
"balance_unavailable": false,
"days_remaining": 12,
"plan_type": "self_serve",
"model_count": 3,
"tokens": {
"total": 1000000,
"input": 600000,
"output": 350000,
"cached": 50000
},
"avg_cost_per_day": 776,
"sla_mix": {
"15m": 0.7,
"1h": 0.3
},
"prior_period": {
"period_spend": 4000,
"model_count": 2,
"tokens": {
"total": 800000,
"input": 500000,
"output": 280000,
"cached": 20000
},
"avg_cost_per_day": 571
}
}
All monetary values are in cents. prior_period covers the equivalent
window immediately before the requested range, for comparison.
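As a rough illustration of using the summary, the sketch below converts cent amounts to dollars and computes the relative change in spend against prior_period. The helper names `cents_to_dollars` and `spend_change` are hypothetical, and `summary` is assumed to be the parsed JSON body of GET /v1/usage, trimmed here to the two fields the code reads.

```python
from decimal import Decimal


def cents_to_dollars(cents: int) -> Decimal:
    """Convert an integer cent amount to dollars without float rounding."""
    return Decimal(cents) / Decimal(100)


def spend_change(summary: dict) -> dict:
    """Compare period_spend with prior_period.period_spend.

    relative_change is None when the prior window had no spend,
    since the ratio is undefined in that case.
    """
    current = summary["period_spend"]
    prior = summary["prior_period"]["period_spend"]
    change = None if prior == 0 else (current - prior) / prior
    return {
        "current_usd": cents_to_dollars(current),
        "prior_usd": cents_to_dollars(prior),
        "relative_change": change,
    }


# Values copied from the sample response above (only the fields used here).
summary = {"period_spend": 5432, "prior_period": {"period_spend": 4000}}
result = spend_change(summary)
print(result["current_usd"], result["prior_usd"], result["relative_change"])
```

Using Decimal for the dollar amounts is a design choice for display and accounting; the ratio is left as a float because it is only a comparison figure.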
GET /v1/usage/breakdown
Returns time-series spend data and per-model ranking. Use this for charting.
| Parameter | Values | Default | Description |
|---|---|---|---|
| range | 24h, 7d, 30d, period, day | 30d | Time window. |
| date | YYYY-MM-DD | (none) | Required when range=day. Drills into hourly data for that date. |
curl -s -H "Authorization: Bearer $SAIL_API_KEY" \
"https://usage.sailresearch.com/v1/usage/breakdown?range=7d" | jq
{
"object": "usage.breakdown",
"range": "7d",
"granularity": "day",
"data": [
{
"timestamp": "2025-01-15",
"total": 800,
"models": {
"zai-org/GLM-5": {
"total": 500,
"tokens": 50000,
"input_tokens": 30000,
"output_tokens": 15000,
"cached_tokens": 5000
}
},
"slas": { "15m": 800 }
}
],
"models": [
{
"model": "zai-org/GLM-5",
"total": 3500,
"tokens": 350000,
"input_tokens": 210000,
"output_tokens": 105000,
"cached_tokens": 35000,
"slas": { "15m": 3500 },
"percentage": 0.65
}
]
}
granularity is "day" for 7d/30d/period and "hour" for 24h or day drill-down.
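For charting, the nested data array usually has to be flattened into rows. The sketch below shows one way to do that, plus a small query-parameter builder that enforces the date requirement for range=day. Both function names are hypothetical, `breakdown` is assumed to be the parsed JSON body of GET /v1/usage/breakdown, and the per-model total is assumed to be in cents like the summary endpoint's monetary fields.

```python
from typing import Dict, List, Optional, Tuple


def breakdown_params(range_: str = "30d", date: Optional[str] = None) -> Dict[str, str]:
    """Build query parameters, enforcing that range=day comes with a date."""
    if range_ == "day" and date is None:
        raise ValueError("date (YYYY-MM-DD) is required when range=day")
    params = {"range": range_}
    if date is not None:
        params["date"] = date
    return params


def breakdown_rows(breakdown: dict) -> List[Tuple[str, str, int]]:
    """Flatten data[].models into (timestamp, model, spend) rows."""
    rows = []
    for point in breakdown["data"]:
        for model, stats in point["models"].items():
            rows.append((point["timestamp"], model, stats["total"]))
    return rows


# Sample payload trimmed from the response above.
breakdown = {
    "granularity": "day",
    "data": [
        {
            "timestamp": "2025-01-15",
            "total": 800,
            "models": {"zai-org/GLM-5": {"total": 500, "tokens": 50000}},
        }
    ],
}
for row in breakdown_rows(breakdown):
    print(row)
```

The dict returned by `breakdown_params` can be passed as the `params` argument of `requests.get`, which handles the URL encoding.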
GET /v1/usage/activity
Returns operational metrics: request counts, token throughput, latency, and recent requests.
| Parameter | Values | Default | Description |
|---|---|---|---|
| limit | 1–100 | 10 | Number of recent requests to return. |
curl -s -H "Authorization: Bearer $SAIL_API_KEY" \
https://usage.sailresearch.com/v1/usage/activity | jq
{
"object": "usage.activity",
"available": true,
"requests": {
"last_1m": 5,
"last_1h": 120,
"last_24h": 2000,
"last_7d": 14000
},
"tokens": {
"last_1m": 500,
"last_1h": 12000,
"last_24h": 200000,
"last_7d": 1400000,
"token_breakdown_1h": {
"input": 8000,
"output": 3500,
"cached": 500
}
},
"latency": {
"avg_1m_ms": 1200,
"avg_1h_ms": 1500
},
"recent_requests": [
{
"response_id": "resp_abc123",
"model": "zai-org/GLM-5",
"sla": "15m",
"status": "completed",
"created_at": "2025-01-15T10:00:00Z",
"updated_at": "2025-01-15T10:01:00Z"
}
],
"has_more": false
}
If has_more is true, there are additional recent requests beyond the requested limit.
Activity data may have a slight delay and is not a real-time stream.
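A minimal sketch of reading recent_requests follows. It reports, for each request, the seconds between created_at and updated_at. Two assumptions are made that the text above does not confirm: that available set to false means there is no activity data to read, and that the gap between the two timestamps is a useful proxy for how long a request took. The function names are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def recent_request_ages(activity: dict) -> List[Tuple[str, str, float]]:
    """Return (response_id, status, seconds from created_at to updated_at)."""
    if not activity.get("available", False):
        return []  # assumption: available=false means no activity data
    rows = []
    for req in activity["recent_requests"]:
        elapsed = parse_ts(req["updated_at"]) - parse_ts(req["created_at"])
        rows.append((req["response_id"], req["status"], elapsed.total_seconds()))
    return rows


# Sample payload trimmed from the response above.
activity = {
    "available": True,
    "recent_requests": [
        {
            "response_id": "resp_abc123",
            "status": "completed",
            "created_at": "2025-01-15T10:00:00Z",
            "updated_at": "2025-01-15T10:01:00Z",
        }
    ],
    "has_more": False,
}
print(recent_request_ages(activity))
if activity["has_more"]:
    print("more recent requests exist beyond the requested limit")
```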
GET /v1/usage/activity/timeseries
Returns bucketed time-series data for requests or tokens. Use this for throughput charts.
| Parameter | Values | Default | Description |
|---|---|---|---|
| type | requests, tokens | requests | What to chart. |
| range | 1h, 6h, 24h | 24h | Time window. Bucket size: 1 min, 5 min, and 1 hr respectively. |
Requests by model:
curl -s -H "Authorization: Bearer $SAIL_API_KEY" \
"https://usage.sailresearch.com/v1/usage/activity/timeseries?type=requests&range=1h" | jq
{
"object": "usage.activity.timeseries",
"type": "requests",
"range": "1h",
"available": true,
"series": [
{
"time_bucket": "2025-01-15T10:00:00Z",
"model": "zai-org/GLM-5",
"count": 5
},
{
"time_bucket": "2025-01-15T10:01:00Z",
"model": "zai-org/GLM-5",
"count": 3
}
]
}
Token breakdown:
curl -s -H "Authorization: Bearer $SAIL_API_KEY" \
"https://usage.sailresearch.com/v1/usage/activity/timeseries?type=tokens&range=24h" | jq
{
"object": "usage.activity.timeseries",
"type": "tokens",
"range": "24h",
"available": true,
"series": [
{
"time_bucket": "2025-01-15T10:00:00Z",
"total_tokens": 1000,
"input_tokens": 600,
"output_tokens": 350,
"cached_tokens": 50
}
]
}
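The type=requests series can be aggregated for a throughput chart as in the sketch below, which totals counts per model and finds the busiest bucket in requests per minute. The function names and the `BUCKET_SECONDS` table are hypothetical; the bucket sizes come from the parameter table above, and `timeseries` is assumed to be the parsed JSON body of the endpoint.

```python
from collections import Counter

# Bucket sizes from the parameter table, in seconds.
BUCKET_SECONDS = {"1h": 60, "6h": 300, "24h": 3600}


def requests_per_model(timeseries: dict) -> dict:
    """Sum the per-bucket request counts for each model."""
    if timeseries.get("type") != "requests":
        raise ValueError("expected a type=requests timeseries payload")
    totals = Counter()
    for point in timeseries["series"]:
        totals[point["model"]] += point["count"]
    return dict(totals)


def peak_requests_per_minute(timeseries: dict) -> float:
    """Busiest time bucket (summed across models), as requests per minute."""
    per_bucket = Counter()
    for point in timeseries["series"]:
        per_bucket[point["time_bucket"]] += point["count"]
    minutes = BUCKET_SECONDS[timeseries["range"]] / 60
    return max(per_bucket.values(), default=0) / minutes


# Sample payload copied from the "Requests by model" response above.
timeseries = {
    "type": "requests",
    "range": "1h",
    "series": [
        {"time_bucket": "2025-01-15T10:00:00Z", "model": "zai-org/GLM-5", "count": 5},
        {"time_bucket": "2025-01-15T10:01:00Z", "model": "zai-org/GLM-5", "count": 3},
    ],
}
print(requests_per_model(timeseries))
print(peak_requests_per_minute(timeseries))
```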
Errors
All errors follow the same format used by the inference API:
{
"error": {
"message": "Missing or invalid Authorization header",
"type": "authentication_error",
"param": null,
"code": null
}
}
| HTTP Status | type | When |
|---|---|---|
| 400 | invalid_request_error | Bad query parameter, missing required param |
| 401 | authentication_error | Missing, invalid, or expired API key |
| 402 | billing_error | Key disabled due to insufficient credits |
| 429 | rate_limit_error | Rate limit exceeded (includes Retry-After header) |
| 500 | api_error | Internal server error |
All responses include X-Request-ID: <uuid>. Include this in support requests for faster debugging.
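One way to turn an error response into a structured exception is sketched below. `UsageApiError` and `error_from_response` are hypothetical names, not part of any Sail SDK. The sketch assumes Retry-After carries a whole number of seconds; the text above does not say which form the header takes, so any other value is ignored.

```python
from typing import Mapping, Optional


class UsageApiError(Exception):
    """Structured view of an error response (hypothetical helper class)."""

    def __init__(self, status: int, error_type: Optional[str], message: Optional[str],
                 request_id: Optional[str], retry_after: Optional[int]):
        super().__init__(f"{status} {error_type}: {message} (request id: {request_id})")
        self.status = status
        self.error_type = error_type
        self.message = message
        self.request_id = request_id
        self.retry_after = retry_after


def error_from_response(status: int, headers: Mapping[str, str], body: dict) -> UsageApiError:
    """Build a UsageApiError from the status code, headers, and parsed JSON body."""
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    err = body.get("error", {})
    retry_after = None
    raw = lowered.get("retry-after")
    # Assumption: Retry-After is given in whole seconds on 429 responses.
    if status == 429 and raw is not None and raw.strip().isdigit():
        retry_after = int(raw.strip())
    return UsageApiError(status, err.get("type"), err.get("message"),
                         lowered.get("x-request-id"), retry_after)


# Illustrative inputs; the request id value is made up for the example.
example = error_from_response(
    429,
    {"X-Request-ID": "example-request-id", "Retry-After": "30"},
    {"error": {"message": "Rate limit exceeded", "type": "rate_limit_error",
               "param": None, "code": None}},
)
print(example, example.retry_after)
```

With the requests library, the three arguments would typically be `resp.status_code`, `resp.headers`, and `resp.json()`. A caller can sleep for `retry_after` seconds before retrying a 429 and log `request_id` for support requests.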
Data freshness
Usage data may have a slight delay and is not real-time.