
GET /v1/usage

Returns a spending and balance summary for the requested time range.
| Parameter | Values | Default | Description |
| --- | --- | --- | --- |
| range | 24h, 7d, 30d, period | 30d | Time window. period = current billing period. |
curl -s -H "Authorization: Bearer $SAIL_API_KEY" \
  "https://usage.sailresearch.com/v1/usage?range=7d" | jq
{
  "object": "usage.summary",
  "range": "7d",
  "period_spend": 5432,
  "burn_rate": 776,
  "balance": 10000,
  "balance_unavailable": false,
  "days_remaining": 12,
  "plan_type": "self_serve",
  "model_count": 3,
  "tokens": {
    "total": 1000000,
    "input": 600000,
    "output": 350000,
    "cached": 50000
  },
  "avg_cost_per_day": 776,
  "sla_mix": {
    "priority": 0.7,
    "standard": 0.3
  },
  "prior_period": {
    "period_spend": 4000,
    "model_count": 2,
    "tokens": {
      "total": 800000,
      "input": 500000,
      "output": 280000,
      "cached": 20000
    },
    "avg_cost_per_day": 571
  }
}
All monetary values are in cents. prior_period covers the equivalent window immediately before the requested range, for comparison.
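As a worked example of reading the summary, the Python sketch below converts cent amounts to dollars and computes the change in spend against prior_period. The helper names `cents_to_dollars` and `spend_change` are illustrative and not part of the API; the input is the sample response above, trimmed to the fields used.

```python
from decimal import Decimal
from typing import Optional


def cents_to_dollars(cents: int) -> Decimal:
    """Convert an integer amount in cents to dollars without float rounding."""
    return Decimal(cents) / Decimal(100)


def spend_change(summary: dict) -> Optional[float]:
    """Fractional change in period_spend versus prior_period.

    Returns None when there is no prior spend to compare against.
    """
    prior = summary.get("prior_period") or {}
    prior_spend = prior.get("period_spend")
    if not prior_spend:
        return None
    return (summary["period_spend"] - prior_spend) / prior_spend


# Fields copied from the sample response above.
summary = {
    "period_spend": 5432,
    "balance": 10000,
    "prior_period": {"period_spend": 4000},
}

print(cents_to_dollars(summary["period_spend"]))  # 54.32
print(f"{spend_change(summary):+.1%}")            # +35.8%
```

With the sample values, spend rose from 4000 to 5432 cents, an increase of about 35.8% over the preceding 7-day window.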
sla_mix keys are the completion window names: asap, priority, standard, flex. Historical entries from before the canonical rate-card promotion may also appear with the legacy keys 15min (now priority) or 24hr (now flex).
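If historical responses need to be compared with current ones, the legacy keys can be folded into their current names on the client side. The following is a minimal sketch, assuming that shares under a legacy key and its current name can simply be added together; `normalize_sla_mix` is a hypothetical helper and the input is made-up data.

```python
# Legacy-to-current key mapping, taken from the note above.
LEGACY_SLA_KEYS = {"15min": "priority", "24hr": "flex"}


def normalize_sla_mix(sla_mix: dict) -> dict:
    """Rename legacy keys to current names, summing shares that collide."""
    merged = {}
    for key, share in sla_mix.items():
        name = LEGACY_SLA_KEYS.get(key, key)
        merged[name] = merged.get(name, 0) + share
    return merged


# Hypothetical historical response mixing legacy and current keys.
print(normalize_sla_mix({"15min": 0.25, "priority": 0.5, "24hr": 0.25}))
# {'priority': 0.75, 'flex': 0.25}
```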

GET /v1/usage/breakdown

Returns time-series spend data and per-model ranking. Use this for charting.
| Parameter | Values | Default | Description |
| --- | --- | --- | --- |
| range | 24h, 7d, 30d, period, day | 30d | Time window. |
| date | YYYY-MM-DD | | Required when range=day. Drills into hourly data for that date. |
curl -s -H "Authorization: Bearer $SAIL_API_KEY" \
  "https://usage.sailresearch.com/v1/usage/breakdown?range=7d" | jq
{
  "object": "usage.breakdown",
  "range": "7d",
  "granularity": "day",
  "data": [
    {
      "timestamp": "2025-01-15",
      "total": 800,
      "models": {
        "zai-org/GLM-5.1-FP8": {
          "total": 500,
          "tokens": 50000,
          "input_tokens": 30000,
          "output_tokens": 15000,
          "cached_tokens": 5000
        }
      },
      "slas": { "priority": 800 }
    }
  ],
  "models": [
    {
      "model": "zai-org/GLM-5.1-FP8",
      "total": 3500,
      "tokens": 350000,
      "input_tokens": 210000,
      "output_tokens": 105000,
      "cached_tokens": 35000,
      "slas": { "priority": 3500 },
      "percentage": 0.65
    }
  ]
}
granularity is "day" for 7d/30d/period and "hour" for 24h or day drill-down.
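The parameter rules for this endpoint can be checked before a request is sent. The sketch below builds the query string, enforces that date accompanies range=day, and predicts the granularity the response should report. The function names are illustrative, and dropping date for other ranges is an assumption, since the behavior in that case is not documented here.

```python
from datetime import date as Date
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://usage.sailresearch.com/v1/usage/breakdown"
RANGES = {"24h", "7d", "30d", "period", "day"}


def breakdown_url(range_: str = "30d", day: Optional[Date] = None) -> str:
    """Build a breakdown URL, validating the documented parameter rules."""
    if range_ not in RANGES:
        raise ValueError(f"unsupported range: {range_!r}")
    params = {"range": range_}
    if range_ == "day":
        if day is None:
            raise ValueError("date is required when range=day")
        params["date"] = day.isoformat()  # YYYY-MM-DD
    # Assumption: date is only meaningful with range=day, so it is
    # not sent for the other ranges.
    return f"{BASE_URL}?{urlencode(params)}"


def expected_granularity(range_: str) -> str:
    """Granularity the response should report for a given range."""
    return "hour" if range_ in {"24h", "day"} else "day"


print(breakdown_url("day", Date(2025, 1, 15)))
# https://usage.sailresearch.com/v1/usage/breakdown?range=day&date=2025-01-15
print(expected_granularity("7d"))  # day
```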

GET /v1/usage/activity

Returns operational metrics: request counts, token throughput, latency, and recent requests.
| Parameter | Values | Default | Description |
| --- | --- | --- | --- |
| limit | 1-100 | 10 | Number of recent requests to return. |
curl -s -H "Authorization: Bearer $SAIL_API_KEY" \
  https://usage.sailresearch.com/v1/usage/activity | jq
{
  "object": "usage.activity",
  "available": true,
  "requests": {
    "last_1m": 5,
    "last_1h": 120,
    "last_24h": 2000,
    "last_7d": 14000
  },
  "tokens": {
    "last_1m": 500,
    "last_1h": 12000,
    "last_24h": 200000,
    "last_7d": 1400000,
    "token_breakdown_1h": {
      "input": 8000,
      "output": 3500,
      "cached": 500
    }
  },
  "latency": {
    "avg_1m_ms": 1200,
    "avg_1h_ms": 1500
  },
  "recent_requests": [
    {
      "response_id": "resp_abc123",
      "model": "zai-org/GLM-5.1-FP8",
      "sla": "priority",
      "status": "completed",
      "created_at": "2025-01-15T10:00:00Z",
      "updated_at": "2025-01-15T10:01:00Z"
    }
  ],
  "has_more": false
}
If has_more is true, there are additional recent requests beyond the requested limit.
Activity data may have a slight delay and is not a real-time stream.
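One way to use recent_requests is to list each request with the time between created_at and updated_at. This is only a sketch: treating that interval as a rough measure of processing time is an assumption, since the meaning of updated_at is not specified here, and the helper names are illustrative. The input is the sample response above, trimmed to the fields used.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def elapsed_seconds(request: dict) -> float:
    """Seconds between created_at and updated_at for one request."""
    delta = parse_ts(request["updated_at"]) - parse_ts(request["created_at"])
    return delta.total_seconds()


# Fields copied from the sample response above.
activity = {
    "recent_requests": [
        {
            "response_id": "resp_abc123",
            "status": "completed",
            "created_at": "2025-01-15T10:00:00Z",
            "updated_at": "2025-01-15T10:01:00Z",
        }
    ],
    "has_more": False,
}

for req in activity["recent_requests"]:
    print(req["response_id"], req["status"], elapsed_seconds(req))
    # resp_abc123 completed 60.0

if activity["has_more"]:
    print("additional recent requests exist beyond the requested limit")
```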

GET /v1/usage/activity/timeseries

Returns bucketed time-series data for requests or tokens. Use this for throughput charts.
| Parameter | Values | Default | Description |
| --- | --- | --- | --- |
| type | requests, tokens | requests | What to chart. |
| range | 1h, 6h, 24h | 24h | Time window. Bucket size: 1min / 5min / 1 hour respectively. |
Requests by model:
curl -s -H "Authorization: Bearer $SAIL_API_KEY" \
  "https://usage.sailresearch.com/v1/usage/activity/timeseries?type=requests&range=1h" | jq
{
  "object": "usage.activity.timeseries",
  "type": "requests",
  "range": "1h",
  "available": true,
  "series": [
    {
      "time_bucket": "2025-01-15T10:00:00Z",
      "model": "zai-org/GLM-5.1-FP8",
      "count": 5
    },
    {
      "time_bucket": "2025-01-15T10:01:00Z",
      "model": "zai-org/GLM-5.1-FP8",
      "count": 3
    }
  ]
}
Token breakdown:
curl -s -H "Authorization: Bearer $SAIL_API_KEY" \
  "https://usage.sailresearch.com/v1/usage/activity/timeseries?type=tokens&range=24h" | jq
{
  "object": "usage.activity.timeseries",
  "type": "tokens",
  "range": "24h",
  "available": true,
  "series": [
    {
      "time_bucket": "2025-01-15T10:00:00Z",
      "total_tokens": 1000,
      "input_tokens": 600,
      "output_tokens": 350,
      "cached_tokens": 50
    }
  ]
}
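For a single throughput line rather than one line per model, the type=requests series can be summed per bucket on the client. The sketch below assumes each series entry has the shape shown in the sample above; `requests_per_bucket` is an illustrative helper, and the second model name is made up to show the summing.

```python
from collections import defaultdict


def requests_per_bucket(series: list) -> dict:
    """Sum request counts across models for each time bucket."""
    totals = defaultdict(int)
    for point in series:
        totals[point["time_bucket"]] += point["count"]
    # Timestamps share one ISO 8601 format, so string order is time order.
    return dict(sorted(totals.items()))


series = [
    {"time_bucket": "2025-01-15T10:00:00Z", "model": "zai-org/GLM-5.1-FP8", "count": 5},
    {"time_bucket": "2025-01-15T10:01:00Z", "model": "zai-org/GLM-5.1-FP8", "count": 3},
    # Hypothetical second model in the same bucket.
    {"time_bucket": "2025-01-15T10:00:00Z", "model": "example/other-model", "count": 2},
]

print(requests_per_bucket(series))
# {'2025-01-15T10:00:00Z': 7, '2025-01-15T10:01:00Z': 3}
```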

Errors

All errors follow the same format used by the inference API:
{
  "error": {
    "message": "Missing or invalid Authorization header",
    "type": "authentication_error",
    "param": null,
    "code": null
  }
}
| HTTP Status | type | When |
| --- | --- | --- |
| 400 | invalid_request_error | Bad query parameter, missing required param |
| 400 | idempotency_error | Idempotency key reused with a different request body |
| 401 | authentication_error | Missing, invalid, or expired API key |
| 402 | billing_error | Key disabled due to insufficient credits |
| 429 | rate_limit_error | Rate limit exceeded (includes Retry-After header) |
| 500 | api_error | Internal server error |
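A client can use the status code to decide whether a retry is worthwhile. The following sketch is one possible policy, not a documented one: it assumes Retry-After carries a number of seconds, falls back to exponential backoff when the header is missing or unparseable, retries 5xx responses, and treats the other listed errors as non-retryable because they need a change to the request, key, or billing state. The name `retry_delay` and the backoff constant are illustrative.

```python
from typing import Mapping, Optional

DEFAULT_BACKOFF_SECONDS = 1.0  # illustrative starting delay


def retry_delay(status: int, headers: Mapping[str, str],
                attempt: int = 0) -> Optional[float]:
    """Seconds to wait before retrying, or None if a retry will not help."""
    backoff = DEFAULT_BACKOFF_SECONDS * 2 ** attempt
    if status == 429:
        # Header names are case-insensitive, so normalize before lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        try:
            # Assumption: Retry-After is given in seconds.
            return max(0.0, float(lowered.get("retry-after")))
        except (TypeError, ValueError):
            return backoff
    if status >= 500:
        return backoff
    return None


print(retry_delay(429, {"Retry-After": "3"}))  # 3.0
print(retry_delay(500, {}, attempt=2))         # 4.0
print(retry_delay(401, {}))                    # None
```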

Response headers

All responses include X-Request-ID: <uuid>. Include this in support requests for faster debugging.
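To make the request ID easy to pass along, a client can fold it into its own error messages. The sketch below is a hypothetical helper that combines the error body format shown in the Errors section with the X-Request-ID header; the UUID in the example is made up.

```python
import json


def describe_error(status: int, body: str, headers: dict) -> str:
    """Format an error response together with its request ID."""
    error = json.loads(body).get("error", {})
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    request_id = lowered.get("x-request-id", "unknown")
    return (f"HTTP {status} {error.get('type')}: {error.get('message')} "
            f"(request id: {request_id})")


body = json.dumps({
    "error": {
        "message": "Missing or invalid Authorization header",
        "type": "authentication_error",
        "param": None,
        "code": None,
    }
})
# Hypothetical request ID for illustration.
headers = {"X-Request-ID": "123e4567-e89b-12d3-a456-426614174000"}

print(describe_error(401, body, headers))
```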