GammaInfra API documentation

Last updated: 20 May 2026

Overview

GammaInfra is an intelligent LLM routing engine. The router classifies every prompt by task and dispatches to the best-fit model across every major LLM provider — delivered through one API.

If you already use OpenAI or any OpenAI-compatible SDK, switching to GammaInfra is a one-line change — update the base_url, keep everything else the same.

Base URL: https://api.gammainfra.com/v1 (the legacy apex https://gammainfra.com/v1 is preserved as a backward-compatible alias)
Dashboard: dashboard.gammainfra.com · Status: status.gammainfra.com · Sign up: gammainfra.com/#signup

Quickstart

1. Get an API key

Sign up at gammainfra.com — email + password, one-time verification link, no credit card. New accounts come with $3.00 of free credit, enough to try the router end-to-end. Your API key is shown once after you click the verification link — store it somewhere safe. Need more keys later? Issue and revoke them from the dashboard.

2. Make your first call

curl -s -X POST https://api.gammainfra.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-gammainfra-..." \
  -d '{
    "model": "gammainfra/auto",
    "messages": [{"role": "user", "content": "Explain transformers in one paragraph."}]
  }'

The response is identical to OpenAI’s format. gammainfra/auto lets the router pick the best model for your prompt.

3. Drop-in replacement

from openai import OpenAI

client = OpenAI(
    api_key="sk-gammainfra-...",
    base_url="https://api.gammainfra.com/v1",
)

response = client.chat.completions.create(
    model="gammainfra/auto",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

The same pattern works with LangChain, LlamaIndex, and any other OpenAI-compatible library.

Ecosystem compatibility

If your code targets an OpenAI-compatible router, the migration is usually two strings — base_url and api_key. Both /v1/* and /api/v1/* prefixes are mounted with identical responses, so SDKs that hard-code either path keep working unchanged.

from openai import OpenAI
client = OpenAI(
    api_key="sk-gammainfra-...",
    base_url="https://gammainfra.com/api/v1",
)

Body fields

GammaInfra accepts the common ecosystem-extension request fields. The ones below change behavior; everything else is accepted silently for forward compatibility.

| Field | Behavior |
|---|---|
| models: [str, ...] | Honored. Becomes the authoritative fallback chain: tried in order, no auto-router. Fails loudly (503) on exhaustion rather than silently picking another model. |
| provider.sort | Honored. "price" → cost-optimized routing, "throughput"/"latency" → latency-optimized. |
| provider.only / .ignore | Honored. Filter the candidate provider set. |
| provider.order | Honored. Listed providers tried first; rest keep their relative order. |
| provider.allow_fallbacks: false | Honored. Returns 503 on the first provider failure instead of trying the next candidate. |
| provider.max_price | Honored. {prompt, completion} in USD per 1M tokens. Endpoints exceeding either cap are skipped. |
| reasoning: {effort, ...} | Honored. effort translates to reasoning_effort for the GPT-5 family. Other providers drop it harmlessly. |
| stream_options.include_usage | Honored on OpenAI; forwarded best-effort elsewhere. |
| tool_choice, response_format, parallel_tool_calls, seed, top_k, min_p, top_a, repetition_penalty, logprobs, top_logprobs, user | Forwarded to providers that support them. |
| transforms, route | Accepted silently (we always cascade through the fallback chain on failure). |
| plugins: [{id: "web"}] | 501 web_plugin_unsupported. The :online model variant is also rejected (400). |
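As a rough illustration of how these body fields fit into a request, the sketch below assembles a chat-completions body with a models fallback chain and provider preferences. The helper name build_chat_body is hypothetical and the price caps are made-up values; only the field names come from the table above.

```python
import json


def build_chat_body(prompt, models=None, provider=None):
    """Assemble a chat-completions body using the extension fields listed above."""
    body = {
        "model": "gammainfra/auto",
        "messages": [{"role": "user", "content": prompt}],
    }
    if models:
        # Per the table: an authoritative fallback chain, tried in order.
        body["models"] = list(models)
    if provider:
        body["provider"] = dict(provider)
    return body


body = build_chat_body(
    "Summarize this changelog.",
    models=["anthropic/claude-sonnet-4-6", "openai/gpt-5.4-mini"],
    provider={
        "sort": "price",
        # USD per 1M tokens; these caps are illustrative values only.
        "max_price": {"prompt": 3.0, "completion": 15.0},
    },
)
print(json.dumps(body, indent=2))
```

With the OpenAI Python SDK, non-standard fields such as models and provider can be passed through the extra_body argument of client.chat.completions.create.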

Model name aliases

GammaInfra uses the conventional vendor/slug format and accepts the common ecosystem aliases directly:

| Input | Behavior |
|---|---|
| Third-party router vendor/auto aliases | Mapped to gammainfra/auto for migration compatibility. |
| vendor/model:nitro | Suffix stripped, preference forced to latency. |
| vendor/model:floor | Suffix stripped, preference forced to cost. |
| vendor/model:exacto | Suffix stripped (GammaInfra quality-sorts by default). |
| vendor/model:online | 400 web_plugin_unsupported. |
| vendor/model:free | 400 free_tier_unavailable. New accounts get $3.00 of free balance on signup; see Balance & pricing. |
| meta-llama/llama-3.3-70b-instruct | Routed via Groq (groq/llama-3.3-70b-versatile). |
| meta-llama/llama-3.1-8b-instruct | Routed via Groq (groq/llama-3.1-8b-instant). |

Unknown :suffix variants are stripped silently for forward-compat. For the full catalogue of native model IDs, see Model names below.

Compatibility endpoints

The auxiliary endpoints SDKs commonly call are mounted under both prefixes and return ecosystem-compatible JSON.

| Endpoint | Returns |
|---|---|
| GET /api/v1/credits | {data: {total_credits, total_usage}}: lifetime top-ups and lifetime spend in USD. |
| GET /api/v1/generation?id=<request_id> | Post-hoc stats for a previous request: tokens, cost, latency. Use the X-GammaInfra-Request-Id response header value as the id. Customer-scoped (no cross-account lookups). |
| GET /api/v1/key | Info about the calling key: label, name, lifetime usage. Per-key spend limits return null (not yet supported). |
| GET /api/v1/models/{author}/{slug}/endpoints | Endpoint listing for a single model. GammaInfra routes each model through one provider, so the array has one entry. |
| POST /api/v1/completions | Legacy text completion. Internally wraps your prompt as a chat message and rewrites the response to the text_completion shape (choices[*].text instead of choices[*].message.content). Streaming supported. |
| GET /api/v1/models | Catalogue with both GammaInfra-native fields (input_cost_per_1k, etc.) and ecosystem-shaped fields (pricing.{prompt,completion} per-token strings, context_length, architecture, supported_parameters, top_provider). |

Headers

You can send HTTP-Referer and X-Title on every request; GammaInfra stores them with each request log so that per-app analytics work consistently. Both headers are best-effort and entirely optional.

What's not supported

Authentication

Every authenticated request carries a Bearer token. The public endpoints are /v1/models, /v1/status, /health, /ready, and the signup/login routes; everything else needs a valid key.

Authorization: Bearer sk-gammainfra-...

Keys are prefixed sk-gammainfra-. The plaintext is only returned on creation — GammaInfra stores a bcrypt hash. Create additional keys or revoke old ones from the dashboard.

| Status | Meaning |
|---|---|
| 401 | Missing or invalid API key |
| 402 | Insufficient credits; top up and retry |

Smart routing

Send model: "gammainfra/auto" and the router classifies your prompt into one of 8 task labels (plus 2 deterministic capability flags), then dispatches to the best-fit model for that type. If the primary model is unavailable, GammaInfra falls back through a chain of 3–4 models automatically.

Capability flags are decided up-front from the request body, never from prompt text:

| Flag | When it fires |
|---|---|
| tool_use | Request body has a non-empty tools array |
| multimodal | Any message contains an image_url part |
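To predict which flags a request will trigger before sending it, the two rules above can be approximated client-side. This is a minimal sketch, not the gateway's own code: the function name capability_flags is hypothetical, and it assumes image parts use the OpenAI content-part shape {"type": "image_url", ...}.

```python
def capability_flags(body):
    """Approximate the up-front capability flags from a request body."""
    flags = []
    # tool_use: the body carries a non-empty tools array.
    if body.get("tools"):
        flags.append("tool_use")
    # multimodal: any message has a content part of type image_url.
    for message in body.get("messages", []):
        content = message.get("content")
        if isinstance(content, list) and any(
            isinstance(part, dict) and part.get("type") == "image_url"
            for part in content
        ):
            flags.append("multimodal")
            break
    return flags


request_body = {
    "model": "gammainfra/auto",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this picture?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
}
print(capability_flags(request_body))  # ['multimodal']
```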

For everything else the prompt text is classified into one of these 8 labels:

| Task label | When it fires |
|---|---|
| reasoning | Multi-step analysis, math, root-cause questions (e.g. analyse, compare, evaluate, prove, probability, root cause, step-by-step) |
| code | Code generation, debugging, refactoring (e.g. function, implement, debug, regex, unit test) |
| creative | Original generative writing (e.g. poem, story, essay, brainstorm, lyrics) |
| rewrite | Edit or transform existing text while preserving meaning |
| extraction | Pull structured fields out of text (e.g. sentence-start extract, parse, list all, classify, format as json) |
| summarize | Compress text (e.g. sentence-start summarize, tldr, key points, brief, condense) |
| translation | Cross-language conversion (e.g. sentence-start translate, in/to spanish/french/japanese/…) |
| chat | Default when nothing else matches |

Prefer a trade-off? Send X-GammaInfra-Preference: quality (default), cost, or latency to bias the router.

Want finer control? Send a continuous X-GammaInfra-Cost-Quality header, from 0.0 (pure quality) to 1.0 (pure cost), and GammaInfra will place you on that axis. The server echoes X-GammaInfra-Cost-Quality-Applied on the response so you can log exactly what landed. An explicit X-GammaInfra-Preference: latency always wins over the cost-quality dial.

Want to opt out? Send X-GammaInfra-Routing: off and GammaInfra will route straight to the exact model you named in model.
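The routing headers described above can be assembled with a small helper. This is a sketch under assumptions: the function name routing_headers is hypothetical, and the client-side range check is a convenience choice so that bad values fail early rather than being sent.

```python
def routing_headers(api_key, preference=None, cost_quality=None, routing=None):
    """Build request headers for the routing controls described above."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if preference is not None:
        if preference not in ("quality", "cost", "latency"):
            raise ValueError(f"unknown preference: {preference!r}")
        headers["X-GammaInfra-Preference"] = preference
    if cost_quality is not None:
        # 0.0 = pure quality, 1.0 = pure cost.
        if not 0.0 <= cost_quality <= 1.0:
            raise ValueError("cost_quality must be between 0.0 and 1.0")
        headers["X-GammaInfra-Cost-Quality"] = f"{cost_quality:.2f}"
    if routing is not None:
        # "off" routes straight to the model named in the request.
        headers["X-GammaInfra-Routing"] = routing
    return headers


headers = routing_headers("sk-gammainfra-...", cost_quality=0.8)
print(headers["X-GammaInfra-Cost-Quality"])  # 0.80
```

With the OpenAI Python SDK, a dictionary like this can be supplied through the default_headers argument of the client constructor or the extra_headers argument of an individual call (the SDK already sets Authorization and Content-Type itself).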

Model names

Smart aliases (recommended)

| Model name | Behaviour |
|---|---|
| gammainfra/auto | Picks the best-fit model for your prompt type |
| gammainfra/fast | Optimises for lowest latency (equivalent to X-GammaInfra-Preference: latency) |
| gammainfra/cheap | Optimises for lowest cost (equivalent to X-GammaInfra-Preference: cost) |

Bare model names (logical)

Type a bare model name and GammaInfra's router picks the best endpoint that serves it. Useful when the same model is hosted by more than one provider (e.g. Claude Opus is reachable via the native Anthropic API and Amazon Bedrock).

claude-opus-4-7
claude-opus-4-6
claude-sonnet-4-6
claude-haiku-4-5
nova-pro
nova-2-lite
gpt-5-mini
gpt-5.4-mini
deepseek-v4-pro
mistral-large-2512
llama-3.3-70b-versatile
gemini-3.1-pro-preview
grok-4-1-fast-non-reasoning

Bare names that aren't in the registry return 404 model_not_found (no silent fallback). Use X-GammaInfra-Routing: literal to disable cross-host routing and pin the first registered endpoint instead.

Pin a specific model

Prefix any model with its provider slug:

openai/gpt-5.4
openai/gpt-5.4-mini
openai/gpt-5.4-nano
openai/gpt-5-mini
anthropic/claude-opus-4-6
anthropic/claude-sonnet-4-6
anthropic/claude-haiku-4-5
google/gemini-3.1-pro-preview
google/gemini-3-flash-preview
google/gemini-2.5-pro
google/gemini-2.5-flash
mistral/mistral-large-2512
mistral/mistral-small-2603
mistral/codestral-2508
mistral/devstral-2512
groq/llama-3.3-70b-versatile
groq/llama-3.1-8b-instant
groq/qwen/qwen3-32b
deepseek/deepseek-v4-pro
deepseek/deepseek-v4-flash
# Legacy V3 slugs — still routable via direct pin, retire 2026-07-24:
# deepseek/deepseek-chat, deepseek/deepseek-reasoner
grok/grok-4.20-0309-reasoning
grok/grok-4-1-fast-reasoning
grok/grok-4-1-fast-non-reasoning
bedrock/us.anthropic.claude-opus-4-7
bedrock/us.anthropic.claude-opus-4-6-v1
bedrock/us.anthropic.claude-sonnet-4-6
bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0
bedrock/meta.llama3-70b-instruct-v1:0
bedrock/mistral.mistral-large-2402-v1:0
bedrock/us.amazon.nova-pro-v1:0
bedrock/us.amazon.nova-2-lite-v1:0

Note on Bedrock IDs: Most Bedrock models require the us. cross-region inference profile prefix (Anthropic Claude, Amazon Nova). A few older models (Meta Llama 3, Mistral Large 24.02) use the bare ID without the prefix. The exact strings above are what AWS Bedrock accepts; copy verbatim. Bedrock's catalog of newer Meta and Mistral models lags behind these providers' direct APIs — for the latest Llama and Mistral, use groq/llama-3.3-70b-versatile or mistral/mistral-large-2512 respectively.

For the full, authoritative list:

curl -s https://api.gammainfra.com/v1/models | jq .

Streaming

Streaming works exactly like OpenAI — set stream: true and read Server-Sent Events. All providers are normalised to the OpenAI SSE format, so your existing code works unchanged.

stream = client.chat.completions.create(
    model="gammainfra/auto",
    messages=[{"role": "user", "content": "Write a haiku about distributed systems."}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. a final usage-only chunk) can arrive with no choices.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
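If you are not using an SDK, the OpenAI-style SSE stream can be consumed by hand. The sketch below parses "data:" lines and stops at the "[DONE]" sentinel; the function name iter_sse_content and the sample lines are illustrative, not captured GammaInfra output.

```python
import json


def iter_sse_content(lines):
    """Yield text deltas from OpenAI-format Server-Sent Events lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        # A usage-only chunk has an empty choices list.
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Packets "}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "drift"}}]}',
    "",
    'data: {"choices": [], "usage": {"total_tokens": 12}}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # Packets drift
```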

Headers

Request headers

| Header | Value | Purpose |
|---|---|---|
| Authorization | Bearer sk-gammainfra-… | Required |
| Content-Type | application/json | Required |
| X-GammaInfra-Routing | off | Disable smart routing; use the exact model you named |
| X-GammaInfra-Routing | literal | Disable logical resolution. Bare names take the first registered endpoint instead of the router's preference-based pick. Independent of off. |
| X-GammaInfra-Region | us / eu / apac or exact AWS region (e.g. us-east-1) | Constrain endpoint selection to a region group or exact region. Native APIs (region-agnostic) always pass. Combine with provider.only: ["bedrock"] for strict-residency mode. |
| X-GammaInfra-Preference | quality (default) / cost / latency | Bias the router when using gammainfra/auto |
| X-GammaInfra-Cost-Quality | Decimal in 0.0–1.0 | Continuous cost/quality dial. 0.0 = pure quality, 1.0 = pure cost. Overrides X-GammaInfra-Preference: quality/cost. An explicit latency preference still wins. Malformed values are ignored and the legacy preset applies. |
| X-GammaInfra-Max-Latency-Ms | Integer ms in 60–600000 (10 minutes) | Bound total wall time across the fallback chain. Set to your hard deadline (e.g. 5000 for a 5-second SLA). On exceedance GammaInfra cancels any in-flight upstream call and returns 504 max_latency_exceeded. Strictly opt-in: when the header is absent, the prior behavior applies (per-provider 30 s default). Malformed or out-of-range values are ignored. |

Response headers

| Header | Meaning |
|---|---|
| X-GammaInfra-Request-Id | Correlation ID; include it when filing a support request |
| X-GammaInfra-Provider | Which provider served the response (e.g. openai, anthropic) |
| X-GammaInfra-Router-Version | Which routing path served the request. Values: v2 (default smart router), v2_keyword (sentence-start keyword shortcut), v2_flag (capability short-circuit for multimodal or tools), v2_short_prompt (length-based fast-path for trivial prompts), v2_hedged (parallel top-2 race for gammainfra/fast), v2_logical (cross-host logical-name routing), v1_fallback (low-confidence fall-through to the keyword router), direct (you pinned a specific provider/model), logical_literal (you opted into X-GammaInfra-Routing: literal), models_override (you supplied a models[] fallback list). |
| X-GammaInfra-Logical-Model | The router's label for the prompt (e.g. reasoning, code, chat) or the logical model name (e.g. claude-opus-4-7) when bare-name / vendor-prefix routing fired. Use it to correlate your cost analytics with the type of work. |
| X-GammaInfra-Endpoint | The actual physical endpoint that served the request, formatted as provider/model (e.g. bedrock/us.anthropic.claude-opus-4-7). Always present on successful chat completions. |
| X-GammaInfra-Region-Used | The region the request was served from (e.g. us-east-1). Present on routes that went through a regional endpoint; absent for native APIs, which are region-agnostic. |
| X-GammaInfra-Flags | Capability flags fired up-front, comma-separated (e.g. tool_use, multimodal). Absent when no flag fired. |
| X-GammaInfra-Cost-USD | Per-request cost in USD with 6 decimals (e.g. 0.000087). Present on every successful chat completion. Sum it across calls to get exact spend without parsing usage × per-model price tables. Reflects provider list price; GammaInfra adds 0% token markup. |
| X-GammaInfra-Input-Cost-USD / X-GammaInfra-Output-Cost-USD | Per-direction cost split for the same request, both in USD with 6 decimals. input + output = X-GammaInfra-Cost-USD within rounding. Useful for chargeback / per-team attribution against provider-side dashboards (Bedrock CloudWatch, OpenAI usage page) without a client-side recompute. |
| X-GammaInfra-Cost-Quality-Applied | Present whenever the cost/quality dial drove the routing decision. Value is the parsed float (e.g. 0.800) so you can log and replay the decision. |
| X-GammaInfra-Fallback-Chain | Comma-separated provider/model list actually attempted on this request, in order. A single entry means one leg was tried; multiple entries mean GammaInfra cascaded after a failure. Useful for post-mortems. |
| X-GammaInfra-Attempted-Count | Integer count of legs attempted; matches len(X-GammaInfra-Fallback-Chain.split(",")). Use when you need to detect whether a fallback occurred without parsing the chain header. |
| X-GammaInfra-Fallback-Reason | Why the chain walked past the first pick (e.g. provider_error, low_confidence, flag_chain, short_prompt_chat, v2_keyword, models_override). |
| X-GammaInfra-Rust-Version | Version of the gateway's native fast-path components (e.g. 0.1.0). Useful for support correlation; treat as opaque. |
| X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset | Standard sliding-window rate-limit signals, keyed per API key. Default limit is 240 requests/minute. |
| X-GammaInfra-Cache-Mode | Which cache mode was applied to this request. Values: auto (gateway injected breakpoints on the system+tools prefix), aggressive (also cached conversation history turns), manual (you supplied your own cache_control markers; gateway did not add more), off (caching disabled for this request). Always present on successful chat completions. |
| X-GammaInfra-Cache-Read-Tokens | Number of tokens served from cache on this request. Only emitted when non-zero. Combined with X-GammaInfra-Cache-Write-Tokens and X-GammaInfra-Cost-USD, lets you verify caching is working and compute your actual effective rate without parsing provider-specific usage fields. |
| X-GammaInfra-Cache-Write-Tokens | Number of tokens written to cache on this request (cache-priming cost). Only emitted when non-zero. Cache writes carry a small premium over standard input pricing for Anthropic and Bedrock; your X-GammaInfra-Cost-USD header reflects the all-in cost including that premium. |
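For logging and spend tracking, a few of these response headers can be collected into one record. The sketch below is illustrative only: the function name summarize_headers is hypothetical and the sample header values are invented, not real gateway output.

```python
def summarize_headers(headers):
    """Extract cost and fallback information from GammaInfra response headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    chain = [
        leg.strip()
        for leg in h.get("x-gammainfra-fallback-chain", "").split(",")
        if leg.strip()
    ]
    return {
        "request_id": h.get("x-gammainfra-request-id"),
        "endpoint": h.get("x-gammainfra-endpoint"),
        "cost_usd": float(h.get("x-gammainfra-cost-usd", "0")),
        # The cache header is only emitted when non-zero, so default to 0.
        "cache_read_tokens": int(h.get("x-gammainfra-cache-read-tokens", "0")),
        "fallback_chain": chain,
        "fell_back": len(chain) > 1,
    }


sample_headers = {
    "X-GammaInfra-Request-Id": "req-123",  # invented value
    "X-GammaInfra-Endpoint": "openai/gpt-5.4-mini",
    "X-GammaInfra-Cost-USD": "0.000087",
    "X-GammaInfra-Fallback-Chain": "anthropic/claude-haiku-4-5, openai/gpt-5.4-mini",
}
summary = summarize_headers(sample_headers)
print(summary["cost_usd"], summary["fell_back"])  # 8.7e-05 True
```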

Preference precedence

You can express a routing preference through four kinds of channel: header, body, model suffix, or model shortcut. When more than one is set, the most specific wins, in this order:

  1. X-GammaInfra-Preference: latency, which always wins (latency is orthogonal to cost/quality).
  2. Body provider.sort (ecosystem-compat): price → cost, throughput/latency → latency.
  3. X-GammaInfra-Cost-Quality header: continuous 0.0–1.0 dial.
  4. X-GammaInfra-Preference: quality | cost header (legacy preset).
  5. Model-slug variant (:nitro → latency, :floor → cost) or the gammainfra/fast / gammainfra/cheap shortcuts.
  6. Default: quality.
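The precedence list can be read as a chain of checks evaluated top to bottom. The following is a client-side sketch of that order, not the gateway's implementation; the function name resolve_preference and the "dial:" return format are hypothetical.

```python
def resolve_preference(pref_header=None, provider_sort=None, cost_quality=None, model=""):
    """Return the effective preference following the documented precedence order."""
    # 1. An explicit latency preference always wins.
    if pref_header == "latency":
        return "latency"
    # 2. Body provider.sort.
    if provider_sort == "price":
        return "cost"
    if provider_sort in ("throughput", "latency"):
        return "latency"
    # 3. Continuous cost/quality dial; malformed values are ignored.
    if cost_quality is not None:
        try:
            value = float(cost_quality)
        except (TypeError, ValueError):
            value = None
        if value is not None and 0.0 <= value <= 1.0:
            return f"dial:{value:.3f}"
    # 4. Legacy preset header.
    if pref_header in ("quality", "cost"):
        return pref_header
    # 5. Model-slug variant or shortcut alias.
    if model.endswith(":nitro") or model == "gammainfra/fast":
        return "latency"
    if model.endswith(":floor") or model == "gammainfra/cheap":
        return "cost"
    # 6. Default.
    return "quality"


print(resolve_preference(pref_header="cost", cost_quality="0.2"))  # dial:0.200
print(resolve_preference(model="openai/gpt-5.4:nitro"))  # latency
```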

Balance & pricing

Approximate cost per 1M tokens

| Model | Input | Output |
|---|---|---|
| gammainfra/auto (routed to the best model for each prompt) | varies per task type; see rows below | varies per task type; see rows below |
| openai/gpt-5.5 | $5.00 | $30.00 |
| openai/gpt-5.4 | $2.00 | $8.00 |
| openai/gpt-5.4-mini | $0.40 | $1.60 |
| openai/gpt-5-mini | $0.25 | $2.00 |
| anthropic/claude-opus-4-7 | $5.00 | $25.00 |
| anthropic/claude-sonnet-4-6 | $3.00 | $15.00 |
| google/gemini-3.1-pro-preview | $1.25 | $5.00 |
| google/gemini-3-flash-preview | $0.30 | $2.50 |
| deepseek/deepseek-v4-pro | $1.74 | $3.48 |
| deepseek/deepseek-v4-flash | $0.14 | $0.28 |
| groq/llama-3.1-8b-instant | $0.06 | $0.08 |

Costs above are provider list prices — GammaInfra passes them straight through. The only GammaInfra fee is the 5% we charge when you top up credits (3% during the launch window). For the full cost table and legal terms, see the Terms of Service.
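As a worked example of the per-1M-token arithmetic, the sketch below estimates the token cost of one request from two rows of the table. It is a rough estimate only: the function name estimate_cost_usd is hypothetical, and the calculation ignores cache discounts, cache-write premiums, and hidden reasoning tokens.

```python
# (input, output) USD per 1M tokens, copied from the approximate table above.
PRICES_PER_1M = {
    "openai/gpt-5.4-mini": (0.40, 1.60),
    "anthropic/claude-sonnet-4-6": (3.00, 15.00),
}


def estimate_cost_usd(model, input_tokens, output_tokens):
    """Estimate request cost from token counts and per-1M-token list prices."""
    input_price, output_price = PRICES_PER_1M[model]
    return (
        input_tokens / 1_000_000 * input_price
        + output_tokens / 1_000_000 * output_price
    )


# 12,000 input tokens and 800 output tokens on gpt-5.4-mini:
# 0.012 * 0.40 + 0.0008 * 1.60 = 0.0048 + 0.00128
print(f"{estimate_cost_usd('openai/gpt-5.4-mini', 12_000, 800):.6f}")  # 0.006080
```

For exact figures, the X-GammaInfra-Cost-USD response header is the better source, since it reflects what was actually billed for the call.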

Reasoning tokens on gpt-5 and DeepSeek V4

OpenAI’s gpt-5 family and DeepSeek’s V4 reasoner (deepseek-v4-pro) bill hidden “reasoning tokens” in addition to the visible output. Reasoning tokens are the model’s chain-of-thought and are not returned in the response but are counted in usage.completion_tokens.

GammaInfra silently caps gpt-5 reasoning at max_completion_tokens=2048 when the caller omits the parameter, and picks a conservative reasoning_effort based on the router’s logical label (chat → low, code/summarize → medium, reasoning/math → high). This prevents the “‘hi’ burned 320 reasoning tokens” pathology from reaching your bill, but you should still budget 2–4× visible output tokens for gpt-5-family calls in batch sizing. Inspect usage.completion_tokens_details.reasoning_tokens in any response to see the split.
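To see the reasoning/visible split for a call, the usage object can be unpacked as below. This is a minimal sketch: the function name split_completion_tokens is hypothetical and the sample usage numbers are invented for illustration.

```python
def split_completion_tokens(usage):
    """Split completion tokens into hidden reasoning tokens and visible output."""
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens") or 0
    total = usage.get("completion_tokens", 0)
    # Reasoning tokens are counted inside completion_tokens.
    return {"reasoning": reasoning, "visible": total - reasoning}


sample_usage = {
    "prompt_tokens": 9,
    "completion_tokens": 352,
    "completion_tokens_details": {"reasoning_tokens": 320},
}
print(split_completion_tokens(sample_usage))  # {'reasoning': 320, 'visible': 32}
```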

Prompt caching and your bill

When a repeated prefix is served from cache, the provider charges a reduced rate for those tokens. GammaInfra passes the discount straight through — you are billed at the provider’s actual cache-read rate, never the full input rate. Your X-GammaInfra-Cost-USD header reflects the all-in cost including any cache-read discounts and cache-write premiums on the same call. For per-direction detail, see X-GammaInfra-Cache-Read-Tokens and X-GammaInfra-Cache-Write-Tokens in the response headers above.

Cache writes carry a small premium over standard input pricing for Anthropic and Bedrock (the provider charges extra to prime the cache). On first use the total cost is slightly higher; on repeated calls the cache-read savings more than offset the initial write. Set X-GammaInfra-Cache: off if you are sending one-off requests and don’t want the write overhead.

Check your balance

curl -s https://api.gammainfra.com/v1/billing/balance \
  -H "Authorization: Bearer sk-gammainfra-..."

Example response:

{"balance_usd": 0.97, "customer_id": "..."}

Top up

Top up your balance from dashboard.gammainfra.com → Top up. You’ll be redirected to Stripe’s hosted checkout and back to your dashboard once payment clears. Card data is handled by Stripe; GammaInfra never sees it. Amount range: $5 – $1000; your balance updates within seconds of Stripe’s confirmation.

Bring your own key (BYOK)

Optional. By default GammaInfra uses its own provider API keys on your behalf (one GammaInfra key, every model). If you already have a direct relationship with a provider, add your own key at dashboard.gammainfra.com → Provider Keys and GammaInfra will route requests to that provider through your key instead.

Add a key

curl -s -X POST https://api.gammainfra.com/v1/provider-keys \
  -H "Authorization: Bearer sk-gammainfra-..." \
  -H "Content-Type: application/json" \
  -d '{"provider_name": "openai", "api_key": "sk-..."}'

List your keys

curl -s https://api.gammainfra.com/v1/provider-keys \
  -H "Authorization: Bearer sk-gammainfra-..."

Delete a key

curl -s -X DELETE https://api.gammainfra.com/v1/provider-keys/openai \
  -H "Authorization: Bearer sk-gammainfra-..."

Or manage all of this from dashboard.gammainfra.com → Provider Keys.

BYOK pricing — separate prepaid balance

BYOK traffic uses its own prepaid balance, distinct from your managed credits. Each BYOK-routed request deducts a small per-request fee from it: 1% of the request's retail cost_usd during the launch window, 2% standard. Top the balance up from the dashboard's BYOK Balance tab or via POST /v1/billing/byok/checkout (minimum $5, no top-up fee):

curl -s -X POST https://api.gammainfra.com/v1/billing/byok/checkout \
  -H "Authorization: Bearer sk-gammainfra-..." \
  -H "Content-Type: application/json" \
  -d '{"amount_usd": 25.0}'

Check your BYOK balance:

curl -s https://api.gammainfra.com/v1/billing/byok/balance \
  -H "Authorization: Bearer sk-gammainfra-..."
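For budgeting, the BYOK fee can be estimated from the rates given in the FAQ at the end of this page: 1% of retail cost_usd per request during the launch window, 2% standard. The sketch below is arithmetic only; the function name byok_fee_usd is hypothetical, and how the gateway rounds the deducted amount is not specified here.

```python
def byok_fee_usd(retail_cost_usd, launch_window=True):
    """Estimate the per-request BYOK fee from the request's retail cost."""
    rate = 0.01 if launch_window else 0.02
    return retail_cost_usd * rate


# A request with a retail cost of $0.006080:
print(f"{byok_fee_usd(0.006080):.8f}")  # 0.00006080
print(f"{byok_fee_usd(0.006080, launch_window=False):.8f}")  # 0.00012160
```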

Prompt caching

Repeating the same system prompt, tool definitions, or conversation prefix across multiple calls is the most common source of avoidable spend. GammaInfra automatically caches these prefixes on providers that support it, so cache-hit tokens on follow-up calls cost a fraction of full input tokens.

How it works

GammaInfra detects cacheable prefixes and, where supported, injects cache_control breakpoints before the request leaves the gateway:

Controlling cache behaviour

Send the X-GammaInfra-Cache request header to override the default:

| Value | Effect |
|---|---|
| auto (default) | Gateway injects breakpoints on the system prompt and tools prefix when the prefix meets the minimum token threshold. |
| aggressive | Extends caching to include recent conversation history turns, in addition to the system prompt and tools prefix. Useful for long multi-turn sessions where the conversation context is stable across many calls. |
| off | Disables auto-injection for this request. Use for one-off requests where you do not want to pay the cache-write premium. Provider-side automatic caching (OpenAI, DeepSeek, Gemini) is not affected. |
| manual (implicit) | If GammaInfra detects that you have already added cache_control markers to your messages, it backs off and preserves your markers unchanged. The response will show X-GammaInfra-Cache-Mode: manual. |

Unknown values for X-GammaInfra-Cache are silently ignored and fall back to the default — the header never returns a 400 error.

Verifying cache hits

Check the response headers on any call:

X-GammaInfra-Cache-Mode: auto
X-GammaInfra-Cache-Read-Tokens: 4096
X-GammaInfra-Cache-Write-Tokens: 512
X-GammaInfra-Cost-USD: 0.000041

X-GammaInfra-Cache-Read-Tokens is the number of input tokens served from cache on this call. X-GammaInfra-Cache-Write-Tokens is the number written to cache (priming cost). Both are absent when zero. X-GammaInfra-Cost-USD is always the all-in cost, inclusive of any cache-read discounts and write premiums.

When caching saves money

Caching wins when the same prefix is reused across at least two calls. On the first call GammaInfra writes the prefix to cache (small premium); on subsequent calls those tokens are served at the provider’s cache-read rate (significant discount). The break-even point is reached quickly for system prompts longer than ~1k tokens that are reused across many requests.
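The break-even reasoning can be made concrete with a little arithmetic. In the sketch below, the write and read multipliers (1.25× and 0.10× of the normal input price) are illustrative assumptions rather than GammaInfra or provider figures, and both function names are hypothetical. Costs are for the cached prefix tokens only.

```python
import math


def cached_cost_ratio(calls, write_multiplier, read_multiplier):
    """Prefix cost over `calls` requests with caching, relative to no caching."""
    # First call pays the write premium; later calls pay the cache-read rate.
    with_cache = write_multiplier + (calls - 1) * read_multiplier
    return with_cache / calls


def break_even_calls(write_multiplier, read_multiplier):
    """Smallest call count at which caching costs no more than not caching."""
    if read_multiplier >= 1:
        raise ValueError("cache reads must be cheaper than normal input")
    # Solve w + (n - 1) * r <= n for n, giving n >= (w - r) / (1 - r).
    return max(1, math.ceil((write_multiplier - read_multiplier) / (1 - read_multiplier)))


W, R = 1.25, 0.10  # assumed multipliers, for illustration only
print(break_even_calls(W, R))  # 2
print(f"{cached_cost_ratio(10, W, R):.3f}")  # 0.215
```

Under these assumed multipliers, the second call already pays back the write premium, which matches the "at least two calls" rule of thumb above.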

For truly one-off requests where the prefix will never repeat, set X-GammaInfra-Cache: off to skip the write cost.

Error codes

Error responses use a consistent JSON shape:

{
  "error": {
    "message": "Human-readable description",
    "type": "error_type",
    "code": "machine_readable_code",
    "request_id": "uuid"
  }
}

| Status | Code | Meaning |
|---|---|---|
| 400 | web_plugin_unsupported | Request used the :online model suffix. GammaInfra doesn't ship web search yet; drop the suffix. |
| 400 | free_tier_unavailable | Request used the :free model suffix. GammaInfra has no free tier; new accounts get $3.00 of free balance on signup. |
| 400 | provider_excluded | You pinned a specific provider/model while your provider.only/provider.ignore filter excluded that provider. Drop the pin or adjust the filter. |
| 401 | invalid_api_key | Missing or invalid API key. |
| 402 | insufficient_credits | Managed balance can’t cover the request. Top up from the dashboard. |
| 402 | byok_balance_empty | BYOK prepaid balance exhausted; top up to resume. |
| 404 | model_not_found | Unknown provider/model. Check GET /v1/models for the live catalogue. |
| 404 | generation_not_found | GET /v1/generation?id= couldn't find a record for that request_id, or it belongs to a different customer. |
| 422 | | Invalid request body (pydantic validation). |
| 429 | rate_limit_exceeded | You hit the 240 req/min per-API-key cap. Respect Retry-After. |
| 429 | | Provider-side rate limit passed through; respect Retry-After. |
| 501 | web_plugin_unsupported | Request body carried plugins:[{id:"web"}]. Remove the plugin. |
| 503 | providers_down | All providers in the fallback chain failed. |

Got a 503? It means every model in the fallback chain for that task type errored at the same time, which is usually transient. Retry with exponential backoff. Include X-GammaInfra-Request-Id from the response headers if you file a support ticket.
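A retry policy for the 429 and 503 cases might look like the sketch below. It is one possible approach, not a prescribed client: the function name retry_delay, the base and cap values, and the use of full jitter are all assumptions, and Retry-After is assumed to carry a number of seconds.

```python
import random


def retry_delay(status, attempt, retry_after=None, base=0.5, cap=30.0):
    """Return seconds to wait before retrying, or None if not retryable."""
    if status == 429 and retry_after is not None:
        try:
            # Honour the server's hint when it is a plain number of seconds.
            return float(retry_after)
        except ValueError:
            pass  # unparseable hint: fall through to exponential backoff
    if status in (429, 503):
        # Exponential backoff with full jitter, capped at `cap` seconds.
        backoff = min(cap, base * (2 ** attempt))
        return random.uniform(0, backoff)
    return None  # e.g. 400, 401, 402, 404: retrying will not help


print(retry_delay(429, attempt=0, retry_after="7"))  # 7.0
print(retry_delay(401, attempt=0))  # None
```

A calling loop would sleep for the returned delay and give up after a fixed number of attempts or when the function returns None.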

Rate limits

Status

Live per-provider uptime, latency, and error counts are published:

Both endpoints are public (no auth). Each provider is marked operational, degraded, or outage based on the rolling 24 h request log plus a live health-check ping.

Support

Email support@gammainfra.com, or join our Discord and open a ticket in #support. Include the X-GammaInfra-Request-Id response header from any failing request — it lets us trace the exact path the request took through the router.

For policy and billing terms, see Terms and Privacy.

FAQ

Common developer questions about the API. For the conceptual overview see the FAQ on the landing page.

What is the GammaInfra API base URL?
https://api.gammainfra.com/v1. The legacy apex https://gammainfra.com/v1/* is preserved as a back-compat alias. Authentication is Authorization: Bearer sk-gammainfra-<your-key>. Both /v1/* and /api/v1/* prefixes are mounted with identical responses.
How do I see the cost of each request?
Every successful response carries three USD cost headers: X-GammaInfra-Cost-USD (total), X-GammaInfra-Input-Cost-USD, and X-GammaInfra-Output-Cost-USD. Sum the totals across a session to know exactly what your workload cost. The X-GammaInfra-Endpoint header tells you which provider/model served the request.
What happens when an upstream provider rate-limits a request?
The router cascades to the next endpoint in the task's fallback chain. The actual cascade is reported in the X-GammaInfra-Fallback-Chain response header. For strict-provider behavior, set provider.only in your request body or pass X-GammaInfra-Routing: literal to constrain the chain.
Can I pin a specific model instead of using smart routing?
Yes — use a host-prefixed model name like openai/gpt-5-mini, anthropic/claude-opus-4-7, or bedrock/us.anthropic.claude-sonnet-4-6. Bare logical names (e.g., claude-opus-4-7) resolve through the registry — the router picks native vs Bedrock based on live p50 latency.
How do I enforce a max latency budget per request?
Set X-GammaInfra-Max-Latency-Ms: <ms> on the request (range 60 to 600 000). On timeout, the upstream call is cancelled and a 504 max_latency_exceeded response is returned. Malformed values are silently dropped — the header never causes a 400 error.
How does BYOK pricing differ from managed?
BYOK uses your own provider API keys (configured in the dashboard) and bills 1% of retail cost_usd per request during the launch window (2% standard) against a separate prepaid BYOK balance. Managed uses GammaInfra's negotiated provider rates with 0% token markup plus a 3% (launch) / 5% (standard) fee at top-up time. When BYOK balance hits $0, requests return 402 byok_balance_empty — never silently fall back to the managed balance.