Kraken API documentation

Last updated: 16 April 2026

Overview

Kraken is a managed LLM API gateway. One OpenAI-compatible endpoint routes your requests to every major LLM provider with a task-aware router that picks the best-fit model for every prompt.

If you already use OpenAI or any OpenAI-compatible SDK, switching to Kraken is a one-line change — update the base_url, keep everything else the same.

Base URL: https://gammainfra.com
Dashboard: dashboard.gammainfra.com · Status: status.gammainfra.com · Sign up: gammainfra.com/#signup

Quickstart

1. Get an API key

Sign up at gammainfra.com — email + password, one-time verification link, no credit card. New accounts come with $3.00 of free credit, enough to try the router end-to-end. Your API key is shown once after you click the verification link — store it somewhere safe. Need more keys later? Issue and revoke them from the dashboard.

2. Make your first call

curl -s -X POST https://gammainfra.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-kraken-..." \
  -d '{
    "model": "kraken/auto",
    "messages": [{"role": "user", "content": "Explain transformers in one paragraph."}]
  }'

The response is identical to OpenAI’s format. kraken/auto lets the router pick the best model for your prompt.

3. Drop-in replacement

from openai import OpenAI

client = OpenAI(
    api_key="sk-kraken-...",
    base_url="https://gammainfra.com/v1",
)

response = client.chat.completions.create(
    model="kraken/auto",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

The same pattern works with LangChain, LlamaIndex, and any other OpenAI-compatible library.

Authentication

Every request (except /v1/models and /v1/status) requires a Bearer token:

Authorization: Bearer sk-kraken-...

Keys are prefixed sk-kraken-. The plaintext is only returned on creation — Kraken stores a bcrypt hash. Create additional keys or revoke old ones from the dashboard.
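If you are not using an SDK, the bearer token is just a header. A minimal stdlib sketch (the `build_request` helper is illustrative, not part of the Kraken API):

```python
import json
import urllib.request

KRAKEN_BASE = "https://gammainfra.com/v1"

def build_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST request against the Kraken API."""
    return urllib.request.Request(
        f"{KRAKEN_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_request(
    "/chat/completions",
    "sk-kraken-...",
    {"model": "kraken/auto", "messages": [{"role": "user", "content": "Hello"}]},
)
```

Send it with `urllib.request.urlopen(req)` (or any HTTP client); a missing or malformed Authorization header yields the 401 below.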

Status | Meaning
401 | Missing or invalid API key
402 | Insufficient credits — top up and retry

Smart routing

Send model: "kraken/auto" and the router classifies your prompt into one of 10 task types, then dispatches to the best-fit model for that type. If the primary model is unavailable, Kraken falls back through a chain of 3–4 models automatically.

Task type | When it fires
tool_use | Request includes tools or function calls
multimodal | A message contains an image
code_gen | Keywords: function, code, implement, debug, refactor, regex, unit test
math | Keywords: solve, calculate, equation, integral, prove, probability
reasoning | Keywords: explain why, analyse, compare, evaluate, strategy, root cause
creative | Keywords: poem, story, essay, brainstorm, lyrics, rewrite
translation | Keywords: translate, localize, in/to spanish/french/japanese/…
extraction | Keywords: extract, parse, classify, format as json, list all, sentiment
summarisation | Keywords: summarise, tldr, key points, brief, condense
chat | Default when nothing else matches
Prefer a trade-off? Send X-Kraken-Preference: quality (default), cost, or latency to bias the router.

Want finer control? Send a continuous X-Kraken-Cost-Quality: 0.0 (pure quality) … 1.0 (pure cost) header and Kraken will place you on that axis. The server echoes X-Kraken-Cost-Quality-Applied on the response so you can log exactly what landed. An explicit X-Kraken-Preference: latency always wins over the cost-quality dial.

Want to opt out? Send X-Kraken-Routing: off and Kraken will route straight to the exact model you named in model.

Model names

Smart aliases (recommended)

Model name | Behaviour
kraken/auto | Picks the best-fit model for your prompt type
kraken/fast | Optimises for lowest latency (equivalent to X-Kraken-Preference: latency)
kraken/cheap | Optimises for lowest cost (equivalent to X-Kraken-Preference: cost)

Pin a specific model

Prefix any model with its provider slug:

openai/gpt-5.4
openai/gpt-5.4-mini
openai/gpt-5.4-nano
openai/gpt-5-mini
anthropic/claude-opus-4-6
anthropic/claude-sonnet-4-6
anthropic/claude-haiku-4-5
google/gemini-3.1-pro-preview
google/gemini-3-flash-preview
google/gemini-2.5-pro
google/gemini-2.5-flash
mistral/mistral-large-2512
mistral/mistral-small-2603
mistral/codestral-2508
mistral/devstral-2512
groq/llama-3.3-70b-versatile
groq/llama-3.1-8b-instant
groq/gpt-oss-120b
deepseek/deepseek-chat
deepseek/deepseek-reasoner
grok/grok-4
grok/grok-4-fast
grok/grok-code-fast-1

For the full, authoritative list:

curl -s https://gammainfra.com/v1/models | jq .
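If you want the list programmatically, a sketch that extracts model ids, assuming the endpoint returns the standard OpenAI-style `{"object": "list", "data": [{"id": …}]}` payload (the sample below is illustrative):

```python
import json

def model_ids(models_response: str) -> list[str]:
    """Extract sorted model ids from an OpenAI-style /v1/models payload."""
    payload = json.loads(models_response)
    return sorted(item["id"] for item in payload.get("data", []))

sample = '{"object": "list", "data": [{"id": "openai/gpt-5.4"}, {"id": "kraken/auto"}]}'
```

Useful for validating a pinned model name before deploying, since the catalogue above may lag the live list.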

Streaming

Streaming works exactly like OpenAI — set stream: true and read Server-Sent Events. All providers are normalised to the OpenAI SSE format, so your existing code works unchanged.

stream = client.chat.completions.create(
    model="kraken/auto",
    messages=[{"role": "user", "content": "Write a haiku about distributed systems."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Headers

Request headers

Header | Value | Purpose
Authorization | Bearer sk-kraken-… | Required
Content-Type | application/json | Required
X-Kraken-Routing | off | Disable smart routing; use the exact model you named
X-Kraken-Preference | quality (default) / cost / latency | Bias the router when using kraken/auto
X-Kraken-Cost-Quality | Decimal in 0.0–1.0 | Continuous cost/quality dial. 0.0 = pure quality, 1.0 = pure cost. Overrides X-Kraken-Preference: quality/cost. An explicit latency preference still wins. Malformed values are ignored and the legacy preset applies.

Response headers

Header | Meaning
X-Kraken-Request-Id | Correlation ID — include when filing a support request
X-Kraken-Provider | Which provider served the response (e.g. openai, anthropic)
X-Kraken-Router-Version | v1 today; v2 once the ML router goes live
X-Kraken-Logical-Model | Router v2 only — the logical bucket the router picked
X-Kraken-Cost-Quality-Applied | Present only when an X-Kraken-Cost-Quality request header drove the routing decision. Value is the parsed float (e.g. 0.800) so you can log and replay the decision.
X-Kraken-Fallback-Chain | Comma-separated provider/model list actually attempted on this request. Useful for post-mortems.
X-Kraken-Fallback-Reason | Why the chain walked past the first pick (e.g. provider_error, low_confidence).

Credits & pricing

Approximate cost per 1M tokens

Model | Input | Output
kraken/auto (chat default) | ~$0.10 | ~$0.60
openai/gpt-5.4 | $2.00 | $8.00
openai/gpt-5.4-mini | $0.40 | $1.60
anthropic/claude-opus-4-6 | $5.00 | $25.00
anthropic/claude-sonnet-4-6 | $3.00 | $15.00
google/gemini-3.1-pro-preview | $1.25 | $5.00
google/gemini-3-flash-preview | $0.30 | $2.50
deepseek/deepseek-chat | $0.28 | $0.42
groq/llama-3.1-8b-instant | $0.06 | $0.08

Costs above are provider list prices — Kraken passes them straight through. The only Kraken fee is the 5% we charge when you top up credits (3% during the launch window). For the full cost table and legal terms, see the Terms of Service.
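Since prices are quoted per 1M tokens, per-request cost is simple arithmetic. A sketch for budget estimation (the function is illustrative, not an API):

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_per_million: float, output_per_million: float) -> float:
    """Estimate a request's cost from per-1M-token list prices."""
    return (input_tokens * input_per_million
            + output_tokens * output_per_million) / 1_000_000

# openai/gpt-5.4 at $2.00 in / $8.00 out (from the table above):
cost = estimate_cost_usd(12_000, 1_500, 2.00, 8.00)  # → 0.036
```

Remember that for reasoning-capable models the output side includes hidden reasoning tokens (see below), so pad the output estimate accordingly.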

Reasoning tokens on gpt-5 and DeepSeek-Reasoner

OpenAI’s gpt-5 family and DeepSeek’s reasoner model bill hidden “reasoning tokens” in addition to the visible output. Reasoning tokens are the model’s chain-of-thought and are not returned in the response but are counted in usage.completion_tokens.

Kraken silently caps gpt-5 reasoning at max_completion_tokens=2048 when the caller omits the parameter, and picks a conservative reasoning_effort based on the router’s logical label (chat → low, code/summarise → medium, reasoning/math → high). This prevents the “‘hi’ burned 320 reasoning tokens” pathology from reaching your bill, but you should still budget 2–4× visible output tokens for gpt-5-family calls in batch sizing. Inspect usage.completion_tokens_details.reasoning_tokens in any response to see the split.
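A small sketch of reading that split from a response's usage block, assuming the OpenAI-style `usage` shape described above (the sample values are illustrative):

```python
def reasoning_split(usage: dict) -> tuple[int, int]:
    """Return (visible_tokens, reasoning_tokens) from an OpenAI-style usage block.
    completion_tokens includes the hidden reasoning tokens."""
    total = usage["completion_tokens"]
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return total - reasoning, reasoning

usage = {"completion_tokens": 512,
         "completion_tokens_details": {"reasoning_tokens": 320}}
```

Here `reasoning_split(usage)` reports 192 visible tokens and 320 reasoning tokens, so the billed output is well over twice what you see in the response.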

Check your balance

curl -s https://gammainfra.com/v1/billing/balance \
  -H "Authorization: Bearer sk-kraken-..."
{"balance_usd": 0.97, "customer_id": "..."}

Top up

Top up your balance from dashboard.gammainfra.com → Top up. You’ll be redirected to Stripe’s hosted checkout and back to your dashboard once payment clears. Card data is handled by Stripe — Kraken never sees it. Amount range: $5 – $1000; your balance updates within seconds of Stripe’s confirmation.

Bring your own key (BYOK)

Optional. By default Kraken uses its own provider API keys on your behalf — one Kraken key, every model. If you already have a direct relationship with a provider, add your own key at dashboard.gammainfra.com → Provider Keys and Kraken will route requests to that provider through your key instead.

Add a key

curl -s -X POST https://gammainfra.com/v1/provider-keys \
  -H "Authorization: Bearer sk-kraken-..." \
  -H "Content-Type: application/json" \
  -d '{"provider_name": "openai", "api_key": "sk-..."}'

List your keys

curl -s https://gammainfra.com/v1/provider-keys \
  -H "Authorization: Bearer sk-kraken-..."

Delete a key

curl -s -X DELETE https://gammainfra.com/v1/provider-keys/openai \
  -H "Authorization: Bearer sk-kraken-..."

Or manage all of this from dashboard.gammainfra.com → Provider Keys.

BYOK pricing — separate prepaid balance

BYOK traffic uses its own prepaid balance, distinct from your managed credits, and each BYOK-routed request deducts a small per-request fee from it. Top it up from the dashboard's BYOK Balance tab or via POST /v1/billing/byok/checkout (minimum $5, no top-up fee):

curl -s -X POST https://gammainfra.com/v1/billing/byok/checkout \
  -H "Authorization: Bearer sk-kraken-..." \
  -H "Content-Type: application/json" \
  -d '{"amount_usd": 25.0}'

Check your BYOK balance:

curl -s https://gammainfra.com/v1/billing/byok/balance \
  -H "Authorization: Bearer sk-kraken-..."

Error codes

Error responses use a consistent JSON shape:

{
  "error": {
    "message": "Human-readable description",
    "type": "error_type",
    "code": "machine_readable_code",
    "request_id": "uuid"
  }
}

Status | Code | Meaning
401 | | Missing or invalid API key
402 | insufficient_credits | Managed credit balance can’t cover the request
402 | byok_balance_empty | BYOK prepaid balance exhausted — top up at the dashboard to resume
422 | | Invalid request body
429 | | Provider-side rate limit passed through — respect Retry-After
503 | providers_down | All providers in the fallback chain failed

Got a 503? It means every model in the fallback chain for that task type errored at the same time — usually transient. Retry with exponential backoff. Include X-Kraken-Request-Id from the response headers if you file a support ticket.
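A minimal retry sketch for the transient-503 case. In real code you would catch the SDK's specific error class rather than bare `Exception`; this generic version is for illustration:

```python
import random
import time

def retry_with_backoff(call, max_attempts: int = 5, base_delay: float = 0.5):
    """Retry a callable on transient failure with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted — surface the original error
            # 0.5s, 1s, 2s, ... plus jitter to avoid thundering herds
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

For 429s, prefer honouring the Retry-After header over a computed delay when it is present.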

Rate limits

Status

Live per-provider uptime, latency, and error counts are published at status.gammainfra.com (status page) and GET /v1/status (JSON).

Both endpoints are public (no auth). Each provider is marked operational, degraded, or outage based on the rolling 24 h request log plus a live health-check ping.

Support

Email hello@gammainfra.com. Include the X-Kraken-Request-Id response header from any failing request — it lets us trace the exact path the request took through the router.

For policy and billing terms, see Terms and Privacy.