Kraken API documentation

Last updated: 16 April 2026

Overview

Kraken is a managed LLM API gateway. One OpenAI-compatible endpoint routes your requests to every major LLM provider with a task-aware router that picks the best-fit model for every prompt.

If you already use OpenAI or any OpenAI-compatible SDK, switching to Kraken is a one-line change — update the base_url, keep everything else the same.

Base URL: https://gammainfra.com
Dashboard: dashboard.gammainfra.com · Status: status.gammainfra.com · Sign up: gammainfra.com/#signup

Quickstart

1. Get an API key

Sign up at gammainfra.com — email + password, one-time verification link, no credit card. New accounts come with $3.00 of free credit, enough to try the router end-to-end. Your API key is shown once after you click the verification link — store it somewhere safe. Need more keys later? Issue and revoke them from the dashboard.

2. Make your first call

curl -s -X POST https://gammainfra.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-kraken-..." \
  -d '{
    "model": "kraken/auto",
    "messages": [{"role": "user", "content": "Explain transformers in one paragraph."}]
  }'

The response is identical to OpenAI’s format. kraken/auto lets the router pick the best model for your prompt.

3. Drop-in replacement

from openai import OpenAI

client = OpenAI(
    api_key="sk-kraken-...",
    base_url="https://gammainfra.com/v1",
)

response = client.chat.completions.create(
    model="kraken/auto",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

The same pattern works with LangChain, LlamaIndex, and any other OpenAI-compatible library.

Authentication

Every request (except /v1/models and /v1/status) requires a Bearer token:

Authorization: Bearer sk-kraken-...

Keys are prefixed sk-kraken-. The plaintext is only returned on creation — Kraken stores a bcrypt hash. Create additional keys or revoke old ones from the dashboard.
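If you are not using an SDK, the bearer token is just a header. A minimal stdlib sketch (the `build_request` helper is illustrative, not part of the Kraken API):

```python
import json
import urllib.request

KRAKEN_BASE = "https://gammainfra.com/v1"

def build_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST request against the Kraken API."""
    return urllib.request.Request(
        f"{KRAKEN_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_request(
    "/chat/completions",
    "sk-kraken-...",
    {"model": "kraken/auto", "messages": [{"role": "user", "content": "Hello"}]},
)
```

Send it with `urllib.request.urlopen(req)` (or any HTTP client); a missing or malformed Authorization header yields the 401 below.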

Status | Meaning
401 | Missing or invalid API key
402 | Insufficient credits — top up and retry

Smart routing

Send model: "kraken/auto" and the router classifies your prompt into one of 10 task types, then dispatches to the best-fit model for that type. If the primary model is unavailable, Kraken falls back through a chain of 3–4 models automatically.

Task type | When it fires
tool_use | Request includes tools or function calls
multimodal | A message contains an image
code_gen | Keywords: function, code, implement, debug, refactor, regex, unit test
math | Keywords: solve, calculate, equation, integral, prove, probability
reasoning | Keywords: explain why, analyse, compare, evaluate, strategy, root cause
creative | Keywords: poem, story, essay, brainstorm, lyrics, rewrite
translation | Keywords: translate, localize, in/to spanish/french/japanese/…
extraction | Keywords: extract, parse, classify, format as json, list all, sentiment
summarisation | Keywords: summarise, tldr, key points, brief, condense
chat | Default when nothing else matches
Prefer a trade-off? Send X-Kraken-Preference: quality (default), cost, or latency to bias the router.

Want finer control? Send a continuous X-Kraken-Cost-Quality: 0.0 (pure quality) … 1.0 (pure cost) header and Kraken will place you on that axis. The server echoes X-Kraken-Cost-Quality-Applied on the response so you can log exactly what landed. An explicit X-Kraken-Preference: latency always wins over the cost-quality dial.

Want to opt out? Send X-Kraken-Routing: off and Kraken will route straight to the exact model you named in model.

Model names

Smart aliases (recommended)

Model name | Behaviour
kraken/auto | Picks the best-fit model for your prompt type
kraken/fast | Optimises for lowest latency (equivalent to X-Kraken-Preference: latency)
kraken/cheap | Optimises for lowest cost (equivalent to X-Kraken-Preference: cost)

Pin a specific model

Prefix any model with its provider slug:

openai/gpt-5.4
openai/gpt-5.4-mini
openai/gpt-5.4-nano
openai/gpt-5-mini
anthropic/claude-opus-4-6
anthropic/claude-sonnet-4-6
anthropic/claude-haiku-4-5
google/gemini-3.1-pro-preview
google/gemini-3-flash-preview
google/gemini-2.5-pro
google/gemini-2.5-flash
mistral/mistral-large-2512
mistral/mistral-small-2603
mistral/codestral-2508
mistral/devstral-2512
groq/llama-3.3-70b-versatile
groq/llama-3.1-8b-instant
groq/gpt-oss-120b
deepseek/deepseek-chat
deepseek/deepseek-reasoner
grok/grok-4
grok/grok-4-fast
grok/grok-code-fast-1

For the full, authoritative list:

curl -s https://gammainfra.com/v1/models | jq .
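If you want the list programmatically, a sketch that extracts model ids, assuming the endpoint returns the standard OpenAI-style `{"object": "list", "data": [{"id": …}]}` payload (the sample below is illustrative):

```python
import json

def model_ids(models_response: str) -> list[str]:
    """Extract sorted model ids from an OpenAI-style /v1/models payload."""
    payload = json.loads(models_response)
    return sorted(item["id"] for item in payload.get("data", []))

sample = '{"object": "list", "data": [{"id": "openai/gpt-5.4"}, {"id": "kraken/auto"}]}'
```

Useful for validating a pinned model name before deploying, since the catalogue above may lag the live list.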

Streaming

Streaming works exactly like OpenAI — set stream: true and read Server-Sent Events. All providers are normalised to the OpenAI SSE format, so your existing code works unchanged.

stream = client.chat.completions.create(
    model="kraken/auto",
    messages=[{"role": "user", "content": "Write a haiku about distributed systems."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Headers

Request headers

Header | Value | Purpose
Authorization | Bearer sk-kraken-… | Required
Content-Type | application/json | Required
X-Kraken-Routing | off | Disable smart routing; use the exact model you named
X-Kraken-Preference | quality (default) / cost / latency | Bias the router when using kraken/auto
X-Kraken-Cost-Quality | Decimal in 0.0–1.0 | Continuous cost/quality dial. 0.0 = pure quality, 1.0 = pure cost. Overrides X-Kraken-Preference: quality/cost. An explicit latency preference still wins. Malformed values are ignored and the legacy preset applies.

Response headers

Header | Meaning
X-Kraken-Request-Id | Correlation ID — include when filing a support request
X-Kraken-Provider | Which provider served the response (e.g. openai, anthropic)
X-Kraken-Router-Version | v1 today; v2 once the ML router goes live
X-Kraken-Logical-Model | Router v2 only — the logical bucket the router picked
X-Kraken-Cost-Quality-Applied | Present only when an X-Kraken-Cost-Quality request header drove the routing decision. Value is the parsed float (e.g. 0.800) so you can log and replay the decision.
X-Kraken-Fallback-Chain | Comma-separated provider/model list actually attempted on this request. Useful for post-mortems.
X-Kraken-Fallback-Reason | Why the chain walked past the first pick (e.g. provider_error, low_confidence).

Credits & pricing

Approximate cost per 1M tokens

Model | Input | Output
kraken/auto (chat default) | ~$0.10 | ~$0.60
openai/gpt-5.4 | $2.00 | $8.00
openai/gpt-5.4-mini | $0.40 | $1.60
anthropic/claude-opus-4-6 | $5.00 | $25.00
anthropic/claude-sonnet-4-6 | $3.00 | $15.00
google/gemini-3.1-pro-preview | $1.25 | $5.00
google/gemini-3-flash-preview | $0.30 | $2.50
deepseek/deepseek-chat | $0.28 | $0.42
groq/llama-3.1-8b-instant | $0.06 | $0.08

Costs above are provider list prices — Kraken passes them straight through. The only Kraken fee is the 5% we charge when you top up credits (3% during the launch window). For the full cost table and legal terms, see the Terms of Service.
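Since prices are quoted per 1M tokens, per-request cost is simple arithmetic. A sketch for budget estimation (the function is illustrative, not an API):

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_per_million: float, output_per_million: float) -> float:
    """Estimate a request's cost from per-1M-token list prices."""
    return (input_tokens * input_per_million
            + output_tokens * output_per_million) / 1_000_000

# openai/gpt-5.4 at $2.00 in / $8.00 out (from the table above):
cost = estimate_cost_usd(12_000, 1_500, 2.00, 8.00)  # → 0.036
```

Remember that for reasoning-capable models the output side includes hidden reasoning tokens (see below), so pad the output estimate accordingly.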

Reasoning tokens on gpt-5 and DeepSeek-Reasoner

OpenAI’s gpt-5 family and DeepSeek’s reasoner model bill hidden “reasoning tokens” in addition to the visible output. Reasoning tokens are the model’s chain-of-thought and are not returned in the response but are counted in usage.completion_tokens.

Kraken silently caps gpt-5 reasoning at max_completion_tokens=2048 when the caller omits the parameter, and picks a conservative reasoning_effort based on the router’s logical label (chat → low, code/summarise → medium, reasoning/math → high). This prevents the “‘hi’ burned 320 reasoning tokens” pathology from reaching your bill, but you should still budget 2–4× visible output tokens for gpt-5-family calls in batch sizing. Inspect usage.completion_tokens_details.reasoning_tokens in any response to see the split.
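A small sketch of reading that split from a response's usage block, assuming the OpenAI-style `usage` shape described above (the sample values are illustrative):

```python
def reasoning_split(usage: dict) -> tuple[int, int]:
    """Return (visible_tokens, reasoning_tokens) from an OpenAI-style usage block.
    completion_tokens includes the hidden reasoning tokens."""
    total = usage["completion_tokens"]
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return total - reasoning, reasoning

usage = {"completion_tokens": 512,
         "completion_tokens_details": {"reasoning_tokens": 320}}
```

Here `reasoning_split(usage)` reports 192 visible tokens and 320 reasoning tokens, so the billed output is well over twice what you see in the response.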

Check your balance

curl -s https://gammainfra.com/v1/billing/balance \
  -H "Authorization: Bearer sk-kraken-..."
{"balance_usd": 0.97, "customer_id": "..."}

Top up

Top up your balance from dashboard.gammainfra.com → Top up. You’ll be redirected to Stripe’s hosted checkout and back to your dashboard once payment clears. Card data is handled by Stripe — Kraken never sees it. Amount range: $5 – $1000; your balance updates within seconds of Stripe’s confirmation.

Bring your own key (BYOK)

Optional. By default Kraken uses its own provider API keys on your behalf — one Kraken key, every model. If you already have a direct relationship with a provider, add your own key at dashboard.gammainfra.com → Provider Keys and Kraken will route requests to that provider through your key instead.

Add a key

curl -s -X POST https://gammainfra.com/v1/provider-keys \
  -H "Authorization: Bearer sk-kraken-..." \
  -H "Content-Type: application/json" \
  -d '{"provider_name": "openai", "api_key": "sk-..."}'

List your keys

curl -s https://gammainfra.com/v1/provider-keys \
  -H "Authorization: Bearer sk-kraken-..."

Delete a key

curl -s -X DELETE https://gammainfra.com/v1/provider-keys/openai \
  -H "Authorization: Bearer sk-kraken-..."

Or manage all of this from dashboard.gammainfra.com → Provider Keys.

BYOK pricing — separate prepaid balance

BYOK traffic uses its own prepaid balance, distinct from your managed credits, and each BYOK-routed request deducts a small per-request fee from it. Top it up from the dashboard's BYOK Balance tab or via POST /v1/billing/byok/checkout (minimum $5, no top-up fee):

curl -s -X POST https://gammainfra.com/v1/billing/byok/checkout \
  -H "Authorization: Bearer sk-kraken-..." \
  -H "Content-Type: application/json" \
  -d '{"amount_usd": 25.0}'

Check your BYOK balance:

curl -s https://gammainfra.com/v1/billing/byok/balance \
  -H "Authorization: Bearer sk-kraken-..."

Error codes

Error responses use a consistent JSON shape:

{
  "error": {
    "message": "Human-readable description",
    "type": "error_type",
    "code": "machine_readable_code",
    "request_id": "uuid"
  }
}

Status | Code | Meaning
401 | | Missing or invalid API key
402 | insufficient_credits | Managed credit balance can’t cover the request
402 | byok_balance_empty | BYOK prepaid balance exhausted — top up at the dashboard to resume
422 | | Invalid request body
429 | | Provider-side rate limit passed through — respect Retry-After
503 | providers_down | All providers in the fallback chain failed

Got a 503? It means every model in the fallback chain for that task type errored at the same time — usually transient. Retry with exponential backoff. Include X-Kraken-Request-Id from the response headers if you file a support ticket.
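A minimal retry sketch for the transient-503 case. In real code you would catch the SDK's specific error class rather than bare `Exception`; this generic version is for illustration:

```python
import random
import time

def retry_with_backoff(call, max_attempts: int = 5, base_delay: float = 0.5):
    """Retry a callable on transient failure with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted — surface the original error
            # 0.5s, 1s, 2s, ... plus jitter to avoid thundering herds
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

For 429s, prefer honouring the Retry-After header over a computed delay when it is present.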

Rate limits

Status

Live per-provider uptime, latency, and error counts are published at status.gammainfra.com (status page) and GET /v1/status (JSON).

Both endpoints are public (no auth). Each provider is marked operational, degraded, or outage based on the rolling 24 h request log plus a live health-check ping.

Support

Email hello@gammainfra.com. Include the X-Kraken-Request-Id response header from any failing request — it lets us trace the exact path the request took through the router.

For policy and billing terms, see Terms and Privacy.