Errors & Rate Limits

BelugAPI follows the OpenAI error format exactly. All errors return a JSON object with an error key, even for HTTP 5xx responses when possible.

Error object

Error response structure

{
  "error": {
    "message": "The model 'foo' does not exist.",
    "type":    "invalid_request_error",
    "code":    "model_not_found",
    "param":   "model"
  }
}

Field	Type	Description
message	string	Human-readable description of the error.
type	string	Error category. See table below.
code	string | null	Machine-readable error code. May be `null`.
param	string | null	The request parameter that caused the error, if applicable.
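
The field table above can be mirrored in a small parser. This is a minimal sketch, assuming the response body has already been decoded from JSON; the `ApiError` class and `parse_error` function are illustrative names, not part of any BelugAPI SDK.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ApiError:
    """Hypothetical container mirroring the documented error fields."""
    message: str
    type: str
    code: Optional[str] = None   # may be null in the response
    param: Optional[str] = None  # only set when a parameter caused the error


def parse_error(body: Any) -> Optional[ApiError]:
    """Return an ApiError if `body` carries an `error` object, else None."""
    if not isinstance(body, dict):
        return None
    err = body.get("error")
    if not isinstance(err, dict):
        return None
    return ApiError(
        message=str(err.get("message", "")),
        type=str(err.get("type", "")),
        code=err.get("code"),
        param=err.get("param"),
    )
```

Returning `None` for bodies without an `error` key lets the same function be used on both successful and failed responses.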

HTTP status codes

Status	Meaning	Common cause
200	OK	Request succeeded.
400	Bad Request	Missing or invalid parameter (e.g. no `model`, bad `messages`).
401	Unauthorized	Missing or invalid API key.
402	Payment Required	Insufficient balance. Top up in the dashboard.
403	Forbidden	API key does not have access to this endpoint or model.
404	Not Found	Model does not exist, or task ID not found.
429	Too Many Requests	Rate limit exceeded. Back off and retry.
500	Internal Server Error	Unexpected error on the BelugAPI side. Retry with exponential backoff.
502	Bad Gateway	Upstream provider returned an error or was unreachable.
503	Service Unavailable	Model catalog unavailable, or model temporarily down.
504	Gateway Timeout	Upstream provider timed out (e.g. image / video task polling timeout).

Error types

type	Description
invalid_request_error	Request is malformed — missing params, wrong model, wrong endpoint.
authentication_error	API key is missing, revoked, or malformed.
insufficient_quota	Account balance is too low to process the request.
rate_limit_error	Too many requests in a short period.
upstream_error	The underlying AI provider returned an error.
server_error	Internal BelugAPI error (catalog, database, etc.).
timeout	Operation timed out (image polling after 5 min, upstream calls after 10 min).

Error codes

code	HTTP	Description
missing_parameter	400	A required field was not provided.
invalid_parameter	400	A field value is not valid.
model_not_found	404	The `model` ID does not exist in the catalog.
wrong_endpoint	400	Model used on the wrong endpoint (e.g. video model on `/chat/completions`).
insufficient_balance	402	Account balance is below minimum threshold.
endpoint_not_allowed	403	API key restricted from this endpoint.
model_not_allowed	403	API key has an allow-list that doesn't include this model.
upstream_not_configured	503	Upstream provider credentials are missing (edge models only).
image_task_timeout	504	Image task polling exceeded 5-minute deadline.
catalog_missing	503	Model catalog JSON file is missing or unreadable.
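
One way to act on these codes is a lookup from `code` to a coarse next step. The sketch below is only an illustration: the action names (`fix_request`, `top_up`, `check_key_permissions`, `retry_later`, `inspect_type`) are hypothetical labels chosen for this example, and the grouping is an assumption based on the HTTP status listed for each code.

```python
from typing import Optional

# Hypothetical action labels; the grouping follows the HTTP column of the table.
ACTION_BY_CODE = {
    "missing_parameter": "fix_request",
    "invalid_parameter": "fix_request",
    "model_not_found": "fix_request",
    "wrong_endpoint": "fix_request",
    "insufficient_balance": "top_up",
    "endpoint_not_allowed": "check_key_permissions",
    "model_not_allowed": "check_key_permissions",
    "upstream_not_configured": "retry_later",
    "image_task_timeout": "retry_later",
    "catalog_missing": "retry_later",
}


def suggest_action(code: Optional[str]) -> str:
    """Map an error `code` to a coarse action.

    `code` may be null, and new codes may appear, so unknown values
    fall back to inspecting the error `type` instead.
    """
    if code is None:
        return "inspect_type"
    return ACTION_BY_CODE.get(code, "inspect_type")
```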

Handling errors in code

import openai

# Assumes the OpenAI-compatible base URL matches the host used by the health check.
client = openai.OpenAI(
    base_url="https://api.belugapi.com/v1",
    api_key="YOUR_API_KEY",
)

try:
    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.AuthenticationError as e:
    print("Invalid API key:", e.message)
except openai.RateLimitError as e:
    print("Rate limited, back off:", e.message)
except openai.BadRequestError as e:
    print("Bad request:", e.message, "param:", e.param)
except openai.APIStatusError as e:
    # The SDK has no dedicated exception class for HTTP 402, so check the status code.
    if e.status_code == 402:
        print("Low balance, top up at belugapi.com/dashboard")
    else:
        print("Unexpected error:", e.status_code, e.message)

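import OpenAI from "openai";

// Assumes the OpenAI-compatible base URL matches the host used by the health check.
const client = new OpenAI({
  baseURL: "https://api.belugapi.com/v1",
  apiKey: "YOUR_API_KEY",
});

try {
  const response = await client.chat.completions.create({
    /* ...request parameters */
  });
} catch (err) {
  if (err instanceof OpenAI.AuthenticationError) {
    console.error("Invalid API key");
  } else if (err instanceof OpenAI.RateLimitError) {
    console.error("Rate limited", err.message);
  } else if (err instanceof OpenAI.BadRequestError) {
    // The SDK exposes the error object's fields directly on the exception.
    console.error("Bad request:", err.code, "param:", err.param);
  } else {
    console.error("API error:", err.status, err.message);
  }
}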

Retry strategy

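Implement exponential backoff for transient errors (429, 500, 502, 503, 504). Do not retry other 4xx errors: they will keep failing until the request, key, or balance is fixed.

import time

import openai

# Status codes worth retrying; everything else is raised immediately.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def call_with_retry(fn, max_retries=4):
    """Call fn(), retrying transient failures with exponential backoff."""
    delay = 1
    for attempt in range(max_retries):
        try:
            return fn()
        except openai.APIStatusError as e:
            # Give up on non-transient errors and on the final attempt.
            if e.status_code not in RETRYABLE_STATUS or attempt == max_retries - 1:
                raise
            print(f"Retrying in {delay}s… ({e})")
            time.sleep(delay)
            delay *= 2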

Errors in streaming responses

With stream: true, the HTTP status is already 200 by the time the response headers are sent, so an error that occurs mid-stream cannot change it. Such errors arrive as a special SSE event followed by [DONE]:

Mid-stream error (SSE)

data: {"error":{"message":"Upstream error.","type":"upstream_error","code":"upstream_error"}}

data: [DONE]

Always listen for error fields in SSE chunks, not just HTTP status codes, when streaming is enabled.
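
A minimal sketch of that check, assuming the stream has already been split into text lines (for example by an HTTP client's line iterator). `StreamError` and `iter_sse_chunks` are hypothetical names introduced for this example, not part of the OpenAI SDK or BelugAPI.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Hypothetical exception raised when an SSE chunk carries an `error` object."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "stream error"))
        self.type = error.get("type")
        self.code = error.get("code")


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded `data:` payloads, raising StreamError on an error chunk."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # normal end of stream
        chunk = json.loads(payload)
        if isinstance(chunk, dict) and isinstance(chunk.get("error"), dict):
            raise StreamError(chunk["error"])
        yield chunk
```

Chunks received before the error are still yielded, so a caller can keep or discard the partial output as appropriate.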

Rate limits

BelugAPI enforces rate limits per API key and per workspace to ensure fair usage. Limits vary by plan.

Limit type	Default	Notes
Requests per minute (RPM)	60	Higher on Pro and Enterprise plans.
Tokens per minute (TPM)	200,000	LLM models only.
Concurrent video tasks	5	Maximum parallel in-flight video generation tasks.
Audio file size	25 MB	Hard limit for transcription uploads.
TTS input length	4096 chars	Per request. Split longer texts into chunks.
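
For the TTS limit, longer texts need to be split before sending. The following is one possible sketch, assuming that breaking at whitespace is acceptable for the voice output; `split_for_tts` is a hypothetical helper, and a production splitter might prefer sentence boundaries.

```python
from typing import List

TTS_MAX_CHARS = 4096  # per-request limit from the table above


def split_for_tts(text: str, limit: int = TTS_MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring whitespace breaks."""
    if limit < 1:
        raise ValueError("limit must be positive")
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        # Find the last space or newline at or before position `limit`.
        cut = max(rest.rfind(" ", 0, limit + 1), rest.rfind("\n", 0, limit + 1))
        if cut <= 0:
            cut = limit  # no whitespace found: fall back to a hard split
        chunks.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each returned chunk can then be sent as its own TTS request and the audio concatenated in order.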

When you hit a rate limit (HTTP 429), wait the number of seconds given in the Retry-After header before retrying. If the header is absent, use exponential backoff starting at 1 second.
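
The waiting rule can be sketched as a small function that prefers the header and falls back to backoff. This assumes `Retry-After` holds a plain number of seconds, as described above; the `retry_delay` name and the 60-second cap are illustrative choices, not documented BelugAPI values.

```python
from typing import Mapping


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # not a number of seconds: fall back to backoff
    # Exponential backoff starting at `base`, capped (cap is an assumption).
    return min(cap, base * (2 ** attempt))
```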

Health check

Use GET /v1/health to verify API availability. No authentication required.

curl https://api.belugapi.com/v1/health

import requests

# A timeout keeps the check from hanging if the gateway is unreachable.
r = requests.get("https://api.belugapi.com/v1/health", timeout=10)
print(r.json())

Health response

{
  "status":     "ok",
  "service":    "belugapi-gateway",
  "version":    "1.0.0",
  "time":       "2026-05-05T15:00:00+00:00",
  "checks": {
    "catalog":  { "ok": true, "total_models": 240 },
    "database": { "ok": true }
  }
}
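
A monitoring script might reduce this payload to a list of problems. The sketch below assumes that a failing check reports `"ok": false` and that an unhealthy gateway reports a `status` other than `"ok"`; only the healthy response is documented above, so both are assumptions. `failing_checks` is a hypothetical helper name.

```python
from typing import Any, List


def failing_checks(health: Any) -> List[str]:
    """Return the names of problems in a health payload; an empty list means healthy."""
    if not isinstance(health, dict):
        return ["response"]  # body was not a JSON object
    problems = []
    if health.get("status") != "ok":
        problems.append("status")
    checks = health.get("checks")
    if isinstance(checks, dict):
        for name, result in checks.items():
            # Assumption: each check is an object with a boolean `ok` field.
            if not (isinstance(result, dict) and result.get("ok") is True):
                problems.append(name)
    return problems
```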