Errors & Rate Limits

BelugAPI follows the OpenAI error format exactly. Every error response is a JSON object with a top-level error key, including HTTP 5xx responses when possible.

Error object

Error response structure
{
  "error": {
    "message": "The model 'foo' does not exist.",
    "type":    "invalid_request_error",
    "code":    "model_not_found",
    "param":   "model"
  }
}

Field    Type           Description
message  string         Human-readable description of the error.
type     string         Error category. See table below.
code     string | null  Machine-readable error code. May be null.
param    string | null  The request parameter that caused the error, if applicable.
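
The field table above can be turned into a small parser. The following is a minimal sketch, not part of any BelugAPI SDK: the ApiError class and parse_error function are hypothetical names, and the code assumes only the four documented fields.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    """Mirror of the documented error object (hypothetical helper type)."""
    message: str
    type: str
    code: Optional[str] = None   # may be null
    param: Optional[str] = None  # may be null


def parse_error(body: str) -> Optional[ApiError]:
    """Return an ApiError if the body carries an `error` object, else None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # e.g. a non-JSON 5xx page from an intermediate proxy
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        return None
    return ApiError(
        message=str(err.get("message", "")),
        type=str(err.get("type", "")),
        code=err.get("code"),
        param=err.get("param"),
    )


sample = (
    '{"error": {"message": "The model \'foo\' does not exist.", '
    '"type": "invalid_request_error", "code": "model_not_found", '
    '"param": "model"}}'
)
print(parse_error(sample))
```

Returning None for bodies without an error key lets the caller fall back to the HTTP status code, which matters for the 5xx cases where a JSON body is not guaranteed.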

HTTP status codes

Status  Meaning                Common cause
200     OK                     Request succeeded.
400     Bad Request            Missing or invalid parameter (e.g. no model, bad messages).
401     Unauthorized           Missing or invalid API key.
402     Payment Required       Insufficient balance. Top up in the dashboard.
403     Forbidden              API key does not have access to this endpoint or model.
404     Not Found              Model does not exist, or task ID not found.
429     Too Many Requests      Rate limit exceeded. Back off and retry.
500     Internal Server Error  Unexpected error on the BelugAPI side. Retry with exponential backoff.
502     Bad Gateway            Upstream provider returned an error or was unreachable.
503     Service Unavailable    Model catalog unavailable, or model temporarily down.
504     Gateway Timeout        Upstream provider timed out (e.g. image / video task polling timeout).

Error types

type                   Description
invalid_request_error  Request is malformed: missing params, wrong model, wrong endpoint.
authentication_error   API key is missing, revoked, or in an invalid format.
insufficient_quota     Account balance is too low to process the request.
rate_limit_error       Too many requests in a short period.
upstream_error         The underlying AI provider returned an error.
server_error           Internal BelugAPI error (catalog, database, etc.).
timeout                Operation timed out (image polling 5 min, upstream 10 min).

Error codes

code                     HTTP  Description
missing_parameter        400   A required field was not provided.
invalid_parameter        400   A field value is not valid.
model_not_found          404   The model ID does not exist in the catalog.
wrong_endpoint           400   Model used on the wrong endpoint (e.g. video model on /chat/completions).
insufficient_balance     402   Account balance is below minimum threshold.
endpoint_not_allowed     403   API key restricted from this endpoint.
model_not_allowed        403   API key has an allow-list that doesn't include this model.
upstream_not_configured  503   Upstream provider credentials are missing (edge models only).
image_task_timeout       504   Image task polling exceeded 5-minute deadline.
catalog_missing          503   Model catalog JSON file is missing or unreadable.
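
Because code may be null, client logic usually needs a fallback to the HTTP status. The sketch below shows one possible way to dispatch on the codes listed above. The HTTP statuses are copied from the table; the action labels ("fix_request", "top_up", and so on) and the suggest_action helper are hypothetical and only illustrate the idea.

```python
from typing import Optional

# (HTTP status, suggested action) per error code. Statuses come from the
# table above; the action labels are assumptions made for this example.
ERROR_CODES = {
    "missing_parameter": (400, "fix_request"),
    "invalid_parameter": (400, "fix_request"),
    "model_not_found": (404, "fix_request"),
    "wrong_endpoint": (400, "fix_request"),
    "insufficient_balance": (402, "top_up"),
    "endpoint_not_allowed": (403, "check_key_permissions"),
    "model_not_allowed": (403, "check_key_permissions"),
    "upstream_not_configured": (503, "retry_later"),
    "image_task_timeout": (504, "retry_later"),
    "catalog_missing": (503, "retry_later"),
}


def suggest_action(code: Optional[str], status: int) -> str:
    """Pick an action from the error code, falling back to the HTTP status."""
    if code in ERROR_CODES:
        return ERROR_CODES[code][1]
    # `code` may be null or unknown, so fall back to the status class.
    if status == 429 or status >= 500:
        return "retry_later"
    return "fix_request"


print(suggest_action("model_not_allowed", 403))
print(suggest_action(None, 429))
```

Whether a retry actually helps for the 503 and 504 codes depends on the cause, so treat "retry_later" as a starting point rather than a guarantee.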

Handling errors in code

import openai

# The base URL below is inferred from the health-check endpoint in this guide;
# confirm it, and your API key, in the dashboard.
client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.belugapi.com/v1",
)

try:
    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.AuthenticationError as e:
    print("Invalid API key:", e.message)
except openai.RateLimitError as e:
    print("Rate limited, back off:", e.message)
except openai.BadRequestError as e:
    print("Bad request:", e.message, "param:", e.param)
except openai.APIStatusError as e:
    # The openai SDK has no dedicated class for HTTP 402, so check the status.
    if e.status_code == 402:
        print("Low balance, top up at belugapi.com/dashboard")
    else:
        print("Unexpected error:", e.status_code, e.message)
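
import OpenAI from "openai";

// Base URL inferred from the health-check endpoint in this guide; confirm it in the dashboard.
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.belugapi.com/v1",
});

try {
  const response = await client.chat.completions.create({
    model: "gpt-5.4",
    messages: [{ role: "user", content: "Hello!" }],
  });
} catch (err) {
  if (err instanceof OpenAI.AuthenticationError) {
    console.error("Invalid API key");
  } else if (err instanceof OpenAI.RateLimitError) {
    console.error("Rate limited", err.message);
  } else if (err instanceof OpenAI.BadRequestError) {
    // The SDK exposes the error object's fields directly on the exception.
    console.error("Bad request:", err.code, "param:", err.param);
  } else if (err instanceof OpenAI.APIError) {
    console.error("API error:", err.status, err.message);
  } else {
    throw err; // not an API error (e.g. a bug in calling code)
  }
}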

Retry strategy

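Implement exponential backoff for transient errors (429, 500, 502, 503, 504). Do not retry other 4xx errors: the same request will fail the same way again.

import time

import openai

RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def call_with_retry(fn, max_retries=4):
    """Call fn(), retrying transient HTTP errors with exponential backoff."""
    delay = 1
    for attempt in range(max_retries):
        try:
            return fn()
        except openai.APIStatusError as e:
            # Re-raise non-transient errors and the final failed attempt.
            if e.status_code not in RETRYABLE_STATUS or attempt == max_retries - 1:
                raise
            print(f"Retrying in {delay}s… ({e})")
            time.sleep(delay)
            delay *= 2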

Errors in streaming responses

With stream: true, the HTTP status is fixed at 200 once the response headers have been sent, so an error that occurs mid-stream cannot change it. Such errors arrive as an SSE data event carrying an error object, followed by [DONE]:

Mid-stream error (SSE)
data: {"error":{"message":"Upstream error.","type":"upstream_error","code":"upstream_error"}}

data: [DONE]

When streaming is enabled, always check each SSE chunk for an error field; the HTTP status code alone is not enough.
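
As an illustration of that check, here is a minimal sketch of a line-level SSE reader that raises as soon as a chunk contains an error object. StreamError and iter_chunks are hypothetical names, and the code assumes each event fits on a single data: line, as in the example above. A client using the OpenAI SDK may surface these errors differently.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when an SSE chunk carries an `error` object."""


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded SSE data payloads; raise StreamError on an error chunk."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        if "error" in chunk:
            err = chunk["error"]
            raise StreamError(f'{err.get("type")}: {err.get("message")}')
        yield chunk


stream = [
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    "",
    'data: {"error":{"message":"Upstream error.","type":"upstream_error","code":"upstream_error"}}',
    "",
    "data: [DONE]",
]

try:
    for chunk in iter_chunks(stream):
        print(chunk["choices"][0]["delta"]["content"])
except StreamError as exc:
    print("stream failed:", exc)
```

Raising an exception keeps partial output and failure handling separate: the caller sees every chunk delivered before the error, then decides whether to retry the whole request.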

Rate limits

BelugAPI enforces rate limits per API key and per workspace to ensure fair usage. Limits vary by plan.

Limit type                 Default     Notes
Requests per minute (RPM)  60          Higher on Pro and Enterprise plans.
Tokens per minute (TPM)    200,000     LLM models only.
Concurrent video tasks     5           Maximum parallel in-flight video generation tasks.
Audio file size            25 MB       Hard limit for transcription uploads.
TTS input length           4096 chars  Per request. Split longer texts into chunks.

When you hit a rate limit (HTTP 429), wait the number of seconds given in the Retry-After header before retrying, or use exponential backoff starting at 1 second.
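
The two waiting strategies can be combined in one helper. This is a sketch under stated assumptions: retry_delay is a hypothetical function, it prefers Retry-After when the header holds a plain number of seconds and otherwise falls back to exponential backoff from 1 second, and the 60-second cap is an arbitrary choice for the example, not a documented BelugAPI value.

```python
from typing import Mapping, Optional


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    value: Optional[str] = None
    for name, header_value in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            value = header_value
            break
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # not a plain number of seconds; fall back to backoff
    # Exponential backoff: base, 2*base, 4*base, ... limited by the assumed cap.
    return min(cap, base * (2 ** attempt))


print(retry_delay({"Retry-After": "7"}, attempt=0))
print(retry_delay({}, attempt=3))
```

With the openai Python SDK, the response headers of a failed call should be available as e.response.headers on the raised APIStatusError, so the helper can be called from inside an except block.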

Health check

Use GET /v1/health to verify API availability. No authentication required.

curl https://api.belugapi.com/v1/health

import requests

# A timeout keeps the check from hanging if the gateway is unreachable.
r = requests.get("https://api.belugapi.com/v1/health", timeout=10)
print(r.json())

Health response
{
  "status":     "ok",
  "service":    "belugapi-gateway",
  "version":    "1.0.0",
  "time":       "2026-05-05T15:00:00+00:00",
  "checks": {
    "catalog":  { "ok": true, "total_models": 240 },
    "database": { "ok": true }
  }
}
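
A monitoring script will usually want a yes/no answer rather than the raw payload. The sketch below interprets a health response of the shape shown above. failing_checks and is_healthy are hypothetical helpers, and the code assumes that every entry under checks carries a boolean ok flag, as the two entries in the example do.

```python
from typing import Any, List, Mapping


def failing_checks(health: Mapping[str, Any]) -> List[str]:
    """Return the names of checks whose `ok` flag is missing or false."""
    checks = health.get("checks", {})
    return [name for name, result in checks.items() if not result.get("ok")]


def is_healthy(health: Mapping[str, Any]) -> bool:
    """True when the overall status is "ok" and no individual check fails."""
    return health.get("status") == "ok" and not failing_checks(health)


sample = {
    "status": "ok",
    "service": "belugapi-gateway",
    "version": "1.0.0",
    "time": "2026-05-05T15:00:00+00:00",
    "checks": {
        "catalog": {"ok": True, "total_models": 240},
        "database": {"ok": True},
    },
}

print(is_healthy(sample))
print(failing_checks(sample))
```

Reporting the failing check names, not just a boolean, makes alerts more actionable: a catalog failure and a database failure likely call for different responses.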