Chat Completions

The core endpoint for conversational AI. Send a list of messages and get a reply. Supports streaming, vision, function calling, and models from OpenAI, Anthropic, Google, DeepSeek, and the other providers listed in the model table.

POST https://api.belugapi.com/v1/chat/completions

Every request requires an Authorization: Bearer bapi_… header.

Request parameters

Parameter	Type	Required	Description
model	string	required	Model ID slug — see table below.
messages	array	required	Array of `{role, content}` objects. Roles: `system`, `user`, `assistant`.
stream	boolean	optional	If `true`, tokens stream as SSE events. Default: `false`.
max_tokens	integer	optional	Maximum tokens to generate in the response.
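temperature	number	optional	Sampling temperature, 0–2. Higher values give more varied output; lower values are more deterministic. Default: 1.
top_p	number	optional	Nucleus sampling: only tokens within the top `top_p` cumulative probability mass are sampled. Default: 1.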
tools	array	optional	Function definitions for function calling (tool use).
tool_choice	string \| object	optional	`"auto"`, `"none"`, or `{"type":"function","function":{"name":"…"}}`.
response_format	object	optional	`{"type":"json_object"}` or `{"type":"json_schema",…}` for structured output.
stop	string \| array	optional	Up to 4 sequences where generation stops.
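frequency_penalty	number	optional	–2.0 to 2.0. Positive values penalise tokens in proportion to how often they have already appeared, reducing verbatim repetition.
presence_penalty	number	optional	–2.0 to 2.0. Positive values penalise any token that has already appeared in the text so far, nudging the model toward new topics.
seed	integer	optional	Integer seed for best-effort reproducible sampling (model dependent).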
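
The ranges in the table can be checked client-side before a request is sent. The sketch below is one way to do that; `build_chat_payload` is a hypothetical helper (not part of any SDK), and it enforces only the limits stated in the table.

```python
from typing import Any


def build_chat_payload(model: str, messages: list[dict[str, Any]], **options: Any) -> dict[str, Any]:
    """Assemble a request body, checking the ranges listed in the parameter table."""
    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    for name in ("frequency_penalty", "presence_penalty"):
        value = options.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be between -2.0 and 2.0")

    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")

    # Leave out unset options so the documented defaults apply.
    payload: dict[str, Any] = {"model": model, "messages": messages}
    payload.update({key: value for key, value in options.items() if value is not None})
    return payload


payload = build_chat_payload(
    "gpt-5.2",
    [{"role": "user", "content": "Hi"}],
    temperature=0.7,
    max_tokens=None,  # unset, so it is omitted from the body
)
print(sorted(payload))
```

The resulting dict can be passed straight to the SDK, for example `client.chat.completions.create(**payload)`.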

Basic example

from openai import OpenAI

client = OpenAI(
    api_key="bapi_your_key_here",
    base_url="https://api.belugapi.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system",  "content": "You are a helpful assistant."},
        {"role": "user",    "content": "Explain quantum entanglement in 3 sentences."}
    ],
    max_tokens=256,
    temperature=0.7,
)

print(response.choices[0].message.content)

import OpenAI from "openai";
const client = new OpenAI({ apiKey: "bapi_your_key_here", baseURL: "https://api.belugapi.com/v1" });

const res = await client.chat.completions.create({
  model: "gpt-5.4",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user",   content: "Explain quantum entanglement in 3 sentences." },
  ],
  max_tokens: 256,
  temperature: 0.7,
});
console.log(res.choices[0].message.content);

curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user",   "content": "Explain quantum entanglement in 3 sentences."}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
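
For environments without the OpenAI SDK, the curl call above maps onto the Python standard library roughly as follows. This is a minimal sketch: `make_request` and `send` are hypothetical helper names, and error handling is left out.

```python
import json
import urllib.request

API_URL = "https://api.belugapi.com/v1/chat/completions"


def make_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the same POST request the curl example issues."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling `send(make_request("bapi_your_key_here", {...}))` with the body from the curl example should return the response object described in the next section.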

Response object

200 OK

{
  "id":      "chatcmpl-abc123",
  "object":  "chat.completion",
  "created": 1716900000,
  "model":   "gpt-5.4",
  "choices": [{
    "index":         0,
    "message": {
      "role":    "assistant",
      "content": "Quantum entanglement is a phenomenon…"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens":     28,
    "completion_tokens": 74,
    "total_tokens":      102
  }
}
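
When working with the raw JSON rather than the SDK objects, the fields above can be pulled out as in this sketch. `Reply` and `parse_completion` are hypothetical names; treating a `finish_reason` of `"length"` as a truncated reply follows the OpenAI convention and is an assumption here.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    text: str
    finish_reason: str
    total_tokens: int

    @property
    def truncated(self) -> bool:
        # Assumption: "length" signals that max_tokens cut the reply short.
        return self.finish_reason == "length"


def parse_completion(body: dict) -> Reply:
    """Pull the first choice and the token usage out of a decoded response body."""
    choice = body["choices"][0]
    return Reply(
        text=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        total_tokens=body["usage"]["total_tokens"],
    )


sample = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1716900000,
    "model": "gpt-5.4",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Quantum entanglement is a phenomenon…"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 28, "completion_tokens": 74, "total_tokens": 102},
}

reply = parse_completion(sample)
print(reply.finish_reason, reply.total_tokens, reply.truncated)
```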

Streaming (SSE)

Set "stream": true in the request. The response then arrives as server-sent events: a series of data: {...} chunks, each carrying a partial delta, followed by a final data: [DONE].

stream = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
    stream=True,
)

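for chunk in stream:
    # Some chunks may carry no choices (for example a trailing usage chunk), so guard the index.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)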

const stream = await client.chat.completions.create({
  model:    "claude-opus-4-7",
  messages: [{ role: "user", content: "Write a haiku about the sea." }],
  stream:   true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "claude-opus-4-7",
    "messages": [{"role":"user","content":"Write a haiku about the sea."}],
    "stream": true
  }'
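
The SDKs hide the SSE framing, but the curl output shows it directly. The sketch below parses such lines by hand. `iter_sse_text` is a hypothetical helper, and the sample lines are illustrative: they assume the OpenAI-style chunk layout (`choices[0].delta.content`) that the SDK examples above read.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from raw SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Illustrative lines, not captured from a real response.
sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Salt wind"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" rising"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_sse_text(sample_lines)))
```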

Vision (image input)

For models with vision support (see the Vision column in the model table), set content to an array of parts that mixes text parts with image_url parts.

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)

curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
      ]
    }]
  }'
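
The examples above reference an image by public URL. For a local file, OpenAI-compatible endpoints commonly accept a base64 data URL in the same `image_url` field; whether this endpoint does is an assumption to verify. The sketch below builds such a message, and `image_part_from_bytes` and `vision_message` are hypothetical helpers.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Encode raw image bytes as an image_url part holding a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(question: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a user message that pairs a text part with an inline image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            image_part_from_bytes(image_bytes, mime_type),
        ],
    }


# The three bytes stand in for real file contents, e.g. open("photo.jpg", "rb").read().
message = vision_message("What is in this image?", b"\xff\xd8\xff")
print(message["content"][1]["image_url"]["url"])
```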

Function calling (tools)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }],
    tool_choice="auto",
)

curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [{"role":"user","content":"What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": {"type":"string"} },
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "auto"
  }'
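
The requests above only declare the tool. When the model decides to call it, the reply carries the call instead of text, and the caller is expected to run the function and send the result back. The sketch below assumes the OpenAI convention for that round trip (a `tool_calls` list on the assistant message, answered with `role: "tool"` messages), which this page does not spell out. `get_weather` and `run_tool_calls` are hypothetical local functions.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local implementation; a real one would query a weather service."""
    return json.dumps({"city": city, "forecast": "unknown"})


LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Run each requested tool and return the messages to append to the conversation."""
    follow_up = [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        fn = LOCAL_TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return follow_up


# Illustrative assistant message in the assumed OpenAI-style shape.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }],
}

new_messages = run_tool_calls(sample_message)
print(new_messages[-1])
```

Appending `new_messages` to the original `messages` list and calling the endpoint again lets the model turn the tool result into a final answer.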

Available LLM models

Use any of these IDs in the model field. Every model is called through the same OpenAI-compatible request and response format.

Model ID	Name	Provider	Type	Stream	Vision
`gpt-5.2-codex`	GPT-5.2 Codex	OpenAI	code	✓	—
`gpt-5.2-pro`	GPT-5.2 Pro	OpenAI	chat	✓	✓
`gpt-5.2-chat-latest`	GPT-5.2 Chat Latest	OpenAI	chat	✓	✓
`gpt-5.2`	GPT-5.2	OpenAI	chat	✓	✓
`gpt-5.1-chat-latest`	GPT-5.1 Chat Latest	OpenAI	chat	✓	✓
`gpt-5.1`	GPT-5.1	OpenAI	chat	✓	✓
`gpt-5.1-2025-11-13`	GPT-5.1 (2025-11-13)	OpenAI	chat	✓	✓
`gpt-5.1-codex`	GPT-5.1 Codex	OpenAI	code	✓	—
`gpt-5.1-codex-mini`	GPT-5.1 Codex Mini	OpenAI	code	✓	—
`gpt-5-pro-2025-10-06`	GPT-5 Pro	OpenAI	chat	✓	✓
`gpt-5-2025-08-07`	GPT-5	OpenAI	chat	✓	✓
`gpt-5-mini-2025-08-07`	GPT-5 Mini	OpenAI	chat	✓	✓
`gpt-5-nano-2025-08-07`	GPT-5 Nano	OpenAI	chat	✓	—
`gpt-5-search-api`	GPT-5 Search API	OpenAI	chat	✓	—
`gpt-5-search-api-2025-10-14`	GPT-5 Search API (2025-10-14)	OpenAI	chat	✓	—
`gpt-4o-transcribe`	GPT-4o Transcribe	OpenAI	audio	✓	—
`gpt-4o-mini-transcribe`	GPT-4o Mini Transcribe	OpenAI	audio	✓	—
`gpt-4.1-mini-2025-04-14`	GPT-4.1 Mini	OpenAI	chat	✓	✓
`gpt-4.1-nano-2025-04-14`	GPT-4.1 Nano	OpenAI	chat	✓	—
`gpt-4-1106-preview`	GPT-4 Turbo (1106 Preview)	OpenAI	chat	✓	✓
`gpt-3.5-turbo-16k`	GPT-3.5 Turbo 16K	OpenAI	chat	✓	—
`o3-2025-04-16`	o3	OpenAI	reasoning	✓	✓
`o4-mini-2025-04-16`	o4 Mini	OpenAI	reasoning	✓	✓
`o3-mini-2025-01-31`	o3 Mini	OpenAI	reasoning	✓	—
`o1-2024-12-17`	o1	OpenAI	reasoning	✓	✓
`o1-mini-2024-09-12`	o1 Mini	OpenAI	reasoning	✓	—
`claude-opus-4-7`	Claude Opus 4.7	Anthropic	chat	✓	✓
`claude-opus-4-6`	Claude Opus 4.6	Anthropic	chat	✓	✓
`claude-opus-4-6-thinking`	Claude Opus 4.6 Thinking	Anthropic	reasoning	✓	✓
`claude-opus-4-5-20251101`	Claude Opus 4.5	Anthropic	chat	✓	✓
`claude-opus-4-5-20251101-thinking`	Claude Opus 4.5 Thinking	Anthropic	reasoning	✓	✓
`claude-sonnet-4-6`	Claude Sonnet 4.6	Anthropic	chat	✓	✓
`claude-sonnet-4-6-thinking`	Claude Sonnet 4.6 Thinking	Anthropic	reasoning	✓	✓
`claude-sonnet-4-5-20250929`	Claude Sonnet 4.5	Anthropic	chat	✓	✓
`claude-sonnet-4-5-20250929-thinking`	Claude Sonnet 4.5 Thinking	Anthropic	reasoning	✓	✓
`claude-haiku-4-5-20251001`	Claude Haiku 4.5	Anthropic	chat	✓	✓
`claude-haiku-4-5-20251001-thinking`	Claude Haiku 4.5 Thinking	Anthropic	reasoning	✓	✓
`claude-3-7-sonnet-20250219-thinking`	Claude 3.7 Sonnet Thinking	Anthropic	reasoning	✓	✓
`gemini-3.1-pro-preview`	Gemini 3.1 Pro Preview	Google	chat	✓	✓
`gemini-3-pro-preview`	Gemini 3 Pro Preview	Google	chat	✓	✓
`gemini-3-pro-preview-thinking`	Gemini 3 Pro Preview Thinking	Google	reasoning	✓	✓
`gemini-3-flash-preview`	Gemini 3 Flash Preview	Google	chat	✓	✓
`gemini-3-flash-preview-nothinking`	Gemini 3 Flash Preview (No Thinking)	Google	chat	✓	✓
`gemini-2.5-pro`	Gemini 2.5 Pro	Google	chat	✓	✓
`gemini-2.5-pro-thinking`	Gemini 2.5 Pro Thinking	Google	reasoning	✓	✓
`gemini-2.5-pro-nothinking`	Gemini 2.5 Pro (No Thinking)	Google	chat	✓	✓
`gemini-2.5-flash`	Gemini 2.5 Flash	Google	chat	✓	✓
`gemini-2.5-flash-thinking`	Gemini 2.5 Flash Thinking	Google	reasoning	✓	✓
`gemini-2.5-flash-nothinking`	Gemini 2.5 Flash (No Thinking)	Google	chat	✓	✓
`gemini-2.5-flash-lite`	Gemini 2.5 Flash Lite	Google	chat	✓	✓
`gemini-2.0-flash`	Gemini 2.0 Flash	Google	chat	✓	✓
`deepseek-v3.2`	DeepSeek V3.2	DeepSeek	chat	✓	—
`deepseek-v3.2-exp`	DeepSeek V3.2 Experimental	DeepSeek	chat	✓	—
`deepseek-v3.1-terminus`	DeepSeek V3.1 Terminus	DeepSeek	chat	✓	—
`deepseek-v3-0324`	DeepSeek V3	DeepSeek	chat	✓	—
`deepseek-r1-250528`	DeepSeek R1	DeepSeek	reasoning	✓	—
`deepseek-r1-0528`	DeepSeek R1 (0528)	DeepSeek	reasoning	✓	—
`deepseek-ocr`	DeepSeek OCR	DeepSeek	vision	✓	✓
`glm-5.1`	GLM-5.1	Zhipu	chat	✓	✓
`glm-4.7`	GLM-4.7	Zhipu	chat	✓	✓
`glm-4.6`	GLM-4.6	Zhipu	chat	✓	✓
`minimax-m2.1`	MiniMax M2.1	MiniMax	chat	✓	✓
`kimi-k2-instruct`	Kimi K2 Instruct	Moonshot AI	chat	✓	✓
`kimi-k2-thinking`	Kimi K2 Thinking	Moonshot AI	reasoning	✓	✓
`llama3.1-8b`	Meta Llama 3.1 8B	BelugAPI	chat	✓	—
`gpt-oss-120b`	OpenAI GPT OSS 120B	BelugAPI	chat	✓	—
`qwen-3-235b-a22b-instruct-2507`	Qwen 3 235B Instruct	BelugAPI	chat	✓	—
`zai-glm-4.7`	Z.ai GLM 4.7	BelugAPI	chat	✓	—
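
Because vision support varies by model, it can help to check the table before sending image parts. The sketch below hard-codes a few rows copied from the table; `MODEL_INFO`, `supports_vision`, and `models_of_type` are hypothetical names, and a real client would need the full list.

```python
# A few rows copied from the model table: ID -> (type, vision support).
MODEL_INFO = {
    "gpt-5.2": ("chat", True),
    "gpt-5.1-codex": ("code", False),
    "claude-opus-4-7": ("chat", True),
    "deepseek-r1-250528": ("reasoning", False),
    "llama3.1-8b": ("chat", False),
}


def supports_vision(model_id: str) -> bool:
    """Return True if the table marks the model as vision-capable."""
    try:
        return MODEL_INFO[model_id][1]
    except KeyError:
        raise ValueError(f"unknown model ID: {model_id}") from None


def models_of_type(kind: str) -> list[str]:
    """List the known model IDs whose Type column equals kind."""
    return sorted(model_id for model_id, (model_type, _) in MODEL_INFO.items() if model_type == kind)


print(supports_vision("claude-opus-4-7"), models_of_type("reasoning"))
```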

Examples by model

Claude (Anthropic)

response = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "Write a Python function to reverse a linked list."}],
    max_tokens=1024,
)

curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-7",
    "messages": [{"role":"user","content":"Write a Python function to reverse a linked list."}],
    "max_tokens": 1024
  }'

Gemini (Google)

response = client.chat.completions.create(
    model="gemini-3-flash-preview",  # fast & cheap
    messages=[{"role": "user", "content": "Summarise the French Revolution in 5 points."}],
)

curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-flash-preview",
    "messages": [{"role":"user","content":"Summarise the French Revolution in 5 points."}]
  }'

DeepSeek

# DeepSeek R1 — reasoning model
response = client.chat.completions.create(
    model="deepseek-r1-250528",
    messages=[{"role": "user", "content": "Solve: if 2x+3=11, what is x?"}],
)

curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-250528",
    "messages": [{"role":"user","content":"Solve: if 2x+3=11, what is x?"}]
  }'

BelugAPI Edge (high-throughput)

Edge models (llama3.1-8b, gpt-oss-120b, qwen-3-235b-a22b-instruct-2507) run on BelugAPI's own infrastructure for low latency and throughput of up to 3,000 tokens/s.

# Llama 3.1 8B — fastest model, ~2200 tok/s
response = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[{"role": "user", "content": "Hello! What can you do?"}],
    stream=True,
)

curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1-8b",
    "messages": [{"role":"user","content":"Hello! What can you do?"}],
    "stream": true
  }'