Chat Completions

The core endpoint for conversational AI: send a list of messages and get a reply. It supports streaming, vision, and function calling across the providers listed in the model table below, including OpenAI, Anthropic, Google, and DeepSeek.

POST https://api.belugapi.com/v1/chat/completions

Requires an Authorization: Bearer bapi_… header.

Request parameters

Parameter | Type | Required | Description
model | string | required | Model ID slug; see the model table below.
messages | array | required | Array of {role, content} objects. Roles: system, user, assistant.
stream | boolean | optional | If true, tokens stream as SSE events. Default: false.
max_tokens | integer | optional | Maximum number of tokens to generate in the response.
temperature | number | optional | Sampling temperature, 0–2. Higher values give more varied output. Default: 1.
top_p | number | optional | Nucleus sampling probability. Default: 1.
tools | array | optional | Function definitions for function calling (tool use).
tool_choice | string or object | optional | "auto", "none", or {"type":"function","function":{"name":"…"}}.
response_format | object | optional | {"type":"json_object"} or {"type":"json_schema",…} for structured output.
stop | string or array | optional | Up to 4 sequences at which generation stops.
frequency_penalty | number | optional | -2.0 to 2.0. Penalises repeated tokens.
presence_penalty | number | optional | -2.0 to 2.0. Penalises tokens that have already appeared in the text so far.
seed | integer | optional | For deterministic outputs (model dependent).
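The ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: `build_chat_payload` is a hypothetical helper name, and whether the server itself rejects out-of-range values is not stated on this page.

```python
from typing import Any


def build_chat_payload(model: str, messages: list[dict], **options: Any) -> dict:
    """Assemble a request body and check the ranges listed in the parameter table.

    Hypothetical helper for illustration; it is not part of the openai package.
    """
    if not messages:
        raise ValueError("messages must contain at least one message")

    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    for name in ("frequency_penalty", "presence_penalty"):
        value = options.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be between -2.0 and 2.0")

    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")

    # Leave unset options out so the documented defaults apply.
    payload = {"model": model, "messages": messages}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


payload = build_chat_payload(
    "claude-opus-4-7",
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
    max_tokens=256,
    stop=None,  # dropped from the payload because it is unset
)
```

The resulting dict can be passed to the SDK as `client.chat.completions.create(**payload)` or serialised as the JSON body of a raw HTTP request.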

Basic example

from openai import OpenAI

client = OpenAI(
    api_key="bapi_your_key_here",
    base_url="https://api.belugapi.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system",  "content": "You are a helpful assistant."},
        {"role": "user",    "content": "Explain quantum entanglement in 3 sentences."}
    ],
    max_tokens=256,
    temperature=0.7,
)

print(response.choices[0].message.content)

// Node.js: same request with the openai package
import OpenAI from "openai";
const client = new OpenAI({ apiKey: "bapi_your_key_here", baseURL: "https://api.belugapi.com/v1" });

const res = await client.chat.completions.create({
  model: "gpt-5.4",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user",   content: "Explain quantum entanglement in 3 sentences." },
  ],
  max_tokens: 256,
  temperature: 0.7,
});
console.log(res.choices[0].message.content);

# curl: same request over raw HTTP
curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user",   "content": "Explain quantum entanglement in 3 sentences."}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'

Response object

200 OK
{
  "id":      "chatcmpl-abc123",
  "object":  "chat.completion",
  "created": 1716900000,
  "model":   "gpt-5.4",
  "choices": [{
    "index":         0,
    "message": {
      "role":    "assistant",
      "content": "Quantum entanglement is a phenomenon…"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens":     28,
    "completion_tokens": 74,
    "total_tokens":      102
  }
}
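When calling the endpoint without an SDK, the response arrives as plain JSON. The sketch below pulls out the fields most callers need from a body shaped like the sample above. `summarize` is a hypothetical helper name, and treating a `finish_reason` of `"length"` as truncation follows the OpenAI convention, which is assumed rather than confirmed here.

```python
import json

# Abbreviated copy of the sample response shown above.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-5.4",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Quantum entanglement is a phenomenon…"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 28, "completion_tokens": 74, "total_tokens": 102}
}
"""


def summarize(response: dict) -> dict:
    """Extract the reply text, stop reason, and token usage from a response body."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # Assumption: "length" means the reply hit max_tokens (OpenAI convention).
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize(json.loads(raw))
```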

Streaming (SSE)

Set "stream": true in the request body. Tokens arrive as data: {...} chunks, and the stream ends with data: [DONE].

stream = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
    stream=True,
)

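for chunk in stream:
    # Guard against chunks that carry no choices or no text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

// Node.js: same streaming request with the openai package
const stream = await client.chat.completions.create({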
  model:    "claude-opus-4-7",
  messages: [{ role: "user", content: "Write a haiku about the sea." }],
  stream:   true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

# curl: same streaming request over raw HTTP
curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "claude-opus-4-7",
    "messages": [{"role":"user","content":"Write a haiku about the sea."}],
    "stream": true
  }'
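For clients that read the raw stream, as the curl call does, the data: lines have to be parsed by hand. The following is a minimal sketch of that parsing. `iter_sse_text` is a hypothetical helper name, the chunk layout (`choices[0].delta.content`) is inferred from the SDK examples above, and the sample lines are hand-written for illustration rather than captured from the API.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from raw SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # assumed possible: chunks without any choices
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Hand-written sample stream (illustrative only).
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Salt wind"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" rising"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

result = "".join(iter_sse_text(sample))
```

In a real client, `lines` would be the decoded lines of the HTTP response body, read incrementally so that text can be shown as it arrives.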

Vision (image input)

For models with vision support, pass an array of content items including image_url parts.

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)

# curl: same vision request over raw HTTP
curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
      ]
    }]
  }'
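The content array above can be built with a small helper, which is convenient when a request carries several images. The sketch below also shows one way to send a local image as a base64 data URL. Both function names are hypothetical, and data URL support is an assumption carried over from the OpenAI convention; this page only shows https URLs.

```python
import base64


def image_part_from_url(url: str) -> dict:
    """Build an image_url content part from a hosted image URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def image_part_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build an image_url content part from raw bytes.

    Assumption: the endpoint accepts base64 data URLs, as the OpenAI API does.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return image_part_from_url(f"data:{mime_type};base64,{encoded}")


def vision_message(text: str, *image_parts: dict) -> dict:
    """Combine one text part with any number of image parts into a user message."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": text}, *image_parts],
    }


message = vision_message(
    "What is in this image?",
    image_part_from_url("https://example.com/image.jpg"),
    image_part_from_bytes(b"\xff\xd8\xff"),  # placeholder bytes, not a real image
)
```

The returned dict goes straight into the `messages` list of a request to a model with vision support.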

Function calling (tools)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }],
    tool_choice="auto",
)

# curl: same function-calling request over raw HTTP
curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [{"role":"user","content":"What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": {"type":"string"} },
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "auto"
  }'
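The requests above only declare the tool. When the model decides to call it, the caller has to run the function and send the result back. The sketch below shows that step under the OpenAI convention, which this page does not spell out: the assistant message carries a `tool_calls` list whose `arguments` field is a JSON string, and the result is returned as a message with role `"tool"`. The response shape, the `"tool"` role, the `run_tool_calls` helper, and the canned `get_weather` implementation are all assumptions for illustration.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local implementation that returns canned data."""
    return {"city": city, "forecast": "sunny", "temp_c": 21}


# Maps tool names declared in the request to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool and build the follow-up messages."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        # Assumption: arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"])
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return replies


# Hand-written example of an assistant message that requests a tool call.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }],
}

replies = run_tool_calls(assistant_message)
```

To continue the conversation, append `assistant_message` and then `replies` to the original `messages` list and call the endpoint again so the model can phrase its final answer.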

Available LLM models

Use any of these IDs in the model field. All models are OpenAI-compatible.

Model ID | Name | Provider | Type
gpt-5.2-codex | GPT-5.2 Codex | OpenAI | code
gpt-5.2-pro | GPT-5.2 Pro | OpenAI | chat
gpt-5.2-chat-latest | GPT-5.2 Chat Latest | OpenAI | chat
gpt-5.2 | GPT-5.2 | OpenAI | chat
gpt-5.1-chat-latest | GPT-5.1 Chat Latest | OpenAI | chat
gpt-5.1 | GPT-5.1 | OpenAI | chat
gpt-5.1-2025-11-13 | GPT-5.1 (2025-11-13) | OpenAI | chat
gpt-5.1-codex | GPT-5.1 Codex | OpenAI | code
gpt-5.1-codex-mini | GPT-5.1 Codex Mini | OpenAI | code
gpt-5-pro-2025-10-06 | GPT-5 Pro | OpenAI | chat
gpt-5-2025-08-07 | GPT-5 | OpenAI | chat
gpt-5-mini-2025-08-07 | GPT-5 Mini | OpenAI | chat
gpt-5-nano-2025-08-07 | GPT-5 Nano | OpenAI | chat
gpt-5-search-api | GPT-5 Search API | OpenAI | chat
gpt-5-search-api-2025-10-14 | GPT-5 Search API (2025-10-14) | OpenAI | chat
gpt-4o-transcribe | GPT-4o Transcribe | OpenAI | audio
gpt-4o-mini-transcribe | GPT-4o Mini Transcribe | OpenAI | audio
gpt-4.1-mini-2025-04-14 | GPT-4.1 Mini | OpenAI | chat
gpt-4.1-nano-2025-04-14 | GPT-4.1 Nano | OpenAI | chat
gpt-4-1106-preview | GPT-4 Turbo (1106 Preview) | OpenAI | chat
gpt-3.5-turbo-16k | GPT-3.5 Turbo 16K | OpenAI | chat
o3-2025-04-16 | o3 | OpenAI | reasoning
o4-mini-2025-04-16 | o4 Mini | OpenAI | reasoning
o3-mini-2025-01-31 | o3 Mini | OpenAI | reasoning
o1-2024-12-17 | o1 | OpenAI | reasoning
o1-mini-2024-09-12 | o1 Mini | OpenAI | reasoning
claude-opus-4-7 | Claude Opus 4.7 | Anthropic | chat
claude-opus-4-6 | Claude Opus 4.6 | Anthropic | chat
claude-opus-4-6-thinking | Claude Opus 4.6 Thinking | Anthropic | reasoning
claude-opus-4-5-20251101 | Claude Opus 4.5 | Anthropic | chat
claude-opus-4-5-20251101-thinking | Claude Opus 4.5 Thinking | Anthropic | reasoning
claude-sonnet-4-6 | Claude Sonnet 4.6 | Anthropic | chat
claude-sonnet-4-6-thinking | Claude Sonnet 4.6 Thinking | Anthropic | reasoning
claude-sonnet-4-5-20250929 | Claude Sonnet 4.5 | Anthropic | chat
claude-sonnet-4-5-20250929-thinking | Claude Sonnet 4.5 Thinking | Anthropic | reasoning
claude-haiku-4-5-20251001 | Claude Haiku 4.5 | Anthropic | chat
claude-haiku-4-5-20251001-thinking | Claude Haiku 4.5 Thinking | Anthropic | reasoning
claude-3-7-sonnet-20250219-thinking | Claude 3.7 Sonnet Thinking | Anthropic | reasoning
gemini-3.1-pro-preview | Gemini 3.1 Pro Preview | Google | chat
gemini-3-pro-preview | Gemini 3 Pro Preview | Google | chat
gemini-3-pro-preview-thinking | Gemini 3 Pro Preview Thinking | Google | reasoning
gemini-3-flash-preview | Gemini 3 Flash Preview | Google | chat
gemini-3-flash-preview-nothinking | Gemini 3 Flash Preview (No Thinking) | Google | chat
gemini-2.5-pro | Gemini 2.5 Pro | Google | chat
gemini-2.5-pro-thinking | Gemini 2.5 Pro Thinking | Google | reasoning
gemini-2.5-pro-nothinking | Gemini 2.5 Pro (No Thinking) | Google | chat
gemini-2.5-flash | Gemini 2.5 Flash | Google | chat
gemini-2.5-flash-thinking | Gemini 2.5 Flash Thinking | Google | reasoning
gemini-2.5-flash-nothinking | Gemini 2.5 Flash (No Thinking) | Google | chat
gemini-2.5-flash-lite | Gemini 2.5 Flash Lite | Google | chat
gemini-2.0-flash | Gemini 2.0 Flash | Google | chat
deepseek-v3.2 | DeepSeek V3.2 | DeepSeek | chat
deepseek-v3.2-exp | DeepSeek V3.2 Experimental | DeepSeek | chat
deepseek-v3.1-terminus | DeepSeek V3.1 Terminus | DeepSeek | chat
deepseek-v3-0324 | DeepSeek V3 | DeepSeek | chat
deepseek-r1-250528 | DeepSeek R1 | DeepSeek | reasoning
deepseek-r1-0528 | DeepSeek R1 (0528) | DeepSeek | reasoning
deepseek-ocr | DeepSeek OCR | DeepSeek | vision
glm-5.1 | GLM-5.1 | Zhipu | chat
glm-4.7 | GLM-4.7 | Zhipu | chat
glm-4.6 | GLM-4.6 | Zhipu | chat
minimax-m2.1 | MiniMax M2.1 | MiniMax | chat
kimi-k2-instruct | Kimi K2 Instruct | Moonshot AI | chat
kimi-k2-thinking | Kimi K2 Thinking | Moonshot AI | reasoning
llama3.1-8b | Meta Llama 3.1 8B | BelugAPI | chat
gpt-oss-120b | OpenAI GPT OSS 120B | BelugAPI | chat
qwen-3-235b-a22b-instruct-2507 | Qwen 3 235B Instruct | BelugAPI | chat
zai-glm-4.7 | Z.ai GLM 4.7 | BelugAPI | chat

Examples by model

Claude (Anthropic)

response = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "Write a Python function to reverse a linked list."}],
    max_tokens=1024,
)

# curl: same Claude request over raw HTTP
curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-7",
    "messages": [{"role":"user","content":"Write a Python function to reverse a linked list."}],
    "max_tokens": 1024
  }'

Gemini (Google)

response = client.chat.completions.create(
    model="gemini-3-flash-preview",  # fast & cheap
    messages=[{"role": "user", "content": "Summarise the French Revolution in 5 points."}],
)

# curl: same Gemini request over raw HTTP
curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-flash-preview",
    "messages": [{"role":"user","content":"Summarise the French Revolution in 5 points."}]
  }'

DeepSeek

# DeepSeek R1 — reasoning model
response = client.chat.completions.create(
    model="deepseek-r1-250528",
    messages=[{"role": "user", "content": "Solve: if 2x+3=11, what is x?"}],
)

# curl: same DeepSeek request over raw HTTP
curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-250528",
    "messages": [{"role":"user","content":"Solve: if 2x+3=11, what is x?"}]
  }'

BelugAPI Edge (high-throughput)

Edge models (llama3.1-8b, gpt-oss-120b, qwen-3-235b-a22b-instruct-2507) run on BelugAPI's own infrastructure for ultra-low latency and throughput of up to 3,000 tokens/s.

# Llama 3.1 8B: fastest model, ~2200 tok/s
response = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[{"role": "user", "content": "Hello! What can you do?"}],
    stream=True,
)

# curl: same Edge request over raw HTTP
curl https://api.belugapi.com/v1/chat/completions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1-8b",
    "messages": [{"role":"user","content":"Hello! What can you do?"}],
    "stream": true
  }'