Audio: TTS & STT

BelugAPI provides two audio endpoints: Text-to-Speech, which converts text to natural-sounding audio, and Speech-to-Text, which transcribes audio files using Whisper. Both are OpenAI-compatible.


Text-to-Speech (TTS)

Convert any text (up to 4096 characters) to natural-sounding speech. The response is a binary audio file.
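Text longer than the 4096-character limit has to be split across several requests. The following is a minimal sketch of one way to do that, not part of the BelugAPI SDK: the helper name `split_for_tts` is hypothetical, and splitting on sentence punctuation is an assumption about what sounds natural when the resulting audio files are joined.

```python
import re

MAX_TTS_CHARS = 4096  # per-request input limit stated in this document


def split_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks end on sentence boundaries where possible. Whitespace between
    sentences is normalised to a single space.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as the `input` of its own request to the speech endpoint.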

POST https://api.belugapi.com/v1/audio/speech

Returns binary audio. Requires Authorization: Bearer bapi_…

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | required | tts-1 (fast, standard quality) or tts-1-hd (highest quality). |
| input | string | required | Text to synthesise (max 4096 characters). |
| voice | string | required | Voice name. Options: alloy, echo, fable, onyx, nova, shimmer. |
| response_format | string | optional | Audio format: mp3 (default), opus, aac, flac, wav, pcm. |
| speed | number | optional | Speech speed, 0.25–4.0. Default: 1.0. |
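The constraints in the table above can be checked on the client before a request is sent. This is a hedged sketch: `build_speech_payload` is a hypothetical helper, not an SDK function, and it encodes only the limits listed in this document. The server remains the authority on what it accepts.

```python
TTS_MODELS = {"tts-1", "tts-1-hd"}
TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
TTS_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_speech_payload(model, text, voice, response_format="mp3", speed=1.0):
    """Validate the documented parameters and return the JSON body as a dict."""
    if model not in TTS_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must be 1 to {MAX_INPUT_CHARS} characters")
    if voice not in TTS_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in TTS_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


payload = build_speech_payload("tts-1", "Welcome to BelugAPI!", "nova")
```

The resulting dictionary matches the JSON body used in the cURL example, with the two optional fields filled in with their documented defaults.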

Examples

Python

from openai import OpenAI

client = OpenAI(
    api_key="bapi_your_key_here",
    base_url="https://api.belugapi.com/v1",
)

# TTS-1: fast and affordable
response = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input="Welcome to BelugAPI, your gateway to every major AI model at wholesale prices.",
    response_format="mp3",
)
response.stream_to_file("output.mp3")
print("Saved to output.mp3")

# TTS-1 HD: highest quality, spoken slightly faster than normal
response_hd = client.audio.speech.create(
    model="tts-1-hd",
    voice="alloy",
    input="This is a high-definition speech synthesis example.",
    speed=1.1,
)
# response_format is omitted, so the audio is returned in the default mp3 format
response_hd.stream_to_file("output_hd.mp3")
print("Saved to output_hd.mp3")

JavaScript

import { writeFileSync } from "fs";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "bapi_your_key_here", baseURL: "https://api.belugapi.com/v1" });

const response = await client.audio.speech.create({
  model:           "tts-1",
  voice:           "nova",
  input:           "Welcome to BelugAPI!",
  response_format: "mp3",
});

const buffer = Buffer.from(await response.arrayBuffer());
writeFileSync("output.mp3", buffer);
console.log("Saved to output.mp3");

cURL

curl https://api.belugapi.com/v1/audio/speech \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model":           "tts-1",
    "voice":           "nova",
    "input":           "Welcome to BelugAPI!",
    "response_format": "mp3"
  }' \
  --output output.mp3

Available voices

| Voice | Character |
| --- | --- |
| alloy | Neutral, balanced. Great for general-purpose use. |
| echo | Male, clear. Well-suited for technical content. |
| fable | Expressive, storytelling. Perfect for narratives. |
| onyx | Deep, authoritative. Ideal for news/announcements. |
| nova | Warm, friendly. Great for customer-facing apps. |
| shimmer | Soft, calm. Suited for meditation / wellbeing apps. |

TTS models

| Model ID | Name | Provider | Max input | Formats |
| --- | --- | --- | --- | --- |
| tts-1 | TTS-1 | OpenAI | 4096 characters | mp3, opus, aac, flac, wav, pcm |
| tts-1-hd | TTS-1 HD | OpenAI | 4096 characters | mp3, opus, aac, flac, wav, pcm |
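Of the listed formats, pcm is the only one without a container, so most players cannot open it directly. The sketch below wraps raw samples in a WAV header using Python's standard `wave` module. The sample layout (24 kHz, 16-bit signed little-endian, mono) is an assumption taken from the upstream OpenAI documentation; this page does not confirm it for BelugAPI, so verify it before relying on the output. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave

# Assumed PCM layout (from upstream OpenAI docs, not confirmed by this page).
PCM_SAMPLE_RATE = 24_000  # samples per second
PCM_SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
PCM_CHANNELS = 1          # mono


def pcm_to_wav(
    pcm_bytes: bytes,
    sample_rate: int = PCM_SAMPLE_RATE,
    sample_width: int = PCM_SAMPLE_WIDTH,
    channels: int = PCM_CHANNELS,
) -> bytes:
    """Return the given raw PCM samples wrapped in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


# One second of silence under the assumed layout.
silence_wav = pcm_to_wav(b"\x00\x00" * PCM_SAMPLE_RATE)
```

If the assumed sample rate is wrong, the file still plays but at the wrong pitch and speed, which makes a mismatch easy to spot.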

Speech-to-Text (Transcription)

Transcribe audio files to text using Whisper-1. Send a multipart/form-data request with your audio file.

POST https://api.belugapi.com/v1/audio/transcriptions

Multipart/form-data. Requires Authorization: Bearer bapi_…

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | file | required | Audio file. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm. Max 25 MB. |
| model | string | required | whisper-1 (the only supported transcription model). |
| language | string | optional | ISO-639-1 code (e.g. en, fr, de). Leave empty for auto-detect. |
| prompt | string | optional | Context text to guide the transcription. |
| response_format | string | optional | json (default), text, srt, verbose_json, vtt. |
| temperature | number | optional | Sampling temperature, 0–1. Default: 0. |
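Checking the file locally avoids uploading something the endpoint will reject. The sketch below applies the format and size limits from the table above. `check_audio_file` is a hypothetical helper, the check looks only at the file extension rather than the actual contents, and "25 MB" is read conservatively as 25,000,000 bytes because this page does not say whether the limit is decimal or binary.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_FILE_BYTES = 25_000_000  # conservative reading of the documented 25 MB limit


def check_audio_file(path, max_bytes: int = MAX_FILE_BYTES) -> Path:
    """Return the path if it looks uploadable, otherwise raise an error."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {path.suffix or '(none)'}")
    size = path.stat().st_size
    if size > max_bytes:
        raise ValueError(f"file is {size} bytes, limit is {max_bytes}")
    return path
```

A typical use is `open(check_audio_file("audio.mp3"), "rb")` in place of the bare `open` call in the transcription examples.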

Examples

Python

from openai import OpenAI

client = OpenAI(
    api_key="bapi_your_key_here",
    base_url="https://api.belugapi.com/v1"
)

# Basic transcription
with open("audio.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
    )
print(transcription.text)

# With explicit language + verbose JSON
with open("audio_fr.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        language="fr",
        response_format="verbose_json",
    )
print(result.text, result.duration)

JavaScript

import { createReadStream } from "fs";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "bapi_your_key_here", baseURL: "https://api.belugapi.com/v1" });

const transcription = await client.audio.transcriptions.create({
  model:  "whisper-1",
  file:   createReadStream("audio.mp3"),
});

console.log(transcription.text);

cURL

curl https://api.belugapi.com/v1/audio/transcriptions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

# With SRT subtitle output
curl https://api.belugapi.com/v1/audio/transcriptions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -F "file=@audio.mp3" \
  -F "model=whisper-1" \
  -F "response_format=srt"

Transcription response (json)

{
  "text": "Welcome to BelugAPI, your gateway to every major AI model at wholesale prices."
}

Transcription response (verbose_json)

{
  "task":     "transcribe",
  "language": "english",
  "duration": 4.2,
  "text":     "Welcome to BelugAPI…",
  "segments": [{
    "id":    0,
    "start": 0.0,
    "end":   4.2,
    "text":  " Welcome to BelugAPI…"
  }]
}
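The segments array in a verbose_json response carries start and end times in seconds, which is enough to print a timestamped transcript. This is a small sketch that assumes the response has already been parsed into a Python dict with the shape shown above; `format_timestamp` and `segments_to_lines` are hypothetical helper names.

```python
def format_timestamp(seconds: float) -> str:
    """Format a duration in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_lines(verbose_json: dict) -> list[str]:
    """Return one '[start -> end] text' line per transcript segment."""
    lines = []
    for segment in verbose_json.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        # Segment text starts with a space in the sample response, so strip it.
        lines.append(f"[{start} -> {end}] {segment['text'].strip()}")
    return lines


sample = {
    "task": "transcribe",
    "language": "english",
    "duration": 4.2,
    "text": "Welcome to BelugAPI…",
    "segments": [{"id": 0, "start": 0.0, "end": 4.2, "text": " Welcome to BelugAPI…"}],
}
for line in segments_to_lines(sample):
    print(line)
```

For ready-made subtitle files, requesting srt or vtt as the response_format is simpler than formatting segments by hand.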

STT models

| Model ID | Name | Provider | Max file | Languages |
| --- | --- | --- | --- | --- |
| whisper-1 | Whisper-1 | OpenAI | 25 MB | 38 languages |

Supported languages (Whisper-1)

Whisper-1 detects the language automatically. To set it explicitly, pass an ISO-639-1 code in the language parameter.

en English, zh Chinese, es Spanish, fr French, de German, ja Japanese, ko Korean, ar Arabic, hi Hindi, pt Portuguese, ru Russian, it Italian, nl Dutch, pl Polish, tr Turkish, vi Vietnamese, th Thai, id Indonesian, sv Swedish, da Danish, fi Finnish, el Greek, he Hebrew, fa Persian, ro Romanian, hu Hungarian, cs Czech, +10 more
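Because the language parameter is optional, a request builder should include it only when a code is actually given. The sketch below shows one way to do that. `transcription_kwargs` is a hypothetical helper, and it checks only that the code has the two-letter ISO-639-1 shape, since the full list of supported languages is not reproduced on this page.

```python
import re

RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
_TWO_LETTER_CODE = re.compile(r"[a-z]{2}")


def transcription_kwargs(language=None, response_format="json") -> dict:
    """Build keyword arguments for a transcription request.

    `language` is left out when empty so that the model auto-detects it.
    """
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    kwargs = {"model": "whisper-1", "response_format": response_format}
    if language:
        code = language.strip().lower()
        if not _TWO_LETTER_CODE.fullmatch(code):
            raise ValueError(f"expected a two-letter ISO-639-1 code, got {language!r}")
        kwargs["language"] = code
    return kwargs
```

The result can be unpacked into the SDK call, for example `client.audio.transcriptions.create(file=f, **transcription_kwargs("fr", "verbose_json"))`.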