Audio: TTS & STT

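BelugAPI exposes two audio endpoints: Text-to-Speech, which converts text to natural-sounding audio, and Speech-to-Text, which transcribes audio files using Whisper. Both are OpenAI-compatible, so the OpenAI SDKs work once the base URL points at `https://api.belugapi.com/v1`.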

Text-to-Speech (TTS)

Convert any text (up to 4096 characters) to natural-sounding speech. The response is a binary audio file.

POST https://api.belugapi.com/v1/audio/speech

Returns binary audio. Requires Authorization: Bearer bapi_…

Parameters

Parameter	Type	Required	Description
model	string	required	`tts-1` (fast, standard quality) or `tts-1-hd` (highest quality).
input	string	required	Text to synthesise (max 4096 characters).
voice	string	required	Voice name. Options: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`.
response_format	string	optional	Audio format: `mp3` (default), `opus`, `aac`, `flac`, `wav`, `pcm`.
speed	number	optional	Speech speed 0.25–4.0. Default: `1.0`.
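
The constraints in the table above can be checked client-side before a request is sent. The following is a minimal sketch, not part of any SDK: `build_speech_payload` is a hypothetical helper name, and the limits are copied from the table rather than queried from the API.

```python
# Hypothetical helper: validates TTS parameters against the table above
# and returns a dict that can be sent as the JSON request body.
ALLOWED_MODELS = {"tts-1", "tts-1-hd"}
ALLOWED_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_speech_payload(text, model="tts-1", voice="alloy",
                         response_format="mp3", speed=1.0):
    """Return a JSON-ready payload for /v1/audio/speech or raise ValueError."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if voice not in ALLOWED_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; max is {MAX_INPUT_CHARS}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed {speed} is outside 0.25-4.0")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

Failing early this way avoids spending a network round trip on a request the API would reject.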

Examples

from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key="bapi_your_key_here",
    base_url="https://api.belugapi.com/v1"
)

# TTS-1 — fast & affordable
response = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input="Welcome to BelugAPI — your gateway to every major AI model at wholesale prices.",
    response_format="mp3",
)
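response.write_to_file("output.mp3")  # stream_to_file() is deprecated in recent openai SDKs
print("Saved to output.mp3")

# TTS-1 HD: highest quality
response_hd = client.audio.speech.create(
    model="tts-1-hd",
    voice="alloy",
    input="This is a high-definition speech synthesis example.",
    speed=1.1,
)
response_hd.write_to_file("output_hd.mp3")
print("Saved to output_hd.mp3")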

import { writeFileSync } from "fs";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "bapi_your_key_here", baseURL: "https://api.belugapi.com/v1" });

const response = await client.audio.speech.create({
  model:           "tts-1",
  voice:           "nova",
  input:           "Welcome to BelugAPI!",
  response_format: "mp3",
});

const buffer = Buffer.from(await response.arrayBuffer());
writeFileSync("output.mp3", buffer);
console.log("Saved to output.mp3");

curl https://api.belugapi.com/v1/audio/speech \
  -H "Authorization: Bearer bapi_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model":           "tts-1",
    "voice":           "nova",
    "input":           "Welcome to BelugAPI!",
    "response_format": "mp3"
  }' \
  --output output.mp3

Available voices

Voice	Character
alloy	Neutral, balanced. Great for general-purpose use.
echo	Male, clear. Well-suited for technical content.
fable	Expressive, storytelling. Perfect for narratives.
onyx	Deep, authoritative. Ideal for news/announcements.
nova	Warm, friendly. Great for customer-facing apps.
shimmer	Soft, calm. Suited for meditation / wellbeing apps.

TTS models

Model ID	Name	Provider	Max input	Formats
`tts-1`	TTS-1	OpenAI	4096 characters	mp3, opus, aac, flac, wav, pcm
`tts-1-hd`	TTS-1 HD	OpenAI	4096 characters	mp3, opus, aac, flac, wav, pcm
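
Because both models cap input at 4096 characters, longer text has to be split into several requests. One possible approach is sketched below: it splits at sentence boundaries and cuts hard only when a single sentence exceeds the limit. `split_for_tts` is a hypothetical helper, and the sentence regex is a simple assumption that will not handle every abbreviation or language.

```python
import re

MAX_INPUT_CHARS = 4096  # per-request limit from the table above


def split_for_tts(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters,
    preferring to break after sentence-ending punctuation."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own `input`, and the resulting audio files played or concatenated in order.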

Speech-to-Text (Transcription)

Transcribe audio files to text using Whisper-1. Send a multipart/form-data request with your audio file.

POST https://api.belugapi.com/v1/audio/transcriptions

Multipart/form-data. Requires Authorization: Bearer bapi_…

Parameters

Parameter	Type	Required	Description
file	file	required	Audio file. Supported formats: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`. Max 25 MB.
model	string	required	`whisper-1` — the only supported transcription model.
language	string	optional	ISO-639-1 code (e.g. `en`, `fr`, `de`). Omit to auto-detect the language.
prompt	string	optional	Context text to guide the transcription.
response_format	string	optional	`json` (default), `text`, `srt`, `verbose_json`, `vtt`.
temperature	number	optional	Sampling temperature 0–1. Default: 0.
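
The format and size limits on `file` can be checked locally before uploading. This is a rough sketch under two assumptions: `check_audio_file` is a hypothetical helper, and "25 MB" is interpreted as 25 × 1024 × 1024 bytes, which the documentation does not specify. It also judges format by file extension only, not by inspecting the audio data.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" means 25 MiB


def check_audio_file(path):
    """Return a list of problems; an empty list means the file looks uploadable."""
    path = Path(path)
    problems = []
    ext = path.suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if not path.is_file():
        problems.append("file not found")
    elif path.stat().st_size > MAX_FILE_BYTES:
        problems.append(f"file is {path.stat().st_size} bytes; max is {MAX_FILE_BYTES}")
    return problems
```

Files over the limit would need to be compressed or split into shorter segments before transcription.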

Examples

from openai import OpenAI

client = OpenAI(
    api_key="bapi_your_key_here",
    base_url="https://api.belugapi.com/v1"
)

# Basic transcription
with open("audio.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
    )
print(transcription.text)

# With explicit language + verbose JSON
with open("audio_fr.mp3", "rb") as f:
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        language="fr",
        response_format="verbose_json",
    )
print(result.text, result.duration)

import { createReadStream } from "fs";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "bapi_your_key_here", baseURL: "https://api.belugapi.com/v1" });

const transcription = await client.audio.transcriptions.create({
  model:  "whisper-1",
  file:   createReadStream("audio.mp3"),
});

console.log(transcription.text);

curl https://api.belugapi.com/v1/audio/transcriptions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

# With SRT subtitle output
curl https://api.belugapi.com/v1/audio/transcriptions \
  -H "Authorization: Bearer bapi_your_key_here" \
  -F "file=@audio.mp3" \
  -F "model=whisper-1" \
  -F "response_format=srt"

Transcription response (json)

{
  "text": "Welcome to BelugAPI, your gateway to every major AI model at wholesale prices."
}

Transcription response (verbose_json)

{
  "task":     "transcribe",
  "language": "english",
  "duration": 4.2,
  "text":     "Welcome to BelugAPI…",
  "segments": [{
    "id":    0,
    "start": 0.0,
    "end":   4.2,
    "text":  " Welcome to BelugAPI…"
  }]
}
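
The `segments` array carries per-segment timing, which is enough to build subtitles locally when a `verbose_json` response is already in hand (requesting `response_format=srt` directly is the simpler route otherwise). The sketch below assumes segments are plain dicts shaped like the example above; `srt_timestamp` and `segments_to_srt` are hypothetical helper names.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm (the SRT timestamp layout)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Convert verbose_json segments into SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Segment text may begin with a space, so strip it.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + "\n"
```

With the Python SDK, the parsed response can usually be turned into a dict first (for example via `result.model_dump()`), though the exact shape depends on the SDK version.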

STT models

Model ID	Name	Provider	Max file	Languages
`whisper-1`	Whisper-1	OpenAI	25 MB	38 languages

Supported languages (Whisper-1)

Whisper-1 auto-detects the language by default. To set it explicitly, pass an ISO-639-1 code in the `language` parameter.

`en` English, `zh` Chinese, `es` Spanish, `fr` French, `de` German, `ja` Japanese, `ko` Korean, `ar` Arabic, `hi` Hindi, `pt` Portuguese, `ru` Russian, `it` Italian, `nl` Dutch, `pl` Polish, `tr` Turkish, `vi` Vietnamese, `th` Thai, `id` Indonesian, `sv` Swedish, `da` Danish, `fi` Finnish, `el` Greek, `he` Hebrew, `fa` Persian, `ro` Romanian, `hu` Hungarian, `cs` Czech, +10 more