Audio: TTS & STT
BelugAPI provides two audio endpoints: Text-to-Speech (TTS), which converts text to natural-sounding audio, and Speech-to-Text (STT), which transcribes audio files using Whisper. Both are OpenAI-compatible.
Text-to-Speech (TTS)
Convert any text (up to 4096 characters) to natural-sounding speech. The response is a binary audio file.
POST https://api.belugapi.com/v1/audio/speech
Returns binary audio. Requires the header Authorization: Bearer bapi_…
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | required | tts-1 (fast, standard quality) or tts-1-hd (highest quality). |
| input | string | required | Text to synthesise (max 4096 characters). |
| voice | string | required | Voice name. Options: alloy, echo, fable, onyx, nova, shimmer. |
| response_format | string | optional | Audio format: mp3 (default), opus, aac, flac, wav, pcm. |
| speed | number | optional | Speech speed 0.25–4.0. Default: 1.0. |
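
The constraints in the table above can be checked locally before a request is sent. The following is a minimal sketch, not part of the API: `validate_speech_request` is a hypothetical helper name, and the allowed values are copied from the table.

```python
# Allowed values copied from the parameter table above.
TTS_MODELS = {"tts-1", "tts-1-hd"}
TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
TTS_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_speech_request(model, text, voice, response_format="mp3", speed=1.0):
    """Hypothetical helper: return a list of problems (empty if none were found)."""
    problems = []
    if model not in TTS_MODELS:
        problems.append(f"unknown model: {model!r}")
    if len(text) > MAX_INPUT_CHARS:
        problems.append(f"input has {len(text)} characters (max {MAX_INPUT_CHARS})")
    if voice not in TTS_VOICES:
        problems.append(f"unknown voice: {voice!r}")
    if response_format not in TTS_FORMATS:
        problems.append(f"unknown response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        problems.append(f"speed {speed} is outside {MIN_SPEED}-{MAX_SPEED}")
    return problems


print(validate_speech_request("tts-1", "Welcome to BelugAPI!", "nova"))  # prints []
```

A local check like this only catches obvious mistakes early; the server remains the authority on what it accepts.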
Examples
Python:

    from openai import OpenAI

    client = OpenAI(
        api_key="bapi_your_key_here",
        base_url="https://api.belugapi.com/v1",
    )

    # TTS-1: fast and affordable
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice="nova",
        input="Welcome to BelugAPI, your gateway to every major AI model at wholesale prices.",
        response_format="mp3",
    ) as response:
        # Write the binary audio to disk as it arrives
        response.stream_to_file("output.mp3")
    print("Saved to output.mp3")

    # TTS-1 HD: highest quality, with slightly faster speech
    with client.audio.speech.with_streaming_response.create(
        model="tts-1-hd",
        voice="alloy",
        input="This is a high-definition speech synthesis example.",
        speed=1.1,
    ) as response_hd:
        response_hd.stream_to_file("output_hd.mp3")
    print("Saved to output_hd.mp3")
JavaScript (Node.js):

    import { writeFileSync } from "fs";
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: "bapi_your_key_here",
      baseURL: "https://api.belugapi.com/v1",
    });

    const response = await client.audio.speech.create({
      model: "tts-1",
      voice: "nova",
      input: "Welcome to BelugAPI!",
      response_format: "mp3",
    });

    // The response body is binary audio: buffer it, then write it to disk
    const buffer = Buffer.from(await response.arrayBuffer());
    writeFileSync("output.mp3", buffer);
    console.log("Saved to output.mp3");
cURL:

    curl https://api.belugapi.com/v1/audio/speech \
      -H "Authorization: Bearer bapi_your_key_here" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "tts-1",
        "voice": "nova",
        "input": "Welcome to BelugAPI!",
        "response_format": "mp3"
      }' \
      --output output.mp3
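
Because `input` is capped at 4096 characters, longer text has to be split across several requests. The sketch below shows one possible approach, splitting at sentence ends where it can. `split_for_tts` is a hypothetical helper, and the sentence-boundary regex is a simplification that will not handle every abbreviation or language.

```python
import re

MAX_INPUT_CHARS = 4096  # limit stated for the `input` parameter


def split_for_tts(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-cut into pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


long_text = "This sentence is repeated many times. " * 300
chunks = split_for_tts(long_text)
print(len(chunks), max(len(c) for c in chunks))
```

Each chunk can then be sent as its own request with the same `model` and `voice`, producing one audio file per chunk.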
Available voices
| Voice | Character |
|---|---|
| alloy | Neutral, balanced. Great for general-purpose use. |
| echo | Male, clear. Well-suited for technical content. |
| fable | Expressive, storytelling. Perfect for narratives. |
| onyx | Deep, authoritative. Ideal for news/announcements. |
| nova | Warm, friendly. Great for customer-facing apps. |
| shimmer | Soft, calm. Suited for meditation / wellbeing apps. |
TTS models
| Model ID | Name | Provider | Max input | Formats |
|---|---|---|---|---|
| tts-1 | TTS-1 | OpenAI | 4096 characters | mp3, opus, aac, flac, wav, pcm |
| tts-1-hd | TTS-1 HD | OpenAI | 4096 characters | mp3, opus, aac, flac, wav, pcm |
Speech-to-Text (Transcription)
Transcribe audio files to text using Whisper-1. Send a multipart/form-data request with your audio file.
POST https://api.belugapi.com/v1/audio/transcriptions
Accepts multipart/form-data. Requires the header Authorization: Bearer bapi_…
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | required | Audio file. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm. Max 25 MB. |
| model | string | required | whisper-1 — the only supported transcription model. |
| language | string | optional | ISO 639-1 code (e.g. en, fr, de). Omit to auto-detect the language. |
| prompt | string | optional | Text that guides the model's style or continues a previous audio segment. |
| response_format | string | optional | json (default), text, srt, verbose_json, vtt. |
| temperature | number | optional | Sampling temperature 0–1. Default: 0. |
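
The format and size limits on `file` can be checked before uploading, which avoids sending a large request that is bound to fail. This is a minimal sketch: `check_audio_file` is a hypothetical helper, and it assumes "25 MB" means 25 × 1024 × 1024 bytes, which the table does not specify.

```python
from pathlib import Path

# Extensions copied from the `file` parameter description above.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 25 * 1024 * 1024


def check_audio_file(path):
    """Hypothetical pre-flight check: return None if the file looks OK, else a reason."""
    p = Path(path)
    if not p.is_file():
        return f"{p} does not exist or is not a file"
    extension = p.suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        return f"unsupported format: {extension or '(none)'}"
    size = p.stat().st_size
    if size > MAX_FILE_BYTES:
        return f"file is {size / (1024 * 1024):.1f} MB (max 25 MB)"
    return None


problem = check_audio_file("audio.mp3")
print(problem or "audio.mp3 looks OK to upload")
```

The check looks only at the file extension, not the actual container or codec, so a mislabelled file can still be rejected by the server.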
Examples
Python:

    from openai import OpenAI

    client = OpenAI(
        api_key="bapi_your_key_here",
        base_url="https://api.belugapi.com/v1",
    )

    # Basic transcription
    with open("audio.mp3", "rb") as f:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
        )
    print(transcription.text)

    # With explicit language + verbose JSON
    with open("audio_fr.mp3", "rb") as f:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            language="fr",
            response_format="verbose_json",
        )
    print(result.text, result.duration)
JavaScript (Node.js):

    import { createReadStream } from "fs";
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: "bapi_your_key_here",
      baseURL: "https://api.belugapi.com/v1",
    });

    const transcription = await client.audio.transcriptions.create({
      model: "whisper-1",
      file: createReadStream("audio.mp3"),
    });
    console.log(transcription.text);
cURL:

    curl https://api.belugapi.com/v1/audio/transcriptions \
      -H "Authorization: Bearer bapi_your_key_here" \
      -F "file=@audio.mp3" \
      -F "model=whisper-1"

    # With SRT subtitle output
    curl https://api.belugapi.com/v1/audio/transcriptions \
      -H "Authorization: Bearer bapi_your_key_here" \
      -F "file=@audio.mp3" \
      -F "model=whisper-1" \
      -F "response_format=srt"
Transcription response (json)
    {
      "text": "Welcome to BelugAPI, your gateway to every major AI model at wholesale prices."
    }
Transcription response (verbose_json)
    {
      "task": "transcribe",
      "language": "english",
      "duration": 4.2,
      "text": "Welcome to BelugAPI…",
      "segments": [{
        "id": 0,
        "start": 0.0,
        "end": 4.2,
        "text": " Welcome to BelugAPI…"
      }]
    }
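
The `segments` array carries per-segment start and end times in seconds, which is useful for captions or for jumping to a position in the audio. As an illustration, the sketch below turns a verbose_json payload, parsed into a plain dict, into timestamped lines. `format_timestamp` and `segments_to_lines` are hypothetical helper names, and the sample payload is the one shown above.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_lines(verbose):
    """Turn a verbose_json payload (as a dict) into '[start -> end] text' lines."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in verbose.get("segments", [])
    ]


# Sample payload taken from the verbose_json response shown above.
payload = {
    "task": "transcribe",
    "language": "english",
    "duration": 4.2,
    "text": "Welcome to BelugAPI…",
    "segments": [{"id": 0, "start": 0.0, "end": 4.2, "text": " Welcome to BelugAPI…"}],
}

for line in segments_to_lines(payload):
    print(line)  # prints "[00:00:00.000 -> 00:00:04.200] Welcome to BelugAPI…"
```

Segment text starts with a leading space in the sample response, so the helper strips it. When using an SDK, the segments may arrive as objects rather than dicts, in which case attribute access replaces the key lookups.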
STT models
| Model ID | Name | Provider | Max file | Languages |
|---|---|---|---|---|
| whisper-1 | Whisper-1 | OpenAI | 25 MB | 38 languages |
Supported languages (Whisper-1)
Whisper-1 auto-detects the spoken language by default. To set it explicitly, pass an ISO 639-1 code in the language parameter.
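| Code | Language |
|---|---|
| en | English |
| zh | Chinese |
| es | Spanish |
| fr | French |
| de | German |
| ja | Japanese |
| ko | Korean |
| ar | Arabic |
| hi | Hindi |
| pt | Portuguese |
| ru | Russian |
| it | Italian |
| nl | Dutch |
| pl | Polish |
| tr | Turkish |
| vi | Vietnamese |
| th | Thai |
| id | Indonesian |
| sv | Swedish |
| da | Danish |
| fi | Finnish |
| el | Greek |
| he | Hebrew |
| fa | Persian |
| ro | Romanian |
| hu | Hungarian |
| cs | Czech |

+10 more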