API Documentation

Speech-to-text API for audio and video transcription

Build speech-to-text and video transcription into your app. Upload files, import public URLs, request speaker labels and timestamps, receive webhooks, and export TXT, DOCX, SRT, VTT, and PDF transcripts from one REST API.

Base URL

https://api.instanttranscriber.com/v1

Authentication

All endpoints require a bearer API key created in account settings.

Authorization: Bearer it_live_<key_id>_<secret>

Processing

Jobs typically process at about 10x real-time speed, so a 60-minute file usually finishes transcribing in roughly 6 minutes. Up to 10 jobs can run in parallel.

API Use Cases

Speech-to-text API and video transcription API examples

Speech-to-text API

Send audio from meetings, interviews, calls, podcasts, voice notes, or recorded support sessions and receive transcript text, segments, speaker labels, timestamps, summaries, and export links.

Video transcription API

Upload MP4, MOV, WebM, MKV, and other supported video files when they contain audio. Export captions as SRT or VTT, or send DOCX/PDF transcripts to review workflows.

Production transcript workflow

Use webhooks for completion, quota endpoints for usage controls, and API keys from account settings so your product can create transcriptions without routing users through the dashboard.

Quick Start

Create a transcription

Submit a multipart file upload or a JSON body with a public file_url. Enhanced speaker labels can clean up unclear turns, and optional speaker name detection replaces generic labels only when names are clearly stated in the transcript.

curl -X POST https://api.instanttranscriber.com/v1/transcriptions \
  -H "Authorization: Bearer $INSTANTTRANSCRIBER_API_KEY" \
  -F "file=@audio.mp3" \
  -F "speaker_labels=enhanced" \
  -F "speaker_name_detection=auto" \
  -F "timestamps=false"
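
The same job can also be created from a public URL with a JSON body. The sketch below uses only the Python standard library and builds the request without sending it. It assumes JSON bodies accept the same option names as the form fields in the curl example; the helper name build_url_import_request and the media URL are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.instanttranscriber.com/v1"


def build_url_import_request(api_key: str, file_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST /transcriptions request for a public file_url."""
    # Assumption: JSON bodies accept the same option names as the multipart form fields.
    payload = {"file_url": file_url, **options}
    return urllib.request.Request(
        f"{API_BASE}/transcriptions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_url_import_request(
    "it_live_<key_id>_<secret>",  # placeholder key in the documented format
    "https://example.com/media/interview.mp3",  # hypothetical public URL
    speaker_labels="enhanced",
    speaker_name_detection="auto",
    timestamps=False,
)
# To submit the job: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any quota is spent.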

Request Options

Control transcript output per job

Field	Values	Default
speaker_labels	none, standard, enhanced	standard
speaker_name_detection	off, auto	off
timestamps	true, false	true
summaries	short, detailed, or both	none
language	auto, omitted, or language code	auto
num_speakers	exact speaker count	auto
min_speakers / max_speakers	speaker range	auto
callback_url	webhook URL	unset
callback_secret	HMAC secret	unset
wait	hold request open, max 70 seconds	0

Use language=auto or omit language for automatic language detection. The API supports the same 100 transcription language codes available in the web app.
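
Checking options on the client before submitting can avoid a validation_failed round trip. The following is a minimal sketch based only on the table above; the function name validate_options is hypothetical, and the API's own validation may be stricter or differ in detail. Fields whose exact value format is not spelled out here (summaries, language) are deliberately left unchecked.

```python
# Allowed values copied from the request options table.
ALLOWED_VALUES = {
    "speaker_labels": {"none", "standard", "enhanced"},
    "speaker_name_detection": {"off", "auto"},
}
MAX_WAIT_SECONDS = 70


def validate_options(options: dict) -> list:
    """Return a list of problems found in the options; an empty list means none were found."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        if field in options and options[field] not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    if "timestamps" in options and not isinstance(options["timestamps"], bool):
        problems.append("timestamps must be true or false")
    wait = options.get("wait", 0)
    if isinstance(wait, bool) or not isinstance(wait, int) or not 0 <= wait <= MAX_WAIT_SECONDS:
        problems.append(f"wait must be an integer from 0 to {MAX_WAIT_SECONDS}")
    low, high = options.get("min_speakers"), options.get("max_speakers")
    if low is not None and high is not None and low > high:
        problems.append("min_speakers must not exceed max_speakers")
    return problems
```

For example, validate_options({"speaker_labels": "enhanced", "wait": 30}) returns an empty list, while a wait of 120 produces one problem.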

Plans And Quota

Included API audio time

Plan	Included API audio time	Max upload size	Max audio duration
Free	1 hour per UTC calendar month	50 MiB	35 minutes
Premium	8 hours per Stripe billing month	3 GB	10 hours
API Plan	100 hours per Stripe billing month	3 GB	10 hours

API usage is billed by audio duration, rounded down to the last completed second. Each job is billed for at least 1 minute. Failed jobs do not count, and dashboard transcriptions do not count against API quota. API Plan overage is billed at $0.49 per audio hour and can be capped through the quota endpoint.
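
The billing rules can be worked through in a few lines. This is an illustrative sketch, not the billing implementation: it assumes the 1 minute minimum is applied after rounding down, and that overage is prorated per second at the hourly rate, which is not stated here. The function names are hypothetical.

```python
import math

MINIMUM_BILLABLE_SECONDS = 60  # 1 minute minimum per job
OVERAGE_USD_PER_HOUR = 0.49    # API Plan overage rate


def billable_seconds(duration_seconds: float) -> int:
    """Round down to whole seconds, then apply the per-job minimum."""
    return max(MINIMUM_BILLABLE_SECONDS, math.floor(duration_seconds))


def overage_cost_usd(used_seconds: int, included_hours: float) -> float:
    """Cost of usage beyond the included hours, assuming per-second proration."""
    overage_seconds = max(0, used_seconds - int(included_hours * 3600))
    return overage_seconds / 3600 * OVERAGE_USD_PER_HOUR


# Three jobs of 12.9 s, 754.6 s, and 3600.2 s bill as 60 + 754 + 3600 seconds.
total_seconds = sum(billable_seconds(d) for d in (12.9, 754.6, 3600.2))

# 102 hours of usage on the API Plan (100 hours included) leaves 2 hours of overage.
overage = overage_cost_usd(102 * 3600, included_hours=100)
```

Under these assumptions the three jobs total 4414 billable seconds, and 2 hours of overage costs about $0.98.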

Upload And URL Imports

Supported media and remote file rules

The API accepts common audio and video containers/codecs that ffmpeg can probe and decode, including mp3, m4a, wav, flac, ogg, opus, aiff, mp4, mov, mkv, webm, and 3gp. Video files are accepted when they contain an audio stream.

URL imports must be public http:// or https:// URLs on standard ports 80 or 443. Localhost, private IPs, link-local hosts, authenticated URLs, and YouTube-family URLs are rejected.
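
A client-side pre-check can catch most of these rejections before a request is sent. The sketch below is an approximation under stated assumptions: the YouTube host list is illustrative because the exact "YouTube-family" set is not documented here, only credentials embedded in the URL are detected, and hostnames are not resolved, so a public name that points at a private address still passes. The function name precheck_file_url is hypothetical.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit

# Assumption: illustrative host list; the API's actual list may differ.
YOUTUBE_HOSTS = ("youtube.com", "youtu.be", "youtube-nocookie.com")


def precheck_file_url(url: str) -> Optional[str]:
    """Return a reason the URL would likely be rejected, or None if it passes these checks."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return "malformed URL"
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https"):
        return "scheme must be http or https"
    if not host:
        return "missing host"
    if parts.username or parts.password:
        return "credentials in URL"
    if port not in (None, 80, 443):
        return "non-standard port"
    if host == "localhost" or host.endswith(".localhost"):
        return "localhost"
    if any(host == h or host.endswith("." + h) for h in YOUTUBE_HOSTS):
        return "YouTube-family URL"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return None  # a hostname rather than an IP literal; DNS is not checked here
    if address.is_loopback or address.is_private or address.is_link_local:
        return "non-public IP address"
    return None
```

The server remains the authority: a URL that passes this check can still fail with remote_fetch_failed.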

Remote download model

InstantTranscriber downloads the remote file server-side before validation, transcoding, queueing, and billing checks. Workers do not stream directly from your URL.

Endpoints

API surface

POST /v1/transcriptions

Create a transcription job from a file upload or public file_url.

GET /v1/transcriptions/{id}/status

Poll queued, transcribing, succeeded, post_processing, completed, or failed status.

GET /v1/transcriptions/{id}

Fetch the completed transcript, segments, language, and requested summaries.

GET /v1/transcriptions?limit=25&offset=0

List recent API-created jobs and quota counted for them.

GET /v1/quota

Inspect included hours, used hours, remaining quota, reset time, and overage state.

PATCH /v1/quota/overage-cap

Set or remove an API Plan monthly overage cap.

DELETE /v1/transcriptions/{id}

Delete a transcript and make future requests for that ID return 404.

Webhooks

Receive completion callbacks

Set callback_url to receive a best-effort POST when the top-level job reaches completed or failed. If callback_secret is set, each request includes an X-IT-Signature header containing an HMAC-SHA256 signature computed over the compact JSON payload.

Webhooks are at-least-once notifications. Use the transcript ID plus status as your idempotency key and fetch the transcript by ID for the authoritative result.

{
  "id": "0a2c9f72-0f0b-42f3-a30b-15dc82619500",
  "status": "completed",
  "download_urls": {
    "srt": "https://api.instanttranscriber.com/export/0a2c9f72.srt",
    "vtt": "https://api.instanttranscriber.com/export/0a2c9f72.vtt",
    "docx": "https://api.instanttranscriber.com/export/0a2c9f72.docx",
    "pdf": "https://api.instanttranscriber.com/export/0a2c9f72.pdf"
  }
}
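
A receiver for payloads like the one above might verify the signature and skip redeliveries roughly as follows. This sketch assumes the X-IT-Signature value is a hex-encoded digest of the raw request body exactly as received; the encoding is not specified here, so confirm it before relying on this. The function names and the in-memory seen_keys set are hypothetical; a real service would persist the keys.

```python
import hashlib
import hmac
import json


def verify_signature(raw_body: bytes, signature: str, callback_secret: str) -> bool:
    """Check X-IT-Signature, assuming a hex HMAC-SHA256 digest of the raw request body."""
    expected = hmac.new(callback_secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


def idempotency_key(payload: dict) -> str:
    """Combine transcript ID and status so redelivered webhooks can be skipped."""
    return f"{payload['id']}:{payload['status']}"


seen_keys = set()  # in-memory only; use durable storage in production


def handle_webhook(raw_body: bytes, signature: str, callback_secret: str) -> str:
    """Return 'rejected', 'duplicate', or 'accepted' for one delivery."""
    if not verify_signature(raw_body, signature, callback_secret):
        return "rejected"
    key = idempotency_key(json.loads(raw_body))
    if key in seen_keys:
        return "duplicate"
    seen_keys.add(key)
    # Next step: fetch GET /v1/transcriptions/{id} for the authoritative result.
    return "accepted"
```

Verifying against the raw bytes, before any JSON parsing, matters because re-serializing the payload can change whitespace or key order and invalidate the signature.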

Polling

Job statuses

Status	Meaning
queued	Accepted but not yet started.
transcribing	ASR is running.
succeeded	ASR has finished and post-processing may start shortly.
post_processing	Enhanced speaker labels or summaries are still being generated.
completed	The final result is ready.
failed	The job failed or was rejected.

Poll every 5 seconds for one-off jobs. For many concurrent jobs, use webhook callbacks or per-job backoff. The default status polling budget is 60 status requests per minute per API key and no more than 1 status request per job every 5 seconds.
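
A polling loop for a single job can be sketched as below. Here fetch_status is a hypothetical callable that wraps GET /v1/transcriptions/{id}/status and returns the status string; the max_polls limit is an illustrative safeguard, not an API rule. The sleep function is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], str],
    interval_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches completed or failed, pausing between requests."""
    for attempt in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Intermediate statuses (queued, transcribing, succeeded, post_processing) keep polling.
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job still not finished after {max_polls} polls")
```

With the default 5-second interval, one job stays within the per-job limit and uses 12 of the 60 status requests available per minute, so about five jobs can be polled this way at once before the per-key budget is reached.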

Results And Exports

Fetch transcript text, segments, summaries, and exports

GET /v1/transcriptions/{id} returns 409 result_not_ready until requested post-processing finishes. Completed results include full transcript text, language, speaker segments, and requested short or detailed summaries.

Completed webhook payloads include download URLs for SRT, VTT, DOCX, and PDF exports. Plain transcript text is available from the result endpoint.

Export formats

Use TXT for automation, SRT/VTT for captions, DOCX for review, and PDF for sharing or archive.

Errors

Stable error codes

Canonical errors return JSON with error.code, error.message, optional details, and a request_id. Include the request ID in support requests.

HTTP	error.code	Retry guidance
400	validation_failed	Fix invalid request body, query string, or option values.
400	unsupported_format	Retry with a supported audio or video file.
400	remote_fetch_failed	Fix the public URL or upload the file directly.
402	quota_exceeded	Upgrade, increase the cap, or wait for the quota reset.
402	requires_upgrade	Upgrade before starting a free-plan job above plan duration limits.
403	auth_failed	Send a valid API key.
404	not_found	Check the transcript ID; the object is missing, deleted, or inaccessible.
409	result_not_ready	Poll status or wait for a webhook before fetching the result.
413	payload_too_large	Use a smaller file or a higher plan.
422	file_too_long	Trim or split the file.
429	rate_limited	Honor the Retry-After header.
500	internal_error	Retry with backoff and include request_id if it persists.
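
The retry guidance in the table can be turned into a small decision helper. This is a sketch under assumptions: it treats Retry-After as a number of seconds rather than an HTTP date, waits one polling interval for result_not_ready as a simplification, and uses an illustrative backoff base and cap. The function name retry_delay is hypothetical.

```python
from typing import Optional

# Codes from the table where repeating the same request can eventually succeed.
RETRYABLE_CODES = {"rate_limited", "internal_error", "result_not_ready"}


def retry_delay(error_code: str, attempt: int, retry_after: Optional[str] = None) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the request must be changed first."""
    if error_code not in RETRYABLE_CODES:
        return None
    if error_code == "rate_limited" and retry_after is not None:
        # Assumption: Retry-After is given in seconds rather than as an HTTP date.
        return float(retry_after)
    if error_code == "result_not_ready":
        return 5.0  # one status polling interval
    # Exponential backoff, capped at 60 seconds (base and cap are illustrative choices).
    return min(60.0, 2.0 ** attempt)
```

Codes such as validation_failed or quota_exceeded return None because resending the identical request cannot succeed until the request, plan, or cap changes.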

Implementation Guides

Explore transcription API workflows

Speech-to-Text API

Add speech-to-text to your app with file uploads, URL imports, speaker labels, timestamps, webhooks, and transcript exports.

Read the Speech-to-Text API guide

Video Transcription API

Transcribe MP4, MOV, WebM, and other video files through an API. Generate text, subtitles, speaker labels, and export files.

Read the Video Transcription API guide

Transcription Webhooks

Receive webhook callbacks when transcription jobs complete or fail. Use signed callbacks, polling fallback, and transcript export links.

Read the Transcription Webhooks guide

SRT and VTT Export API

Generate SRT and VTT caption files from audio or video through the InstantTranscriber API, with TXT, DOCX, and PDF exports too.

Read the SRT and VTT Export API guide

Whisper API Alternative

Compare InstantTranscriber API as a Whisper API alternative with exports, speaker labels, webhooks, dashboard usage, and quota controls.

Read the Whisper API Alternative guide

Start building with the transcription API

Create an account, generate an API key in account settings, and submit your first audio-to-text job.
