Fastest model. Best for short files under 30 minutes that do not require accurate timestamps. Supports translation between 15 languages.
Most reliable model. Best for both transcription and captioning workflows.
Most versatile model. Best for recordings made in noisy environments or featuring dialects and accents, singing voices, or Chinese/Cantonese speech. Transcribes verbatim, including disfluencies and filler words.
Most accurate model. Best for accuracy-critical use cases.
Please do not include personally identifiable information in the job name or file names.
If you select a language other than the source language, the output will be translated into the selected language.