Speech-to-Text
Convert speech to text using AI
Overview
Transcribe audio and video files to text using leading AI providers. Supports multiple languages, timestamps, and speaker diarization.
Setup
- Add the Speech-to-Text block to your workflow
- Select your preferred provider
- Enter the provider's API key
- Upload an audio/video file or provide a URL
Providers
| Provider | Models | Features |
|---|---|---|
| OpenAI Whisper | whisper-1 | Translation to English |
| Deepgram | nova-3, nova-2, enhanced, base | Speaker diarization |
| ElevenLabs | scribe_v1 | High accuracy |
| AssemblyAI | best, nano | Sentiment analysis, entity detection, PII redaction, summarization |
| Google Gemini | gemini-2.0-flash, gemini-2.5-pro | Large file support |
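
The provider and model pairing in the table can be checked before a workflow runs. The following is a minimal sketch: the `PROVIDER_MODELS` dictionary and `is_valid_model` helper are hypothetical names for illustration, not part of the block itself, and the provider keys simply reuse the display names from the table.

```python
# Provider and model names are copied from the Providers table.
# PROVIDER_MODELS and is_valid_model are hypothetical helpers, not a documented API.
PROVIDER_MODELS = {
    "OpenAI Whisper": ["whisper-1"],
    "Deepgram": ["nova-3", "nova-2", "enhanced", "base"],
    "ElevenLabs": ["scribe_v1"],
    "AssemblyAI": ["best", "nano"],
    "Google Gemini": ["gemini-2.0-flash", "gemini-2.5-pro"],
}


def is_valid_model(provider: str, model: str) -> bool:
    """Return True if the model is listed for the provider in the table."""
    return model in PROVIDER_MODELS.get(provider, [])
```

For example, `is_valid_model("Deepgram", "nova-3")` is true, while pairing `nano` with ElevenLabs is rejected.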
Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | dropdown | Yes | STT provider |
| model | dropdown | Yes | Provider-specific model |
| apiKey | string | Yes | Provider API key |
| audioFile | file | Yes | Audio/video file to transcribe |
| audioUrl | string | No | Publicly accessible audio/video URL |
| language | dropdown | Yes | Language code or auto-detect |
| timestamps | dropdown | Yes | Timestamp granularity: none, sentence, word |
| diarization | boolean | No | Speaker diarization (Deepgram/AssemblyAI) |
| translateToEnglish | boolean | No | Translate to English (Whisper only) |
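
The constraints in the Configuration table can be expressed as a small validation routine. This is only a sketch under stated assumptions: `validate_config` is a hypothetical helper, the configuration is assumed to be a plain dictionary keyed by the parameter names above, provider values are assumed to match the display names in the Providers table, and either `audioFile` or `audioUrl` is assumed to satisfy the audio input requirement (Setup allows both, although the table marks only `audioFile` as required).

```python
from typing import Any

# Values taken from the Configuration table.
ALLOWED_TIMESTAMPS = {"none", "sentence", "word"}
REQUIRED = ("provider", "model", "apiKey", "language", "timestamps")
# Assumption: provider values match the display names in the Providers table.
DIARIZATION_PROVIDERS = {"Deepgram", "AssemblyAI"}
TRANSLATION_PROVIDERS = {"OpenAI Whisper"}


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a block configuration (hypothetical helper)."""
    errors = []
    for name in REQUIRED:
        if not config.get(name):
            errors.append(f"missing required parameter: {name}")
    # Assumption: a file upload or a URL is enough to supply the audio.
    if not (config.get("audioFile") or config.get("audioUrl")):
        errors.append("provide audioFile or audioUrl")
    timestamps = config.get("timestamps")
    if timestamps and timestamps not in ALLOWED_TIMESTAMPS:
        errors.append(f"unsupported timestamps value: {timestamps}")
    provider = config.get("provider")
    if config.get("diarization") and provider not in DIARIZATION_PROVIDERS:
        errors.append("diarization requires Deepgram or AssemblyAI")
    if config.get("translateToEnglish") and provider not in TRANSLATION_PROVIDERS:
        errors.append("translateToEnglish requires OpenAI Whisper")
    return errors
```

An empty list means the configuration passes these checks. Returning every problem at once, rather than raising on the first, lets a caller show all issues together.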
AssemblyAI-specific Options
| Parameter | Type | Description |
|---|---|---|
| sentiment | boolean | Enable sentiment analysis |
| entityDetection | boolean | Enable entity detection |
| piiRedaction | boolean | Enable PII redaction |
| summarization | boolean | Enable auto-summarization |
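
Because these four flags apply only to AssemblyAI, it can help to collect them separately from the shared parameters. The sketch below assumes a dictionary-style configuration and a provider value of `"AssemblyAI"`; `assemblyai_options` is a hypothetical helper name.

```python
# Option names are copied from the AssemblyAI-specific Options table.
ASSEMBLYAI_OPTIONS = ("sentiment", "entityDetection", "piiRedaction", "summarization")


def assemblyai_options(config: dict) -> dict:
    """Return the enabled AssemblyAI-only flags, or {} for any other provider."""
    # Assumption: the provider value matches the display name "AssemblyAI".
    if config.get("provider") != "AssemblyAI":
        return {}
    return {name: True for name in ASSEMBLYAI_OPTIONS if config.get(name)}
```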
Output
| Parameter | Type | Description |
|---|---|---|
| transcript | string | Full transcribed text |
| segments | array | Timestamped segments with speaker labels |
| language | string | Detected or specified language |
| duration | number | Audio duration in seconds |
| confidence | number | Confidence score (Deepgram/AssemblyAI/Gemini) |
| sentiment | array | Sentiment results (AssemblyAI only) |
| entities | array | Detected entities (AssemblyAI only) |
| summary | string | Auto-generated summary (AssemblyAI only) |
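
A downstream step might turn `segments` into a readable transcript. The table does not specify the shape of a segment, so the keys used here (`start`, `end`, `speaker`, `text`) are assumptions, and `format_segments` is a hypothetical helper; adjust the names to the actual output.

```python
def format_segments(output: dict) -> str:
    """Render timestamped segments as one line each.

    Assumption: each segment is a dict with "start" and "end" in seconds,
    a "text" string, and an optional "speaker" label.
    """
    lines = []
    for seg in output.get("segments", []):
        speaker = seg.get("speaker", "unknown")
        lines.append(f"[{seg['start']:.2f}-{seg['end']:.2f}] {speaker}: {seg['text']}")
    return "\n".join(lines)
```

Segments without a speaker label, for example when diarization is off, fall back to `unknown`.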
Supported Formats
Audio: MP3, M4A, WAV, WebM, OGG, FLAC, AAC, OPUS
Video: MP4, MOV, AVI, MKV
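
A file name or URL can be checked against these formats before upload. This sketch classifies by file extension only, which is an assumption about how support is determined; `media_kind` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extensions copied from the Supported Formats list, lowercased.
AUDIO_EXTENSIONS = {"mp3", "m4a", "wav", "webm", "ogg", "flac", "aac", "opus"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv"}


def media_kind(path_or_url: str) -> Optional[str]:
    """Return "audio", "video", or None based on the file extension."""
    # urlparse drops any query string or fragment; plain file names pass through.
    path = urlparse(path_or_url).path
    ext = PurePosixPath(path).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None
```

The check is case-insensitive, so `meeting.MP4` is treated the same as `meeting.mp4`.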
Notes
- Category: tools
- Type: stt
- 20+ language options available