MyBotBox

Speech-to-Text

Convert speech to text using AI

Overview

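Transcribe audio and video files to text with one of five providers: OpenAI Whisper, Deepgram, ElevenLabs, AssemblyAI, or Google Gemini. The block supports multiple languages and timestamps; speaker diarization is available with Deepgram and AssemblyAI.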

Setup

  1. Add the Speech-to-Text block to your workflow
  2. Select your preferred provider
  3. Enter the provider's API key
  4. Upload an audio/video file or provide a URL

Providers

| Provider | Models | Features |
|---|---|---|
| OpenAI Whisper | whisper-1 | Translation to English |
| Deepgram | nova-3, nova-2, enhanced, base | Speaker diarization |
| ElevenLabs | scribe_v1 | High accuracy |
| AssemblyAI | best, nano | Sentiment analysis, entity detection, PII redaction, summarization |
| Google Gemini | gemini-2.0-flash, gemini-2.5-pro | Large file support |
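
The provider and model pairing listed here can be captured as a small lookup, which may be useful for checking a configuration before a workflow runs. This is only a sketch: the provider keys (`openai-whisper`, `deepgram`, and so on) and the `is_valid_model` helper are hypothetical names, not MyBotBox identifiers. Only the model names come from the provider list.

```python
from typing import Dict, List

# Models per provider, copied from the Providers list.
# The dictionary keys are illustrative labels, not official MyBotBox identifiers.
PROVIDER_MODELS: Dict[str, List[str]] = {
    "openai-whisper": ["whisper-1"],
    "deepgram": ["nova-3", "nova-2", "enhanced", "base"],
    "elevenlabs": ["scribe_v1"],
    "assemblyai": ["best", "nano"],
    "google-gemini": ["gemini-2.0-flash", "gemini-2.5-pro"],
}


def is_valid_model(provider: str, model: str) -> bool:
    """Return True if the model is listed for the given provider."""
    return model in PROVIDER_MODELS.get(provider, [])
```

An unknown provider simply yields `False`, so the helper can be called on unvalidated input.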

Configuration

| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | dropdown | Yes | STT provider |
| model | dropdown | Yes | Provider-specific model |
| apiKey | string | Yes | Provider API key |
| audioFile | file | Yes | Audio/video file to transcribe |
| audioUrl | string | No | Publicly accessible audio/video URL |
| language | dropdown | Yes | Language code or auto-detect |
| timestamps | dropdown | Yes | Timestamp granularity: none, sentence, word |
| diarization | boolean | No | Speaker diarization (Deepgram/AssemblyAI) |
| translateToEnglish | boolean | No | Translate to English (Whisper only) |
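
Several of these parameters only apply to certain providers, so a pre-flight check can catch inconsistent settings early. The sketch below assumes the configuration is available as a plain dictionary keyed by the parameter names above. The provider labels and the `validate_config` helper are hypothetical. Accepting either `audioFile` or `audioUrl` is also an assumption, based on Setup step 4.

```python
from typing import Any, Dict, List

# Illustrative provider labels; the real dropdown values may differ.
WHISPER = "openai-whisper"
DIARIZATION_PROVIDERS = ("deepgram", "assemblyai")
TIMESTAMP_LEVELS = ("none", "sentence", "word")


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks consistent."""
    problems: List[str] = []

    # Parameters marked as required in the Configuration section.
    for key in ("provider", "model", "apiKey", "language", "timestamps"):
        if not cfg.get(key):
            problems.append(f"{key} is required")

    # Assumption: one audio source is enough (uploaded file or URL).
    if not cfg.get("audioFile") and not cfg.get("audioUrl"):
        problems.append("provide audioFile or audioUrl")

    timestamps = cfg.get("timestamps")
    if timestamps and timestamps not in TIMESTAMP_LEVELS:
        problems.append("timestamps must be none, sentence, or word")

    # Provider-specific switches.
    if cfg.get("diarization") and cfg.get("provider") not in DIARIZATION_PROVIDERS:
        problems.append("diarization is listed for Deepgram/AssemblyAI only")
    if cfg.get("translateToEnglish") and cfg.get("provider") != WHISPER:
        problems.append("translateToEnglish is listed for Whisper only")

    return problems
```

Returning a list of messages rather than raising on the first error lets a caller report every problem at once.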

AssemblyAI-specific Options

| Parameter | Type | Description |
|---|---|---|
| sentiment | boolean | Enable sentiment analysis |
| entityDetection | boolean | Enable entity detection |
| piiRedaction | boolean | Enable PII redaction |
| summarization | boolean | Enable auto-summarization |
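
Three of these flags appear to correspond to AssemblyAI-only fields in the Output section (`sentiment`, `entities`, `summary`). The sketch below encodes that pairing so a workflow can tell which extra fields to expect. The mapping is an inference from the parameter names, not documented behavior, and `expected_outputs` is a hypothetical helper. `piiRedaction` has no dedicated output field listed, so it is left out; it presumably affects the transcript text itself.

```python
from typing import Dict, List

# Assumed pairing between AssemblyAI option flags and output fields.
OPTION_TO_OUTPUT: Dict[str, str] = {
    "sentiment": "sentiment",
    "entityDetection": "entities",
    "summarization": "summary",
}


def expected_outputs(options: Dict[str, bool]) -> List[str]:
    """List the extra output fields implied by the enabled option flags."""
    return [out for opt, out in OPTION_TO_OUTPUT.items() if options.get(opt)]
```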

Output

| Parameter | Type | Description |
|---|---|---|
| transcript | string | Full transcribed text |
| segments | array | Timestamped segments with speaker labels |
| language | string | Detected or specified language |
| duration | number | Audio duration in seconds |
| confidence | number | Confidence score (Deepgram/AssemblyAI/Gemini) |
| sentiment | array | Sentiment results (AssemblyAI only) |
| entities | array | Detected entities (AssemblyAI only) |
| summary | string | Auto-generated summary (AssemblyAI only) |
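
A downstream step will often turn `segments` into a readable, speaker-labelled transcript. The documentation does not specify the shape of a segment, so the sketch below assumes each one is a dictionary with `start` (seconds), `speaker`, and `text` keys. Those key names and both helper functions are hypothetical.

```python
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_segments(segments: List[Dict[str, Any]]) -> str:
    """Render one line per segment: "[HH:MM:SS] speaker: text"."""
    lines = []
    for seg in segments:
        # Assumed keys: "start", "speaker", "text". Speaker may be absent
        # when diarization is off.
        speaker = seg.get("speaker") or "Unknown"
        lines.append(f"[{format_timestamp(seg['start'])}] {speaker}: {seg['text']}")
    return "\n".join(lines)
```

For example, a segment starting at 3725.4 seconds would be stamped `01:02:05`.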

Supported Formats

Audio: MP3, M4A, WAV, WebM, OGG, FLAC, AAC, OPUS

Video: MP4, MOV, AVI, MKV
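
A quick extension check before upload can reject unsupported files early. The sketch below classifies a file name or URL using the two format lists above. `media_kind` is a hypothetical helper, and checking the extension is only a heuristic, since it says nothing about the actual file contents.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extensions taken from the Supported Formats lists.
AUDIO_EXTENSIONS = {"mp3", "m4a", "wav", "webm", "ogg", "flac", "aac", "opus"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv"}


def media_kind(path_or_url: str) -> Optional[str]:
    """Return "audio", "video", or None based on the file extension."""
    # urlparse drops any query string or fragment; plain file names pass through.
    path = urlparse(path_or_url).path
    ext = PurePosixPath(path).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None
```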

Notes

  • Category: tools
  • Type: stt
  • 20+ language options available