MyBotBox

AWS Textract

Extract text, tables, and forms from documents

Overview

Integrate AWS Textract into your workflow to extract text, tables, and form data (key-value pairs) from documents. The block supports two modes: synchronous processing for single-page documents, and asynchronous processing for multi-page documents stored in Amazon S3.

Setup

  1. Add the AWS Textract block to your workflow
  2. Enter your AWS credentials (Access Key ID and Secret Access Key) and the AWS region
  3. Select the processing mode
  4. Upload a document (single-page mode) or provide an S3 URI (multi-page mode)
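If you would rather not paste keys into the block by hand, you can keep them in the standard AWS environment variables and copy them from there. This is a minimal sketch that assumes AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and either AWS_REGION or AWS_DEFAULT_REGION are set. The function name and the us-east-1 fallback are illustrative choices and are not part of MyBotBox.

```python
import os


def load_aws_credentials(default_region: str = "us-east-1") -> dict:
    """Collect the credential fields the AWS Textract block asks for.

    Reads the standard AWS SDK/CLI environment variables. The returned keys
    mirror the block's configuration parameter names (illustrative helper).
    """
    access_key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        raise RuntimeError("AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set")

    # AWS_REGION is read by some SDKs, AWS_DEFAULT_REGION by the AWS CLI;
    # the final fallback is an arbitrary default chosen for this sketch.
    region = (
        os.environ.get("AWS_REGION")
        or os.environ.get("AWS_DEFAULT_REGION")
        or default_region
    )
    return {
        "accessKeyId": access_key,
        "secretAccessKey": secret_key,
        "region": region,
    }
```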

Configuration

| Parameter | Type | Required | Description |
|---|---|---|---|
| processingMode | dropdown | Yes | Single-page (sync) or Multi-page (async via S3) |
| document | file/URL | Conditional | Required in single-page mode. JPEG, PNG, or single-page PDF; max 10 MB |
| s3Uri | string | Conditional | Required in multi-page mode, in s3://bucket/key format |
| region | string | Yes | AWS region (e.g., us-east-1) |
| accessKeyId | string | Yes | AWS Access Key ID |
| secretAccessKey | string | Yes | AWS Secret Access Key |
| extractTables | boolean | No | Extract tables from documents |
| extractForms | boolean | No | Extract form key-value pairs |
| detectSignatures | boolean | No | Detect signatures |
| analyzeLayout | boolean | No | Analyze document layout |
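The boolean options line up closely with Textract's FeatureTypes values (TABLES, FORMS, SIGNATURES, LAYOUT). The sketch below checks the required and conditional parameters and then builds that list. The flag-to-feature mapping, the function name, and the processingMode values "single-page" and "multi-page" are all assumptions made for illustration, because the dropdown's internal values are not documented here.

```python
from typing import Any

# Assumed mapping from the block's boolean options to Textract FeatureTypes.
FEATURE_FLAGS = {
    "extractTables": "TABLES",
    "extractForms": "FORMS",
    "detectSignatures": "SIGNATURES",
    "analyzeLayout": "LAYOUT",
}

# Hypothetical internal values for the processingMode dropdown.
SINGLE_PAGE = "single-page"
MULTI_PAGE = "multi-page"


def validate_config(cfg: dict[str, Any]) -> list[str]:
    """Check required/conditional parameters and return the FeatureTypes list."""
    for key in ("processingMode", "region", "accessKeyId", "secretAccessKey"):
        if not cfg.get(key):
            raise ValueError(f"missing required parameter: {key}")

    mode = cfg["processingMode"]
    if mode not in (SINGLE_PAGE, MULTI_PAGE):
        raise ValueError(f"unknown processingMode: {mode!r}")
    if mode == SINGLE_PAGE and not cfg.get("document"):
        raise ValueError("single-page mode requires 'document'")
    if mode == MULTI_PAGE and not str(cfg.get("s3Uri", "")).startswith("s3://"):
        raise ValueError("multi-page mode requires 's3Uri' in s3://bucket/key format")

    # Iterate the mapping in its defined order so the list is reproducible.
    return [feature for flag, feature in FEATURE_FLAGS.items() if cfg.get(flag)]
```

If the list comes back empty, a plain text request (Textract's DetectDocumentText or StartDocumentTextDetection) is the natural fit, because AnalyzeDocument and StartDocumentAnalysis expect at least one feature type. The page above does not say whether the block makes that switch itself.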

Tools

textract_parser

Extracts text, tables, and forms from documents using AWS Textract.
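Textract returns a flat list of blocks that refer to one another through Relationships entries of type CHILD. The sketch below shows one way to recover plain text lines and table grids from that list. It assumes the blocks use Textract's standard field names (BlockType, Id, Text, Page, RowIndex, ColumnIndex, Relationships). The helper names are hypothetical and are not part of textract_parser.

```python
from collections import defaultdict


def child_ids(block, rel_type="CHILD"):
    """Return the Ids listed under one relationship type (empty if none)."""
    ids = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == rel_type:
            ids.extend(rel["Ids"])
    return ids


def lines_by_page(blocks):
    """Group LINE text by page number, in the order Textract returns it."""
    pages = defaultdict(list)
    for b in blocks:
        if b["BlockType"] == "LINE":
            # Single-page responses may omit "Page"; treat those as page 1.
            pages[b.get("Page", 1)].append(b["Text"])
    return dict(pages)


def cell_text(cell, by_id):
    """Join the WORD children of a CELL block with spaces."""
    words = [by_id[i]["Text"] for i in child_ids(cell) if by_id[i]["BlockType"] == "WORD"]
    return " ".join(words)


def tables(blocks):
    """Rebuild each TABLE block as a list of rows of cell strings."""
    by_id = {b["Id"]: b for b in blocks}
    result = []
    for b in blocks:
        if b["BlockType"] != "TABLE":
            continue
        cells = [by_id[i] for i in child_ids(b) if by_id[i]["BlockType"] == "CELL"]
        n_rows = max((c["RowIndex"] for c in cells), default=0)
        n_cols = max((c["ColumnIndex"] for c in cells), default=0)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for c in cells:
            # RowIndex and ColumnIndex are 1-based in Textract output.
            grid[c["RowIndex"] - 1][c["ColumnIndex"] - 1] = cell_text(c, by_id)
        result.append(grid)
    return result
```

This deliberately ignores merged cells, table titles, and footers, which newer Textract responses describe with additional block types such as MERGED_CELL.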

Output

| Parameter | Type | Description |
|---|---|---|
| blocks | json | Array of detected blocks (PAGE, LINE, WORD, TABLE, CELL, KEY_VALUE_SET, etc.) |
| documentMetadata | json | Document metadata containing the page count |
| modelVersion | string | Textract model version used |
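Form fields come back as pairs of KEY_VALUE_SET blocks: a block whose EntityTypes contains KEY points to its value through a VALUE relationship, and each side points to its words (or checkbox SELECTION_ELEMENT) through CHILD. The sketch below pairs them and summarizes the output. It assumes the output fields are named as in the table above and that documentMetadata carries Textract's "Pages" key. Form pairs should only appear when extractForms is enabled. The helper names are hypothetical.

```python
def _ids(block, rel_type):
    """Ids listed under one relationship type of a block."""
    return [i for rel in block.get("Relationships", []) if rel["Type"] == rel_type for i in rel["Ids"]]


def _text(block, by_id):
    """Concatenate WORD text; render checkboxes as their SelectionStatus."""
    parts = []
    for i in _ids(block, "CHILD"):
        child = by_id[i]
        if child["BlockType"] == "WORD":
            parts.append(child["Text"])
        elif child["BlockType"] == "SELECTION_ELEMENT":
            parts.append(child["SelectionStatus"])
    return " ".join(parts)


def key_values(blocks):
    """Pair each KEY with its VALUE; a repeated key keeps the last value seen."""
    by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for b in blocks:
        if b["BlockType"] == "KEY_VALUE_SET" and "KEY" in b.get("EntityTypes", []):
            key = _text(b, by_id)
            value = " ".join(_text(by_id[v], by_id) for v in _ids(b, "VALUE"))
            pairs[key] = value
    return pairs


def summarize(output):
    """Page count, model version, and form fields from the block's output."""
    return {
        "pages": output["documentMetadata"]["Pages"],
        "modelVersion": output["modelVersion"],
        "fields": key_values(output["blocks"]),
    }
```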

Processing Modes

Single-page (Synchronous)

  • Supports JPEG, PNG, and single-page PDF
  • Maximum file size: 10MB
  • Upload directly or provide a URL
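Before uploading, you can reject files that single-page mode cannot accept. This sketch checks the file's leading bytes ("magic numbers") instead of trusting its extension. It treats "10MB" as 10 * 1024 * 1024 bytes, which is an assumption; the block may count differently. It does not count PDF pages, so a multi-page PDF would still pass. The names are hypothetical.

```python
MAX_SYNC_BYTES = 10 * 1024 * 1024  # "10MB" read as 10 MiB (assumption)

# Leading bytes of the formats single-page mode accepts.
MAGIC_BYTES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
}


def check_single_page_input(data: bytes) -> str:
    """Return the detected format, or raise ValueError if sync mode can't take it."""
    if len(data) > MAX_SYNC_BYTES:
        raise ValueError(
            f"file is {len(data)} bytes; single-page mode allows at most {MAX_SYNC_BYTES}"
        )
    for magic, fmt in MAGIC_BYTES.items():
        if data.startswith(magic):
            # PDF page count is not checked; multi-page PDFs belong in multi-page mode.
            return fmt
    raise ValueError("unsupported format: expected JPEG, PNG, or PDF")
```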

Multi-page (Asynchronous)

  • Supports multi-page PDF and TIFF
  • Files must be in S3 (provide s3://bucket/key URI)
  • Processes asynchronously and waits for results
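The asynchronous flow has three steps: start a job on the S3 object, poll until it is no longer IN_PROGRESS, then follow NextToken to collect every page of results. The sketch below implements that loop against any client that exposes boto3-style start_document_analysis and get_document_analysis methods. The helper names, polling interval, and timeout are hypothetical, and the returned dict is only shaped to resemble the Output table.

```python
import time


def parse_s3_uri(uri: str) -> dict:
    """Turn s3://bucket/key into Textract's DocumentLocation structure."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected s3://bucket/key, got {uri!r}")
    # Split manually so keys containing '#' or '?' are kept intact.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"expected s3://bucket/key, got {uri!r}")
    return {"S3Object": {"Bucket": bucket, "Name": key}}


def run_async_analysis(client, s3_uri, feature_types, poll_seconds=5.0, timeout=600.0):
    """Start a Textract analysis job on an S3 document and gather all blocks."""
    if not feature_types:
        raise ValueError("StartDocumentAnalysis needs at least one feature type")
    job = client.start_document_analysis(
        DocumentLocation=parse_s3_uri(s3_uri),
        FeatureTypes=feature_types,
    )
    job_id = job["JobId"]

    # Poll until the job finishes (SUCCEEDED, PARTIAL_SUCCESS, or FAILED).
    deadline = time.monotonic() + timeout
    while True:
        resp = client.get_document_analysis(JobId=job_id)
        if resp["JobStatus"] != "IN_PROGRESS":
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        time.sleep(poll_seconds)

    if resp["JobStatus"] == "FAILED":
        raise RuntimeError(resp.get("StatusMessage", f"job {job_id} failed"))

    # Results are paginated; follow NextToken until it is absent.
    first = resp
    blocks = list(resp.get("Blocks", []))
    while resp.get("NextToken"):
        resp = client.get_document_analysis(JobId=job_id, NextToken=resp["NextToken"])
        blocks.extend(resp.get("Blocks", []))

    return {
        "blocks": blocks,
        "documentMetadata": first.get("DocumentMetadata", {}),
        "modelVersion": first.get("AnalyzeDocumentModelVersion", ""),
    }
```

With boto3, a suitable client can be created with boto3.client("textract", region_name=..., aws_access_key_id=..., aws_secret_access_key=...). Polling keeps the sketch simple; for long jobs, Textract can instead publish completion to an Amazon SNS topic through the NotificationChannel parameter, which avoids repeated status calls.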

Notes

  • Category: tools
  • Type: textract
  • Requires AWS IAM credentials with Textract permissions
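A possible starting point for those permissions is shown below as a Python dict serialized to an IAM policy document. The action list covers both the synchronous and asynchronous Textract operations. The s3:GetObject statement reflects the assumption that the credentials used for multi-page mode must be able to read the input object from your bucket. The bucket name is a placeholder, and you may want to trim the actions to the modes you actually use.

```python
import json

# Textract actions behind single-page (sync) and multi-page (async) processing.
TEXTRACT_ACTIONS = [
    "textract:AnalyzeDocument",
    "textract:DetectDocumentText",
    "textract:StartDocumentAnalysis",
    "textract:GetDocumentAnalysis",
    "textract:StartDocumentTextDetection",
    "textract:GetDocumentTextDetection",
]


def textract_policy(bucket: str) -> str:
    """Return an IAM policy document (JSON) for the block's credentials."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": TEXTRACT_ACTIONS, "Resource": "*"},
            {
                # Read access to input documents for S3-based processing.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }
    return json.dumps(policy, indent=2)
```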