AWS Textract
Extract text, tables, and forms from documents
Overview
Integrate AWS Textract into your workflow to extract text, tables, forms, and key-value pairs from documents. Supports single-page synchronous processing and multi-page asynchronous processing via S3.
Setup
- Add the AWS Textract block to your workflow
- Enter your AWS credentials (Access Key ID and Secret Access Key)
- Select the processing mode
- Upload a document or provide an S3 URI
Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| processingMode | dropdown | Yes | Single-page (sync) or Multi-page (async via S3) |
| document | file/URL | Conditional | For single-page mode (JPEG, PNG, 1-page PDF; max 10 MB) |
| s3Uri | string | Conditional | For multi-page mode (`s3://bucket/key` format) |
| region | string | Yes | AWS region (e.g., us-east-1) |
| accessKeyId | string | Yes | AWS Access Key ID |
| secretAccessKey | string | Yes | AWS Secret Access Key |
| extractTables | boolean | No | Extract tables from documents |
| extractForms | boolean | No | Extract form key-value pairs |
| detectSignatures | boolean | No | Detect signatures |
| analyzeLayout | boolean | No | Analyze document layout |
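
The four boolean options line up with the `FeatureTypes` values accepted by Textract's AnalyzeDocument API (`TABLES`, `FORMS`, `SIGNATURES`, `LAYOUT`). The sketch below shows one way such a mapping could look. The function name `build_feature_types` is hypothetical, and whether the block performs exactly this mapping internally is an assumption.

```python
from typing import List


def build_feature_types(
    extract_tables: bool = False,
    extract_forms: bool = False,
    detect_signatures: bool = False,
    analyze_layout: bool = False,
) -> List[str]:
    """Translate the block's boolean options into Textract FeatureTypes.

    Assumption: each option corresponds to exactly one FeatureTypes value.
    """
    flags = [
        (extract_tables, "TABLES"),
        (extract_forms, "FORMS"),
        (detect_signatures, "SIGNATURES"),
        (analyze_layout, "LAYOUT"),
    ]
    # Keep only the feature names whose option is enabled, in a stable order.
    return [name for enabled, name in flags if enabled]


if __name__ == "__main__":
    print(build_feature_types(extract_tables=True, extract_forms=True))
```

With every option left off, the list is empty, which would correspond to plain text detection with no additional analysis.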
Tools
textract_parser
Extracts text, tables, and forms from documents using AWS Textract.
Output
| Parameter | Type | Description |
|---|---|---|
| blocks | json | Array of detected blocks (PAGE, LINE, WORD, TABLE, CELL, KEY_VALUE_SET, etc.) |
| documentMetadata | json | Document metadata containing page count |
| modelVersion | string | Textract model version used |
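
As an illustration of consuming the `blocks` output, the sketch below groups the text of `LINE` blocks by page. It assumes each block keeps the field names used in Textract's native response (`BlockType`, `Text`, `Page`). The helper `lines_by_page` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def lines_by_page(blocks: List[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Collect the text of LINE blocks, keyed by page number.

    Assumption: blocks use Textract's native field names. "Page" may be
    absent in some single-page responses, so it defaults to 1 here.
    """
    pages: Dict[int, List[str]] = defaultdict(list)
    for block in blocks:
        if block.get("BlockType") == "LINE":
            pages[block.get("Page", 1)].append(block.get("Text", ""))
    return dict(pages)


# Hypothetical sample shaped like a trimmed-down Textract response.
sample_blocks = [
    {"BlockType": "PAGE", "Page": 1},
    {"BlockType": "LINE", "Page": 1, "Text": "Invoice"},
    {"BlockType": "WORD", "Page": 1, "Text": "Invoice"},
    {"BlockType": "LINE", "Page": 1, "Text": "Total due"},
    {"BlockType": "LINE", "Page": 2, "Text": "Thank you"},
]

if __name__ == "__main__":
    for page, lines in lines_by_page(sample_blocks).items():
        print(page, lines)
```

`WORD` blocks repeat the text already present in their parent `LINE`, so filtering on `LINE` alone avoids duplicated text.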
Processing Modes
Single-page (Synchronous)
- Supports JPEG, PNG, and single-page PDF
- Maximum file size: 10MB
- Upload directly or provide a URL
Multi-page (Asynchronous)
- Supports multi-page PDF and TIFF
- Files must be in S3 (provide an `s3://bucket/key` URI)
- Processes asynchronously and waits for results
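
Multi-page mode expects the `s3://bucket/key` form, so it can help to validate a URI before running the workflow. The following is a minimal sketch; `parse_s3_uri` is a hypothetical helper, not part of the block.

```python
from typing import Tuple


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key).

    Raises ValueError if the scheme, the bucket, or the key is missing.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected an s3:// URI, got {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"expected s3://bucket/key, got {uri!r}")
    return bucket, key


if __name__ == "__main__":
    print(parse_s3_uri("s3://my-bucket/reports/q1.pdf"))
```

Splitting on the first slash only keeps any further slashes inside the object key, which matches how S3 treats keys with path-like prefixes.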
Notes
- Category: `tools`
- Type: `textract`
- Requires AWS IAM credentials with Textract permissions
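
For orientation, the sketch below assembles an IAM policy document that grants the Textract operations commonly involved in synchronous and asynchronous analysis, plus read access to one bucket. Which API actions the block actually calls is an assumption, and `build_textract_policy` and the bucket name are hypothetical. Adjust the action list to whatever your setup requires.

```python
import json
from typing import Any, Dict


def build_textract_policy(bucket: str) -> Dict[str, Any]:
    """Build an example IAM policy for Textract plus S3 read access.

    Assumption: the listed actions cover what the block needs. Verify
    against your own deployment before using this.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "textract:DetectDocumentText",
                    "textract:AnalyzeDocument",
                    "textract:StartDocumentAnalysis",
                    "textract:GetDocumentAnalysis",
                ],
                "Resource": "*",
            },
            {
                # Multi-page mode reads the source document from S3.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_textract_policy("my-documents-bucket"), indent=2))
```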