AWS Textract

Extract text, tables, and forms from documents

Overview

Integrate AWS Textract into your workflow to extract text, tables, forms, and key-value pairs from documents. The block supports synchronous processing of single-page documents and asynchronous processing of multi-page documents stored in S3.

Setup

Add the AWS Textract block to your workflow
Enter your AWS credentials (Access Key ID and Secret Access Key)
Select the processing mode
Upload a document (single-page mode) or provide an S3 URI (multi-page mode)

Configuration

Parameter	Type	Required	Description
`processingMode`	dropdown	Yes	`Single-page` (sync) or `Multi-page` (async via S3)
`document`	file/URL	Conditional	Required in single-page mode (JPEG, PNG, or 1-page PDF; max 10 MB)
`s3Uri`	string	Conditional	Required in multi-page mode (`s3://bucket/key` format)
`region`	string	Yes	AWS region (e.g., `us-east-1`)
`accessKeyId`	string	Yes	AWS Access Key ID
`secretAccessKey`	string	Yes	AWS Secret Access Key
`extractTables`	boolean	No	Extract tables from documents
`extractForms`	boolean	No	Extract form key-value pairs
`detectSignatures`	boolean	No	Detect signatures
`analyzeLayout`	boolean	No	Analyze document layout
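
The conditional rules in the table above can be checked before the block runs. The following is a minimal sketch, not the block's actual validation logic: `validate_config` and `feature_types` are hypothetical helpers, and the mapping from the boolean options to the Textract feature types `TABLES`, `FORMS`, `SIGNATURES`, and `LAYOUT` is an assumption about how the block translates its options.

```python
# Assumed mapping from the block's boolean options to Textract FeatureTypes values.
FEATURE_FLAGS = {
    "extractTables": "TABLES",
    "extractForms": "FORMS",
    "detectSignatures": "SIGNATURES",
    "analyzeLayout": "LAYOUT",
}

ALWAYS_REQUIRED = ("processingMode", "region", "accessKeyId", "secretAccessKey")


def validate_config(config: dict) -> list:
    """Return a list of problems found in a configuration dict (empty if none)."""
    problems = []
    for name in ALWAYS_REQUIRED:
        if not config.get(name):
            problems.append(f"missing required parameter: {name}")
    mode = config.get("processingMode")
    if mode == "Single-page" and not config.get("document"):
        problems.append("Single-page mode needs `document`")
    if mode == "Multi-page" and not str(config.get("s3Uri", "")).startswith("s3://"):
        problems.append("Multi-page mode needs `s3Uri` in s3://bucket/key format")
    return problems


def feature_types(config: dict) -> list:
    """Collect the feature types for every boolean option that is switched on."""
    return [feature for flag, feature in FEATURE_FLAGS.items() if config.get(flag)]


# Placeholder values only; never hard-code real credentials.
example_config = {
    "processingMode": "Multi-page",
    "s3Uri": "s3://example-bucket/invoices/batch-01.pdf",
    "region": "us-east-1",
    "accessKeyId": "EXAMPLE_KEY_ID",
    "secretAccessKey": "EXAMPLE_SECRET",
    "extractTables": True,
    "extractForms": True,
}

print(validate_config(example_config))
print(feature_types(example_config))
```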

Tools

`textract_parser`

Extracts text, tables, and forms from documents using AWS Textract.

Output

Parameter	Type	Description
`blocks`	json	Array of detected blocks (PAGE, LINE, WORD, TABLE, CELL, KEY_VALUE_SET, etc.)
`documentMetadata`	json	Document metadata containing page count
`modelVersion`	string	Textract model version used
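
As an illustration of working with the `blocks` output, the sketch below groups the text of `LINE` blocks by page. It assumes that each block keeps the field names of the Textract API response (`BlockType`, `Text`, `Page`); `lines_by_page` and the sample data are hypothetical.

```python
from collections import defaultdict


def lines_by_page(blocks: list) -> dict:
    """Group the text of LINE blocks by page number, keeping their order."""
    pages = defaultdict(list)
    for block in blocks:
        if block.get("BlockType") == "LINE":
            # Fall back to page 1 when the block carries no page number.
            pages[block.get("Page", 1)].append(block.get("Text", ""))
    return dict(pages)


# Hand-written sample in the assumed shape, not real Textract output.
sample_blocks = [
    {"BlockType": "PAGE", "Id": "p1", "Page": 1},
    {"BlockType": "LINE", "Id": "l1", "Page": 1, "Text": "Invoice 1001"},
    {"BlockType": "WORD", "Id": "w1", "Page": 1, "Text": "Invoice"},
    {"BlockType": "WORD", "Id": "w2", "Page": 1, "Text": "1001"},
    {"BlockType": "LINE", "Id": "l2", "Page": 1, "Text": "Total: 42.00"},
]

print(lines_by_page(sample_blocks))
```

Filtering on `LINE` rather than `WORD` avoids emitting the same text twice, since each line's words also appear as separate blocks.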

Processing Modes

Single-page (Synchronous)

Supports JPEG, PNG, and single-page PDF
Maximum file size: 10 MB
Upload directly or provide a URL
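
The single-page limits above can be checked locally before a document is submitted. This is a rough sketch with a hypothetical helper, `check_single_page_input`. It assumes that "10 MB" means 10 × 1024 × 1024 bytes and that the file type can be judged from the extension; it does not verify that a PDF has only one page.

```python
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" is interpreted as 10 MiB
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".pdf"}


def check_single_page_input(filename: str, size_bytes: int) -> list:
    """Return a list of problems for a single-page upload (empty if none)."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems


print(check_single_page_input("scan.PNG", 2_000_000))
print(check_single_page_input("report.tiff", 11 * 1024 * 1024))
```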

Multi-page (Asynchronous)

Supports multi-page PDF and TIFF
Files must be in S3 (provide s3://bucket/key URI)
Starts an asynchronous job and waits for it to complete before returning results
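
To make the start-then-wait flow concrete, here is a minimal sketch of how an asynchronous Textract job can be driven. It is not the block's implementation. `parse_s3_uri` and `analyze_multi_page` are hypothetical helpers, the polling interval and timeout are arbitrary, and `client` is assumed to be a boto3 Textract client (or any object exposing the same `start_document_analysis` and `get_document_analysis` methods).

```python
import time


def parse_s3_uri(uri: str) -> tuple:
    """Split s3://bucket/key into (bucket, key)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI must look like s3://bucket/key: {uri}")
    return bucket, key


def analyze_multi_page(client, s3_uri, feature_types,
                       poll_seconds=5.0, timeout_seconds=600.0):
    """Start an analysis job, poll until it finishes, and return all blocks."""
    bucket, key = parse_s3_uri(s3_uri)
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=feature_types,
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state or the deadline passes.
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = client.get_document_analysis(JobId=job_id)
        status = result["JobStatus"]
        if status != "IN_PROGRESS":
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Textract job {job_id} did not finish in time")
        time.sleep(poll_seconds)

    # This sketch treats anything other than SUCCEEDED as a failure.
    if status != "SUCCEEDED":
        message = result.get("StatusMessage", "")
        raise RuntimeError(f"Textract job {job_id} ended as {status}: {message}")

    # Results are paginated; follow NextToken until every page is collected.
    blocks = list(result.get("Blocks", []))
    while result.get("NextToken"):
        result = client.get_document_analysis(
            JobId=job_id, NextToken=result["NextToken"]
        )
        blocks.extend(result.get("Blocks", []))
    return blocks
```

With boto3 installed, the client would typically be created with `boto3.client("textract", region_name="us-east-1")`. `start_document_analysis` expects at least one feature type; for plain text extraction, the `start_document_text_detection` and `get_document_text_detection` pair follows the same pattern.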

Notes

Category: tools
Type: textract
Requires AWS IAM credentials with Textract permissions
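
For orientation, the sketch below builds an IAM policy document that grants the Textract calls such a block is likely to make, plus read access to input documents in S3. Which actions the block actually uses is an assumption, and `example-bucket` is a placeholder; trim the action list to what your setup needs.

```python
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TextractCalls",
            "Effect": "Allow",
            # Assumed set of actions covering sync and async processing.
            "Action": [
                "textract:DetectDocumentText",
                "textract:AnalyzeDocument",
                "textract:StartDocumentTextDetection",
                "textract:GetDocumentTextDetection",
                "textract:StartDocumentAnalysis",
                "textract:GetDocumentAnalysis",
            ],
            "Resource": "*",
        },
        {
            "Sid": "ReadInputDocuments",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            # Placeholder bucket; replace with the bucket that holds your documents.
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```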
