- Forms with their fields and values
- Tables with their cells
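In Textract responses, tables come back as a flat list of blocks. A `TABLE` block has `CHILD` relationships to its `CELL` blocks, each cell carries a 1-based `RowIndex` and `ColumnIndex`, and each cell in turn points to its `WORD` blocks. The following is a minimal sketch, assuming a response's `Blocks` list has already been parsed into Python dicts, that rebuilds each table as a grid of cell text. The function name `tables_from_blocks` is hypothetical, and the sketch ignores row and column spans (merged cells).

```python
def tables_from_blocks(blocks):
    """Rebuild each TABLE block as a list of rows of cell text (hypothetical helper)."""
    by_id = {block["Id"]: block for block in blocks}

    def child_ids(block):
        # CHILD relationships point from a block to the blocks it contains.
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                yield from rel["Ids"]

    def cell_text(cell):
        # Join only WORD children; other children (e.g. selection elements) are skipped.
        words = [by_id[i]["Text"] for i in child_ids(cell)
                 if by_id[i]["BlockType"] == "WORD"]
        return " ".join(words)

    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = [by_id[i] for i in child_ids(block)
                 if by_id[i]["BlockType"] == "CELL"]
        if not cells:
            tables.append([])
            continue
        n_rows = max(c["RowIndex"] for c in cells)
        n_cols = max(c["ColumnIndex"] for c in cells)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for c in cells:
            # RowIndex and ColumnIndex are 1-based, so shift to 0-based list indices.
            grid[c["RowIndex"] - 1][c["ColumnIndex"] - 1] = cell_text(c)
        tables.append(grid)
    return tables


# Usage with a tiny hand-built response (illustrative data, not real Textract output).
sample = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Total"},
    {"Id": "w2", "BlockType": "WORD", "Text": "42"},
    {"Id": "w3", "BlockType": "WORD", "Text": "EUR"},
]
print(tables_from_blocks(sample))  # [[['Total', '42 EUR']]]
```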
The general prerequisites to use Amazon Textract are documented in Getting Started with Amazon Textract.
See the list of supported AWS regions where Amazon Textract is available.
- Amazon Textract synchronous operations (DetectDocumentText and AnalyzeDocument) support the PNG and JPEG image formats. The maximum image size (JPEG or PNG) is 5 MB.
- Asynchronous operations (StartDocumentTextDetection and StartDocumentAnalysis) also support the PDF file format. The maximum PDF file size is 500 MB, with a limit of 3,000 pages.
- To process PDF documents, we use the asynchronous operations, which read the document from an S3 bucket set up for Intelligence Services and Textract.
- The maximum number of concurrent jobs for all asynchronous operations is 1.
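The limits above effectively decide which Textract operation a document can go to. The following is a minimal sketch of that decision, assuming the file extension identifies the format and that the caller already knows the size and, for PDFs, the page count. The function `choose_textract_operation`, its parameters, and the use of binary megabytes (1 MB = 1,024 × 1,024 bytes) are assumptions for illustration. Images larger than 5 MB are simply rejected here rather than routed elsewhere.

```python
import os

MB = 1024 * 1024  # assumption: limits are expressed in binary megabytes

SYNC_FORMATS = {".png", ".jpg", ".jpeg"}
SYNC_MAX_BYTES = 5 * MB
PDF_MAX_BYTES = 500 * MB
PDF_MAX_PAGES = 3000


def choose_textract_operation(filename, size_bytes, page_count=None, analyze=True):
    """Return a Textract operation name for a document, or raise ValueError.

    analyze=True selects the forms/tables operations; False selects plain text detection.
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext in SYNC_FORMATS:
        if size_bytes > SYNC_MAX_BYTES:
            raise ValueError(f"{filename}: images are limited to 5 MB")
        # Synchronous: the image bytes are sent directly with the request.
        return "AnalyzeDocument" if analyze else "DetectDocumentText"
    if ext == ".pdf":
        if size_bytes > PDF_MAX_BYTES:
            raise ValueError(f"{filename}: PDFs are limited to 500 MB")
        if page_count is not None and page_count > PDF_MAX_PAGES:
            raise ValueError(f"{filename}: PDFs are limited to 3,000 pages")
        # Asynchronous: the PDF is read from the S3 bucket.
        return "StartDocumentAnalysis" if analyze else "StartDocumentTextDetection"
    raise ValueError(f"{filename}: unsupported format {ext or '(none)'}")


print(choose_textract_operation("invoice.png", 2 * MB))
print(choose_textract_operation("contract.pdf", 40 * MB, page_count=12, analyze=False))
```

Because only one asynchronous job can run at a time, a caller that receives a `Start…` operation name would also need to queue PDF jobs rather than submit them in parallel.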
See the AWS site for more details on service limits: Limits in Amazon Textract.
You'll need to create an AWS Identity and Access Management (IAM) role with the correct permissions to control access to AWS services and resources.
The minimum confidence level setting defines how confident Amazon Textract must be in the accuracy of the extracted content. Its default value is 80%.
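Textract reports a `Confidence` score between 0 and 100 on each block it returns. The following is a minimal sketch of how a minimum confidence level could be applied to those blocks, assuming that content below the threshold is simply discarded and that a score equal to the threshold is kept. How Intelligence Services itself treats low-confidence content is not described here, and the function name `confident_text` is hypothetical.

```python
DEFAULT_MIN_CONFIDENCE = 80.0  # default minimum confidence level, in percent


def confident_text(blocks, min_confidence=DEFAULT_MIN_CONFIDENCE, block_type="LINE"):
    """Return the text of blocks of the given type whose Confidence meets the threshold."""
    return [
        block["Text"]
        for block in blocks
        if block["BlockType"] == block_type
        # Blocks without a Confidence value are treated as 0 and dropped.
        and block.get("Confidence", 0.0) >= min_confidence
    ]


# Illustrative blocks, not real Textract output.
response_blocks = [
    {"BlockType": "LINE", "Text": "Invoice 1234", "Confidence": 99.2},
    {"BlockType": "LINE", "Text": "sm?dged", "Confidence": 41.7},
    {"BlockType": "WORD", "Text": "Invoice", "Confidence": 99.5},
]
print(confident_text(response_blocks))  # ['Invoice 1234']
```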
See Setting up services in AWS for more.