
Analyzing content with Amazon Textract

Amazon Textract makes it easy to add text detection and analysis to your applications. Use this information to review the prerequisites and to set up the service.

This service can detect text in a variety of documents (such as financial reports, medical records, and tax forms). For documents with structured data, the following can be detected:
  • Forms with their fields and values
  • Tables with their cells

Prerequisites

The general prerequisites to use Amazon Textract are documented in Getting Started with Amazon Textract.

Supported regions

See the list of supported AWS regions where Amazon Textract is available.

Limits

Several limits apply to Amazon Textract:
  • Synchronous operations (DetectDocumentText and AnalyzeDocument) support the PNG and JPEG image formats. The maximum image size is 5 MB.
  • Asynchronous operations (StartDocumentTextDetection and StartDocumentAnalysis) also support the PDF file format. The maximum PDF file size is 500 MB, and a PDF can contain a maximum of 3000 pages.
    • PDF documents are processed using asynchronous operations that go through an S3 bucket set up for Intelligence Services and Textract.
    • The maximum number of concurrent jobs for all asynchronous operations is 1.

See the AWS site for more details on service limits: Limits in Amazon Textract.
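
The format and size limits above can be checked before a document is submitted. The following is a minimal sketch, not part of the product: the function name `choose_operation_mode` and the constants are hypothetical, and it assumes that 1 MB means 1024 × 1024 bytes. It always routes images to the synchronous operations, although the asynchronous operations accept them as well.

```python
from pathlib import Path

# Limits taken from this page. Treating 1 MB as 1024 * 1024 bytes is an assumption.
MB = 1024 * 1024
MAX_IMAGE_BYTES = 5 * MB
MAX_PDF_BYTES = 500 * MB
MAX_PDF_PAGES = 3000

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def choose_operation_mode(filename, size_bytes, page_count=1):
    """Return "synchronous" or "asynchronous" for a document, or raise ValueError.

    Hypothetical helper: images go to the synchronous operations,
    PDFs go to the asynchronous operations.
    """
    suffix = Path(filename).suffix.lower()

    if suffix in IMAGE_SUFFIXES:
        if size_bytes > MAX_IMAGE_BYTES:
            raise ValueError(f"{filename}: image exceeds the 5 MB limit")
        return "synchronous"

    if suffix == ".pdf":
        if size_bytes > MAX_PDF_BYTES:
            raise ValueError(f"{filename}: PDF exceeds the 500 MB limit")
        if page_count > MAX_PDF_PAGES:
            raise ValueError(f"{filename}: PDF exceeds the 3000 page limit")
        return "asynchronous"

    raise ValueError(f"{filename}: unsupported format {suffix!r}")


print(choose_operation_mode("invoice.png", 2 * MB))                    # synchronous
print(choose_operation_mode("report.pdf", 40 * MB, page_count=120))    # asynchronous
```

Rejecting an oversized file locally avoids a failed request, which matters more for PDFs because only one asynchronous job can run at a time.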

Configuration

You'll need to create an AWS Identity and Access Management (IAM) role with the correct permissions to control access to AWS services and resources.
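
The exact permissions the role needs are described in Setting up services in AWS. For illustration only, the sketch below builds an IAM policy document that allows the Textract operations named on this page, plus the `GetDocumentTextDetection` and `GetDocumentAnalysis` operations that retrieve asynchronous results. The function name `build_textract_policy` is hypothetical, and the policy is an assumption about a reasonable minimum. It does not include the S3 bucket permissions that PDF processing requires.

```python
import json

# Textract actions for the operations mentioned on this page, plus the
# Get* operations that fetch the results of asynchronous jobs.
TEXTRACT_ACTIONS = [
    "textract:DetectDocumentText",
    "textract:AnalyzeDocument",
    "textract:StartDocumentTextDetection",
    "textract:StartDocumentAnalysis",
    "textract:GetDocumentTextDetection",
    "textract:GetDocumentAnalysis",
]


def build_textract_policy(actions=TEXTRACT_ACTIONS):
    """Return an IAM policy document (as a dict) allowing the given actions.

    Hypothetical helper: an illustrative policy, not the one the product
    documentation prescribes.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
            }
        ],
    }


# Print the JSON that could be pasted into the IAM console as a starting point.
print(json.dumps(build_textract_policy(), indent=2))
```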

A setting controls the minimum level of confidence that Amazon Textract must have in the accuracy of the extracted content. This minimum confidence level has a default value of 80%.
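
To show what a minimum confidence level means in practice, the sketch below filters a list of Textract-style blocks by their `Confidence` score, which Textract reports on a scale of 0 to 100. The function name `filter_by_confidence` and the sample data are hypothetical, and the way the product itself applies the threshold is an assumption here.

```python
DEFAULT_MIN_CONFIDENCE = 80.0  # default value stated on this page, in percent


def filter_by_confidence(blocks, min_confidence=DEFAULT_MIN_CONFIDENCE):
    """Keep only the blocks whose Confidence meets the minimum.

    Blocks without a Confidence value (for example PAGE blocks) are
    treated as 0 and dropped. This is a simplification for the sketch.
    """
    return [b for b in blocks if b.get("Confidence", 0.0) >= min_confidence]


# Made-up sample shaped like the "Blocks" list in a Textract response.
sample_blocks = [
    {"BlockType": "LINE", "Text": "Invoice number", "Confidence": 99.1},
    {"BlockType": "LINE", "Text": "l2E4S", "Confidence": 62.4},
    {"BlockType": "LINE", "Text": "12345", "Confidence": 85.0},
]

kept = filter_by_confidence(sample_blocks)
print([b["Text"] for b in kept])  # ['Invoice number', '12345']
```

Raising the threshold trades recall for precision: fewer misread values get through, but more genuine content is discarded.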

See Setting up services in AWS for more information.
