- entities (e.g. people, places)
The general prerequisites for using Amazon Comprehend are documented in Getting Started with Amazon Comprehend. Since the Transform Engine has to use asynchronous jobs for large text files, some additional setup is required to get the service working correctly. This is covered in the later configuration section.
See the list of supported AWS regions where Amazon Comprehend is available.
Synchronous operations have a limit of 5KB (5000 bytes) per document, and the content must be UTF-8 encoded. Note that Amazon Comprehend may store the analyzed content in order to continuously improve the quality of its analysis models.
To work around the limit for synchronous calls, we use batch operations, which analyze a set of up to 25 documents in a single call. Each individual document has the same limit of 5KB, which means that the Transform Engine can work synchronously with content of up to 25 x 5KB = 125KB.
To process documents larger than 125KB, we use asynchronous operations that go via an S3 bucket set up for Intelligence Services and Comprehend.
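The size thresholds described above can be sketched as a small routing function. This is an illustrative sketch only, not the Transform Engine's actual code: the names `choose_mode`, `SYNC_LIMIT_BYTES`, `BATCH_MAX_DOCS`, and `BATCH_LIMIT_BYTES` are hypothetical, and it assumes the decision is based purely on the UTF-8 encoded size of the text.

```python
# Limits taken from the text above; all names here are hypothetical.
SYNC_LIMIT_BYTES = 5000                                  # single synchronous call
BATCH_MAX_DOCS = 25                                      # documents per batch call
BATCH_LIMIT_BYTES = SYNC_LIMIT_BYTES * BATCH_MAX_DOCS    # 125000 bytes


def choose_mode(text: str) -> str:
    """Return 'sync', 'batch', or 'async' based on the UTF-8 size of text."""
    # Size is measured in encoded bytes, not characters, because
    # multi-byte characters count more than once against the limit.
    size = len(text.encode("utf-8"))
    if size <= SYNC_LIMIT_BYTES:
        return "sync"
    if size <= BATCH_LIMIT_BYTES:
        return "batch"
    # Anything larger would have to go through the S3-backed asynchronous job.
    return "async"
```

Note that the check uses bytes rather than characters: a string of 2,501 two-byte characters is already over the synchronous limit even though it is well under 5,000 characters long.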
You'll need to create an AWS Identity and Access Management (IAM) role with the correct permissions to control access to AWS services and resources.
A setting controls the level of confidence that Amazon Comprehend must have in the accuracy of the extracted content. This is defined as the minimum confidence level and has a default value of 80%.
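As a rough illustration of how a minimum confidence level might be applied, the sketch below filters a dictionary shaped like an Amazon Comprehend entity detection response, where each entry in `Entities` carries a `Score` between 0 and 1. The function name `filter_entities`, the inclusive comparison, and the sample data are assumptions made for illustration; they do not describe how Intelligence Services implements the setting.

```python
def filter_entities(response: dict, min_confidence: float = 0.80) -> list:
    """Keep only entities whose Score meets the minimum confidence level."""
    # Assumption: the threshold is inclusive (Score >= min_confidence).
    return [
        entity
        for entity in response.get("Entities", [])
        if entity["Score"] >= min_confidence
    ]


# Made-up sample data in the shape of a Comprehend entities response.
sample_response = {
    "Entities": [
        {"Text": "Jane Doe", "Type": "PERSON", "Score": 0.97},
        {"Text": "Paris", "Type": "LOCATION", "Score": 0.80},
        {"Text": "Tuesday", "Type": "DATE", "Score": 0.42},
    ]
}

kept = filter_entities(sample_response)
kept_texts = [entity["Text"] for entity in kept]
```

With the default of 0.80, the low-scoring third entity in the sample is dropped; raising `min_confidence` trades recall for fewer uncertain results.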
See Setting up services in AWS for more.