Analyzing text with Amazon Comprehend

Amazon Comprehend uses natural language processing (NLP) to extract insights from your content. Use this information to review the prerequisites and to set up the service.

Amazon Comprehend develops insights by recognizing common elements in your content and classifying them into a number of content types, such as:
  • entities (e.g. people and places)
  • language
Note: This release of Alfresco Intelligence Services supports English only.

Prerequisites

The general prerequisites for using Amazon Comprehend are documented in Getting Started with Amazon Comprehend. Because the Transform Engine uses asynchronous jobs for large text files, some additional setup is required to get the service working correctly. This is covered in the Configuration section below.

Supported regions

See the list of supported AWS regions where Amazon Comprehend is available.

Limits

Synchronous operations are limited to 5 KB (5000 bytes) per document, and the content must be UTF-8 encoded. Note that Amazon Comprehend may store the analyzed content in order to continuously improve the quality of its analysis models.

To work around the limit for synchronous calls, the Transform Engine uses batch operations, which analyze a set of up to 25 documents. Each individual document has the same 5 KB limit, which means that the Transform Engine can work synchronously with documents up to 25 × 5 KB = 125 KB.
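The arithmetic above implies that a larger text is divided into pieces of at most 5000 bytes before a batch call. The following is a minimal Python sketch of one way to do that, not the Transform Engine's actual implementation. The function names `split_utf8` and `to_batch` are hypothetical. Because the limit is counted in UTF-8 bytes, the sketch avoids cutting in the middle of a multi-byte character.

```python
MAX_DOC_BYTES = 5000   # per-document limit for synchronous operations
MAX_BATCH_DOCS = 25    # maximum number of documents in one batch operation


def split_utf8(text, max_bytes=MAX_DOC_BYTES):
    """Split text into pieces whose UTF-8 encoding is at most max_bytes long."""
    if max_bytes < 4:
        # A single UTF-8 character can need up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    data = text.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # If the byte at the cut point is a continuation byte (10xxxxxx),
        # step back so the whole character moves into the next chunk.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks


def to_batch(text):
    """Return the chunks for one batch call, or fail if the text is too large."""
    chunks = split_utf8(text)
    if len(chunks) > MAX_BATCH_DOCS:
        raise ValueError("text needs more than 25 chunks; use an asynchronous job")
    return chunks
```

Note that a text containing multi-byte characters may produce chunks slightly smaller than 5000 bytes, so a document close to 125 KB could need more than 25 chunks under this approach.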

To process documents larger than 125 KB, the Transform Engine uses asynchronous operations that go through an S3 bucket set up for Intelligence Services and Comprehend.
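The three size tiers described in this section can be summarized as a simple decision on the UTF-8 size of the text. This is an illustrative sketch only: the function name `choose_mode` and the mode labels are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
SYNC_LIMIT_BYTES = 5000                                    # single synchronous call
BATCH_MAX_DOCUMENTS = 25                                   # documents per batch call
BATCH_LIMIT_BYTES = SYNC_LIMIT_BYTES * BATCH_MAX_DOCUMENTS  # 125000 bytes


def choose_mode(text):
    """Pick a processing mode from the size of the UTF-8 encoded text."""
    size = len(text.encode("utf-8"))
    if size <= SYNC_LIMIT_BYTES:
        return "sync"    # one synchronous request
    if size <= BATCH_LIMIT_BYTES:
        return "batch"   # up to 25 documents of at most 5000 bytes each
    return "async"       # asynchronous job that goes through the S3 bucket


if __name__ == "__main__":
    for sample in ("short text", "a" * 60000, "a" * 200000):
        print(len(sample.encode("utf-8")), choose_mode(sample))
```

The size is measured in bytes rather than characters, so text with many non-ASCII characters reaches each threshold sooner.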

See the AWS site for more details: Guidelines and limits, Amazon Comprehend Limits.

Configuration

You'll need to create an AWS Identity and Access Management (IAM) role with the correct permissions to control access to AWS services and resources.
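As a rough illustration of what such a role can look like, the sketch below builds two policy documents in Python: a trust policy that lets the Amazon Comprehend service assume the role, and a permissions policy that grants access to an S3 bucket. The bucket name `my-ais-comprehend-bucket` and the function names are hypothetical, and the exact set of permissions your deployment needs is an assumption here. Follow Setting up services in AWS for the authoritative steps.

```python
import json

BUCKET = "my-ais-comprehend-bucket"  # hypothetical bucket name


def trust_policy():
    """Trust policy allowing Amazon Comprehend to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "comprehend.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_access_policy(bucket):
    """Permissions policy for reading input from and writing output to the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(trust_policy(), indent=2))
    print(json.dumps(s3_access_policy(BUCKET), indent=2))
```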

You can set the minimum level of confidence that Amazon Comprehend must have in the accuracy of the extracted content. This minimum confidence level has a default value of 80%.
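To show the effect of a minimum confidence level, the sketch below filters a list of entities shaped like the `Entities` list in an Amazon Comprehend entity detection response, where each entity carries a `Score` between 0 and 1. The function name `filter_by_confidence` and the sample data are hypothetical, and treating the threshold as inclusive is an assumption. This is not the Intelligence Services implementation.

```python
DEFAULT_MIN_CONFIDENCE = 0.80  # the 80% default mentioned above


def filter_by_confidence(entities, min_confidence=DEFAULT_MIN_CONFIDENCE):
    """Keep only entities whose Score is at least min_confidence (assumed inclusive)."""
    return [e for e in entities if e["Score"] >= min_confidence]


if __name__ == "__main__":
    # Hypothetical sample data in the shape of a Comprehend "Entities" list.
    sample = [
        {"Text": "Alice", "Type": "PERSON", "Score": 0.99},
        {"Text": "Paris", "Type": "LOCATION", "Score": 0.95},
        {"Text": "spring", "Type": "DATE", "Score": 0.42},
    ]
    for entity in filter_by_confidence(sample):
        print(entity["Type"], entity["Text"])
```

Raising the threshold yields fewer but more reliable results, while lowering it keeps more candidates at the cost of more false positives.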

See Setting up services in AWS for more information.
