You are here

Troubleshooting Intelligence Services

Use this information to help troubleshoot Intelligence Services.

Make sure that Alfresco Transform Service is working before testing Alfresco Intelligence Services. See Troubleshooting Transform Service for more.

Metadata extraction

AWS setup

General

Why don't I see any extracted metadata (AI properties)?

Check that your configured rule adds the desired AI aspects to extract and Requests AI renditions.
  • For Rekognition, you need to add the AI Labels aspect.
  • For Comprehend, you need to add one or more of the other AI aspects (AI People, AI Organizations, etc.) or all nine of them, if desired.

back to top

I've added AI aspects and requested AI renditions but I'm still not seeing any extracted AI data. Which logs can I refer to see if anything is failing?

First, check the Alfresco Transform Service is running for a document transform (e.g. docx to pdf). See Troubleshooting Transform Service for more.

Next, check the logs for each microservice (container) including the Transform Router and AI Transform Engine. You can also check ActiveMQ queues to see the Transform Requests / Replies.

back to top

Why don't I see a Rekognition image analysis?

Check the service logs for the Transform Router and AI Transform Engine. You may see the following error:

Source File size or mime type is not within allowable limits

Here's a snippet from the logs:

aws-ai_1 | 2019-03-26 14:01:43.595 ERROR 1 --- [enerContainer-4] o.a.t.AbstractTransformerController : 
 Failed to perform transform (Exception), sending TransformReply{requestId='5575d683-5a7e-476c-9c1c-14e071cbc85b', 
 status=500, errorDetails='Failed at processing transformation - AIClientException: [AWS Rekognition Label Detection]
 Source file too large to perform operation on', sourceReference='b75a08b8-17b1-4612-ba65-5ecd582a23e0', 
 targetReference='null', clientData='workspace://SpacesStore/67ff9be4-7596-4ddb-bead-929366262e2f aiLabels 
 1337801886 mjackson 1553608903465 304 jpg json-ailabels', schema=1, internalContext=InternalContext{multiStep=null, 
 attemptedRetries=0, currentSourceMediaType='image/jpeg', currentTargetMediaType='application/vnd.alfresco.ai.labels.v1+json', 
 currentSourceSize=21202469, transformRequestOptions={maxLabels=1000, minConfidence=0.8, transform=TikaAuto, 
 includeContents=false, notExtractBookmarksText=true, targetMimetype=text/plain, targetEncoding=UTF-8}}}
aws-ai_1 | 
aws-ai_1 | org.alfresco.transformer.ai.exception.AIClientException: [AWS Rekognition Label Detection] Source file too large to 
 perform operation on
aws-ai_1 | at org.alfresco.transformer.ai.client.AwsRekognitionLabelDetectionRequestFactory.createDetectLabelsRequest(
 AwsRekognitionLabelDetectionRequestFactory.java:88) ~[alfresco-ai-image-analysis-0.4.0.jar!/:na]

If the image file size is larger than 15 MB then it will be skipped. See Amazon Rekognition limits for more.

back to top

Why don't I see a Comprehend text analysis?

Check the service logs for the Transform Router and AI Transform Engine.

If you see the request but not the response, then check if the text file is larger than 125 KB. If so, it may take some time for the asynchronous response. Check again after 5 to 15 minutes.

Verify that you have correctly configured the AWS Comprehend Role to allow the Comprehend service read/write access to the S3 bucket used to temporarily store source files and results.

See Amazon Comprehend limits and Amazon Comprehend role for more.

back to top

Why don't I see a Textract image / text analysis?

Check the service logs for the Transform Router and AI Transform Engine. You may see the following error:
Source File size or mime type is not within allowable limits

Here's a snippet from the logs:

aws-ai_1 | 2019-03-26 13:47:24.848 ERROR 1 --- [enerContainer-5] o.a.t.AbstractTransformerController :
 Failed to perform transform (Exception), sending TransformReply{requestId='5fda6718-27b0-4289-b968-88a6f3bc3fb8',
 status=500, errorDetails='Failed at processing transformation - AIApplicationParameterException: Source File size 
 or mime type is not within allowable limits', sourceReference='d36c0d14-06fb-4648-b72a-439af16d7f12', 
 targetReference='null', clientData='workspace://SpacesStore/26522de2-6aa5-400c-b670-130dccd010c5 aiTextract 
 -1927981676 mjackson 1553607988261 244 png json-aitextract', schema=1, internalContext=InternalContext{multiStep=null,
 attemptedRetries=2, currentSourceMediaType='image/png', currentTargetMediaType='application/vnd.alfresco.ai.textract.v1+json', 
 currentSourceSize=12398590, transformRequestOptions={minConfidence=0.8, timeout=910000, transform=TikaAuto, 
 includeContents=false, notExtractBookmarksText=true, targetMimetype=text/plain, targetEncoding=UTF-8}}}
aws-ai_1 | 
aws-ai_1 | org.alfresco.transformer.ai.exception.AIApplicationParameterException: Source File size or mime type is not 
 within allowable limits
aws-ai_1 | at org.alfresco.transformer.ai.service.AwsTextractTransformer.transformInternal(AwsTextractTransformer.java:87)
 ~[alfresco-ai-image-text-analysis-0.4.0.jar!/:na]

If the image file size is larger than 5 MB then it will be skipped. See Amazon Textract limits for more.

back to top

Why do I see an AWS error when I upload a test folder of test files (e.g. via drag and drop)?

com.amazonaws.services.textract.model.ProvisionedThroughputExceededException: Provisioned rate exceeded (Service: AmazonTextract; 
 Status Code: 400; Error Code: ProvisionedThroughputExceededException; Request ID: fef613f2-2e96-4f03-afd9-936a22dbff36)

This is likely to be a limitation of the Textract limits. Check the rate limits in the Amazon Textract documentation.

back to top

What file formats/media types can be passed to the AWS services for processing?

The following file types can be passed and processed by AWS services.

Images (jpg, png, gif, tiff) are processed by Rekognition and Textract:

  • jpg & png are sent directly to Rekognition & Textract
.
  • gif is converted to jpg (by the Alfresco Transform Engines), and then sent to Rekognition & Textract
.
  • tiff is converted to gif, then converted to jpg (by the Alfresco Transform Engines), and then sent to Rekognition & Textract.

Text (txt) is processed by Comprehend

Pdf is processed by Comprehend & Textract:
  • pdf is converted to txt (by the Alfresco Transform Engines), and then sent to Comprehend
.
  • pdf is also sent directly to Textract.
Office docs (word, excel, powerpoint & outlook msg) are processed by Comprehend & Textract:
  • doc is converted to txt (by the Alfresco Transform Engines), and then sent to Comprehend.
  • doc is also converted to pdf and then sent to Textract.

You'll find a summary of the Transform Router configuration properties in transformer-routes-with-ai.properties, in the Intelligence Services distribution zip file.

back to top

I'm getting a lot of inaccurate metadata, what can I do?

The current default on minimum confidence levels is set at 80%.

See Configuring the minimum confidence level for more.

back to top

Which Amazon AI/ML APIs are used for processing?

For Comprehend, we use the Language Detection and Entity Recognition APIs. See the Amazon Comprehend Features for more details.

For Rekognition, we use the Object and Scene Detection APIs. See the Amazon Rekognition Features for more details.

For Textract, we use the Synchronous Operations APIs. See the Amazon Textract Features for more details.

back to top

Where can I get more information on the content data model for the AI rendition?

The data model is in a private GitHub project. Alfresco customers can request access to this project by logging a ticket with Alfresco Support.

back to top

I might be having AWS credential errors, what can I check?

Check your AWS setup and configuration. This will also appear in AI Transform Engine service log.
Note: For security reasons, the AWS credentials in the logs will be masked (i.e. only a few characters will appear - similar to the AWS CLI).

back to top

After starting Intelligence Services, and trying to request AI renditions, what AWS connection errors / exceptions might I see in the AI Transform Engine logs?

You may see one of the following errors:
* The security token included in the request is invalid
* The request signature we calculated does not match the signature you provided. 
Check your AWS Secret Access Key and signing method

If you notice AWS connection errors for Rekognition, Comprehend and/or Textract, then check your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY carefully.

For example:
  • A mistyped AWS_ACCESS_KEY_ID may cause an error such as "The security token included in the request is invalid".
  • A mistyped AWS_SECRET_ACCESS_KEY may cause the error shown in the second example (above).
You may see other connection errors, for example, if neither AWS_ACCESS_KEY_ID nor AWS_SECRET_ACCESS_KEY has been set.

Also, if you're using Docker Compose, then you may find that setting or exporting environment variables overrides a local .env configuration file.

back to top

What do I need to set up to access Amazon AI services: Comprehend, Rekognition & Textract?

See Setting up services in AWS for more.

back to top

How do the Amazon AI services access the content to be processed?

For very small files, the AWS synchronous APIs are called. However, for many files (see limits above) we may need to call the AWS asynchronous APIs. In this case, you'll need to set up an S3 bucket for asynchronous AWS calls, such that the AWS services can read the files to be processed.

We recommend the following setup:

  • Use a different bucket than the one used by Alfresco Content Services (when using Alfresco Content Connector for AWS S3).
  • Create an S3 bucket in the same region as the Amazon AI services.
  • For Comprehend, you also need to enable Comprehend to have write access to your S3 bucket for returning the results. This can be done by setting up an IAM role.

See Configuring AWS Identity and Access Management and Setting up services in AWS for more details.

back to top

When is Amazon Elastic File System (EFS) used?

From the architecture diagram in the Intelligence Services overview, the Shared File Store (SFS) provides a mechanism for Alfresco Content Services to send source files to the Transform Engines and receive target files from the Transform Engines. This includes the new AI Transform Engine that's used to call the Amazon AI services. In order to scale SFS for performance and reliability, it must be able to access a shared volume storage, such as managed EFS or self-managed NFS.

back to top

I'm concerned about data privacy of the content that's processed by AWS. Do I have any control around this?

The AWS data privacy policies state that authorized AWS employees will have access to your content. Review the following AWS pages:

For Comprehend, Rekognition, and Textract, you can speak directly with your Account Manager at AWS and ask that when your content is passed for processing, it won't be used and stored for the improvement and development of their algorithms.

back to top

What can you tell me about the performance of Intelligence Services?

The performance of this module is mostly dependent on the limitations around processing times and rates for Comprehend, Rekognition, Textract, and less dependent on the Alfresco components. The architecture of the Alfresco components for Intelligence Services are easily scalable to support larger ingestion of content going to Amazon, and also coming back from Amazon. We have internal benchmarking data on the Transform Service which can be referenced, if further information is requested.

back to top

Sending feedback to the Alfresco documentation team

You don't appear to have JavaScript enabled in your browser. With JavaScript enabled, you can provide feedback to us using our simple form. Here are some instructions on how to enable JavaScript in your web browser.