Close
Alfresco Process Automation

AWS connectors

There are five connectors that can be used to invoke different Amazon Web Services (AWS):

All Amazon connectors are displayed on the process diagram with their respective AWS logos.

Important: All AWS connectors require an AWS account with permission to access the features provided by Amazon. This account is separate to the Alfresco hosted environment and should be created and managed by customers.

Lambda

The INVOKE action is used by the Lambda connector to invoke Amazon Web Services (AWS) Lambda functions.

The input parameters to invoke a Lambda function are:

Parameter Type Description
function String Required. The name of the Lambda function to invoke, for example lambda-2.
payload JSON Optional. The payload that will be passed to the Lambda function as a JSON object.

The output parameters from invoking a Lambda function are:

Parameter Type Description
lambdaPayload JSON Optional. The Lambda function results payload.
lambdaStatus Integer Optional. The Lambda function invocation status code.
lambdaLog String Optional. The log produced during the function invocation.

Lambda configuration parameters

The configuration parameters for the Lambda connector are:

Parameter Description
AWS_LAMBDA_AWS_ACCESS_KEY Required. The access key to authenticate against AWS.
AWS_LAMBDA_AWS_SECRET_KEY Required. The secret key to authenticate against AWS.
AWS_LAMBDA_AWS_REGION Required. The region of AWS to invoke the Lambda functions.

Lambda errors

The possible errors that can be handled by the Lambda connector are:

Error Description
MISSING_INPUT A mandatory input variable was not provided.
INVALID_INPUT The input variable has an invalid type.
SERVICE_ERROR The service encountered an internal error.
INVALID_REQUEST The request body could not be parsed as JSON.
REQUEST_TOO_LARGE The request payload exceeded the Invoke request body JSON input limit.
UNKNOWN_ERROR Unexpected runtime error.
BAD_REQUEST The server could not understand the request due to invalid syntax.
UNAUTHORIZED The request has not been applied because it lacks valid authentication.
FORBIDDEN The server understood the request but refuses to authorize it.
NOT_FOUND The server could not find what was requested.
METHOD_NOT_ALLOWED The request method is known by the server but is not supported.
NOT_ACCEPTABLE The server cannot produce a response matching the list of acceptable values.
REQUEST_TIMEOUT The server would like to shut down this unused connection.
CONFLICT The request conflicts with current state of the server.
GONE No longer available.
UNPROCESSABLE_ENTITY The server understands the content type of the request entity, and the syntax of the request entity is correct, but it was unable to process the contained instructions.
LOCKED The resource that is being accessed is locked.
FAILED_DEPENDENCY The request failed due to failure of a previous request.
INTERNAL_SERVER_ERROR The server has encountered a situation it doesn’t know how to handle.
NOT_IMPLEMENTED The request method is not supported by the server and cannot be handled.
BAD_GATEWAY The server got an invalid response.
SERVICE_UNAVAILABLE The server is not ready to handle the request.
GATEWAY_TIMEOUT The server is acting as a gateway and cannot get a response in time.

Comprehend

The Comprehend connector provides a standard mechanism to extract entities and Personally identifiable information (PII) entities from text in your documents. The ENTITY action is used by the Comprehend connector to execute Amazon Comprehend natural language processing (NLP) services and identify and analyze text from specific plain text files. The Comprehend connector supports default entity recognition, custom entity recognition, and custom document classification.

Note: The Comprehend connector can only receive either files or text but not both at the same time.

The Comprehend connector can extract entities and PII from the following file formats:

  • text/plain
  • application/x-tar
  • /zip
  • /vnd.ms-outlook
  • /pdf (max size in bytes: 26214400)
  • /msword
  • /vnd.ms-project
  • /vnd.ms-outlook
  • /vnd.ms-powerpoint
  • /vnd.visio
  • /vnd.ms-excel
  • /vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • /vnd.ms-word.document.macroenabled.12
  • /vnd.openxmlformats-officedocument.wordprocessingml.document
  • /vnd.ms-word.template.macroenabled.12
  • /vnd.openxmlformats-officedocument.wordprocessingml.template
  • /vnd.ms-powerpoint.template.macroenabled.12
  • /vnd.openxmlformats-officedocument.presentationml.template
  • /vnd.ms-powerpoint.addin.macroenabled.12
  • /vnd.ms-powerpoint.slideshow.macroenabled.12
  • /vnd.openxmlformats-officedocument.presentationml.slideshow
  • /vnd.ms-powerpoint.presentation.macroenabled.12
  • /vnd.openxmlformats-officedocument.presentationml.presentation
  • /vnd.ms-powerpoint.slide.macroenabled.12
  • /vnd.openxmlformats-officedocument.presentationml.slide
  • /vnd.ms-excel.addin.macroenabled.12
  • /vnd.ms-excel.sheet.binary.macroenabled.12
  • /vnd.ms-excel.sheet.macroenabled.12
  • /vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • /vnd.ms-excel.template.macroenabled.12
  • /vnd.openxmlformats-officedocument.spreadsheetml.template
  • /x-cpio
  • /java-archive
  • /x-netcdf
  • /msword
  • /vnd.ms-word.document.macroenabled.12
  • /vnd.openxmlformats-officedocument.wordprocessingml.document
  • /vnd.ms-word.template.macroenabled.12
  • /vnd.openxmlformats-officedocument.wordprocessingml.template
  • /x-gzip
  • /x-hdf
  • text/html
  • /vnd.apple.keynote
  • /vnd.ms-project
  • /vnd.apple.numbers
  • /vnd.oasis.opendocument.chart
  • /vnd.oasis.opendocument.image
  • /vnd.oasis.opendocument.text-master
  • /vnd.oasis.opendocument.presentation
  • /vnd.oasis.opendocument.spreadsheet
  • /vnd.oasis.opendocument.text
  • /ogg
  • /vnd.oasis.opendocument.text-web
  • /vnd.oasis.opendocument.presentation-template
  • /vnd.oasis.opendocument.spreadsheet-template
  • /vnd.oasis.opendocument.text-template
  • /vnd.apple.pages
  • /pdf "maxSourceSizeBytes": 26214400,
  • /vnd.ms-powerpoint.template.macroenabled.12
  • /vnd.openxmlformats-officedocument.presentationml.template
  • /vnd.ms-powerpoint.addin.macroenabled.12
  • /vnd.ms-powerpoint.slideshow.macroenabled.12
  • /vnd.openxmlformats-officedocument.presentationml.slideshow
  • /vnd.ms-powerpoint
  • /vnd.ms-powerpoint.presentation.macroenabled.12
  • /vnd.openxmlformats-officedocument.presentationml.presentation
  • /x-rar-compressed
  • /rss+xml
  • /rtf
  • /vnd.ms-powerpoint.slide.macroenabled.12
  • /vnd.openxmlformats-officedocument.presentationml.slide
  • /vnd.sun.xml.writer
  • text/xml
  • /vnd.visio
  • /xhtml+xml
  • /vnd.ms-excel.addin.macroenabled.12
  • /vnd.ms-excel
  • /vnd.ms-excel.sheet.binary.macroenabled.12
  • /vnd.ms-excel.sheet.macroenabled.12
  • /vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • /vnd.ms-excel.template.macroenabled.12
  • /vnd.openxmlformats-officedocument.spreadsheetml.template
  • /x-compress
  • text/csv
  • /msword

AWS Configuration

The Amazon Comprehend APIs that are called using the connector are:

To perform these calls it uses the AWS Comprehend SDK. This requires IAM users with the correct permissions to be created. The easiest way to do this is to give an IAM user the AWS managed policy ComprehendFullAccess. If you want to be stricter with access rights see the list of all comprehend API permissions.

The Asynchronous calls also require the ability to read and write to an Amazon S3 bucket so the IAM user must have access to the configured bucket. As well as the IAM user accessing the data the Comprehend service itself requires access, for more see Role-Based Permissions Required for Asynchronous Operations .

To allow the library to use this IAM user when communicating with the Comprehend service an AWS access key and secret key must be available, for more see Using the Default Credential Provider Chain

DetectDominantLanguage

You need to supply the calls that detect the language of the text document that is going to be processed. To do this, the connector calls the DetectDominantLanguage API. The DetectDominantLanguage call only works on text smaller than a configurable limit, the default is 5000 bytes. The connector uses the first bytes/characters of the document to determine what language to use when making calls to AWS Comprehend to determine which language is being used.

The DetectDominantLanguage service currently supports a greater set of languages than the entity detection services. It does this by checking the returned language against a configurable list of available languages.

Note: Currently only EN and ES are supported by AWS entity detection. If the detected language is not in this list a configurable default language is used instead, which is EN by default.

Entities

The following are the different types of entities:

Entity
DetectEntities
BatchDetectEntities
StartEntitiesDetectionJob
DescribeEntitiesDetectionJob

Depending on the size of the input the connector will process it in a different a way.

The DetectEntities operation will be called if the supplied text file is smaller than a configurable limit, by default 5000 bytes. If the input file is larger than this then a different API must be used.

The BatchDetectEntitie is used if the file is larger than the DetectEntities limit although it also has a configurable limit, by default 125000 bytes. When you use the Batch API call the input file is split into chunks of less than the configurable limit, by default 5000 bytes.

The StartEntitiesDetectionJob and DescribeEntititesDetectionJob are used if the input file is larger than the BatchDetectEntities limit, by default 5000 bytes. Similar to the batch approach, the input file will be divided into a set of smaller files of a certain configured size. When dividing the original file, the engine ensures that it only includes full words and does not split on a non whitespace character.

The divided files are then uploaded to Amazon S3 using the same key prefix for all files. When all of them have been uploaded an asynchronous entity detection job is started. This is then followed by a polling process to check the status of the job until it finishes or the timeout is reached.

If the asynchronous job finishes successfully a compressed output file (output.tar.gz) with the result will be written by Amazon Comprehend. The file will be saved to the same bucket within a directory that is using the same key prefix. For more see Asynchronous Batch Processing . The output file is downloaded from Amazon S3 and parsed into a BatchDetectEntitiesResult object. At the end of the process, all the resource files are cleaned, both locally and at Amazon S3.

PII Entities

Entity
DetectPiiEntities
StartPiiEntitiesDetectionJob
DescribePiiEntitiesDetectionJob

Depending on the size of the input the connector will process it in a different way.

The DetectPiiEntities operation will be called if the supplied text file is smaller than a configurable limit, by default 5000 bytes. If the input file is larger than this then a different API must be used.

The StartPiiEntitiesDetectionJob and DescribePiiEntitiesDetectionJob are used if the input file is larger than the AsynchDetectPIIEntities limit, by default 5000 bytes. The input file will be divided into a set of smaller files of a certain configured size. When dividing the original file, the engine ensures that it only includes full words and will not split on a non whitespace character.

The divided files are then uploaded to Amazon S3 using the same key prefix for all files. When all of them have been uploaded an asynchronous entity detection job is started. This is then followed by a polling process to check the status of the job until it finishes or the timeout is reached.

If the asynchronous job finishes successfully a compressed output file (output.tar.gz) with the result will be written by Amazon Comprehend. The file will be saved to the same bucket within a directory that is using the same key prefix. For more see Asynchronous Batch Processing . The output file is downloaded from Amazon S3 and parsed into a BatchDetectPiiResult object. At the end of the process, all the resource files are cleaned, both locally and at Amazon S3.

The StartDocumentClassificationJob operation is always performed asynchronously. It requires a custom model and the classifier ARN must be provided. You can provide the custom classification ARN in two ways:

  1. Use the AWS_COMPREHEND_CUSTOM_CLASSIFICATION_ARN environment variable when deploying the application.

  2. Use the customClassificationArn input variable in the connector action. If the variable is not provided the AWS_COMPREHEND_CUSTOM_CLASSIFICATION_ARN value is used.

BPMN Tasks Configuration

The following describes an example of how the text analysis connector is setup in AAE:

BPMN

As part of the BPMN definition process, any service task responsible for triggering the text analysis comprehend.ENTITY has to be set as the value for its implementation attribute.

The following variables must be configured for the text analysis to function.

The input parameters of Entity detection are:

Parameter Type Description
files Array Optional. The file to be analysed. If multiple files are passed, then only the first one will be analysed.
text String Optional. The Text to be analysed. If the files parameter is set, then this should be left blank.
maxEntities Integer Optional. The maximum number of entities that is returned. The parameter defaults to ${aws.comprehend.defaultMaxResults}.
confidenceLevel Float Optional. The minimum confidence level for a entity expressed by a float number between 0 and 1. The parameter defaults to ${aws.comprehend.defaultConfidence}.
timeout Integer Optional. The timeout for the remote call to the Comprehend service in milliseconds. The parameter defaults to ${aws.comprehend.asynchTimeout}.
customRecognizerArn String Optional. The custom recognizer ARN endpoint. If left blank, the Comprehend service will use the value given to the AWS_COMPREHEND_CUSTOM_RECOGNIZER_ARN environment variable.

Note: The connector must receive either files or text but not both at the same time.

The following is an example of the POST body for the Activiti REST API http:////rb/v1/process-instances endpoint:

{ "processDefinitionKey": "TextAnalysisProcessTest", "processInstanceName": "processTextAnalysisTest_Simple", "businessKey": "MyBusinessKey", "variables": { "file" : [ { "nodeId":"ad844189-9afb-4afb-965b-380db73022aa" } ], "maxEntities" : 50, "confidenceLevel" : "0.95", "timeout" : 10000 }, "payloadType":"StartProcessPayload" }

In the business process definition, the text analysis service task called textAnalysisTask has the implementation attribute configured to use the connector.

<bpmn2:serviceTask id="ServiceTask_0j2v2yc" name="textAnalysisTask" implementation="comprehend.ENTITY"> <bpmn2:incoming>SequenceFlow_0kibr65</bpmn2:incoming> <bpmn2:outgoing>SequenceFlow_1048knn</bpmn2:outgoing> </bpmn2:serviceTask>

The input parameters of PII Entity detection are:

Parameter Type Description
files Array Optional. The file to be analysed. If multiple files are passed, then only the first one will be analysed.
text String Optional. The Text to be analysed. If the files parameter is set, then this should be left blank.
maxEntities Integer Optional. The maximum number of entities that is returned. The parameter defaults to ${aws.comprehend.defaultMaxResults}.
confidenceLevel Float Optional. The minimum confidence level for a entity expressed by a float number between 0 and 1. The parameter defaults to ${aws.comprehend.defaultConfidence}.
timeout Integer Optional. The timeout for the remote call to the Comprehend service in milliseconds. The parameter defaults to ${aws.comprehend.asynchTimeout}.

Note: The connector must receive either files or text but not both at the same time.

The input parameters of Document classification are:

Parameter Type Description
files Array Optional. The file to be analysed. If multiple files are passed, then only the first one will be analysed.
text String Optional. The Text to be analysed. If the files parameter is set, then this should be left blank.
maxEntities Integer Optional. The maximum number of entities that is returned. The parameter defaults to ${aws.comprehend.defaultMaxResults}.
confidenceLevel Float Optional. The minimum confidence level for a entity expressed by a float number between 0 and 1. The parameter defaults to ${aws.comprehend.defaultConfidence}.
timeout Integer Optional. The timeout for the remote call to the Comprehend service in milliseconds. The parameter defaults to ${aws.comprehend.asynchTimeout}.
customClassificationArn String Optional. The custom recognizer ARN endpoint. If left blank, the Comprehend service will use the value given to the AWS_COMPREHEND_CUSTOM_CLASSIFICATION_ARN environment variable. Note: The AWS_COMPREHEND_CUSTOM_CLASSIFICATION_ARN environment variable does not have a default value. If it is being used then you must also set a value for it.

Note: The connector must receive either files or text but not both at the same time.

The output parameters from the entity detection are:

Parameter Type Description
awsResponse JSON Optional. The result of the analysis from the Comprehend service.
aisResponse JSON Optional. The result of the analysis in Alfresco Intelligence Service format.
entities JSON Optional. The result object containing the entities detected.

The output parameters from the PII entity detection are:

Parameter Type Description
awsResponse JSON Optional. The result of the analysis from the Comprehend service.
piiEntityTypes JSON Optional. The result object containing the PII entities detected.

The output parameters from the Document classification are:

Parameter Type Description
awsResponse Object Optional. The object that contains the original result of the text analysis performed by the Comprehend service.
documentClassificationClasses Array Optional. An array that contains the list of the different classes detected in the analysis.

Comprehend configuration parameters

The configuration parameters for the Comprehend connector are:

Parameter Description
AWS_ACCESS_KEY_ID Required. The access key to authenticate against AWS.
AWS_SECRET_KEY Required. The secret key to authenticate against AWS.
AWS_REGION Required. The region of AWS to use the Comprehend service.
AWS_S3_BUCKET Required. The name of the S3 bucket to use.
AWS_COMPREHEND_ROLE_ARN Required. The Amazon Resource Name for Comprehend to use.

Comprehend errors

The possible errors that can be handled by the Comprehend connector are:

Error Description
MISSING_INPUT A mandatory input variable was not provided.
INVALID_INPUT The input variable has an invalid type.
INVALID_RESULT_FORMAT The REST service result payload cannot be parsed.
TEXT_SIZE_LIMIT_EXCEEDED The size of the input text exceeds the limit.
TOO_MANY_REQUEST The request throughput limit was exceeded.
UNSUPPORTED_LANGUAGE The language of the input text can’t be processed.
CLIENT_EXECUTION_TIMEOUT The execution ends because of timeout.
UNKNOWN_ERROR Unexpected runtime error.
BAD_REQUEST The server could not understand the request due to invalid syntax.
UNAUTHORIZED The request has not been applied because it lacks valid authentication.
FORBIDDEN The server understood the request but refuses to authorize it.
NOT_FOUND The server could not find what was requested.
METHOD_NOT_ALLOWED The request method is known by the server but is not supported.
NOT_ACCEPTABLE The server cannot produce a response matching the list of acceptable values.
REQUEST_TIMEOUT The server would like to shut down this unused connection.
CONFLICT The request conflicts with current state of the server.
GONE No longer available.
UNPROCESSABLE_ENTITY The server understands the content type of the request entity, and the syntax of the request entity is correct, but it was unable to process the contained instructions.
LOCKED The resource that is being accessed is locked.
FAILED_DEPENDENCY The request failed due to failure of a previous request.
INTERNAL_SERVER_ERROR The server has encountered a situation it doesn’t know how to handle.
NOT_IMPLEMENTED The request method is not supported by the server and cannot be handled.
BAD_GATEWAY The server got an invalid response.
SERVICE_UNAVAILABLE The server is not ready to handle the request.
GATEWAY_TIMEOUT The server is acting as a gateway and cannot get a response in time.

Limitations

PDF files larger than 26214400 bytes are not supported.

Rekognition

The LABEL action is used by the Rekognition connector to execute Amazon Rekognition services to identify and label the objects in JPEG and PNG files that are less than 15mb in size.

The Amazon Rekognition API that is called is the Detect Labels API.

Files between 5mb and 15mb are uploaded to an Amazon S3 bucket before processing. The IAM user configured to run the Rekognition service requires access to this bucket, the Rekognition service itself and to have the rekognition:DetectLabels permission.

The input parameters of the Rekognition connector are:

Parameter Type Description
file File Required. A variable of type file to send for analysis.
mediaType String Optional. The media type of the file to be analyzed, for example /octect-stream.
maxLabels Integer Optional. The maximum number of labels to be return. The default value is 10.
confidenceLevel String Optional. The confidence level to use in the analysis between 0 and 1, for example 0.75.
timeout Integer Optional. The timeout period for calling the Rekognition service in milliseconds, for example 910000.

The output parameters from the Rekognition analysis are:

Parameter Type Description
awsResponse JSON Optional. The result of the analysis from the Rekognition service.
aisResponse JSON Optional. The result of the image analysis in Alfresco Intelligence Service format.
labels JSON Optional. The result object containing the labels detected.

Rekognition configuration parameters

The configuration parameters for the Rekognition connector are:

Parameter Description
AWS_ACCESS_KEY_ID Required. The access key to authenticate against AWS.
AWS_SECRET_KEY Required. The secret key to authenticate against AWS.
AWS_REGION Required. The region of AWS to use the Rekognition service in.
AWS_S3_BUCKET Required. The name of the S3 bucket to use.

Rekognition errors

The possible errors that can be handled by the Rekognition connector are:

Error Description
MISSING_INPUT A mandatory input variable was not provided.
INVALID_INPUT The input variable has an invalid type.
INVALID_RESULT_FORMAT The REST service result payload cannot be parsed.
PROVISIONED_THROUGHPUT_EXCEEDED The number of requests exceeded your throughput limit.
ACCESS_DENIED The user is not authorized to perform the action.
IMAGE_TOO_LARGE The input image size exceeds the allowed limit.
INVALID_IMAGE_FORMAT The provided image format is not supported.
LIMIT_EXCEEDED The service limit was exceeded.
THROTTLING_ERROR The service is temporarily unable to process the request.
UNKNOWN_ERROR Unexpected runtime error.
BAD_REQUEST The server could not understand the request due to invalid syntax.
UNAUTHORIZED The request has not been applied because it lacks valid authentication.
FORBIDDEN The server understood the request but refuses to authorize it.
NOT_FOUND The server could not find what was requested.
METHOD_NOT_ALLOWED The request method is known by the server but is not supported.
NOT_ACCEPTABLE The server cannot produce a response matching the list of acceptable values.
REQUEST_TIMEOUT The server would like to shut down this unused connection.
CONFLICT The request conflicts with current state of the server.
GONE No longer available.
UNPROCESSABLE_ENTITY The server understands the content type of the request entity, and the syntax of the request entity is correct, but it was unable to process the contained instructions.
LOCKED The resource that is being accessed is locked.
FAILED_DEPENDENCY The request failed due to failure of a previous request.
INTERNAL_SERVER_ERROR The server has encountered a situation it doesn’t know how to handle.
NOT_IMPLEMENTED The request method is not supported by the server and cannot be handled.
BAD_GATEWAY The server got an invalid response.
SERVICE_UNAVAILABLE The server is not ready to handle the request.
GATEWAY_TIMEOUT The server is acting as a gateway and cannot get a response in time.

Textract

The EXTRACT action is used by the Textract connector to execute Amazon Textract to extract text and metadata from JPEG and PNG files that are less than 5mb in size.

The Amazon Textract APIs called are the Detect Document Text API which joins all LINE block objects with a line separator between them and the Analyze Document API with FORM and TABLES analysis.

The IAM user configured to run the Textract services needs to have the textract:DetectDocumentText and textract:AnalyzeDocument permissions.

The input parameters of the Textract connector are:

Parameter Type Description
file File Required. A variable of type file to send for extraction.
outputFormat String Optional. The format of the output file. Possible values are JSON and TXT. The default value is JSON.
confidenceLevel String Optional. The confidence level to use in the analysis between 0 and 1, for example 0.75.
timeout Integer Optional. The timeout period for calling the Textract service in milliseconds, for example 910000.

The output parameters from the Textract analysis are:

Parameter Type Description
awsResult JSON Optional. The result of the analysis from the Textract service.

Textract configuration parameters

The configuration parameters for the Textract connector are:

Parameter Description
AWS_ACCESS_KEY_ID Required. The access key to authenticate against AWS.
AWS_SECRET_KEY Required. The secret key to authenticate against AWS.
AWS_REGION Required. The region of AWS to use the Textract service.
AWS_S3_BUCKET Required. The name of the S3 bucket to use.

Textract errors

The possible errors that can be handled by the Textract connector are:

Error Description
MISSING_INPUT A mandatory input variable was not provided.
INVALID_INPUT The input variable has an invalid type.
INVALID_RESULT_FORMAT The REST service result payload cannot be parsed.
PROVISIONED_THROUGHPUT_EXCEEDED The number of requests exceeded your throughput limit.
ACCESS_DENIED The user is not authorized to perform the action.
IMAGE_TOO_LARGE The input image size exceeds the allowed limit.
INVALID_IMAGE_FORMAT The provided image format is not supported.
LIMIT_EXCEEDED The service limit was exceeded.
THROTTLING_ERROR The service is temporarily unable to process the request.
UNKNOWN_ERROR Unexpected runtime error.
BAD_REQUEST The server could not understand the request due to invalid syntax.
UNAUTHORIZED The request has not been applied because it lacks valid authentication.
FORBIDDEN The server understood the request but refuses to authorize it.
NOT_FOUND The server could not find what was requested.
METHOD_NOT_ALLOWED The request method is known by the server but is not supported.
NOT_ACCEPTABLE The server cannot produce a response matching the list of acceptable values.
REQUEST_TIMEOUT The server would like to shut down this unused connection.
CONFLICT The request conflicts with current state of the server.
GONE No longer available.
UNPROCESSABLE_ENTITY The server understands the content type of the request entity, and the syntax of the request entity is correct, but it was unable to process the contained instructions.
LOCKED The resource that is being accessed is locked.
FAILED_DEPENDENCY The request failed due to failure of a previous request.
INTERNAL_SERVER_ERROR The server has encountered a situation it doesn’t know how to handle.
NOT_IMPLEMENTED The request method is not supported by the server and cannot be handled.
BAD_GATEWAY The server got an invalid response.
SERVICE_UNAVAILABLE The server is not ready to handle the request.
GATEWAY_TIMEOUT The server is acting as a gateway and cannot get a response in time.

Transcribe

The transcribe connector provides a standard mechanism to obtain speech to text information from audio and video files using Amazon Transcribe.

Installation

The connector is a Spring Boot application that is included as a separate service of your AAE deployment.

AWS Configuration

Alfresco recommends you access AWS using AWS Identity and Access Management (IAM). To use IAM to access AWS, create an IAM user, add the user to an IAM group with administrative permissions, and then grant administrative permissions to the IAM user. You can then access AWS using a special URL and the IAM user’s credentials.

BPMN Tasks Configuration

As part of BPMN definition process, any service task responsible for triggering speech to text needs to set transcribe.TRANSCRIBE as the value for its implementation attribute.

In addition to the above configuration, these variables are required to perform the audio analysis:

The input parameters of the transcribe connector are:

Parameter Type Description
file Array Required. File to be transcribed. If multiple files are passed, only the first one will be processed.
timeout Integer Optional. Timeout for the remote call to transcribe service in milliseconds. The default is ${aws.transcribe.asynchTimeout}.
generateWebVTT Boolean Optional The output webVTT is only populated if generateWebVTT is set to true.

The output parameters of the Transcribe connector are:

Parameter Type Description
awsResult JSON Optional. Result of the AWS Transcribe speech to text process.
transcription String Required. Transcription result.
webVTT JSON Optional Subtitles result in Web Video Text Tracks format.

Transcribe configuration parameters

The configuration parameters for the Transcribe connector are:

Parameter Description
AWS_ACCESS_KEY_ID Required. The access key to authenticate against AWS.
AWS_SECRET_KEY Required. The secret key to authenticate against AWS.
AWS_REGION Required. The region of AWS to use the Textract service.
AWS_S3_BUCKET Required. The name of the S3 bucket to use.
AWS_TRANSCRIBE_LANGUAGES List of comma separated languages that are spoken in the audio/video file.

Transcribe errors

The possible errors that can be handled by the Transcribe connector are:

Error Description
MISSING_INPUT A mandatory input variable was not provided.
INVALID_INPUT The input variable has an invalid type.
INVALID_RESULT_FORMAT The REST service result payload cannot be parsed.
LIMIT_EXCEEDED The service limit was exceeded.
ACCESS_DENIED The user is not authorized to perform the action.
INTERNAL_FAILURE An internal Amazon Lex error occurred.
UNKNOWN_ERROR Unexpected runtime error.
BAD_REQUEST The server could not understand the request due to invalid syntax.
UNAUTHORIZED The request has not been applied because it lacks valid authentication.
FORBIDDEN The server understood the request but refused to authorize it.
NOT_FOUND The server could not find what was requested.
METHOD_NOT_ALLOWED The request method is known by the server but is not supported.
NOT_ACCEPTABLE The server cannot produce a response matching the list of acceptable values.
REQUEST_TIMEOUT The server is requesting to shut down this unused connection.
CONFLICT The request conflicts with the current state of the server.
GONE No longer available.
UNPROCESSABLE_ENTITY The server understands the content type of the request entity, and the syntax of the request entity is correct, but it was unable to process the contained instructions.
LOCKED The resource that is being accessed is locked.
FAILED_DEPENDENCY The request failed due to failure of a previous request.
INTERNAL_SERVER_ERROR The server has encountered a situation and does not know how to handle it.
NOT_IMPLEMENTED The request method is not supported by the server and cannot be handled.
BAD_GATEWAY The server got an invalid response.
SERVICE_UNAVAILABLE The server is not ready to handle the request.
GATEWAY_TIMEOUT The server is acting as a gateway and cannot get a response in time.

Limitations

Minimum confidence is not currently supported. The confidence is however included as part of the response.

Edit this page

Suggest an edit on GitHub
By clicking "Accept Cookies", you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View Cookie Policy.