Alfresco Docs - AWS connectors

There are five connectors that can be used to invoke different Amazon Web Services (AWS):

Lambda
Comprehend
Rekognition
Textract
Transcribe

All Amazon connectors are displayed on the process diagram with their respective AWS logos.

Important: All AWS connectors require an AWS account with permission to access the features provided by Amazon. This account is separate from the Alfresco hosted environment and must be created and managed by the customer.

Lambda

The INVOKE action is used by the Lambda connector to invoke Amazon Web Services (AWS) Lambda functions.

The input parameters to invoke a Lambda function are:

Parameter	Type	Description
function	String	Required. The name of the Lambda function to invoke, for example `lambda-2`.
payload	JSON	Optional. The payload that will be passed to the Lambda function as a JSON object.

The output parameters from invoking a Lambda function are:

Parameter	Type	Description
lambdaPayload	JSON	Optional. The Lambda function results payload.
lambdaStatus	Integer	Optional. The Lambda function invocation status code.
lambdaLog	String	Optional. The log produced during the function invocation.
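
As an illustration of the parameters above, the following Python sketch assembles the input variables for the INVOKE action and checks them the way the tables suggest. The helper name `build_lambda_inputs` is hypothetical, and labelling the exceptions with MISSING_INPUT and INVALID_INPUT is an assumption about how such checks relate to the connector errors, not the connector's own code.

```python
import json


def build_lambda_inputs(function, payload=None):
    """Assemble INVOKE input variables (hypothetical helper, not part of the connector)."""
    if function is None or function == "":
        # 'function' is the only required input.
        raise ValueError("MISSING_INPUT: 'function' is required")
    if not isinstance(function, str):
        raise TypeError("INVALID_INPUT: 'function' must be a string")

    inputs = {"function": function}
    if payload is not None:
        try:
            json.dumps(payload)  # the payload must be expressible as JSON
        except (TypeError, ValueError) as exc:
            raise TypeError("INVALID_INPUT: 'payload' is not valid JSON") from exc
        inputs["payload"] = payload
    return inputs


# Example values only: 'lambda-2' comes from the table, the payload is made up.
inputs = build_lambda_inputs("lambda-2", {"orderId": 42})
```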

Lambda configuration parameters

The configuration parameters for the Lambda connector are:

Parameter	Description
AWS_LAMBDA_AWS_ACCESS_KEY	Required. The access key to authenticate against AWS.
AWS_LAMBDA_AWS_SECRET_KEY	Required. The secret key to authenticate against AWS.
AWS_LAMBDA_AWS_REGION	Required. The region of AWS to invoke the Lambda functions.
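
A deployment script might check that all three settings are present before starting the connector. The sketch below assumes the parameters are supplied as environment variables, which this section does not state explicitly, and the helper name `missing_settings` is hypothetical.

```python
import os

REQUIRED_LAMBDA_SETTINGS = (
    "AWS_LAMBDA_AWS_ACCESS_KEY",
    "AWS_LAMBDA_AWS_SECRET_KEY",
    "AWS_LAMBDA_AWS_REGION",
)


def missing_settings(required, env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Placeholder values for illustration; never hard-code real credentials.
example_env = {
    "AWS_LAMBDA_AWS_ACCESS_KEY": "example-access-key",
    "AWS_LAMBDA_AWS_REGION": "eu-west-1",
}
missing = missing_settings(REQUIRED_LAMBDA_SETTINGS, example_env)
```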

Lambda errors

The possible errors that can be handled by the Lambda connector are:

Error	Description
MISSING_INPUT	A mandatory input variable was not provided.
INVALID_INPUT	The input variable has an invalid type.
SERVICE_ERROR	The service encountered an internal error.
INVALID_REQUEST	The request body could not be parsed as JSON.
REQUEST_TOO_LARGE	The request payload exceeded the Invoke request body JSON input limit.
UNKNOWN_ERROR	Unexpected runtime error.
BAD_REQUEST	The server could not understand the request due to invalid syntax.
UNAUTHORIZED	The request has not been applied because it lacks valid authentication.
FORBIDDEN	The server understood the request but refuses to authorize it.
NOT_FOUND	The server could not find what was requested.
METHOD_NOT_ALLOWED	The request method is known by the server but is not supported.
NOT_ACCEPTABLE	The server cannot produce a response matching the list of acceptable values.
REQUEST_TIMEOUT	The server would like to shut down this unused connection.
CONFLICT	The request conflicts with current state of the server.
GONE	No longer available.
UNPROCESSABLE_ENTITY	The server understands the content type of the request entity, and the syntax of the request entity is correct, but it was unable to process the contained instructions.
LOCKED	The resource that is being accessed is locked.
FAILED_DEPENDENCY	The request failed due to failure of a previous request.
INTERNAL_SERVER_ERROR	The server has encountered a situation it doesn’t know how to handle.
NOT_IMPLEMENTED	The request method is not supported by the server and cannot be handled.
BAD_GATEWAY	The server got an invalid response.
SERVICE_UNAVAILABLE	The server is not ready to handle the request.
GATEWAY_TIMEOUT	The server is acting as a gateway and cannot get a response in time.
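
Many of the error names above match the names of standard HTTP status codes. The sketch below relies on that observation to look up an error name from a numeric status. The mapping is an assumption based on the names alone, since this page does not list numeric codes, and `error_name_for_status` is a hypothetical helper.

```python
from http import HTTPStatus

# Error names from the table above that match standard HTTP status names.
HTTP_STYLE_ERRORS = (
    "BAD_REQUEST", "UNAUTHORIZED", "FORBIDDEN", "NOT_FOUND",
    "METHOD_NOT_ALLOWED", "NOT_ACCEPTABLE", "REQUEST_TIMEOUT", "CONFLICT",
    "GONE", "UNPROCESSABLE_ENTITY", "LOCKED", "FAILED_DEPENDENCY",
    "INTERNAL_SERVER_ERROR", "NOT_IMPLEMENTED", "BAD_GATEWAY",
    "SERVICE_UNAVAILABLE", "GATEWAY_TIMEOUT",
)

# Look each name up in the standard library to obtain its numeric code.
STATUS_TO_ERROR = {HTTPStatus[name].value: name for name in HTTP_STYLE_ERRORS}


def error_name_for_status(status_code):
    """Map an HTTP status code to an error name, falling back to UNKNOWN_ERROR."""
    return STATUS_TO_ERROR.get(status_code, "UNKNOWN_ERROR")
```

A process designer could use a lookup like this to decide which error boundary events to model for a given downstream failure.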

Comprehend

The Comprehend connector provides a standard mechanism to extract entities and personally identifiable information (PII) entities from text in your documents. The ENTITY action is used by the Comprehend connector to execute Amazon Comprehend natural language processing (NLP) services that identify and analyze text from specific plain text files. The Comprehend connector supports default entity recognition, custom entity recognition, and custom document classification.

Note: The Comprehend connector can only receive either files or text but not both at the same time.

The Comprehend connector can extract entities and PII from the following file formats:

text/plain
text/html
text/xml
text/csv
application/x-tar
application/zip
application/vnd.ms-outlook
application/pdf (max size in bytes: 26214400)
application/msword
application/vnd.ms-project
application/vnd.ms-powerpoint
application/vnd.visio
application/vnd.ms-excel
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.ms-word.document.macroenabled.12
application/vnd.openxmlformats-officedocument.wordprocessingml.document
application/vnd.ms-word.template.macroenabled.12
application/vnd.openxmlformats-officedocument.wordprocessingml.template
application/vnd.ms-powerpoint.template.macroenabled.12
application/vnd.openxmlformats-officedocument.presentationml.template
application/vnd.ms-powerpoint.addin.macroenabled.12
application/vnd.ms-powerpoint.slideshow.macroenabled.12
application/vnd.openxmlformats-officedocument.presentationml.slideshow
application/vnd.ms-powerpoint.presentation.macroenabled.12
application/vnd.openxmlformats-officedocument.presentationml.presentation
application/vnd.ms-powerpoint.slide.macroenabled.12
application/vnd.openxmlformats-officedocument.presentationml.slide
application/vnd.ms-excel.addin.macroenabled.12
application/vnd.ms-excel.sheet.binary.macroenabled.12
application/vnd.ms-excel.sheet.macroenabled.12
application/vnd.ms-excel.template.macroenabled.12
application/vnd.openxmlformats-officedocument.spreadsheetml.template
application/x-cpio
application/java-archive
application/x-netcdf
application/x-gzip
application/x-hdf
application/vnd.apple.keynote
application/vnd.apple.numbers
application/vnd.apple.pages
application/vnd.oasis.opendocument.chart
application/vnd.oasis.opendocument.image
application/vnd.oasis.opendocument.text-master
application/vnd.oasis.opendocument.presentation
application/vnd.oasis.opendocument.spreadsheet
application/vnd.oasis.opendocument.text
application/ogg
application/vnd.oasis.opendocument.text-web
application/vnd.oasis.opendocument.presentation-template
application/vnd.oasis.opendocument.spreadsheet-template
application/vnd.oasis.opendocument.text-template
application/x-rar-compressed
application/rss+xml
application/rtf
application/vnd.sun.xml.writer
application/xhtml+xml
application/x-compress

AWS Configuration

The connector calls the Amazon Comprehend APIs described in the sections that follow.

To perform these calls the connector uses the AWS Comprehend SDK, which requires an IAM user with the correct permissions. The easiest way to do this is to give an IAM user the AWS managed policy ComprehendFullAccess. If you want to be stricter with access rights, see the list of all Comprehend API permissions.

The asynchronous calls also require the ability to read from and write to an Amazon S3 bucket, so the IAM user must have access to the configured bucket. As well as the IAM user, the Comprehend service itself requires access to the data. For more, see Role-Based Permissions Required for Asynchronous Operations.

To allow the library to use this IAM user when communicating with the Comprehend service, an AWS access key and secret key must be available. For more, see Using the Default Credential Provider Chain.

DetectDominantLanguage

The entity detection calls need to be supplied with the language of the text document that is going to be processed. To detect it, the connector calls the DetectDominantLanguage API. The DetectDominantLanguage call only works on text smaller than a configurable limit, which is 5000 bytes by default, so the connector uses only the first bytes of the document to determine which language to use when making calls to Amazon Comprehend.

The DetectDominantLanguage service currently supports a greater set of languages than the entity detection services. The connector handles this by checking the returned language against a configurable list of available languages.

Note: Currently only EN and ES are supported by AWS entity detection. If the detected language is not in this list a configurable default language is used instead, which is EN by default.
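
The language handling described above can be sketched in Python as follows. This is a minimal illustration, not the connector's implementation: the function names are hypothetical, the language code is assumed to arrive as a short string such as `en` or `fr`, and the sample is cut at exactly the limit although the text only says the call works on text smaller than the limit.

```python
def language_sample(text, limit_bytes=5000):
    """Return the leading part of the text that fits in limit_bytes of UTF-8."""
    encoded = text.encode("utf-8")[:limit_bytes]
    # Drop a multi-byte character that was cut in half at the boundary.
    return encoded.decode("utf-8", errors="ignore")


def choose_language(detected, supported=("en", "es"), default="en"):
    """Use the detected language if entity detection supports it, else the default."""
    code = detected.lower()
    return code if code in supported else default


# A detector (not shown) would receive language_sample(document_text) and
# return a language code; choose_language then applies the fallback rule.
language = choose_language("fr")
```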

Entities

The following Amazon Comprehend API operations are used for entity detection:

Entity
DetectEntities
BatchDetectEntities
StartEntitiesDetectionJob
DescribeEntitiesDetectionJob

Depending on the size of the input the connector will process it in a different way.

The DetectEntities operation will be called if the supplied text file is smaller than a configurable limit, by default 5000 bytes. If the input file is larger than this then a different API must be used.

The BatchDetectEntities operation is used if the file is larger than the DetectEntities limit. It also has a configurable limit, by default 125000 bytes. When you use the Batch API call the input file is split into chunks of less than the configurable limit, by default 5000 bytes.

The StartEntitiesDetectionJob and DescribeEntitiesDetectionJob operations are used if the input file is larger than the BatchDetectEntities limit, by default 125000 bytes. Similar to the batch approach, the input file will be divided into a set of smaller files of a certain configured size. When dividing the original file, the engine ensures that each part only includes full words and that the file is never split on a non-whitespace character.
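
The size-based routing and the word-preserving split can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names are hypothetical, an input exactly equal to a limit is sent to the next tier (the text does not say which side the boundary falls on), and the splitter collapses runs of whitespace to a single space, which the connector may not do.

```python
DETECT_LIMIT_BYTES = 5000    # default DetectEntities limit
BATCH_LIMIT_BYTES = 125000   # default BatchDetectEntities limit


def select_operation(size_bytes, detect_limit=DETECT_LIMIT_BYTES,
                     batch_limit=BATCH_LIMIT_BYTES):
    """Pick the Comprehend operation for an input of the given size."""
    if size_bytes < detect_limit:
        return "DetectEntities"
    if size_bytes < batch_limit:
        return "BatchDetectEntities"
    return "StartEntitiesDetectionJob"


def split_on_whitespace(text, max_bytes):
    """Split text into chunks smaller than max_bytes without breaking words."""
    chunks = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate.encode("utf-8")) < max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single word at or above the limit is kept whole in this sketch.
            current = word
    if current:
        chunks.append(current)
    return chunks


chunks = split_on_whitespace("aa bb cc", 6)
```

Measuring in UTF-8 bytes rather than characters matters because the limits are expressed in bytes and non-ASCII characters occupy more than one byte.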

The divided files are then uploaded to Amazon S3 using the same key prefix for all files. When all of them have been uploaded an asynchronous entity detection job is started. This is then followed by a polling process to check the status of the job until it finishes or the timeout is reached.

If the asynchronous job finishes successfully, a compressed output file (output.tar.gz) with the result will be written by Amazon Comprehend. The file will be saved to the same bucket within a directory that uses the same key prefix. For more, see Asynchronous Batch Processing. The output file is downloaded from Amazon S3 and parsed into a BatchDetectEntitiesResult object. At the end of the process, all the resource files are cleaned up, both locally and in Amazon S3.
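
The polling step can be pictured with the following Python sketch. It is not the connector's code: `wait_for_job` is a hypothetical helper, the status lookup is injected as a function so that no AWS call is made here, and raising an error labelled CLIENT_EXECUTION_TIMEOUT is an assumption about how a timeout is reported. The status names are the job states used by Amazon Comprehend.

```python
import time

FINAL_STATES = ("COMPLETED", "FAILED", "STOPPED")


def wait_for_job(get_status, timeout_s, interval_s=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the job reaches a final state or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in FINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError("CLIENT_EXECUTION_TIMEOUT: job did not finish in time")
        sleep(interval_s)


# Simulated job that completes on the third poll (no real service involved).
statuses = iter(["SUBMITTED", "IN_PROGRESS", "COMPLETED"])
result = wait_for_job(lambda: next(statuses), timeout_s=5, interval_s=0,
                      sleep=lambda seconds: None)
```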

PII Entities

Entity
DetectPiiEntities
StartPiiEntitiesDetectionJob
DescribePiiEntitiesDetectionJob

Depending on the size of the input the connector will process it in a different way.

The DetectPiiEntities operation will be called if the supplied text file is smaller than a configurable limit, by default 5000 bytes. If the input file is larger than this then a different API must be used.

The StartPiiEntitiesDetectionJob and DescribePiiEntitiesDetectionJob operations are used if the input file is larger than the AsynchDetectPIIEntities limit, by default 5000 bytes. The input file will be divided into a set of smaller files of a certain configured size. When dividing the original file, the engine ensures that each part only includes full words and that the file is never split on a non-whitespace character.

If the asynchronous job finishes successfully, a compressed output file (output.tar.gz) with the result will be written by Amazon Comprehend. The file will be saved to the same bucket within a directory that uses the same key prefix. For more, see Asynchronous Batch Processing. The output file is downloaded from Amazon S3 and parsed into a BatchDetectPiiResult object. At the end of the process, all the resource files are cleaned up, both locally and in Amazon S3.

The StartDocumentClassificationJob operation is always performed asynchronously. It requires a custom model and the classifier ARN must be provided. You can provide the custom classification ARN in two ways:

Use the AWS_COMPREHEND_CUSTOM_CLASSIFICATION_ARN environment variable when deploying the application.
Use the customClassificationArn input variable in the connector action. If the variable is not provided the AWS_COMPREHEND_CUSTOM_CLASSIFICATION_ARN value is used.
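
The precedence between the two options can be sketched as follows. This Python example is illustrative only: `resolve_classifier_arn` is a hypothetical helper, and the ARN shown is a placeholder rather than a real resource.

```python
import os

ARN_ENV_VAR = "AWS_COMPREHEND_CUSTOM_CLASSIFICATION_ARN"


def resolve_classifier_arn(custom_classification_arn=None, env=None):
    """Prefer the input variable, then fall back to the environment variable."""
    env = os.environ if env is None else env
    arn = custom_classification_arn or env.get(ARN_ENV_VAR)
    if not arn:
        # The environment variable has no default, so one of the two must be set.
        raise ValueError("no custom classification ARN was provided")
    return arn


# Placeholder ARN for illustration only.
example_env = {ARN_ENV_VAR: "arn:aws:comprehend:eu-west-1:123456789012:document-classifier/example"}
arn = resolve_classifier_arn(None, example_env)
```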

BPMN Tasks Configuration

The following describes an example of how the text analysis connector is set up in AAE:

BPMN

As part of the BPMN definition process, any service task responsible for triggering the text analysis must have comprehend.ENTITY set as the value of its implementation attribute.

The following variables must be configured for the text analysis to function.

The input parameters of Entity detection are:

Parameter	Type	Description
files	Array	Optional. The file to be analysed. If multiple files are passed, then only the first one will be analysed.
text	String	Optional. The text to be analysed. If the `files` parameter is set, then this should be left blank.
maxEntities	Integer	Optional. The maximum number of entities that is returned. The parameter defaults to `${aws.comprehend.defaultMaxResults}`.
confidenceLevel	Float	Optional. The minimum confidence level for an entity, expressed as a float number between 0 and 1. The parameter defaults to `${aws.comprehend.defaultConfidence}`.
timeout	Integer	Optional. The timeout for the remote call to the Comprehend service in milliseconds. The parameter defaults to `${aws.comprehend.asynchTimeout}`.
customRecognizerArn	String	Optional. The custom recognizer ARN endpoint. If left blank, the Comprehend service will use the value given to the `AWS_COMPREHEND_CUSTOM_RECOGNIZER_ARN` environment variable.

Note: The connector must receive either files or text but not both at the same time.
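
The rules in the table and note above can be expressed as a small validation step. The Python sketch below is an assumption about how such a check could look and is not taken from the connector. The helper name `validate_entity_inputs` and the shape of its return value are hypothetical.

```python
def validate_entity_inputs(files=None, text=None, confidence_level=None):
    """Check the files/text rule and the confidence range, then pick the input."""
    has_files = bool(files)
    has_text = bool(text)
    if has_files == has_text:
        # Either both were supplied or neither was.
        raise ValueError("provide exactly one of 'files' or 'text'")
    if confidence_level is not None and not 0 <= float(confidence_level) <= 1:
        raise ValueError("'confidenceLevel' must be between 0 and 1")
    # Only the first file is analysed when several are passed.
    return {"file": files[0]} if has_files else {"text": text}


selected = validate_entity_inputs(files=[{"nodeId": "first"}, {"nodeId": "second"}],
                                  confidence_level=0.95)
```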

The following is an example of the POST body for the Activiti REST API http:////rb/v1/process-instances endpoint:


{
  "processDefinitionKey": "TextAnalysisProcessTest",
  "processInstanceName": "processTextAnalysisTest_Simple",
  "businessKey": "MyBusinessKey",
  "variables": {
    "file": [
      {
        "nodeId": "ad844189-9afb-4afb-965b-380db73022aa"
      }
    ],
    "maxEntities": 50,
    "confidenceLevel": "0.95",
    "timeout": 10000
  },
  "payloadType": "StartProcessPayload"
}
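
A client could assemble the same request body programmatically. The Python sketch below builds the request with the standard library but does not send it. The base URL is a made-up placeholder because the host and application name depend on the deployment, authentication is omitted, and `build_start_process_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_start_process_request(base_url, node_id):
    """Build (but do not send) the POST request that starts the process."""
    body = {
        "processDefinitionKey": "TextAnalysisProcessTest",
        "processInstanceName": "processTextAnalysisTest_Simple",
        "businessKey": "MyBusinessKey",
        "variables": {
            "file": [{"nodeId": node_id}],
            "maxEntities": 50,
            "confidenceLevel": "0.95",
            "timeout": 10000,
        },
        "payloadType": "StartProcessPayload",
    }
    return urllib.request.Request(
        url=f"{base_url}/rb/v1/process-instances",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL; replace it with the address of your own deployment.
request = build_start_process_request(
    "http://localhost:8080/my-app",
    "ad844189-9afb-4afb-965b-380db73022aa",
)
```

Passing the returned object to `urllib.request.urlopen` would send the request once the credentials required by your deployment have been added.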

In the business process definition, the text analysis service task called textAnalysisTask has the implementation attribute configured to use the connector.


<bpmn2:serviceTask id="ServiceTask_0j2v2yc" name="textAnalysisTask" implementation="comprehend.ENTITY">
  <bpmn2:incoming>SequenceFlow_0kibr65</bpmn2:incoming>
  <bpmn2:outgoing>SequenceFlow_1048knn</bpmn2:outgoing>
</bpmn2:serviceTask>

The input parameters of PII Entity detection are:

Parameter	Type	Description
files	Array	Optional. The file to be analysed. If multiple files are passed, then only the first one will be analysed.
text	String	Optional. The text to be analysed. If the `files` parameter is set, then this should be left blank.
maxEntities	Integer	Optional. The maximum number of entities that is returned. The parameter defaults to `${aws.comprehend.defaultMaxResults}`.
confidenceLevel	Float	Optional. The minimum confidence level for an entity, expressed as a float number between 0 and 1. The parameter defaults to `${aws.comprehend.defaultConfidence}`.
timeout	Integer	Optional. The timeout for the remote call to the Comprehend service in milliseconds. The parameter defaults to `${aws.comprehend.asynchTimeout}`.

Note: The connector must receive either files or text but not both at the same time.

The input parameters of Document classification are:

Parameter	Type	Description
files	Array	Optional. The file to be analysed. If multiple files are passed, then only the first one will be analysed.
text	String	Optional. The text to be analysed. If the `files` parameter is set, then this should be left blank.
maxEntities	Integer	Optional. The maximum number of entities that is returned. The parameter defaults to `${aws.comprehend.defaultMaxResults}`.
confidenceLevel	Float	Optional. The minimum confidence level for an entity, expressed as a float number between 0 and 1. The parameter defaults to `${aws.comprehend.defaultConfidence}`.
timeout	Integer	Optional. The timeout for the remote call to the Comprehend service in milliseconds. The parameter defaults to `${aws.comprehend.asynchTimeout}`.
customClassificationArn	String	Optional. The custom classification ARN endpoint. If left blank, the Comprehend service will use the value given to the `AWS_COMPREHEND_CUSTOM_CLASSIFICATION_ARN` environment variable. Note: The `AWS_COMPREHEND_CUSTOM_CLASSIFICATION_ARN` environment variable does not have a default value. If it is being used then you must also set a value for it.

Note: The connector must receive either files or text but not both at the same time.

The output parameters from the entity detection are:

Parameter	Type	Description
awsResponse	JSON	Optional. The result of the analysis from the Comprehend service.
aisResponse	JSON	Optional. The result of the analysis in Alfresco Intelligence Service format.
entities	JSON	Optional. The result object containing the entities detected.

The output parameters from the PII entity detection are:

Parameter	Type	Description
awsResponse	JSON	Optional. The result of the analysis from the Comprehend service.
piiEntityTypes	JSON	Optional. The result object containing the PII entities detected.

The output parameters from the Document classification are:

Parameter	Type	Description
awsResponse	Object	Optional. The object that contains the original result of the text analysis performed by the Comprehend service.
documentClassificationClasses	Array	Optional. An array that contains the list of the different classes detected in the analysis.

Comprehend configuration parameters

The configuration parameters for the Comprehend connector are:

Parameter	Description
AWS_ACCESS_KEY_ID	Required. The access key to authenticate against AWS.
AWS_SECRET_KEY	Required. The secret key to authenticate against AWS.
AWS_REGION	Required. The region of AWS to use the Comprehend service.
AWS_S3_BUCKET	Required. The name of the S3 bucket to use.
AWS_COMPREHEND_ROLE_ARN	Required. The Amazon Resource Name for Comprehend to use.

Comprehend errors

The possible errors that can be handled by the Comprehend connector are:

Error	Description
MISSING_INPUT	A mandatory input variable was not provided.
INVALID_INPUT	The input variable has an invalid type.
INVALID_RESULT_FORMAT	The REST service result payload cannot be parsed.
TEXT_SIZE_LIMIT_EXCEEDED	The size of the input text exceeds the limit.
TOO_MANY_REQUEST	The request throughput limit was exceeded.
UNSUPPORTED_LANGUAGE	The language of the input text can’t be processed.
CLIENT_EXECUTION_TIMEOUT	The execution ends because of timeout.
UNKNOWN_ERROR	Unexpected runtime error.
BAD_REQUEST	The server could not understand the request due to invalid syntax.
UNAUTHORIZED	The request has not been applied because it lacks valid authentication.
FORBIDDEN	The server understood the request but refuses to authorize it.
NOT_FOUND	The server could not find what was requested.
METHOD_NOT_ALLOWED	The request method is known by the server but is not supported.
NOT_ACCEPTABLE	The server cannot produce a response matching the list of acceptable values.
REQUEST_TIMEOUT	The server would like to shut down this unused connection.
CONFLICT	The request conflicts with current state of the server.
GONE	No longer available.
UNPROCESSABLE_ENTITY	The server understands the content type of the request entity, and the syntax of the request entity is correct, but it was unable to process the contained instructions.
LOCKED	The resource that is being accessed is locked.
FAILED_DEPENDENCY	The request failed due to failure of a previous request.
INTERNAL_SERVER_ERROR	The server has encountered a situation it doesn’t know how to handle.
NOT_IMPLEMENTED	The request method is not supported by the server and cannot be handled.
BAD_GATEWAY	The server got an invalid response.
SERVICE_UNAVAILABLE	The server is not ready to handle the request.
GATEWAY_TIMEOUT	The server is acting as a gateway and cannot get a response in time.

Limitations

PDF files larger than 26214400 bytes are not supported.

Rekognition

The LABEL action is used by the Rekognition connector to execute Amazon Rekognition services that identify and label the objects in JPEG and PNG files that are less than 15 MB in size.

The Amazon Rekognition API that is called is the Detect Labels API.

Files between 5 MB and 15 MB are uploaded to an Amazon S3 bucket before processing. The IAM user configured to run the Rekognition service requires access to this bucket and to the Rekognition service itself, and must have the rekognition:DetectLabels permission.
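
The size thresholds above imply three cases, sketched here in Python. The sketch assumes the limits are binary megabytes (1024 * 1024 bytes) and that a file of exactly 5 MB goes through Amazon S3. Neither detail is stated in this section, and `rekognition_route` is a hypothetical helper.

```python
MB = 1024 * 1024  # assumption: limits are binary megabytes


def rekognition_route(size_bytes):
    """Decide how an image of the given size would be handled."""
    if size_bytes >= 15 * MB:
        return "unsupported"   # the file is too large for the connector
    if size_bytes >= 5 * MB:
        return "s3"            # uploaded to the S3 bucket before processing
    return "inline"            # sent to the service directly


route = rekognition_route(10 * MB)
```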

The input parameters of the Rekognition connector are:

Parameter	Type	Description
file	File	Required. A variable of type file to send for analysis.
mediaType	String	Optional. The media type of the file to be analyzed, for example `application/octet-stream`.
maxLabels	Integer	Optional. The maximum number of labels to be returned. The default value is `10`.
confidenceLevel	String	Optional. The confidence level to use in the analysis between 0 and 1, for example `0.75`.
timeout	Integer	Optional. The timeout period for calling the Rekognition service in milliseconds, for example `910000`.

The output parameters from the Rekognition analysis are:

Parameter	Type	Description
awsResponse	JSON	Optional. The result of the analysis from the Rekognition service.
aisResponse	JSON	Optional. The result of the image analysis in Alfresco Intelligence Service format.
labels	JSON	Optional. The result object containing the labels detected.

Rekognition configuration parameters

The configuration parameters for the Rekognition connector are:

Parameter	Description
AWS_ACCESS_KEY_ID	Required. The access key to authenticate against AWS.
AWS_SECRET_KEY	Required. The secret key to authenticate against AWS.
AWS_REGION	Required. The region of AWS to use the Rekognition service in.
AWS_S3_BUCKET	Required. The name of the S3 bucket to use.

Rekognition errors

The possible errors that can be handled by the Rekognition connector are:

Error	Description
MISSING_INPUT	A mandatory input variable was not provided.
INVALID_INPUT	The input variable has an invalid type.
INVALID_RESULT_FORMAT	The REST service result payload cannot be parsed.
PROVISIONED_THROUGHPUT_EXCEEDED	The number of requests exceeded your throughput limit.
ACCESS_DENIED	The user is not authorized to perform the action.
IMAGE_TOO_LARGE	The input image size exceeds the allowed limit.
INVALID_IMAGE_FORMAT	The provided image format is not supported.
LIMIT_EXCEEDED	The service limit was exceeded.
THROTTLING_ERROR	The service is temporarily unable to process the request.
UNKNOWN_ERROR	Unexpected runtime error.
BAD_REQUEST	The server could not understand the request due to invalid syntax.
UNAUTHORIZED	The request has not been applied because it lacks valid authentication.
FORBIDDEN	The server understood the request but refuses to authorize it.
NOT_FOUND	The server could not find what was requested.
METHOD_NOT_ALLOWED	The request method is known by the server but is not supported.
NOT_ACCEPTABLE	The server cannot produce a response matching the list of acceptable values.
REQUEST_TIMEOUT	The server would like to shut down this unused connection.
CONFLICT	The request conflicts with current state of the server.
GONE	No longer available.
UNPROCESSABLE_ENTITY	The server understands the content type of the request entity, and the syntax of the request entity is correct, but it was unable to process the contained instructions.
LOCKED	The resource that is being accessed is locked.
FAILED_DEPENDENCY	The request failed due to failure of a previous request.
INTERNAL_SERVER_ERROR	The server has encountered a situation it doesn’t know how to handle.
NOT_IMPLEMENTED	The request method is not supported by the server and cannot be handled.
BAD_GATEWAY	The server got an invalid response.
SERVICE_UNAVAILABLE	The server is not ready to handle the request.
GATEWAY_TIMEOUT	The server is acting as a gateway and cannot get a response in time.

Textract

The EXTRACT action is used by the Textract connector to execute Amazon Textract, which extracts text and metadata from JPEG and PNG files that are less than 5 MB in size.

The Amazon Textract APIs called are the Detect Document Text API, whose LINE block objects are joined with a line separator between them, and the Analyze Document API with FORMS and TABLES analysis.
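
Joining the LINE blocks can be sketched in a few lines of Python. The example works on a hand-written dictionary shaped like an Amazon Textract response (a `Blocks` list whose items carry `BlockType` and `Text`). The helper name `lines_to_text` and the sample content are made up, and using `\n` as the separator is an assumption.

```python
def lines_to_text(response, separator="\n"):
    """Join the text of all LINE blocks, in order, with a line separator."""
    return separator.join(
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    )


# Hand-written sample; a real response contains many more fields per block.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 1001"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "LINE", "Text": "Total: 25.00"},
    ]
}
text = lines_to_text(sample_response)
```

WORD blocks are skipped because their text is already contained in the LINE block they belong to.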

The IAM user configured to run the Textract services needs to have the textract:DetectDocumentText and textract:AnalyzeDocument permissions.

The input parameters of the Textract connector are:

Parameter	Type	Description
file	File	Required. A variable of type file to send for extraction.
outputFormat	String	Optional. The format of the output file. Possible values are `JSON` and `TXT`. The default value is `JSON`.
confidenceLevel	String	Optional. The confidence level to use in the analysis between 0 and 1, for example `0.75`.
timeout	Integer	Optional. The timeout period for calling the Textract service in milliseconds, for example `910000`.

The output parameters from the Textract analysis are:

Parameter	Type	Description
awsResult	JSON	Optional. The result of the analysis from the Textract service.

Textract configuration parameters

The configuration parameters for the Textract connector are:

Parameter	Description
AWS_ACCESS_KEY_ID	Required. The access key to authenticate against AWS.
AWS_SECRET_KEY	Required. The secret key to authenticate against AWS.
AWS_REGION	Required. The region of AWS to use the Textract service.
AWS_S3_BUCKET	Required. The name of the S3 bucket to use.

Textract errors

The possible errors that can be handled by the Textract connector are:

Error	Description
MISSING_INPUT	A mandatory input variable was not provided.
INVALID_INPUT	The input variable has an invalid type.
INVALID_RESULT_FORMAT	The REST service result payload cannot be parsed.
PROVISIONED_THROUGHPUT_EXCEEDED	The number of requests exceeded your throughput limit.
ACCESS_DENIED	The user is not authorized to perform the action.
IMAGE_TOO_LARGE	The input image size exceeds the allowed limit.
INVALID_IMAGE_FORMAT	The provided image format is not supported.
LIMIT_EXCEEDED	The service limit was exceeded.
THROTTLING_ERROR	The service is temporarily unable to process the request.
UNKNOWN_ERROR	Unexpected runtime error.
BAD_REQUEST	The server could not understand the request due to invalid syntax.
UNAUTHORIZED	The request has not been applied because it lacks valid authentication.
FORBIDDEN	The server understood the request but refuses to authorize it.
NOT_FOUND	The server could not find what was requested.
METHOD_NOT_ALLOWED	The request method is known by the server but is not supported.
NOT_ACCEPTABLE	The server cannot produce a response matching the list of acceptable values.
REQUEST_TIMEOUT	The server would like to shut down this unused connection.
CONFLICT	The request conflicts with current state of the server.
GONE	No longer available.
UNPROCESSABLE_ENTITY	The server understands the content type of the request entity, and the syntax of the request entity is correct, but it was unable to process the contained instructions.
LOCKED	The resource that is being accessed is locked.
FAILED_DEPENDENCY	The request failed due to failure of a previous request.
INTERNAL_SERVER_ERROR	The server has encountered a situation it doesn’t know how to handle.
NOT_IMPLEMENTED	The request method is not supported by the server and cannot be handled.
BAD_GATEWAY	The server got an invalid response.
SERVICE_UNAVAILABLE	The server is not ready to handle the request.
GATEWAY_TIMEOUT	The server is acting as a gateway and cannot get a response in time.

Transcribe

The Transcribe connector provides a standard mechanism to obtain speech-to-text information from audio and video files using Amazon Transcribe.

Installation

The connector is a Spring Boot application that is included as a separate service of your AAE deployment.

AWS Configuration

Alfresco recommends you access AWS using AWS Identity and Access Management (IAM). To use IAM to access AWS, create an IAM user, add the user to an IAM group with administrative permissions, and then grant administrative permissions to the IAM user. You can then access AWS using a special URL and the IAM user’s credentials.

BPMN Tasks Configuration

As part of the BPMN definition process, any service task responsible for triggering speech to text must have transcribe.TRANSCRIBE set as the value of its implementation attribute.

In addition to the above configuration, these variables are required to perform the audio analysis:

The input parameters of the transcribe connector are:

Parameter	Type	Description
file	Array	Required. File to be transcribed. If multiple files are passed, only the first one will be processed.
timeout	Integer	Optional. Timeout for the remote call to transcribe service in milliseconds. The default is `${aws.transcribe.asynchTimeout}`.
generateWebVTT	Boolean	Optional. The output webVTT is only populated if generateWebVTT is set to `true`.

The output parameters of the Transcribe connector are:

Parameter	Type	Description
awsResult	JSON	Optional. Result of the AWS Transcribe speech to text process.
transcription	String	Required. Transcription result.
webVTT	JSON	Optional. Subtitles result in Web Video Text Tracks format.
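
For readers unfamiliar with Web Video Text Tracks, the following Python sketch shows what the format looks like by turning timed text segments into WebVTT cues. It only illustrates the format. How the connector builds or wraps its webVTT output is not described here, and the function names and sample segment are hypothetical.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues):
    """Render (start, end, text) tuples as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


subtitles = to_webvtt([(0.0, 2.5, "Hello and welcome.")])
```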

Transcribe configuration parameters

The configuration parameters for the Transcribe connector are:

Parameter	Description
AWS_ACCESS_KEY_ID	Required. The access key to authenticate against AWS.
AWS_SECRET_KEY	Required. The secret key to authenticate against AWS.
AWS_REGION	Required. The region of AWS to use the Transcribe service.
AWS_S3_BUCKET	Required. The name of the S3 bucket to use.
AWS_TRANSCRIBE_LANGUAGES	List of comma separated languages that are spoken in the audio/video file.

Transcribe errors

The possible errors that can be handled by the Transcribe connector are:

Error	Description
MISSING_INPUT	A mandatory input variable was not provided.
INVALID_INPUT	The input variable has an invalid type.
INVALID_RESULT_FORMAT	The REST service result payload cannot be parsed.
LIMIT_EXCEEDED	The service limit was exceeded.
ACCESS_DENIED	The user is not authorized to perform the action.
INTERNAL_FAILURE	An internal Amazon Transcribe error occurred.
UNKNOWN_ERROR	Unexpected runtime error.
BAD_REQUEST	The server could not understand the request due to invalid syntax.
UNAUTHORIZED	The request has not been applied because it lacks valid authentication.
FORBIDDEN	The server understood the request but refused to authorize it.
NOT_FOUND	The server could not find what was requested.
METHOD_NOT_ALLOWED	The request method is known by the server but is not supported.
NOT_ACCEPTABLE	The server cannot produce a response matching the list of acceptable values.
REQUEST_TIMEOUT	The server is requesting to shut down this unused connection.
CONFLICT	The request conflicts with the current state of the server.
GONE	No longer available.
UNPROCESSABLE_ENTITY	The server understands the content type of the request entity, and the syntax of the request entity is correct, but it was unable to process the contained instructions.
LOCKED	The resource that is being accessed is locked.
FAILED_DEPENDENCY	The request failed due to failure of a previous request.
INTERNAL_SERVER_ERROR	The server has encountered a situation and does not know how to handle it.
NOT_IMPLEMENTED	The request method is not supported by the server and cannot be handled.
BAD_GATEWAY	The server got an invalid response.
SERVICE_UNAVAILABLE	The server is not ready to handle the request.
GATEWAY_TIMEOUT	The server is acting as a gateway and cannot get a response in time.

Limitations

Minimum confidence is not currently supported. The confidence is however included as part of the response.
