Alfresco Content Services

Scheduled Jobs Extension Point

Content Services automatically runs a number of scheduled jobs, for example the content store cleaner job and temporary file cleaner job. It is possible to configure new scheduled jobs.

Architecture Information: Platform Architecture

Description

A scheduled job in Content Services can be compared to a Unix cron job. It is kicked off based on a cron expression and can then execute a piece of Java code or JavaScript code. The repository embeds the Quartz job scheduler, which is configured through the Spring Framework's Quartz integration. Quartz works with triggers, jobs, and job details to enable definition of all kinds of scheduled jobs. To define a new job, start with the job implementation by creating a class with an execute method as follows:

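package org.alfresco.tutorial.scheduledjob.actions;

import org.alfresco.service.ServiceRegistry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ScheduledJobExecuter {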
    private static final Logger LOG = LoggerFactory.getLogger(ScheduledJobExecuter.class);

    /**
     * Public API access
     */
    private ServiceRegistry serviceRegistry;

    public void setServiceRegistry(ServiceRegistry serviceRegistry) {
        this.serviceRegistry = serviceRegistry;
    }

    /**
     * Executer implementation
     */
    public void execute() {
        LOG.info("Running the scheduled job");

        // Work/Job implementation goes here...
    }
}

The class can be called anything you like, but it is good practice to name it after the job it is executing. This example is just a template that writes a log statement. Use the ServiceRegistry to get to any Public API services that are needed for the implementation, such as the NodeService.

We then create the Job details class as follows:

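package org.alfresco.tutorial.scheduledjob.jobs;

import org.alfresco.error.AlfrescoRuntimeException;
import org.alfresco.repo.security.authentication.AuthenticationUtil;
import org.alfresco.schedule.AbstractScheduledLockedJob;
import org.alfresco.tutorial.scheduledjob.actions.ScheduledJobExecuter;
import org.quartz.JobDataMap;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.StatefulJob;

public class ScheduledJob extends AbstractScheduledLockedJob implements StatefulJob {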
    @Override
    public void executeJob(JobExecutionContext context) throws JobExecutionException {
        JobDataMap jobData = context.getJobDetail().getJobDataMap();

        // Extract the Job executer to use
        Object executerObj = jobData.get("jobExecuter");
        // instanceof is false for null, so this also covers a missing entry
        if (!(executerObj instanceof ScheduledJobExecuter)) {
            throw new AlfrescoRuntimeException(
                    "ScheduledJob data must contain a valid 'jobExecuter' reference");
        }

        final ScheduledJobExecuter jobExecuter = (ScheduledJobExecuter) executerObj;

        AuthenticationUtil.runAs(new AuthenticationUtil.RunAsWork<Object>() {
            public Object doWork() throws Exception {
                jobExecuter.execute();
                return null;
            }
        }, AuthenticationUtil.getSystemUserName());
    }
}

The Job details class extends the AbstractScheduledLockedJob class, which uses the job lock service to lock the job so that it can run safely in a cluster. It is also important that it implements the StatefulJob interface so that the job is not triggered concurrently on different nodes. The Job details class expects the Job executer to be passed into it so that it can use it to execute the scheduled job. The runAs method makes it possible to set which user the job runs as; in this case it is the System user. To use the Admin user instead, change AuthenticationUtil.getSystemUserName() to AuthenticationUtil.getAdminUserName(). This is the only implementation needed; the rest is Spring configuration.
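The idea of locking a job so that overlapping runs are skipped can be illustrated with a small, self-contained sketch. This is only a single-JVM analogy built on the JDK: the class LockedJobSketch and its methods are hypothetical names, and it does not use or reproduce the job lock service, which coordinates across cluster nodes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical single-JVM analogy of a locked job.
 * It is NOT the Alfresco job lock service; it only shows the
 * "skip this run if another run holds the lock" behaviour.
 */
class LockedJobSketch {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private int completedRuns = 0;

    /** Runs the work only if the lock is free; returns true if the work ran. */
    boolean tryExecute(Runnable work) {
        // Atomically take the lock; fail fast if someone else holds it.
        if (!locked.compareAndSet(false, true)) {
            return false;
        }
        try {
            work.run();
            completedRuns++;
            return true;
        } finally {
            // Always release the lock, even if the work throws.
            locked.set(false);
        }
    }

    int completedRuns() {
        return completedRuns;
    }

    public static void main(String[] args) {
        LockedJobSketch job = new LockedJobSketch();
        // A second attempt made while the first run is still in progress is skipped.
        boolean ran = job.tryExecute(() ->
                System.out.println("overlapping attempt ran: " + job.tryExecute(() -> { })));
        System.out.println("first attempt ran: " + ran);
    }
}
```

In the sketch the overlapping attempt is simply skipped rather than queued, which mirrors the intent described above: only one run of the job should be active at a time.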

Start by defining a Spring bean for the Job executer as follows:

<bean id="org.alfresco.tutorial.scheduledjob.actions.ScheduledJobExecuter"
    class="org.alfresco.tutorial.scheduledjob.actions.ScheduledJobExecuter">
  <property name="serviceRegistry">
      <ref bean="ServiceRegistry" />
  </property>
</bean>

Then the Job detail bean:

<bean id="org.alfresco.tutorial.scheduledjob.jobDetail" 
      class="org.springframework.scheduling.quartz.JobDetailFactoryBean">
  <property name="jobClass">
      <value>org.alfresco.tutorial.scheduledjob.jobs.ScheduledJob</value>
  </property>
  <property name="jobDataAsMap">
      <map>
          <entry key="jobExecuter" value-ref="org.alfresco.tutorial.scheduledjob.actions.ScheduledJobExecuter" />
          <entry key="jobLockService" value-ref="jobLockService" />
      </map>
  </property>
</bean>

The Job detail bean takes a jobClass, which is the Job details implementation, and a reference to the jobExecuter bean that will do the actual work. The jobLockService is passed in to handle locking in a cluster environment.

The next step is to define a Job trigger bean:

<bean id="org.alfresco.tutorial.scheduledjob.trigger" 
       class="org.springframework.scheduling.quartz.CronTriggerFactoryBean">
    <property name="jobDetail" ref="org.alfresco.tutorial.scheduledjob.jobDetail" />
    <property name="cronExpression" value="${org.alfresco.tutorial.scheduledjob.cronexpression}" />
    <property name="startDelay" value="${org.alfresco.tutorial.scheduledjob.cronstartdelay}" />
</bean>

In this case we have defined a Cron trigger, but other triggers such as SimpleTriggerFactoryBean can be used too. The trigger bean takes a reference to the jobDetail bean so it knows what job to kick off. It also takes two parameters: the cron expression and the cron start delay. Here these parameters are set via external properties, which is good practice because it lets a System Administrator manage the scheduled jobs. These properties go into the module's alfresco-global.properties as follows:

org.alfresco.tutorial.scheduledjob.cronexpression=0 0/2 * * * ?
org.alfresco.tutorial.scheduledjob.cronstartdelay=240000
org.alfresco.tutorial.scheduledjob.enabled=true

In this case the scheduled job is set up to run every second minute, with a start delay of 4 minutes (240000 milliseconds). The start delay is important because it makes it possible to delay all scheduled jobs until the Content Services server has started up properly; otherwise search, for example, might not work properly.
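To make the meaning of the three property values concrete, the following sketch reads them with the JDK only and interprets them. The class ScheduledJobSettings and its methods are hypothetical and are not part of Content Services; the sketch assumes the Quartz cron field order of seconds, minutes, hours, day-of-month, month, day-of-week.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.util.Properties;

/** Hypothetical helper that interprets the scheduled job properties shown above. */
class ScheduledJobSettings {
    static final String PREFIX = "org.alfresco.tutorial.scheduledjob.";

    final String cronExpression;
    final Duration startDelay;
    final boolean enabled;

    ScheduledJobSettings(Properties props) {
        this.cronExpression = props.getProperty(PREFIX + "cronexpression");
        // The start delay is given in milliseconds.
        this.startDelay = Duration.ofMillis(
                Long.parseLong(props.getProperty(PREFIX + "cronstartdelay")));
        this.enabled = Boolean.parseBoolean(props.getProperty(PREFIX + "enabled"));
    }

    /** Parses properties-file text, for example the content of alfresco-global.properties. */
    static ScheduledJobSettings parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ScheduledJobSettings(props);
    }

    /** Returns the minutes field, the second field of a Quartz cron expression. */
    String minutesField() {
        return cronExpression.trim().split("\\s+")[1];
    }

    public static void main(String[] args) {
        ScheduledJobSettings settings = parse(String.join("\n",
                PREFIX + "cronexpression=0 0/2 * * * ?",
                PREFIX + "cronstartdelay=240000",
                PREFIX + "enabled=true"));
        System.out.println("minutes field: " + settings.minutesField());
        System.out.println("start delay (minutes): " + settings.startDelay.toMinutes());
        System.out.println("enabled: " + settings.enabled);
    }
}
```

The minutes field 0/2 means "starting at minute 0, every 2 minutes", and 240000 milliseconds converts to the 4 minute start delay.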

The last thing needed to get this scheduled job going is to pass the job trigger to the scheduler. This can be done as follows:

<bean id="org.alfresco.tutorial.scheduledjob.SchedulerAccessor" 
      class="org.alfresco.schedule.AlfrescoSchedulerAccessorBean">
    <property name="scheduler" ref="schedulerFactory"/>
    <property name="triggers">
      <list>
          <ref bean="org.alfresco.tutorial.scheduledjob.trigger"/>
      </list>
    </property>
    <property name="enabled" value="${org.alfresco.tutorial.scheduledjob.enabled}" />
</bean>

Note that neither Spring’s org.springframework.scheduling.quartz.SchedulerAccessorBean nor the trigger bean definition has an enabled flag. If it’s required to control the scheduling of the trigger via properties we can use the org.alfresco.schedule.AlfrescoSchedulerAccessorBean instead of Spring’s accessor, as in the example above.

Sometimes we might want the scheduled job to be applied to a set of nodes determined by a query and the job implementation to be in the form of a Repository Action. Having the job implemented as a repository action is handy as it can then be re-used in other places. Implementing this kind of scheduled job usually starts off with the repository action class:

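package org.alfresco.tutorial.scheduledjob.actions;

import java.util.List;

import org.alfresco.model.ContentModel;
import org.alfresco.repo.action.ParameterDefinitionImpl;
import org.alfresco.repo.action.executer.ActionExecuterAbstractBase;
import org.alfresco.service.ServiceRegistry;
import org.alfresco.service.cmr.action.Action;
import org.alfresco.service.cmr.action.ParameterDefinition;
import org.alfresco.service.cmr.dictionary.DataTypeDefinition;
import org.alfresco.service.cmr.repository.NodeRef;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SimpleRepoActionExecuter extends ActionExecuterAbstractBase {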
    private static final Logger LOG = LoggerFactory.getLogger(SimpleRepoActionExecuter.class);

    public static final String PARAM_SIMPLE = "simpleParam";

    /**
     * The Alfresco Service Registry that gives access to all public content services in Alfresco.
     */
    private ServiceRegistry serviceRegistry;

    public void setServiceRegistry(ServiceRegistry serviceRegistry) {
        this.serviceRegistry = serviceRegistry;
    }

    @Override
    protected void addParameterDefinitions(List<ParameterDefinition> paramList) {
        paramList.add(new ParameterDefinitionImpl(
                PARAM_SIMPLE,
                DataTypeDefinition.TEXT,
                true,
                getParamDisplayLabel(PARAM_SIMPLE)));
    }

    @Override
    protected void executeImpl(Action action, NodeRef actionedUponNodeRef) {
        // Get parameter values
        String simpleParam = (String) action.getParameterValue(PARAM_SIMPLE);

        LOG.info("Simple Repo Action called from scheduled Job, [" + PARAM_SIMPLE + "=" + simpleParam + "]");

        if (serviceRegistry.getNodeService().exists(actionedUponNodeRef)) {
            // The implementation of the Repo Action goes here...
            String nodeName = (String)serviceRegistry.getNodeService().getProperty(
                    actionedUponNodeRef, ContentModel.PROP_NAME);

            LOG.info("Simple Repo Action invoked on node [name=" + nodeName + "]");
        }
    }
}

The repository action class extends ActionExecuterAbstractBase as usual and implements the addParameterDefinitions and executeImpl methods that are part of the action interface. For more information about how to implement repository actions, see the repository actions documentation. We use the ServiceRegistry to get to the public API, such as the NodeService. The actionedUponNodeRef will contain a node that is part of the result of a query set up in a Spring bean (we will configure this bean in a bit).

This is the only Java code that needs to be implemented; the rest is Spring bean configuration. Let’s start with the repository action:

<bean id="simple-action"
      class="org.alfresco.tutorial.scheduledjob.actions.SimpleRepoActionExecuter"
      parent="action-executer">
    <property name="serviceRegistry">
        <ref bean="ServiceRegistry" />
    </property>
</bean>

This simple repository action has been given the ID simple-action. The next bean is:

<bean id="templateActionModelFactory"
      class="org.alfresco.repo.action.scheduled.FreeMarkerWithLuceneExtensionsModelFactory">
    <property name="serviceRegistry">
        <ref bean="ServiceRegistry" />
    </property>
</bean>

This defines a factory implementation that builds suitable models for the FreeMarker templating language. Next is the template action bean definition that will refer to the repository action bean and pass in any needed parameters to the action:

<bean id="org.alfresco.tutorial.scheduledjob.repoaction.simpleTemplateActionDefinition"
      class="org.alfresco.repo.action.scheduled.SimpleTemplateActionDefinition">
    <property name="actionName">
        <value>simple-action</value>
    </property>
    <property name="parameterTemplates">
        <map>
            <entry>
                <key><value>simpleParam</value></key>
                <value>Simple param value</value>
            </entry>
        </map>
    </property>
    <property name="templateActionModelFactory">
        <ref bean="templateActionModelFactory" />
    </property>
    <property name="dictionaryService">
        <ref bean="DictionaryService" />
    </property>
    <property name="actionService">
        <ref bean="ActionService" />
    </property>
    <property name="templateService">
        <ref bean="TemplateService" />
    </property>
</bean>

Here the actionName property references the repository action ID, which is simple-action in our case. The parameterTemplates map contains any parameters that the repository action expects, such as simpleParam. The last bean we need to define is the one that specifies the node query and the cron expression for when to run the scheduled job:

<bean id="org.alfresco.tutorial.scheduledjob.repoaction.simpleRepoActionCronJob"
      class="org.alfresco.repo.action.scheduled.CronScheduledQueryBasedTemplateActionDefinition">
    <property name="transactionMode">
        <value>UNTIL_FIRST_FAILURE</value>
    </property>
    <property name="compensatingActionMode">
        <value>IGNORE</value>
    </property>
    <property name="searchService">
        <ref bean="SearchService" />
    </property>
    <property name="templateService">
        <ref bean="TemplateService" />
    </property>
    <property name="queryLanguage">
        <value>lucene</value>
    </property>
    <property name="stores">
        <list>
            <value>workspace://SpacesStore</value>
        </list>
    </property>
    <property name="queryTemplate">
        <value>PATH:"/app:company_home/*"</value>
    </property>
    <property name="cronExpression">
        <value>${org.alfresco.tutorial.scheduledjob.repoaction.cronexpression}</value>
    </property>
    <property name="jobName">
        <value>SimpleRepoActionJob</value>
    </property>
    <property name="jobGroup">
        <value>AlfrescoTutorialsJobGroup</value>
    </property>
    <property name="triggerName">
        <value>triggerSimpleRepoAction</value>
    </property>
    <property name="triggerGroup">
        <value>AlfrescoTutorialsTriggers</value>
    </property>
    <property name="scheduler">
        <ref bean="schedulerFactory" />
    </property>
    <property name="actionService">
        <ref bean="ActionService" />
    </property>
    <property name="templateActionModelFactory">
        <ref bean="templateActionModelFactory" />
    </property>
    <property name="templateActionDefinition">
        <ref bean="org.alfresco.tutorial.scheduledjob.repoaction.simpleTemplateActionDefinition" />
    </property>
    <property name="transactionService">
        <ref bean="TransactionService" />
    </property>
    <property name="runAsUser">
        <value>System</value>
    </property>
</bean>

The queryTemplate property should contain the node query. In this case we have specified a Lucene PATH query, PATH:"/app:company_home/*", that returns the nodes directly under /Company Home. The repository action, which is indirectly specified via the templateActionDefinition property, is called once for each of these nodes. The whole job is kicked off based on the cronExpression that we specify; in this case it is passed in via an external property that is specified in the alfresco-global.properties file.

When the job is kicked off and the repository action is called for each node matching the query, a couple of parameters can be used to control the behavior. The first is the transactionMode:

ISOLATED_TRANSACTIONS - for each node the action is run in an isolated transaction. Failures are logged.
UNTIL_FIRST_FAILURE - for each node the action is run in an isolated transaction. Processing stops at the first failure.
ONE_TRANSACTION - the actions for all nodes are run in one transaction. One failure will roll back all.
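The difference between the three modes can be illustrated with a small simulation. This is a sketch using only the JDK, not the repository's implementation: the class TransactionModeSketch, its run method, and the use of plain strings for nodes are all hypothetical, and "committed" here simply means that the action's effect on that node would be kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical simulation of the three transactionMode values described above. */
class TransactionModeSketch {
    enum Mode { ISOLATED_TRANSACTIONS, UNTIL_FIRST_FAILURE, ONE_TRANSACTION }

    /**
     * Applies the action to each node and returns the nodes whose changes
     * would be committed. The action returns false to signal a failure.
     */
    static List<String> run(Mode mode, List<String> nodes, Predicate<String> action) {
        List<String> committed = new ArrayList<>();
        for (String node : nodes) {
            if (action.test(node)) {
                committed.add(node);
                continue;
            }
            switch (mode) {
                case ISOLATED_TRANSACTIONS:
                    // Only this node's transaction fails; log and carry on.
                    System.out.println("Action failed for " + node + ", continuing");
                    break;
                case UNTIL_FIRST_FAILURE:
                    // Earlier nodes stay committed, remaining nodes are not processed.
                    return committed;
                case ONE_TRANSACTION:
                    // A single shared transaction: one failure rolls back everything.
                    return new ArrayList<>();
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("a", "b", "c");
        Predicate<String> failsOnB = node -> !node.equals("b");
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " commits " + run(mode, nodes, failsOnB));
        }
    }
}
```

With three nodes and a failure on the second one, the simulation keeps the first and third node for ISOLATED_TRANSACTIONS, only the first for UNTIL_FIRST_FAILURE, and none for ONE_TRANSACTION.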

Then there is the compensatingActionMode parameter:

IGNORE - This parameter is not used when the action is implemented as a SimpleTemplateActionDefinition.
RUN_COMPENSATING_ACTIONS_ON_FAILURE - Indicates that if the action fails, another (compensating) action should be called. This requires the use of the compensatingTemplateActionDefinition property in the SimpleTemplateActionDefinition definition.

Sometimes it is convenient to implement the action in server-side JavaScript. This can be done with the following type of SimpleTemplateActionDefinition:

<bean id="runScriptAction" class="org.alfresco.repo.action.scheduled.SimpleTemplateActionDefinition">
    <property name="actionName">
        <value>script</value>
    </property>
    <property name="parameterTemplates">
        <map>
            <entry>
                <key>
                    <value>script-ref</value>
                </key>
                <value>\$\{selectSingleNode('workspace://SpacesStore', 'lucene', 'PATH:"/app:company_home/app:dictionary/app:scripts/cm:exampleScript.js"' )\}</value>
            </entry>
        </map>
    </property>
    <property name="templateActionModelFactory">
        <ref bean="templateActionModelFactory"/>
    </property>
    <property name="dictionaryService">
        <ref bean="DictionaryService"/>
    </property>
    <property name="actionService">
        <ref bean="ActionService"/>
    </property>
    <property name="templateService">
        <ref bean="TemplateService"/>
    </property>
</bean>

Here the actionName property refers to one of the out-of-the-box repository actions, called script, which can be used to execute JavaScript. The script is passed in as the script-ref parameter.

Out-of-the-box scheduled job definitions

Alfresco runs a number of scheduled jobs that assist in the maintenance of a production environment.

These jobs are defined in the scheduled-jobs-context.xml file.

Scheduled job	Description
`contentStoreCleanerTrigger`	Launches the `contentStoreCleaner` bean, which identifies, and deletes or purges orphaned content from the content store while the system is running. Content is said to be orphaned when all references to a content binary have been removed from the metadata. By default, this job is triggered at 4:00 am each day. In a clustered environment, this job could be enabled on a headless (non-public) node only, which will improve efficiency.
`nodeServiceCleanupTrigger`	Performs cleanup operations on DM node data, including old deleted nodes and old transactions. In a clustered environment, this job could be enabled on a headless (non-public) node only, which will improve efficiency.
`tempFileCleanerTrigger`	Cleans up all Alfresco temporary files that are older than the given number of hours. Subdirectories are also emptied and all directories below the primary temporary subdirectory are removed. The job data must include the `protectHours` property, which is the number of hours to protect a temporary file from deletion since its last modification. The `system.tempFileCleaner.maxFilesToDelete` parameter can be used to configure the maximum number of files that the `tempFileCleanerTrigger` can delete when it is triggered; the default value is `null` and its data type is `Long`. The `system.tempFileCleaner.maxTimeToRun` parameter can be used to set the maximum duration for the `tempFileCleanerTrigger` to run after it starts; the default value is `null` and its data type is `Duration`.
`deadClusterNodesCleanupJobDetail`	Enterprise only feature. Schedules the removal of dead cluster nodes from the database so that live nodes do not spend time trying to reconnect with dead nodes. By default, this job deletes nodes which are inactive for more than 24 hours. This period is configurable, by using the property `alfresco.cluster.deadNodes.cleaner.inactivePeriodMilliseconds` which is available inside the `repository.properties` file. Currently this job is triggered at 3:00 am on Saturday of every week. You can customize the schedule using the cron expression property `alfresco.cluster.deadNodes.cleaner.cronExpression` which is also available in the `repository.properties` file.

Deployment - App Server

This approach is most suitable for JavaScript-backed jobs that use the script template action. For Java-backed jobs, use a Repository JAR extension module.

Spring Beans: tomcat/shared/classes/alfresco/extension/my-content-model-context.xml (File name has to end in -context.xml to be picked up as Spring Bean context file)
JavaScript file: Upload to /Company Home/Data Dictionary/Scripts

These file locations are untouched by re-deployments and upgrades.

Deployment - All-in-One SDK project

Java job implementations: aio/platform-jar/src/main/java/{custom package path}
Job default configuration: aio/platform-jar/src/main/resources/alfresco/module/platform-jar/alfresco-global.properties
Spring Beans: aio/platform-jar/src/main/resources/alfresco/module/platform-jar/context/scheduler-context.xml

Sample Code

Java-based Job implementation and Repo action Job implementation. (Note: this is 5.x code and it differs slightly, because Spring was updated to version 5.x in ACS 6.x. A few details of implementing Quartz schedulers changed, but not many; compare the code above with the tutorial code.)
