Use this information to import files using the Bulk Import Tool, or transfer files using the File System Transfer Receiver (FSTR).
The Bulk Import tool provides a mechanism for Systems Administrators to import existing content in bulk into a repository from the Alfresco server’s file system.
It optionally replaces content items that already exist in the repository, but it never deletes content. It is not designed to fully synchronize the repository with the local file system.
The basic on-disk file and folder structure is preserved as is in the repository. You can load metadata for the files and spaces being ingested, as well as a version history for files (each version consists of content, metadata, or both).
There are a number of restrictions:
- Only one bulk import can be running at a time. This is enforced by the JobLockService.
- Access to the Bulk Import tool is restricted to Community Edition administrators.
- File names of imported files are limited to 255 characters. This limit matches the limit of most file systems.
Prepare file system
There are a number of tasks you must do to prepare the file system before you do the bulk import.
Metadata files
The Bulk Import tool can load metadata (types, aspects, and their properties) into the repository. This is done using “shadow” Java property files in XML format, because that format has good support for Unicode characters. A shadow properties file must have exactly the same name and extension as the file whose metadata it describes, followed by the suffix .metadata.properties.xml. For example, if there is a file called IMG_1967.jpg, the “shadow” metadata file is called IMG_1967.jpg.metadata.properties.xml.
These shadow files can also be used for directories. For example, if you have a directory called MyDocuments, the shadow metadata file is called MyDocuments.metadata.properties.xml.
The metadata file itself follows the usual syntax for Java XML properties files:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
   <entry key="key1">value1</entry>
   <entry key="key2">value2</entry>
    ...
</properties>
There are two special keys:
- type contains the qualified name of the content type to use for the file or folder
- aspects contains a comma-delimited list of the qualified names of the aspect(s) to attach to the file or folder
The remaining entries in the file are treated as metadata properties, with the key being the qualified name of the property and the value being the value of that property. Multi-valued properties are comma-delimited. However, these values are not trimmed so it’s recommended you do not place a space character either before or after the comma, unless you want that in the value of the property.
Here is an example using IMG_1967.jpg.metadata.properties.xml:
 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
 <properties>
    <entry key="type">cm:content</entry>
    <entry key="aspects">cm:versionable,cm:dublincore</entry>
    <entry key="cm:title">A photo of a flower.</entry>
    <entry key="cm:description">A photo I took of a flower while walking around Bantry Bay.</entry>
    <entry key="cm:created">1901-01-01T12:34:56.789+10:00</entry>
    <!-- cm:dublincore properties -->
    <entry key="cm:author">Peter Monks</entry>
    <entry key="cm:publisher">Peter Monks</entry>
    <entry key="cm:contributor">Peter Monks</entry>
    <entry key="cm:type">Photograph</entry>
    <entry key="cm:identifier">IMG_1967.jpg</entry>
    <entry key="cm:dcsource">Canon Powershot G2</entry>
    <entry key="cm:coverage">Worldwide</entry>
    <entry key="cm:rights">Copyright (c) Peter Monks 2002, All Rights Reserved</entry>
    <entry key="cm:subject">A photo of a flower.</entry>
  </properties>
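The naming rule and file format described above can be sketched in Java using the standard `java.util.Properties` class, which writes the same XML properties format. This is a minimal illustration rather than part of the Bulk Import tool: the class and method names (`ShadowMetadataWriter`, `shadowName`, `multiValue`, `writeShadow`) are hypothetical, and the property values are taken from the example above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper for producing shadow metadata files; not part of Alfresco.
class ShadowMetadataWriter {

    /** Returns the shadow file name for a content file or directory name. */
    static String shadowName(String fileName) {
        return fileName + ".metadata.properties.xml";
    }

    /** Joins multi-valued entries with a bare comma, because values are not trimmed on import. */
    static String multiValue(String... values) {
        return String.join(",", values);
    }

    /** Writes the shadow file next to the given content file and returns its path. */
    static Path writeShadow(Path contentFile, Properties metadata) throws IOException {
        Path shadow = contentFile.resolveSibling(shadowName(contentFile.getFileName().toString()));
        try (OutputStream out = Files.newOutputStream(shadow)) {
            // storeToXML emits the Java XML properties format with the requested encoding.
            metadata.storeToXML(out, null, "UTF-8");
        }
        return shadow;
    }

    public static void main(String[] args) throws IOException {
        Properties metadata = new Properties();
        metadata.setProperty("type", "cm:content");
        metadata.setProperty("aspects", multiValue("cm:versionable", "cm:dublincore"));
        metadata.setProperty("cm:title", "A photo of a flower.");

        // A temporary directory stands in for the real import directory.
        Path dir = Files.createTempDirectory("bulk-import-sketch");
        Path content = Files.createFile(dir.resolve("IMG_1967.jpg"));
        System.out.println("Wrote " + writeShadow(content, metadata));
    }
}
```

Note that `Properties` does not guarantee the order of entries in the written file, so the output may list the keys in a different order from the hand-written example.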
Additional notes on metadata loading:
- You can’t create a new node based on metadata only. You must have a content file (even if zero bytes) for the metadata to be loaded. However, you can “replace” an existing node in the repository with nothing but metadata. Despite the confusing name, this won’t replace the content; instead, the new metadata is added to the existing node.
- The metadata must conform to the type and aspect definitions configured in Community Edition (including mandatory fields, constraints, and data types). Any violations will terminate the bulk import process.
- Associations between content items loaded by the tool are not yet well supported. Associations to objects that are already in the repository can be created using the NodeRef of the target object as the value of the property.
- Non-string data types (including numeric and date types) have not been exhaustively tested. Date values have been tested and do work when specified using ISO8601 format.
- Updating the aspects or metadata on existing content won’t remove any existing aspects not listed in the new metadata file; this tool is not intended to provide a full file system synchronization mechanism.
- The metadata loading facility can be used to supplement content that’s already in the repository, without having to upload that content again. To use this, create a “naked” metadata file in the same path as the target content file. The tool will match it up with the file in the repository and add the new aspect(s) and/or metadata to that file.
Version History files
The import tool also supports loading a version history for each file. To do this, create a file with the same name as the main file and append a .v# extension, where # is the version number. For example:
  IMG_1967.jpg.v1   <- version 1 content
  IMG_1967.jpg.v2   <- version 2 content
  IMG_1967.jpg      <- "head" (latest) revision of the content
This also applies to metadata files if you want to capture metadata history as well. For example:
  IMG_1967.jpg.metadata.properties.xml.v1   <- version 1 metadata
  IMG_1967.jpg.metadata.properties.xml.v2   <- version 2 metadata
  IMG_1967.jpg.metadata.properties.xml      <- "head" (latest) revision of the metadata
Additional notes on version history loading:
- You can’t create a new node based on a version history only. You must have a head revision of the file.
- Version numbers do not have to be contiguous. You can number your version files however you want, provided you use whole numbers (integers).
- The version numbers in your version files won’t be used in Community Edition. The version numbers in Community Edition will be contiguous, starting at 1.0 and increasing by 1.0 for every version (so 1.0, 2.0, 3.0, and so on). Community Edition doesn’t allow version labels to be set to arbitrary values, and the bulk import doesn’t provide any way to specify whether a given version should have a major or minor increment.
- Each version can contain a content update, a metadata update or both. You are not limited to updating everything for every version. If not included in a version, the prior version’s content or metadata will remain in place for the next version.
The following example shows all possible combinations of content, metadata, and version files:
  IMG_1967.jpg.v1                           <- version 1 content
  IMG_1967.jpg.metadata.properties.xml.v1   <- version 1 metadata
  IMG_1967.jpg.v2                           <- version 2 content
  IMG_1967.jpg.metadata.properties.xml.v2   <- version 2 metadata
  IMG_1967.jpg.v3                           <- version 3 content (content only version)
  IMG_1967.jpg.metadata.properties.xml.v4   <- version 4 metadata (metadata only version)
  IMG_1967.jpg.metadata.properties.xml      <- "head" (latest) revision of the metadata
  IMG_1967.jpg                              <- "head" (latest) revision of the content
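The naming conventions above can be checked before an import with a small classifier. The following Java sketch assumes only the rules stated in this section (a trailing `.v` plus a whole number marks a version file, and `.metadata.properties.xml` marks a metadata file). The class `ImportFileName` and its fields are hypothetical names, and the tool's own matching logic may differ in edge cases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical classifier for files in an import directory; not part of Alfresco.
class ImportFileName {
    private static final String METADATA_SUFFIX = ".metadata.properties.xml";
    // A trailing ".v" followed by a whole number marks a version file.
    private static final Pattern VERSION_SUFFIX = Pattern.compile("^(.+)\\.v(\\d+)$");

    final String baseName;   // name of the content file this file belongs to
    final boolean metadata;  // true for shadow metadata files
    final int version;       // version number from the file name, or -1 for the head revision

    private ImportFileName(String baseName, boolean metadata, int version) {
        this.baseName = baseName;
        this.metadata = metadata;
        this.version = version;
    }

    static ImportFileName parse(String fileName) {
        String name = fileName;
        int version = -1;
        Matcher m = VERSION_SUFFIX.matcher(name);
        if (m.matches()) {
            name = m.group(1);
            version = Integer.parseInt(m.group(2));
        }
        boolean metadata = name.endsWith(METADATA_SUFFIX);
        if (metadata) {
            name = name.substring(0, name.length() - METADATA_SUFFIX.length());
        }
        return new ImportFileName(name, metadata, version);
    }

    public static void main(String[] args) {
        for (String f : new String[] {
                "IMG_1967.jpg", "IMG_1967.jpg.v3", "IMG_1967.jpg.metadata.properties.xml.v4"}) {
            ImportFileName n = parse(f);
            System.out.println(f + " -> base=" + n.baseName
                    + ", metadata=" + n.metadata + ", version=" + n.version);
        }
    }
}
```

Grouping the parsed names by `baseName` gives a quick way to confirm that every version file has a matching head revision, which the notes above require.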
Import with the Bulk Import tool
You can run a bulk import programmatically.
Community Edition uses web scripts for bulk importing. If you choose to code the bulk import, a code example is provided to help you. In either case, you can use the reference table to determine the fields and data that are required for a successful import.
If you need to troubleshoot or diagnose any issues with a bulk import, you can enable logging. To enable debug logging for the Bulk Import tool, add the following lines to the log4j2.properties file before deployment:
logger.alfresco-repo-bulkimport.name=org.alfresco.repo.bulkimport
logger.alfresco-repo-bulkimport.level=debug
Also set the batch processor logger to at least INFO level:
logger.alfresco-repo-batch-BatchProcessor.name=org.alfresco.repo.batch.BatchProcessor
logger.alfresco-repo-batch-BatchProcessor.level=info
You can also enable logging for the transaction handler to identify any transactional issues during the import:
logger.alfresco-repo-transaction-RetryingTransactionHelper.name=org.alfresco.repo.transaction.RetryingTransactionHelper
logger.alfresco-repo-transaction-RetryingTransactionHelper.level=info
For more information about log4j2, see log4j2.properties file.
Bulk import using a program
The following code example shows how to complete a streaming bulk import programmatically.
Streaming example:
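   // Assumes transactionService, ctx (the Spring application context), folderNode (the
   // target space node), and bulkImporter are supplied by the calling code.
   UserTransaction txn = transactionService.getUserTransaction();
   txn.begin();
   // Run as an administrator; access to the Bulk Import tool is restricted to administrators.
   AuthenticationUtil.setRunAsUser("admin");
   // Create an importer that streams content from the source directory on the server's file system.
   StreamingNodeImporterFactory streamingNodeImporterFactory = (StreamingNodeImporterFactory)ctx.getBean("streamingNodeImporterFactory");
   NodeImporter nodeImporter = streamingNodeImporterFactory.getNodeImporter(new File("importdirectory"));
   BulkImportParameters bulkImportParameters = new BulkImportParameters();
   bulkImportParameters.setTarget(folderNode);        // space that receives the content
   bulkImportParameters.setReplaceExisting(true);     // replace items that already exist
   bulkImportParameters.setBatchSize(40);             // directories and files per batch
   bulkImportParameters.setNumThreads(4);             // number of import threads
   bulkImporter.bulkImport(bulkImportParameters, nodeImporter);
   txn.commit();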
Fields and values
The Bulk Import tool has a number of entry and display fields that are shown in the user interface and are also referenced in the status.xml file that is used if you’re programming a bulk import. The labels, fields, possible values, and a summary of each entry are explained below.
| Field label (from Bulk Import status web page) | Field entry (from status.xml file) | Possible values | Summary | 
|---|---|---|---|
| Current status | <CurrentStatus>Idle</CurrentStatus> | Idle or In Progress | Status of the bulk import | 
| Successful | <ResultOfLastExecution>Yes</ResultOfLastExecution> | Yes, No, or n/a | Result of the bulk import | 
| Batch Size | <batchSize>20</batchSize> | Numeric | The batch size (number of directories and files to import at a time) specified for the bulk import | 
| Number of threads | <numThreads>4</numThreads> | Numeric | The number of threads specified for the bulk import | 
| Source Directory | <SourceDirectory>importdirectory</SourceDirectory> | Alphanumeric | The absolute path of the filesystem directory being imported | 
| Target Space | <TargetSpace>/Company Home</TargetSpace> | Alphanumeric | The path of the Alfresco space where the content is being loaded, starting with /Company Home | 
| Start Date | <StartDate>2014-05-15 01:30:11.912PM</StartDate> | Date and timestamp | Start of the bulk import. Format is YYYY-MM-DD HH:MM:SS.sss followed by AM or PM | 
| End Date | <EndDate>2014-05-15 01:30:12.009PM</EndDate> | Date and timestamp | End of the bulk import. Format is YYYY-MM-DD HH:MM:SS.sss followed by AM or PM | 
| Duration | <DurationInNS>0d 0h 0m 0s 96.941ms</DurationInNS> | Alphanumeric | Time taken for the bulk import to complete. Format is xd xh xm xxs xx.xxxms, where x is a number | 
| Number of Completed Batches | <CompletedBatches>0</CompletedBatches> | Numeric | Number of batches completed in the bulk import | 
| Source (read) Statistics | <SourceStatistics> | | |
| Scanned: Folders | <FoldersScanned>0</FoldersScanned> | Numeric | Number of source folders scanned | 
| Scanned: Files | <FilesScanned>0</FilesScanned> | Numeric | Number of source files scanned | 
| Scanned: Unreadable | <UnreadableEntries>0</UnreadableEntries> | Numeric | Number of unreadable source files | 
| Read: Content | <ContentFilesRead>0</ContentFilesRead> | Numeric | Amount of source content read. Format is numeric with size of content in parentheses | 
| Read: Metadata | <MetadataFilesRead>0</MetadataFilesRead> | Numeric | Amount of source metadata read. Format is numeric with size of metadata in parentheses | 
| Read: Content Versions | <ContentVersionFilesRead>0</ContentVersionFilesRead> | Numeric | Source content versions read. Format is numeric with size of content versions in parentheses | 
| Read: Metadata Versions | <MetadataVersionFilesRead>0</MetadataVersionFilesRead> | Numeric | Source metadata versions read. Format is numeric with size of metadata versions in parentheses | 
| Throughput | N/A | Numeric | Number of entries scanned per second, number of files read per second, and size of data read per second | 
| Target (write) Statistics | <TargetStatistics> | | |
| Space Nodes: # Created | <SpaceNodesCreated>0</SpaceNodesCreated> | Numeric | Number of target space nodes created | 
| Space Nodes: # Replaced | <SpaceNodesReplaced>0</SpaceNodesReplaced> | Numeric | Number of target space nodes replaced | 
| Space Nodes: # Skipped | <SpaceNodesSkipped>0</SpaceNodesSkipped> | Numeric | Number of target space nodes skipped | 
| Space Nodes: # Properties | <SpacePropertiesWritten>0</SpacePropertiesWritten> | Numeric | Number of properties written for target space nodes | 
| Content Nodes: # Created | <ContentNodesCreated>0</ContentNodesCreated> | Numeric | Number of target content nodes created | 
| Content Nodes: # Replaced | <ContentNodesReplaced>0</ContentNodesReplaced> | Numeric | Number of target content nodes replaced | 
| Content Nodes: # Skipped | <ContentNodesSkipped>0</ContentNodesSkipped> | Numeric | Number of target content nodes skipped | 
| Content Nodes: # Data Written | <ContentBytesWritten>0</ContentBytesWritten> | Numeric | Amount of target content node data written | 
| Content Nodes: # Properties | <ContentPropertiesWritten>0</ContentPropertiesWritten> | Numeric | Number of properties written for target content nodes | 
| Content Versions: # Created | <ContentVersionsCreated>0</ContentVersionsCreated> | Numeric | Number of target content versions created | 
| Content Versions: # Data Written | <ContentVersionsBytesWritten>0</ContentVersionsBytesWritten> | Numeric | Amount of target content version data written | 
| Content Versions: # Properties | <ContentVersionsPropertiesWritten>0</ContentVersionsPropertiesWritten> | Numeric | Number of properties written for target content versions | 
| Throughput (write) | N/A | Numeric | Number of nodes scanned per second and size of data written per second | 
| Error Information From Last Run | <ErrorInformation> | | |
| File that failed | <FileThatFailed>n/a</FileThatFailed> | Alphanumeric | The name of the file that failed during the bulk import | 
| Exception | <Exception>exceptionLog</Exception> | Alphanumeric | The stack trace of the exception that occurred during the bulk import | 
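When a program monitors a bulk import, it can read individual fields from the status XML by element name. The Java sketch below uses the standard DOM parser and looks up the element names listed in the table. The class `BulkImportStatusReader`, the `<Status>` root element, and the sample document in `main` are assumptions made for illustration only, because the overall layout of status.xml is not described here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for the bulk import status XML; not part of Alfresco.
class BulkImportStatusReader {

    /** Parses the status XML text into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Status XML could not be parsed", e);
        }
    }

    /** Returns the text of the first element with the given name, or null if it is absent. */
    static String field(Document doc, String elementName) {
        NodeList nodes = doc.getElementsByTagName(elementName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** True when the CurrentStatus field reports Idle, meaning no import is running. */
    static boolean isIdle(Document doc) {
        return "Idle".equals(field(doc, "CurrentStatus"));
    }

    public static void main(String[] args) {
        // Assumed sample: the <Status> root element is hypothetical.
        String sample = "<Status>"
                + "<CurrentStatus>Idle</CurrentStatus>"
                + "<ResultOfLastExecution>Yes</ResultOfLastExecution>"
                + "<batchSize>20</batchSize>"
                + "</Status>";
        Document doc = parse(sample);
        System.out.println("Idle: " + isIdle(doc));
        System.out.println("Last result: " + field(doc, "ResultOfLastExecution"));
    }
}
```

Looking fields up by element name rather than by position keeps the sketch independent of how the elements are nested in the real file.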
Configure File System Transfer Receiver
The File System Transfer Receiver (FSTR) uses the Transfer Service to transfer folders and content from a Community Edition core repository (the DM) to configured targets, for example, a remote file system.
The Transfer Service is accessible as a bean named TransferService, and it can be defined, along with other related beans, in the transfer-service-context.xml Spring context file.
You’ll need to create new transfer targets for content replication, and manually change the type of the transfer target folder to the type trx:fileTransferTarget. This allows you to specify which folder node corresponds to the root folder of the file system receiver by associating the transfer target with a folder (that is, the trx:fileTransferRootFolder association). See Create a new transfer target for file system content replication for more information.
The FSTR supports sync mode transfer, so it can also be used by the replication service. It includes an embedded Derby database to keep track of data (NodeRef to file path mappings, for example), and it runs as a web application in an embedded Tomcat instance using the Web Script Framework and MyBatis.
Set up
The File System Transfer Receiver is delivered as a compressed zip file.
- Download the following file from the Alfresco Community Edition download page: alfresco-file-transfer-receiver-6.2.1.zip
- Extract the zip file into a relevant directory. The File System Transfer Receiver zip file extracts into the following directory structure:
    classes
    lib
    webapps
    file-transfer-receiver.jar
  The following files are contained within the subdirectories:
    /classes
      ftr-custom-context.xml
      ftr-custom.properties
      ftr-launcher-context.xml
      ftr-launcher.properties
      log4j2.properties
    /lib
      *various library files*
    /webapps
      file-transfer-receiver.war
Start
Use this information to start the File System Transfer Receiver.
- Ensure that you’ve expanded the File System Transfer Receiver zip file: alfresco-file-transfer-receiver-6.2.1.zip
- To run the File System Transfer Receiver, enter the following command:
    java -jar file-transfer-receiver.jar
  Information messages indicate that the web application server is starting. You can navigate to http://<FSTR-host-name>:<FSTR-port>/alfresco-ftr/service/index to see if the FSTR is running.
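The browser check above can also be scripted. The following Java sketch builds the index URL from a host and port and sends a GET request with the standard `java.net.http.HttpClient` (Java 11 or later). The class `FstrPing` and its methods are hypothetical names. Treating any response with a status code below 500 as "running" is an assumption, since the page might ask for credentials.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical availability check for the FSTR web application; not part of Alfresco.
class FstrPing {

    /** Builds the index URL that the documentation suggests opening in a browser. */
    static URI indexUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/alfresco-ftr/service/index");
    }

    /** Returns true if the FSTR web application answers the request at all. */
    static boolean isRunning(String host, int port) {
        HttpRequest request = HttpRequest.newBuilder(indexUri(host, port))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<Void> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.discarding());
            // Assumption: any non-server-error response means the web application is up.
            return response.statusCode() < 500;
        } catch (IOException e) {
            return false; // connection refused, timeout, or similar
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // 9090 is the documented default for ftr.tomcat.portNum.
        System.out.println("FSTR running: " + isRunning("localhost", 9090));
    }
}
```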
Launch properties
The launch properties for the File System Transfer Receiver are available in the ftr-launcher.properties file.
This file contains the Tomcat base directory and the port number to start up on.
| Property | Description | 
|---|---|
| ftr.tomcat.baseDir | Specifies the base directory in which the embedded Tomcat web application server is installed. This can either be an absolute path or a path relative to where the server is being started from. The default value of ${user.dir} means that the Tomcat base directory is taken to be the user’s current working directory. | 
| ftr.tomcat.portNum | Specifies the port number on which the FSTR Tomcat web application server is to listen. The default is 9090. | 
Custom properties
The custom properties for the File System Transfer Receiver are available in the ftr-custom.properties file.
This file is used to configure the operation of the FSTR. It contains the settings for the root directory, staging directory, Derby database connection string, user name, and password.
| Property | Description | 
|---|---|
| fileTransferReceiver.stagingDirectory | The staging directory is where the FSTR will temporarily store the files that it receives from the source repository during a transfer. These files include the manifest file that describes the metadata of the nodes being transferred as well as the actual content files associated with those nodes. All of these files are staged in the directory referenced by this property prior to being moved to their correct location below the root directory. The default is ./ftr-staging. | 
| fileTransferReceiver.rootDirectory | Specifies the location of the directory on the local file system that is the top level of the transferred tree of nodes. A node that is a child of the nominated root node of the transfer in the source repository will be placed in the directory referenced by this property when it’s transferred. The default is ./ftr-root. | 
| fileTransferReceiver.jdbcUrl=jdbc:derby:./derbyDB;create=true;user=alfresco;password=alfresco | The FSTR contains an embedded Apache Derby database that it uses to keep track of which nodes it receives and which file on the file system corresponds to which node. This property specifies the connection URL for this embedded database. It is unlikely that it’ll need to be changed. Note: It’s recommended that you do not store the FSTR database on a network file system location. The database must be on a local disk to ensure data integrity. | 
| fileTransferReceiver.username | The user name that the source repository will have to declare when initiating a transfer to this FSTR. This property must correspond with the user name property stored on the transfer target in the source repository. The default is set to admin. | 
| fileTransferReceiver.password | The password that the source repository will have to declare when initiating a transfer to this FSTR. This property must correspond with the password property stored on the transfer target in the source repository. The default is set to admin. | 
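Because the default user name and password are both admin, a deployment check may want to confirm that they have been changed. The Java sketch below loads the text of an ftr-custom.properties file on top of the defaults documented in the table and reports whether the credentials are still at their defaults. The class `FtrCustomSettings` and its methods are hypothetical names, and the sketch does not reflect how the FSTR itself reads the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical settings check for ftr-custom.properties; not part of Alfresco.
class FtrCustomSettings {

    /** Defaults as documented in the custom properties table. */
    static Properties documentedDefaults() {
        Properties defaults = new Properties();
        defaults.setProperty("fileTransferReceiver.stagingDirectory", "./ftr-staging");
        defaults.setProperty("fileTransferReceiver.rootDirectory", "./ftr-root");
        defaults.setProperty("fileTransferReceiver.username", "admin");
        defaults.setProperty("fileTransferReceiver.password", "admin");
        return defaults;
    }

    /** Loads properties text; keys that are missing fall back to the documented defaults. */
    static Properties load(String propertiesText) {
        Properties settings = new Properties(documentedDefaults());
        try {
            settings.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return settings;
    }

    /** True when both credentials still have their default value of admin. */
    static boolean usesDefaultCredentials(Properties settings) {
        return "admin".equals(settings.getProperty("fileTransferReceiver.username"))
                && "admin".equals(settings.getProperty("fileTransferReceiver.password"));
    }

    public static void main(String[] args) {
        Properties settings = load("fileTransferReceiver.rootDirectory=/data/replica\n");
        System.out.println("Root: " + settings.getProperty("fileTransferReceiver.rootDirectory"));
        System.out.println("Default credentials: " + usesDefaultCredentials(settings));
    }
}
```

Whatever values are chosen, the user name and password must match the ones stored on the transfer target in the source repository.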
Log file properties
You can debug File System Transfer Receiver issues using the log4j2.properties file. This section describes the properties that you can set.
For example:
logger.alfresco-repo-transfer-fsr.name=org.alfresco.repo.transfer.fsr
logger.alfresco-repo-transfer-fsr.level=warn
logger.alfresco-repo-web-scripts-transfer.name=org.alfresco.repo.web.scripts.transfer
logger.alfresco-repo-web-scripts-transfer.level=warn
Create new transfer target for file system content replication
The transfer service stores files that control and monitor the operation of the transfer service in the Transfers space in the Data Dictionary.
The Transfer Target Groups space contains the transfer target definitions that specify where transfers go to. There is a group level below the folder which is used to classify different sets of transfer targets. This folder contains a group called Default Group.
You can add transfer targets by creating new transfer folders.
- In the source repository, create a new folder in Company Home > Data Dictionary > Transfers > Transfer Target Groups > Default Group.
    - In the New Folder window specify a name, for example, Replica. You can add a title and description, if you wish. A rule defined on the Default Group folder specializes the type of any folder created in it. The type is set automatically by the folder rule to trx:transferTarget. This allows you to add the required properties to define the replication target through the user interface.
    - Manually change the type of the folder. In the Folder Details page, select Change Type, and then choose File Transfer Target for this new type. This allows you to also set a Root Folder that’s required by the File System Transfer Receiver system.
    - Click Edit Properties on your new folder (Replica).
    - Specify the required properties:
        - Specify the Endpoint Host, Endpoint Port, Username and Password.
        - Specify the rest of the properties to point to the FSTR server that you’ve set up using Start File System Transfer Receiver. Note: Here, you have the option to select the Root Folder. Browse and select a sub-folder of the Document Library in the site from which you plan to transfer the files. For example, if you want to transfer some files from a folder called folder1 in a site called site1, select folder1 as the Root Folder in the properties window.
        - Click Enabled and Save.
    - Enable the replication service in your alfresco-global.properties file and restart the source repository:
        replication.enabled=true
- In the target repository, enable the replication server and content receiver in the alfresco-global.properties file and restart the target repository:
    replication.enabled=true
    transferservice.receiver.enabled=true
- On the source repository, create a replication job to test the target setup.
    - From the toolbar, click Admin Tools and select Replication Jobs from the menu.
    - Click Create Job.
    - Specify properties for Name, Payload, and Transfer Target. Name is a new folder name; for example, Replication Job. Payload is the source content directory, and Transfer Target is the folder name that you set up in the first step (Replica).
    - Click Enabled.
    - Click Create Job.
    - Refresh the screen after a few minutes to see a status change.
- Verify the replication job. Log in to Alfresco Share on the target repository, select a transferred file and click Open in Source Repository to check that content has replicated.