A configuration property called shard.method defines the strategy that controls how documents and ACLs are distributed across shards. This property is set on each Solr instance (that is, on each shard), and each shard uses the strategy to determine whether incoming data belongs to it.
To use a specific sharding method, add the required configuration properties to solrcore.properties when you create a Solr node. The sharding method is set with the required property shard.method. Additional properties may be needed, depending on the method you choose. If an invalid shard.method is provided, the system falls back to DBID routing.
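As a rough illustration of the per-shard setup described above, the following Python sketch renders the sharding lines of solrcore.properties for each shard and applies the documented fallback to DB_ID when the method name is not recognised. The helper names (`effective_method`, `shard_properties`) and the zero-based shard.instance numbering are assumptions made for this example; they are not part of Alfresco Search Services.

```python
# Method names as listed on this page; any other value is treated as invalid.
KNOWN_METHODS = {
    "MOD_ACL_ID", "ACL_ID", "DB_ID", "DB_ID_RANGE",
    "DATE", "PROPERTY", "EXPLICIT_ID",
}


def effective_method(requested: str) -> str:
    """Return the method a shard would end up using.

    Mirrors the documented rule: an invalid value falls back to DBID routing.
    """
    return requested if requested in KNOWN_METHODS else "DB_ID"


def shard_properties(method: str, shard_instance: int, shard_count: int) -> str:
    """Render the sharding lines for one shard (hypothetical helper)."""
    return "\n".join([
        f"shard.method={method}",
        f"shard.instance={shard_instance}",
        f"shard.count={shard_count}",
    ])


if __name__ == "__main__":
    # Every shard carries the same method and count but its own instance number.
    for instance in range(3):  # assumption: instances are numbered from 0
        print(shard_properties(effective_method("ACL_ID"), instance, 3))
        print()
```

The point of the sketch is that the method is not a cluster-wide switch: each shard has its own copy of the setting and its own instance number.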
Solr 6 can use any of the following methods for routing documents and ACLs to shards.
- ACL (MOD_ACL_ID) v1: This sharding method is available in Alfresco Search Services 1.0 and later versions.
Nodes and access control lists are grouped by their ACL ID. This places the nodes together with all the access control information required to determine the access to a node in the same shard. Both the nodes and access control information are sharded. The overall index size will be smaller than with other methods. Also, the ACL count is usually much smaller than the node count.
This method is beneficial if you have lots of ACLs and the documents are evenly distributed over those ACLs. For example, if you have many Share sites, nodes and ACLs are assigned to shards randomly based on the ACL and the documents to which it applies.
The node distribution may be uneven because it depends on how many nodes share each ACL.
To use this method when creating a shard, set the following configuration:
shard.method=MOD_ACL_ID
shard.instance=<shard.instance>
shard.count=<shard.count>
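The modulus-based routing described for ACL v1 can be sketched in a few lines of Python. This is a minimal illustration, assuming that the shard is chosen as the ACL ID modulo the shard count and that a node follows its ACL; the function names and the sample data are invented for the example.

```python
from collections import Counter


def owns_acl(acl_id: int, shard_instance: int, shard_count: int) -> bool:
    """True if this shard should index the ACL and the nodes that use it.

    Assumption: the shard is selected as acl_id modulo shard_count.
    """
    return acl_id % shard_count == shard_instance


def nodes_per_shard(node_acls: dict[int, int], shard_count: int) -> Counter:
    """Count nodes per shard; node_acls maps a node DBID to its ACL ID."""
    counts = Counter()
    for acl_id in node_acls.values():
        # A node lands wherever its ACL lands.
        counts[acl_id % shard_count] += 1
    return counts


# Three nodes share ACL 10, one node uses ACL 11.
example = {1: 10, 2: 10, 3: 10, 4: 11}

if __name__ == "__main__":
    print(nodes_per_shard(example, 2))
```

With two shards, ACL 10 and its three nodes go to one shard and ACL 11 with its single node goes to the other, which shows how shared ACLs can skew the node distribution.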
- ACL (ACL_ID) v2: This method is available in Alfresco Search Services 1.0 and later versions.
This sharding method is the same as ACL ID v1 except that the murmur hash of the ACL ID is used in preference to its modulus. This gives better distribution of ACLs over shards. The distribution of documents over ACLs is not affected, so the shard sizes can still be skewed.
To use this method when creating a shard, set the following configuration:
shard.method=ACL_ID
shard.instance=<shard.instance>
shard.count=<shard.count>
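To see why hashing the ACL ID can spread ACLs better than taking its modulus, consider ACL IDs that follow a pattern. The sketch below is an illustration only: Python's standard library has no murmur hash, so `zlib.crc32` over the decimal string of the ID is used as a stand-in, and the exact hash input and seed used by the product are not reproduced here.

```python
import zlib


def stand_in_hash(acl_id: int) -> int:
    """Placeholder for the murmur hash: CRC32 of the ID's decimal string."""
    return zlib.crc32(str(acl_id).encode("utf-8"))


def shard_by_modulus(acl_id: int, shard_count: int) -> int:
    """ACL v1 style routing (assumed formula)."""
    return acl_id % shard_count


def shard_by_hash(acl_id: int, shard_count: int) -> int:
    """ACL v2 style routing, with CRC32 standing in for murmur."""
    return stand_in_hash(acl_id) % shard_count


# Hypothetical data: every ACL ID is a multiple of 4, and there are 4 shards.
acl_ids = range(0, 4000, 4)
modulus_shards = {shard_by_modulus(a, 4) for a in acl_ids}
hash_shards = {shard_by_hash(a, 4) for a in acl_ids}

if __name__ == "__main__":
    print("shards used by modulus:", sorted(modulus_shards))
    print("shards used by hash:   ", sorted(hash_shards))
```

With the modulus, every ACL in this sample lands on shard 0, while the hashed IDs are spread over more than one shard. Neither approach changes how many documents hang off each ACL, so shard sizes can still differ.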
- DBID (DB_ID): This method is available in all versions of Alfresco Search Services and is the default sharding option in Solr 6. Nodes are evenly distributed over the shards at random based on the murmur hash of the DBID. The access control information is duplicated in each shard. The distribution of nodes over each shard is very even and shards grow at the same rate. Also, this is the fallback method if any other sharding information is unavailable.
To use this method when creating a shard, set the following configuration:
shard.method=DB_ID
shard.instance=<shard.instance>
shard.count=<shard.count>
- DBID range (DB_ID_RANGE): This method is available in Alfresco Search Services 1.1 and later versions.
This routes documents within specific DBID ranges to specific shards. New shards can be added to the cluster without requiring a reindex.
DBID range sharding is the only option that offers auto-scaling, as opposed to defining your exact shard count at the start. All the other sharding methods require repartitioning in some way.
For each shard, you specify the range of DBIDs to be included. As your repository grows you can add shards.
Example 1: You may aim for shards of 20M nodes in size and expect the repository to grow to 100M nodes over five years. You could create the first shard for nodes 0-20M. As you approach node 20M, you can create the next shard for nodes 20M-40M, and so on.
To use this method when creating a shard, set the following configuration:
shard.method=DB_ID_RANGE
shard.range=0-20000000
shard.instance=<shard.instance>
Example 2: If there are 100M (million) nodes and you want to split them into 10 shards with 10M nodes each, at the start you can specify:
- 10 shards
- the first shard to include nodes 0-10M
- the second shard to include nodes 10M-20M, the third shard to include nodes 20M-30M, and so on.
Date-based queries may produce results from only a subset of shards as DBID increases monotonically over time.
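The range check behind DBID range sharding can be sketched as follows. This is a minimal illustration, assuming that the start of shard.range is inclusive and the end is exclusive, so that adjacent ranges such as 0-20000000 and 20000000-40000000 do not overlap; the helper names are invented for the example.

```python
from typing import Optional


def parse_range(shard_range: str) -> tuple[int, int]:
    """Split a shard.range value such as '0-20000000' into (start, end)."""
    start, end = shard_range.split("-")
    return int(start), int(end)


def owns_dbid(dbid: int, shard_range: str) -> bool:
    """True if the DBID falls in this shard's range.

    Assumption: start is inclusive and end is exclusive.
    """
    start, end = parse_range(shard_range)
    return start <= dbid < end


def find_shard(dbid: int, ranges: dict[int, str]) -> Optional[int]:
    """Return the shard instance whose range holds the DBID, if any."""
    for instance, shard_range in ranges.items():
        if owns_dbid(dbid, shard_range):
            return instance
    return None


# Two shards to start with, as in Example 1.
ranges = {0: "0-20000000", 1: "20000000-40000000"}

if __name__ == "__main__":
    print(find_shard(5, ranges))
    # Adding a third shard later does not change where existing nodes live.
    ranges[2] = "40000000-60000000"
    print(find_shard(5, ranges), find_shard(40000000, ranges))
```

Because a node's range never changes, adding a shard for a new range leaves the existing shards untouched, which is why no reindex is needed.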
- Date/Datetime (DATE): This method is available in all versions of Alfresco Search Services. Date-based sharding assigns dates sequentially through shards based on the month.
Example: If there are 12 shards, each month would be assigned sequentially to each shard, wrapping round and starting again for each year. The non-random assignment makes shard management easier, for example when dropping shards or scaling out replication for some date range. Typical aging strategies could be based on the created date or destruction date.
If the date property named by shard.key is not present on a node, sharding falls back to the DBID method to randomly distribute these nodes.
To use this method when creating a shard, set the following configuration:
shard.key=exif:dateTimeOriginal
shard.method=DATE
shard.instance=<shard.instance>
shard.count=<shard.count>
Months can be grouped together, for example, by quarter. Each quarter of data would be assigned sequentially through the available shards.
shard.date.grouping=3
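The month-to-shard assignment, including grouping, can be sketched as below. The formula is an assumption chosen to match the behaviour described above (months assigned sequentially, wrapping round, with shard.date.grouping merging consecutive months); the product may compute it differently. Returning `None` here simply signals that the caller would fall back to DBID routing.

```python
from datetime import date
from typing import Optional


def date_shard(value: Optional[date], shard_count: int, grouping: int = 1) -> Optional[int]:
    """Pick a shard from a date property.

    Assumed formula: count months since year 0, merge them into groups of
    `grouping` months, then wrap round the available shards.
    """
    if value is None:
        return None  # property missing: caller falls back to DBID routing
    months = value.year * 12 + (value.month - 1)
    return (months // grouping) % shard_count


if __name__ == "__main__":
    # 12 shards, one month each: the same month of any year shares a shard.
    print(date_shard(date(2020, 1, 15), 12), date_shard(date(2021, 1, 3), 12))
    # 4 shards with quarterly grouping: January to March share a shard.
    print([date_shard(date(2020, m, 1), 4, grouping=3) for m in range(1, 7)])
```

With 12 shards and no grouping, each calendar month always maps to the same shard. With a grouping of 3, a whole quarter lands on one shard and the next quarter on the next.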
- Metadata (PROPERTY): This method is available in all versions of Alfresco Search Services. In this method, the value of some property is hashed and this hash is used to assign the node to a random shard. All nodes with the same property value will be assigned to the same shard.
Only properties of type d:text, d:date and d:datetime can be used. Examples include the recipient of an email, the creator of a node, a custom field set by a rule, or the domain of an email recipient. The keys are randomly distributed over the shards using murmur hash.
If the property is not present on a node, sharding falls back to the DBID method to randomly distribute these nodes.
To use this method when creating a shard, set the following configuration:
shard.key=cm:creator
shard.method=PROPERTY
shard.instance=<shard.instance>
shard.count=<shard.count>
It is possible to extract a part of the property value to use for sharding using a regular expression, for example, a year at the start of a string:
shard.regex=^\d{4}
If the regular expression doesn't match the property (e.g. the string doesn't start with a four-digit year), then this causes a fallback to DBID sharding.
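The combination of property hashing, regular-expression extraction, and DBID fallback can be sketched as follows. This is an illustration under stated assumptions: `zlib.crc32` stands in for the murmur hash, the regular expression is applied with a search over the property value, and the function name and return shape are invented for the example.

```python
import re
import zlib
from typing import Optional


def stand_in_hash(text: str) -> int:
    """Placeholder for the murmur hash used by the product."""
    return zlib.crc32(text.encode("utf-8"))


def property_shard(value: Optional[str], dbid: int, shard_count: int,
                   regex: Optional[str] = None) -> tuple[str, int]:
    """Return (method_used, shard) for a node.

    If shard.regex is set, only the matched part of the value is hashed.
    A missing property or a non-matching regex falls back to DBID routing.
    """
    key = value
    if key is not None and regex is not None:
        match = re.search(regex, key)
        key = match.group(0) if match else None
    if key is None:
        return "DB_ID", stand_in_hash(str(dbid)) % shard_count
    return "PROPERTY", stand_in_hash(key) % shard_count


if __name__ == "__main__":
    year = r"^\d{4}"
    print(property_shard("2019-report", 1, 4, year))
    print(property_shard("2019-invoice", 2, 4, year))  # same year, same shard
    print(property_shard("report-2019", 3, 4, year))   # no match: DBID fallback
```

Two values that share the extracted key (here the leading year) always land on the same shard, whereas a value that does not match is routed by its DBID like any other node.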
- Explicit Sharding (EXPLICIT_ID): This method is available in Alfresco Search Services 1.2 and later versions. The node is assigned to a shard based on the value of a property (e.g. cm:type), which should contain the "explicit" numeric shard ID.
This method is similar to sharding by metadata. Rather than hashing the property value, it explicitly defines the shard where the node should go. If the property is absent or does not contain a valid number, sharding falls back to the DBID sharding method. Only text fields are supported. Nodes are allowed to move shards. You can add, remove or change the property that defines the shard.
To use this method when creating a shard, set the following configuration:
shard.method=EXPLICIT_ID
shard.key=cm:targetShardInstance
shard.instance=<shard.instance>
shard.count=<shard.count>
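Explicit routing can be sketched in the same style. The example below assumes the property holds the target shard number as text and treats anything that cannot be parsed as an integer as invalid; `zlib.crc32` again stands in for the murmur hash in the DBID fallback. How the product handles a number outside the configured shard count is not covered here.

```python
import zlib
from typing import Optional


def explicit_shard(value: Optional[str], dbid: int, shard_count: int) -> tuple[str, int]:
    """Return (method_used, shard) for a node.

    The property value names the shard directly. A missing or non-numeric
    value falls back to DBID routing (CRC32 is a stand-in for murmur).
    """
    if value is not None:
        try:
            return "EXPLICIT_ID", int(value.strip())
        except ValueError:
            pass  # not a valid number: fall through to the DBID fallback
    return "DB_ID", zlib.crc32(str(dbid).encode("utf-8")) % shard_count


if __name__ == "__main__":
    print(explicit_shard("2", 99, 4))    # routed by the property
    print(explicit_shard("abc", 99, 4))  # invalid number: DBID fallback
    print(explicit_shard(None, 99, 4))   # property absent: DBID fallback
```

Changing the property from "1" to "3" changes the result from shard 1 to shard 3, which is the sense in which nodes can move between shards with this method.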
Availability matrix
Index Engine | ACL v1 | DB ID | Date/time | Metadata | ACL v2 | DBID range | Explicit |
---|---|---|---|---|---|---|---|
Alfresco Content Services 5.2.0+ Solr 4 | | | | | | | |
Alfresco Content Services 5.2.0+ Alfresco Search Services 1.0 | Yes | Yes | Yes | Yes | Yes | No | No |
Alfresco Content Services 5.2.5+ Alfresco Search Services 1.1 | Yes | Yes | Yes | Yes | Yes | Yes | No |
Alfresco Content Services 5.2.5+ Alfresco Search Services 1.2+ | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Comparison Overview
Feature | ACL v1 | DB ID | Date/time | Metadata | ACL v2 | DBID range | Explicit |
---|---|---|---|---|---|---|---|
All shards required | | | | | | | |
ACLs replicated on all shards | | | | | | | |
Can add shards as the index grows | | | | | | | |
Distribution of content over shards | Uneven | Very even | Quite even | Quite even | Quite even | Quite even | Quite even |
Falls back to DBID sharding | | | | | | | |
One shard gets new content | | | Possible | Possible | | | |
Nodes can move shard | | | | | | | |