The following sections describe the Alfresco Full Text Search (FTS) syntax.
The Alfresco Full Text Search (FTS) query text can be used standalone or it can be embedded in CMIS-SQL using the contains() predicate function. The CMIS specification supports a subset of FTS, so the full power of FTS cannot be used while also maintaining portability between CMIS repositories.
FTS is exposed directly by the interface, which adds its own template and uses that template as its default field. The default template is:
%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)
When FTS is embedded in CMIS-SQL, only the CMIS-SQL-style property identifiers (cmis:name) and aliases, CMIS-SQL column aliases, and the special fields listed can be used to identify fields. The SQL query defines tables and table aliases after from and join clauses. If the SQL query references more than one table, the contains() function must specify a single table to use by its alias. All properties in the embedded FTS query are added to this table and all column aliases used in the FTS query must refer to the same table. For a single table, the table alias is not required as part of the contains() function.
When FTS is used standalone, fields can also be identified using prefix:local-name and {uri}local-name styles.
Query time boosts allow matches on certain parts of the query to influence the score more than others.
All query elements can be boosted: terms, phrases, exact terms, expanded terms, proximity (only in field groups), ranges, and groups.
term^2.4
"phrase"^3
term~0.8^4
=term^3
~term^4
cm:name:(big * yellow)^4
1..2^2
[1 TO 2]^2
yellow AND (car OR bus)^3
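When queries are assembled in code, the boost suffix can be appended by a small helper. The following is a minimal sketch; the function name `boost` is hypothetical and not part of any Alfresco API, and it only concatenates strings in the `element^factor` form shown above.

```javascript
// Hypothetical helper: append a query-time boost to an FTS query element.
// The element is used as-is, so groups and phrases must already be
// parenthesized or quoted by the caller.
function boost(element, factor) {
  if (typeof factor !== "number" || !isFinite(factor)) {
    throw new TypeError("boost factor must be a finite number");
  }
  return element + "^" + factor;
}

var boostedTerm = boost("term", 2.4);
var boostedGroup = boost("cm:name:(big * yellow)", 4);
```

Here `boostedTerm` holds `term^2.4` and `boostedGroup` holds `cm:name:(big * yellow)^4`.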
Search using date math
The date field types in Solr support date math expressions.
Date math expressions make it easy to create times relative to fixed moments in time, including the current time, which is represented by the special value NOW.
Date math syntax
A date math expression either adds a quantity of time in a specified unit or rounds the time by a specified unit. Expressions can be chained and are evaluated left to right.
For example, to represent a point in time two months from now, use:
NOW+2MONTHS
To represent a point in time one day ago, use:
NOW-1DAY
A slash is used to indicate rounding. To represent the beginning of the current hour, use:
NOW/HOUR
To represent a point in time six months and three days into the future, rounded to the beginning of that day, use:
NOW+6MONTHS+3DAYS/DAY
While date math is most commonly used relative to NOW, it can be applied to any fixed moment in time as well:
1972-05-20T17:33:18.772Z+6MONTHS+3DAYS/DAY
Note: Solr 6 date math supports TODAY.
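The left-to-right evaluation described above can be illustrated with a small evaluator. This is a simplified sketch and not Solr's parser: the function names are hypothetical, only the units YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND (with an optional trailing S) are handled, rounding is assumed to go down to the start of the unit, and month arithmetic is assumed to clamp to the last day of the target month.

```javascript
var UNITS = ["YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND"];
var MILLIS = { DAY: 86400000, HOUR: 3600000, MINUTE: 60000, SECOND: 1000 };

// Add a signed amount of the given unit to a date, working in UTC.
function addUnits(date, amount, unit) {
  var d = new Date(date.getTime());
  if (unit === "YEAR" || unit === "MONTH") {
    var months = unit === "YEAR" ? amount * 12 : amount;
    var day = d.getUTCDate();
    d.setUTCDate(1);
    d.setUTCMonth(d.getUTCMonth() + months);
    // Assumption: clamp to the last day of the target month (Jan 31 + 1 month -> end of Feb).
    var lastDay = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth() + 1, 0)).getUTCDate();
    d.setUTCDate(Math.min(day, lastDay));
  } else {
    d.setTime(d.getTime() + amount * MILLIS[unit]);
  }
  return d;
}

// Round a date down to the beginning of the given unit, in UTC.
function roundDown(date, unit) {
  var parts = [date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate(),
               date.getUTCHours(), date.getUTCMinutes(), date.getUTCSeconds()];
  var floor = [0, 0, 1, 0, 0, 0];
  for (var i = UNITS.indexOf(unit) + 1; i < parts.length; i++) {
    parts[i] = floor[i];
  }
  return new Date(Date.UTC(parts[0], parts[1], parts[2], parts[3], parts[4], parts[5]));
}

// Evaluate an expression such as NOW+6MONTHS+3DAYS/DAY, left to right.
// `now` is the Date substituted for NOW.
function evaluateDateMath(expr, now) {
  var m = /^(NOW|\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)(.*)$/.exec(expr);
  if (!m) {
    throw new SyntaxError("Unsupported date math base: " + expr);
  }
  var date = m[1] === "NOW" ? new Date(now.getTime()) : new Date(m[1]);
  var rest = m[2];
  var tokenRe = /([+-]\d+|\/)([A-Z]+)/g;
  var consumed = 0;
  var t;
  while ((t = tokenRe.exec(rest)) !== null && t.index === consumed) {
    consumed = tokenRe.lastIndex;
    var unit = t[2].replace(/S$/, "");
    if (UNITS.indexOf(unit) === -1) {
      throw new SyntaxError("Unknown unit: " + t[2]);
    }
    date = t[1] === "/" ? roundDown(date, unit) : addUnits(date, parseInt(t[1], 10), unit);
  }
  if (consumed !== rest.length) {
    throw new SyntaxError("Could not parse: " + rest.slice(consumed));
  }
  return date;
}

var fixed = evaluateDateMath("1972-05-20T17:33:18.772Z+6MONTHS+3DAYS/DAY", new Date());
```

With the fixed moment from the example above, `fixed.toISOString()` is `1972-11-23T00:00:00.000Z`: six months are added first, then three days, and only then is the result rounded to the start of the day.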
Search for disjunctions
Single terms, phrases, and so on can be combined using OR in upper, lower, or mixed case.
The OR operator is interpreted as “at least one is required, more than one or all can be returned”.
If not otherwise specified, search fragments are ORed together by default.
big yellow banana
big OR yellow OR banana
TEXT:big TEXT:yellow TEXT:banana
TEXT:big OR TEXT:yellow OR TEXT:banana
These queries search for nodes that contain at least one of the terms big, yellow, or banana in any content.
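A disjunction can be assembled from a list of terms in code. The sketch below uses a hypothetical helper named `anyOf`; it simply joins the terms with an explicit OR and optionally prefixes each term with a field.

```javascript
// Hypothetical helper: build an explicit OR query from a list of terms.
// If a field is given, each term is prefixed with "field:".
function anyOf(terms, field) {
  return terms
    .map(function (term) {
      return field ? field + ":" + term : term;
    })
    .join(" OR ");
}

var plain = anyOf(["big", "yellow", "banana"]);
var fielded = anyOf(["big", "yellow", "banana"], "TEXT");
```

`plain` holds `big OR yellow OR banana` and `fielded` holds `TEXT:big OR TEXT:yellow OR TEXT:banana`, matching two of the equivalent forms listed above.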
Escaping characters
Any character can be escaped using the backslash “\” in terms, IDs (field identifiers), and phrases. Java unicode escape sequences are supported. Whitespace can be escaped in terms and IDs.
For example:
cm:my\ content:my\ name
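When query text comes from user input, escaping is easier to apply consistently in code. The following sketch uses two hypothetical helpers. It assumes the backslash is the escape character for whitespace in terms and IDs, and for embedded double quotes (and backslashes themselves) in phrases.

```javascript
// Hypothetical helpers, not part of any Alfresco API.

// Escape whitespace in a term or field identifier with a backslash.
function escapeWhitespace(text) {
  return text.replace(/\s/g, "\\$&");
}

// Wrap text in double quotes, escaping embedded quotes and backslashes.
function quotePhrase(text) {
  return '"' + text.replace(/["\\]/g, "\\$&") + '"';
}

var fieldQuery = escapeWhitespace("cm:my content") + ":" + escapeWhitespace("my name");
var phraseQuery = quotePhrase('big "yellow" banana');
```

`fieldQuery` holds `cm:my\ content:my\ name`, and `phraseQuery` holds `"big \"yellow\" banana"`.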
Search for an exact term
To search for an exact term you must prefix it with “=”. This ensures that the term will not be tokenized, therefore you can search for stop words. If both FTS and ID base search are supported for a specified or implied property, then exact matching will be used where possible. For example, the following query will match running but will not be tokenized. If you are using stemming it might not match anything.
=running
For the cm:name field, which is in the index as both tokenized and untokenized, it will use the untokenized field. For example, =part will only match the exact term “part”. If you use =part* it will match additional terms, like “partners”. If there is no untokenized field in the index, it will fall back to use the tokenized field, and then, with stemming/plurals, it would match.
Search in fields
Search specific fields rather than the default. Terms, phrases, etc. can all be preceded by a field. If no field is specified, the default field TEXT is used.
field:term
field:"phrase"
=field:exact
~field:expand
Fields fall into three types: property fields, special fields, and fields for data types.
Property fields evaluate the search term against a particular property, special fields are described in the following table, and data type fields evaluate the search term against all properties of the given type.
Type | Description |
---|---|
Property | Fully qualified property, for example {http://www.alfresco.org/model/content/1.0}name:apple |
Property | Fully qualified property, for example @{http://www.alfresco.org/model/content/1.0}name:apple |
Property | CMIS style property, for example cm_name:apple |
Property | Prefix style property, for example cm:name:apple |
Property | Prefix style property, for example @cm:name:apple |
Property | TEXT, for example TEXT:apple |
Special | ID, for example ID:"NodeRef" |
Special | ISROOT, for example ISROOT:T |
Special | TX, for example TX:"TX" |
Special | PARENT, for example PARENT:"NodeRef" |
Special | PRIMARYPARENT, for example PRIMARYPARENT:"NodeRef" |
Special | QNAME, for example QNAME:"app:company_home" |
Special | CLASS, for example CLASS:"qname" |
Special | EXACTCLASS, for example EXACTCLASS:"qname" |
Special | TYPE, for example TYPE:"qname" |
Special | EXACTTYPE, for example EXACTTYPE:"qname" |
Special | ASPECT, for example ASPECT:"qname" |
Special | EXACTASPECT, for example EXACTASPECT:"qname" |
Special | ISUNSET, for example ISUNSET:"property-qname" |
Special | ISNULL, for example ISNULL:"property-qname" |
Special | ISNOTNULL, for example ISNOTNULL:"property-qname" |
Special | EXISTS, for example EXISTS:"name of the property" |
Special | SITE, for example SITE:"shortname of the site" |
Special | TAG, for example TAG:"name of the tag". Note: TAG must be in upper case. |
Fully qualified data type | Data Type, for example {http://www.alfresco.org/model/dictionary/1.0}content:apple |
Prefixed data type | Data Type, for example d:content:apple |
Mixed FTS ID behavior
This relates to the priority defined on properties in the data dictionary, which can be both tokenized and untokenized.
Explicit priority is set by prefixing the query with “=” for identifier pattern matches.
The tilde (~) can be used to force tokenization.
Search for fuzzy matching
Alfresco supports fuzzy searches based on the Lucene default Levenshtein Distance.
To do a fuzzy search, use the tilde (~) symbol at the end of a single word term with a parameter between 0 and 1 to specify the required similarity. Use a value closer to 1 for higher similarity.
For example, to search for a term similar in spelling to roam use the fuzzy search:
roam~0.9
This search will find terms like foam, roaming, and roams.
Search for grouping
Use parentheses to encapsulate OR statements so that the search engine executes them properly.
The OR operator is executed as “I would like at least one of these terms”.
Groupings of terms are made using ( and ). Groupings of all query elements are supported in general. Groupings are also supported after a field; this is called a field group.
The query elements in field groups all apply to the same field and cannot include a field.
(big OR large) AND banana
title:((big OR large) AND banana)
Search query literals
When you search, entries are generally a term or a phrase. The string representation you type in will be transformed to the appropriate type for each property when executing the query. For convenience, there are numeric literals but string literals can also be used.
Date formatting
You can specify either a particular date or a date literal. A date literal is a fixed expression that represents a relative range of time, for example last month, this week, or next year.
dateTime field values are stored as Coordinated Universal Time (UTC). The date fields represent a point in time with millisecond precision. For date field formatting, Solr uses DateTimeFormatter.ISO_INSTANT. The ISO instant formatter formats an instant in Coordinated Universal Time (UTC), for example:
YYYY-MM-DDThh:mm:ssZ
where:
- YYYY is the year.
- MM is the month.
- DD is the day of the month.
- hh is the hour of the day as on a 24-hour clock.
- mm is minutes.
- ss is seconds.
- Z is a literal Z character indicating that this string representation of the date is in UTC.
Note: No time zone can be specified. The string representation of dates is always expressed in UTC, for example:
1972-05-20T17:33:18Z
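A date held in code can be converted to this UTC representation before it is placed in a query. The sketch below relies on the standard JavaScript `Date.prototype.toISOString` method, which always emits UTC with a trailing Z. The helper name `toUtcInstant` is hypothetical, and it drops the milliseconds only to match the pattern shown above.

```javascript
// Hypothetical helper: format a Date as YYYY-MM-DDThh:mm:ssZ in UTC.
// toISOString() returns e.g. 1972-05-20T17:33:18.000Z, so the
// millisecond part is stripped to match the pattern above.
function toUtcInstant(date) {
  return date.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// An input that carries a time zone offset is normalized to UTC.
var fromOffset = toUtcInstant(new Date("1972-05-20T19:33:18+02:00"));
```

`fromOffset` holds `1972-05-20T17:33:18Z`, because the +02:00 offset is converted to UTC before formatting.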
String literals
String literals for phrases can be enclosed in double quotes or single quotes. Java single character and \uXXXX-based escaping are supported within these literals.
Integer and decimal literals conform to the Java definitions.
Dates, like any other literal, can be expressed as a term or phrase. Dates are in the format ......
Any or all of the time can be truncated.
In range queries, strings, terms, and phrases that do not parse to a valid type instance for the property are treated as open ended.
test:integer[ 0 TO MAX] matches anything positive
Search for negation
You can narrow your search results by excluding words with the NOT syntax.
Single terms, phrases, and so on can be combined using “NOT” in upper, lower, or mixed case, or prefixed with “!” or “-”.
These queries search for nodes that contain the term yellow in any content.
yellow NOT banana
yellow !banana
yellow -banana
NOT yellow banana
-yellow banana
!yellow banana
The NOT operator can only be used for string keywords; it doesn’t work for numerals or dates.
Prefixing any search qualifier with a - excludes all results that are matched by that qualifier.
Search for optional, mandatory, and excluded elements of a query
Sometimes AND and OR are not enough. For example, you might want to find documents that must contain the term “car”, score those that also contain the term “red” higher, but not match documents that contain only “red”.
Operator | Description |
---|---|
“\|” | The field, phrase, group is optional; a match increases the score. |
“+” | The field, phrase, group is mandatory (Note: this differs from Google - see “=”) |
“-”, “!” | The field, phrase, group must not match. |
The following example finds documents that contain the term “car”, scores those with the term “red” higher, but does not match those just containing “red”:
+car |red
Note: At least one element of a query must match (or not match) for there to be any results.
All AND and OR constructs can be expressed with these operators.
Search for a phrase
Phrases are enclosed in double quotes. Any embedded quotes can be escaped using a backslash (\). If no field is specified then the default TEXT field will be used, as with searches for a single term.
The whole phrase will be tokenized before the search according to the appropriate data dictionary definition(s).
"big yellow banana"
Search for operator precedence
Operator precedence is SQL-like (not Java-like). When there is more than one logical operator in a statement, and they are not explicitly grouped using parentheses, NOT is evaluated first, then AND, and finally OR.
The following shows the operator precedence from highest to lowest:
"
[, ], <, >
()
~ (prefix and postfix), =
^
+, |, -
NOT
AND
OR
AND and OR can be combined with +, |, - with the following meanings:
AND (no prefix is the same as +) | Description |
---|---|
big AND dog | big and dog must occur |
+big AND +dog | big and dog must occur |
big AND +dog | big and dog must occur |
+big AND dog | big and dog must occur |
big AND \|dog | big must occur and dog should occur |
\|big AND dog | big should occur and dog must occur |
\|big AND \|dog | both big and dog should occur, and at least one must match |
big AND -dog | big must occur and dog must not occur |
-big AND dog | big must not occur and dog must occur |
-big AND -dog | both big and dog must not occur |
\|big AND -dog | big should occur and dog must not occur |
OR (no prefix is the same as +) | Description |
---|---|
dog OR wolf | dog and wolf should occur, and at least one must match |
+dog OR +wolf | dog and wolf should occur, and at least one must match |
dog OR +wolf | dog and wolf should occur, and at least one must match |
+dog OR wolf | dog and wolf should occur, and at least one must match |
dog OR \|wolf | dog and wolf should occur, and at least one must match |
\|dog OR wolf | dog and wolf should occur, and at least one must match |
\|dog OR \|wolf | dog and wolf should occur, and at least one must match |
dog OR -wolf | dog should occur and wolf should not occur, one of the clauses must be valid for any result |
-dog OR wolf | dog should not occur and wolf should occur, one of the clauses must be valid for any result |
-dog OR -wolf | dog and wolf should not occur, one of the clauses must be valid for any result |
Search for proximity
Google-style proximity is supported.
To specify proximity for fields, use grouping.
big * apple
TEXT:(big * apple)
big *(3) apple
TEXT:(big *(3) apple)
Search query templates
The FTS query language supports query templates. These are intended to help when building application specific searches.
A template is a query but with additional support to specify template substitution.
- %field: Insert the parse tree for the current ftstest and replace all references to fields in the current parse tree with the supplied field.
- %(field1, field2) or %(field1 field2) (the comma is optional): Create a disjunction and, for each field, add the parse tree for the current ftstest to the disjunction, and then replace all references to fields in the current parse tree with the current field from the list.
Name | Template | Example Query | Expanded Query |
---|---|---|---|
t1 | %cm:name | t1:n1 | cm:name:n1 |
t1 | %cm:name | t1:"n1" | cm:name:"n1" |
t1 | %cm:name | ~t1:n1^4 | ~cm:name:n1^4 |
t2 | %(cm:name, cm:title) | t2:"woof" | (cm:name:"woof" OR cm:title:"woof") |
t2 | %(cm:name, cm:title) | ~t2:woof^4 | (~cm:name:woof OR ~cm:title:woof)^4 |
t3 | %cm:name AND my:boolean:true | t3:banana | (cm:name:banana AND my:boolean:true) |
Templates can refer to other templates.
nameAndTitle -> %(cm:name, cm:title)
nameAndTitleAndDescription -> %(nameAndTitle, cm:description)
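The substitution rules can be illustrated with a string-level sketch. This is not how the parser works internally (it operates on parse trees), and the function name `expandTemplate` is hypothetical. The sketch handles only plain terms and phrases: prefixes such as ~, boosts, and templates that refer to other templates are not handled.

```javascript
// Matches %(field1, field2) or a single %field such as %cm:name.
var PLACEHOLDER = "%\\(([^)]*)\\)|%([A-Za-z_][\\w-]*(?::[\\w-]+)?)";

// Expand a template for one query element (a term or a quoted phrase).
function expandTemplate(template, element) {
  var source = template.trim();
  var body = source.replace(new RegExp(PLACEHOLDER, "g"), function (match, list, single) {
    if (single !== undefined) {
      return single + ":" + element;
    }
    // A field list becomes a disjunction over all listed fields.
    var fields = list.split(/[\s,]+/).filter(Boolean);
    var clauses = fields.map(function (field) {
      return field + ":" + element;
    });
    return "(" + clauses.join(" OR ") + ")";
  });
  // A template that is more than one placeholder is wrapped as a group.
  var isSinglePlaceholder = new RegExp("^(?:" + PLACEHOLDER + ")$").test(source);
  return isSinglePlaceholder ? body : "(" + body + ")";
}

var t1 = expandTemplate("%cm:name", "n1");
var t2 = expandTemplate("%(cm:name, cm:title)", '"woof"');
var t3 = expandTemplate("%cm:name AND my:boolean:true", "banana");
```

The three results reproduce the unprefixed rows of the table above: `cm:name:n1`, `(cm:name:"woof" OR cm:title:"woof")`, and `(cm:name:banana AND my:boolean:true)`.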
Search for ranges
Inclusive ranges can be specified in Google-style. There is an extended syntax for more complex ranges. Unbounded ranges can be defined using MIN and MAX for numeric and date types and “\u0000” and “\uFFFF” for text (anything that is invalid).
Lucene | Google | Description | Example |
---|---|---|---|
[#1 TO #2] | #1..#2 | The range #1 to #2 inclusive. #1 <= x <= #2 | 0..5 [0 TO 5] |
<#1 TO #2] | | The range #1 to #2 including #2 but not #1. #1 < x <= #2 | <0 TO 5] |
[#1 TO #2> | | The range #1 to #2 including #1 but not #2. #1 <= x < #2 | [0 TO 5> |
<#1 TO #2> | | The range #1 to #2 exclusive. #1 < x < #2 | <0 TO 5> |
TEXT:apple..banana
my:int:[0 TO 10]
my:float:2.5..3.5
my:float:0..MAX
mt:text:[l TO "\uFFFF"]
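The bracket combinations in the table map naturally onto a small builder. The sketch below uses a hypothetical helper named `range`; the option names `excludeLower` and `excludeUpper` are also invented for illustration. It only concatenates strings and does not validate the bounds.

```javascript
// Hypothetical helper: build a range expression in the extended syntax.
// "[" and "]" include a bound; "<" and ">" exclude it.
function range(field, lower, upper, options) {
  var opts = options || {};
  var open = opts.excludeLower ? "<" : "[";
  var close = opts.excludeUpper ? ">" : "]";
  var body = open + lower + " TO " + upper + close;
  return field ? field + ":" + body : body;
}

var inclusive = range("my:int", 0, 10);
var halfOpen = range(null, 0, 5, { excludeLower: true });
var unbounded = range("my:float", 0, "MAX");
```

`inclusive` holds `my:int:[0 TO 10]`, `halfOpen` holds `<0 TO 5]`, and `unbounded` holds `my:float:[0 TO MAX]`.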
Search for a single term
Single terms are tokenized before the search according to the appropriate data dictionary definition(s).
If you do not specify a field, it will search in the content and properties. This is a shortcut for searching all properties of type content. Terms cannot contain whitespace.
banana
TEXT:banana
Both of these queries will find any nodes with the word “banana” in any property of type d:content.
If the appropriate data dictionary definition(s) for the field supports both FTS and untokenized search, then FTS search will be used. FTS will include synonyms if the analyzer generates them.
Search for spans and positions
Spans and positions are not implemented. Positions will depend on tokenization.
Anything more detailed than one *(2) two is arbitrarily dependent on the tokenization. An identifier and pattern matching, or dual FTS and ID tokenization, might be the answer in these cases.
term[^] - start
term[$] - end
term[position]
These are of possible use but excluded for now. Lucene surround extensions:
and(terms etc)
99w(terms etc)
97n(terms etc)
Search for term expansion
To force tokenization and term expansion, prefix the term with ~.
For a property with both ID and FTS indexes, where the ID index is the default, this forces the use of the FTS index.
~running
Search for wildcards
Wildcards are supported in terms, phrases, and exact phrases using * to match zero, one, or more characters and ? to match a single character.
The * wildcard character can appear on its own and implies Google-style. The “anywhere after” wildcard pattern can be combined with the = prefix for identifier based pattern matching. Search will return and highlight any word that begins with the root of the word truncated by the * wildcard character.
The following will all find the term apple.
TEXT:app?e
TEXT:app*
TEXT:*pple
appl?
*ple
=*ple
"ap*le"
"***le"
"?????"
When performing a search that includes a wildcard character, it is best to wrap your search term in double quotation marks. This ensures all metadata and content are searched.
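The matching rules for * and ? on a single word can be modeled with a regular expression. The sketch below is only an illustration of those two rules: the helper name `wildcardToRegExp` is hypothetical, and it ignores fields, quoting, tokenization, and analysis, all of which affect real search results.

```javascript
// Hypothetical helper: turn a single-word wildcard pattern into a RegExp.
// "*" matches zero or more characters and "?" matches exactly one.
function wildcardToRegExp(pattern) {
  var source = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape other regex metacharacters
    .replace(/\*/g, ".*")
    .replace(/\?/g, ".");
  return new RegExp("^" + source + "$");
}

var patterns = ["app?e", "app*", "*pple", "appl?", "*ple", "ap*le", "***le", "?????"];
var allMatchApple = patterns.every(function (p) {
  return wildcardToRegExp(p).test("apple");
});
```

`allMatchApple` is `true`: every pattern from the list above, with the field prefix and quotes removed, matches the word apple under this model.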
Embed queries in CMIS
These examples show how to embed queries in CMIS.
Embedded in CMIS contains()
- strict queries
SELECT * FROM Document WHERE CONTAINS("zebra")
SELECT * FROM Document WHERE CONTAINS("quick")
- Alfresco extensions
SELECT * FROM Document D WHERE CONTAINS(D, 'cmis:name:\'Tutorial\'')
SELECT cmis:name as BOO FROM Document D WHERE CONTAINS('BOO:\'Tutorial\'')
Search Service
ResultSet results = searchService.query(storeRef, SearchService.LANGUAGE_FTS_ALFRESCO, "quick");
SearchService.LANGUAGE_FTS_ALFRESCO = "fts-alfresco"
Node Browser
FTS is supported in the node browser.
JavaScript
search
{
query: string, mandatory, in appropriate format and encoded for the given language
store: string, optional, defaults to 'workspace://SpacesStore'
language: string, optional, one of: lucene, xpath, jcr-xpath, fts-alfresco - defaults to 'lucene'
templates: [], optional, Array of query language template objects (see below) - if supported by the language
sort: [], optional, Array of sort column objects (see below) - if supported by the language
page: object, optional, paging information object (see below) - if supported by the language
namespace: string, optional, the default namespace for properties
defaultField: string, optional, the default field for query elements when not explicit in the query
onerror: string, optional, result on error - one of: exception, no-results - defaults to 'exception'
}
sort
{
column: string, mandatory, sort column in appropriate format for the language
ascending: boolean, optional, defaults to false
}
page
{
maxItems: int, optional, max number of items to return in result set
skipCount: int, optional, number of items to skip over before returning results
}
template
{
field: string, mandatory, custom field name for the template
template: string, mandatory, query template replacement for the template
}
For example:
var def =
{
query: "cm:name:test*",
language: "fts-alfresco"
};
var results = search.query(def);
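A fuller definition can combine the optional members described above. The following is a sketch under stated assumptions: the template field name `keywords`, the sort column `cm:name`, and the paging values are illustrative choices, and the `search` root object exists only inside the repository's JavaScript environment, so the call is guarded to let the snippet also run elsewhere.

```javascript
var def = {
  query: "banana",
  store: "workspace://SpacesStore",
  language: "fts-alfresco",
  // Illustrative template: unfielded terms are searched in name and title.
  templates: [
    { field: "keywords", template: "%(cm:name cm:title)" }
  ],
  defaultField: "keywords",
  // Assumption: "cm:name" is an acceptable sort column format here.
  sort: [
    { column: "cm:name", ascending: true }
  ],
  page: { maxItems: 25, skipCount: 0 },
  onerror: "no-results"
};

// "search" is provided by the repository; fall back to an empty list otherwise.
var results = (typeof search !== "undefined") ? search.query(def) : [];
```

Setting `onerror` to `no-results` means a query error yields an empty result instead of an exception, which can be convenient for user-facing search boxes.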
Templates
FTS is not supported in FreeMarker.