NAME
Elastic::Manual::Terminology - Explanation of terminology and concepts
VERSION
version 0.29_2
MAIN ELASTICSEARCH TERMS
Index
An "Index" is the equivalent of a "database" in a relational DB (not to be confused with an "index" in a relational DB). It has a "Mapping", which defines multiple Types.
Internally, an Index is a logical namespace which points to one or more primary shards, each of which may have zero or more replica shards. You can change the number of replica shards on an existing index, but the number of primary shards is fixed at index creation time.
Searches can be performed across multiple indices.
Note: an index name must be a lower case string, without any spaces.
See also "Alias", "Domain", Elastic::Model::Index and Elastic::Manual::Scaling.
Alias
An "Alias" is like a shortcut to one or more Indices. For instance, you could have Alias myapp which points to Index myapp_v1. Your code can talk just to the Alias.
When you want to change the structure of your index, you could reindex all your docs to the new Index myapp_v2 and, when ready, switch the myapp Alias to point to myapp_v2 instead.
An Alias may also point to multiple indices. For example, you might have indices logs_jan_2012, logs_feb_2012, ... logs_dec_2012, and an alias logs_2012 which points to all 12 indices. This allows you to use a single alias name to search multiple indices.
Note: you can't index new docs to an alias that points to multiple indices. An alias used by a "Domain" must point to a single index only, but an alias used by a "View" can point to multiple indices.
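The alias rules above can be sketched in plain Perl. This is only a toy model of the idea, not the Elastic::Model or Elasticsearch API: the %aliases table and the resolve() and write_index() helpers are hypothetical names invented for this illustration.

```perl
use strict;
use warnings;

# Toy alias table: alias name => list of index names (illustrative only).
my @months  = qw(jan feb mar apr may jun jul aug sep oct nov dec);
my %aliases = (
    myapp     => ['myapp_v1'],
    logs_2012 => [ map { "logs_${_}_2012" } @months ],
);

# Searching: an alias expands to every index it points to.
# A name that is not an alias is treated as a plain index name.
sub resolve {
    my ($name) = @_;
    return @{ $aliases{$name} || [$name] };
}

# Indexing: only allowed when the name resolves to exactly one index.
sub write_index {
    my ($name) = @_;
    my @indices = resolve($name);
    die "'$name' points to " . scalar(@indices) . " indices, cannot index to it\n"
        if @indices != 1;
    return $indices[0];
}

print "search logs_2012 hits ", scalar( resolve('logs_2012') ), " indices\n";
print "docs for myapp go to ", write_index('myapp'), "\n";

# After reindexing into myapp_v2, repoint the alias; callers keep using 'myapp'.
$aliases{myapp} = ['myapp_v2'];
print "docs for myapp now go to ", write_index('myapp'), "\n";
```

Because callers only ever use the name myapp, switching the alias requires no change to application code.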
Also see "Domain", Elastic::Model::Alias and Elastic::Manual::Scaling.
Type
A "Type" is like a "table" in a relational DB. For instance, you may have a user type, a comment type etc. An "Index" can have multiple types (just like a database can have multiple tables). In Elastic::Model, objects (Documents) of each type are handled by a single class, eg MyApp::User, MyApp::Comment. (See "Namespace").
Each Type has a "Mapping", which defines the list of Fields in that type. Searches can be performed across multiple types.
Also see "Namespace", "Mapping", "Document" and "Field".
Mapping
Each "Type" has a "Mapping" which is like a "schema definition" in a relational DB. It defines various type-wide settings, plus the field-type (eg integer, object, string) for each "Field" (attribute) in the type, and specifics about how each field should be analyzed.
New fields can be added to a mapping, but generally existing fields may not be changed. Instead, you have to create a new index with the new mapping and reindex your data.
Elastic::Model generates the mapping for you using Moose's introspection. Attribute keywords are provided to give you control over the mapping process.
Document
A "Document" is like a "row" in a relational DB table. Elastic::Model converts your objects into a JSON object (essentially a hashref), which is the Document that is stored in Elasticsearch. We use the terms "Object" and "Document" interchangeably.
Each Document is stored in a single primary shard in an "Index", has a single "Type", an "ID" and zero or more Fields.
The original JSON object is stored in the special _source field, which is returned by default when you retrieve a document by ID, or when you perform a search.
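To make the "hashref to JSON object" idea concrete, here is a minimal sketch using the core JSON::PP module. The field names and values are made up for illustration, and this does not show how Elastic::Model itself deflates objects.

```perl
use strict;
use warnings;
use JSON::PP;

# A hypothetical user object, already reduced to a plain hashref.
my $user = {
    name  => 'Joe Bloggs',
    email => 'joe@example.com',
    age   => 42,
};

# canonical() sorts the keys so that the output is stable.
my $json = JSON::PP->new->canonical->encode($user);
print "$json\n";

# Decoding the stored JSON gives back an equivalent hashref.
my $source = JSON::PP->new->decode($json);
print "name: $source->{name}\n";
```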
Field
A "Field" is like a "column" in a table in a relational DB. Each field has a field-type, eg integer, string, datetime etc. The attributes of your Moose classes are stored as fields.
Multi-level hashes can be stored, but internally these get flattened. For instance:
    {
        husband => {
            firstname => 'Joe',
            surname   => 'Bloggs'
        },
        wife => {
            firstname => 'Alice',
            surname   => 'Bloggs'
        }
    }
... is flattened to:
    {
        'husband.firstname' => 'Joe',
        'husband.surname'   => 'Bloggs',
        'wife.firstname'    => 'Alice',
        'wife.surname'      => 'Bloggs',
    }
You could search on the firstname field, which would search the firstname for both the husband and the wife, or by specifying the fieldname in full, you could search on just the husband.firstname field.
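The flattening described above can be sketched in plain Perl. This only illustrates the idea and is not how Elasticsearch implements it internally; flatten_fields() and fields_matching() are hypothetical helper names.

```perl
use strict;
use warnings;

# Recursively flatten a nested hashref into dotted field names.
sub flatten_fields {
    my ( $data, $prefix ) = @_;
    my %flat;
    for my $key ( sort keys %$data ) {
        my $name  = defined $prefix ? "$prefix.$key" : $key;
        my $value = $data->{$key};
        if ( ref $value eq 'HASH' ) {
            %flat = ( %flat, flatten_fields( $value, $name ) );
        }
        else {
            $flat{$name} = $value;
        }
    }
    return %flat;
}

# Return the flattened fields whose full name, or final path segment,
# equals the requested field name.
sub fields_matching {
    my ( $flat, $name ) = @_;
    my @hits = sort grep { $_ eq $name || /\.\Q$name\E\z/ } keys %$flat;
    return @hits;
}

my %flat = flatten_fields(
    {   husband => { firstname => 'Joe',   surname => 'Bloggs' },
        wife    => { firstname => 'Alice', surname => 'Bloggs' },
    }
);

print "$_ => $flat{$_}\n" for sort keys %flat;
print "firstname matches: ",
    join( ', ', fields_matching( \%flat, 'firstname' ) ), "\n";
print "husband.firstname matches: ",
    join( ', ', fields_matching( \%flat, 'husband.firstname' ) ), "\n";
```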
ID
The "ID" of a document identifies a document uniquely in an "Index". If no ID is provided, then it will be auto-generated.
ELASTIC::MODEL TERMS
Model
A "Model" is the Boss Object, which ties an instance of your application to a particular Elasticsearch "Cluster". You can have multiple instances of your Model class which connect to different clusters.
See Elastic::Model and Elastic::Model::Role::Model for more.
Namespace
A "Model" can contain multiple "Namespaces". A Namespace has one or more Domains and, for those Domains, defines which of your classes should be used for a "Document" of a particular "Type".
For instance: in Domain myapp_current, which belongs to Namespace myapp, objects of class MyApp::User should be stored in Elasticsearch as documents of Type user.
A namespace is also used for administering (creating, deleting, updating) Indices or Aliases in Elasticsearch.
See Elastic::Model::Namespace and "Domain".
Domain
A "Domain" is like a database handle used for creating, updating or deleting individual objects or Documents. The $domain->name can be the name of an "Index" or an Index Alias (which points to a single index) in Elasticsearch. A domain can only belong to a single namespace.
View
A "View" is used for querying documents/objects in Elasticsearch. A View can query single or multiple Domains (belonging to different Namespaces) and single or multiple Types.
See Elastic::Model::View, "Query" and "Filter".
UID
A "UID" is the unique identifier of a "Document". It is handled by Elastic::Model::UID. The "Namespace" / "Type" / "ID" combination of a document must be unique. While Elasticsearch will check for "uniqueness" in a single "Index", it is the responsibility of the user to ensure uniqueness across all of the Domains in a "Namespace".
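As a rough illustration of that responsibility, the sketch below looks for the same Namespace / Type / ID combination appearing in more than one index. It works on plain hashrefs and does not use Elastic::Model::UID; uid_key(), find_collisions() and the sample data are all hypothetical.

```perl
use strict;
use warnings;

# Build the composite key that must be unique within a namespace.
sub uid_key {
    my ($doc) = @_;
    return join '/', @{$doc}{qw(namespace type id)};
}

# Return key => [indices] for every key that appears in more than one place.
sub find_collisions {
    my (@docs) = @_;
    my %indices_for;
    push @{ $indices_for{ uid_key($_) } }, $_->{index} for @docs;

    my %collisions;
    for my $key ( keys %indices_for ) {
        $collisions{$key} = $indices_for{$key}
            if @{ $indices_for{$key} } > 1;
    }
    return %collisions;
}

my %collisions = find_collisions(
    { namespace => 'myapp', type => 'user', id => 1, index => 'myapp_v1' },
    { namespace => 'myapp', type => 'user', id => 1, index => 'myapp_v2' },
    { namespace => 'myapp', type => 'user', id => 2, index => 'myapp_v1' },
);

print "$_ found in: @{ $collisions{$_} }\n" for sort keys %collisions;
```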
Also see "Routing".
SEARCH TERMS
Analysis
"Analysis" is the process of converting Full Text to Terms. For instance the english analyzer will convert this phrase:
The QUICK brown Fox has been noted to JUMP over lazy dogs.
... into these terms/tokens:
quick, brown, fox, ha, been, note, jump, over, lazi, dog
... which is what is actually stored in the index.
A full text query (not a term query) for "brown FOXES and a Lazy dog" will also be analyzed to the terms "brown, fox, lazi, dog", and will thus match the terms stored in the index.
It is this process of analysis (both at index time and at search time) that allows Elasticsearch to perform full text queries.
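The following toy analyzer shows why running the same analysis at index time and at search time makes the terms line up. It is a deliberately crude stand-in: it lowercases, drops a handful of stopwords and strips simple plurals, so its output differs from the real english analyzer (for example it keeps "lazy" rather than producing "lazi"). The analyze() function and its stopword list are assumptions made for this sketch.

```perl
use strict;
use warnings;

# A tiny, made-up stopword list (the real analyzer uses a longer one).
my %STOPWORDS = map { $_ => 1 } qw(a an and the to of);

# Toy analysis: lowercase, tokenize, drop stopwords, strip simple plurals.
sub analyze {
    my ($text) = @_;
    my @terms;
    for my $token ( split /\W+/, lc $text ) {
        next if $token eq '' || $STOPWORDS{$token};
        if ( length($token) > 3 ) {
            # "foxes" -> "fox", "dogs" -> "dog"; not a real stemmer
            $token =~ s/(?<=x)es\z// or $token =~ s/(?<!s)s\z//;
        }
        push @terms, $token;
    }
    return @terms;
}

# Index time: analyze the document and remember its terms.
my @doc_terms = analyze(
    'The QUICK brown Fox has been noted to JUMP over lazy dogs.');
my %indexed = map { $_ => 1 } @doc_terms;

# Search time: analyze the query the same way, then look up each term.
my @query_terms = analyze('brown FOXES and a Lazy dog');
my @matched     = grep { $indexed{$_} } @query_terms;

print "indexed terms: @doc_terms\n";
print "query terms:   @query_terms\n";
print "matched terms: @matched\n";
```

Without the analysis step, the raw keyword "FOXES" would never match the stored term "fox".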
See also "Text" and "Term" and "Query".
Term
A term is an exact value that is indexed in Elasticsearch. The terms foo, Foo, FOO are NOT equivalent. Terms (ie exact values) can be searched for using "term" queries.
See also "Text", "Analysis" and "Query".
Text
Text (or full text) is ordinary unstructured text, such as this paragraph. By default, text will be analyzed into terms, which is what is actually stored in the index.
Text fields need to be analyzed at index time in order to be searchable as full text, and keywords in full text queries must be analyzed at search time to produce (and search for) the same terms that were generated at index time.
See also "Term", "Analysis" and "Query".
Query
A "Query" is used to search for Documents in Elasticsearch, using Views. It can be expressed either in the native Elasticsearch Query DSL or using the more Perlish ElasticSearch::SearchBuilder syntax.
By default, a Query sorts the results by relevance (_score).
There are two broad groups of queries: "Full Text Query" and "Term Query".
Term Query
A "Term Query" searches for exactly the Terms provided. For instance, a search for "FOO" will not match the term "foo".
This is useful for values that are not full text, eg enums, dates, numbers, canonicalized post codes, etc.
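Exact matching can be pictured as a plain hash lookup against the stored terms. The sketch below is a toy model only: %inverted_index and term_query() are hypothetical names, and the document IDs are invented.

```perl
use strict;
use warnings;

# Toy inverted index: term => IDs of the documents that contain it.
my %inverted_index = (
    foo       => [ 1, 3 ],
    Foo       => [2],
    published => [ 1, 2 ],
);

# A term query is an exact lookup: no lowercasing, no stemming.
sub term_query {
    my ($term) = @_;
    return @{ $inverted_index{$term} || [] };
}

for my $term (qw(foo Foo FOO)) {
    my @ids = term_query($term);
    printf "%-3s matches %d doc(s): %s\n", $term, scalar(@ids), "@ids";
}
```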
Full Text Query
A "Full Text Query" is useful for searching text like this paragraph. The search keywords are first Analyzed into Terms so that they correspond to the actual values that are stored in Elasticsearch. Then the query itself is built up out of multiple Term Queries.
It is important to use the same analyzer on both (1) the values in the field(s) you are searching (index analyzer) and (2) the search keywords in the query (search analyzer), so that both processes produce the same terms. Otherwise, they won't match.
Filter
A "Filter" is similar to a "Term Query" except that there is no "relevance scoring" phase. A Filter says: "Yes this document should be included", or "No this document should be excluded".
For instance, you may want to run a "Full Text Query" on your BlogPost documents, searching for the keywords "perl moose", but only for BlogPosts that have been published this year. This could be achieved by using a Range filter within a query.
Filters can be expressed either in the native Elasticsearch Query DSL or using the more Perlish ElasticSearch::SearchBuilder syntax.
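The difference between filtering and scoring can be sketched in plain Perl. In this toy model the filter is a yes/no test on the year, and only the surviving posts are scored by counting keyword occurrences. The search() helper, the sample posts and the scoring rule are assumptions made for illustration; real relevance scoring is far more involved.

```perl
use strict;
use warnings;

# Made-up BlogPost documents.
my @posts = (
    { id => 1, year => 2012, body => 'perl moose roles and perl classes' },
    { id => 2, year => 2011, body => 'perl moose tutorial' },
    { id => 3, year => 2012, body => 'moose attributes' },
    { id => 4, year => 2012, body => 'cooking with garlic' },
);

sub search {
    my ( $posts, $year, @keywords ) = @_;

    # Filter phase: include or exclude, no score involved.
    my @candidates = grep { $_->{year} == $year } @$posts;

    # Query phase: score each remaining post by keyword frequency.
    my @hits;
    for my $post (@candidates) {
        my %count;
        $count{$_}++ for split ' ', lc $post->{body};
        my $score = 0;
        $score += $count{$_} || 0 for @keywords;
        push @hits, { id => $post->{id}, score => $score } if $score > 0;
    }

    # Most relevant first; ties broken by ID.
    my @sorted = sort { $b->{score} <=> $a->{score} || $a->{id} <=> $b->{id} } @hits;
    return @sorted;
}

my @results = search( \@posts, 2012, qw(perl moose) );
print "post $_->{id} scored $_->{score}\n" for @results;
```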
OTHER ELASTICSEARCH TERMS
Cluster
A "Cluster" is a collection of Nodes which function together - they all share the same cluster.name. The cluster elects a single "master node" which controls the cluster. If the master node fails, another node is automatically elected.
Node
A "Node" is a running instance of Elasticsearch. Normally, you would only run one instance of Elasticsearch on one server, so a Node is roughly equivalent to a server. When a Node starts, it tries to join a "Cluster" which shares the same cluster name. If it fails to find an existing cluster, it will form a new one.
Shard
A "Shard" is a single instance of Lucene (what Elasticsearch uses internally to provide its search function). Shards are the building blocks of Indices - each index consists of at least one shard.
A shard can be a "primary shard" or a "replica shard". A primary shard is responsible for storing a newly indexed doc first. Once it has been indexed by the primary shard, the new doc is indexed on all of the replica shards (if there are any) in parallel to ensure that there are multiple copies of each document in the cluster.
If a primary shard fails, then a replica shard will be promoted to be a primary shard, and a new replica will be allocated on a different "Node", if there is one available.
A replica shard will never run on the same node as its primary shard, otherwise if that node were to go down, it would take both the primary and replica shard with it.
Routing
When you index a document, it is stored on a single primary shard. That shard is chosen by hashing the "Routing" value. By default, the Routing value is derived from the "ID" of the document or, if the document has a specified parent document, from the ID of the parent document (to ensure that child and parent documents are stored on the same shard).
This value can be overridden by specifying a routing value at index time, a routing field in the mapping or by using an "Alias" with a built-in routing. See Elastic::Manual::Scaling for more.
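Shard selection can be pictured as "hash the routing value, then take it modulo the number of primary shards". The sketch below uses a simple toy hash, which is NOT the hash function Elasticsearch uses; toy_hash(), shard_for() and the sample IDs are hypothetical.

```perl
use strict;
use warnings;

# Toy string hash (a djb2 variant), standing in for the real hash function.
sub toy_hash {
    my ($string) = @_;
    my $hash = 5381;
    for my $char ( split //, $string ) {
        $hash = ( $hash * 33 + ord($char) ) & 0xFFFFFFFF;
    }
    return $hash;
}

# Pick a primary shard number in the range 0 .. $primary_shards - 1.
sub shard_for {
    my ( $routing, $primary_shards ) = @_;
    return toy_hash($routing) % $primary_shards;
}

my $primary_shards = 5;
my $parent_id      = 'post_1';
my $child_id       = 'comment_99';

# By default the routing value is the document's own ID.
printf "parent:                    shard %d\n",
    shard_for( $parent_id, $primary_shards );

# A child is routed by its parent's ID, so it lands on the same shard.
printf "child (routed by parent):  shard %d\n",
    shard_for( $parent_id, $primary_shards );

# Routed by its own ID, the child could end up on any shard.
printf "child (routed by own ID):  shard %d\n",
    shard_for( $child_id, $primary_shards );
```

Because the shard number depends on the number of primary shards, the same routing value could map to a different shard if that number changed, which is why it is fixed when the index is created.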
AUTHOR
Clinton Gormley <drtech@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2014 by Clinton Gormley.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.