NAME
App::ElasticSearch::Utilities::QueryString - CLI query string fixer
VERSION
version 8.8
SYNOPSIS
This class provides a pluggable architecture to expand query strings on the command-line into complex Elasticsearch queries.
ATTRIBUTES
context
Defaults to 'query', but can also be set to 'filter' so the elements will be added to the 'must' or 'filter' parameter.
search_path
An array reference of additional namespaces to search for loading the query string processing plugins. Example:
$qs->search_path([qw(My::Company::QueryString)]);
This will search:
App::ElasticSearch::Utilities::QueryString::*
My::Company::QueryString::*
For query processing plugins.
default_join
When fixing up the query string, if two tokens are found next to eachother missing a joining token, join using this token. Can be either AND
or OR
, and defaults to AND
.
plugins
Array reference of ordered query string processing plugins, lazily assembled.
fields_meta
A hash reference with the field data from App::ElasticSearch::Utilities::es_index_fields.
METHODS
expand_query_string(@tokens)
This function takes a list of tokens, often from the command line via @ARGV. Uses a plugin infrastructure to allow customization.
Returns: App::ElasticSearch::Utilities::Query object
TOKENS
The token expansion plugins can return undefined, which is basically a noop on the token. The plugin can return a hash reference, which marks that token as handled and no other plugins receive that token. The hash reference may contain:
- query_string
-
This is the rewritten bits that will be reassembled in to the final query string.
- condition
-
This is usually a hash reference representing the condition going into the bool query. For instance:
{ terms => { field => [qw(alice bob charlie)] } }
Or
{ prefix => { user_agent => 'Go ' } }
These conditions will wind up in the must or must_not section of the bool query depending on the state of the the invert flag.
- invert
-
This is used by the bareword "not" to track whether the token invoked a flip from the must to the must_not state. After each token is processed, if it didn't set this flag, the flag is reset.
- dangles
-
This is used for bare words like "not", "or", and "and" to denote that these terms cannot dangle from the beginning or end of the query_string. This allows the final pass of the query_string builder to strip these words to prevent syntax errors.
Extended Syntax
The search string is pre-analyzed before being sent to ElasticSearch. The following plugins work to manipulate the query string and provide richer, more complete syntax for CLI applications.
App::ElasticSearch::Utilities::QueryString::Barewords
The following barewords are transformed:
or => OR
and => AND
not => NOT
App::ElasticSearch::Utilities::QueryString::Text
Provides field prefixes to manipulate the text search capabilities.
Terms Query via '='
Provide an '=' prefix to a query string parameter to promote that parameter to a term
filter.
This allows for exact matches of a field without worrying about escaping Lucene special character filters.
E.g.:
user_agent:"Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1"
Is evaluated into a weird query that doesn't do what you want. However:
=user_agent:"Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1"
Is translated into:
{ term => { user_agent => "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1" } }
Wildcard Query via '*'
Provide an '*' prefix to a query string parameter to promote that parameter to a wildcard
filter.
This uses the wild card match for text fields to making matching more intuitive.
E.g.:
*user_agent:"Mozilla*"
Is translated into:
{ wildcard => { user_agent => "Mozilla* } }
Regexp Query via '/'
Provide an '/' prefix to a query string parameter to promote that parameter to a regexp
filter.
If you want to use regexp matching for finding data, you can use:
/message:'\\bden(ial|ied|y)'
Is translated into:
{ regexp => { message => "\\bden(ial|ied|y)" } }
Fuzzy Matching via '~'
Provide an '~' prefix to a query string parameter to promote that parameter to a fuzzy
filter.
~message:deny
Is translated into:
{ fuzzy => { message => "deny" } }
Phrase Matching via '+'
Provide an '+' prefix to a query string parameter to promote that parameter to a match_phrase
filter.
+message:"login denied"
Is translated into:
{ match_phrase => { message => "login denied" } }
Automatic Match Queries for Text Fields
If the field meta data is provided and the field is a text
type, the query will automatically be mapped to a match
query.
# message field is text
message:"foo"
Is translated into:
{ match => { message => "foo" } }
App::ElasticSearch::Utilities::QueryString::IP
If a field is an IP address uses CIDR Notation, it's expanded to a range query.
src_ip:10.0/8 => src_ip:[10.0.0.0 TO 10.255.255.255]
App::ElasticSearch::Utilities::QueryString::Ranges
This plugin translates some special comparison operators so you don't need to remember them anymore.
Example:
price:<100
Will translate into a:
{ range: { price: { lt: 100 } } }
And:
price:>50,<100
Will translate to:
{ range: { price: { gt: 50, lt: 100 } } }
Supported Operators
gt via >, gte via >=, lt via <, lte via <=
App::ElasticSearch::Utilities::QueryString::Underscored
This plugin translates some special underscore surrounded tokens into the Elasticsearch Query DSL.
Implemented:
_prefix_
Example query string:
_prefix_:useragent:'Go '
Translates into:
{ prefix => { useragent => 'Go ' } }
App::ElasticSearch::Utilities::QueryString::FileExpansion
If the match ends in .dat, .txt, .csv, or .json then we attempt to read a file with that name and OR the condition:
$ cat test.dat
50 1.2.3.4
40 1.2.3.5
30 1.2.3.6
20 1.2.3.7
Or
$ cat test.csv
50,1.2.3.4
40,1.2.3.5
30,1.2.3.6
20,1.2.3.7
Or
$ cat test.txt
1.2.3.4
1.2.3.5
1.2.3.6
1.2.3.7
Or
$ cat test.json
{ "ip": "1.2.3.4" }
{ "ip": "1.2.3.5" }
{ "ip": "1.2.3.6" }
{ "ip": "1.2.3.7" }
We can source that file:
src_ip:test.dat => src_ip:(1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7)
src_ip:test.json[ip] => src_ip:(1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7)
This make it simple to use the --data-file output options and build queries based off previous queries. For .txt and .dat file, the delimiter for columns in the file must be either a tab or a null. For files ending in .csv, Text::CSV_XS is used to accurate parsing of the file format. Files ending in .json are considered to be newline-delimited JSON.
You can also specify the column of the data file to use, the default being the last column or (-1). Columns are zero-based indexing. This means the first column is index 0, second is 1, .. The previous example can be rewritten as:
src_ip:test.dat[1]
or: src_ip:test.dat[-1]
For newline delimited JSON files, you need to specify the key path you want to extract from the file. If we have a JSON source file with:
{ "first": { "second": { "third": [ "bob", "alice" ] } } }
{ "first": { "second": { "third": "ginger" } } }
{ "first": { "second": { "nope": "fred" } } }
We could search using:
actor:test.json[first.second.third]
Which would expand to:
{ "terms": { "actor": [ "alice", "bob", "ginger" ] } }
This option will iterate through the whole file and unique the elements of the list. They will then be transformed into an appropriate terms query.
Wildcards
We can also have a group of wildcard or regexp in a file:
$ cat wildcards.dat
*@gmail.com
*@yahoo.com
To enable wildcard parsing, prefix the filename with a *
.
es-search.pl to_address:*wildcards.dat
Which expands the query to:
{
"bool": {
"minimum_should_match":1,
"should": [
{"wildcard":{"to_outbound":{"value":"*@gmail.com"}}},
{"wildcard":{"to_outbound":{"value":"*@yahoo.com"}}}
]
}
}
No attempt is made to verify or validate the wildcard patterns.
Regular Expressions
If you'd like to specify a file full of regexp, you can do that as well:
$ cat regexp.dat
.*google\.com$
.*yahoo\.com$
To enable regexp parsing, prefix the filename with a ~
.
es-search.pl to_address:~regexp.dat
Which expands the query to:
{
"bool": {
"minimum_should_match":1,
"should": [
{"regexp":{"to_outbound":{"value":".*google\\.com$"}}},
{"regexp":{"to_outbound":{"value":".*yahoo\\.com$"}}}
]
}
}
No attempt is made to verify or validate the regexp expressions.
App::ElasticSearch::Utilities::QueryString::Nested
Implement the proposed nested query syntax early. Example:
nested_path:"field:match AND string"
AUTHOR
Brad Lhotsky <brad@divisionbyzero.net>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2024 by Brad Lhotsky.
This is free software, licensed under:
The (three-clause) BSD License