NAME
file-to-elasticsearch.pl - A simple utility to tail a file and index each line as a document in ElasticSearch
VERSION
version 0.015
SYNOPSIS
To see available options, run:
file-to-elasticsearch.pl --help
Create a config file and run the utility:
file-to-elasticsearch.pl --config config.yaml --log4perl logging.conf --debug
This will run a single-threaded POE instance that tails the log files you've requested, performs the requested transformations, and sends the results to the ElasticSearch cluster and index you've specified.
CONFIGURATION
ElasticSearch Settings
The elasticsearch section of the config controls the settings passed to POE::Component::ElasticSearch::Indexer.
---
elasticsearch:
  servers: [ "localhost:9200" ]
  flush_interval: 30
  flush_size: 1_000
  index: logstash-%Y.%m.%d
  type: log
The settings available are:
- servers

  An array of servers used to send bulk data to ElasticSearch. The default is just localhost on port 9200.

- flush_interval

  Every flush_interval seconds, the queued documents are sent to the Bulk API of the cluster.

- flush_size

  Once this many documents have been received, regardless of the time since the last flush, force a flush of the queued documents to the Bulk API.

- index

  A strftime compatible string to use as the DefaultIndex parameter if a file doesn't pass one along.

- type

  Mostly useless as Elastic is abandoning "types", but this will be set as the DefaultType for documents being indexed.
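As a quick illustration (not part of this distribution, which is written in Perl), the strftime-compatible index pattern above can be previewed with any strftime implementation, for example in Python:

```python
from datetime import datetime, timezone

# Expand the strftime-compatible index pattern from the config above.
# "logstash-%Y.%m.%d" yields a daily index name, e.g. "logstash-2018.06.02".
pattern = "logstash-%Y.%m.%d"
index = datetime.now(timezone.utc).strftime(pattern)
print(index)
```

The same format codes apply to any per-file index setting in the tail section below.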
Tail Section
The tail section contains the list of files to tail and the rules to use to index them.
---
tail:
  - file: '/var/log/osquery/result.log'
    index: "osquery-result-%Y.%m.%d"
    decode: json
    extract:
      - by: split
        from: name
        when: '^pack'
        into: 'pack'
        split_on: '/'
        split_parts: [ null, "name", "report" ]
    mutate:
      prune: true
      remove: [ "calendarTime", "epoch", "counter", "_raw" ]
      rename:
        unixTime: _epoch
Each element is a hash containing the following information.
- file

  Required: The path to the file on the filesystem.

- decode

  This may be a single element, or an array, containing one or more of the implemented decoders.

  - json

    Decode the discovered JSON in the document to a hash reference. This finds the first occurrence of a { in the string and assumes everything to the end of the string is JSON. Decoding is done by JSON::MaybeXS.

  - syslog

    Parses each line as a standard UNIX syslog message. Parsing is provided via Parse::Syslog::Line, which isn't a hard requirement of this package but will be loaded if available.

- index

  A strftime compatible string to use as the index to put documents created from this file. If not specified, the default from the elasticsearch section will be used, and failing that, the default as specified in POE::Component::ElasticSearch::Indexer.

- type

  The type to use for documents sourced from this file.
- extract

  Extraction of fields from the document by one of the supported methods.

  - by

    Can be 'split' or 'regex'.

    split supports:

    - split_on

      Regex or string to split the string on.

    - split_parts

      Name for each part of the split; undef positions in the split string will be discarded.

    regex supports:

    - regex

      The regex to use to extract, using capture groups to designate the parts.

    - regex_parts

      Name for each captured group; undef positions in the list will be discarded.

  - from

    Name of the field to apply the extraction to.

  - when

    Limits applying the extraction to values matching the regex.

  - into

    Top level namespace for the collected keys to wind up inside of, i.e.:
  extract:
    - by: split
      from: name
      when: '^pack'
      into: 'pack'
      split_on: '/'
      split_parts: [ null, "name", "report" ]

This will look at the field name and, when it matches ^pack, split the name on /, indexing the second element to name and the third to report, so:

  name: pack/os/cpu_info

Becomes:

  pack:
    name: os
    report: cpu_info
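The split extraction above can be sketched outside the tool as well. This is an illustrative Python rendition of the documented behavior (the utility itself is Perl, so this is not its actual implementation):

```python
def split_extract(value, split_on, split_parts):
    """Split a value and name the parts; None (undef) positions are discarded."""
    parts = value.split(split_on)
    return {name: part
            for name, part in zip(split_parts, parts)
            if name is not None}

# name: pack/os/cpu_info with split_parts [ null, "name", "report" ]
doc = {"name": "pack/os/cpu_info"}
extracted = {"pack": split_extract(doc["name"], "/", [None, "name", "report"])}
print(extracted)  # {'pack': {'name': 'os', 'report': 'cpu_info'}}
```

The first split part is named null (undef), so "pack" itself is dropped from the extracted keys and only name and report land under the into namespace.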
AUTHOR
Brad Lhotsky <brad@divisionbyzero.net>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2018 by Brad Lhotsky.
This is free software, licensed under:
The (three-clause) BSD License