NAME

file-to-elasticsearch.pl - A simple utility to tail a file and index each line as a document in ElasticSearch

VERSION

version 0.015

SYNOPSIS

To see available options, run:

file-to-elasticsearch.pl --help

Create a config file and run the utility:

file-to-elasticsearch.pl --config config.yaml --log4perl logging.conf --debug

This runs a single-threaded POE instance that tails the log files you've requested, performs the requested transformations, and sends the resulting documents to the ElasticSearch cluster and index you've specified.

CONFIGURATION

ElasticSearch Settings

The elasticsearch section of the config controls the settings passed to the POE::Component::ElasticSearch::Indexer.

---
elasticsearch:
  servers: [ "localhost:9200" ]
  flush_interval: 30
  flush_size: 1000
  index: logstash-%Y.%m.%d
  type: log

The settings available are:

servers

An array of servers used to send bulk data to ElasticSearch. The default is just localhost on port 9200.

flush_interval

Every flush_interval seconds, the queued documents are sent to the Bulk API of the cluster.

flush_size

If this many documents are received, regardless of the time since the last flush, force a flush of the queued documents to the Bulk API.

index

A strftime-compatible string to use as the DefaultIndex parameter if a file doesn't pass one along.

type

Mostly useless as Elastic is abandoning "types", but this will be set as the DefaultType for documents being indexed.
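As an illustration of how a strftime-compatible index pattern turns into a concrete index name, here is a minimal Python sketch. It is not the tool's Perl implementation: the function names are hypothetical, and the use of a UTC timestamp is an assumption, since this document does not say which time zone is used.

```python
from datetime import datetime, timezone

# Stand-in for the 'index' value in the elasticsearch section.
DEFAULT_INDEX = "logstash-%Y.%m.%d"

def resolve_index(pattern, when):
    """Expand a strftime-compatible index pattern for the given timestamp."""
    return when.strftime(pattern)

def index_for(file_index, when, default_index=DEFAULT_INDEX):
    """Use the per-file pattern when one is set, otherwise the default (hypothetical helper)."""
    return resolve_index(file_index or default_index, when)

# Assumption: timestamps are taken in UTC.
stamp = datetime(2018, 6, 1, 12, 0, tzinfo=timezone.utc)
print(index_for(None, stamp))                       # logstash-2018.06.01
print(index_for("osquery-result-%Y.%m.%d", stamp))  # osquery-result-2018.06.01
```

With a daily pattern like this, documents indexed on different days land in different indices.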

Tail Section

The tail section contains the list of files to tail and the rules used to index them.

---
tail:
  - file: '/var/log/osquery/result.log'
    index: "osquery-result-%Y.%m.%d"
    decode: json
    extract:
      - by: split
        from: name
        when: '^pack'
        into: 'pack'
        split_on: '/'
        split_parts: [ null, "name", "report" ]
    mutate:
      prune: true
      remove: [ "calendarTime", "epoch", "counter", "_raw" ]
      rename:
        unixTime: _epoch

Each element is a hash containing the following information.

file

Required: The path to the file on the filesystem.

decode

This may be a single element, or an array, containing one or more of the implemented decoders.

json

Decode the discovered JSON in the document to a hash reference. This finds the first occurrence of an { in the string and assumes everything to the end of the string is JSON.

Decoding is done by JSON::MaybeXS.

syslog

Parses each line as a standard UNIX syslog message. Parsing is provided via Parse::Syslog::Line, which isn't a hard requirement of this package but will be loaded if available.
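The json decoder's "first { to end of string" rule can be sketched roughly as follows. This is a Python approximation for illustration only; the tool itself decodes with JSON::MaybeXS, and returning None on a missing or invalid payload is an assumption about error handling that this document does not describe.

```python
import json

def decode_json_line(line):
    """Decode everything from the first '{' to the end of the line as JSON.

    Returns a dict, or None when there is no '{' or the remainder is not
    valid JSON (assumed behavior, not taken from the tool).
    """
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None

# Hypothetical log line with a non-JSON prefix before the payload.
line = 'Jun  1 12:00:00 host osqueryd: {"name": "pack/os/cpu_info", "unixTime": 1527854400}'
print(decode_json_line(line))
```

One consequence of this rule is that any text following the JSON object on the same line makes the decode fail, because the whole remainder is treated as JSON.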

index

A strftime-compatible string to use as the index for documents created from this file. If not specified, the default from the ElasticSearch section will be used, and failing that, the default specified in POE::Component::ElasticSearch::Indexer.

type

The type to use for documents sourced from this file.

extract

Extraction of fields from the document by one of the supported methods.

by

Can be 'split' or 'regex'.

split supports:

split_on

Regex or string to split the string on.

split_parts

A name for each part of the split. Undef (YAML null) positions in the list are discarded.

regex supports:

regex

The regex to use for extraction, using capture groups to designate the values to extract.

regex_parts

A name for each captured group. Undef (YAML null) positions in the list are discarded.
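To make the regex and regex_parts pairing concrete, here is a rough Python sketch of how capture groups might be mapped to field names. It is an assumption-laden illustration rather than the tool's Perl code: the rule, the field names, and the sample document are hypothetical, only the from key is handled alongside regex and regex_parts, and Python's regex syntax differs from Perl's in places.

```python
import re

# Hypothetical rule, shaped like an entry under 'extract' in the config.
RULE = {
    "by": "regex",
    "from": "message",
    "regex": r"^(\S+)\s+(\S+)\s+(\d+)$",
    "regex_parts": ["user", None, "pid"],  # None mirrors a YAML null
}

def extract_regex(doc, rule):
    """Copy named capture positions from doc[rule['from']] into doc."""
    source = doc.get(rule["from"])
    if not isinstance(source, str):
        return doc
    match = re.search(rule["regex"], source)
    if match is None:
        return doc
    for name, value in zip(rule["regex_parts"], match.groups()):
        # Positions named None are discarded, as are groups that did not match.
        if name is not None and value is not None:
            doc[name] = value
    return doc

print(extract_regex({"message": "root sshd 4242"}, RULE))
```

In this sketch the second capture group is dropped because its position in regex_parts is null.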

from

Name of the field to apply the extraction to.

when

Limits applying the extraction to values matching the regex.

into

Top-level namespace under which the collected keys are stored, e.g.:

extract:
  - by: split
    from: name
    when: '^pack'
    into: 'pack'
    split_on: '/'
    split_parts: [ null, "name", "report"  ]

This looks at the name field and, when its value matches ^pack, splits it on / and stores the second element as name and the third as report, so:

name: pack/os/cpu_info

Becomes:

pack:
  name: os
  report: cpu_info
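The split extraction described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the tool's Perl implementation: the function name is hypothetical, split_on is treated as a regex, and the source field is left in place after extraction, which this document does not specify.

```python
import re

# The rule from the example above, as it would look after loading the YAML.
RULE = {
    "by": "split",
    "from": "name",
    "when": "^pack",
    "into": "pack",
    "split_on": "/",
    "split_parts": [None, "name", "report"],  # None mirrors a YAML null
}

def extract_split(doc, rule):
    """Split doc[rule['from']] and store the named parts under rule['into']."""
    source = doc.get(rule["from"])
    if not isinstance(source, str):
        return doc
    # 'when' limits the extraction to values matching the regex.
    when = rule.get("when")
    if when and not re.search(when, source):
        return doc
    parts = re.split(rule["split_on"], source)
    # Without 'into', this sketch writes the keys at the top level (assumption).
    target = doc.setdefault(rule["into"], {}) if rule.get("into") else doc
    for name, value in zip(rule["split_parts"], parts):
        if name is not None:
            target[name] = value
    return doc

doc = extract_split({"name": "pack/os/cpu_info"}, RULE)
print(doc["pack"])  # {'name': 'os', 'report': 'cpu_info'}
```

A document whose name does not start with pack passes through unchanged, because the when condition fails before any splitting happens.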

AUTHOR

Brad Lhotsky <brad@divisionbyzero.net>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2018 by Brad Lhotsky.

This is free software, licensed under:

The (three-clause) BSD License