NAME

BuzzSaw::Importer - Imports log entries of interest from data sources

VERSION

This documentation refers to BuzzSaw::Importer version 0.11.0

SYNOPSIS

use BuzzSaw::Importer;
use BuzzSaw::DataSource::Files;

my $source = BuzzSaw::DataSource::Files->new(
  parser      => "RFC3339",
  names       => [qr/^.*\.log(-\d+)?$/],
  directories => [@ARGV],
  recursive   => 1,
);

my $importer = BuzzSaw::Importer->new(
  sources   => [$source],
  filters   => ["Kernel"],
);

$importer->import_events;

DESCRIPTION

The BuzzSaw project provides a suite of tools for processing log file entries. Entries in files are parsed and filtered into a set of events of interest which are stored in a database. A report generation framework is also available which makes it easy to generate regular reports regarding the events discovered.

ATTRIBUTES

sources

This is a reference to a list of objects which implement the BuzzSaw::DataSource role. You must specify at least one data source. These sources will be queried for entries until all sources are exhausted.

Each element in the list can be expressed as an array reference where the first element is the name of the class of object to be created. Any subsequent elements are then passed in as arguments for the object creation. This makes it possible to do something like this:

my $importer = BuzzSaw::Importer->new( sources => [
                 [ Files => {
                         names       => ["*.log"],
                         directories => ["/var/log"],
                         recursive   => 0 } ],
                 ],
               );

This is primarily useful for saving the complete configuration into a file for later use with the new_with_config method.

filters

This is a reference to a list of objects which implement the BuzzSaw::Filter role. Each filter is called in sequence for each log entry to find events of interest. If any filter expresses interest in the event then it will be stored in the database. Note that fairly complex filtering is possible by carefully ordering the filters, since a filter can rely on the results of those earlier in the stack.

There are three possible scenarios for the result returned by a filter: (1) If the returned value is positive, the entry and tags will be stored. (2) If the returned value is zero, the entry will not be stored unless another filter in the stack expresses interest, and any tags returned will be ignored. (3) If the returned value is negative, the entry will not be stored unless another filter in the stack expresses interest, BUT the tags will be retained and stored if the final decision is to store the entry. The third case makes it possible to do additional post-processing which does not alter the results from the previous filters. For instance, the UserClassifier filter adds a user type tag to entries for which an earlier filter (e.g. SSH or Cosign) has set the userid field.

If you do not specify any filters then ALL events will be automatically accepted and the entire data set will be stored into the database.
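The voting rules above can be modelled in a few lines of plain Perl. This is only a sketch under stated assumptions: the decide function and the code-reference filters are hypothetical stand-ins, not the BuzzSaw::Filter interface, and each stand-in filter is assumed to return a vote followed by a list of tags.

```perl
use strict;
use warnings;

# Hypothetical model of the voting rules, NOT the BuzzSaw::Filter API.
# Each filter is a code ref returning ( $vote, @tags ).
sub decide {
    my ( $event, @filters ) = @_;

    # No filters configured: every event is accepted.
    return ( 1, [] ) if !@filters;

    my $store = 0;
    my @tags;
    for my $filter (@filters) {
        my ( $vote, @new_tags ) = $filter->($event);
        if ( $vote > 0 ) {       # interested: store the entry, keep tags
            $store = 1;
            push @tags, @new_tags;
        }
        elsif ( $vote < 0 ) {    # no vote to store, but tags are retained
            push @tags, @new_tags;
        }
        # $vote == 0: no interest, any tags returned are discarded
    }
    return ( $store, \@tags );
}

# Two illustrative filters (names and tag values are made up).
my $ssh   = sub { $_[0]{program} eq 'sshd' ? ( 1, 'ssh' ) : ( 0, 'ignored' ) };
my $class = sub { defined $_[0]{userid} ? ( -1, 'user:staff' ) : (0) };

my ( $keep, $tags ) =
  decide( { program => 'sshd', userid => 'alice' }, $ssh, $class );
print "store=$keep tags=@$tags\n";    # prints "store=1 tags=ssh user:staff"
```

Note that in this model an entry which only ever receives negative votes collects tags but is still not stored, which matches scenario (3).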

If an element in the list passed in is a string then it is considered to be a class name in the BuzzSaw::Filter namespace. Short names are allowed, e.g. passing in Kernel would result in a new BuzzSaw::Filter::Kernel object being created.

db

This is a reference to the BuzzSaw::DB object which will be used to store any events of interest. It will also be passed into the various data source objects so that it can be used to register parsed log sources, check for previously seen sources, etc.

It is possible to specify this as a string, in which case it is treated as a configuration file name and handed off to the new_with_config method for the BuzzSaw::DB class.

If you do not specify the BuzzSaw::DB object then a new one will be created by calling the new_with_config method (which will use the default configuration file name for that class).

readall

This is a boolean value which controls whether or not all files should be read. If it is set to true (i.e. a value of 1 - one) then the code which attempts to avoid re-reading previously seen files will not be used. The default value is false (i.e. a value of 0 - zero). When this value is true it is also set to true for all the data sources, which makes it easy to override the behaviour globally. When it is false, the specific setting for each data source is used.
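The interaction between the importer-level and source-level settings can be summarised as a small truth table. The following is a minimal sketch of that rule only; effective_readall is a hypothetical helper written for illustration and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: a true importer-level readall wins, otherwise
# the data source's own setting applies.
sub effective_readall {
    my ( $importer_readall, $source_readall ) = @_;
    return 1 if $importer_readall;
    return $source_readall ? 1 : 0;
}

for my $importer ( 0, 1 ) {
    for my $source ( 0, 1 ) {
        printf "importer=%d source=%d => readall=%d\n",
          $importer, $source, effective_readall( $importer, $source );
    }
}
```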

SUBROUTINES/METHODS

This class has the following methods:

$importer = BuzzSaw::Importer->new()

This will create a new BuzzSaw::Importer object. You will need to specify, at least, a data source.

$importer = BuzzSaw::Importer->new_with_config()

This will create a new BuzzSaw::Importer object using the attribute values stored in the configuration file. A filename may be specified; if not, the default value will be used. The value for any attribute can be overridden.

$importer->import_events

This is the method which does all the work. It works through the streams of entries from each data source. First, the SHA-256 digest is calculated for each event, and any event which has previously been seen is ignored. New events are parsed into their constituent parts using the relevant BuzzSaw::Parser. The parsed event is then passed through all the specified BuzzSaw::Filter objects. If the event is of interest it is then stored using the BuzzSaw::DB object.
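The digest-based check for previously seen events can be illustrated with the core Digest::SHA module. This is a sketch under assumptions rather than the module's own code: the %seen hash stands in for the database lookup, is_new_event is a hypothetical helper, and the hex encoding of the digest is chosen only for readability.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Stand-in for the database lookup of digests already stored.
my %seen;

# Hypothetical helper: true the first time a raw log line is seen.
sub is_new_event {
    my ($raw_line) = @_;
    my $digest = sha256_hex($raw_line);    # hex encoding is an assumption
    return 0 if $seen{$digest}++;
    return 1;
}

my @lines = (
    '2013-01-01T00:00:01+00:00 host sshd[42]: session opened',
    '2013-01-01T00:00:02+00:00 host kernel: link up',
    '2013-01-01T00:00:01+00:00 host sshd[42]: session opened',    # repeat
);

# Only the lines not seen before would go on to parsing and filtering.
my @fresh = grep { is_new_event($_) } @lines;
print scalar(@fresh), " new of ", scalar(@lines), " entries\n";
# prints "2 new of 3 entries"
```

Computing the digest before parsing means that entries which have already been imported are skipped without paying the cost of parsing and filtering them again.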

DEPENDENCIES

This module is powered by Moose. You will also need MooseX::Types, MooseX::Log::Log4perl and MooseX::SimpleConfig.

This module also requires the DateTime and Readonly modules.

SEE ALSO

BuzzSaw, BuzzSaw::DB, BuzzSaw::DataSource, BuzzSaw::Parser, BuzzSaw::Filter

PLATFORMS

This is the list of platforms on which we have tested this software. We expect this software to work on any Unix-like platform which is supported by Perl.

ScientificLinux6

BUGS AND LIMITATIONS

Please report any bugs or problems (or praise!) to bugs@lcfg.org. Feedback and patches are also always very welcome.

AUTHOR

Stephen Quinney <squinney@inf.ed.ac.uk>

LICENSE AND COPYRIGHT

Copyright (C) 2012-2013 University of Edinburgh. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the terms of the GPL, version 2 or later.