NAME

Net::NfDump - Perl API for manipulating nfdump files

SYNOPSIS

use Net::NfDump qw(:all);

#
#
# Example 1: reading nfdump file(s)
# 

$flow = new Net::NfDump(
            InputFiles => [ 'nfdump_file1', 'nfdump_file2' ], 
            Filter => 'icmp and src net 10.0.0.0/8',
            Fields => 'proto, bytes' ); 

$flow->query();

while (my ($proto, $bytes) = $flow->fetchrow_array() )  {
    $h{$proto} += $bytes;
}
$flow->finish();

foreach ( keys %h ) {
    printf "%s %d\n", $_, $h{$_};
}


#
#
# Example 2: creating and writing records into nfdump file
#

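$flow = new Net::NfDump(
            OutputFile => 'output.nfcap',
            Fields => 'srcip,dstip' );

# create() has to be called before any storerow_* method
$flow->create();

# txt2ip is not exported by default; it has to be imported (e.g. with :all)
$flow->storerow_arrayref( [ txt2ip('147.229.3.10'), txt2ip('1.2.3.4') ] );

$flow->finish();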


#
#
# Example 3: reading/writing (merging two input files) and swapping the
#            source and destination address if the destination port
#            is 80/http (contrived, for illustration only).
#

$flow1 = new Net::NfDump( 
             InputFiles => [ 'nfdump_file1', 'nfdump_file2' ], 
             Fields => 'srcip, dstip, dstport' ); 

$flow2 = new Net::NfDump( 
             OutputFile => 'nfdump_file_out', 
             Fields => 'srcip, dstip, dstport' ); 

$flow1->query();
$flow2->create();

while (my $ref = $flow1->fetchrow_arrayref() )  {

    if ( $ref->[2] == 80 ) { 
        ($ref->[0], $ref->[1]) = ($ref->[1], $ref->[0]);
    }

    $flow2->clonerow($flow1);
    $flow2->storerow_arrayref($ref);

}

$flow1->finish();
$flow2->finish();

DESCRIPTION

Nfdump (http://nfdump.sourceforge.net/) is a very popular toolset for collecting, storing and processing NetFlow/sFlow/IPFIX data. One of its key tools is the command line utility bearing the same name as the whole toolset (nfdump). Although this utility processes data very quickly, it is cumbersome for some applications.

This module implements basic operations on the binary files produced by the nfdump tool. It allows reading, creating and writing flow records in those files. The module tries to keep the same method naming conventions as the DBI modules/API, so developers who are used to that interface should find this one familiar.

The module uses the original nfdump sources to implement the necessary functions. Compatibility with the original nfdump should therefore be easy to keep, and coping with future versions of the nfdump tool should take minimal effort.

The architecture is as follows:

       APPLICATION 
+------------------------+
|                        |  Implements all methods and functions 
| Net::NfDump API (perl) |  described in this document.
|                        |
+------------------------+
|                        |  The code converting internal nfdump 
| libnf - glue code (C)  |  structures into perl and back to C.
|                        |
+------------------------+
|                        |  All original nfdump source files. There  
|   nfdump sources (C)   |  are no changes in these files and all   
|                        |  changes are placed into libnf code.
+------------------------+  
      NFDUMP FILES

This version of the Net::NfDump module is based on nfdump-1.6.9, available at http://sourceforge.net/projects/nfdump/. Support for NSEL and NEL code is enabled.

METHODS, OPTIONS AND RELATED FUNCTIONS

Options

Options can be passed to various methods. The basic set of options can be passed to the constructor and then modified in methods such as $obj->query() or $obj->create().

The value after => indicates the default value for the item.

  • InputFiles => []

    List of files to read (arrayref).

  • Filter => 'any'

    Filter that will be applied to input records. Uses nfdump/tcpdump syntax.

  • Fields => '*'

    List of fields to read or update. Any of the supported fields can be used here. See the chapter "SUPPORTED ITEMS" for the full list. The special field * selects all fields.

  • TimeWindowStart, TimeWindowEnd => 0

    Filter flows that start or end in the specified time window. The options take unix timestamp values, or 0 if the filter should not be applied.

  • OutputFile => undef

    Output file for the storerow_* methods.

  • Compressed => 1

    Flag indicating whether the output file should be compressed.

  • Anonymized => 0

    Flag indicating that the output file contains anonymized data.

  • Ident => ''

    Identification string stored in the header of the file.
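Since TimeWindowStart and TimeWindowEnd are plain unix timestamps, they can be computed with core modules. The following is only a minimal sketch, assuming the window is one whole day in UTC; the helper name time_window is hypothetical and not part of Net::NfDump, and treating the last second of the day as the end of the window is an assumption as well.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper (not part of Net::NfDump): build the two
# time-window options for one whole UTC day.
sub time_window {
    my ($year, $month, $day) = @_;
    # timegm expects a 0-based month
    my $start = timegm(0, 0, 0, $day, $month - 1, $year);
    return (
        TimeWindowStart => $start,
        TimeWindowEnd   => $start + 24 * 60 * 60 - 1,   # assumed inclusive end
    );
}

my %opts = time_window(2013, 1, 1);
printf "%s => %d\n", $_, $opts{$_} for sort keys %opts;
```

The returned key/value pairs could then be passed next to the other options in the constructor or in $obj->query().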

Constructor and status information methods

  • $obj = new Net::NfDump( %opts )

    my $obj = new Net::NfDump( InputFiles => [ 'file1']  );

    The constructor. Options can be passed as parameters.

  • $ref = $obj->info()

    my $i = $obj->info();
    print Dumper($i);

    Returns information about the current state of input file processing: the files, blocks and records already processed. This information can be useful for estimating how long it will take to process the whole dataset. The returned hashref contains the following items:

    total_files           - total number of files to process
    elapsed_time          - elapsed time
    remaining_time        - estimated remaining time to process all records
    percent               - estimated percentage of processed records

    processed_files       - total number of processed files
    processed_records     - total number of processed records
    processed_blocks      - total number of processed blocks
    processed_bytes       - total number of processed bytes (number of
                            bytes read from the file system, after
                            uncompressing)

    current_filename      - the name of the file currently processed
    current_total_blocks  - the number of blocks in the currently
                            processed file
    current_processed_blocks - the number of processed blocks in the
                            currently processed file

  • $obj->finish()

    $obj->finish();

    Closes all open file handles. It is necessary to call this method, especially when a new file is created. The method flushes the records remaining in the memory buffer to the file and updates the file statistics in the header. Without calling this method the output file might be corrupted.
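The hashref returned by $obj->info() lends itself to a simple progress display. The sketch below is illustrative only: progress_line is a hypothetical helper, it uses only keys listed above, and the sample hashref is made-up data standing in for a real $obj->info() result.

```perl
use strict;
use warnings;

# Hypothetical helper: format one progress line from a hashref shaped
# like the one returned by info(). Missing values are shown as '?'.
sub progress_line {
    my ($info) = @_;
    my $v = sub { defined $info->{ $_[0] } ? $info->{ $_[0] } : '?' };
    return sprintf '%s: %s%% done, %s records, %s/%s files',
        $v->('current_filename'), $v->('percent'),
        $v->('processed_records'),
        $v->('processed_files'), $v->('total_files');
}

# Made-up sample data standing in for $obj->info()
my $sample = {
    current_filename  => 'nfdump_file1',
    percent           => 42,
    processed_records => 1000,
    processed_files   => 1,
    total_files       => 2,
};
print progress_line($sample), "\n";
```

In a real script the helper would be called with $obj->info() every few thousand rows inside the fetch loop.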

Methods for reading data

  • $obj->query( %opts )

    $obj->query( Filter => 'src host 10.10.10.1' );

    Method that has to be executed before any of the fetchrow_* methods is used. Options can be passed to the method.

  • $ref = $obj->fetchrow_arrayref()

    while (my $ref = $obj->fetchrow_arrayref() ) {
        print Dumper($ref);
    }

    Has to be used after the query method. The method $obj->query() is called automatically if it wasn't called before.

    Returns an array reference with the record and moves to the next record. Returns undef when the end of the record set has been reached.

  • @array = $obj->fetchrow_array()

    while ( @array = $obj->fetchrow_array() ) { 
      print Dumper(\@array);
    }

    Same functionality as fetchrow_arrayref, but returns the items as an array instead.

  • $ref = $obj->fetchrow_hashref()

    while ( $ref = $obj->fetchrow_hashref() ) {
       print Dumper($ref);
    }

    Same as fetchrow_arrayref, but the items are returned in a hash reference as key => value pairs.

    NOTE: This method can be very inefficient in some cases; please see the PERFORMANCE section.

Methods for writing data

  • $obj->create( %opts )

    $obj->create( OutputFile => 'output.nfcapd' );

    Creates a new nfdump file. This method has to be called before any of the $obj->storerow_* methods is called.

  • $obj->storerow_arrayref( [ @array ] )

    $obj->storerow_arrayref( [ $srcip, $dstip ] );

    Inserts the data defined in the arrayref into the file opened by create. The number of fields and their order have to match the order defined in the Fields option passed to the constructor or to the $obj->create() method.

  • $obj->storerow_array( @array )

    $obj->storerow_array( $srcip, $dstip );

    Same as storerow_arrayref, but the items are passed as a single array.

  • $obj->storerow_hashref ( \%hash )

    $obj->storerow_hashref( { 'srcip' =>  $srcip, 'dstip' => $dstip } );

    Inserts the structure defined as a hash reference into the output file.

    NOTE: This method can be very inefficient in some cases; please see the PERFORMANCE section.

  • $obj->clonerow( $obj2 )

    $obj->clonerow( $obj2 );

    Copies the full content of the row from the source object (instance). This method is useful for writing efficient scripts (it is much faster than any of the previous methods).

Extra conversion and support functions

The module also provides extra conversion functions that convert the binary format of IP addresses, MAC addresses and MPLS label tags into text format and back.

These functions are not exported by default, so they have to be either called with the full module name or imported when the module is loaded. The :all tag can be used to import all support functions.

use Net::NfDump qw(:all);

  • $txt = ip2txt( $bin )

  • $bin = txt2ip( $txt )

    $ip = txt2ip('10.10.10.1');
    print ip2txt($ip);

    Converts both IPv4 and IPv6 addresses into text form and back. The standard inet_ntop/inet_pton functions can be used instead to provide the same results.

    Function txt2ip returns the binary format of the IP address, or undef if the conversion is impossible.

  • $txt = mac2txt( $bin )

  • $bin = txt2mac( $txt )

    $mac = txt2mac('aa:02:c2:2d:e0:12');
    print mac2txt($mac);

    Converts a MAC address to xx:yy:xx:yy:xx:yy format and back. The function txt2mac accepts an address in any of the following formats:

    aabbccddeeff
    aa:bb:cc:dd:ee:ff
    aa-bb-cc-dd-ee-ff
    aabb-ccdd-eeff

    Returns the binary format of the address, or undef if the conversion is impossible.

  • $txt = mpls2txt( $mpls )

  • $mpls = txt2mpls( $txt )

    $mpls = txt2mpls('1002-6-0 1003-6-0 1004-0-1');
    print mpls2txt($mpls);

    Converts label information to format Lbl-Exp-S and back.

    Where:

    Lbl - Value given to the MPLS label by the router.
    Exp - Value of the experimental bits.
    S   - Value of the end-of-stack bit: set to 1 for the oldest
          entry in the stack and to zero for all other entries.

  • $ref = flow2txt( \%row )

  • $ref = txt2flow( \%row )

    The function flow2txt takes a hash reference to the items returned by fetchrow_hashref and converts all items into human readable text format. It applies the functions ip2txt, mac2txt and mpls2txt to the items where it makes sense. The function txt2flow does the opposite.

  • $ref = file_info( $file_name )

    $ref = file_info('file.nfcap');
    print Dumper($ref);

    Reads information from the nfdump file header. It provides various attributes such as the number of blocks, version, flags, statistics, etc. The following items are returned:

    version
    ident
    blocks
    catalog
    anonymized
    compressed
    sequence_failures
    
    first
    last
    
    flows, bytes, packets
    
    flows_tcp, flows_udp, flows_icmp, flows_other
    bytes_tcp, bytes_udp, bytes_icmp, bytes_other
    packets_tcp, packets_udp, packets_icmp, packets_other
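The text formats handled by the conversion functions above can be illustrated with core Perl only. The sketch below does not reproduce the module's implementation: ip_roundtrip, mac_to_bin, mac_to_txt and mpls_parse are hypothetical names, the IP part relies on the standard Socket functions mentioned above, and the MPLS part only splits the Lbl-Exp-S text form (the module's binary MPLS representation is not assumed here).

```perl
use strict;
use warnings;
use Socket qw(inet_pton inet_ntop AF_INET AF_INET6);

# IP address: text -> binary -> text using the standard Socket functions.
# Returns undef if the text cannot be parsed.
sub ip_roundtrip {
    my ($txt) = @_;
    my $family = $txt =~ /:/ ? AF_INET6 : AF_INET;
    my $bin = inet_pton($family, $txt);
    return defined $bin ? inet_ntop($family, $bin) : undef;
}

# MAC address: accept the four text formats listed above (leniently, by
# dropping ':' and '-' separators) and return 6 bytes, or undef on error.
sub mac_to_bin {
    my ($txt) = @_;
    (my $hex = $txt) =~ s/[:-]//g;
    return unless $hex =~ /\A[0-9a-fA-F]{12}\z/;
    return pack 'H12', $hex;
}

# MAC address: 6 bytes -> aa:bb:cc:dd:ee:ff
sub mac_to_txt {
    my ($bin) = @_;
    return join ':', unpack '(H2)6', $bin;
}

# MPLS: split 'Lbl-Exp-S Lbl-Exp-S ...' into a list of hashrefs.
sub mpls_parse {
    my ($txt) = @_;
    return map {
        my ($lbl, $exp, $s) = split /-/, $_;
        +{ label => $lbl, exp => $exp, s => $s };
    } split ' ', $txt;
}

print ip_roundtrip('10.10.10.1'), "\n";
print mac_to_txt( mac_to_bin('aabb-ccdd-eeff') ), "\n";
printf "label %d, end of stack %d\n", $_->{label}, $_->{s}
    for mpls_parse('1002-6-0 1003-6-0 1004-0-1');
```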

SUPPORTED ITEMS

 Time items
 =====================
 first - Timestamp of first seen packet in milliseconds
 last - Timestamp of last seen packet in milliseconds
 received - Timestamp when the packet was received by collector 

 Statistical items
 =====================
 bytes - The number of bytes 
 pkts - The number of packets 
 outbytes - The number of output bytes 
 outpkts - The number of output packets 
 flows - The number of flows (aggregated) 

 Layer 4 information
 =====================
 srcport - Source port 
 dstport - Destination port 
 tcpflags - TCP flags  

 Layer 3 information
 =====================
 srcip - Source IP address 
 dstip - Destination IP address 
 nexthop - IP next hop 
 srcmask - Source mask 
 dstmask - Destination mask 
 tos - Source type of service 
 dsttos - Destination type of service 
 srcas - Source AS number 
 dstas - Destination AS number 
 nextas - BGP Next AS 
 prevas - BGP Previous AS 
 bgpnexthop - BGP next hop 
 proto - IP protocol  

 Layer 2 information
 =====================
 srcvlan - Source vlan label 
 dstvlan - Destination vlan label 
 insrcmac - In source MAC address 
 outsrcmac - Out source MAC address 
 indstmac - In destination MAC address 
 outdstmac - Out destination MAC address 

 MPLS information
 =====================
 mpls - MPLS labels 

 Layer 1 information
 =====================
 inif - SNMP input interface number 
 outif - SNMP output interface number 
 dir - Flow directions ingress/egress 
 fwd - Forwarding status 

 Exporter information
 =====================
 router - Exporting router IP 
 systype - Type of exporter 
 sysid - Internal SysID of exporter 

 NSEL fields, see: http://www.cisco.com/en/US/docs/security/asa/asa81/netflow/netflow.html
 =====================
 flowstart - NSEL The time that the flow was created
 connid - NSEL An identifier of a unique flow for the device 
 icmpcode - NSEL ICMP code value 
 icmptype - NSEL ICMP type value 
 event - NSEL High-level event code
 xevent - NSEL Extended event code
 xsrcip - NSEL Mapped source IPv4 address 
 xdstip - NSEL Mapped destination IPv4 address 
 xsrcport - NSEL Mapped source port 
 xdstport - NSEL Mapped destination port 
 NSEL The input ACL that permitted or denied the flow
 iacl - Hash value or ID of the ACL name
 iace - Hash value or ID of the ACL name 
 ixace - Hash value or ID of an extended ACE configuration 
 NSEL The output ACL that permitted or denied the flow
 eacl - Hash value or ID of the ACL name
 eace - Hash value or ID of the ACL name
 exace - Hash value or ID of an extended ACE configuration
 username - NSEL username

 NEL (NetFlow Event Logging) fields
 =====================
 nevent - NEL NAT Event
 nsrcport - NEL NAT src port 
 ndstport - NEL NAT dst port
 vrf - NEL NAT ingress vrf id 
 nsrcip - NEL NAT inside address
 ndstip - NEL NAT outside address

 Extra/special fields
 =====================
 cl - nprobe latency client_nw_delay_usec 
 sl - nprobe latency server_nw_delay_usec
 al - nprobe latency appl_latency_usec
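The first and last items are timestamps in milliseconds, so they need a small conversion before printing. A minimal sketch, assuming a perl with native 64-bit integers and that the value counts milliseconds since the unix epoch; ms_to_text is a hypothetical helper, not a function of the module.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: convert a millisecond timestamp (as stored in the
# 'first' and 'last' items) to a UTC text form with millisecond precision.
sub ms_to_text {
    my ($ms) = @_;
    my $sec  = int($ms / 1000);   # whole seconds
    my $frac = $ms % 1000;        # remaining milliseconds
    return sprintf '%s.%03d',
        strftime('%Y-%m-%d %H:%M:%S', gmtime($sec)), $frac;
}

print ms_to_text(1356998400123), "\n";
```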

PERFORMANCE

It is obvious that the performance of the Perl interface is lower than that of the highly optimized nfdump utility. While nfdump is able to process up to 2 million records per second, Net::NfDump is not able to process more than 1 million records per second. However, there are several rules that keep the code optimized:

  • Use $obj->fetchrow_arrayref() and $obj->storerow_arrayref() instead of the *_array and *_hashref equivalents. The arrayref variants pass only a reference to the data structure. Avoid the *_hashref functions; they can be 5 times slower.

  • Pass to the Perl API only the items that the code actually needs. It is always more efficient to define Fields => 'srcip,dstip,...' instead of Fields => '*'.

  • Prefer the $obj->clonerow($obj2) method. This method copies data between two instances directly in the C code in the libnf layer.

    The following code:

    $obj1->query( Fields => '*' );
    $obj2->create( Fields => '*' );
    
    while ( my $ref = $obj1->fetchrow_arrayref() ) {
        # do something with srcip 
        $obj2->storerow_arrayref($ref);
    }

    can be written in a more efficient way (several times faster):

    $obj1->query( Fields => 'srcip' );
    $obj2->create( Fields => 'srcip' );
    
    while ( my $ref = $obj1->fetchrow_arrayref() ) {
        # do something with srcip 
        $obj2->clonerow($obj1);
        $obj2->storerow_arrayref($ref);
    }

NOTE ABOUT 32BIT PLATFORMS

Nfdump primarily uses 64-bit counters and other items to store single integer values. However, native 64-bit support is not compiled into every perl. For those cases where only 32-bit integer values are supported, Net::NfDump uses the Math::Int64 module.

The build scripts automatically detect the platform, and the Math::Int64 module is required only on platforms where the available perl does not support 64-bit integer values.
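Whether a given perl has native 64-bit integers can be checked with the core Config module. This is only an illustrative check and not necessarily the test performed by the build scripts; has_native_int64 is a hypothetical helper name.

```perl
use strict;
use warnings;
use Config;

# ivsize is the size in bytes of perl's native integer type.
sub has_native_int64 {
    return ( defined $Config{ivsize} && $Config{ivsize} >= 8 ) ? 1 : 0;
}

print has_native_int64()
    ? "native 64-bit integers available\n"
    : "32-bit integers only; Math::Int64 would be needed\n";
```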

EXAMPLES OF USE

There are several examples in the examples directory.

  • example1.pl - A trivial example showing how Net::NfDump can be used for reading files. The example also uses a progress bar to show the status of processed files.

  • download_asn_db, nf_asn_geo_update - A set of scripts for updating information about AS numbers and country codes based on BGP and geolocation databases. Every flow can be extended with the src/dst AS number and the src/dst country code.

    The first script (download_asn_db) downloads the BGP database that is available on the RIPE server. The database is then preprocessed and prepared for the second script (with the support of the Net::IP::LPM module).

    The second script (nf_asn_geo_update) updates the AS (or country code) information in the nfdump file. It can be run as the extra command (-x option of nfcapd) to update the information as soon as a new file is available.

    The information about src/dst country works in a similar way. It uses the MaxMind database and the Geo::IP module. Because nfdump does not support any field for storing that kind of information, the xsrcport and xdstport fields are used instead. The country code is converted into a 16-bit value (the first 8 bits for the first character of the country code and the second 8 bits for the second one).
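The 16-bit country code encoding described above can be sketched with pack/unpack. This is an assumption-laden illustration, not the code of the example scripts: cc_to_int and int_to_cc are hypothetical names, and placing the first character in the high 8 bits is an assumed byte order.

```perl
use strict;
use warnings;

# Hypothetical helper: pack a two-letter country code into a 16-bit
# integer, first character in the high 8 bits (assumed order).
# Returns undef for anything that is not exactly two letters.
sub cc_to_int {
    my ($cc) = @_;
    return unless defined $cc && $cc =~ /\A[A-Za-z]{2}\z/;
    return unpack 'n', uc $cc;
}

# Reverse conversion: 16-bit integer -> two-letter country code.
sub int_to_cc {
    my ($int) = @_;
    return pack 'n', $int;
}

my $code = cc_to_int('cz');
printf "%d -> %s\n", $code, int_to_cc($code);
```

A value in this range fits into a port-sized field, which is why fields such as xsrcport and xdstport can carry it.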

SEE ALSO

http://nfdump.sourceforge.net/

AUTHOR

Tomas Podermanski, <tpoder@cis.vutbr.cz>, Brno University of Technology

COPYRIGHT AND LICENSE

Copyright (C) 2012 by Brno University of Technology

This library is free software; you can redistribute it and modify it under the same terms as Perl itself.

If you are satisfied with using Net::NfDump please send us a postcard, preferably with a picture from your location / city to:

Brno University of Technology 
CVIS
Tomas Podermanski 
Antoninska 1
601 90 
Czech Republic