NAME
Finnigan - Thermo/Finnigan mass spec data decoder
SYNOPSIS
use Finnigan;
seek STREAM, $object_address, 0;
my $o = Finnigan::Object->decode(\*STREAM, $arg);
$o->dump;
where 'Object' stands for any of the specific decoder objects (Finnigan::*) and STREAM is an open filehandle positioned at the start of the structure to be decoded. Some decoders may require an additional argument (the file format version).
DESCRIPTION
Finnigan is a non-functional package whose only purpose is to pull all the other packages of the module into its namespace. It does no work; all work is done in the submodules. Each submodule has its own documentation; please see the "SUBMODULES" section below or visit the project's home page for a more detailed description of the file format, data structures, decoders and tools.
Each decoder submodule has a simple command-line interface. See the "TOOLS" section for a list of command-line tools that can be used to examine the Finnigan file structures and dump their contents with absolute or relative addresses. One of the tools, uf-mzxml, can be used to convert the entire data stream in a Finnigan file to the mzXML format.
METHODS
- list_modules
The only method defined in the top-level Finnigan package is list_modules, which can be used to ascertain that all packages have been successfully loaded:
perl -MFinnigan -e 'Finnigan::list_modules'
SUBMODULES
To simplify the decoder and allow it to accommodate a variety of file versions, it has been subdivided into a set of submodules, each representing a structural unit of the Finnigan file format. The partitioning of the format into units is somewhat arbitrary; it was done based on the comparative analysis of the structure of several different formats. The structures common to all formats are viewed as "basic" and merit a dedicated decoder; the same goes for the highly repetitive structures, such as Finnigan::ScanIndexEntry. Some structures remain roughly similar, but keep acquiring new elements with every new file version; the decoders for these structures are parameterised with the version number (for example, Finnigan::ScanEventPreamble).
The notion of a preamble (a term I made up, not knowing better) represents what seems to be a persistent idiom in Thermo structure coding: binary data collected in a fixed-size block, followed by variable-length objects (mostly text strings). The earlier Finnigan formats contained little or no text and virtually no variable-length data, so what I call a preamble today used to be the whole deal in the past, and it makes sense to have a separate decoder for each such rudimentary container. Keeping these decoders separate makes it possible to go back and decode historical data simply by recombining the existing decoders.
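The preamble idiom can be illustrated with a small, self-contained sketch in plain Perl; it does not use the Finnigan modules. The record layout is hypothetical and chosen only for illustration: an 8-byte fixed block holding two little-endian UInt32 fields (named version and n_items here, both invented), followed by one variable-length string stored as a UInt32 character count and that many UTF-16LE characters. The fixed block can be unpacked in one step, while the size of the variable part is only known once its count has been read.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Build a test record in memory. The layout is HYPOTHETICAL, not a real Thermo structure:
#   fixed-size preamble: two UInt32 values ('V' = unsigned 32-bit little-endian)
#   variable part:       UInt32 character count + UTF-16LE text
my $text = 'LTQ Orbitrap';
my $buf  = pack('V V', 66, 1)
         . pack('V', length $text)
         . encode('UTF-16LE', $text);

open my $fh, '<', \$buf or die "cannot open in-memory buffer: $!";
binmode $fh;

# Read exactly $n bytes from $in, or die
sub read_exact {
    my ($in, $n) = @_;
    my $got = read($in, my $bytes, $n);
    die "read error: $!\n" unless defined $got;
    die "short read: wanted $n bytes, got $got\n" unless $got == $n;
    return $bytes;
}

# 1. The preamble: a fixed-size binary block decoded with a single unpack
my %preamble;
@preamble{qw(version n_items)} = unpack('V V', read_exact($fh, 8));

# 2. The variable-length tail: its size depends on the count read first
my $nchars = unpack('V', read_exact($fh, 4));
my $name   = decode('UTF-16LE', read_exact($fh, 2 * $nchars));

printf "version=%d n_items=%d name=%s (handle now at offset %d)\n",
    $preamble{version}, $preamble{n_items}, $name, tell($fh);
```

Because the fixed block never changes size within a given file version, it can be handed to its own decoder, and the variable-length tail can be decoded separately by whatever structure contains it.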
Common submodule methods
- decode($stream, $arg)
Each Finnigan::* object has a constructor method named decode(), whose first argument is a filehandle positioned at the start of the object to be decoded. Some decoders require additional arguments, such as the file version number. A single argument is passed as is, while multiple arguments can be passed as an array reference.
The constructor advances the handle to the start of the next object, so seeking to the start of the object of interest is only necessary when doing partial reads; in principle, the entire file can be read by calling object constructors in sequence. In practice, it is often more efficient to seek ahead to fetch an index structure stored near the end of the file, then go back to the data stream using the pointers in the index.
The decoded data can be obtained by calling accessor methods on the object or by de-referencing the object reference (since all Finnigan objects are blessed hash references):
$x = $object->element
or
$x = $object->{element}
The accessor option is nicer, as it leads to less clutter in the code and leaves the possibility for additional processing of the data by the accessor routine, but it incurs a substantial performance penalty. For this reason, hash dereference is preferred in performance-critical code (inside loops).
This is an "instance" method; it must be defined in each non-trivial decoder object.
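The decode() contract described above (read from the current position, leave the handle at the start of the next object, return a blessed hash reference) can be sketched with a toy class in plain Perl. The class Toy::IndexEntry, its 8-byte layout and its two fields are hypothetical and do not correspond to any real Finnigan decoder; real decoders build their objects through Finnigan::Decoder->read and keep more metadata per field than this flat hash does.

```perl
use strict;
use warnings;

package Toy::IndexEntry;    # HYPOTHETICAL decoder, not part of Finnigan

# Constructor in the style of decode(): reads one fixed-size 8-byte record
# at the current position and leaves the handle at the start of the next one.
sub decode {
    my ($class, $stream) = @_;
    my $got = read($stream, my $bytes, 8);
    die "short read\n" unless defined $got && $got == 8;
    my ($offset, $scan) = unpack('V V', $bytes);
    return bless { offset => $offset, scan => $scan }, $class;
}

# Accessor: tidy, but every call pays for a method dispatch
sub offset { return $_[0]->{offset} }

package main;

# Three consecutive records in an in-memory "file": (1000, 1), (2000, 2), (3000, 3)
my $buf = join '', map { pack('V V', 1000 * $_, $_) } 1 .. 3;
open my $fh, '<', \$buf or die "cannot open buffer: $!";
binmode $fh;

# Sequential read: each call picks up where the previous one stopped
my @entries;
push @entries, Toy::IndexEntry->decode($fh) for 1 .. 3;

# Partial read: seek straight to the third record (2 records x 8 bytes)
seek($fh, 16, 0) or die "seek failed: $!";
my $third = Toy::IndexEntry->decode($fh);

print $entries[1]->offset, "\n";     # accessor call
print $entries[1]->{offset}, "\n";   # direct hash dereference, preferred inside loops
print $third->{scan}, "\n";
```

Both access styles return the same value; the hash dereference simply skips the method call, which is why it is the better choice in tight loops over many decoded objects.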
- dump(%args)
All Finnigan objects are descendants of Finnigan::Decoder. One of the methods they inherit is dump, which provides an easy way to explore the contents of decoded objects. The dump method prints out the structure of the object it is called on in one of several styles, with relative or absolute addresses. For example, many object dumps used in this wiki were created thus:
$object->dump(style => 'wiki', relative => 1);
The style argument can have the value wiki, html or no value at all (meaning plain text). The relative argument is a boolean indicating whether to use absolute or relative file addresses in the output. In this case, "relative" means "an offset within the object", while "absolute" is the seek address within the data file.
- read($stream, $template_list, $arg)
This is the Finnigan::Decoder constructor method. Some derived decoders use it internally, but it can also be used to decode trivial objects at a given location in a file without having to write a dedicated decoder.
For example, to read a 32-bit stream length, use:
my $object = Finnigan::Decoder->read(\*INPUT, ['length' => ['V', 'UInt32']]);
The $template_list argument names all fields to decode (in this case, just one: length), the template to use for each field (in this example, V, the Perl unpack template for an unsigned 32-bit little-endian integer), and provides a human-readable symbol for the template, which can be used in a number of ways; for example, when inspecting the structures with the dump method.
This may seem like a kludgy way of reading four bytes, but the upshot is that the resulting $object will have the size, type and location information tucked into it, so it can be analysed and dumped in a way consistent with other decoded objects. The advantage becomes even more apparent when the structure is more complex than a single scalar object.
The inherited read method provides the core functionality in all Finnigan decoders.
If only the value of the object is sought, then this even more kludgy code can be used:
my $stream_length = Finnigan::Decoder->read(\*INPUT, ['length' => ['V', 'UInt32']])->{data}->{length}->{value};
Doing it this way is nonetheless easier than writing several lines of code to read the data into a buffer, check for the I/O errors and unpack the value.
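To show why carrying the size, type and location along with each value is useful, here is a stripped-down, template-driven reader in plain Perl. It is not Finnigan::Decoder's implementation: the function name read_fields, the size table, and the exact keys of the returned hash are assumptions made for this sketch, loosely modelled on the {data}->{field}->{value} path used above. Only a few fixed-width unpack templates are supported. Recording both the absolute address and the offset within the object is what makes a relative or absolute dump possible.

```perl
use strict;
use warnings;

# Byte sizes for the few fixed-width unpack templates this sketch supports
my %SIZE = (C => 1, v => 2, V => 4);

# HYPOTHETICAL template-driven reader. For each field it records the absolute
# address, the offset within the object, the size, the type symbol and the value.
sub read_fields {
    my ($stream, $fields) = @_;
    my $addr = tell $stream;
    my %obj  = (addr => $addr, size => 0, data => {});
    my @list = @$fields;
    while (my ($name, $spec) = splice @list, 0, 2) {
        my ($template, $type) = @$spec;
        my $size = $SIZE{$template} or die "unsupported template '$template'\n";
        my $got  = read($stream, my $bytes, $size);
        die "short read at field '$name'\n" unless defined $got && $got == $size;
        $obj{data}{$name} = {
            addr   => $addr + $obj{size},   # absolute: seek address in the file
            offset => $obj{size},           # relative: offset within the object
            size   => $size,
            type   => $type,
            value  => scalar unpack($template, $bytes),
        };
        $obj{size} += $size;
    }
    return \%obj;
}

# Demo: 4 bytes of padding, then a 6-byte record starting at absolute address 4
my $buf = "\0" x 4 . pack('V v', 123456, 7);
open my $fh, '<', \$buf or die "cannot open buffer: $!";
binmode $fh;
seek($fh, 4, 0) or die "seek failed: $!";

my $obj = read_fields($fh, [
    length  => ['V', 'UInt32'],
    version => ['v', 'UInt16'],
]);

printf "%-8s addr=%d offset=%d size=%d %s = %d\n",
    $_, @{ $obj->{data}{$_} }{qw(addr offset size type value)}
    for qw(length version);
```

The field list is an array reference of name/specification pairs rather than a hash so that the decoding order is preserved.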
- stringify
A convenience method defined in some of the Finnigan objects. It allows a concise representation of an object to be injected anywhere Perl expects a string. For example,
use feature 'say';
my $scan_event = Finnigan::ScanEvent->decode(\*INPUT, $header->version);
say "$scan_event";
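In Perl, this kind of behaviour usually comes from the core overload pragma, which maps the '""' (stringification) operator to a method. The sketch below uses a hypothetical Toy::Event class with invented fields and an invented output format; it is not a Finnigan decoder, and whether each Finnigan class wires up stringify in exactly this way is an assumption.

```perl
use strict;
use warnings;
use feature 'say';

package Toy::Event;    # HYPOTHETICAL class for illustration only

use overload
    '""'     => \&stringify,   # called whenever the object is used as a string
    fallback => 1;             # derive eq, ., etc. from the string conversion

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# A concise one-line summary of the object
sub stringify {
    my ($self) = @_;
    return sprintf 'MS%d %s [%.2f-%.2f]',
        @{$self}{qw(level polarity low high)};
}

package main;

my $event = Toy::Event->new(level => 2, polarity => '+', low => 200, high => 2000);
say "$event";    # interpolation invokes stringify
```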
Submodule index
- Finnigan::AuditTag (sample audit tag)
- Finnigan::CASInfo (autosampler info)
- Finnigan::CASInfoPreamble (numerical autosampler parameters)
- Finnigan::Decoder (the base class for all Finnigan decoders)
- Finnigan::Error (error log entry)
- Finnigan::FileHeader
- Finnigan::FractionCollector (m/z range decoder)
- Finnigan::GenericDataDescriptor (a self-decoding structure element)
- Finnigan::GenericDataHeader (self-decoding structure header)
- Finnigan::GenericRecord (self-decoding structure)
- Finnigan::InjectionData (sample injection parameters)
- Finnigan::InstID (instrument identifiers)
- Finnigan::InstrumentLogRecord (instrument log entry)
- Finnigan::MethodFile (an OLE2 container for instrument method files)
- Finnigan::OLE2DIF (Double-Indirect FAT decoder)
- Finnigan::OLE2DirectoryEntry
- Finnigan::OLE2FAT (FAT sector decoder)
- Finnigan::OLE2File (Microsoft OLE2/CDF file decoder)
- Finnigan::OLE2Header (OLE2 header decoder)
- Finnigan::OLE2Property (OLE2 index node decoder)
- Finnigan::PacketHeader (scan data header)
- Finnigan::Peak (an element of the peak centroid list)
- Finnigan::Peaks (the peak centroid list)
- Finnigan::Profile (scan profile)
- Finnigan::ProfileChunk (a single chunk of a filtered profile)
- Finnigan::RawFileInfo (primary index structure)
- Finnigan::RawFileInfoPreamble (the binary data part of RawFileInfo)
- Finnigan::Reaction (precursor ion data)
- Finnigan::RunHeader (secondary index structure)
- Finnigan::SampleInfo (secondary index structure)
- Finnigan::Scan (a lightweight ScanDataPacket decoder)
- Finnigan::ScanEvent (scan type descriptor)
- Finnigan::ScanEventPreamble (the byte array component of ScanEvent)
- Finnigan::ScanEventTemplate (the prototype scan descriptor)
- Finnigan::ScanIndexEntry (scan data pointer)
- Finnigan::ScanParameters (scan meta-data)
- Finnigan::SeqRow (sequencer table row)
TOOLS
The Unfinnigan tools extract data from the Finnigan files of several known versions. They are listed roughly in the order in which the structures they decode occur in the data file.
Query tools
- uf-header
read the FileHeader structure
- uf-seqrow
read the SeqRow structure (Sequence Table Row)
- uf-casinfo
read the CASInfo structure (autosampler info)
- uf-rfi
read RawFileInfo, the primary index structure
- uf-meth
unravel the embedded MethodFile container
- uf-scan
examine the scan profile and peak data in a single MS scan (ScanDataPacket)
- uf-runheader
read RunHeader, the secondary index structure
- uf-instrument
read the instrument IDs (the InstID structure)
- uf-log
list or dump the instrument log stream (InstrumentLogRecord structures)
- uf-error
list the error log (a stream of Error structures)
- uf-segments
dump the ScanEventTemplate structures in the order of segment hierarchy
- uf-params
print or dump the ScanParameters stream
- uf-tune
print or dump the TuneFile structure
- uf-index
read the stream of ScanIndexEntry records (scan data pointers)
- uf-trailer
read the stream of ScanEvent records
Conversion tools
The following conversion tools transcode entire raw files into alternative representations.
- uf-mzxml
convert a raw file to mzXML
- mzxml-unpack
unpack the base64-encoded scan data in an mzXML file
All tools contain their own POD sections. To read the documentation for a tool, use either
man <tool>
or
perldoc <tool>
AUTHOR
Gene Selkov, <selkovjr@gmail.com>
COPYRIGHT AND LICENSE
Copyright (C) 2010 by Gene Selkov
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available.