NAME

Geo::BUFR - Perl extension for handling of WMO BUFR files.

SYNOPSIS

# A simple program to print decoded content of a BUFR file

use Geo::BUFR;

Geo::BUFR->set_tablepath('path to BUFR tables');

my $bufr = Geo::BUFR->new();

# If you want flag and code table values to be resolved
$bufr->load_Ctable('your favourite C table');

$bufr->fopen('BUFR file');

while (not $bufr->eof()) {
    my ($data, $descriptors) = $bufr->next_observation();
    print $bufr->dumpsections($data, $descriptors);
}

$bufr->fclose();

DESCRIPTION

BUFR = Binary Universal Form for the Representation of meteorological data. BUFR is approved by WMO (World Meteorological Organization) as the standard universal exchange format for meteorological data, gradually replacing a lot of older alphanumeric data formats.

This module provides methods for decoding and encoding BUFR messages, and for displaying information in BUFR B and D tables and in BUFR flag and code tables.

This module also installs some programs: bufrread.pl, bufrresolve.pl, bufrencode.pl, bufr_reencode.pl and bufralter.pl. See https://wiki.met.no/bufr.pm/start for examples of use.

For the majority of potential users of Geo::BUFR I would expect these programs to provide the BUFR toolkit they are looking for, saving you from reading through the rather lengthy list of methods below.

Note that, being Perl, this module cannot compete in speed with, for example, the (free) ECMWF Fortran library libbufr. Still, some effort has been put into making the module reasonably fast, in that the core routines for encoding and decoding bitstreams are implemented in C.

METHODS

The get_ methods will return undef if the requested information is not available. The set_ methods as well as fopen and fclose will always return 1, or croak if failing.

Create a new object:

$bufr = Geo::BUFR->new();
$bufr = Geo::BUFR->new($BUFRmessages);

The second form of new is useful if you want to provide the BUFR messages to decode directly as an input buffer (string). You also have the option of providing the BUFR messages in a file, using the no argument form of new() and then calling fopen.
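
As a sketch of the second form of new, the helper below reads a file in binary mode and hands its content to new as a string. The name dump_bufr_buffer and its two arguments are hypothetical; only methods described in this document are called, and Geo::BUFR is loaded when the helper is first called.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Geo::BUFR: decode all observations
# in a BUFR file by passing the file content to new() as a string.
sub dump_bufr_buffer {
    my ($file, $tablepath) = @_;

    require Geo::BUFR;                      # loaded on first call
    Geo::BUFR->set_tablepath($tablepath);

    # Read the whole file as raw bytes
    open(my $fh, '<:raw', $file) or die "Cannot open $file: $!";
    my $messages = do { local $/; <$fh> };
    close($fh);

    my $bufr = Geo::BUFR->new($messages);
    my $text = '';
    while (not $bufr->eof()) {
        my ($data, $descriptors) = $bufr->next_observation();
        $text .= $bufr->dumpsections($data, $descriptors);
    }
    return $text;
}
```

A call would look like print dump_bufr_buffer('BUFR file', 'path to BUFR tables');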

Copy an existing object:

$new_bufr = Geo::BUFR->clone($bufr);

Associate the object with a file for reading of BUFR messages:

$bufr->fopen($filename);

Close the associated file that was opened by fopen:

$bufr->fclose();

Check for end-of-file (or end of the input buffer provided as argument to new):

$bufr->eof();

Returns true if end-of-file (or end of input buffer) is reached, false if not.

Load B and D tables:

$bufr->load_BDtables($table);

$table is optional, and should be the (base)name of a file containing a BUFR table B or D, using the ECMWF libbufr naming convention, i.e. [BD]'table_version'.TXT. If no argument is provided, load_BDtables() will use BUFR section 1 information to decide which tables to load. Previously loaded tables are kept in memory, and load_BDtables will return immediately if the tables have already been loaded. Returns the table version (see get_table_version).

Load C table:

$bufr->load_Ctable($table,$default_table);

Both $table and $default_table are optional. This will load the flag and code tables (if not already loaded), which in ECMWF libbufr are put in tables C'table_version'.TXT (not to be confused with WMO BUFR table C, which contains the operator descriptors). $default_table will be used if $table is not found. If no arguments are provided, load_Ctable() will use BUFR section 1 information to decide which table to load (in which case next_observation ought to have been called already). Returns the table version.

Get next observation (next subset in current BUFR message or first subset in next message):

($data, $descriptors) = $bufr->next_observation();

where $descriptors is a reference to the array of fully expanded descriptors for this subset, and $data is a reference to the corresponding values. This call will also decode sections 0-3, whose content is then available through the access methods listed below. This is the main BUFR decoding routine in Geo::BUFR, and will call load_BDtables() internally, but not load_Ctable. The list of descriptors returned from next_observation might also contain some data description operators (like 222000) which have no corresponding data value in section 4 in the BUFR message, with the corresponding value in $data set to the empty string. These operator descriptors will be printed by the dumpsection4 methods, but on unnumbered lines.
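
To illustrate how the two returned arrays line up, here is a minimal sketch that walks made-up sample data shaped like the return values of next_observation. The routine list_subset and its output layout are hypothetical and do not reproduce the format used by dumpsection4. The sketch assumes that undef marks a missing value, as it does for encode_message.

```perl
use strict;
use warnings;

# Sample values shaped like the return of next_observation();
# the descriptors and numbers are made up for illustration.
my $descriptors = [qw(001001 001002 222000 012101)];
my $data        = [1, 492, '', 275.15];

# Hypothetical listing routine: number only the lines that carry a value
sub list_subset {
    my ($data, $descriptors) = @_;
    my @lines;
    my $n = 0;
    for my $i (0 .. $#$descriptors) {
        my ($desc, $value) = ($descriptors->[$i], $data->[$i]);
        if (defined $value && $value eq '') {
            # Operator descriptor without a data value in section 4
            push @lines, sprintf('%6s  %s', '', $desc);
        } else {
            $n++;
            push @lines, sprintf('%6d  %s  %s', $n, $desc,
                                 defined $value ? $value : 'missing');
        }
    }
    return @lines;
}

print "$_\n" for list_subset($data, $descriptors);
```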

Print the content of a subset in BUFR message:

print $bufr->dumpsections($data,$descriptors);

If this is the first subset in the message, this will also print the message number and, if this is the first message in a WMO bulletin, the WMO ahl (abbreviated header line), as well as the content of sections 0, 1 and 3. For section 4, the subset number is also printed. Normally dumpsections is called after next_observation, with the same arguments as returned from that call. To get an impression of what the output will look like, please examine the examples provided at https://wiki.met.no/bufr.pm/start for bufrread.pl. If dumpsections does not give you exactly what you want, you might prefer to instead call the individual dumpsection methods below.

Print the contents of sections 0-3 in BUFR message:

print $bufr->dumpsection0();
print $bufr->dumpsection1();
print $bufr->dumpsection2($sec2_code_ref);
print $bufr->dumpsection3();

dumpsection2 returns an empty string if there is no optional section in the message. The argument should be a reference to a subroutine which takes the optional section as argument and returns the text you want displayed after the 'Length of section:' line. For general BUFR messages probably the best you can do is displaying a hex dump, in which case

sub {return '    Hex dump:' . ' 'x26 . unpack('H*',substr(shift,4))}

might be a suitable choice for $sec2_code_ref. For most applications there should be no real need to call dumpsection2.
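
As an illustration of what such a subroutine returns, the sketch below applies the hex dump sub above to a made-up section 2. The sample bytes are an assumption for the example: three octets of section length, one reserved octet, then four octets of local data, which substr(shift,4) picks out.

```perl
use strict;
use warnings;

# The hex dump subroutine suggested above
my $sec2_code_ref = sub {
    return '    Hex dump:' . ' ' x 26 . unpack('H*', substr(shift, 4));
};

# Made-up optional section: length (8) in 3 octets, 1 reserved octet,
# followed by 4 octets of local data
my $section2 = pack('C*', 0, 0, 8, 0, 0xDE, 0xAD, 0xBE, 0xEF);

my $text = $sec2_code_ref->($section2);
print "$text\n";    # ends with the hex digits deadbeef
```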

Print the data of a subset (descriptor, value, name and unit):

print $bufr->dumpsection4($data,$descriptors,$width);
print $bufr->dumpsection4_with_bitmaps($data,$descriptors,$width);

$width fixes the number of characters used for displaying the data values, and is optional (defaults to 15). $data and $descriptors are references to arrays of data values and BUFR descriptors respectively, likely to have been fetched from next_observation. When printing, the artificial descriptor 999999 is used for associated fields following the 204Y operator, while element descriptors defining new reference values (following the 203Y operator) will have f=9 instead of 0 in order to distinguish them from normal element descriptors (also done by setting name to 'NEW REFERENCE VALUE'). Code and flag values will be resolved if a C table has been loaded, i.e. if load_Ctable has been called earlier. dumpsection4_with_bitmaps will display the bitmapped values side by side with the corresponding data values. If there is no bitmap in the BUFR message, dumpsection4_with_bitmaps will provide the same output as dumpsection4.

Set verbose level:

Geo::BUFR->set_verbose($level); # 0 <= $level < 3
$bufr->set_verbose($level);

Some info about what is going on in Geo::BUFR will be printed to STDOUT if $level > 0. With $level set to 1, all that is printed is the B, C and D tables used (with full path).

No decoding of quality information:

 Geo::BUFR->set_noqc($n);
- $n=1 (or not provided): Don't decode quality information (more
  specifically: skip all descriptors after 222000)
- $n=0: Decode quality information (default in Geo::BUFR)

Enable/disable strict checking of BUFR format for recoverable errors (like using BUFR compression for one subset message etc):

 Geo::BUFR->set_strict_checking($n);
- $n=0: disable checking (default in Geo::BUFR)
- $n=1: warn (carp) if error but continue decoding
- $n=2: die (croak) if error

Show all BUFR table C operators (data description operators) when calling dumpsection4:

 Geo::BUFR->set_show_all_operators($n);
- $n=1 (or not provided): Show all operators
- $n=0: Show only the really informative ones (default in Geo::BUFR)

set_show_all_operators(1) cannot be combined with dumpsections because dumpsections calls dumpsection4_with_bitmaps, not dumpsection4.

Set or get tablepath:

Geo::BUFR->set_tablepath($tablepath);
$tablepath = Geo::BUFR->get_tablepath();

Get table version:

$table_version = $bufr->get_table_version($table);

$table is optional. If for example $table = 'B0000000000088013001.TXT', will return '0000000000088013001'. In the more interesting case where $table is not provided, will return table version from BUFR section 1 information.
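
The relation between table file names and table version can be sketched in plain Perl. Both helper names below are hypothetical and not part of Geo::BUFR; they only mirror the naming convention described above ([BD]'table_version'.TXT for B and D tables, C'table_version'.TXT for flag and code tables).

```perl
use strict;
use warnings;

# Hypothetical helper: drop the leading table letter and an optional
# .TXT suffix to get the table version
sub version_from_table_name {
    my ($table) = @_;
    return $table =~ /^[BCD](.+?)(?:\.TXT)?$/ ? $1 : undef;
}

# Hypothetical helper: the three table files belonging to one version
sub table_names_for_version {
    my ($version) = @_;
    return map { "$_$version.TXT" } qw(B C D);
}

my $version = version_from_table_name('B0000000000088013001.TXT');
print "$version\n";
print join(' ', table_names_for_version($version)), "\n";
```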

Get number of subsets:

$nsubsets = $bufr->get_number_of_subsets();

Get current subset number:

$subset_no = $bufr->get_current_subset_number();

Get current message number:

$message_no = $bufr->get_current_message_number();

Get last WMO abbreviated header line (ahl) before current message (undef if not present):

$message_ahl = $bufr->get_current_ahl();

Accessor methods for section 0-3:

$bufr->set_<variable>($variable);
$variable = $bufr->get_<variable>();

where <variable> is one of

bufr_edition
master_table
subcentre
centre
update_sequence_number
optional_section (0 or 1)
data_category
int_data_subcategory
loc_data_subcategory
data_subcategory
master_table_version
local_table_version
year_of_century
year
month
day
hour
minute
second
local_use
number_of_subsets
observed_data (0 or 1)
compressed_data (0 or 1)
descriptors_unexpanded

set_year_of_century(0) will set year of century to 100. get_year_of_century will for BUFR edition 4 calculate year of century from year in section 1.
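
A minimal sketch of deriving year of century from a four-digit year. It assumes that the convention of using 100 instead of 0, noted above for set_year_of_century, also applies when the value is calculated from the year; the function name is hypothetical and this is not necessarily how the module computes it.

```perl
use strict;
use warnings;

# Hypothetical helper: year of century in the range 1..100
sub year_of_century {
    my ($year) = @_;
    my $yy = $year % 100;
    return $yy == 0 ? 100 : $yy;    # assumption: a year ending in 00 gives 100
}

printf "%d -> %d\n", $_, year_of_century($_) for (1999, 2000, 2024);
```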

Encode a new BUFR message:

$new_message = $bufr->encode_message($data_refs,$desc_refs);

where $desc_refs->[$i] is a reference to the array of fully expanded descriptors for subset $i ($i=1 for first subset), $data_refs->[$i] is a reference to the corresponding values, using undef for missing values. The required metadata in section 0, 1 and 3 must have been set in $bufr before calling this method. See "DECODING/ENCODING" for meaning of 'fully expanded descriptors'.
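
The layout of the two arguments can be sketched as follows, using made-up descriptors and values. Since subsets are numbered from 1, index 0 of both arrays is left unused. The call to encode_message is shown only as a comment, because it requires a Geo::BUFR object with section 0, 1 and 3 metadata already set.

```perl
use strict;
use warnings;

# Made-up subsets: descriptors with corresponding values, undef for missing
my @subsets = (
    { desc => [qw(001001 001002 012101)], data => [1, 492, 275.15] },
    { desc => [qw(001001 001002 012101)], data => [1, 384, undef]  },
);

# Subsets are numbered from 1, so index 0 is left unused
my (@data_refs, @desc_refs);
my $i = 0;
for my $subset (@subsets) {
    $i++;
    $desc_refs[$i] = $subset->{desc};
    $data_refs[$i] = $subset->{data};
}

# With the required metadata set in $bufr, the call would then be:
#   my $new_message = $bufr->encode_message(\@data_refs, \@desc_refs);
printf "Prepared %d subsets\n", $#data_refs;
```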

Encode a NIL message:

$new_message = $bufr->encode_nil_message($station_id_ref);

In section 4 all values will be set to missing except delayed replication factors (which are all set to 1) and the (descriptor, value) pairs in the hashref $station_id_ref.

Reencode BUFR message(s):

$new_messages = $bufr->reencode_message($decoded_messages,$width);

$width is optional. Takes a text $decoded_messages as argument and returns a (binary) string of BUFR messages which, when printed to file and then processed by bufrread.pl with no output modifying options set (except possibly --width), would give output equal to $decoded_messages. If bufrread.pl is to be called with --width $width, this $width must be provided to reencode_message also.

Resolve BUFR table descriptors:

print $bufr->resolve_descriptor($how,@descriptors);

where $how is one of 'fully', 'partially', 'simply' and 'noexpand'. Returns information about the BUFR table descriptors given. See https://wiki.met.no/bufr.pm/start#bufrresolvepl for examples of how different values of $how affect the output. The relevant B/D table must have been loaded before calling resolve_descriptor.

Resolve flag table value:

print $bufr->resolve_flagvalue($value,$flag_table,$B_table,
                               $default_B_table,$num_leading_spaces);

Last 2 arguments are optional. $default_B_table will be used if $B_table is not found, $num_leading_spaces defaults to 0. Example:

print $bufr->resolve_flagvalue(4,8006,'B0000000000098013001.TXT');

Print the content of BUFR code (or flag) table:

print $bufr->dump_codetable($code_table,$table,$default_table);

where $table is name of the C...TXT file containing the code tables, optionally followed by a default table which will be used if $table is not found.

Manipulate binary data (these are implemented in C for speed and primarily intended as module internal subroutines):

$value = Geo::BUFR->bitstream2dec($bitstream,$bitpos,$num_bits,$offset,$scale);

Extract $num_bits bits from $bitstream, starting at bit $bitpos. The extracted bits are interpreted as a non-negative integer, then $offset is added and the result multiplied by $scale. Returns undef if all bits extracted are 1 bits.
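
The arithmetic can be illustrated in pure Perl. This is only a sketch of the behaviour described above, not the C implementation used by the module; the name bits2dec is hypothetical, and bit positions are assumed to be counted from 0.

```perl
use strict;
use warnings;

# Pure Perl illustration of the described behaviour (hypothetical name).
# Assumption: bit positions are counted from 0.
sub bits2dec {
    my ($bitstream, $bitpos, $num_bits, $offset, $scale) = @_;
    # Expand the stream to a string of '0' and '1' and cut out the field
    my $bits = substr(unpack('B*', $bitstream), $bitpos, $num_bits);
    return if $bits !~ /0/;                  # all 1 bits: missing value
    return (oct("0b$bits") + $offset) * $scale;
}

my $stream = pack('C*', 0b10110000, 0xFF);
print bits2dec($stream, 0, 4, 0, 1), "\n";   # the 4 bits 1011
```

Unpacking the whole stream for every field is wasteful and only acceptable in a sketch like this.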

$ascii = Geo::BUFR->bitstream2ascii($bitstream,$bitpos,$num_bytes);

Extract $num_bytes bytes from bitstream, starting at $bitpos, and interpret the extracted bytes as an ascii string. Returns undef if the extracted bytes are all 1 bits.

Geo::BUFR->dec2bitstream($value,$bitstream,$bitpos,$bitlen);

Encode non-negative integer value $value in $bitlen bits in $bitstream, starting at bit $bitpos. Last byte will be padded with 1 bits. $bitstream must have been initialized to a string long enough to hold $value. The parts of $bitstream before $bitpos and after the last encoded byte are not altered.

Geo::BUFR->ascii2bitstream($ascii,$bitstream,$bitpos,$width);

Encode ASCII string $ascii in $width bytes in $bitstream, starting at $bitpos. Last byte will be padded with 1 bits. $bitstream must have been initialized to a string long enough to hold $ascii. The parts of $bitstream before $bitpos and after the last encoded byte are not altered.

Geo::BUFR->null2bitstream($bitstream,$bitpos,$num_bits);

Set $num_bits bits in $bitstream starting at bit $bitpos to 0 bits. Last byte affected will be padded with 1 bits. $bitstream must be at least $bitpos + $num_bits bits long. The parts of $bitstream before $bitpos and after the last encoded byte are not altered.

DECODING/ENCODING

The term 'fully expanded descriptors' used in the description of encode_message (and next_observation) in "METHODS" might need some clarification. The short version is that the list of descriptors should be exactly those which will be written out by running dumpsection4 (or bufrread.pl without any modifying options set) on the encoded message. If you don't have a similar BUFR message at hand to use as an example when wanting to encode a new message, you might need a more specific prescription: for every data value which occurs in the section 4 bitstream, you should include the corresponding BUFR descriptor, using the artificial 999999 for associated fields following the 204Y operator, and including the data operator descriptors 22[2345]000 and 23[2567]000 with data value set to the empty string, if these occur among the descriptors in section 3 (or rather, in the expansion of these; use bufrresolve.pl to check!). Element descriptors defining new reference values (following the 203Y operator) will have f=0 (first digit in descriptor) replaced with f=9 in next_observation, while in encode_message both f=0 and f=9 will be accepted for new reference values.

Some words about the procedure used for decoding and encoding data in section 4 might shed some light on this choice of design.

When decoding section 4 for a subset, first of all the BUFR descriptors provided in section 3 are expanded as far as is possible without looking at the actual bitstream, i.e. by eliminating nondelayed replication descriptors (f=1) and by using BUFR table D to expand sequence descriptors (f=3). Then, for each of the thus expanded descriptors, the data value is fetched from the bitstream according to the prescriptions in BUFR table B, applying the data operator descriptors (f=2) from BUFR table C as they are encountered, and reexpanding the remaining descriptors every time a delayed replication factor is fetched from bitstream. The resulting set of data values is returned in an array @data, with the corresponding B (and sometimes also some C) BUFR table descriptors in an array @descriptors. next_observation returns references to these two arrays. For convenience, some of the data operator descriptors without a corresponding data value (like 222000) are included in the @descriptors because they are considered to provide valuable information to the user, with corresponding value in @data set to the empty string. These descriptors without a value are written by the dumpsection4 methods on unnumbered lines, thereby distinguishing them from descriptors corresponding to 'real' data values in section 4, which are numbered consecutively.
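
The first step, expanding descriptors without looking at the bitstream, can be sketched in a few lines. This is a simplified illustration and not the module's code: the table D here is a toy hash with a single entry, the function name is hypothetical, and delayed replication descriptors (f=1 with y=0) are simply passed through without further handling.

```perl
use strict;
use warnings;

# Toy table D with a single sequence descriptor (illustration only)
my %table_d = (
    '301011' => [qw(004001 004002 004003)],
);

# Hypothetical helper: expand sequence descriptors (f=3) and nondelayed
# replication (f=1, y>0). Everything else is passed through unchanged.
sub expand_descriptors {
    my @queue = @_;
    my @expanded;
    while (@queue) {
        my $desc = shift @queue;
        my ($f, $x, $y) = $desc =~ /^(\d)(\d\d)(\d\d\d)$/
            or die "Malformed descriptor: $desc";
        if ($f == 3) {
            my $members = $table_d{$desc}
                or die "Sequence $desc not in table D";
            unshift @queue, @$members;
        } elsif ($f == 1 && $y > 0) {
            # Repeat the next $x descriptors $y times
            die "Too few descriptors after $desc" if @queue < $x;
            my @group = splice(@queue, 0, $x);
            unshift @queue, (@group) x $y;
        } else {
            push @expanded, $desc;
        }
    }
    return @expanded;
}

print join(' ', expand_descriptors(qw(301011 102002 012101 012103))), "\n";
```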

Encoding a subset is done in a very similar way, by expanding the descriptors in section 3 as described above, but instead fetching the data values from the @data array that the user supplies (actually @{$data_refs->[$i]} where $i is subset number), and then finally encoding this value to bitstream.

The input parameter $desc_refs to encode_message is in fact not strictly necessary to be able to encode a new BUFR message. But there is a good reason for requiring it. During encoding the descriptors from expanding section 3 will consecutively be compared with the descriptors in the user supplied $desc_refs, and if these at some point differ, encoding will be aborted with an error message stating the first descriptor which deviated from the expected one. Requiring $desc_refs as input thus greatly reduces the risk of encoding an erroneous section 4, and also provides the user with highly valuable debugging information if encoding fails.
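
The kind of comparison described here can be sketched as follows. The function name and the wording of the message are hypothetical; the sketch only shows the idea of reporting the first position where the user supplied descriptors deviate from the expected ones.

```perl
use strict;
use warnings;

# Hypothetical helper: return a message describing the first deviation
# between expected and supplied descriptors, or nothing if they agree
sub first_mismatch {
    my ($expected, $supplied) = @_;
    my $last = $#$expected > $#$supplied ? $#$expected : $#$supplied;
    for my $i (0 .. $last) {
        my $e = $expected->[$i] // 'nothing';
        my $s = $supplied->[$i] // 'nothing';
        return "Position $i: expected $e, got $s" if $e ne $s;
    }
    return;
}

my $problem = first_mismatch([qw(001001 001002 012101)],
                             [qw(001001 001002 012103)]);
print "$problem\n" if defined $problem;
```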

BUFR TABLE FILES

The BUFR table files should follow the format and naming conventions used by ECMWF libbufr software. Other table file formats exist and might on request be supported in future versions of Geo::BUFR.

STRICT CHECKING

The package global $Strict_checking defaults to

0: Ignore recoverable errors in BUFR format met during decoding or encoding

but can be changed to

1: Issue warning (carp) but continue decoding/encoding

2: Croak instead of carp

by calling set_strict_checking. The following is checked for when $Strict_checking is set to 1 or 2:

  • Compression set in section 1 for one subset message (BUFR reg. 94.6.3.2)

  • Local reference value for compressed character data not null bytes (94.6.3.2.i)

  • Excessive bytes in section 4 (section longer than computed from section 3)

  • Illegal flag values (rightmost bit set for non missing values)

  • Cancellation operators (20[1-4]00, 203255 etc) when there is nothing to cancel

  • Invalid date and/or time in section 1

Plus a few more checks not considered interesting enough to be mentioned here.
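
The three strictness levels and one of the checks listed above (illegal flag values) can be sketched like this. The variable and function names are hypothetical stand-ins and not the module's internals; the flag table width of 7 bits is chosen only for the example.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical stand-in for the package global described above
my $strict_checking = 1;

# Ignore, warn or die depending on the strictness level
sub report_recoverable {
    my ($msg) = @_;
    return if $strict_checking == 0;
    croak $msg if $strict_checking == 2;
    carp $msg;
    return;
}

# A flag value of $width bits is missing when all bits are set;
# otherwise the rightmost bit is expected to be 0
sub flag_value_is_legal {
    my ($value, $width) = @_;
    my $missing = 2**$width - 1;
    return 1 if $value == $missing;
    return ($value & 1) ? 0 : 1;
}

for my $value (4, 5, 127) {
    report_recoverable("Illegal flag value $value")
        unless flag_value_is_legal($value, 7);
}
```

With the level set to 1 as here, only the value 5 triggers a warning: 4 has the rightmost bit cleared and 127 is the missing value for a 7 bit flag table.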

BUGS OR MISSING FEATURES

Some BUFR table C operators are not implemented or are untested, mainly because I do not have access to BUFR messages containing such operators. If you happen to come across a BUFR message which the current module fails to decode properly, I would therefore highly appreciate it if you could mail it to me.

Don't expect decoding and encoding to be completely reversible: slight rounding differences might occur. The decoded value 56.5204 for element descriptor 022161 in a BUFR message was for instance found to be changed into 56.5207 when encoded and decoded again.

AUTHOR

Pål Sannes <pal.sannes@met.no>

CREDITS

I am very grateful to Alvin Brattli, who (while employed as a researcher at met.no) wrote the first version of this module, with the sole purpose of being able to decode some very specific BUFR satellite data, but still provided the main framework upon which this module is built.

SEE ALSO

Guide to WMO Table Driven Code Forms: FM 94 BUFR and FM 95 CREX; Layer 3: Detailed Description of the Code Forms (for programmers of encoder/decoder software)

https://wiki.met.no/bufr.pm/start

COPYRIGHT

Copyright (C) 2010 met.no

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
