NAME

IO::ReadHandle::Include - A filehandle for reading with include facility

VERSION

Version 1.2.1

SYNOPSIS

use IO::ReadHandle::Include;

open $ofh1, '>', 'extra.txt';
print $ofh1 "Extra, extra!  Read all about it!\n";
close $ofh1;

open $ofh2, '>', 'file.txt';
print $ofh2 <<EOD;
The paperboy said:
#include extra.txt
and then he ran off.
EOD
close $ofh2;

$ifh = IO::ReadHandle::Include
  ->new({ source => 'file.txt',
          include => qr/^#include (.*)$/ });
print while <$ifh>;
close $ifh;

# prints:
#
# The paperboy said:
# Extra, extra!  Read all about it!
# and then he ran off.

DESCRIPTION

This module produces filehandles for reading from a source text file and any number of included files, identified from include directives found in the read text.

Filehandle functions/methods associated with writing cannot be used with an IO::ReadHandle::Include object.

INCLUDE DIRECTIVES AND THE READLINE FUNCTION

The include directives are identified through a regular expression that is passed to the constructor (see "new").

$ifh = IO::ReadHandle::Include->new({ include => $regex, ... });

If the text read from the source file matches the regular expression, then, in the output, the part of the text matching the regular expression is replaced with the contents of the identified include file, if that include file exists. This works recursively: The included file can itself include other files, using the same format for include directives. If an include file does not exist, then the include directive naming that file is not replaced.

An include file cannot recursively include itself, because that would lead to an infinite loop. If such an include directive is detected, then it is not replaced. It is not a problem if a particular file is included multiple times, as long as each new include of that file begins after the previous one has completed.
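
The replacement rules above can be illustrated with a small stand-alone sketch. It is not the module's implementation: it expands directives from a hypothetical in-memory table %files instead of real files, and it assumes that each directive occupies a whole line (the regular expression includes the trailing newline). The names %files and expand are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical in-memory "files" (name => contents); illustration only.
my %files = (
    'main.txt' => "start\n#include a.txt\n#include missing.txt\n"
                . "#include a.txt\nend\n",
    'a.txt'    => "in a\n#include a.txt\n",
);

# A directive is a whole line, including its trailing newline.
my $include = qr/^#include (.*)\n/;

# Returns the contents of $name with include directives expanded.
# %$active records the files that are being expanded right now.
sub expand {
    my ($name, $active) = @_;
    $active->{$name} = 1;
    my $out = '';
    for my $line (split /^/, $files{$name}) {
        my ($file) = $line =~ $include;
        if (defined $file && exists $files{$file} && !$active->{$file}) {
            $out .= expand($file, $active);   # replace the directive
        } else {
            $out .= $line;   # ordinary text, missing file, or self-include
        }
    }
    delete $active->{$name};
    return $out;
}

my $expanded = expand('main.txt', {});
print $expanded;
```

In this sketch both directives for a.txt in main.txt are replaced, because the first include has completed before the second one starts. The directive for missing.txt and the self-include inside a.txt stay in the output as they are.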

The include file is identified by the text corresponding to a particular capture group ((?<include>...) or $1) of the regular expression. For example, given the two lines of text

#include foo.txt
#include "bar.txt"

the regular expression

qr/^#include (?|"(.*?)"|(.*))$/

identifies foo.txt and bar.txt as the include files through $1, and the regular expression

qr/^#include ("?)(?<include>.*?)\g{1}$/

does the same through $+{include}.
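
The two regular expressions can be tried out with core Perl alone, without the module. The sketch below applies them to the two example lines and collects what each one captures; the variable names are chosen for this example only.

```perl
use strict;
use warnings;

my @lines = ('#include foo.txt', '#include "bar.txt"');

# Branch reset (?|...): both alternatives capture into $1.
my $by_number = qr/^#include (?|"(.*?)"|(.*))$/;

# Named capture: the file name ends up in $+{include}.
my $by_name = qr/^#include ("?)(?<include>.*?)\g{1}$/;

my (@numbered, @named);
for my $line (@lines) {
    push @numbered, $1          if $line =~ $by_number;
    push @named,    $+{include} if $line =~ $by_name;
}

print "@numbered\n";   # foo.txt bar.txt
print "@named\n";      # foo.txt bar.txt
```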

The captured text is then transformed if a transformation code reference is defined (see "set_transform"). The final text is interpreted as the path of the file to include at this point.

Text is read from the source file and the included files piece by piece. If you're unlucky, then the piece most recently read ends in the middle of an include directive, and then the current module cannot detect that include directive because it isn't complete yet.

To resolve this problem, the current module assumes that if the regular expression matches the input record separator, then it must be at the very end of the regular expression. If any piece of text ending with the input record separator does not match the regular expression, then the current module concludes that that piece of text does not contain an include directive.

This means that an include directive should not contain an input record separator $/ (by default a newline), except perhaps at the very end. Otherwise the include directive may not always be recognized.

This works well for the CORE::readline function, for the "getline" and "getlines" methods, and for the angle brackets operator (<$ifh>), which read text up to and including the input record separator (or the end of the data, whichever comes first).

INCLUDE DIRECTIVES AND THE READ FUNCTION

Function CORE::read and method "read" read up to a user-selected number of characters from the source. The read chunk of text does not necessarily end with the input record separator, so it might end in the middle of an include directive, and then the include directive cannot be recognized.

To resolve this problem, the "read" function/method, when called on an IO::ReadHandle::Include object, by default quietly reads beyond the requested number of characters until the next input record separator or the end of the data is seen, so it can properly detect and resolve any include directives. It then returns only up to the requested number of characters, and remembers the remainder for the next call.

This means that if the source file or an include file contains no input record separator at all and is read using the "read" function/method, then the entire contents of the source and/or include file are read into memory at once.

When using the "read" function/method to read the text, you don't know beforehand how many lines of text you get. This can be a problem if the transformation of include path names from later lines of text may depend on something seen in earlier lines of text. Any change that gets made to the transformation (via "set_transform") can apply only to include directives that haven't been resolved yet -- so it cannot apply to any include directives that were resolved while processing the "read" call that produced the text that indicates the need to change the transformation.

In such a case, use the "set_read_by_line" method to indicate that you want "read" to return text that does not extend beyond the first input record separator -- i.e., at most one line of text. You may then get fewer characters from a call to "read" than you asked for, even if there is still more text in the source.
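
As a rough sketch of reading by line with "read": the example below writes a small temporary file, turns on "set_read_by_line", and collects the chunks that "read" returns. It assumes that the module is installed and behaves as described above. The file name, the chunk size of 1024 and the variable names are choices made for this example only.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use IO::ReadHandle::Include;

# Example data in a temporary directory (names are arbitrary).
my $dir    = tempdir(CLEANUP => 1);
my $source = File::Spec->catfile($dir, 'file.txt');
open my $ofh, '>', $source or die "Cannot write $source: $!";
print {$ofh} "line one\nline two\n";
close $ofh or die "Cannot close $source: $!";

my $ifh = IO::ReadHandle::Include->new({ source  => $source,
                                         include => qr/^#include (.*)$/ });
$ifh->set_read_by_line;          # "read" now stops after each line

my @chunks;
while (1) {
    my $buffer = '';
    # Ask for far more characters than a single line holds.
    my $count = $ifh->read($buffer, 1024, 0);
    last unless $count;          # 0 means no more data
    push @chunks, $buffer;
}
$ifh->close;

printf "%d chunks\n", scalar @chunks;
```

With "set_read_by_line" active, each chunk should hold at most one line, so the application gets a chance to call "set_transform" between lines.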

LINE NUMBER

The value of the line number special variable $. is supposed to be equal to the number of lines read through the last used filehandle, but for an IO::ReadHandle::Include that value is not trustworthy. Making it trustworthy would take a lot more bookkeeping.

PRIVATE FIELDS

IO::ReadHandle::Include objects support the use of private fields stored within the object. "set_field" sets such a field, "get_field" queries it, and "remove_field" removes it again.

These fields can be used, for example, to pass information from the application using the object to the include path transformation code ("set_transform") to guide the transformation.

The fields are private in the sense that an IO::ReadHandle::Include object does not itself access them, so they're all yours.
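
A minimal sketch of the three field methods follows. It assumes that the module is installed; the field names include_dir and depth are hypothetical and mean nothing to the module itself.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use IO::ReadHandle::Include;

# A small source file, so that there is an object to attach fields to.
my $dir    = tempdir(CLEANUP => 1);
my $source = File::Spec->catfile($dir, 'file.txt');
open my $ofh, '>', $source or die "Cannot write $source: $!";
print {$ofh} "some text\n";
close $ofh or die "Cannot close $source: $!";

my $ifh = IO::ReadHandle::Include->new({ source  => $source,
                                         include => qr/^#include (.*)$/ });

$ifh->set_field(include_dir => $dir);           # store a value
my $stored  = $ifh->get_field('include_dir');   # query it
my $depth   = $ifh->get_field('depth', 0);      # missing: created with default
$ifh->remove_field('include_dir');              # remove it again
my $removed = $ifh->get_field('include_dir');   # no default: undefined

$ifh->close;
```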

SUBROUTINES/METHODS

new

$ifh = IO::ReadHandle::Include->new({ source => $source,
                                      include => $regex,
                                      transform => $coderef });

Creates an object that can be used as a filehandle for reading, with include files.

The $source is the path to the main file to read from, if it is a scalar. If it is a filehandle, then the main contents are read from that filehandle.

The $regex is a regular expression that identifies an include directive. If the regular expression defines a capture group called include ((?<include>...)), then its value identifies the file to include. Otherwise, the first capture group identifies the file to include. If the include file path is relative, then it is interpreted relative to the path of the file from which the include directive was read.

The $coderef, if specified, must be a reference to code, i.e. \&foo for a reference to function foo, or sub { ... } for a reference to an anonymous block of code. That code is used to transform the path name of the include file. The reference gets called as

$path = $coderef->($path, $ifh);

where $path is the path name extracted from the include directive, and $ifh is the IO::ReadHandle::Include object. You can use the latter, for example, to access the private area of the IO::ReadHandle::Include to assist the transformation ("get_field"). The result of executing the code reference is used as the path of the include file to open.
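
The following sketch shows one way to use such a transformation, assuming that the module is installed. The directive names the include file without its directory or extension, and the code reference builds the full path from a private field. The field name include_dir, the helper write_file and the file names are made up for this example.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use IO::ReadHandle::Include;

my $dir = tempdir(CLEANUP => 1);

# Writes $text to the file $name in $dir and returns the full path.
sub write_file {
    my ($name, $text) = @_;
    my $path = File::Spec->catfile($dir, $name);
    open my $ofh, '>', $path or die "Cannot write $path: $!";
    print {$ofh} $text;
    close $ofh or die "Cannot close $path: $!";
    return $path;
}

write_file('extra.txt', "Extra, extra!  Read all about it!\n");
my $source = write_file('file.txt',
    "The paperboy said:\n#include extra\nand then he ran off.\n");

# Map the short name from the directive to a full path, using a
# private field that the application sets on the object.
my $transform = sub {
    my ($path, $ifh) = @_;
    return File::Spec->catfile($ifh->get_field('include_dir'), "$path.txt");
};

my $ifh = IO::ReadHandle::Include->new({ source    => $source,
                                         include   => qr/^#include (.*)$/,
                                         transform => $transform });
$ifh->set_field(include_dir => $dir);

my $text = join '', <$ifh>;
close $ifh;
print $text;
```

Returning an absolute path from the code reference sidesteps the question of how a relative result would be resolved.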

close

$ifh->close;
close $ifh;

Closes the IO::ReadHandle::Include. Closes any internal filehandles that the instance was using, but if the main source was passed as a filehandle then that filehandle is not closed.

current_source

$current_source = $ifh->current_source;

Returns text describing the main source or include file that the next input through IO::ReadHandle::Include will come from, or (at the end of the stream) that the last input came from.

For a main source specified as a path name, or for an included file, returns the path name.

For a main source specified as a filehandle, returns the result of calling the current_source method on that filehandle, unless it returns the undefined value or the filehandle doesn't support the current_source method, in which case the current method returns the stringified version of the filehandle.

NOTE: The result of this method is not always accurate. It describes the source that data will be read from next, which is not always the source of the data that is returned next: in some circumstances data is buffered and returned only later, when the source it came from may already have run dry.

The results of this method are only accurate if (1) all of the data is read by lines, and (2) the include directive always comes at the very end of a line.

Making this method always accurate requires a lot more internal bookkeeping.

eof

$end_of_data = eof $ifh;
$end_of_data = $ifh->eof;

Returns 1 when there is no (more) data to read through the IO::ReadHandle::Include, and '' otherwise, similar to CORE::eof and "eof" in IO::Handle.
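
As an illustration, "eof" can drive a read loop that also asks "current_source" where each line comes from. This is only a sketch: it assumes that the module is installed, the helper write_file and the file names are made up, and the reported sources are subject to the accuracy caveats given under "current_source".

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use IO::ReadHandle::Include;

my $dir = tempdir(CLEANUP => 1);

# Writes $text to the file $name in $dir and returns the full path.
sub write_file {
    my ($name, $text) = @_;
    my $path = File::Spec->catfile($dir, $name);
    open my $ofh, '>', $path or die "Cannot write $path: $!";
    print {$ofh} $text;
    close $ofh or die "Cannot close $path: $!";
    return $path;
}

write_file('extra.txt', "Extra, extra!\n");
my $source = write_file('file.txt', "first\n#include extra.txt\nlast\n");

my $ifh = IO::ReadHandle::Include->new({ source  => $source,
                                         include => qr/^#include (.*)$/ });
my @seen;
until ($ifh->eof) {
    my $from = $ifh->current_source;   # where the next input comes from
    my $line = $ifh->getline;
    last unless defined $line;
    push @seen, [$from, $line];
}
$ifh->close;

printf "%s: %s", @$_ for @seen;
```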

get_field

$value = $ifh->get_field($field);
$value = $ifh->get_field($field, $default);

Returns the value of the private field $field from the filehandle.

If that field does not yet exist, and if $default is not specified, then does not modify the object and returns the undefined value.

If the field does not yet exist but $default is specified, then creates the field, assigns it the value $default, and then returns that value.

getline

$line = $ifh->getline;
$line = <$ifh>;
$line = readline $ifh;

Reads the next line from the IO::ReadHandle::Include. The input record separator ($/) or end-of-data marks the end of the line.

getlines

@lines = $ifh->getlines;
@lines = <$ifh>;

Reads all remaining lines from the IO::ReadHandle::Include. The input record separator ($/) or end-of-data marks the end of each line.

input_line_number

$line_number = $ifh->input_line_number;
$line_number = $.;

Returns the number of lines read through the IO::ReadHandle::Include (first example) or through the last used filehandle (second example).

NOTE: The result of this method is not always accurate, because the current module may need to read ahead and buffer some data in order to properly detect and resolve include directives.

The results of this method are accurate if (1) all of the data is read by lines, and (2) the include directive always comes at the very end of a line.

open

$ifh->open({ source => $source,
             include => $regex,
             transform => $coderef });

(Re)opens the IO::ReadHandle::Include object. See "new" for details about the arguments.

read

$ifh->read($buffer, $length, $offset);
read $ifh, $buffer, $length, $offset;

Reads up to $length characters from the IO::ReadHandle::Include into the $buffer at offset $offset, similar to the CORE::read function. Returns the number of characters read, or 0 when there are no more characters.

If "set_read_by_line" is active, then the reading stops after the first encountered input record separator ($/), even if the requested number of characters has not been reached yet.

remove_field

$ifh->remove_field($field);

Removes the filehandle's private field with the specified name, if it exists. Returns the filehandle.

seek

seek $ifh, $pos, $whence;
$ifh->seek($pos, $whence);

Sets the IO::ReadHandle::Include filehandle's position, similar to the CORE::seek function -- but at present the support is very limited.

$whence indicates relative to what the target position $pos is specified. This can be 0 for the beginning of the data, 1 for the current position, or 2 for the end of the data.

$pos says how many bytes beyond the position indicated by $whence to set the filehandle to. At present, $pos must be equal to 0, otherwise the method croaks. So, the position can only be set to the very beginning, the very end, or the current position. Supporting more requires a lot more bookkeeping.

Returns 1 on success, false otherwise.
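
Within these limits, "seek" can be used to rewind the stream and read it again. The sketch below assumes that the module is installed and that rewinding works as described above. The file name and contents are made up for the example, and SEEK_SET (which equals 0) comes from the core Fcntl module.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Spec;
use File::Temp qw(tempdir);
use IO::ReadHandle::Include;

my $dir    = tempdir(CLEANUP => 1);
my $source = File::Spec->catfile($dir, 'file.txt');
open my $ofh, '>', $source or die "Cannot write $source: $!";
print {$ofh} "one\ntwo\nthree\n";
close $ofh or die "Cannot close $source: $!";

my $ifh = IO::ReadHandle::Include->new({ source  => $source,
                                         include => qr/^#include (.*)$/ });

my @first = $ifh->getlines;                 # read to the end
$ifh->seek(0, SEEK_SET)                     # $pos must be 0
    or die "Cannot rewind";
my @second = $ifh->getlines;                # read everything again
$ifh->close;

print "same text both times\n" if "@first" eq "@second";
```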

set_field

$ifh->set_field($field, $value);

Sets the filehandle's private field with key $field to the specified $value. Returns the filehandle.

set_read_by_line

$ifh->set_read_by_line($value);
$ifh->set_read_by_line;

Configures whether "read" can return more than a single line's worth of data per call.

By default, a single "read" call reads and returns data until the requested number of characters has been read or until it runs out of data, whichever comes first. If set_read_by_line is called without an argument or with an argument that is a true value (e.g., 1), then subsequent calls of "read" return at most the next line, as defined by the input record separator $/ -- or less, if the requested number of characters has been reached. If set_read_by_line is called with an argument that is a false value (e.g., 0), then "read" reverts to its default behavior.

set_transform

$ifh->set_transform($coderef);

Sets the transformation code reference, with the same purpose as the transform parameter of "new". Returns the object.

AUTHOR

Louis Strous, <lstrous at cpan.org>

BUGS

KNOWN BUGS

Resolving these bugs requires much more bookkeeping.

  • The result of "input_line_number" (and $.) may not be accurate.

  • The result of "current_source" may not be accurate.

  • "seek" can only be used to go to the very beginning, the current position, or the very end of the stream.

  • tell cannot be used on an IO::ReadHandle::Include.

REPORT BUGS

Please report any bugs or feature requests to bug-io-readhandle-include at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=IO-ReadHandle-Include. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc IO::ReadHandle::Include

LICENSE AND COPYRIGHT

Copyright 2018 Louis Strous.

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0

Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

SEE ALSO

IO::ReadHandle::Chain.