NAME
podfilter - function to extract selected sections of pod documentation
Pod::Filter - base class for creating pod filters and translators
SYNOPSIS
use Pod::Filter qw(podfilter);
podfilter (@filelist);
podfilter ({OUTPUT => "tmp.out"}, @filelist);
podfilter ({SELECT => ["NAME|SYNOPSIS", "OPTIONS"]}, @filelist);
podfilter ({OUTPUT => ">&STDERR", SELECT => ["DESCRIPTION"]}, "-");
or
use Pod::Filter;
package MyFilter;
@ISA = qw(Pod::Filter);
sub new {
## constructor code ...
}
## implementation of appropriate subclass methods ...
package main;
$filter = new MyFilter;
@ARGV = ('-') unless (@ARGV > 0);
for (@ARGV) {
$filter->process_file($_);
}
DESCRIPTION
Pod::Filter is an abstract base class for implementing filters and/or translators to convert pod documentation into other formats. It handles most of the difficulty of parsing the pod sections in a file and leaves it to the subclasses to override various methods to provide the actual translation. The other thing that Pod::Filter provides is the ability to process only selected sections of pod documentation from the input.
SECTION SPECIFICATIONS
Certain methods and functions provided by Pod::Filter may be given one or more "section specifications" to restrict the text processed to only the desired set of sections and their corresponding subsections. A section specification is a string containing one or more Perl-style regular expressions separated by forward slashes ("/"). If you need to use a forward slash literally within a section title you can escape it with a backslash ("\/").
The formal syntax of a section specification is:
head1-title-regexp/head2-title-regexp/...
Any omitted or empty regular expressions will default to ".*". Please note that each regular expression given is implicitly anchored by adding "^" and "$" to the beginning and end. Also, if a given regular expression starts with a "!" character, then the expression is negated (so !foo would match anything except foo).
Some example section specifications follow.
- Match the NAME and SYNOPSIS sections and all of their subsections:
NAME|SYNOPSIS
- Match only the Question and Answer subsections of the DESCRIPTION section:
DESCRIPTION/Question|Answer
- Match the Comments subsection of all sections:
/Comments
- Match all subsections of DESCRIPTION except for Comments:
DESCRIPTION/!Comments
- Match the DESCRIPTION section but do not match any of its subsections:
DESCRIPTION/!.+
- Match all top level sections but none of their subsections:
/!.+
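The parsing rules above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: parse_section_spec is a hypothetical helper, not a Pod::Filter function, and the choice to keep the "!" in front of the anchored pattern simply mirrors the description of $self->{SELECTED} in "INSTANCE DATA".

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Pod::Filter: turn one section
# specification into a list of anchored patterns, one per heading level.
# A leading "!" is kept in front of the anchored pattern to mark negation.
sub parse_section_spec {
    my ($spec) = @_;

    # Split on "/" unless it is escaped with a backslash; the -1 limit
    # keeps trailing empty fields.
    my @parts = split m{(?<!\\)/}, $spec, -1;

    my @patterns;
    for my $part (@parts) {
        $part =~ s{\\/}{/}g;                   # "\/" stands for a literal slash
        my $bang = ($part =~ s/^!//) ? '!' : '';
        $part = '.*' if $part eq '';           # omitted regexps default to ".*"
        push @patterns, $bang . '^(?:' . $part . ')$';
    }
    return @patterns;
}

# Print the parsed form of a few of the example specifications.
for my $spec ('NAME|SYNOPSIS', 'DESCRIPTION/!Comments', '/Comments') {
    print "$spec => ", join('  ', parse_section_spec($spec)), "\n";
}
```

The non-capturing group matters: anchoring NAME|SYNOPSIS as ^NAME|SYNOPSIS$ would bind each anchor to only one alternative, so the sketch wraps the whole expression before adding "^" and "$".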
FUNCTIONS
Pod::Filter provides the following functions. Please note that these are functions and not methods: they do not take an object reference as an implicit first parameter.
version()
Return the current version of this package.
podfilter(\%options, @filelist)
podfilter will print the raw (untranslated) pod documentation of all pod sections in the given input files specified by @filelist according to the given options.
If any argument to podfilter is a reference to a hash (associative array) then the values with the following keys are processed as follows:
OUTPUT
A string corresponding to the desired output file (or ">&STDOUT" or ">&STDERR"). The default is to use standard output.
SELECT
A reference to an array of section specifications (as described in "SECTION SPECIFICATIONS") which indicate the desired set of pod sections and subsections to be selected from input. If no section specifications are given, then all sections of pod documentation are used.
All other arguments should correspond to the names of input files containing pod documentation. A file name of "-" or "<&STDIN" will be interpreted to mean standard input (which is the default if no filenames are given).
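As a rough illustration of these argument rules, the sketch below separates option hashes from file names. The helper name split_podfilter_args is hypothetical, and letting later hashes override earlier ones is an assumption; the text above does not say how repeated keys are resolved.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: separate option hashes from
# file names the way the podfilter() argument rules describe.
sub split_podfilter_args {
    my @args = @_;
    my %opts = (OUTPUT => '>&STDOUT', SELECT => []);   # documented defaults
    my @files;
    for my $arg (@args) {
        if (ref($arg) eq 'HASH') {
            %opts = (%opts, %$arg);   # assumption: later hashes win
        } else {
            push @files, $arg;        # anything else is a file name
        }
    }
    @files = ('-') unless @files;     # no file names: read standard input
    return (\%opts, \@files);
}
```

An empty SELECT list stands for "all sections", matching the documented default.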
INSTANCE METHODS
Pod::Filter provides several methods, some of which should be overridden by subclasses. They are as follows:
new()
This is the constructor for the base class. You should only use it if you want to create an instance of Pod::Filter itself instead of one of its subclasses. The constructor for this class and all of its subclasses should return a blessed reference to an associative array (hash).
initialize()
This method performs any necessary base class initialization. It takes no arguments (other than the object instance of course). If subclasses override this method then they must be sure to invoke the superclass' initialize() method.
select($section_spec1, $section_spec2, ...)
This is the method that is used to select the particular sections and subsections of pod documentation that are to be printed and/or processed. Each of the $section_spec arguments should be a section specification as described in "SECTION SPECIFICATIONS". The section specifications are parsed by this method and the resulting regular expressions are stored in the array referenced by $self->{SELECTED} (please see the description of this member variable in "INSTANCE DATA").
This method should not normally be overridden by subclasses.
want_section($head1_title, $head2_title, ...)
Returns a value of true if the given section and subsection titles match any of the section specifications passed to the select() method (or if no section specifications were given). Returns a value of false otherwise. If $headN_title is omitted then it defaults to the current headN section title in the input.
This method should not normally be overridden by subclasses.
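The matching rule can be sketched as follows. This is not the module's implementation: section_is_wanted is a hypothetical stand-in that takes the selection list explicitly, in the shape described for $self->{SELECTED} (an array of arrays of anchored patterns, each optionally preceded by "!"). Treating a missing title as the empty string is also an assumption made to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical stand-in for want_section(). $selected is an array of
# specifications; each specification is an array of anchored patterns,
# one per heading level, with a leading "!" marking negation.
sub section_is_wanted {
    my ($selected, @titles) = @_;
    return 1 unless @$selected;              # nothing selected: want everything

    SPEC: for my $spec (@$selected) {
        for my $i (0 .. $#$spec) {
            my $pattern = $spec->[$i];
            my $negated = ($pattern =~ s/^!//) ? 1 : 0;
            my $title   = defined($titles[$i]) ? $titles[$i] : '';
            my $hit     = ($title =~ /$pattern/) ? 1 : 0;
            next SPEC if $hit == $negated;   # this level rejects the titles
        }
        return 1;                            # every level accepted the titles
    }
    return 0;                                # no specification matched
}

my $selected = [
    [ '^(?:DESCRIPTION)$', '!^(?:Comments)$' ],
    [ '^(?:NAME|SYNOPSIS)$' ],
];
for my $titles ([ 'NAME' ], [ 'DESCRIPTION', 'Comments' ], [ 'OPTIONS' ]) {
    printf "%-22s %s\n", join('/', @$titles),
        section_is_wanted($selected, @$titles) ? 'wanted' : 'skipped';
}
```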
begin_input()
This method is invoked by process_filehandle() immediately prior to processing input from a filehandle. The base class implementation does nothing, but subclasses may override it to perform any per-file initializations.
end_input()
This method is invoked by process_filehandle() immediately after processing input from a filehandle. The base class implementation does nothing but subclasses may override it to perform any per-file cleanup actions.
preprocess($text)
This method should be overridden by subclasses that wish to perform any kind of preprocessing for each block (paragraph) of pod documentation. The parameter $text is the pod paragraph from the input file and the value returned should correspond to the new text to use in its place. If the empty string is returned or an undefined value is returned, then the given $text is ignored (not processed).
This method is invoked by process_filehandle(). After it returns, process_filehandle() examines the current cutting state (which is stored in $self->{CUTTING}). If it evaluates to true then input text (including the given $text) is cut (not processed) until the next pod directive is encountered.
The base class implementation of this method returns the given text.
process_pragmas($text)
This method is called when an =pod directive is encountered. It is passed the remainder of the text block that appeared immediately after the =pod command. Each word in $text is examined to see if it is a pragma specification. Pragma specifications are of the form pragma_name=pragma_value.
Unless the given object is an instance of the Pod::Filter class, the base class implementation of this method will invoke the pragma() method for each pragma specification in $text. If and only if the given object is an instance of the Pod::Filter class, the base class version of this method will simply reproduce the =pod command exactly as it appeared in the input.
Derived classes should not usually need to reimplement this method.
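A minimal sketch of the word-by-word scan, assuming that a word without "=" is not a pragma specification (the text does not spell out that case). The helper extract_pragmas is hypothetical and only shows how name/value pairs could be pulled out of the text that followed =pod.

```perl
use strict;
use warnings;

# Hypothetical helper: collect "name=value" words from the text that
# followed an =pod command. Names are lowercased, which matches what the
# pragma() method is documented to receive.
sub extract_pragmas {
    my ($text) = @_;
    my @pragmas;
    for my $word (split ' ', $text) {
        next unless $word =~ /^(\w+)=(.*)$/;   # assumption: no "=" means no pragma
        push @pragmas, [ lc($1), $2 ];
    }
    return @pragmas;
}

for my $pragma (extract_pragmas("fill=off Indent=+4 stray")) {
    print "name=$pragma->[0] value=$pragma->[1]\n";
}
```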
pragma($pragma_name, $pragma_value)
This method is invoked for each pragma encountered inside an =pod paragraph (see the description of the process_pragmas() method). The pragma name is passed in $pragma_name (which should always be lowercase) and the corresponding value is $pragma_value.
The base class implementation of this method does nothing. Derived class implementations of this method should be able to recognize the following pragmas and take any necessary actions when they are encountered:
- fill=value
The argument value should be one of on, off, or previous. Specifies that "filling-mode" should be set to 1, 0, or its previous value (respectively). If value is omitted then the default is on. Derived classes may use this to decide whether or not to perform any filling (wrapping) of subsequent text.
- style=value
The argument value should be one of bold, italic, code, plain, or previous. Specifies that the current default paragraph font should be set to bold, italic, code, the empty string, or its previous value (respectively). If value is omitted then the default is plain. Derived classes may use this to determine the default font style to use for subsequent text.
- indent=value
The argument value should be an integer value (with an optional sign). Specifies that the current indentation level should be reset to the given value. If a plus (minus) sign precedes the number then the indentation level should be incremented (decremented) by the given number. If only a plus or minus sign is given (without a number) then the current indentation level is incremented or decremented by some default amount (to be determined by subclasses).
Please note that all the pragma names and values above are case insensitive and that pragma values may be abbreviated to a unique prefix (pragma names may not be abbreviated). The return value will be 1 if the pragma name was recognized and 0 if it wasn't (in which case the pragma was ignored).
Derived classes may wish to override this method (perhaps invoking the base class' method somewhere in the implementation).
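One way a derived class might handle the fill and indent pragmas is sketched below. Everything here is illustrative: handle_pragma, expand_value, the %state hash, and the default step of 4 are assumptions, and a real subclass would keep this state in $self. The sketch also returns 0 for a recognized name with an unusable value, which the text above does not specify.

```perl
use strict;
use warnings;

# Hypothetical state; a real subclass would store this in $self.
my %state = (fill => 1, prev_fill => 1, indent => 0);
my $INDENT_STEP = 4;   # assumed default step; the text leaves it to subclasses

# Expand an abbreviated value to the single candidate it is a prefix of.
sub expand_value {
    my ($value, @candidates) = @_;
    my @hits = grep { index($_, lc($value)) == 0 } @candidates;
    return @hits == 1 ? $hits[0] : undef;
}

sub handle_pragma {
    my ($name, $value) = @_;
    $name = lc($name);

    if ($name eq 'fill') {
        $value = 'on' if !defined($value) || $value eq '';
        my $mode = expand_value($value, qw(on off previous));
        return 0 unless defined $mode;
        my $new = $mode eq 'previous' ? $state{prev_fill}
                : $mode eq 'on'       ? 1
                :                       0;
        @state{qw(prev_fill fill)} = ($state{fill}, $new);
        return 1;
    }

    if ($name eq 'indent') {
        my ($sign, $num) = defined($value) ? $value =~ /^([-+]?)(\d*)$/ : ();
        return 0 unless defined($sign) && length("$sign$num");
        $num = $INDENT_STEP if $num eq '';      # bare "+" or "-"
        $state{indent} = $sign eq '+' ? $state{indent} + $num
                       : $sign eq '-' ? $state{indent} - $num
                       :                $num;   # unsigned: reset to the value
        return 1;
    }

    return 0;   # unrecognized pragma name
}
```

Because "o" is a prefix of both on and off, expand_value rejects it as ambiguous, while "of" and "prev" resolve to off and previous.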
command($cmd, $text, $sep)
This method should be overridden by subclasses to take the appropriate action when a pod command paragraph (denoted by a line beginning with "=") is encountered. When such a pod directive is seen in the input, this method is called and is passed the command name $cmd and the remainder of the text paragraph $text which appears immediately after the command name. If desired, the text which separated the command from its corresponding text may be found in $sep. Note that this method is not called for =pod paragraphs.
The base class implementation of this method simply prints the raw pod command to the output filehandle and then invokes the textblock() method, passing it the $text parameter.
verbatim($text)
This method may be overridden by subclasses to take the appropriate action when a block of verbatim text is encountered. It is passed the text block $text as a parameter.
The base class implementation of this method simply prints the text block (unmodified) to the output filehandle.
textblock($text)
This method may be overridden by subclasses to take the appropriate action when a normal block of pod text is encountered (although the base class method will usually do what you want). It is passed the text block $text as a parameter.
Subclass implementations of this method should invoke the method interpolate(), passing it the text block $text as a parameter, and then perform any desired processing upon the returned result.
The base class implementation of this method simply prints the text block as it occurred in the input stream.
interior_sequence($seq_cmd, $seq_arg)
This method should be overridden by subclasses to take the appropriate action when an interior sequence is encountered. An interior sequence is an embedded command within a block of text which appears as a command name (usually one or more uppercase characters) followed immediately by a string of text which is enclosed in angle brackets. This method is passed the sequence command $seq_cmd and the corresponding text $seq_arg and is invoked by the interpolate() method for each interior sequence that occurs in the string that it is passed. It should return the desired text string to be used in place of the interior sequence.
Subclass implementations of this method may wish to examine the array referenced by $self->{SEQUENCES}, which is a stack of all the interior sequences that are currently being processed (they may be nested). The current interior sequence (the one given by $seq_cmd<$seq_arg>) should always be at the top of this stack.
The base class implementation of the interior_sequence() method simply returns the raw text of the interior sequence (as it occurred in the input).
interpolate($text, $end_re)
This method will translate all text (including any embedded interior sequences) in the given text string $text and returns the interpolated result. If a second argument is given, then it is taken to be a regular expression that indicates when to quit interpolating the string. Upon return, the $text parameter will have been modified to contain only the un-processed portion of the given string (which will not contain any text matched by $end_re).
This method should probably not be overridden by subclasses. It should be noted that this method invokes itself recursively to handle any nested interior sequences.
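The recursion can be illustrated with a small standalone sketch. It is not the module's code: expand and render_sequence are hypothetical names, the text is passed by reference so the unprocessed remainder is visible to the caller, and the rendering of B and I sequences is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical callback standing in for interior_sequence().
sub render_sequence {
    my ($cmd, $arg) = @_;
    return $cmd eq 'B' ? "*$arg*"
         : $cmd eq 'I' ? "_${arg}_"
         :               "$cmd<$arg>";   # unknown command: reproduce raw text
}

# Expand interior sequences in the string referenced by $textref, consuming
# it from the front. Stops (and removes the match) when $end_re matches.
sub expand {
    my ($textref, $end_re) = @_;
    my $out = '';
    while (length $$textref) {
        if ($$textref =~ s/^([A-Z]+)<//) {
            my $cmd = $1;
            my $arg = expand($textref, qr/>/);   # recurse for the argument
            $out .= render_sequence($cmd, $arg);
        }
        elsif (defined($end_re) && $$textref =~ s/^(?:$end_re)//) {
            return $out;                         # caller resumes after the match
        }
        else {
            $$textref =~ s/^(.)//s;              # ordinary character
            $out .= $1;
        }
    }
    return $out;
}

my $text = "a B<bold I<both>> z";
print expand(\$text), "\n";
```

Each nested call returns as soon as it sees the closing ">", leaving the rest of the string for its caller, which is how the nesting unwinds.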
process_filehandle($infilehandle, $outfilehandle)
This method takes a glob to a filehandle (which is assumed to already be opened for reading) and reads the entire input stream looking for blocks (paragraphs) of pod documentation to be processed. For each block of pod documentation encountered it will call the appropriate method (one of command(), verbatim(), or textblock()). If a second argument is given then it should be a filehandle glob where output should be sent (otherwise the default output filehandle is STDOUT). If no first argument is given the default input filehandle STDIN is used.
The input filehandle that is currently in use is stored in the member variable whose key is "INPUT" (e.g. $self->{INPUT}).
The output filehandle that is currently in use is stored in the member variable whose key is "OUTPUT" (e.g. $self->{OUTPUT}).
This method does not usually need to be overridden by subclasses.
process_file($filename, $outfile)
This method takes a filename and does the following:
- opens the input file for reading and the output file for writing (creating the appropriate filehandles);
- invokes the process_filehandle() method, passing it the corresponding input and output filehandles;
- closes the input and output files.
If the special input filename "-" or "<&STDIN" is given then the STDIN filehandle is used for input (and no open or close is performed). If no input filename is specified then "-" is implied.
If a second argument is given then it should be the name of the desired output file. If the special output filename "-" or ">&STDOUT" is given then the STDOUT filehandle is used for output (and no open or close is performed). If the special output filename ">&STDERR" is given then the STDERR filehandle is used for output (and no open or close is performed). If no output filename is specified then "-" is implied. If a reference is passed instead of a filename then it is assumed to be a reference to a filehandle.
This method does not usually need to be overridden by subclasses.
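The output-name rules above might be handled along these lines. The helper open_output is hypothetical; it returns the filehandle together with a flag telling the caller whether a close is needed, which is one possible way to honor the "no open or close is performed" cases.

```perl
use strict;
use warnings;

# Hypothetical helper: map an output name to a filehandle. The second
# return value says whether the caller should close the handle afterwards.
sub open_output {
    my ($name) = @_;
    $name = '-' unless defined $name;          # no name given: "-" is implied
    return ($name, 0)    if ref $name;         # already a filehandle reference
    return (\*STDOUT, 0) if $name eq '-' || $name eq '>&STDOUT';
    return (\*STDERR, 0) if $name eq '>&STDERR';
    open(my $fh, '>', $name)
        or die "Can't open $name for writing: $!\n";
    return ($fh, 1);
}
```

A caller would print to the returned handle and call close only when the flag is true, so STDOUT and STDERR are never closed by accident.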
INSTANCE DATA
Pod::Filter uses the following data members for each of its instances (where $self is a reference to such an instance):
$self->{INPUT}
The current input filehandle.
$self->{OUTPUT}
The current output filehandle.
$self->{HEADINGS}
A reference to an array of the current section heading titles for each heading level (note that the first heading level title is at index 0).
$self->{SELECTED}
A reference to an array of references to arrays. Each subarray is a list of anchored regular expressions (preceded by a "!" if the regexp is to be negated). The index of the expression in the subarray should correspond to the index of the heading title in $self->{HEADINGS} that it is to be matched against.
$self->{CUTTING}
A boolean-valued scalar which evaluates to true if text from the input file is currently being "cut".
$self->{SEQUENCES}
An array reference to the stack of interior sequence commands that are currently in the middle of being processed.
NOTES
To create a pod translator to translate pod documentation to some other format, you usually only need to create a subclass of Pod::Filter which overrides the base class implementation for the following methods:
pragma()
command()
verbatim()
textblock()
interior_sequence()
You may also want to implement the begin_input() and end_input() methods for your subclass.
Also, don't forget to make sure your subclass constructor invokes the base class' initialize() method.
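The overall shape of such a subclass is sketched below. So that it runs on its own, the example uses a stand-in base class, StubFilter, instead of Pod::Filter; the stub only imitates the documented method names, and its bodies, the LINES member, and the heading rendering are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Stand-in base class so the sketch runs without Pod::Filter installed.
# Only the method names follow the documentation; the bodies are assumptions.
package StubFilter;

sub new {
    my $class = shift;
    my $self  = bless {}, $class;
    $self->initialize();
    return $self;
}

sub initialize {
    my $self = shift;
    $self->{HEADINGS} = [];
    return $self;
}

package MyTextFilter;
our @ISA = ('StubFilter');

sub initialize {
    my $self = shift;
    $self->SUPER::initialize();   # always run the base class initialization
    $self->{LINES} = [];          # subclass-specific member data
    return $self;
}

# Render =head1 titles in upper case with an underline; tag other commands.
sub command {
    my ($self, $cmd, $text, $sep) = @_;
    $text =~ s/\s+$//;
    if ($cmd eq 'head1') {
        push @{ $self->{LINES} }, uc($text), '=' x length($text);
    } else {
        push @{ $self->{LINES} }, "[$cmd] $text";
    }
    return;
}

package main;

my $filter = MyTextFilter->new;
$filter->command('head1', "Name\n\n", ' ');
print "$_\n" for @{ $filter->{LINES} };
```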
Sometimes it may be necessary to make more than one pass over the input files. This isn't a problem as long as none of the input files correspond to standard input. You can override either the process_filehandle() method or the process_file() method to make the first pass yourself to collect all the information you need and then invoke the base class method to do the rest of the standard processing.
Feel free to add any member data fields you need to keep track of things like current font, indentation, horizontal or vertical position, or whatever else you like.
For the most part, the Pod::Filter base class should be able to do most of the input parsing for you and leave you free to worry about how to interpret the commands and translate the result.
AUTHOR
Brad Appleton <Brad_Appleton-GBDA001@email.mot.com>
Based on code for Pod::Text written by Tom Christiansen <tchrist@mox.perl.com>