NAME
App::Framework::Extension::Filter - Script filter application object
SYNOPSIS
use App::Framework '::Filter' ;
DESCRIPTION
Application that filters a file or files to produce some other output
Application Subroutines
This extension modifies the normal call flow for the application subroutines. The 'app_start' and 'app_end' subroutines are called once for each input file being filtered, and the main 'app' subroutine is called once for each line of text in the input file.
The pseudo-code for the extension is:
FOREACH input file
    <init variables, state HASH>
    call 'app_start' subroutine
    FOREACH input line
        call 'app' subroutine
    END
    call 'app_end' subroutine
END
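The call flow above can be modelled in plain Perl. This is only a sketch of the documented behaviour, not the extension's own code: run_filter_sketch is a hypothetical helper, input "files" are supplied as in-memory lists of lines, and the application object and options HASH are replaced by undef and an empty HASH.

```perl
use strict ;
use warnings ;

# Hypothetical stand-in for the extension's main loop (NOT the module's code).
# $files maps a file name to an ARRAY ref of its lines; $subs holds the three
# application subroutines.
sub run_filter_sketch
{
    my ($files, $subs) = @_ ;

    my @names = sort keys %$files ;
    my @written ;
    my $file_number = 0 ;
    foreach my $file (@names)
    {
        # <init variables, state HASH>
        my $state_href = {
            num_files    => scalar(@names),
            file_number  => ++$file_number,
            file_list    => \@names,
            file         => $file,
            vars         => {},
            line_num     => 0,
            output_lines => [],
        } ;

        $subs->{app_start}->(undef, {}, $state_href) ;
        foreach my $line (@{ $files->{$file} })
        {
            $state_href->{line_num}++ ;
            $state_href->{line}   = $line ;
            $state_href->{output} = undef ;    # reset before every 'app' call
            $subs->{app}->(undef, {}, $state_href, $line) ;
            push @written, $state_href->{output} if defined $state_href->{output} ;
        }
        $subs->{app_end}->(undef, {}, $state_href) ;
    }
    return \@written ;
}

# Record the order in which the subroutines are called
my @calls ;
my $written = run_filter_sketch(
    { 'a.txt' => ['one', 'two'], 'b.txt' => ['three'] },
    {
        app_start => sub { my ($app, $opts, $state) = @_ ; push @calls, "start $state->{file}" },
        app       => sub { my ($app, $opts, $state, $line) = @_ ; push @calls, "app $line" ; $state->{output} = uc $line },
        app_end   => sub { my ($app, $opts, $state) = @_ ; push @calls, "end $state->{file}" },
    },
) ;
print join("\n", @calls), "\n" ;
```

The recorded calls show 'app_start' and 'app_end' bracketing the per-line 'app' calls for each file in turn.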
For each input file, a state HASH is created and passed as a reference to the application subroutines. The state HASH contains various values maintained by the extension, but the application may add its own additional values to the HASH. These values will be passed unmodified to each of the application subroutine calls.
The state HASH contains the following fields:
num_files
Total number of input files.
file_number
Current input file number (1 to num_files)
file_list
ARRAY ref. List of input filenames.
vars
HASH ref. Empty HASH created so that any application-specific variables may be stored here.
line_num
Current line number of line being processed (1 to N).
output_lines
ARRAY ref. List of the output lines that are to be written to the output file (maintained by the extension).
file
Current file name of the file being processed.
line
String of line being processed.
output
Special variable used by application to tell extension what to output (see "Output").
The state HASH reference is passed to all 3 of the application subroutines. In addition, the input line of text is also passed to the main 'app' subroutine. The interface for the subroutines is:
- app_start($app, $opts_href, $state_href)
-
Called once for each input file. Called at the start of processing. Allows any setting up of variables stored in the state HASH.
Arguments are:
- $app - The application object
- $opts_href - HASH ref to the command line options (see App::Framework::Feature::Options and "ADDITIONAL COMMAND LINE OPTIONS")
- $state_href - HASH ref to state
- app($app, $opts_href, $state_href, $line)
-
Called once for each line of the input file. Performs the main processing of the line; anything to be written must be placed in the output field of the state HASH (see "Output").
Arguments are:
- $app - The application object
- $opts_href - HASH ref to the command line options (see App::Framework::Feature::Options and "ADDITIONAL COMMAND LINE OPTIONS")
- $state_href - HASH ref to state
- $line - Text of input line
- app_end($app, $opts_href, $state_href)
-
Called once for each input file. Called at the end of processing. Allows for any end of file tidy up, data sorting etc.
Arguments are:
- $app - The application object
- $opts_href - HASH ref to the command line options (see App::Framework::Feature::Options and "ADDITIONAL COMMAND LINE OPTIONS")
- $state_href - HASH ref to state
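As an illustration of the three interfaces working together, the following sketch counts the non-blank lines in a file, keeping its counter in the 'vars' HASH. The subroutines are driven by hand here with a hand-built state HASH (the file name 'notes.txt' and the 'count' and 'summary' keys are invented for the example); in a real script the extension would make these calls itself.

```perl
use strict ;
use warnings ;

sub app_start
{
    my ($app, $opts_href, $state_href) = @_ ;
    # per-file set up
    $state_href->{vars}{count} = 0 ;
}

sub app
{
    my ($app, $opts_href, $state_href, $line) = @_ ;
    # count any line containing a non-whitespace character
    $state_href->{vars}{count}++ if $line =~ /\S/ ;
}

sub app_end
{
    my ($app, $opts_href, $state_href) = @_ ;
    # end of file tidy up
    $state_href->{vars}{summary} = sprintf("%s: %d non-blank lines",
        $state_href->{file}, $state_href->{vars}{count}) ;
}

# Hand-built stand-ins for what the extension would normally pass in
my $state_href = { file => 'notes.txt', vars => {} } ;
app_start(undef, {}, $state_href) ;
app(undef, {}, $state_href, $_) foreach ("alpha", "", "  ", "beta") ;
app_end(undef, {}, $state_href) ;
print "$state_href->{vars}{summary}\n" ;
```

Because the same state HASH reference is passed to every call, values stored by 'app_start' are visible to 'app' and 'app_end' without any global variables.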
Output
By default, each time the extension calls the 'app' subroutine it sets the output field of the state HASH to undef. The 'app' subroutine must set this field to some value for the extension to write anything to the output file.
For example, the following simple 'app' subroutine causes every line of every input file to be output in upper case:
sub app
{
    my ($app, $opts_href, $state_href, $line) = @_ ;
    # uppercase
    $state_href->{output} = uc $line ;
}
If no "outfile" option is specified, then all output will be written to STDOUT. Also, normally the output is written line-by-line after each line has been processed. If the "buffer" option has been specified, then all output lines are buffered (into the state variable "output_lines") then written out at the end of processing all input. Similarly, if the "inplace" option is specified, then buffering is used to process the complete input file then overwrite it with the output.
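The difference between line-by-line and buffered output can be pictured with a small model. This is a sketch under stated assumptions rather than the extension's implementation: emit_sketch is a hypothetical helper, the "processing" is simply uppercasing, and writes are recorded as events instead of going to a file.

```perl
use strict ;
use warnings ;

# Hypothetical model of the two output modes (NOT the module's code).
# Returns the sequence of events so that the ordering can be inspected.
sub emit_sketch
{
    my ($lines, $buffer) = @_ ;

    my @events ;
    my @output_lines ;    # plays the role of the 'output_lines' state field
    foreach my $line (@$lines)
    {
        my $output = uc $line ;
        if ($buffer)
        {
            # buffered: hold the line until all input has been processed
            push @output_lines, $output ;
            push @events, "buffer $output" ;
        }
        else
        {
            # default: write as soon as the line has been processed
            push @events, "write $output" ;
        }
    }
    # buffered lines are written out at the end
    push @events, map { "write $_" } @output_lines ;
    return \@events ;
}

print join(', ', @{ emit_sketch(['a', 'b'], 0) }), "\n" ;
print join(', ', @{ emit_sketch(['a', 'b'], 1) }), "\n" ;
```

In the buffered case no write happens until every input line has been seen, which is what makes overwriting the input file safe for in-place filtering.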
Outfile option
The "outfile" option may be used to set the output filename. This may include variables that are specific to the Filter extension, where each variable's value is updated for each input file being processed. The following Filter-specific variables may be used:
$filter{'filter_file'} = $state_href->{file} ;
$filter{'filter_filenum'} = $state_href->{file_number} ;
my ($base, $path, $ext) = fileparse($file, '\..*') ;
$filter{'filter_name'} = $base ;
$filter{'filter_base'} = $base ;
$filter{'filter_path'} = $path ;
$filter{'filter_ext'} = $ext ;
- filter_file - Input full file path
- filter_base - Basename of input file (excluding extension)
- filter_name - Alias for "filter_base"
- filter_path - Directory path of input file
- filter_ext - Extension of input file
- filter_filenum - Input file number (starting from 1)
NOTE: Specifying these variables for options at the command line will require you to escape the variables per the operating system you are using (e.g. use single quotes ' ' around the value in Linux).
For example, with the command line arguments:
-outfile '/tmp/$filter_name-$filter_filenum.txt' afile.doc /doc/bfile.text
the extension processes './afile.doc' into '/tmp/afile-1.txt', and '/doc/bfile.text' into '/tmp/bfile-2.txt'.
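The mapping from input file to output filename can be reproduced with File::Basename. In this sketch, expand_outfile_sketch is a hypothetical helper and the regular-expression substitution is an assumption about how the variables might be expanded; only the fileparse call and the variable names come from the description above.

```perl
use strict ;
use warnings ;
use File::Basename ;

# Hypothetical helper (NOT the module's code): expand the Filter-specific
# variables in an outfile template for one input file.
sub expand_outfile_sketch
{
    my ($template, $file, $file_number) = @_ ;

    my ($base, $path, $ext) = fileparse($file, '\..*') ;
    my %filter = (
        filter_file    => $file,
        filter_filenum => $file_number,
        filter_name    => $base,
        filter_base    => $base,
        filter_path    => $path,
        filter_ext     => $ext,
    ) ;

    # Replace each known $name; leave unknown names untouched
    (my $outfile = $template) =~ s/\$(\w+)/exists $filter{$1} ? $filter{$1} : "\$$1"/ge ;
    return $outfile ;
}

my $template = '/tmp/$filter_name-$filter_filenum.txt' ;
print expand_outfile_sketch($template, 'afile.doc', 1), "\n" ;          # /tmp/afile-1.txt
print expand_outfile_sketch($template, '/doc/bfile.text', 2), "\n" ;    # /tmp/bfile-2.txt
```

Note that the template is held in single quotes in the Perl source for the same reason it is quoted on the command line: the variables must reach the expansion step unexpanded.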
Example
As an example, here is a script that filters one or more HTML files to strip out unwanted sections (they are actually Doxygen HTML files that I wanted to convert into a pdf book):
#!/usr/bin/perl
#
use strict ;
use App::Framework '::Filter' ;
# VERSION
our $VERSION = '1.00' ;
## Create app
go() ;
#----------------------------------------------------------------------
sub app_start
{
    my ($app, $opts_href, $state_href) = @_ ;
    # force in-place editing
    $app->set(inplace => 1) ;
    # set to start state
    $state_href->{vars} = {
        'state' => 'start',
    } ;
}
#----------------------------------------------------------------------
# Main execution
#
sub app
{
    my ($app, $opts_href, $state_href, $line) = @_ ;
    my $ok = 1 ;
    if ($state_href->{'vars'}{'state'} eq 'start')
    {
        if ($line =~ m/<!-- Generated by Doxygen/i)
        {
            $ok = 0 ;
            $state_href->{'vars'}{'state'} = 'doxy-head' ;
        }
    }
    elsif ($state_href->{'vars'}{'state'} eq 'doxy-head')
    {
        $ok = 0 ;
        if ($line =~ m/<div class="contents">/i)
        {
            $ok = 1 ;
            $state_href->{'vars'}{'state'} = 'contents' ;
        }
    }
    elsif ($state_href->{'vars'}{'state'} eq 'contents')
    {
        if ($line =~ m/<hr size="1"><address style="text-align: right;"><small>Generated/i)
        {
            $ok = 0 ;
            $state_href->{'vars'}{'state'} = 'doxy-foot' ;
        }
    }
    elsif ($state_href->{'vars'}{'state'} eq 'doxy-foot')
    {
        $ok = 0 ;
        if ($line =~ m%</body>%i)
        {
            $ok = 1 ;
            $state_href->{'vars'}{'state'} = 'end' ;
        }
    }
    # only output if ok to do so
    $state_href->{'output'} = $line if $ok ;
}
#=================================================================================
# SETUP
#=================================================================================
__DATA__
[SUMMARY]
Filter Doxygen created html removing frames etc.
[DESCRIPTION]
B<$name> does some stuff.
The script takes in HTML of the form:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<title>rctu4_test: File Index</title>
<link href="doxygen.css" rel="stylesheet" type="text/css">
<link href="tabs.css" rel="stylesheet" type="text/css">
</head><body>
**<!-- Generated by Doxygen 1.5.5 -->
**<div class="navigation" id="top">
** <div class="tabs">
** <ul>
..
** </div>
**</div>
<div class="contents">
<h1>File List</h1>Here is a list of all files with brief descriptions:<table>
<tr><td class="indexkey">src/<a class="el" href="rctu4__tests_8c.html">rctu4_tests.c</a></td><td class="indexvalue"></td></tr>
<tr><td class="indexkey">src/common/<a class="el" href="ate__general_8c.html">ate_general.c</a></td><td class="indexvalue"></td></tr>
...
<tr><td class="indexkey">src/tests/<a class="el" href="test__star__daisychain__specific_8c.html">test_star_daisychain_specific.c</a></td><td class="indexvalue"></td></tr>
<tr><td class="indexkey">src/tests/<a class="el" href="test__version__functions_8c.html">test_version_functions.c</a></td><td class="indexvalue"></td></tr>
</table>
</div>
**<hr size="1"><address style="text-align: right;"><small>Generated on Fri Jun 5 13:43:31 2009 for rctu4_test by
**<a href="http://www.doxygen.org/index.html">
**<img src="doxygen.png" alt="doxygen" align="middle" border="0"></a> 1.5.5 </small></address>
</body>
</html>
And removes the lines beginning '**'.
The script does in-place updating of the HTML files and can be run as:
filter-script *.html
ADDITIONAL COMMAND LINE OPTIONS
This extension adds the following additional command line options to any application:
- -skip_empty - Skip blanks
-
Do not process empty lines (lines that contain only whitespace)
- -trim_space - Trim spaces
-
Remove spaces from start and end of lines
- -trim_comment - Trim comments
-
Remove any comments from the line, starting from the comment string to the end of the line
- -inplace - In-place filter
-
Read file, process, then overwrite original input file with processed output
- -outdir - Specify output directory
-
Write file(s) into the specified directory rather than into the same directory as the input file
- -outfile - Specify output file
-
Specify the output filename, which may include variables (see "Outfile option")
- -comment - Specify comment string
-
Specify the comment start string. Used in conjunction with "-trim_comment".
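The effect of the line-trimming options can be sketched as a small pre-processing step. This is a hypothetical model only: preprocess_sketch is not part of the module, and the order in which the options are applied (comment removal, then space trimming, then the empty-line check) is an assumption made for the example.

```perl
use strict ;
use warnings ;

# Hypothetical model of the trimming options (NOT the module's code).
# Returns the processed line, or undef if the line would be skipped.
sub preprocess_sketch
{
    my ($line, $opts_href) = @_ ;

    # trim_comment: remove from the comment string to the end of the line
    my $comment = $opts_href->{comment} ;
    if ($opts_href->{trim_comment} && defined $comment && length $comment)
    {
        $line =~ s/\Q$comment\E.*$// ;
    }

    # trim_space: remove spaces from start and end of the line
    if ($opts_href->{trim_space})
    {
        $line =~ s/^\s+// ;
        $line =~ s/\s+$// ;
    }

    # skip_empty: do not process lines that contain only whitespace
    return undef if $opts_href->{skip_empty} && $line !~ /\S/ ;

    return $line ;
}

my %opts = (trim_comment => 1, comment => '#', trim_space => 1, skip_empty => 1) ;
foreach my $line ('  value = 1  # note ', '   ', '# only a comment')
{
    my $result = preprocess_sketch($line, \%opts) ;
    print defined $result ? "keep: '$result'\n" : "skip\n" ;
}
```

With all three options enabled, a line holding nothing but a comment is treated as empty in this model because the comment is removed before the empty-line check.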
COMMAND LINE ARGUMENTS
This extension sets the following additional command line arguments for any application:
- file - Input file(s)
-
Specify one or more input files to be processed. If no files are specified on the command line, input is read from STDIN.
FIELDS
Note that the fields match with the command line options.
- skip_empty - Skip blanks
-
Do not process empty lines (lines that contain only whitespace)
- trim_space - Trim spaces
-
Remove spaces from start and end of lines
- trim_comment - Trim comments
-
Remove any comments from the line, starting from the comment string to the end of the line
- inplace - In-place filter
-
Read file, process, then overwrite original input file with processed output
- buffer - Buffer output
-
Store output lines into a buffer, then write out file at end of processing
- outdir - Specify output directory
-
Write file(s) into the specified directory rather than into the same directory as the input file
- outfile - Specify output file
-
Specify the output filename, which may include variables (see "Outfile option")
- comment - Specify comment string
-
Specify the comment start string. Used in conjunction with "trim_comment".
- out_fh - Output file handle
-
Read only. File handle of current output file.
CONSTRUCTOR METHODS
- new([%args])
-
Create a new App::Framework::Extension::Filter.
The %args are specified as they would be in the set method, for example:
'inplace' => 1
The full list of possible arguments is:
'fields' => Either ARRAY list of valid field names, or HASH of field names with default values
CLASS METHODS
OBJECT METHODS
- filter_run($app, $opts_href, $args_href)
-
Filter the specified file(s) one at a time.
- write_output($output)
-
Application interface for writing out extra lines
- _start_output($state_href, $opts_href)
-
Start of output file
- _handle_output($state_href, $opts_href)
-
Write out line (if required)
- _end_output($state_href, $opts_href)
-
End of output file
- _open_output($state_href, $opts_href)
-
Open the file (or STDOUT) depending on settings
- _close_output($state_href, $opts_href)
-
Close the file if open
- _wr_output($state_href, $opts_href, $line)
-
Write the line to the output file
DIAGNOSTICS
Setting the debug flag to level 1 prints out (to STDOUT) some debug messages, setting it to level 2 prints out more verbose messages.
AUTHOR
Steve Price <sdprice at cpan.org>
BUGS
None that I know of!