NAME

Alvis::Convert - Perl extension for converting documents from a number of different source formats to Alvis XML format.

SYNOPSIS

 use Alvis::Convert;

 # Create a new instance that writes its output under 'out' and reads
 # the source encoding from the meta information.
 #
 my $C=Alvis::Convert->new(outputRootDir=>'out',
                           outputNPerSubdir=>1000,
                           outputAtSameLocation=>0,
                           includeOriginalDocument=>0,
                           sourceEncodingFromMeta=>1);
 # Restart output counters
 $C->init_output();

 # Convert e.g. HTML
 for my $html_txt (@html)
 {
     my $alvisXML=$C->HTML($html_txt,$meta_txt);
     if (!defined($alvisXML))
     {
         warn $C->errmsg();
         $C->clearerr();
         next;
     }
 
     if (!$C->output_Alvis([$alvisXML]))
     {
         warn $C->errmsg();
         $C->clearerr();
         next;
     }
 }

DESCRIPTION

Converts document collections in several source formats (HTML, news XML and ainodump files, see the methods below) to Alvis XML format.

METHODS

new()

Options:

fileType                 the MIME type of the source file to convert. 
                         Default: guess.
sourceEncoding           encoding of the source document. Default: guess.  
urlFromBasename          extract URL from basename. Default: no.
outputAtSameLocation     output Alvis XML to the same directories as the
                         source documents. Default: no.
alvisSuffix              suffix of the output Alvis XML records. Default:
                         'alvis'.
outputRootDir            root directory for output files. Default: '.'
outputNPerSubdir         number of records output per subdirectory.
                         Default: 1000
defaultDocType           first guess document (MIME) type. Default: 'text'.
defaultDocSubType        first guess document subtype. Default: 'html'.
defaultEncoding          first guess encoding. Default: 'iso-8859-1'.
includeOriginalDocument  include original document in the output?
                         Default: yes.
ainodumpWarnings         issue warnings concerning ainodump conversion?
                         Default: yes.
sourceEncodingFromMeta   read source encoding from Meta information?
                         Default: no.
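
As a minimal sketch of how the options above might be passed, the following constructs a converter with a few of them and then changes one with set(). The option values are arbitrary examples. The run-time require guard is only there so the sketch still runs where Alvis::Convert is not installed, and treating an undefined return value from new() as failure is an assumption, not documented behaviour.

```perl
use strict;
use warnings;

# Option names come from the list above; the values are arbitrary.
my %options = (outputRootDir           => 'out',
               outputNPerSubdir        => 500,
               alvisSuffix             => 'alvis',
               includeOriginalDocument => 0,
               sourceEncodingFromMeta  => 1);

# Load the module at run time so the sketch degrades gracefully
# where Alvis::Convert is not installed.
if (eval { require Alvis::Convert; 1 })
{
    my $C = Alvis::Convert->new(%options);

    # Assumption: new() returns undef when instantiation fails.
    die "Instantiating Alvis::Convert failed." if !defined($C);

    # Change a single option after construction.
    $C->set('alvisSuffix', 'xml');
}
else
{
    warn "Alvis::Convert is not available: $@";
}
```

Keeping the options in a hash makes it easy to share one configuration between several converter instances.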

HTML()

     my $alvisXML=$C->HTML($html_txt,$meta_txt,
                           {sourceEncoding=>'utf8',
                            sourceEncodingFromMeta=>0
                            });
     if (!defined($alvisXML))
     {
         warn $C->errmsg();
         $C->clearerr();
         next;
     }

newsXML()

$meta_txt=$C->read_meta($news_xml_entries{$base_name}{metaF});
if (!defined($meta_txt))
{
    warn "Reading meta file " .
         "\"$news_xml_entries{$base_name}{metaF}\" failed. " .
         $C->errmsg();
    $C->clearerr();
    next;
}
my $alvisXMLs;
$xml_txt=$C->read_news_XML($news_xml_entries{$base_name}{xmlF});
if (!defined($xml_txt))
{
    warn "Reading the news XML for basename \"$base_name\" failed. " .
          $C->errmsg();
    $C->clearerr();
    next;
}
$alvisXMLs=$C->newsXML($xml_txt,$meta_txt,$original_document_text);
if (!defined($alvisXMLs))
{
    warn "Obtaining the Alvis versions of the documents inside " .
         "\"$base_name\"'s XML file failed. " . $C->errmsg();
    $C->clearerr();
    next;
}

ainodump()

if (!$C->ainodump($ainodump_file))
{
   warn "Obtaining the Alvis version of the " .
        "ainodump file \"$ainodump_file\" " .
        "failed. " . $C->errmsg() if
          $Warnings;
   $C->clearerr();
}

set()

$C->set('alvisSuffix','foo');

read_HTML()

$html_txt=$C->read_HTML($html_file);
if (!defined($html_txt))
{
    warn "Reading the HTML failed. " .
          $C->errmsg();
    $C->clearerr();
    next;
}

read_meta()

Reads the meta information file at the given path and returns its contents, or undef on failure. See the newsXML() example above.

read_news_XML()

Reads the news XML file at the given path and returns its contents, or undef on failure. See the newsXML() example above.

init_output()

Initializes output counters.

output_Alvis()

$alvisXML=$C->HTML($html_txt,$meta_txt);
if (!$C->output_Alvis([$alvisXML],$base_name))
{
    warn "Outputting the Alvis records failed. " . $C->errmsg() if
            $Warnings;
    $C->clearerr();
    next;
}

errmsg()

Returns the stack of error messages, if any, or an empty string otherwise. Call clearerr() to reset it once the error has been handled, as in the examples above.
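
The warn/clearerr pattern repeated in the examples above could be factored into a small helper. The following is a sketch only: checked() and MockConverter are hypothetical names and not part of Alvis::Convert. The mock merely imitates errmsg() and clearerr() so that the sketch runs without the module installed. The helper suits methods that signal failure with undef, such as HTML() or read_HTML(); for output_Alvis() and ainodump(), which the examples test for truth, the check would need adjusting.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an Alvis::Convert object; it only mimics
# the errmsg()/clearerr() interface used in the examples above.
package MockConverter;
sub new      { return bless { err => '' }, shift }
sub fail     { my ($self, $msg) = @_; $self->{err} .= $msg; return undef }
sub errmsg   { return $_[0]{err} }
sub clearerr { $_[0]{err} = ''; return 1 }

package main;

# Run one conversion step. On failure, warn with the object's error
# text, clear it, and return undef so the caller can skip the document.
sub checked
{
    my ($C, $what, $step) = @_;
    my $result = $step->();
    if (!defined($result))
    {
        warn "$what failed. " . $C->errmsg() . "\n";
        $C->clearerr();
    }
    return $result;
}

my $C   = MockConverter->new();
my $ok  = checked($C, 'Good step', sub { return 'text' });
my $bad = checked($C, 'Bad step',  sub { return $C->fail('no such file') });
```

With a real converter the call might look like checked($C, 'Reading the HTML', sub { $C->read_HTML($html_file) }), followed by next if the result is undefined.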

SEE ALSO

Alvis::Document

AUTHOR

Kimmo Valtonen, <kimmo.valtonen@hiit.fi>

COPYRIGHT AND LICENSE

Copyright (C) 2006 by Kimmo Valtonen

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.