NAME

alvisXMLsplit -- split a large Alvis XML file into smaller files in a directory for easier processing.

SYNOPSIS

alvisXMLsplit [--bzip2]  <Alvis XML file> <N per file> <out-dir>

Splits a large Alvis XML file into files of N documentRecords each,
written to <out-dir>.
Use --bzip2 if both the input and the output are bzip2-compressed.
Output files are UTF-8 and Perl friendly, with one <documentRecord> or
</documentRecord> tag per line to facilitate processing.
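
As an illustration of why the one-tag-per-line layout helps, the following Python sketch groups the lines of an output file into records with a plain line scan and no XML parser. The function name read_records, the id attribute and the <title> element are assumptions made for the example; they are not taken from the script or from the Alvis format.

```python
import re

# An opening record tag, followed by whitespace (attributes) or '>'.
OPEN_TAG = re.compile(r"<documentRecord[\s>]")


def read_records(lines):
    """Group lines into records, relying on one record tag per line."""
    record = None
    for line in lines:
        if OPEN_TAG.search(line):
            record = []  # an opening tag starts a new record
        if record is not None:
            record.append(line)
            if "</documentRecord>" in line:
                yield "".join(record)  # the closing tag ends the record
                record = None


# Hypothetical sample in the line-oriented layout described above.
sample = [
    '<documentRecord id="a">\n',
    "  <title>first</title>\n",
    "</documentRecord>\n",
    '<documentRecord id="b">\n',
    "  <title>second</title>\n",
    "</documentRecord>\n",
]
records = list(read_records(sample))
print(len(records))
```

Any iterable of lines can be passed in. For bzip2-compressed pieces, Python's bz2.open(path, "rt", encoding="utf-8") returns a file object that yields lines in the same way.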

DESCRIPTION

Splits a large Alvis XML file into pieces in a directory for easier processing. The algorithm is simple but somewhat slow, because each document is built up in memory before being written out, which is not efficient in Perl.
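
The approach described above can be sketched in Python as follows. This is a minimal illustration, not the script's actual code: the function names, the numbered output file names and the handling of text outside records (it is dropped here) are all assumptions, and a record tag is assumed never to be broken across two input lines.

```python
import os
import re

# An opening or closing documentRecord tag (captured so re.split keeps it).
TAG = re.compile(r"(</?documentRecord\b[^>]*>)")


def iter_records(lines):
    """Yield each record as one string, built up in memory piece by piece."""
    parts = None
    for line in lines:
        for piece in TAG.split(line):
            if TAG.fullmatch(piece):
                if not piece.startswith("</"):
                    parts = [piece + "\n"]  # opening tag on its own line
                elif parts is not None:
                    if not parts[-1].endswith("\n"):
                        parts.append("\n")
                    parts.append(piece + "\n")  # closing tag on its own line
                    yield "".join(parts)
                    parts = None
            elif parts is not None and piece.strip():
                parts.append(piece)  # record content, kept as-is


def split_records(lines, n_per_file, out_dir):
    """Write n_per_file records per file into out_dir; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths, batch = [], []

    def flush():
        if batch:
            # Hypothetical naming scheme: 0.xml, 1.xml, ...
            path = os.path.join(out_dir, f"{len(paths)}.xml")
            with open(path, "w", encoding="utf-8") as out:
                out.writelines(batch)
            paths.append(path)
            batch.clear()

    for record in iter_records(lines):
        batch.append(record)
        if len(batch) == n_per_file:
            flush()
    flush()  # write any remaining records
    return paths
```

Calling split_records(open("big.xml", encoding="utf-8"), 1000, "out") would write files of up to 1000 records each into out; both names are placeholders. Because each record is joined into a single string before it is written, memory use is bounded by N records rather than by the size of the whole input.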

AUTHOR

Wray Buntine