NAME
Text::Parts - split a text file into parts (each running from the start of one line to the end of the same or a later line)
SYNOPSIS
To split a text file into a given number of parts:
use Text::Parts;
my $splitter = Text::Parts->new(file => $file);
my (@parts) = $splitter->split(num => 4);
foreach my $part (@parts) {
  while (my $l = $part->getline) { # or <$part>
    # ...
  }
}
To split a text file into parts of approximately a given size:
use Text::Parts;
my $splitter = Text::Parts->new(file => $file);
my (@parts) = $splitter->split(size => 10); # each part will be more than 10 bytes
# then iterate over @parts as in the previous example
To split a CSV file:
use Text::Parts;
use Text::CSV_XS; # this does not work with Text::CSV_PP if you use the {binary => 1} option
my $csv = Text::CSV_XS->new();
my $splitter = Text::Parts->new(file => $file, parser => $csv);
my (@parts) = $splitter->split(num => 4);
foreach my $part (@parts) {
  while (my $col = $part->getline_parser) { # getline_parser returns the parsed result
    print join "\t", @$col;
    # ...
  }
}
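With Parallel::ForkManager:
use Text::Parts;
use Parallel::ForkManager;
my $splitter = Text::Parts->new(file => $file);
my (@parts) = $splitter->split(num => 4);
my $pm = Parallel::ForkManager->new(4);
foreach my $part (@parts) {
  $pm->start and next; # do the fork; the parent moves on to the next part
  while (my $l = $part->getline) {
    # ...
  }
  $pm->finish; # end the child process so it does not continue the loop
}
$pm->wait_all_children;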
DESCRIPTION
This module splits a file into a specified number of parts. Each part runs from the start of one line to the end of the same or a later line. For example, suppose the file content is the following:
1111
22222222222222222222
3333
4444
Calling $splitter->split(num => 3) splits it as follows:
1st part: 1111 22222222222222222222
2nd part: 3333
3rd part: 4444
First, the split method tries to cut at file size / 3 bytes. Then it tries to cut at the remaining file size / the number of remaining parts. Each cut is extended to the end of the line it falls in. The example file is 36 bytes including newlines, so:
1st part : 36 bytes / 3 = 12 bytes + bytes to the end of the line (if needed), which gives 26 bytes here
2nd part : (36 - 26) bytes / 2 = 5 bytes + bytes to the end of the line (if needed)
last part: the rest of the file
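The arithmetic above can be reproduced with a short standalone sketch. It is not the module's actual implementation: part_ranges is a hypothetical helper that works on an in-memory string rather than a file, and it assumes a cut that lands exactly on a line end is left where it is.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Parts): return [start, end) byte
# offsets for up to $num parts of $content, each ending at a line end.
sub part_ranges {
    my ($content, $num, $eol) = @_;
    $eol = "\n" unless defined $eol;
    my $total = length $content;
    my $start = 0;
    my @ranges;
    for my $remaining (reverse 1 .. $num) {
        last if $start >= $total;            # nothing left to hand out
        my $end = $total;                    # the last part takes the rest
        if ($remaining > 1) {
            # Target size: remaining bytes / remaining parts.
            my $probe = $start + int(($total - $start) / $remaining);
            # Look for the first line end that finishes at or after $probe.
            my $from = $probe - length $eol;
            $from = $start if $from < $start;
            my $pos = index($content, $eol, $from);
            $end = $pos + length $eol if $pos >= 0;
        }
        push @ranges, [$start, $end];
        $start = $end;
    }
    return @ranges;
}

my $content = "1111\n" . ('2' x 20) . "\n3333\n4444\n";   # 36 bytes
my @ranges  = part_ranges($content, 3);
for my $i (0 .. $#ranges) {
    my ($from, $to) = @{ $ranges[$i] };
    printf "part %d: %d bytes\n", $i + 1, $to - $from;
}
```

For the example file this yields parts of 26, 5 and 5 bytes, matching the breakdown above.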
METHODS
new
$s = Text::Parts->new(file => $filename);
$s = Text::Parts->new(file => $filename, parser => Text::CSV_XS->new({binary => 1}));
Constructor. It accepts the following options:
num
The number of parts to split the file into.
size
The approximate size of each part. This value is used to calculate num. If the file size is 100 and this value is 25, num is 4.
file
The target file to split.
parser
Pass a parser object (such as Text::CSV_XS->new()). By default, the object must have a method named getline that takes a filehandle. If the method has a different name, pass that name in the parser_method option.
parser_method
The name of the parser's method. The default is getline.
check_line_start
If this option is true, the line start is checked and the read position is moved to it before <$fh> or the parser's getline/parser_method is called. This may be useful when the parser's getline/parser_method does not work correctly on wrongly formatted input.
The default value is 0.
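To illustrate the parser and parser_method options, here is a minimal sketch of a parser object whose method is not named getline. My::TabParser and its next_row method are hypothetical names invented for this example; the only requirement taken from the text above is that the method accepts a filehandle. The sketch exercises the parser on an in-memory filehandle so that it runs without Text::Parts installed.

```perl
use strict;
use warnings;

# Hypothetical parser class, for illustration only.
package My::TabParser;

sub new { return bless {}, shift }

# Takes a filehandle, reads one line and returns an array ref of
# tab-separated fields, or undef at end of input.
sub next_row {
    my ($self, $fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    return [ split /\t/, $line, -1 ];
}

package main;

# With Text::Parts the object would be passed along these lines:
#   my $s = Text::Parts->new(file => $file,
#                            parser => My::TabParser->new,
#                            parser_method => 'next_row');

# Standalone check of the parser against an in-memory filehandle.
my $data = "a\tb\tc\nd\te\tf\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $parser = My::TabParser->new;
my @rows;
while (my $row = $parser->next_row($fh)) {
    push @rows, $row;
}
close $fh;
print join(',', @$_), "\n" for @rows;
```

Returning an array ref per line mirrors the CSV example in the SYNOPSIS, where the result of getline_parser is dereferenced with @$col.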
file
my $file = $s->file;
$s->file($filename);
get/set target file.
parser
my $parser_object = $s->parser;
$s->parser($parser_object);
get/set parser object.
parser_method
my $method = $s->parser_method;
$s->parser_method($method);
get/set parser method.
split
my @parts = $s->split(num => $num);
my @parts = $s->split(size => $size);
Tries to split the target file into $num parts. If you pass size => bytes, $num is calculated as file size / $size.
Returns an array of Text::Parts::Part objects. See "Text::Parts::Part METHODS".
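The size-to-num calculation can be sketched as follows. num_from_size is a hypothetical helper, not part of the module. The text does not say what happens when the file size is not an exact multiple of the size, so rounding down with a minimum of 1 is an assumption made only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Parts): number of parts for a
# given target size. Assumption: round down, but never go below 1.
sub num_from_size {
    my ($file_size, $size) = @_;
    die "size must be positive\n" unless $size > 0;
    my $num = int($file_size / $size);
    return $num < 1 ? 1 : $num;
}

# The case described for the size option: 100 bytes split by size 25.
my $num = num_from_size(100, 25);
print "num = $num\n";
```

For a real file, the first argument would typically come from -s $file.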
eol
my $eol = $s->eol;
$s->eol($eol);
get/set end of line string. default value is $/.
Text::Parts::Part METHODS
Text::Parts::Part objects are returned by the split method.
getline
my $line = $part->getline;
Returns one line. You can also use <$part>:
my $line = <$part>;
getline_parser
my $parsed = $part->getline_parser;
returns parsed result.
eof
$part->eof;
Returns true if the current position is at the end of the part.
AUTHOR
Ktat, <ktat at cpan.org>
BUGS
Please report any bugs or feature requests to bug-text-parts at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Parts. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Text::Parts
You can also look for information at:
RT: CPAN's request tracker
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
LICENSE AND COPYRIGHT
Copyright 2011 Ktat.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.