NAME
Text::Parts - split a text file into parts (each part runs from the start of one line to the end of the same or a later line)
SYNOPSIS
If you want to split a text file into a given number of parts:
use Text::Parts;
my $splitter = Text::Parts->new(file => $file);
my (@parts) = $splitter->split(num => 4);
foreach my $part (@parts) {
    while (my $l = $part->getline) { # or <$part>
        # ...
    }
}
If you want to split a text file into parts of roughly a specified size:
use Text::Parts;
my $splitter = Text::Parts->new(file => $file);
my (@parts) = $splitter->split(size => 10); # each part will be larger than 10, because it is extended to a line end
# then read each part in the same way as in the previous example
If you want to split a CSV file:
use Text::Parts;
use Text::CSV_XS; # does not work with Text::CSV_PP if you want to use the {binary => 1} option
# Not recommended for CSV files whose columns contain embedded newlines.
my $csv = Text::CSV_XS->new();
my $splitter = Text::Parts->new(file => $file, parser => $csv);
my (@parts) = $splitter->split(num => 4);
foreach my $part (@parts) {
    while (my $col = $part->getline_parser) { # getline_parser returns the parsed result
        print join "\t", @$col;
        # ...
    }
}
Write the split parts to files:
$splitter->write_files('file%d.csv', num => 4);
# or write each part yourself:
my $i = 0;
foreach my $part ($splitter->split(num => 4)) {
    $part->write_file("file" . $i++ . '.csv');
}
with Parallel::ForkManager:
use Text::Parts;
use Parallel::ForkManager;
my $splitter = Text::Parts->new(file => $file);
my (@parts) = $splitter->split(num => 4);
my $pm = Parallel::ForkManager->new(4);
foreach my $part (@parts) {
    $pm->start and next; # fork; the parent moves on to the next part
    while (my $l = $part->getline) {
        # ...
    }
    $pm->finish; # end the child process
}
$pm->wait_all_children;
NOTE: If the file is on a single disk, forking is unlikely to help. Forking may make sense when the file is on RAID (I haven't tried it).
DESCRIPTION
This module splits a file into a specified number of parts. Each part runs from the start of one line to the end of the same or a later line. For example, suppose the file content is the following:
1111
22222222222222222222
3333
4444
If you call $splitter->split(num => 3), the file is split as follows:
1st part: 1111 22222222222222222222
2nd part: 3333
3rd part: 4444
First, the split method tries to cut at (file size / 3) bytes. Next, it tries to cut at (remaining file size / number of remaining parts) bytes. So:
1st part : 36 bytes / 3 = 12 bytes + bytes up to the line end (if needed)
2nd part : (36 - 26 bytes) / 2 = 5 bytes + bytes up to the line end (if needed)
last part: the rest of the file
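The arithmetic above can be reproduced with a short sketch in plain Perl. This is only an illustration of the described behaviour, not the module's actual code: part_ranges is a hypothetical helper, it works on an in-memory string rather than a file, and it assumes "\n" as the line ending.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Parts): returns one [start, end)
# pair of byte offsets per part, following the steps described above.
sub part_ranges {
    my ($content, $num) = @_;
    my $size  = length $content;
    my $start = 0;
    my @ranges;
    for my $remaining (reverse 1 .. $num) {
        last if $start >= $size;
        my $end;
        if ($remaining == 1) {
            $end = $size;    # last part: the rest of the content
        }
        else {
            # remaining size divided by the number of remaining parts
            my $chunk = int(($size - $start) / $remaining);
            $chunk = 1 if $chunk < 1;
            my $target = $start + $chunk;
            # extend to the end of the line that holds the chunk's last byte
            my $nl = index($content, "\n", $target - 1);
            $end = $nl < 0 ? $size : $nl + 1;
        }
        push @ranges, [$start, $end];
        $start = $end;
    }
    return @ranges;
}

my $content = "1111\n" . ("2" x 20) . "\n3333\n4444\n";    # 36 bytes
my @ranges  = part_ranges($content, 3);
printf "part %d: bytes %d-%d\n", $_ + 1, @{ $ranges[$_] } for 0 .. $#ranges;
```

For the 36-byte example this yields the offsets 0-26, 26-31 and 31-36, which correspond to the three parts listed above.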
METHODS
new
$s = Text::Parts->new(file => $filename);
$s = Text::Parts->new(file => $filename, parser => Text::CSV_XS->new({binary => 1}));
Constructor. It accepts the following options:
num
The number of parts to split the file into.
size
The approximate size of each part. This value is used to calculate num: if the file size is 100 and this value is 25, num is 4.
file
The target file to split.
parser
A parser object (such as Text::CSV_XS->new()). By default, the object must have a method named getline that takes a filehandle. If the method has a different name, pass that name in the parser_method option.
parser_method
The name of the parser's method. The default is getline.
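As a rough illustration of the contract described for parser and parser_method, the following sketch defines a parser whose method is not called getline. My::TsvParser and parse_line are hypothetical names used only for this example. The sketch calls the method directly on an in-memory filehandle, so it runs without Text::Parts.

```perl
use strict;
use warnings;

package My::TsvParser;    # hypothetical parser class, not part of Text::Parts

sub new { my ($class) = @_; return bless {}, $class }

# Reads one line from the given filehandle and returns an array reference
# of tab-separated fields, or undef at end of input.
sub parse_line {
    my ($self, $fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    return [ split /\t/, $line, -1 ];
}

package main;

my $parser = My::TsvParser->new;
my $method = 'parse_line';    # the name that would be given as parser_method

open my $fh, '<', \"a\tb\tc\n1\t2\t3\n" or die "cannot open in-memory file: $!";
while (my $cols = $parser->$method($fh)) {
    print join(',', @$cols), "\n";
}
close $fh;

# With Text::Parts, such an object would be passed as:
#   Text::Parts->new(file => $file, parser => $parser, parser_method => 'parse_line');
```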
check_line_start
If this option is true, the part checks where the line starts and moves to that position before reading with <$fh> or with the parser's getline/parser_method. This may be useful when the parser's getline/parser_method method does not work correctly on input in the wrong format.
The default value is 0.
no_open
If this option is true, the file is not opened when a Text::Parts::Part object is created. You need to call the object's open_and_seek method before you read the file. (The all and write_file methods check this option themselves, so you do not need to call open_and_seek for them.)
This option is required when you ask the split method for more parts than the OS's open file limit allows.
file
my $file = $s->file;
$s->file($filename);
get/set target file.
parser
my $parser_object = $s->parser;
$s->parser($parser_object);
get/set parser object.
parser_method
my $method = $s->parser_method;
$s->parser_method($method);
get/set parser method.
split
my @parts = $s->split(num => $num);
my @parts = $s->split(size => $size);
my @parts = $s->split(num => $num, max_num => 3);
Tries to split the target file into $num parts. The return value is an array of Text::Parts::Part objects; see "Text::Parts::Part METHODS". If you pass size => bytes, $num is calculated from file size / $size.
This method does not actually split the file; it only calculates the start and end positions of the parts.
If you set max_num, at most max_num parts are returned.
my @parts = $s->split(num => 5, max_num => 2);
This tries to split the file into 5 parts, but only 2 parts are returned. This is useful for testing with a few parts when there are many.
eol
my $eol = $s->eol;
$s->eol($eol);
get/set end of line string. default value is $/.
write_files
@filenames = $s->write_files('path/to/name%d.txt', num => 4);
The first argument, name_format, is the format of the filenames. %d is replaced by a number. For example:
path/to/name1.txt
path/to/name2.txt
path/to/name3.txt
path/to/name4.txt
The remaining arguments are the same as for split, with the addition of the following options.
- code
The code option takes a code reference that is called immediately after each file has been written. If you pass the code option as follows:
@filenames = $s->write_files('path/to/name%d.txt', num => 4, code => \&do_after_split);
the name of the split file is passed to &do_after_split:
sub do_after_split {
    my $filename = shift; # 'path/to/name1.txt'
    # ...
    unlink $filename;
}
- start_number
@filenames = $s->write_files('path/to/name%d.txt', num => 4, start_number => 0);
# $filenames[0] is 'path/to/name0.txt'
This number is used for the first filename.
If start_number is 0:
path/to/name0.txt
path/to/name1.txt
...
If start_number is 1 (the default):
path/to/name1.txt
path/to/name2.txt
...
If start_number is 2:
path/to/name2.txt
path/to/name3.txt
...
- last_number
If last_number is specified, splitting stops when the number reaches last_number. Note that this option overrides max_num.
@filenames = $s->write_files('path/to/name%d.txt', num => 4, start_number => 0, last_number => 1);
# $filenames[0] is 'path/to/name0.txt'
# $filenames[1] is 'path/to/name1.txt'
# $filenames[2] doesn't exist
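The numbering rules for start_number and last_number can be sketched in plain Perl. This is only an illustration: expected_filenames is a hypothetical helper, not part of Text::Parts, and the use of sprintf to expand %d is an assumption about how name_format is applied.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Parts): lists the filenames that
# write_files would be expected to produce for the given options.
sub expected_filenames {
    my ($name_format, %opt) = @_;
    my $start = exists $opt{start_number} ? $opt{start_number} : 1;    # default is 1
    my $last  = $start + $opt{num} - 1;
    # last_number stops the numbering early; it is inclusive
    $last = $opt{last_number}
        if defined $opt{last_number} && $opt{last_number} < $last;
    return map { sprintf $name_format, $_ } $start .. $last;
}

print "$_\n" for expected_filenames(
    'path/to/name%d.txt',
    num => 4, start_number => 0, last_number => 1,
);
```

With these arguments the sketch prints path/to/name0.txt and path/to/name1.txt, matching the last_number example above.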
Text::Parts::Part METHODS
Text::Parts::Part objects are returned by the split method.
getline
my $line = $part->getline;
Returns one line. You can also use <$part>:
my $line = <$part>;
getline_parser
my $parsed = $part->getline_parser;
returns parsed result.
all
my $all = $part->all;
$part->all(\$all);
Returns the whole content of the part, read from its start position to its end position.
If a scalar reference is passed as an argument, the content of the part is stored in the referenced scalar.
This method checks the no_open option. If no_open is true, the file is opened before reading and closed afterwards.
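Conceptually, reading a part means reading a byte range of the original file. The sketch below shows that idea with core Perl only. read_range is a hypothetical helper, not the module's implementation, and the offsets are taken from the DESCRIPTION example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of Text::Parts): returns the bytes in
# [$start, $end) of $file, opening and closing the file around the read.
sub read_range {
    my ($file, $start, $end) = @_;
    open my $fh, '<:raw', $file or die "cannot open $file: $!";
    seek $fh, $start, 0 or die "cannot seek: $!";
    my $buf = '';
    my $len = $end - $start;
    if ($len > 0) {
        my $got = read $fh, $buf, $len;
        die "read failed: $!" unless defined $got;
    }
    close $fh;
    return $buf;
}

# Write the example file from DESCRIPTION to a temporary file.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "1111\n", "2" x 20, "\n3333\n4444\n";
close $tmp or die "cannot close: $!";

print read_range($path, 26, 31);    # the second part of the example
```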
eof
$part->eof;
Returns true if the current position is at the end of the part.
write_file
$part->write_file($filename);
Write the contents of the part to $filename.
This method checks the no_open option. If no_open is true, the file is opened before writing and closed after writing.
open_and_seek
$part->open_and_seek;
If the object was created with no_open set to true, you need to call this method before reading the file.
close
$part->close;
Closes the file handle.
is_opened
$part->is_opened;
Returns true if the file handle is open.
AUTHOR
Ktat, <ktat at cpan.org>
BUGS
Please report any bugs or feature requests to bug-text-parts at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Parts. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Text::Parts
You can also look for information at:
RT: CPAN's request tracker
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
LICENSE AND COPYRIGHT
Copyright 2011 Ktat.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.