NAME
split-data.pl
SYNOPSIS
Splits a given data file into N parts such that each part has approximately the same number of lines.
USAGE
split-data.pl [Options] DATA
Type 'split-data.pl --help' for a quick summary of the Options.
INPUT
Required Arguments:
DATA
DATA should be a plain text file in which each line holds a single training example.
Optional Arguments:
--parts N
Splits the DATA file into N approximately equal parts. If the DATA file has M lines, each part except the last has int(M/N) lines, and the last part has all the remaining lines, M - (N-1) * int(M/N).
Default N is 10.
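The arithmetic behind --parts can be sketched in a few lines of Python. This is only an illustration of the formula stated above, not the code split-data.pl itself runs; the function name part_sizes is hypothetical. Note that when M is smaller than N the formula yields empty leading parts, and this page does not say how the script handles that case.

```python
def part_sizes(m, n=10):
    """Return the line count of each of n parts for an m-line file.

    Hypothetical helper that applies the documented rule:
    every part but the last has int(M/N) lines, the last has the rest.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    base = m // n                  # int(M/N) lines in each of the first N-1 parts
    last = m - (n - 1) * base      # the last part takes all remaining lines
    return [base] * (n - 1) + [last]


print(part_sizes(103))     # [10, 10, 10, 10, 10, 10, 10, 10, 10, 13]
print(part_sizes(7, 7))    # [1, 1, 1, 1, 1, 1, 1]
print(part_sizes(5, 1))    # [5]
```

With M = 103 and the default N = 10, the first nine parts get 10 lines each and the last part gets the remaining 13.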
Other Options:
--help
Displays a quick summary of the options.
--version
Displays the version information.
OUTPUT
split-data.pl creates exactly N files in the current directory. Each file is named after the DATA file with the part number appended: if the DATA file is named DATA-file, the N files are named DATA-file1, DATA-file2, DATA-file3, ..., DATA-fileN. For example, if the DATA filename is ANC, the files created by split-data.pl are named ANC1, ANC2, ..., ANCN.
A DATA file containing a total of M lines is split into N parts such that each part/file contains approximately M/N lines.
Thus, if N = 1, the output file will be exactly the same as the given DATA file. If N = M, where N is the value of --parts and M is the number of lines in DATA, each part will have a single line.
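The output behaviour described above can be mimicked with the following minimal Python sketch. It is an approximation written from this description, not the Perl source: the function name split_file and the out_dir parameter are hypothetical, and using only the base name of DATA for the output names is an assumption.

```python
import os


def split_file(path, n=10, out_dir="."):
    """Split the file at path into n parts named <name>1 .. <name>n.

    Hypothetical sketch: parts 1..n-1 get int(M/N) lines each and
    part n gets every remaining line.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    with open(path) as fh:
        lines = fh.readlines()
    base = len(lines) // n                 # int(M/N)
    stem = os.path.basename(path)          # assumption: directory part is dropped
    names = []
    for i in range(1, n + 1):
        start = (i - 1) * base
        end = i * base if i < n else len(lines)   # last part runs to end of file
        name = os.path.join(out_dir, f"{stem}{i}")
        with open(name, "w") as out:
            out.writelines(lines[start:end])
        names.append(name)
    return names
```

Calling split_file("ANC", n=3) on a 10-line file would write ANC1 and ANC2 with 3 lines each and ANC3 with the remaining 4; with n=1 the single output file ANC1 is a copy of the input.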
AUTHOR
Amruta Purandare, Ted Pedersen. University of Minnesota, Duluth.
COPYRIGHT
Copyright (c) 2004,
Amruta Purandare, University of Minnesota, Duluth. pura0010@umn.edu
Ted Pedersen, University of Minnesota, Duluth. tpederse@umn.edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.