NAME
linkBags - read a file of links and tokens for a document set, plus tables generated by linkTables, and produce forward bags.
SYNOPSIS
linkBags [--linktext|--nocase|--noclean|--titletext|--update] LINK-FILE STEM
Options:
LINK-FILE Filename for input link file usually created by XSL.
STEM Stem for output files; files with several extensions are read and written.
--linktext add link text to the bag
--nocase set nocase flag in Alvis::URLs
--noclean set noclean flag in Alvis::URLs
--titletext add title text to the bag
--update record in the output config file that this is an update
-h, --help display help message and exit.
--man print man page and exit.
DESCRIPTION
This package works in conjunction with an XSL script that generates a text file giving URL, title, link and tag information for the input XML files. Use the name '-' to read the input from stdin. The final output files are created when the MPCA mpdata(1) utility is called.
The input file of links and tags is assumed to be in UTF-8 encoding, in the format given in linkTables(1). Separate tables (STEM.words, STEM.docs, STEM.docmap) must already have been made by running linkTables(1). The linktext, nocase, noclean and titletext flags should be the same as those used with linkTables(1).
If redirects exist in the links file, process them out first using the linkRedir(1) script. The code assumes that the collection is small enough for all required hash tables to fit into memory.
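Because the three tables must exist before linkBags is run, a small pre-flight check can catch a missing linkTables(1) step early. The following is a minimal sketch only: the function name check_tables, the stem mycoll and the file name links.txt are hypothetical and are not part of this package.

```shell
#!/bin/sh
# Hypothetical helper (not part of the package): verify that the tables
# STEM.words, STEM.docs and STEM.docmap exist for a given stem.
check_tables() {
    stem=$1
    missing=0
    for ext in words docs docmap; do
        if [ ! -f "$stem.$ext" ]; then
            echo "missing table: $stem.$ext" >&2
            missing=1
        fi
    done
    # Return 0 when all three tables are present, 1 otherwise.
    return $missing
}

# Possible use, with an assumed stem and link file:
#   check_tables mycoll && linkBags --linktext --titletext links.txt mycoll
```

The check only tests that the files exist; it cannot tell whether the tables were built with the same flags.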
Output files have the form STEM.EXT, for EXT one of:
.txtbag[.gz] : constructed input text bag for mpdata
.bag, .cnf, .iinx, .par : usual output of mpdata
See mpdata(1) for details of these formats.
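To see which of these files are present for a stem after a run, something like the following sketch may help. The function name list_outputs is hypothetical; the extension list is taken from the description above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the package): print the output files
# that currently exist for a stem, one per line.
list_outputs() {
    stem=$1
    for ext in txtbag txtbag.gz bag cnf iinx par; do
        if [ -f "$stem.$ext" ]; then
            echo "$stem.$ext"
        fi
    done
}
```

If only the .txtbag or .txtbag.gz file is listed, the mpdata(1) step has presumably not been run yet.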
SEE ALSO
Alvis::URLs(3), linkMpca(1), linkRedir(1), linkTables(1), mpdata(1).
MPCA website is http://www.componentanalysis.org
AUTHOR
Wray Buntine
COPYRIGHT AND LICENSE
Copyright (C) 2005-2006 Wray Buntine
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.