NAME

linkBags - read a file of links and tokens for a document set, plus tables generated by linkTables, and produce forward bags.

SYNOPSIS

linkBags [--linktext|--nocase|--noclean|--titletext|--update] LINK-FILE STEM

Options:

LINK-FILE           Filename of the input link file, usually created by XSL.
STEM                Stem for output files; several extensions are read and written.
--linktext          add link text to the bag
--nocase            set nocase flag in Alvis::URLs
--noclean           set noclean flag in Alvis::URLs
--titletext         add title text to the bag
--update            indicate to the output config file that this is an update
-h, --help          display help message and exit.
--man               print man page and exit.
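
As an illustration of the synopsis above, the following Python sketch assembles a linkBags command line from the documented flags. The helper name build_linkbags_cmd and the file names in the usage note are hypothetical, and the sketch assumes the flags may be combined freely; it only builds the argument list and does not run anything.

```python
def build_linkbags_cmd(link_file, stem, *, linktext=False, nocase=False,
                       noclean=False, titletext=False, update=False):
    """Return an argv list for linkBags (hypothetical helper, runs nothing)."""
    # Flags taken from the SYNOPSIS; each is emitted only when enabled.
    flags = {
        "--linktext": linktext,
        "--nocase": nocase,
        "--noclean": noclean,
        "--titletext": titletext,
        "--update": update,
    }
    cmd = ["linkBags"]
    cmd += [name for name, enabled in flags.items() if enabled]
    # Positional arguments come last: LINK-FILE, then STEM.
    cmd += [link_file, stem]
    return cmd
```

For example, build_linkbags_cmd("links.txt", "run1", linktext=True) yields a list that could be handed to subprocess.run on a system where linkBags is installed.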

DESCRIPTION

This package works in conjunction with an XSL script which is used to generate a text file giving URL+title+link+tag information for the input XML files. Use the name '-' to read the link file from stdin. The final output files are created when the MPCA mpdata(1) utility is called.

The input file of links and tags is assumed to be UTF-8 encoded, in the format given in linkTables(1). The separate tables (STEM.words, STEM.docs, STEM.docmap) must already have been made by running linkTables(1). The linktext, nocase, noclean and titletext flags should be the same as those used with linkTables(1).
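
Because linkBags depends on tables made earlier, a wrapper script might check for them before starting. The Python sketch below is one way to do that; the helper name missing_tables is hypothetical, and it assumes the tables sit at exactly STEM.words, STEM.docs and STEM.docmap relative to the working directory.

```python
import os

# Table extensions named in the DESCRIPTION; produced by linkTables(1).
TABLE_EXTENSIONS = (".words", ".docs", ".docmap")


def missing_tables(stem, exists=os.path.exists):
    """Return the table files for `stem` that are not present.

    `exists` is injectable so the check can be exercised without
    touching the file system (an assumption of this sketch).
    """
    return [stem + ext for ext in TABLE_EXTENSIONS if not exists(stem + ext)]
```

An empty result means all three tables were found; otherwise the returned names show which linkTables outputs still need to be produced.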

If redirects exist in the links file, resolve them first using the linkRedir(1) script. The code assumes that the collection is small enough for all required hash tables to fit into memory.

Output files have the form STEM.EXT, for EXT one of:

.txtbag[.gz] : constructed input text bag for mpdata
.bag, .cnf, .iinx, .par : usual output of mpdata

See mpdata(1) for details of these formats.
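
The STEM.EXT naming described above can be spelled out with a short Python sketch. The helper name expected_outputs and its gzipped parameter are hypothetical; the sketch only lists the names implied by the extensions above and makes no claim about when the text bag is actually compressed.

```python
def expected_outputs(stem, gzipped=False):
    """List the output file names implied by STEM.EXT (hypothetical helper)."""
    # The text bag may carry an optional .gz suffix, per ".txtbag[.gz]".
    txtbag = ".txtbag.gz" if gzipped else ".txtbag"
    # The remaining extensions are the usual mpdata outputs.
    return [stem + ext for ext in (txtbag, ".bag", ".cnf", ".iinx", ".par")]
```

A caller could compare this list against the directory contents after a run to confirm that every expected file was written.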

SEE ALSO

Alvis::URLs(3), linkMpca(1), linkRedir(1), linkTables(1), mpdata(1).

MPCA website is http://www.componentanalysis.org

AUTHOR

Wray Buntine

COPYRIGHT AND LICENSE

Copyright (C) 2005-2006 Wray Buntine

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.