NAME

wikipedia2alvis.pl - Wikipedia XML dump to Alvis XML converter

SYNOPSIS

  wikipedia2alvis.pl [options] [Wikipedia XML dump file]

Options:

  --out-dir                      output directory
  --namespaces                   list of namespaces to extract
  --N-per-out-dir                number of records per output directory
  --[no-]original                include the original document
  --[no-]expand-templates-fully  try to expand templates fully
  --[no-]dump-templates          dump the templates to disk
  --template-dump-file           the file to dump the templates to
  --[no-]convert-via-html        convert via HTML instead of directly to Alvis
  --date                         the date of the Wikipedia dump
  --[no-]dump-category-graph     dump the category graph to disk
  --category-graph-dump-file     the file to dump the category graph to
  --category-word                category namespace identifier
  --root-category                root category identifier
  --template-word                template namespace identifier
  --language                     the language of the Wikipedia dump
  --help                         brief help message
  --man                          full documentation
  --[no]warnings                 warnings output flag
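
Example:

The following is a minimal sketch of how an invocation might be put together, not a tested recipe. The dump file name and output directory are hypothetical, the script is assumed to be on the PATH, and each option is assumed to take its value as a separate argument. The command is printed rather than executed so it can be checked first.

```shell
# Hypothetical file and directory names; adjust to your setup.
dump="enwiki-pages-articles.xml"
out_dir="alvis-out"

# Set the option list as positional parameters (POSIX sh, no arrays needed).
set -- \
  --out-dir "$out_dir" \
  --namespaces "Category,Template" \
  --N-per-out-dir 500 \
  --no-original \
  --language en \
  --date 20080101

# Assemble the command line and print it for inspection.
cmd="wikipedia2alvis.pl $* $dump"
echo "$cmd"
```

To actually run the conversion, replace the last two lines with: wikipedia2alvis.pl "$@" "$dump"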
  

OPTIONS

--out-dir
    Sets the output directory. Default value: '.'.

--namespaces
    Sets the namespaces whose records are extracted, given as a
    comma-separated list. The namespace names must be the exact
    identifiers. Articles are always extracted. Default value: ''
    (empty), i.e. articles only.

--N-per-out-dir
    Sets the number of records per output directory. Default value:
    1000.

--[no-]original
    Include the original document in the output. Default value: no.

--[no-]expand-templates-fully
    Try to expand templates fully instead of simply inserting a list
    of the template parameter values given in the call. Default
    value: no.

--[no-]dump-templates
    Dump the templates to disk in a loadable format. Default value: no.

--template-dump-file
    The name of the template dump file, if one is written. Default
    value: 'Templates.storable'.

--[no-]convert-via-html
    Convert from Wikitext to Alvis XML via an intermediate HTML
    version instead of directly. This is slower but may give better
    output. Default value: yes.

--language
    The language of the Wikipedia dump. Affects category and template
    extraction. Possible values: 'en' (English), 'fr' (French), 'sl'
    (Slovenian). Default value: 'en'.

--category-word
    The identifier for the category namespace. Overridden by
    '--language'. Default value: 'Category'.

--root-category
    The identifier for the root category of the category graph.
    Overridden by '--language'. Default value: 'fundamental'.

--template-word
    The identifier for the template namespace. Overridden by
    '--language'. Default value: 'Template'.

--date
    The date of the Wikipedia dump as YYYYMMDD. Default value:
    undefined, meaning the current date is used.

--[no-]dump-category-graph
    Dump the category graph to disk in a loadable format. Default
    value: yes.

--category-graph-dump-file
    The name of the category graph dump file, if one is written.
    Default value: 'CategoryGraph.storable'.

--help
    Prints a brief help message and exits.

--man
    Prints the manual page and exits.

--[no]warnings
    Output (or suppress) warnings. Default value: yes.
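
Two of the defaults above can be illustrated with a short sketch. The helper names dirs_needed and dump_date are hypothetical and not part of the script. The directory count assumes that each output directory is filled up to the '--N-per-out-dir' limit before the next one is started, which the text above suggests but does not state outright.

```shell
# Hypothetical helper: number of output directories needed for a given
# record count, assuming each directory is filled up to the limit
# before the next one is started.
dirs_needed() {
  records=$1
  per_dir=${2:-1000}   # documented default of --N-per-out-dir
  echo $(( (records + per_dir - 1) / per_dir ))
}

# Hypothetical helper: the date to record for the dump, falling back to
# the current date when none is given, as --date is documented to do.
dump_date() {
  if [ -n "${1:-}" ]; then
    echo "$1"
  else
    date +%Y%m%d
  fi
}

dirs_needed 2500        # prints 3
dirs_needed 2500 500    # prints 5
dump_date 20080101      # prints 20080101
dump_date               # prints today's date as YYYYMMDD
```

With the default limit of 1000, a dump of 2500 records would under this assumption be spread over three directories, the last one only partly filled.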

DESCRIPTION

Converts the articles in the Wikipedia XML dump to Alvis records.