TITLE
sh2odt - convert Shoebox/Toolbox to OpenOffice .odt file
SYNOPSIS
sh2odt [-s settings_dir] [-c codepage] [-e encs] [-m] infile [outfile]
Converts Shoebox data to OpenOffice format
OPTIONS
-c codepage Set system codepage for this process
-e enc,enc Add Encode:: subsets in Perl 5.8.1
-m MDF character marker support
-s dir Directory to find .typ files in [.]
If outfile is omitted, the output file name is the input file name with its extension replaced by .odt. This lets a user drop a data file onto a shortcut.
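The default output name rule can be illustrated with a short Python sketch. This is not the code sh2odt uses; the helper name default_outfile is hypothetical, and the behaviour for an input file with no extension (here .odt is simply appended) is an assumption.

```python
from pathlib import Path


def default_outfile(infile: str) -> str:
    """Hypothetical helper: swap the input file's extension for .odt."""
    # with_suffix replaces only the last extension; if there is none,
    # it appends .odt (assumed behaviour, not confirmed for sh2odt).
    return str(Path(infile).with_suffix(".odt"))


if __name__ == "__main__":
    print(default_outfile("lexicon.db"))
```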
DESCRIPTION
sh2odt converts a Shoebox/Toolbox file into an OpenOffice .odt file. To do this it needs to convert data to Unicode. It also converts interlinear text into character level frames whereby each frame contains a single interlinear block and is treated by the system as if it were a character. It can then be copied and pasted into tables, reflowed like normal text, etc.
Using sh2odt involves two aspects: preparing for conversion, which means supplying information about encoding conversion and even XML template output; and running the program, which means knowing what each command line option does. This manual is not a tutorial, so all the details are listed with little or no indication of relative priority.
Running sh2odt
Here we list the various command line options and give further details on each.
- -c
-
Specifies the default codepage to be used when converting data. In effect it specifies that sh2odt should act as though it were running on a system with the given default codepage. This means that data in languages with no given encoding conversion will be converted using this codepage.
- -e
-
Perl has internal support for a large number of industry standard encodings. This option specifies which sets to pull in apart from the default set. Values include:
Byte - standard ISO 8859 type single byte encodings
CN - Continental China encodings including cp 936, GB 12345 and GB 2312
JP - Japanese encodings including cp 932 and ISO 2022
KR - Korean encodings including cp 949
TW - Taiwanese encodings including cp 950
HanExtra - more Chinese encodings including GB 18030
JIS2K - more Japanese encodings
Ebcdic - surely not!
Symbols - various symbol encodings
See man Encode::Supported or the corresponding module documentation for details of what is supported on your Perl installation.
- -m
-
MDF and perhaps other schemas support the ability to use inline markers of the form |mk{text}. sh2odt has the ability to work with these schemes. Data marked in such a way is output with a character style of the given marker's name.
- -s
-
sh2odt requires access to information about the structure of the database and language information. This is held in files in the same directory as the .prj project file used when running Shoebox/Toolbox.
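The inline marker form described under -m can be sketched in Python as follows. This is an illustration only, not sh2odt's parser: the function name split_inline is hypothetical, the marker name is assumed to consist of word characters, and nested markers are not handled.

```python
import re

# Assumed shape of an inline marker: |mk{text}, where mk is the marker name.
INLINE = re.compile(r"\|(\w+)\{([^}]*)\}")


def split_inline(text):
    """Split text into (style, text) runs; style is None for unmarked text."""
    runs = []
    pos = 0
    for m in INLINE.finditer(text):
        if m.start() > pos:
            # Plain text before this marker.
            runs.append((None, text[pos:m.start()]))
        # The marker name doubles as the character style name.
        runs.append((m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(text):
        # Plain text after the last marker.
        runs.append((None, text[pos:]))
    return runs


if __name__ == "__main__":
    for style, run in split_inline("see |fv{kata} here"):
        print(style, repr(run))
```

Each run carrying a style name corresponds to text that would be output with a character style named after the marker.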
Preparing for Conversion
The basic need is to be able to specify how to convert text in a particular language into Unicode. This can be done by specifying a conversion mapping in each language file. Shoebox and Toolbox do not have a UI for specifying such conversion information, so we add information to the options/description field. The codepage specification takes the form:
\codepage = value
The specification needs to be on a line on its own. The value can take a number of forms.
- name
-
A mapping name either from the set of names supported by the Perl Encode module, or specified in an SIL Converters repository.
- filename.tec
-
The path and filename of a TECkit binary mapping file. The path is relative to the settings directory.
- none
-
No mapping should be done. The data is assumed to be in UTF-8 encoding.
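The three forms of the value can be told apart with a sketch like the one below. It is a Python illustration under stated assumptions, not the logic sh2odt itself uses: the function name codepage_spec and the returned labels are hypothetical, and a value is treated as a TECkit file simply because it ends in .tec.

```python
import re

# The specification must be on a line of its own, so anchor at both ends.
CODEPAGE_LINE = re.compile(
    r"^\\codepage[ \t]*=[ \t]*(\S.*?)[ \t]*$", re.MULTILINE
)


def codepage_spec(description):
    """Return (kind, value) for a \\codepage line, or None if there is none."""
    m = CODEPAGE_LINE.search(description)
    if m is None:
        return None
    value = m.group(1)
    if value.lower() == "none":
        # No mapping: data is assumed to be UTF-8 already.
        return ("none", None)
    if value.lower().endswith(".tec"):
        # Path to a TECkit binary mapping, relative to the settings directory.
        return ("teckit", value)
    # Otherwise treat it as an encoding or mapping name.
    return ("name", value)


if __name__ == "__main__":
    print(codepage_spec("Vernacular language\n\\codepage = cp1252\n"))
```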
sh2odt creates a style for each marker and outputs the font used for each marker. If the data has been converted, the original font is no longer appropriate to the new encoding. An appropriate font can be specified in the description field using
\unicode_font = value
where value is the font name to be used for the Unicode form of the data.