NAME
LaTeX::ToUnicode::Tables - Character tables for LaTeX::ToUnicode
VERSION
version 0.54
CONSTANTS
@LIGATURES
Standard TeX character sequences (not \commands) that need to be replaced, for example --- with U+2014 (em dash). The list covers, in this order: em dash, en dash, inverted exclamation mark, inverted question mark, left double quote, right double quote, left single quote, right single quote. The replacements are applied in that order, so longer sequences such as --- are handled before their shorter prefixes.
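The ordering requirement can be illustrated with a minimal sketch. The table below is a hypothetical stand-in built from the description above; the pair-per-entry layout and the name replace_ligatures are assumptions for illustration, not the module's actual data structure or API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @LIGATURES: ordered [sequence, replacement] pairs.
# The real table's layout may differ; only the ordering idea is shown here.
my @ligatures = (
    [ '---', "\x{2014}" ],   # em dash
    [ '--',  "\x{2013}" ],   # en dash
    [ '!`',  "\x{00A1}" ],   # inverted exclamation mark
    [ '?`',  "\x{00BF}" ],   # inverted question mark
    [ '``',  "\x{201C}" ],   # left double quote
    [ "''",  "\x{201D}" ],   # right double quote
    [ '`',   "\x{2018}" ],   # left single quote
    [ "'",   "\x{2019}" ],   # right single quote
);

# Apply every replacement in table order, so that '---' is consumed
# before '--' and '``' before a lone '`'.
sub replace_ligatures {
    my ($text) = @_;
    for my $pair (@ligatures) {
        my ($seq, $repl) = @$pair;
        $text =~ s/\Q$seq\E/$repl/g;
    }
    return $text;
}
```

If the en dash entry came first, --- would turn into an en dash followed by a stray hyphen, which is why the order matters.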
%MARKUPS
Hash where keys are the names of formatting commands like \tt, without the backslash, namely: bf cal em it rm sc sf sl small tt. Values are the obvious HTML equivalent where one exists, given as the tag name without the angle brackets: b em i tt. Otherwise the value is the empty string.
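As a rough sketch of how such a table might be consumed, the following turns a tag name into an opening and closing HTML tag. The pairing of commands to tags (bf to b, em to em, it to i, tt to tt) is a reading of "obvious HTML equivalent" above, and markup_tags is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical copy of the %MARKUPS shape described above:
# command name (no backslash) => HTML tag name, or '' when there is none.
my %markups = (
    bf  => 'b', em => 'em', it => 'i', tt => 'tt',
    cal => '',  rm => '',   sc => '',  sf => '', sl => '', small => '',
);

# Return (open_tag, close_tag) for a known command, two empty strings
# when the command has no HTML equivalent, or an empty list if unknown.
sub markup_tags {
    my ($command) = @_;
    return unless exists $markups{$command};
    my $tag = $markups{$command};
    return $tag eq '' ? ('', '') : ("<$tag>", "</$tag>");
}
```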
%ARGUMENT_COMMANDS
Hash where keys are the names of TeX commands taking arguments that we handle, without the backslash, such as enquote. Each value is a reference to a list of two strings, the first being the text to insert before the argument, the second being the text to insert after. For example, for enquote the value is ["`", "'"]. The inserted text is subject to further replacements.
Only three such commands are currently handled: \emph, \enquote, and \path.
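A minimal sketch of the before/after wrapping, using only the enquote entry quoted above. The function name wrap_argument_commands and the simple brace matching (no nested braces) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical table in the %ARGUMENT_COMMANDS shape:
# command name => [ text_before, text_after ].
my %argument_commands = (
    enquote => [ '`', "'" ],
);

# Replace \name{arg} with before . arg . after.
# Only a flat, non-nested {...} argument is handled in this sketch.
sub wrap_argument_commands {
    my ($text) = @_;
    for my $name (sort keys %argument_commands) {
        my ($before, $after) = @{ $argument_commands{$name} };
        $text =~ s/\\\Q$name\E\s*\{([^{}]*)\}/$before$1$after/g;
    }
    return $text;
}
```

Because the inserted ` and ' are themselves ligature sequences, a later ligature pass would turn them into typographic quotes, which is what "subject to further replacements" implies.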
%CONTROL_SYMBOLS
A hash where the keys are non-alphabetic \commands (without the backslash), other than accents and special cases. These don't take arguments. Although some of these have Unicode equivalents, such as the \, thin space, it seems better to keep the output as simple as possible; small spacing tweaks in TeX aren't usually desirable in plain text or HTML.
The values are single-quoted strings '\x{...}', not double-quoted literal characters "\x{...}", to ease future parsing of the TeX/text/HTML.
This hash is necessary because TeX's parsing rules for control symbols are different from control words: no space or other token is needed to terminate control symbols.
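The two points above (control symbols end after one character, and values stay as literal '\x{...}' text until decoded) can be sketched as follows. The table entries and both function names are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical entries in the %CONTROL_SYMBOLS shape. The values are
# single-quoted, so they hold the literal text \x{....}, not a character.
my %control_symbols = (
    '%' => '\x{0025}',
    '&' => '\x{0026}',
);

# A control symbol is a backslash plus exactly one non-letter; nothing is
# needed to terminate it, so the text right after it is left alone.
sub replace_control_symbols {
    my ($text) = @_;
    $text =~ s{\\([^A-Za-z])}{
        exists $control_symbols{$1} ? $control_symbols{$1} : "\\$1"
    }ge;
    return $text;
}

# A separate, later step turns the literal \x{....} text into characters.
sub decode_escapes {
    my ($text) = @_;
    $text =~ s/\\x\{([0-9A-Fa-f]+)\}/chr(hex($1))/ge;
    return $text;
}
```

A control word such as \LaTeX would instead need a pattern like \\([A-Za-z]+) that stops at the first non-letter, which is why the two kinds of commands are kept in separate tables.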
%CONTROL_WORDS
Keys are names of argument-less commands, such as \LaTeX (without the backslash). Values are the replacements, often the empty string.
%SYMBOLS
Keys are the commands for extended characters, such as \AA (without the backslash).
%ACCENT_SYMBOLS
Two-level hash of accented characters like \'{a}. The keys of this hash are the accent symbols (without the backslash), such as ` and '. The corresponding values are hash references where the keys are the base letters and the values are single-quoted '\x{....}' strings.
%ACCENT_LETTERS
Same as %ACCENT_SYMBOLS, except the keys are accents that are alphabetic, such as \c (without the backslash as always).
As with control sequences, it's necessary to distinguish symbols and alphabetic commands because of the different parsing rules.
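The two-level layout shared by both accent tables can be sketched like this. The entries are a small illustrative subset and lookup_accent is a hypothetical helper; the real tables are much larger and may be accessed differently.

```perl
use strict;
use warnings;

# Hypothetical excerpts in the shape described above:
# accent (no backslash) => { base letter => single-quoted '\x{....}' string }.
my %accent_symbols = (
    "'" => { a => '\x{00E1}', e => '\x{00E9}' },   # acute
    '`' => { a => '\x{00E0}', e => '\x{00E8}' },   # grave
);
my %accent_letters = (
    c => { c => '\x{00E7}' },                      # cedilla
);

# Pick the table by whether the accent is a letter, then look up the base.
# Returns undef (in scalar context) when the accent or base is unknown.
sub lookup_accent {
    my ($accent, $base) = @_;
    my $table = $accent =~ /^[A-Za-z]$/ ? \%accent_letters : \%accent_symbols;
    return unless exists $table->{$accent};
    return $table->{$accent}{$base};
}
```

For instance, \'{a} corresponds to lookup_accent("'", 'a') and \c{c} to lookup_accent('c', 'c') in this sketch.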
%GERMAN
Character sequences (not necessarily commands) as defined by the package `german'/`ngerman', e.g. "a (a with umlaut), "s (German sharp s) or "`" (German left quote). Note the missing backslash.
The keys of this hash are the literal character sequences.
AUTHOR
Gerhard Gossen <gerhard.gossen@googlemail.com>, Boris Veytsman <boris@varphi.com>, Karl Berry <karl@freefriends.org>
https://github.com/borisveytsman/bibtexperllibs
COPYRIGHT AND LICENSE
Copyright 2010-2024 Gerhard Gossen, Boris Veytsman, Karl Berry
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.