NAME

LaTeX::ToUnicode::Tables - Character tables for LaTeX::ToUnicode

VERSION

version 0.53

CONSTANTS

@LIGATURES

Standard TeX character sequences (not \commands) which need to be replaced: --- with U+2014 (em dash), etc. Includes: em dash, en dash, inverted exclamation, inverted question, left double quote, right double quote, left single quote, right single quote. They are replaced in that order, which matters because the longer sequences contain the shorter ones: --- must be matched before --, and `` before `.
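A minimal sketch of ordered replacement, to show why the order matters. The table @ligatures_demo and the function replace_ligatures_demo are hypothetical names; the layout of the real @LIGATURES (assumed here to be a flat list of sequence/replacement pairs) is an assumption, not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @LIGATURES: a flat list of (sequence, replacement)
# pairs, in the order given in the description above.
my @ligatures_demo = (
    '---' => "\x{2014}",   # em dash
    '--'  => "\x{2013}",   # en dash
    '!`'  => "\x{00A1}",   # inverted exclamation mark
    '?`'  => "\x{00BF}",   # inverted question mark
    '``'  => "\x{201C}",   # left double quote
    "''"  => "\x{201D}",   # right double quote
    '`'   => "\x{2018}",   # left single quote
    "'"   => "\x{2019}",   # right single quote
);

# Apply each pair in list order, so longer sequences win over their prefixes.
sub replace_ligatures_demo {
    my ($text) = @_;
    for (my $i = 0; $i < @ligatures_demo; $i += 2) {
        my ($from, $to) = @ligatures_demo[$i, $i + 1];
        $text =~ s/\Q$from\E/$to/g;
    }
    return $text;
}
```

If the en dash pair came first, a---b would become an en dash followed by a stray hyphen instead of a single em dash.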

%MARKUPS

Hash where keys are the names of formatting commands like \tt, without the backslash, namely: bf cal em it rm sc sf sl small tt. Values are the obvious HTML equivalent where one exists, given as the tag name without the angle brackets: b em i tt. Otherwise the value is the empty string.
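As an illustration of how such a table could be used, the sketch below converts the {\cmd text} form to HTML. The names %markups_demo, markup_to_html_demo and render_markup_demo are hypothetical, and the pairing of commands with tags (bf with b, em with em, it with i, tt with tt, everything else empty) is an assumed reading of the description above, not copied from the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %MARKUPS: command name => HTML tag name or ''.
my %markups_demo = (
    bf  => 'b', em => 'em', it => 'i', tt    => 'tt',
    cal => '',  rm => '',   sc => '',  sf    => '',
    sl  => '',  small => '',
);

# Render one {\cmd body} group; unknown commands are returned unchanged.
sub render_markup_demo {
    my ($cmd, $body, $original) = @_;
    return $original unless exists $markups_demo{$cmd};
    my $tag = $markups_demo{$cmd};
    return $tag eq '' ? $body : "<$tag>$body</$tag>";
}

# Handles only the {\cmd text} form without nested braces.
sub markup_to_html_demo {
    my ($text) = @_;
    $text =~ s/(\{\\([a-z]+)\s+([^{}]*)\})/render_markup_demo($2, $3, $1)/ge;
    return $text;
}
```

With this table, {\bf bold} becomes <b>bold</b>, while {\sc caps} loses its markup and becomes plain caps.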

%ARGUMENT_COMMANDS

Hash where keys are the names of TeX commands taking arguments that we handle, without the backslash, such as enquote. Each value is a reference to a list of two strings, the first being the text to insert before the argument, the second being the text to insert after. For example, for enquote the value is ["`", "'"]. The inserted text is subject to further replacements.

Only three such commands are currently handled: \emph, \enquote, and \path.
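The before/after pair can be applied as in the following sketch. %argument_commands_demo and expand_argument_commands_demo are hypothetical names; only the enquote entry is taken from the text above, and the simple brace matching is an assumption made for brevity.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %ARGUMENT_COMMANDS: name => [before, after].
my %argument_commands_demo = (
    enquote => ["`", "'"],
);

# Replace \cmd{arg} with before . arg . after for every known command.
sub expand_argument_commands_demo {
    my ($text) = @_;
    for my $cmd (sort keys %argument_commands_demo) {
        my ($before, $after) = @{ $argument_commands_demo{$cmd} };
        # Matches arguments without nested braces only.
        $text =~ s/\\\Q$cmd\E\s*\{([^{}]*)\}/$before$1$after/g;
    }
    return $text;
}
```

Because the inserted ` and ' are themselves TeX ligature sequences, a later ligature pass would turn them into typographic quotes, which is what "subject to further replacements" implies.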

%CONTROL_SYMBOLS

A hash where the keys are non-alphabetic \commands (without the backslash), other than accents and special cases. These don't take arguments. Although some of these have Unicode equivalents, such as \, (thin space), it seems better to keep the output as simple as possible; spacing tweaks in the TeX aren't usually desirable in plain text or HTML.

The values are single-quoted strings '\x{...}', not double-quoted literal characters "\x{...}", to ease future parsing of the TeX/text/HTML.

This hash is necessary because TeX's parsing rules for control symbols are different from control words: no space or other token is needed to terminate control symbols.
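The parsing difference can be sketched with two separate patterns. The tables %control_symbols_demo and %control_words_demo and their entries are hypothetical examples, not the module's data; the point is only that a control symbol is exactly one non-letter, while a control word runs to the first non-letter and swallows following spaces.

```perl
use strict;
use warnings;

# Hypothetical miniature tables; the entries are illustrative assumptions.
my %control_symbols_demo = (',' => '', '%' => '%', '&' => '&');
my %control_words_demo   = (LaTeX => 'LaTeX', TeX => 'TeX');

sub replace_controls_demo {
    my ($text) = @_;

    # Control symbol: backslash plus exactly one non-letter.
    # Nothing is needed to terminate it, so \,b is \, followed by b.
    $text =~ s/\\([^A-Za-z])/
        exists $control_symbols_demo{$1} ? $control_symbols_demo{$1} : "\\$1"/gex;

    # Control word: backslash plus letters, ended by the first non-letter.
    # TeX skips spaces after a control word, so they are dropped on a match.
    $text =~ s/\\([A-Za-z]+)(\s*)/
        exists $control_words_demo{$1} ? $control_words_demo{$1} : "\\$1$2"/gex;

    return $text;
}
```

Unknown commands are left in place, including any spaces after them, so this sketch only changes text it has a table entry for.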

%CONTROL_WORDS

Keys are names of argument-less commands, such as \LaTeX (without the backslash). Values are the replacements, often the empty string.

%SYMBOLS

Keys are the commands for extended characters, such as \AA (without the backslash).

%ACCENT_SYMBOLS

Two-level hash of accented characters like \'{a}. The keys of this hash are the accent symbols (without the backslash), such as ` and '. The corresponding values are hash references where the keys are the base letters and the values are single-quoted '\x{....}' strings.

%ACCENT_LETTERS

Same as %ACCENT_SYMBOLS, except the keys are accents that are alphabetic, such as \c (without the backslash as always).

As with control sequences, it's necessary to distinguish symbols and alphabetic commands because of the different parsing rules.
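A sketch of the two-level lookup, with separate patterns for symbol and letter accents. All names here (%accent_symbols_demo, %accent_letters_demo, decode_hex_demo, lookup_accent_demo, replace_accents_demo) are hypothetical, the table entries are a small illustrative subset, and the patterns are simplified assumptions about which input forms to accept.

```perl
use strict;
use warnings;

# Hypothetical two-level tables: accent => { base letter => '\x{....}' }.
my %accent_symbols_demo = (
    "'" => { a => '\x{00E1}', e => '\x{00E9}' },
    '`' => { a => '\x{00E0}', e => '\x{00E8}' },
);
my %accent_letters_demo = (
    c => { c => '\x{00E7}' },
    v => { s => '\x{0161}' },
);

# Turn a single-quoted '\x{....}' string into the character it names.
sub decode_hex_demo {
    my ($s) = @_;
    $s =~ s/\\x\{([0-9A-Fa-f]+)\}/chr(hex($1))/ge;
    return $s;
}

# Look up accent, then base letter; fall back to the original text.
sub lookup_accent_demo {
    my ($table, $accent, $letter, $original) = @_;
    my $row = $table->{$accent} or return $original;
    return defined $row->{$letter} ? decode_hex_demo($row->{$letter}) : $original;
}

# Symbol accents: the letter may follow directly (\'a) or in braces (\'{a}).
my $symbol_re = qr/\\([^A-Za-z])(?:\{([A-Za-z])\}|([A-Za-z]))/;
# Letter accents: a brace or a space must separate accent and letter (\c{c}, \c c).
my $letter_re = qr/\\([A-Za-z])(?:\s*\{([A-Za-z])\}|\s+([A-Za-z]))/;

sub replace_accents_demo {
    my ($text) = @_;
    $text =~ s/($symbol_re)/
        lookup_accent_demo(\%accent_symbols_demo, $2, defined $3 ? $3 : $4, $1)/gex;
    $text =~ s/($letter_re)/
        lookup_accent_demo(\%accent_letters_demo, $2, defined $3 ? $3 : $4, $1)/gex;
    return $text;
}
```

The letter pattern cannot accept \cc, since TeX would read that as the control word \cc; this is the parsing difference that motivates keeping the two tables apart.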

%GERMAN

Character sequences (not necessarily commands) as defined by the package `german'/`ngerman', e.g. "a (a with umlaut), "s (German sharp s) or "` (German left quote). Note the missing backslash.

The keys of this hash are the literal character sequences.
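Since the keys are literal sequences rather than commands, they can be matched directly, as in this sketch. %german_demo and replace_german_demo are hypothetical names, the entries are a small illustrative subset, and storing the values as actual characters (rather than '\x{...}' strings) is an assumption made to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %GERMAN: literal sequence => replacement character.
my %german_demo = (
    '"a' => "\x{00E4}",   # a with umlaut
    '"o' => "\x{00F6}",   # o with umlaut
    '"s' => "\x{00DF}",   # sharp s
    '"`' => "\x{201E}",   # German left (low) double quote
);

sub replace_german_demo {
    my ($text) = @_;
    # Quote each key so it is matched literally; try longer keys first.
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b }
        keys %german_demo;
    $text =~ s/($alternation)/$german_demo{$1}/g;
    return $text;
}
```

No backslash appears in the pattern, which matches the note above: these sequences start with a double quote, not with a TeX escape character.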

AUTHOR

Gerhard Gossen <gerhard.gossen@googlemail.com>, Boris Veytsman <boris@varphi.com>, Karl Berry <karl@freefriends.org>

https://github.com/borisveytsman/bibtexperllibs

COPYRIGHT AND LICENSE

Copyright 2010-2023 Gerhard Gossen, Boris Veytsman, Karl Berry

This is free software; you can redistribute it and/or modify it under the same terms as the Perl5 programming language system itself.