NAME
HTML::FromText - mark up text as HTML
SYNOPSIS
use HTML::FromText;
print text2html($text, urls => 1, paras => 1, headings => 1);
DESCRIPTION
The text2html function marks up plain text as HTML. By default it expands tabs and converts HTML metacharacters into the corresponding entities. More complicated transformations, such as splitting the text into paragraphs or marking up bulleted lists, can be carried out by setting the appropriate options.
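The two default steps described above can be sketched in plain Perl. This is an illustration only, not the module's code: the helper names expand_tabs and escape_metachars are hypothetical, and the 8-column tab stop is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: expand tabs to spaces.
# Assumes tab stops every 8 columns unless a width is passed.
sub expand_tabs {
    my ($text, $width) = @_;
    $width ||= 8;
    my @lines = split /\n/, $text, -1;
    for my $line (@lines) {
        # Replace the first tab with enough spaces to reach the next
        # tab stop, and repeat until the line contains no tabs.
        1 while $line =~ s/^([^\t]*)\t/$1 . ' ' x ($width - length($1) % $width)/e;
    }
    return join "\n", @lines;
}

# The four metacharacters named in the metachars option.
my %ENTITY = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

# Hypothetical helper: escape metacharacters in a single pass, so the
# ampersands introduced by the entities are not escaped again.
sub escape_metachars {
    my ($text) = @_;
    $text =~ s/([&<>"])/$ENTITY{$1}/g;
    return $text;
}

print escape_metachars(expand_tabs(qq{if (a < b)\tputs("a & b");})), "\n";
```

Doing the escaping in one substitution avoids the classic mistake of escaping "&" after the other characters have already been turned into entities.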
SUMMARY OF OPTIONS
These options always apply:
    metachars    Convert HTML metacharacters to entity references
    urls         Convert URLs to links
    email        Convert email addresses to links
    bold         Mark up words with *asterisks* in bold
    underline    Mark up words with _underscores_ as underlined
You can then choose to treat the text according to one of these options:
    pre          Treat text as preformatted
    lines        Treat text as line-oriented
    paras        Treat text as paragraph-oriented
(If more than one of these is specified, pre takes precedence over lines, which takes precedence over paras.)
The following option applies when the lines option is specified:
    spaces       Preserve spaces from the original text
The following options apply when the paras option is specified:
    blockparas   Mark up indented paragraphs as block quote
    blockquotes  Ditto, also preserve lines from original
    blockcode    Ditto, also preserve spaces from original
    bullets      Mark up bulleted paragraphs as unordered list
    headings     Mark up headings
    numbers      Mark up numbered paragraphs as ordered list
    tables       Mark up tables
    title        Mark up first paragraph as level 1 heading
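The precedence rule among pre, lines and paras can be sketched as a small lookup. The helper choose_mode is hypothetical and only mirrors the rule stated above; it says nothing about how the module itself is written, or about what happens when none of the three options is set.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: return the formatting
# mode that wins under the documented precedence pre > lines > paras.
sub choose_mode {
    my (%opt) = @_;
    for my $mode (qw(pre lines paras)) {
        return $mode if $opt{$mode};
    }
    return;    # none of the three options was set
}

print choose_mode(paras => 1, lines => 1), "\n";    # prints "lines"
```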
OPTIONS
- blockparas
- blockquotes
- blockcode
-
These options cause text2html to spot paragraphs where every line begins with whitespace, and mark them up as block quotes. If more than one of these options is specified, blockparas takes precedence over blockcode, which takes precedence over blockquotes. All three options are ignored unless the paras option is also set.
The blockparas option marks up the paragraph as a block quote with no other changes. For example,
    Turing wrote,

        I propose to consider the question, "Can machines think?"
becomes
    <P>Turing wrote,</P>
    <BLOCKQUOTE>I propose to consider the question, "Can machines think?"</BLOCKQUOTE>
The blockquotes option preserves line breaks in the original text. For example,
    From "The Waste Land":

        Phlebas the Phoenician, a fortnight dead,
        Forgot the cry of gulls, and the deep sea swell
becomes
    <P>From "The Waste Land":</P>
    <BLOCKQUOTE>Phlebas the Phoenician, a fortnight dead,<BR>
    Forgot the cry of gulls, and the deep sea swell</BLOCKQUOTE>
The blockcode option preserves line breaks and spaces in the original text and renders the paragraph in a fixed-width font. For example,
    Here's how to output numbers with commas:

        sub commify {
            local $_ = shift;
            1 while s/^(-?\d+)(\d{3})/$1,$2/;
            $_;
        }
becomes
    <P>Here's how to output numbers with commas:</P>
    <BLOCKQUOTE><TT>sub commify {<BR>
    local $_ = shift;<BR>
    1 while s/^(-?\d+)(\d{3})/$1,$2/;<BR>
    $_;<BR>
    }</TT></BLOCKQUOTE>
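The detection rule and the precedence among the three block options can be sketched as follows. Both helpers are hypothetical names introduced for illustration; they restate the rules above (every line begins with whitespace; blockparas over blockcode over blockquotes; nothing without paras) rather than reproduce the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if every line of the paragraph begins
# with a space or tab, false (0) otherwise or for an empty paragraph.
sub is_indented_para {
    my ($para) = @_;
    my @lines = split /\n/, $para;
    return 0 unless @lines;
    my $unindented = grep { !/^[ \t]/ } @lines;
    return $unindented == 0 ? 1 : 0;
}

# Hypothetical helper: pick the block style under the documented
# precedence blockparas > blockcode > blockquotes. Returns nothing
# when paras is not set, since the options are then ignored.
sub choose_block_style {
    my (%opt) = @_;
    return unless $opt{paras};
    for my $style (qw(blockparas blockcode blockquotes)) {
        return $style if $opt{$style};
    }
    return;
}

print is_indented_para("    Phlebas\n    Forgot"), "\n";                   # prints "1"
print choose_block_style(paras => 1, blockquotes => 1, blockcode => 1), "\n";  # prints "blockcode"
```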
- bold
-
Words surrounded by asterisks are marked up in bold, so *abc* becomes <B>abc</B>.
- bullets
-
Spots bulleted paragraphs (beginning with optional whitespace, an asterisk or hyphen, and whitespace) and marks them up as an unordered list. Bulleted paragraphs don't have to be separated by blank lines. For example,
    Shopping list:

    * apples
    * pears
becomes
    <P>Shopping list:</P>
    <UL><LI><P>apples</P>
    <LI><P>pears</P>
    </UL>
This option is ignored unless the paras option is set.
- email
-
Spots email addresses in the text and converts them to links. For example,
    Mail me at web@perl.com.
becomes
    Mail me at <TT><A HREF="mailto:web@perl.com">web@perl.com</A></TT>.
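A minimal sketch of this kind of substitution, assuming a deliberately simplified address pattern (a local part, "@", and a dotted domain). The helper link_emails and its regular expression are illustrative assumptions; the module's own pattern follows RFC 822 more closely and may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap simple email addresses in a mailto: link,
# using the <TT><A HREF=...> markup shown in the example above.
sub link_emails {
    my ($text) = @_;
    # Simplified pattern (an assumption): the domain must end in a
    # word character, so a sentence-final period stays outside the link.
    $text =~ s{\b([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)}
              {<TT><A HREF="mailto:$1">$1</A></TT>}g;
    return $text;
}

print link_emails('Mail me at web@perl.com.'), "\n";
# prints: Mail me at <TT><A HREF="mailto:web@perl.com">web@perl.com</A></TT>.
```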
- headings
-
Spots headings (paragraphs starting with numbers) and marks them up as headings of the appropriate level. For example,
    1. Introduction

    1.1 Background

    1.1.1 Previous work

    2. Conclusion
becomes
    <H1>1. Introduction</H1>
    <H2>1.1 Background</H2>
    <H3>1.1.1 Previous work</H3>
    <H1>2. Conclusion</H1>
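One way to derive the heading level in the example above is to count the dot-separated components of the leading section number. The helpers heading_level and mark_heading are hypothetical, and the pattern is an assumption inferred from the example; the module's actual rule may be stricter.

```perl
use strict;
use warnings;

# Hypothetical helper: return the heading level implied by a leading
# section number ("1." -> 1, "1.1" -> 2, "1.1.1" -> 3), or 0 if the
# paragraph does not start with one.
sub heading_level {
    my ($para) = @_;
    # Digits separated by dots, an optional trailing dot, then text.
    return 0 unless $para =~ /^\s*(\d+(?:\.\d+)*)\.?\s+\S/;
    my @parts = split /\./, $1;
    return scalar @parts;
}

# Hypothetical helper: wrap the paragraph in <Hn> or, failing that, <P>.
# Levels deeper than the HTML heading elements allow are not handled.
sub mark_heading {
    my ($para) = @_;
    my $level = heading_level($para) or return "<P>$para</P>";
    return "<H$level>$para</H$level>";
}

print mark_heading('1.1.1 Previous work'), "\n";   # prints "<H3>1.1.1 Previous work</H3>"
```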
This option is ignored unless the paras option is set.
- lines
-
Formats the text so as to preserve line breaks. For example,
    Line 1
    Line 2
becomes
    Line 1<BR>
    Line 2
If two or more of the options pre, lines and paras are set, then pre takes precedence over lines, which takes precedence over paras.
- metachars
-
Converts HTML metacharacters into their corresponding entity references. Ampersand (&) becomes &amp;, less than (<) becomes &lt;, greater than (>) becomes &gt;, and quote (") becomes &quot;. This option is on (1) by default.
- numbers
-
Spots numbered paragraphs (beginning with whitespace, digits, an optional period/parenthesis/bracket, and whitespace) and marks them up as an ordered list. Numbered paragraphs don't have to be separated by blank lines. For example,
    To do:

    1. Write thesis
    2. Submit it
    3. Celebrate
becomes
    <P>To do:</P>
    <OL><LI VALUE="1"><P>Write thesis</P>
    <LI VALUE="2"><P>Submit it</P>
    <LI VALUE="3"><P>Celebrate</P>
    </OL>
This option is ignored unless the paras option is set.
- paras
-
Formats the text into paragraphs. Paragraphs are separated by one or more blank lines. For example,
    Paragraph 1

    Paragraph 2
becomes
    <P>Paragraph 1</P>
    <P>Paragraph 2</P>
If two or more of the options pre, lines and paras are set, then pre takes precedence over lines, which takes precedence over paras.
- pre
-
Wraps the text in a <PRE> element. For example,
    preformatted text
becomes
    <PRE>preformatted text</PRE>
If two or more of the options pre, lines and paras are set, then pre takes precedence over lines, which takes precedence over paras.
- spaces
-
Preserves spaces throughout the text. For example,
Line 1 Line 2 Line 3
becomes
Line 1 Line 2 Line 3
This option is ignored unless the lines option is set.
- tables
-
Spots tables and marks them up appropriately. Columns must be separated by two or more spaces (this prevents accidental incorrect recognition of a paragraph where interword spaces happen to line up). If there are two or more rows in a paragraph and all rows share the same set of (two or more) columns, the paragraph is assumed to be a table. For example,
    -e  File exists.
    -z  File has zero size.
    -s  File has nonzero size (returns size).
becomes
    <P><TABLE>
    <TR><TD>-e</TD><TD>File exists.</TD></TR>
    <TR><TD>-z</TD><TD>File has zero size.</TD></TR>
    <TR><TD>-s</TD><TD>File has nonzero size (returns size).</TD></TR>
    </TABLE></P>
text2html guesses for each column whether it is intended to be left, centre or right aligned.
This option is ignored unless the paras option is set.
- title
-
Formats the first paragraph of the text as a first-level heading. For example,
    Paragraph 1

    Paragraph 2
becomes
    <H1>Paragraph 1</H1>
    <P>Paragraph 2</P>
This option is ignored unless the paras option is set.
- underline
-
Words surrounded by underscores are marked up as underlined, so _abc_ becomes <U>abc</U>.
- urls
-
Spots Uniform Resource Locators (URLs) in the text and converts them to links. For example,
    See https://perl.com/.
becomes
    See <TT><A HREF="https://perl.com/">https://perl.com/</A></TT>.
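The URL substitution can be sketched in the same spirit. The helper link_urls is hypothetical, and both the scheme list (http, https, ftp) and the set of characters allowed in a URL are simplifying assumptions, not the module's actual pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap URLs in a link, using the <TT><A HREF=...>
# markup shown in the example above.
sub link_urls {
    my ($text) = @_;
    # The final character class excludes common punctuation, so a
    # sentence-final period or comma stays outside the link.
    $text =~ s{\b((?:https?|ftp)://[^\s<>"]*[^\s<>".,;:!?)])}
              {<TT><A HREF="$1">$1</A></TT>}g;
    return $text;
}

print link_urls('See https://perl.com/.'), "\n";
# prints: See <TT><A HREF="https://perl.com/">https://perl.com/</A></TT>.
```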
SEE ALSO
The HTML::Entities module (part of the LWP package) provides functions for encoding and decoding HTML entities.
Seth Golub's txt2html utility does everything that HTML::FromText does, and a few things that it would like to do. See http://www.thehouse.org/txt2html/.
RFC 822: "Standard for the Format of ARPA Internet Text Messages" describes the syntax of email addresses (the more esoteric features of structured field bodies, in particular quoted-strings, domain literals and comments, are not recognized by HTML::FromText). See ftp://src.doc.ic.ac.uk/rfc/rfc822.txt.
RFC 1630: "Universal Resource Identifiers in WWW" lists the protocols that may appear in URLs. HTML::FromText also recognizes "https:", but ignores "file:" because experience suggests that it results in too many false positives. See ftp://src.doc.ic.ac.uk/rfc/rfc1630.txt.
AUTHOR
Gareth Rees <garethr@cre.canon.co.uk>.
COPYRIGHT
Copyright (c) 1999 Canon Research Centre Europe. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.