NAME
MKDoc::Text::Structured - Another Text to HTML module
SYNOPSIS
my $text = some_structured_text();
my $html = MKDoc::Text::Structured::process ($text);
SUMMARY
MKDoc::Text::Structured is a library which allows simple syntaxic text construct to be turned into HTML. These constructs are the ones you would be using when writing a text email or newsgroup message.
MKDoc::Text::Structured follows the KISS philosophy. Comparing with similar modules which try to implement as many HTML constructs as possible, this module is incredibly conservative.
Block level elements
P
Paragraphs are defined by blocks of text separated by one or more empty lines.
The text:
This is a paragraph,
until it meets an empty line.
This is another paragraph.
Would become:
<p>This is a paragraph,
until it meets an empty line.</p>
<p>This is another paragraph.</p>
H1, H2, H3
Headlines are really just like a paragraph, except that they have the following syntax:
The text:
==========
Headline 1
==========
Headline 2
==========
Headline 3
----------
Would become:
<h1>Headline 1</h1>
<h2>Headline 2</h2>
<h3>Headline 3</h3>
The advantage in treating headlines just like paragraph is that multi-line headlines are no problem. Also, it means you can use *strong* and _emphasized_ within a headline (see STRONG and EM sections).
PRE
Pre-formatted text looks like a paragraph, except that it must be indented with at least one space character.
The text:
This is a paragraph,
until it meets an empty line.
But this is pre-formatted text.
Hey Hey Ho Ho!
This is another paragraph.
Would become:
<p>This is a paragraph,
until it meets an empty line.</p>
<pre>But this is pre-formatted text.
Hey Hey Ho Ho!</pre>
<p>This is another paragraph.</p>
Again, you can use *strong* and _emphasized_ within pre-formatted text (see STRONG and EM sections).
Inline Elements
STRONG
The text:
This is *strong text*
Would become:
<p>This is <strong>strong text</strong></p>
Note 1: The star character will act as a 'strong' marker only when:
- The "opening" star is preceded by whitespace or carriage return,
- The "closing" star is followed by whitespace or carriage return, or punctuation immediately followed by whitespace or carriage return.
In other words, you can write 3*3*2 = 18 safely. The module tries to follow the DWIM ("Do What I Mean") philosophy as much as possible.
Note 2: This can only work within one block level element. It will not work across paragraphs or lists (See UL, LI and OL, LI sections).
Example 1:
* Hello, *I will not
* be bold*
* but
* *I will be*
Example 2:
This is a paragraph. *Nothing in this paragraph
is going to be bold.
Nor in this one*.
EM
The text:
This is _emphasized text_
Would become:
<p>This is <em>emphasized text</em></p>
Same notes as for bold / strong text also applied for emphasized text.
Entity substitution
Characters that would otherwise be interpreted as XML are encoded. i.e. &, < and > become & < and >
Additionally some standard typed versions of special characters are substituted with a richer and better-looking HTML entity:
-- surrounded by whitespace becomes —
- surrounded by whitespace becomes –
... becomes …
(tm) becomes ™
(r) becomes ®
(c) becomes ©
x between numbers becomes ×
'' surrounding text becomes ‘ ’
"" surrounding text becomes “ ”
Nested Structures
BLOCKQUOTE
Quoted text is text that starts with a 'greater than' character and followed by a space on each line.
> > Hey, that's pretty cool!
> Well, sort-of
I think it's pretty cool...
Would become:
<blockquote><blockquote><p>Hey, that's pretty cool!</p></blockquote>
<p>Well, sort-of</p></blockquote>
<p>I think it's pretty cool...</p>
UL, LI
Ordered lists and unordered lists can be constructed and nested:
The text:
* An item
* Another item
* Headlines work too
==================
I can write *paragraphs within lists*.
And even _pre-formatted text_!
- Also, I can have sub-lists
- That's no problem
- Notice that '*' and '-' have the same meaning.
It's just syntaxic sugar, really :-)
Would become:
<ul><li><p>An item</p></li>
<li><p>Another item</p></li>
<li><h2>Headlines work too</h2>
<p>I can write <strong>paragraphs within lists</strong>.</p>
<pre>And even <em>pre-formatted text</em>!</pre>
<ul><li><p>Also, I can have sub-lists</p></li>
<li><p>That's no problem</p></li>
<li><p>Notice that '*' and '-' have the same meaning.
It's just syntaxic sugar, really :-)</p></li></ul></li></ul>
OL, LI
Un-ordered lists and unordered lists can be constructed and nested:
The text:
1. An item
2. Another item
3. Headlines work too
==================
* An un-ordered list
* Can be nested
* It should all work nicely.
Would become:
<ol><li><p>An item</p></li>
<li><p>Another item</p></li>
<li><h2>Headlines work too</h2>
<ul><li><p>An un-ordered list</p></li>
<li><p>Can be nested</p></li>
<li><p>It should all work nicely.</p></li></ul></li></ol>
Hyperlinks
This module uses URI::Find to locate URIs such as http://mkdoc.com/ and turn them into clickable links.
Add rel="nofollow" attributes to <a> tags like so:
local $MKDoc::Text::Structured::Inline::NoFollow = 1;
Additionally, once the XHTML fragment is produced, you could use MKDoc::XML::Tagger to hyperlink it against a glossary of hyperlinks.
Smilies
Basic smilies such as :-) and :-( are wrapped in a CSS class:
<span class="smiley-happy">:-)</span>
<span class="smiley-sad">:-(</span>
Long Words
Long words are split up into fragments separated by spaces if the length exceeds a 78 character default.
Change the default length using a package variable:
local $MKDoc::Text::Structured::Inline::LongestWord = 12;
Disable this fuctionality by setting a value of 0.
AUTHOR
Copyright 2003 - MKDoc Holdings Ltd.
Author: Jean-Michel Hiver
This module is free software and is distributed under the same license as Perl itself. Use it at your own risk.
SEE ALSO
MKDoc: http://www.mkdoc.com/
Help us open-source MKDoc. Join the mkdoc-modules mailing list:
mkdoc-modules@lists.webarch.co.uk