NAME

HTML::Toc - Generate, insert and update HTML Table of Contents.

DESCRIPTION

Generate, insert and update HTML Table of Contents.

Introduction

HTML::Toc consists of the following packages:

HTML::Toc
HTML::TocGenerator
HTML::TocInsertor
HTML::TocUpdator

HTML::Toc is the object which will eventually hold the Table of Contents. HTML::TocGenerator does the actual generation of the ToC. HTML::TocInsertor handles the insertion of the ToC in the source. HTML::TocUpdator takes care of updating previously inserted ToCs.

HTML::Parser is the base class of HTML::TocGenerator, HTML::TocInsertor and HTML::TocUpdator. Each of these classes inherits from its predecessor, as shown in the UML diagram below:

+---------------------+
|    HTML::Parser     |
+---------------------+
+---------------------+
|    +parse()         |
|    +parse_file()    |
+----------+----------+
          /_\
           |
+----------+----------+  <<uses>>  +-----------+
| HTML::TocGenerator  + - - - - - -+ HTML::Toc |
+---------------------+            +-----------+
+---------------------+            +-----------+
| +extend()           |            | +clear()  |
| +extendFromFile()   |            | +format() |
| +generate()         |            +-----+-----+
| +generateFromFile() |                  :
+----------+----------+                  :
          /_\                            :
           |                             :
+----------+----------+     <<uses>>     :
|  HTML::TocInsertor  + - - - - - - - - -+
+---------------------+                  :
+---------------------+                  :
|  +insert()          |                  :
|  +insertIntoFile()  |                  :
+----------+----------+                  :
          /_\                            :
           |                             :
+----------+----------+     <<uses>>     :
|  HTML::TocUpdator   + - - - - - - - - -+
+---------------------+
+---------------------+
|  +insert()          |
|  +insertIntoFile()  |
|  +update()          |
|  +updateFile()      |
+---------------------+

When generating a ToC you'll have to decide which object you want to use:

TocGenerator:
    for generating a ToC without inserting the ToC into the source
TocInsertor:
    for generating a ToC and inserting the ToC into the source
TocUpdator:
    for generating and inserting a ToC, removing any previously
    inserted ToC elements

Thus in tabular view, each object is capable of:

               generating   inserting   updating
               ---------------------------------
TocGenerator        X
TocInsertor         X           X
TocUpdator          X           X           X

Generating

The code underneath will generate a ToC of the HTML headings <h1>..<h6> from a file index.htm:

use HTML::Toc;
use HTML::TocGenerator;

my $toc          = HTML::Toc->new();
my $tocGenerator = HTML::TocGenerator->new();

$tocGenerator->generateFromFile($toc, 'index.htm');
print $toc->format();

For example, with index.htm containing:

<html>
<body>
   <h1>Chapter</h1>
</body>
</html>

the output will be:

<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=#h-1>Chapter</a>
</ul>
<!-- End of generated Table of Contents -->

Inserting

This code will generate a ToC of HTML headings <h1>..<h6> of file index.htm, and insert the ToC after the <body> tag at the same time:

use HTML::Toc;
use HTML::TocInsertor;

my $toc         = HTML::Toc->new();
my $tocInsertor = HTML::TocInsertor->new();

$tocInsertor->insertIntoFile($toc, 'index.htm');

For example, with index.htm containing:

<html>
<body>
   <h1>Chapter</h1>
</body>
</html>

the output will be:

<html>
<body>
<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=#h-1>Chapter</a>
</ul>
<!-- End of generated Table of Contents -->

   <a name=h-1><h1>Chapter</h1></a>
</body>
</html>

If you plan to update the inserted ToC later, use TocUpdator to insert it. TocUpdator marks the inserted ToC elements with update tokens. These update tokens allow TocUpdator to identify and remove the ToC elements during a future update session. This code uses TocUpdator instead of TocInsertor:

use HTML::Toc;
use HTML::TocUpdator;

my $toc        = HTML::Toc->new();
my $tocUpdator = HTML::TocUpdator->new();

$tocUpdator->insertIntoFile($toc, 'index.htm');

When applying the code above on 'index.htm':

<html>
<body>
   <h1>
   Chapter
   </h1>
</body>
</html>

the output will contain additional update tokens:

<!-- #BeginToc -->
<!-- #EndToc -->
<!-- #BeginTocAnchorNameBegin -->
<!-- #EndTocAnchorNameBegin -->
<!-- #BeginTocAnchorNameEnd -->
<!-- #EndTocAnchorNameEnd -->

around the inserted ToC elements:

<html>
<body><!-- #BeginToc -->
<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=#h-1> Chapter </a>
</ul>
<!-- End of generated Table of Contents -->
<!-- #EndToc -->
   <!-- #BeginTocAnchorNameBegin --><a name=h-1><!-- #EndTocAnchorNameBegin --><h1>
   Chapter
   </h1><!-- #BeginTocAnchorNameEnd --></a><!-- #EndTocAnchorNameEnd -->
</body>
</html>

Instead of HTML::TocUpdator::insertIntoFile() you can also use HTML::TocUpdator::updateFile(), which inserts the ToC whether or not a ToC has already been inserted.

Updating

This code will generate a ToC of HTML headings <h1>..<h6> of file indexToc.htm, and insert or update the ToC after the <body> tag at the same time:

use HTML::Toc;
use HTML::TocUpdator;

my $toc        = HTML::Toc->new();
my $tocUpdator = HTML::TocUpdator->new();

$tocUpdator->updateFile($toc, 'indexToc.htm');

For example, with indexToc.htm containing:

<html>
<body><!-- #BeginToc -->
foo
<!-- #EndToc -->
   <!-- #BeginTocAnchorNameBegin -->bar<!-- #EndTocAnchorNameBegin --><h1>
   Chapter
   </h1><!-- #BeginTocAnchorNameEnd -->foo<!-- #EndTocAnchorNameEnd -->
</body>
</html>

the output will be:

<html>
<body><!-- #BeginToc -->
<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=#h-1> Chapter </a>
</ul>
<!-- End of generated Table of Contents -->
<!-- #EndToc -->
   <!-- #BeginTocAnchorNameBegin --><a name=h-1><!-- #EndTocAnchorNameBegin --><h1>
   Chapter
   </h1><!-- #BeginTocAnchorNameEnd --></a><!-- #EndTocAnchorNameEnd -->
</body>
</html>

All text between the update tokens will be replaced. So be warned: all manual changes made to text between update tokens will be irrecoverably lost after calling HTML::TocUpdator::update() or HTML::TocUpdator::updateFile().
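
The warning above can be tried out on a string. The following is a minimal sketch, assuming update() honours the 'output' option described under Parser Options and that the ToC token is written exactly as '<!-- #BeginToc -->'. The sample HTML, the 'manual note' text and the variable names are made up for illustration:

```perl
use strict;
use warnings;
use HTML::Toc;
use HTML::TocUpdator;

my $tocUpdator = HTML::TocUpdator->new();
my $source     = "<html>\n<body>\n   <h1>Chapter</h1>\n</body>\n</html>\n";

# First pass: insert the ToC, surrounded by update tokens.
my $firstPass = '';
$tocUpdator->update(HTML::Toc->new(), $source, {'output' => \$firstPass});

# Simulate a manual edit between the ToC update tokens.
(my $edited = $firstPass) =~ s/(<!-- #BeginToc -->)/$1\nmanual note/;

# Second pass: the text between the update tokens is regenerated.
my $secondPass = '';
$tocUpdator->update(HTML::Toc->new(), $edited, {'output' => \$secondPass});

print $secondPass;
```

If the second pass behaves as described, 'manual note' no longer appears in $secondPass. A fresh HTML::Toc object is used for each pass so the sketch does not rely on how a reused ToC object is reset.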

Formatting

The ToC isn't generated all at once. There are two stages involved: generating and formatting. Generating the ToC actually means storing a preliminary ToC in HTML::Toc->{_toc}. This preliminary, tokenized ToC has to be turned into something useful by calling HTML::Toc->format(). For an example, see paragraph 'Generating'.
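
The two stages can be seen separately when working from a string. This is a sketch only: it assumes that format options set after generating are picked up by a later format() call, and the sample HTML and variable names are illustrative:

```perl
use strict;
use warnings;
use HTML::Toc;
use HTML::TocGenerator;

my $toc          = HTML::Toc->new();
my $tocGenerator = HTML::TocGenerator->new();

# Stage 1: generate. The tokenized ToC is stored inside $toc.
$tocGenerator->generate(
   $toc, "<body><h1>Chapter</h1><h2>Paragraph</h2></body>"
);

# Stage 2: format. Render the stored ToC with the current options.
my $asUnorderedList = $toc->format();

# Format the same stored ToC again with different format options.
$toc->setOptions({
   'templateLevelBegin' => '"<ol>\n"',
   'templateLevelEnd'   => '"</ol>\n"',
});
my $asOrderedList = $toc->format();

print $asUnorderedList, $asOrderedList;
```

Because the source is parsed only once, the same generated ToC can be rendered in several ways without reading the document again.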

Advanced

The ToC generation can be modified in a variety of ways. The following paragraphs each explain a single modification. An example of most of the modifications can be found in the manualTest.t test file. Within this test, a manual containing:

preface
introduction
table of contents
table of figures
table of tables
parts
chapters
appendixes
bibliography

is formatted.

Using attribute value as ToC text

Normally, the ToC is built from the text between the specified ToC tokens. It's also possible to use an attribute value of a token as ToC text. To do so, mark the attribute with the attributeToTocToken (by default '@') within the tokenBegin token. For example, suppose you want to generate a ToC of the alt attributes of the following image tokens:

<body>
   <img src=test1.gif alt="First picture">
   <img src=test2.gif alt="Second picture">
</body>

This would be the code:

use HTML::Toc;
use HTML::TocInsertor;

my $toc         = HTML::Toc->new();
my $tocInsertor = HTML::TocInsertor->new();

$toc->setOptions({
   'tokenToToc'   => [{
      'groupId'    => 'image',
      'tokenBegin' => '<img alt=@>'
   }],
});
$tocInsertor->insertIntoFile($toc, $filename);

and the output will be:

<body>
<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=#image-1>First picture</a>
   <li><a href=#image-2>Second picture</a>
</ul>
<!-- End of generated Table of Contents -->

   <a name=image-1><img src=test1.gif alt="First picture"></a>
   <a name=image-2><img src=test2.gif alt="Second picture"></a>
</body>

Generate single ToC of multiple files

Besides generating a ToC of a single file, it's also possible to generate a single ToC of multiple files. This can be done by specifying an array of files as the file argument, by extending an existing ToC, or both.

Specify an array of files

For example, suppose you want to generate a ToC of both doc1.htm:

<body>
   <h1>Chapter of document 1</h1>
</body>

and doc2.htm:

<body>
   <h1>Chapter of document 2</h1>
</body>

Here's the code to do so by specifying an array of files:

use HTML::Toc;
use HTML::TocGenerator;

my $toc          = HTML::Toc->new();
my $tocGenerator = HTML::TocGenerator->new();

$toc->setOptions({'doLinkToFile' => 1});
$tocGenerator->generateFromFile($toc, ['doc1.htm', 'doc2.htm']);
print $toc->format();

And the output will be:

<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=doc1.htm#h-1>Chapter of document 1</a>
   <li><a href=doc2.htm#h-2>Chapter of document 2</a>
</ul>
<!-- End of generated Table of Contents -->

Extend an existing ToC

It's also possible to extend an existing ToC. For example, suppose you want to generate a ToC of file doc1.htm:

<body>
   <h1>Chapter of document 1</h1>
</body>

and extend this ToC with text from doc2.htm:

<body>
   <h1>Chapter of document 2</h1>
</body>

Here's the code to do so:

use HTML::Toc;
use HTML::TocGenerator;

my $toc          = HTML::Toc->new();
my $tocGenerator = HTML::TocGenerator->new();

$toc->setOptions({'doLinkToFile' => 1});
$tocGenerator->generateFromFile($toc, 'doc1.htm');
$tocGenerator->extendFromFile($toc, 'doc2.htm');
print $toc->format();

And the output will be:

<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=doc1.htm#h-1>Chapter of document 1</a>
   <li><a href=doc2.htm#h-2>Chapter of document 2</a>
</ul>
<!-- End of generated Table of Contents -->

Generate multiple ToCs

It's possible to generate multiple ToCs at once by specifying an array of HTML::Toc objects as the ToC argument. For example, suppose you want to generate a default ToC of HTML headings <h1>..<h6> as well as a ToC of the alt image attributes of the following text:

<body>
   <h1>Header One</h1>
   <img src=test1.gif alt="First picture" id=image_001>
   <h2>Paragraph One</h2>
   <img src=test2.gif alt="Second picture" id=image_002>
</body>

Here's how you would do so:

use HTML::Toc;
use HTML::TocInsertor;

my $toc1        = HTML::Toc->new();
my $toc2        = HTML::Toc->new();
my $tocInsertor = HTML::TocInsertor->new();

$toc2->setOptions({
   'tokenToToc'   => [{
      'groupId'    => 'image',
      'tokenBegin' => '<img alt=@>'
   }],
});
$tocInsertor->insertIntoFile([$toc1, $toc2], $filename);

And the output will be:

<body>
<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=#h-1>Header One</a>
   <ul>
      <li><a href=#h-1.1>Paragraph One</a>
   </ul>
</ul>
<!-- End of generated Table of Contents -->

<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=#image-1>First picture</a>
   <li><a href=#image-2>Second picture</a>
</ul>
<!-- End of generated Table of Contents -->

   <a name=h-1><h1>Header One</h1></a>
   <a name=image-1><img src=test1.gif alt="First picture"></a>
   <a name=h-1.1><h2>Paragraph One</h2></a>
   <a name=image-2><img src=test2.gif alt="Second picture"></a>
</body>

Generate multiple groups in one ToC

You may want to generate a ToC consisting of multiple ToC groups.

Specify an additional 'Appendix' group

Suppose you want to generate a ToC with one group for the normal headings, and one group for the appendix headings, using this source file:

<body>
   <h1>Chapter</h1>
   <h2>Paragraph</h2>
   <h3>Subparagraph</h3>
   <h1>Chapter</h1>
   <h1 class=appendix>Appendix Chapter</h1>
   <h2 class=appendix>Appendix Paragraph</h2>
</body>

With the code underneath:

use HTML::Toc;
use HTML::TocInsertor;

my $toc         = HTML::Toc->new();
my $tocInsertor = HTML::TocInsertor->new();

$toc->setOptions({
   'tokenToToc' => [{
         'tokenBegin' => '<h1 class=-appendix>'
      }, {
         'tokenBegin' => '<h2 class=-appendix>',
         'level'      => 2
      }, {
         'groupId'    => 'appendix',
         'tokenBegin' => '<h1 class=appendix>',
      }, {
         'groupId'    => 'appendix',
         'tokenBegin' => '<h2 class=appendix>',
         'level'      => 2
      }]
});
$tocInsertor->insertIntoFile($toc, $filename);

the output will be:

<body>
<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=#h-1>Chapter</a>
   <ul>
      <li><a href=#h-1.1>Paragraph</a>
   </ul>
   <li><a href=#h-2>Chapter</a>
</ul>
<ul>
   <li><a href=#appendix-1>Appendix Chapter</a>
   <ul>
      <li><a href=#appendix-1.1>Appendix Paragraph</a>
   </ul>
</ul>
<!-- End of generated Table of Contents -->

   <a name=h-1><h1>Chapter</h1></a>
   <a name=h-1.1><h2>Paragraph</h2></a>
   <h3>Subparagraph</h3>
   <a name=h-2><h1>Chapter</h1></a>
   <a name=appendix-1><h1 class=appendix>Appendix Chapter</h1></a>
   <a name=appendix-1.1><h2 class=appendix>Appendix Paragraph</h2></a>
</body>

Specify an additional 'Part' group

Suppose you want to generate a ToC of a document which is divided into multiple parts, like the file below:

<body>
   <h1 class=part>First Part</h1>
   <h1>Chapter</h1>
   <h2>Paragraph</h2>
   <h1 class=part>Second Part</h1>
   <h1>Chapter</h1>
   <h2>Paragraph</h2>
</body>

With the code underneath:

use HTML::Toc;
use HTML::TocInsertor;

my $toc         = HTML::Toc->new();
my $tocInsertor = HTML::TocInsertor->new();

$toc->setOptions({
   'doNumberToken'    => 1,
   'tokenToToc' => [{
         'tokenBegin' => '<h1 class=-part>'
      }, {
         'tokenBegin' => '<h2 class=-part>',
         'level'      => 2,
      }, {
         'groupId'        => 'part',
         'tokenBegin'     => '<h1 class=part>',
         'level'          => 1,
         'numberingStyle' => 'upper-alpha'
      }]
});
$tocInsertor->insertIntoFile($toc, $filename);

the output will be:

<body>
<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=#part-A>First Part</a>
</ul>
<ul>
   <li><a href=#h-1>Chapter</a>
   <ul>
      <li><a href=#h-1.1>Paragraph</a>
   </ul>
</ul>
<ul>
   <li><a href=#part-B>Second Part</a>
</ul>
<ul>
   <li><a href=#h-2>Chapter</a>
   <ul>
      <li><a href=#h-2.1>Paragraph</a>
   </ul>
</ul>
<!-- End of generated Table of Contents -->

   <a name=part-A><h1 class=part>A &nbsp;First Part</h1></a>
   <a name=h-1><h1>1 &nbsp;Chapter</h1></a>
   <a name=h-1.1><h2>1.1 &nbsp;Paragraph</h2></a>
   <a name=part-B><h1 class=part>B &nbsp;Second Part</h1></a>
   <a name=h-2><h1>2 &nbsp;Chapter</h1></a>
   <a name=h-2.1><h2>2.1 &nbsp;Paragraph</h2></a>
</body>

Number ToC entries

By default, the generated ToC will list its entries unnumbered. If you want to number the ToC entries, two options are available. You can either specify a numbered list by modifying templateLevelBegin and templateLevelEnd or, when the ToC isn't a simple numbered list, use the numbers generated by HTML::TocGenerator.

Specify numbered list

By modifying templateLevelBegin and templateLevelEnd you can specify a numbered ToC:

use HTML::Toc;
use HTML::TocGenerator;

my $toc          = HTML::Toc->new();
my $tocGenerator = HTML::TocGenerator->new();

$toc->setOptions({
    'templateLevelBegin' => '"<ol>\n"',
    'templateLevelEnd'   => '"</ol>\n"',
});
$tocGenerator->generateFromFile($toc, 'index.htm');
print $toc->format();

For instance with the original file containing:

<body>
    <h1>Chapter</h1>
    <h2>Paragraph</h2>
</body>

The formatted ToC now will contain ol instead of ul tags:

<!-- Table of Contents generated by Perl - HTML::Toc -->
<ol>
   <li><a href=#h-1>Chapter</a>
   <ol>
      <li><a href=#h-1.1>Paragraph</a>
   </ol>
</ol>
<!-- End of generated Table of Contents -->

See also: Using CSS for ToC formatting.

Use generated numbers

Instead of using the HTML ordered list (OL), it's also possible to use the generated numbers to number the ToC nodes. This can be done by modifying templateLevel:

use HTML::Toc;
use HTML::TocGenerator;

my $toc          = HTML::Toc->new();
my $tocGenerator = HTML::TocGenerator->new();

$toc->setOptions({
   'templateLevel' => '"<li>$node &nbsp;$text\n"',
});
$tocGenerator->generateFromFile($toc, 'index.htm');
print $toc->format();

For instance with the original file containing:

<body>
    <h1>Chapter</h1>
    <h2>Paragraph</h2>
</body>

The formatted ToC now will have the node numbers hard-coded:

<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li>1 &nbsp;<a href=#h-1>Chapter</a>
   <ul>
      <li>1.1 &nbsp;<a href=#h-1.1>Paragraph</a>
   </ul>
</ul>
<!-- End of generated Table of Contents -->

See also: Using CSS for ToC formatting.

Using CSS for ToC formatting

Suppose you want to display a ToC with upper-alpha numbered appendix headings. To accomplish this, you can specify a CSS style within the source document:

<html>
<head>
   <style type="text/css">
      ol.toc_appendix1 { list-style-type: upper-alpha }
   </style>
</head>
<body>
   <h1 class=appendix>Appendix</h1>
   <h2 class=appendix>Appendix Paragraph</h2>
   <h1 class=appendix>Appendix</h1>
   <h2 class=appendix>Appendix Paragraph</h2>
</body>
</html>

Here's the code:

use HTML::Toc;
use HTML::TocInsertor;

my $toc          = HTML::Toc->new();
my $tocInsertor  = HTML::TocInsertor->new();

$toc->setOptions({
   'templateLevelBegin'   => '"<ol class=toc_$groupId$level>\n"',
   'templateLevelEnd'     => '"</ol>\n"',
   'doNumberToken'        => 1,
   'tokenToToc' => [{
         'groupId'        => 'appendix',
         'tokenBegin'     => '<h1>',
         'numberingStyle' => 'upper-alpha'
      }, {
         'groupId'        => 'appendix',
         'tokenBegin'     => '<h2>',
         'level'          => 2,
     }]
});
$tocInsertor->insertIntoFile($toc, $filename);

which results in the following output:

<html>
<head>
   <style type="text/css">
      ol.toc_appendix1 { list-style-type: upper-alpha }
   </style>
</head>
<body>
<!-- Table of Contents generated by Perl - HTML::Toc -->
<ol class=toc_appendix1>
   <li><a href=#appendix-A>Appendix</a>
   <ol class=toc_appendix2>
      <li><a href=#appendix-A.1>Appendix Paragraph</a>
   </ol>
   <li><a href=#appendix-B>Appendix</a>
   <ol class=toc_appendix2>
      <li><a href=#appendix-B.1>Appendix Paragraph</a>
   </ol>
</ol>
<!-- End of generated Table of Contents -->

   <a name=appendix-A><h1 class=appendix>A &nbsp;Appendix</h1></a>
   <a name=appendix-A.1><h2 class=appendix>A.1 &nbsp;Appendix Paragraph</h2></a>
   <a name=appendix-B><h1 class=appendix>B &nbsp;Appendix</h1></a>
   <a name=appendix-B.1><h2 class=appendix>B.1 &nbsp;Appendix Paragraph</h2></a>
</body>
</html>

Creating site map

Suppose you want to generate a table of contents of the <title> tags of the files in the following directory structure:

path               file

.                  index.htm, <title>Main</title>
|- SubDir1         index.htm, <title>Sub1</title>
|  |- SubSubDir1   index.htm, <title>SubSub1</title>
|
|- SubDir2         index.htm, <title>Sub2</title>
|  |- SubSubDir1   index.htm, <title>SubSub1</title>
|  |- SubSubDir2   index.htm, <title>SubSub2</title>
|
|- SubDir3         index.htm, <title>Sub3</title>

This can be done by specifying a 'fileSpec' for each level, which determines how many slashes (/) a file path may contain to belong to that level:

use HTML::Toc;
use HTML::TocGenerator;
use File::Find;

my $toc          = HTML::Toc->new;
my $tocGenerator = HTML::TocGenerator->new;
my @fileList;

sub wanted {
      # Add file to 'fileList' if extension matches '.htm'
   push (@fileList, $File::Find::name) if (m/\.htm$/);
}

$toc->setOptions({
   'doLinkToFile'       => 1,
   'templateAnchorName' => '""',
   'templateAnchorHref' => '"<a href=$file"."#".$groupId.$level.">"',
   'doLinkTocToToken'   => 1,
   'tokenToToc'         => [{
      'groupId'         => 'dir',
      'level'           => 1,
      'tokenBegin'      => '<title>',
      'tokenEnd'        => '</title>',
      'fileSpec'        => '\./[^/]+$'
   }, {
      'groupId'         => 'dir',
      'level'           => 2,
      'tokenBegin'      => '<title>',
      'tokenEnd'        => '</title>',
      'fileSpec'        => '\./[^/]+?/[^/]+$'
   }, {
      'groupId'         => 'dir',
      'level'           => 3,
      'tokenBegin'      => '<title>',
      'tokenEnd'        => '</title>',
      'fileSpec'        => '\./[^/]+?/[^/]+?/[^/]+$'
   }]
});

   # Traverse directory structure
find({wanted => \&wanted, no_chdir => 1}, '.');
   # Generate ToC of case-insensitively sorted file list
$tocGenerator->extendFromFile(
   $toc, [sort {uc($a) cmp uc($b)} @fileList]
);
print $toc->format();

the following ToC will be generated:

<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=./index.htm#>Main</a>
   <ul>
      <li><a href=./SubDir1/index.htm#>Sub1</a>
      <ul>
         <li><a href=./SubDir1/SubSubDir1/index.htm#>SubSub1</a>
      </ul>
      <li><a href=./SubDir2/index.htm#>Sub2</a>
      <ul>
         <li><a href=./SubDir2/SubSubDir1/index.htm#>SubSub1</a>
         <li><a href=./SubDir2/SubSubDir2/index.htm#>SubSub2</a>
      </ul>
      <li><a href=./SubDir3/index.htm#>Sub3</a>
   </ul>
</ul>
<!-- End of generated Table of Contents -->
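
The fileSpec patterns used above can be checked in isolation, without HTML::Toc. The sketch below assumes that a fileSpec is applied to the file path as an ordinary, unanchored regular expression match. The helper levelForFile() is hypothetical and not part of HTML::Toc:

```perl
use strict;
use warnings;

# fileSpec patterns copied from the site map example, keyed by ToC level.
my %fileSpec = (
   1 => '\./[^/]+$',
   2 => '\./[^/]+?/[^/]+$',
   3 => '\./[^/]+?/[^/]+?/[^/]+$',
);

# Hypothetical helper (not part of HTML::Toc): return the first level whose
# fileSpec matches $path, or nothing when no pattern matches.
sub levelForFile {
   my ($path) = @_;
   foreach my $level (sort { $a <=> $b } keys %fileSpec) {
      return $level if $path =~ m/$fileSpec{$level}/;
   }
   return;
}

foreach my $path ('./index.htm', './SubDir1/index.htm',
                  './SubDir2/SubSubDir2/index.htm') {
   my $level = levelForFile($path);
   print "$path => level ", (defined $level ? $level : 'none'), "\n";
}
```

Each additional directory in the path adds one slash, so a path matches exactly one of the three patterns, and a path nested deeper than three levels matches none of them.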

Methods

HTML::Toc::clear()

syntax:  $toc->clear()
returns: --

Clear the ToC.

HTML::Toc::format()

syntax:  $scalar = $toc->format()
returns: Formatted ToC.

Format tokenized ToC.

HTML::TocGenerator::extend()

syntax:  $tocGenerator->extend($toc, $string [, $options])
args:    - $toc:     (reference to array of) HTML::Toc object(s) to extend
         - $string:  string to retrieve ToC from
         - $options: hash reference containing generator options.

Extend ToC from specified string. For available options, see Parser Options.
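
A minimal sketch of extend(), mirroring the file-based example under "Extend an existing ToC" but working on strings. The sample HTML and variable names are illustrative:

```perl
use strict;
use warnings;
use HTML::Toc;
use HTML::TocGenerator;

my $toc          = HTML::Toc->new();
my $tocGenerator = HTML::TocGenerator->new();

# generate() starts a new ToC from the first string ...
$tocGenerator->generate($toc, "<body><h1>Chapter of document 1</h1></body>");
# ... and extend() adds the headings of the second string to it.
$tocGenerator->extend($toc, "<body><h1>Chapter of document 2</h1></body>");

my $formattedToc = $toc->format();
print $formattedToc;
```

Calling generate() a second time instead of extend() would clear the ToC first, so only the headings of the last string would remain.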

HTML::TocGenerator::extendFromFile()

syntax:  $tocGenerator->extendFromFile($toc, $filename [, $options])
args:    - $toc:      (reference to array of) HTML::Toc object(s) to extend
         - $filename: (reference to array of) file(s) to extend ToC from
         - $options:  hash reference containing generator options.

Extend ToC from specified file. For available options, see Parser Options. For an example, see "Extend an existing ToC".

HTML::TocGenerator::generate()

syntax:  $tocGenerator->generate($toc, $string [, $options])
args:    - $toc:     (reference to array of) HTML::Toc object(s) to generate
         - $string:  string to retrieve ToC from
         - $options: hash reference containing generator options.

Generate ToC from specified string. Before generating, the ToC will be cleared. For extending an existing ToC, use the HTML::TocGenerator::extend() method. For available options, see Parser Options.

HTML::TocGenerator::generateFromFile()

syntax:  $tocGenerator->generateFromFile($toc, $filename [, $options])
args:    - $toc:      (reference to array of) HTML::Toc object(s) to 
                      generate
         - $filename: (reference to array of) file(s) to generate ToC from
         - $options:  hash reference containing generator options.

Generate ToC from specified file. Before generating, the ToC will be cleared. For extending an existing ToC, use the HTML::TocGenerator::extendFromFile() method. For available options, see Parser Options.

HTML::TocInsertor::insert()

syntax:  $tocInsertor->insert($toc, $string [, $options])
args:    - $toc:     (reference to array of) HTML::Toc object(s) to insert
         - $string:  string to insert ToC in
         - $options: hash reference containing insertor options.

Insert ToC into specified string. For available options, see Parser Options.

HTML::TocInsertor::insertIntoFile()

syntax:  $tocInsertor->insertIntoFile($toc, $filename [, $options])
args:    - $toc:      (reference to array of) HTML::Toc object(s) to insert
         - $filename: (reference to array of) file(s) to insert ToC in
         - $options:  hash reference containing insertor options.

Insert ToC into specified file. For available options, see Parser Options.

HTML::TocUpdator::insert()

syntax:  $tocUpdator->insert($toc, $string [, $options])
args:    - $toc:     (reference to array of) HTML::Toc object(s) to insert
         - $string:  string to insert ToC in
         - $options: hash reference containing updator options.

Insert ToC into specified string. Differs from HTML::TocInsertor::insert() in that inserted text will be surrounded with update tokens in order for HTML::TocUpdator to be able to update this text the next time an update is issued. For updator options, see HTML::TocUpdator Options.

HTML::TocUpdator::insertIntoFile()

syntax:  $tocUpdator->insertIntoFile($toc, $filename [, $options])
args:    - $toc:      (reference to array of) HTML::Toc object(s) to insert
         - $filename: (reference to array of) file(s) to insert ToC in
         - $options:  hash reference containing updator options.

Insert ToC into specified file. Differs from HTML::TocInsertor::insertIntoFile() in that inserted text will be surrounded with update tokens in order for HTML::TocUpdator to be able to update this text the next time an update is issued. For updator options, see HTML::TocUpdator Options.

HTML::TocUpdator::update()

syntax:  $tocUpdator->update($toc, $string [, $options])
args:    - $toc:     (reference to array of) HTML::Toc object(s) to update
         - $string:  string to update ToC in
         - $options: hash reference containing updator options.

Update ToC within specified string. For updator options, see HTML::TocUpdator Options.

HTML::TocUpdator::updateFile()

syntax:  $tocUpdator->updateFile($toc, $filename [, $options])
args:    - $toc:      (reference to array of) HTML::Toc object(s) to update
         - $filename: (reference to array of) file(s) to update ToC in
         - $options:  hash reference containing updator options.

Update ToC of specified file. For updator options, see HTML::TocUpdator Options.

Parser Options

When generating a ToC, additional options may be specified which influence the way the ToCs are generated using either TocGenerator, TocInsertor or TocUpdator. The options must be specified as a hash reference. For example:

$tocGenerator->generateFromFile($toc, $filename, {doUseGroupsGlobal => 1});

Available options are:

doGenerateToc
doUseGroupsGlobal
output
outputFile

doGenerateToc

syntax:         [0|1]
default:        1
applicable to:  TocInsertor, TocUpdator

True (1) if ToC must be generated. False (0) if ToC must be inserted only.

doUseGroupsGlobal

syntax:         [0|1]
default:        0
applicable to:  TocGenerator, TocInsertor, TocUpdator

True (1) if group levels must be used globally across ToCs. False (0) if not. This option only makes sense when an array of ToCs is specified. For example, suppose you want to generate two ToCs, one ToC for '<h1>' tokens and one ToC for '<h2>' tokens, of the file 'index.htm':

<h1>Chapter</h1>
<h2>Paragraph</h2>

Using the default setting of 'doUseGroupsGlobal' => 0:

use HTML::Toc;
use HTML::TocGenerator;

my $toc1         = HTML::Toc->new();
my $toc2         = HTML::Toc->new();
my $tocGenerator = HTML::TocGenerator->new();

$toc1->setOptions({
   'header'     => '',
   'footer'     => '',
   'tokenToToc' => [{'tokenBegin' => '<h1>'}]
});
$toc2->setOptions({
   'header'     => '',
   'footer'     => '',
   'tokenToToc' => [{'tokenBegin' => '<h2>'}]
});
$tocGenerator->generateFromFile([$toc1, $toc2], 'index.htm');
print $toc1->format() . "\n\n" . $toc2->format();

the output will be:

<ul>
   <li><a href=#h-1>Chapter</a>
</ul>

<ul>
   <li><a href=#h-1>Paragraph</a>
</ul>

Each ToC will use its own numbering scheme. Now if 'doUseGroupsGlobal' => 1 is specified:

$tocGenerator->generateFromFile(
   [$toc1, $toc2], 'index.htm', {'doUseGroupsGlobal' => 1}
);

the output will be:

<ul>
   <li><a href=#h-1>Chapter</a>
</ul>

<ul>
   <li><a href=#h-2>Paragraph</a>
</ul>

using a global numbering scheme for all ToCs.

output

syntax:         reference to scalar
default:        none
applicable to:  TocInsertor, TocUpdator

Reference to the scalar in which the output must be stored.
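
A short sketch of the 'output' option with HTML::TocInsertor::insert(), which collects the result in a scalar. The sample HTML and variable names are illustrative:

```perl
use strict;
use warnings;
use HTML::Toc;
use HTML::TocInsertor;

my $toc         = HTML::Toc->new();
my $tocInsertor = HTML::TocInsertor->new();
my $source      = "<html>\n<body>\n   <h1>Chapter</h1>\n</body>\n</html>\n";
my $output      = '';

# Pass a reference to $output; the source with the ToC inserted is stored in it.
$tocInsertor->insert($toc, $source, {'output' => \$output});

print $output;
```

This is convenient when the result has to be processed further, for example embedded in a template, before it is written anywhere.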

outputFile

syntax:         scalar
default:        none
applicable to:  TocInsertor, TocUpdator

Filename to write output to. If no filename is specified, output will be written to standard output.

HTML::Toc Options

The HTML::Toc options can be grouped in the following categories:

Generate options
Insert options
Update options
Format options

The ToC options must be specified using the 'setOptions' method. For example:

my $toc = HTML::Toc->new();

$toc->setOptions({
   'doNumberToken' => 1,
   'footer'        => '<!-- End Of ToC -->',
   'tokenToToc'    => [{
      'level'          => 1,
      'tokenBegin'     => '<h1>',
      'numberingStyle' => 'lower-alpha'
   }]
});

Generate options

Token groups
    tokenToToc
        doNumberToken
        fileSpec
        groupId
        level
        tokenBegin
        tokenEnd
        numberingStyle
    groupToToc
    levelToToc
Numbering tokens
    doNumberToken
    numberingStyle
    templateTokenNumber
Miscellaneous
    attributeToExcludeToken
    attributeToTocToken
    groupToToc
    levelToToc
Linking ToC to tokens
    doLinkToToken
    doLinkToFile
    doLinkToId
    templateAnchorName
    templateAnchorHrefBegin
    templateAnchorHrefEnd
    templateAnchorNameBegin
    templateAnchorNameEnd

Insert options

insertionPoint

Update options

tokenUpdateBeginAnchorName
tokenUpdateEndAnchorName
tokenUpdateBeginToc
tokenUpdateEndToc
tokenUpdateBeginNumber
tokenUpdateEndNumber

Format options

doSingleStepLevel
doNestGroup
groupToToc
levelIndent
levelToToc
templateLevelBegin
templateLevelEnd

HTML::Toc Options Reference

attributeToExcludeToken

syntax:  $scalar
default: '-'

Token which marks an attribute value in a tokenBegin or insertionPoint token as a value the token must not have in order to be treated as a ToC token. See also: Using attribute value as ToC text.

attributeToTocToken

syntax:  $scalar
default: '@'

Token which marks an attribute in a tokenBegin token as an attribute which must be used as ToC text. See also: Using attribute value as ToC text.

doLinkToToken

syntax:  [0|1]
default: 1

True (1) if ToC must be linked to tokens, False (0) if not. Note that 'HTML::TocInsertor' must be used to do the actual insertion of the anchor name within the source data.

doLinkToFile

syntax:  [0|1]
default: 0

True (1) if ToC must be linked to file, False (0) if not. In effect only when doLinkToToken equals True (1) and templateAnchorHrefBegin isn't specified.

doLinkToId

syntax:  [0|1]
default: 0

True (1) if ToC must be linked to tokens by using token ids. False (0) if ToC must be linked to tokens by using anchor names.

doNestGroup

syntax:  [0|1]
default: 0

True (1) if groups must be nested in the formatted ToC, False (0) if not. In effect only when multiple groups are specified within the tokenToToc setting. For an example, see Generate multiple groups in one ToC.

doNumberToken

syntax:  [0|1]
default: 0

True (1) if tokens which are used for the ToC generation must be numbered. This option may be specified either as a global ToC option or within a tokenToToc group. When specified within a tokenToToc group, doNumberToken applies to that group only. For an example, see Specify an additional 'Part' group.

doSingleStepLevel

syntax:  [0|1]
default: 1

True (1) if levels of a formatted ToC must advance one level at a time. For example, when generating a ToC of a file with a missing '<h2>':

<h1>Header 1</h1>
<h3>Header 3</h3>

By default, an empty indentation level will be inserted in the ToC:

<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=#h-1>Header 1</a>
   <ul>
      <ul>
         <li><a href=#h-1.0.1>Header 3</a>
      </ul>
   </ul>
</ul>
<!-- End of generated Table of Contents -->

After specifying:

$toc->setOptions({'doSingleStepLevel' => 0});

the ToC will not have an indentation level inserted for level 2:

<!-- Table of Contents generated by Perl - HTML::Toc -->
<ul>
   <li><a href=#h-1>Header 1</a>
   <ul>
      <li><a href=#h-1.0.1>Header 3</a>
   </ul>
</ul>
<!-- End of generated Table of Contents -->

fileSpec

syntax:  <regexp>
default: undef

Specifies which files should match the current level. Valid only if doLinkToFile equals 1. For an example, see Site map.

footer

syntax:  $scalar
default: "\n<!-- End of generated Table of Contents -->\n"

String to output at the end of the ToC.

groupId

syntax:  $scalar
default: 'h'

Sets the group id attribute of a tokenGroup. With this attribute it's possible to divide the ToC into multiple groups. Each group has its own numbering scheme. For example, to generate a ToC of both normal headings and 'appendix' headings, specify the following ToC settings:

$toc->setOptions({
   'tokenToToc' => [{
      'tokenBegin' => '<h1 class=-appendix>'
   }, {
      'groupId'    => 'appendix',
      'tokenBegin' => '<h1 class=appendix>'
   }]
});

groupToToc

syntax:  <regexp>
default: '.*'

Determines which groups to use for generating the ToC. For example, to create a ToC for groups [a-b] only, specify:

'groupToToc' => '[a-b]'

This option is evaluated during both ToC generation and ToC formatting. This enables you to generate a ToC of all groups, but - after generating - format only specified groups:

$toc->setOptions({'groupToToc' => '.*'});
$tocGenerator->generate($toc, ...);
    # Get ToC of all groups
$fullToc = $toc->format();
    # Get ToC of 'appendix' group only
$toc->setOptions({'groupToToc' => 'appendix'});
$appendixToc = $toc->format();

header

syntax:  $scalar
default: "\n<!-- Table of Contents generated by Perl - HTML::Toc -->\n"

String to output at the beginning of the ToC.

insertionPoint

syntax:  [<before|after|replace>] <token>
default: 'after <body>'
token:   <[/]tag{ attribute=[-|@]<regexp>}> |
         <text regexp> |
         <declaration regexp> |
         <comment regexp>

Determines the point within the source where the ToC should be inserted. When specifying a start tag as the insertion point token, attributes to be included may be specified as well. Note that the attribute value must be specified as a regular expression. For example, to specify the <h1 class=header> tag as insertion point:

'<h1 class=^header$>'

Examples of valid 'insertionPoint' tokens are:

'<h1>'
'</h1>'
'<!-- ToC -->'
'<!ToC>'
'ToC will be placed here'

It is also possible to specify attributes to exclude, by prefixing the value with an attributeToExcludeToken, default a minus sign (-). For example, to specify the <h1> tag as insertion point, excluding all <h1 class=header> tags:

'<h1 class=-^header$>'

See also tokenBegin.

level

syntax:  number
default: 1

Number which identifies at which level the tokengroup should be incorporated into the ToC. See also: tokenToToc.

levelIndent

syntax:  number
default: 3

Sets the number of spaces each level will be indented, when formatting the ToC.

levelToToc

syntax:  <regexp>
default: '.*'

Determines which group levels to use for generating the ToC. For example, to create a ToC for levels 1-2 only, specify:

'levelToToc' => '[1-2]'

This option is evaluated during both ToC generation and ToC formatting. This enables you to generate a ToC of all levels, but - after generating - retrieve only specified levels:

$toc->setOptions({'levelToToc' => '.*'});
$tocGenerator->generate($toc, ...);
    # Get ToC of all levels
$fullToc = $toc->format();
    # Get ToC of level 1 only
$toc->setOptions({'levelToToc' => '1'});
$level1Toc = $toc->format();
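Because levelToToc is a regular expression rather than a numeric range, it can help to check which levels a given pattern would select. The following standalone sketch uses plain Perl matching only; it assumes an unanchored match against the level number, which may differ from how HTML::Toc applies the option internally, and the helper name selectLevels is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the levels (1..6) that a levelToToc-style
# pattern selects, assuming a plain, unanchored regexp match.
sub selectLevels {
    my ($levelToToc) = @_;
    return grep { $_ =~ /$levelToToc/ } 1 .. 6;
}

print join(',', selectLevels('[1-2]')), "\n";   # prints "1,2"
print join(',', selectLevels('.*')),    "\n";   # prints "1,2,3,4,5,6"
print join(',', selectLevels('1')),     "\n";   # prints "1"
```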

numberingStyle

syntax:  [decimal|lower-alpha|upper-alpha|lower-roman|upper-roman]
default: decimal

Determines which numbering style to use for a token group when doNumberToken is set to True (1). When specified as a main ToC option, the setting will be the default for all groups. When specified within a tokengroup, this setting will override any default for that particular tokengroup, e.g.:

$toc->setOptions({
   'doNumberToken' => 1,
   'tokenToToc' => [{
      'level'          => 1,
      'tokenBegin'     => '<h1>',
      'numberingStyle' => 'lower-alpha'
   }]
});

If roman style is specified, be sure to have the Roman module installed, available from http://www.perl.com/CPAN/modules/by-module/Roman.

templateAnchorName

syntax:  <expression|function reference>
default: '$groupId."-".$node'

Anchor name to use when doLinkToToken is set to True (1). The anchor name is passed to both templateAnchorHrefBegin and templateAnchorNameBegin. The template may be specified as either an expression or a function reference. The expression may contain the following variables:

$file
$groupId
$level
$node

If templateAnchorName is a function reference to a function returning the anchor name, like in:

$toc->setOptions({'templateAnchorName' => \&assembleAnchorName});

the function will be called with the following arguments:

$anchorName = assembleAnchorName($file, $groupId, $level, $node);
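A possible body for such a function is sketched below. The naming scheme (a 'toc_' prefix and underscores in place of dots) is an arbitrary illustration, not something the module prescribes; only the argument order is taken from the call shown above.

```perl
use strict;
use warnings;

# Illustrative callback: builds an anchor name from the group id and
# node number, e.g. group 'h' and node '1.0.1' give 'toc_h_1_0_1'.
# The naming scheme itself is an assumption made for this sketch.
sub assembleAnchorName {
    my ($file, $groupId, $level, $node) = @_;
    (my $safeNode = $node) =~ tr/./_/;   # replace dots with underscores
    return "toc_${groupId}_${safeNode}";
}

# Assumed usage, given an existing HTML::Toc object in $toc:
# $toc->setOptions({'templateAnchorName' => \&assembleAnchorName});
```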

templateAnchorHrefBegin

syntax:  <expression|function reference>
default: '"<a href=#$anchorName>"' or
         '"<a href=$file#$anchorName>"',
         depending on 'doLinkToFile' being 0 or 1 respectively.

Anchor reference begin token to use when doLinkToToken is set to True (1). The template may be specified as either an expression or a function reference. The expression may contain the following variables:

$file
$groupId
$level
$node
$anchorName

If templateAnchorHrefBegin is a function reference to a function returning the anchor, like in:

$toc->setOptions({'templateAnchorHrefBegin' => \&assembleAnchorHrefBegin});

the function will be called with the following arguments:

$anchorHrefBegin = &assembleAnchorHrefBegin(
   $file, $groupId, $level, $node, $anchorName
);

See also: templateAnchorName, templateAnchorHrefEnd.
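As a rough sketch, a function of this kind could quote the href value and add a per-level class. The class name 'toc-level-N' and the fallback to a bare fragment when no file is given are assumptions of this example, not documented module behaviour.

```perl
use strict;
use warnings;

# Illustrative callback: returns the opening <a> tag for a ToC link.
# Falls back to a same-document link when $file is undefined or empty.
sub assembleAnchorHrefBegin {
    my ($file, $groupId, $level, $node, $anchorName) = @_;
    my $target = (defined $file && length $file)
        ? "$file#$anchorName"
        : "#$anchorName";
    return qq{<a class="toc-level-$level" href="$target">};
}

# Assumed usage, given an existing HTML::Toc object in $toc:
# $toc->setOptions({'templateAnchorHrefBegin' => \&assembleAnchorHrefBegin});
```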

templateAnchorHrefEnd

syntax:  <expression|function reference>
default: '"</a>"'

Anchor reference end token to use when doLinkToToken is set to True (1). The template may be specified as either an expression or a function reference. If templateAnchorHrefEnd is a function reference to a function returning the anchor end, like in:

$toc->setOptions({'templateAnchorHrefEnd' => \&assembleAnchorHrefEnd});

the function will be called without arguments:

$anchorHrefEnd = &assembleAnchorHrefEnd;

See also: templateAnchorHrefBegin.

templateAnchorNameBegin

syntax:  <expression|function reference>
default: '"<a name=$anchorName>"'

Anchor name begin token to use when doLinkToToken is set to True (1). The template may be specified as either an expression or a function reference. The expression may contain the following variables:

$file
$groupId
$level
$node
$anchorName

If templateAnchorNameBegin is a function reference to a function returning the anchor name, like in:

$toc->setOptions({'templateAnchorNameBegin' => \&assembleAnchorNameBegin});

the function will be called with the following arguments:

$anchorNameBegin = assembleAnchorNameBegin(
    $file, $groupId, $level, $node, $anchorName
);

See also: templateAnchorName, templateAnchorNameEnd.

templateAnchorNameEnd

syntax:  <expression|function reference>
default: '"</a>"'

Anchor name end token to use when doLinkToToken is set to True (1). The template may be specified as either an expression or a function reference. If templateAnchorNameEnd is a function reference to a function returning the anchor end, like in:

$toc->setOptions({'templateAnchorNameEnd' => \&assembleAnchorNameEnd});

the function will be called without arguments:

$anchorNameEnd = &assembleAnchorNameEnd;

See also: templateAnchorNameBegin.

templateLevel

syntax:  <expression|function reference>
default: '"<li>$text\n"'

Expression to use when formatting a ToC node. The template may be specified as either an expression or a function reference. The expression may contain the following variables:

$level
$groupId
$node
$sequenceNr
$text

If templateLevel is a function reference to a function returning the ToC node, like in:

$toc->setOptions({'templateLevel' => \&AssembleTocNode});

the function will be called with the following arguments:

$tocNode = &AssembleTocNode(
    $level, $groupId, $node, $sequenceNr, $text
);
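One way such a function might look is sketched below: it mirrors the default '"<li>$text\n"' template but adds a class attribute derived from the group id and level. The class naming is an assumption made for illustration.

```perl
use strict;
use warnings;

# Illustrative callback: formats a single ToC node as a list item,
# tagging it with a class such as 'toc-h-2' (group 'h', level 2).
sub AssembleTocNode {
    my ($level, $groupId, $node, $sequenceNr, $text) = @_;
    return qq{<li class="toc-$groupId-$level">$text\n};
}

# Assumed usage, given an existing HTML::Toc object in $toc:
# $toc->setOptions({'templateLevel' => \&AssembleTocNode});
```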

templateLevelBegin

syntax:  <expression>
default: '"<ul>\n"'

Expression to use when formatting the beginning of a ToC level. See templateLevel for a list of available variables to use within the expression. For example, to give each ToC level a class name to use with Cascading Style Sheets (CSS), use the expression:

'"<ul class=toc_$groupId$level>\n"'

which will result in each ToC group being given a class name:

<ul class=toc_h1>
   <li>Header
</ul>

For an example, see Using CSS for ToC formatting.

templateLevelEnd

syntax:  <expression>
default: '"</ul>\n"'

Expression to use when formatting the end of a ToC level. See templateLevel for a list of available variables to use within the expression. The default expression is:

'"</ul>\n"'

For an example, see Using CSS for ToC formatting.

templateTokenNumber

syntax:  <expression|function reference>
default: '"$node &nbsp;"'

Token number to use when doNumberToken equals True (1). The template may be specified as either an expression or a function reference. The expression has access to the following variables:

$file
$groupId
$groupLevel
$level
$node
$toc

If templateTokenNumber is a function reference to a function returning the token number, like in:

$toc->setOptions({'templateTokenNumber' => \&assembleTokenNumber});

the function will be called with the following arguments:

$number = &assembleTokenNumber(
    $node, $groupId, $file, $groupLevel, $level, $toc
);
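The sketch below shows one conceivable implementation, which labels nodes of an 'appendix' group differently from all others. The group name and the output formats are assumptions chosen for this example; the argument order follows the call shown above.

```perl
use strict;
use warnings;

# Illustrative callback: returns the text placed before a numbered token.
# Nodes in the (hypothetical) 'appendix' group get an 'Appendix' prefix.
sub assembleTokenNumber {
    my ($node, $groupId, $file, $groupLevel, $level, $toc) = @_;
    return $groupId eq 'appendix'
        ? "Appendix $node: "
        : "$node. ";
}

# Assumed usage, given an existing HTML::Toc object in $toc:
# $toc->setOptions({
#    'doNumberToken'       => 1,
#    'templateTokenNumber' => \&assembleTokenNumber
# });
```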

tokenBegin

syntax:  <token>
default: '<h1>'
token:   <[/]tag{ attribute=[-|@]<regexp>}> |
         <text regexp> |
         <declaration regexp> |
         <comment regexp>

This scalar defines the token that will trigger text to be put into the ToC. Any start tag, end tag, comment, declaration or text string can be specified. Examples of valid 'tokenBegin' tokens are:

'<h1>'
'</end>'
'<!-- Start ToC entry -->'
'<!Start ToC entry>'
'ToC entry'

When specifying a start tag, attributes to be included may be specified as well. Note that the attribute value is used as a regular expression. For example, to specify the <h1 class=header> tag as tokenBegin:

'<h1 class=^header$>'

It is also possible to specify attributes to exclude, by prefixing the value with an attributeToExcludeToken, default a minus sign (-). For example, to specify the <h1> tag as tokenBegin, excluding all <h1 class=header> tags:

'<h1 class=-^header$>'

Also, you can specify here an attribute value which has to be used as ToC text, by prefixing the value with an attributeToTocToken, default an at sign (@). For example, to use the class value as ToC text:

'<h1 class=@>'

See Generate multiple ToCs for an elaborate example using the attributeToTocToken to generate a ToC of image alt attribute values.

See also: tokenEnd, tokenToToc.

tokenEnd

syntax:  $scalar
default: empty string ('') or end tag counterpart of 'tokenBegin' if 
         'tokenBegin' is a start tag

The 'tokenEnd' definition follows the same rules as tokenBegin.

See also: tokenBegin, tokenToToc.

tokenToToc

syntax:  [{array of hashrefs}]
default: [{
            'level'      => 1,
            'tokenBegin' => '<h1>'
         }, {
            'level'      => 2,
            'tokenBegin' => '<h2>'
         }, {
            'level'      => 3,
            'tokenBegin' => '<h3>'
         }, {
            'level'      => 4,
            'tokenBegin' => '<h4>'
         }, {
            'level'      => 5,
            'tokenBegin' => '<h5>'
         }, {
            'level'      => 6,
            'tokenBegin' => '<h6>'
         }]

This array of hashes defines the tokens that must act as ToC entries. Each hash describes one tokengroup, which may contain a groupId, level, numberingStyle, tokenBegin and tokenEnd identifier.
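To make the structure concrete, here is a hedged sketch of a non-default tokenToToc setting that combines the identifiers listed above: two heading levels in the default group plus a separately numbered 'appendix' group. The class names and the choice of 'upper-alpha' are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical option set: each hash in 'tokenToToc' is one tokengroup.
my %options = (
    'doNumberToken' => 1,
    'tokenToToc'    => [{
        # Default group 'h', level 1: <h1> tags not of class 'appendix'
        'level'      => 1,
        'tokenBegin' => '<h1 class=-appendix>',
    }, {
        # Default group 'h', level 2
        'level'      => 2,
        'tokenBegin' => '<h2>',
    }, {
        # Separate group with its own numbering scheme
        'groupId'        => 'appendix',
        'level'          => 1,
        'numberingStyle' => 'upper-alpha',
        'tokenBegin'     => '<h1 class=appendix>',
    }],
);

# Assumed usage, given an existing HTML::Toc object in $toc:
# $toc->setOptions(\%options);
```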

tokenUpdateBeginAnchorName

syntax:  <string>
default: '<!-- #BeginTocAnchorNameBegin -->';

This token marks the beginning of an anchor name, inserted by HTML::TocInsertor. This option is used by HTML::TocUpdator.

tokenUpdateEndAnchorName

syntax:  <string>
default: '<!-- #EndTocAnchorName -->';

This option is used by HTML::TocUpdator, to mark the end of an inserted anchor name.

tokenUpdateBeginNumber

syntax:  <string>
default: '<!-- #BeginTocNumber -->';

This option is used by HTML::TocUpdator to mark the beginning of an inserted number.

tokenUpdateEndNumber

syntax:  <string>
default: '<!-- #EndTocNumber -->';

This option is used by HTML::TocUpdator, to mark the end of an inserted number.

tokenUpdateBeginToc

syntax:  <string>
default: '<!-- #BeginToc -->';

This option is used by HTML::TocUpdator to mark the beginning of an inserted ToC.

tokenUpdateEndToc

syntax:  <string>
default: '<!-- #EndToc -->';

This option is used by HTML::TocUpdator, to mark the end of an inserted ToC.

Known issues

Cygwin

In order for the test files to run on Cygwin without errors, the 'UNIX' default text file type has to be selected during the Cygwin setup. When extracting the tar.gz file with WinZip the 'TAR file smart CR/LF conversion' has to be turned off via {Options|Configuration...|Miscellaneous} in order for the files 'toc.pod' and './manualTest/manualTest1.htm' to be left in UNIX format.

Nested anchors

HTML::Toc can only link to existing anchors if these anchors are placed outside of the ToC tokens. Otherwise a warning will be given. For example, generating a linked ToC of <h1> tokens of the following text:

<a name=foo><h1>Header</h1></a>

will work as expected, whereas:

<h1><a name=foo>Header</a></h1>

will yield the warning:

warning (1): Nested anchor '<a name=foo>' within anchor '<a name=h-1>'.

since anchor names aren't allowed to be nested according to the HTML 4.01 specification.

AUTHOR

Freddy Vulto <fvu@fvu.myweb.nl>

COPYRIGHT

Copyright (c) 2001 Freddy Vulto. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.