NAME

Pod::Simple::Words - Parse words and locations from a POD document

VERSION

version 0.07

SYNOPSIS

use Pod::Simple::Words;

my $parser = Pod::Simple::Words->new;

$parser->callback(sub {
  my($type, $filename, $line, $input) = @_;

  if($type eq 'word')
  {
    # $input is a human language word
  }
  elsif($type eq 'stopword')
  {
    # $input is a stopword in tech speak
  }
  elsif($type eq 'module')
  {
    # $input is a CPAN module (e.g. FFI::Platypus)
  }
  elsif($type eq 'url_link')
  {
    # $input   is the URL
  }
  elsif($type eq 'pod_link')
  {
    my($podname, $section) = @$input;
    # $podname is the POD document (undef for current)
    # $section is the section      (can be undef)
  }
  elsif($type eq 'man_link')
  {
    my($manname, $section) = @$input;
    # $manname is the MAN document
    # $section is the section      (can be undef)
  }
  elsif($type eq 'section')
  {
    # $input is the name of a documentation section
  }
  elsif($type eq 'error')
  {
    # $input is a POD error
  }
});

$parser->parse_file('lib/Foo.pm');

DESCRIPTION

This Pod::Simple parser extracts words from POD, along with their location information. Some other event types are supported for convenience. The intention is to feed the output into a spell checker. Note:

stopwords

This module recognizes inlined stopwords. These are words that shouldn't be considered misspelled for the POD document.

head1 is normalized to lowercase

Since the convention is to uppercase =head1 elements in POD, and most spell checkers consider this a spelling error, we convert =head1 elements to lower case.

comments in verbatim blocks

Comments are extracted from verbatim blocks and their words are included, because misspelled words in the synopsis comments can be embarrassing!

unicode

Unicode should be handled correctly, provided the =encoding directive is set correctly.
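The stopwords note above can be illustrated with a minimal sketch. It assumes that inline stopwords are declared with a "=for stopwords" paragraph (the convention used by Pod::Spell) and that the document is passed in through parse_string_document, which is inherited from Pod::Simple. The word "frobnicate" is invented for illustration.

```perl
use strict;
use warnings;
use Pod::Simple::Words;

# A tiny POD document built in memory. "frobnicate" is a made-up word,
# declared as a stopword (assumption: "=for stopwords" is the accepted form).
my $pod = join "\n",
  '=pod',
  '',
  '=for stopwords frobnicate',
  '',
  'This will frobnicate the input.',
  '',
  '=cut',
  '';

# Group everything the parser reports by event type.
my %seen;

my $parser = Pod::Simple::Words->new;
$parser->callback(sub {
  my($type, $filename, $line, $input) = @_;
  push @{ $seen{$type} }, $input;
});
$parser->parse_string_document($pod);

# Words declared as stopwords arrive with the type 'stopword'.
print "stopword: $_\n" for @{ $seen{stopword} || [] };
```

A spell checker would typically add each reported stopword to its list of accepted words for the current document.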

CONSTRUCTOR

new

my $parser = Pod::Simple::Words->new;

This creates an instance of the parser.

PROPERTIES

callback

$parser->callback(sub {
  my($type, $filename, $line, $input) = @_;
  ...
});

This defines the callback that is invoked when specific input items are found. Types:

word

Regular human language word.

stopword

Word that should not be considered misspelled. This is often for technical jargon which is spelled correctly but not in the regular human language dictionary.

module

CPAN Perl module. Of the form Foo::Bar. As a special case Foo::Bar's is recognized as the possessive of the Foo::Bar module.

url_link

A regular internet URL link.

pod_link

my($podname, $section) = @$input;

A link to another POD document. Usually a module or a script. The $podname is the name of the pod document to link to. If this is undef, it means that the link is to a section inside the current document. The $section is the section of the document to link to. The $section will be undef if not linking to a specific section.

man_link

my($manname, $section) = @$input;

A link to a UNIX man page. The $manname is the name of the man page. The $section is the section of the man page to link to, which will be undef if not linking to a specific section.

section

A section inside of the current document which can be linked to externally or internally. This is usually the title of a header like =head1, =head2, etc.

error

An error that was detected during parsing. This allows the spell checker to check the correctness of the POD at the same time if it so chooses.

Beyond these, arbitrary additional types can be added via the splitter class.
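To show how the event types above might be combined, here is a rough sketch of a toy spell check. The %dictionary hash is a hypothetical stand-in for a real dictionary, the sample text and its misspelling "exampel" are invented, and parse_string_document is the Pod::Simple method for parsing an in-memory document.

```perl
use strict;
use warnings;
use Pod::Simple::Words;

# Hypothetical dictionary; a real checker would consult a proper word list.
my %dictionary = map { $_ => 1 } qw( description this is a simple of prose );

my %stopwords;    # words the document itself declares as acceptable
my @candidates;   # every human language word, with its location

my $parser = Pod::Simple::Words->new;
$parser->callback(sub {
  my($type, $filename, $line, $input) = @_;
  if($type eq 'stopword')
  {
    $stopwords{lc $input} = 1;
  }
  elsif($type eq 'word')
  {
    push @candidates, { word => $input, line => $line };
  }
  # other types (module, url_link, section, ...) are ignored in this sketch
});

my $pod = join "\n",
  '=head1 DESCRIPTION',
  '',
  'This is a simple exampel of prose.',
  '',
  '=cut',
  '';
$parser->parse_string_document($pod);

# Anything in neither the dictionary nor the stopword list is suspicious.
my @misspelled = grep {
  my $w = lc $_->{word};
  !$dictionary{$w} && !$stopwords{$w}
} @candidates;

printf "line %s: %s\n", $_->{line} // '?', $_->{word} for @misspelled;
```

Checking stopwords only after the whole document has been parsed means a stopword declaration also covers words that appear before it in the file.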

splitter

$parser->splitter($splitter);

The $splitter is an instance of Text::HumanComputerWords, or something that implements a split method exactly like it does. It is used to split text into human and computer words. The default is reasonable for Perl.
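As a minimal sketch of setting the splitter explicitly: the constructor arguments and the default_perl class method below follow my reading of the Text::HumanComputerWords documentation and should be treated as assumptions, as should the expectation that split returns a list of [type, word] pairs.

```perl
use strict;
use warnings;
use Pod::Simple::Words;
use Text::HumanComputerWords;

# Assumption: default_perl returns the type/matcher pairs suited to Perl docs.
my $splitter = Text::HumanComputerWords->new(
  Text::HumanComputerWords->default_perl,
);

my $parser = Pod::Simple::Words->new;
$parser->splitter($splitter);

# The splitter can also be exercised on its own to see how text is classified.
# Assumption: each returned element is an array reference of [type, word].
foreach my $pair ($splitter->split('see https://metacpan.org for FFI::Platypus'))
{
  my($type, $word) = @$pair;
  print "$type: $word\n";
}
```

Any object with a compatible split method could be passed instead, which is how a parser for another language's documentation conventions might be plugged in.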

METHODS

skip_sections

$parser->skip_sections(@sections);

Skip the given =head1 level sections. Note that words from the section header itself will be included, but the content of the section will not. This is useful for skipping CONTRIBUTOR or similar sections which are usually mostly names and shouldn't be spell checked against a human language dictionary.
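A small sketch of skipping a section of names follows. The section title, the sample document and the name "Zzyzx" are invented for illustration, and the document is parsed with parse_string_document from Pod::Simple.

```perl
use strict;
use warnings;
use Pod::Simple::Words;

my @words;

my $parser = Pod::Simple::Words->new;
$parser->callback(sub {
  my($type, $filename, $line, $input) = @_;
  push @words, $input if $type eq 'word';
});

# Ask the parser to ignore the body of this =head1 section.
$parser->skip_sections('CONTRIBUTORS');

my $pod = join "\n",
  '=head1 DESCRIPTION',
  '',
  'Some ordinary prose.',
  '',
  '=head1 CONTRIBUTORS',
  '',
  'Zzyzx',
  '',
  '=cut',
  '';
$parser->parse_string_document($pod);

# Words from the header itself may still appear; the names below it should not.
print "$_\n" for @words;
```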

SEE ALSO

Pod::Spell

and other modules do similar parsing of POD for potentially misspelled words, at least internally. They usually explicitly exclude comments from verbatim blocks, and often split words on the wrong boundaries.

AUTHOR

Graham Ollis <plicease@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2021 by Graham Ollis.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.