NAME
xpathify - output HTML document as a flat XPath/content list
VERSION
version 0.019
SYNOPSIS
xpathify [options] (HTML file | URL | -)
DESCRIPTION
Represents an HTML document as a verbose two-column listing. The first column is an XPath expression that locates each element inside the HTML tree. The second column is the corresponding content (if any).
/html/head/title/text() test 1
/html/body/h1/text() test 2
/html/body/p[1]/text() Lorem ipsum dolor sit amet, consectetur adipiscing elit.
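The listing above can be approximated with a short Python sketch. This is only an illustration of the idea, not xpathify's actual implementation: the `Node`, `TreeBuilder` and `flatten` names are hypothetical, the parser is Python's standard `html.parser`, and the rule of adding a positional index only when same-named siblings exist is an assumption inferred from the sample output.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay open.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children = []  # Node instances and text strings, in document order


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree; assumes reasonably well-nested markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("")  # synthetic document root
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():  # ignore whitespace-only text
            self.current.children.append(data.strip())


def flatten(html):
    """Return a list of (xpath, content) pairs for every text node."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    rows = []

    def walk(node, path):
        def name_of(child):
            return child.tag if isinstance(child, Node) else "text()"

        totals = Counter(name_of(c) for c in node.children)
        seen = Counter()
        for child in node.children:
            name = name_of(child)
            seen[name] += 1
            # Index a step only when same-named siblings make it ambiguous.
            step = name if totals[name] == 1 else f"{name}[{seen[name]}]"
            if isinstance(child, Node):
                walk(child, f"{path}/{step}")
            else:
                rows.append((f"{path}/{step}", child))

    walk(builder.root, "")
    return rows


SAMPLE = ("<html><head><title>test 1</title></head>"
          "<body><h1>test 2</h1><p>Lorem ipsum</p><p>dolor sit amet</p></body></html>")

for xpath, content in flatten(SAMPLE):
    print(f"{xpath}\t{content}")
```

The tab separator in the final loop is an arbitrary choice for the sketch; the real tool's column layout may differ.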
OPTIONS
- --help
Show this help message.
- --encoding=name
Specify the HTML document encoding (latin1, utf8). UTF-8 is assumed by default.
- --[no]color
Enable syntax highlighting for XPath expressions. By default, highlighting is enabled automatically on interactive terminals.
- --16
Use the 16 system colors. By default, xpathify tries to use the 256-color ANSI palette.
- --[no]html
Disables the --color option and highlights using HTML/CSS.
- --[no]shrink
Shrink the XPath to the minimal unique identifier. For example:
/html/body[@id='cpansearch']/form[@class='searchbox']/input[@name='query']
can be shortened to:
//input[@name='query']
Shrinking is enabled by default.
- --[no]strict
Strict mode disables grouping by id, class or name attributes. Grouping is enabled by default.
- --[no]weight
Print the XPath weight in a second column.
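The shortening described under --shrink can be illustrated with a small Python sketch. It is one possible reading of "minimal unique identifier", not the algorithm xpathify actually uses: the hypothetical `shrink` function tries ever longer suffixes of the full path, anchored with `//`, and keeps the first one that matches exactly one element. It relies on the limited XPath subset of Python's `xml.etree.ElementTree`, so the sample document has to be well-formed XML; real-world HTML would need a proper HTML parser.

```python
import xml.etree.ElementTree as ET


def shrink(root, steps):
    """Return the shortest '//'-anchored suffix of steps matching one element.

    steps is the full path split into location steps, root element first.
    """
    # Try the last step alone, then the last two, and so on.
    for start in range(len(steps) - 1, 0, -1):
        suffix = "/".join(steps[start:])
        if len(root.findall(".//" + suffix)) == 1:
            return "//" + suffix
    # No shorter suffix is unique, so keep the full path.
    return "/" + "/".join(steps)


# Hypothetical sample document, loosely modelled on the example above.
DOC = """<html>
  <body id="cpansearch">
    <form class="searchbox">
      <input name="query"/>
      <input name="lang"/>
    </form>
    <form class="login">
      <input name="user"/>
      <input name="lang"/>
    </form>
  </body>
</html>"""

root = ET.fromstring(DOC)

unique = shrink(root, ["html", "body[@id='cpansearch']",
                       "form[@class='searchbox']", "input[@name='query']"])
ambiguous = shrink(root, ["html", "body[@id='cpansearch']",
                          "form[@class='login']", "input[@name='lang']"])
print(unique)
print(ambiguous)
```

In this sample the `query` input is unique on its own, while the `lang` input appears in both forms and therefore needs its parent `form` step to stay unambiguous.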
EXAMPLES
xpathify http://metacpan.org
curl http://www.msn.com | xpathify -c --strict -
xpathify --nocolor --noshrink t/test.html
AUTHOR
Stanislaw Pusep <stas@sysd.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2014 by Stanislaw Pusep.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.