NAME

HTML::Embedded::Turtle - embedding RDF in HTML the crazy way

SYNOPSIS

use HTML::Embedded::Turtle;

my $het = HTML::Embedded::Turtle->new($html, $base_uri);
foreach my $graph ($het->endorsements)
{
  my $model = $het->graph($graph);
  
  # $model is an RDF::Trine::Model. Do something with it.
}

DESCRIPTION

RDF can be embedded in (X)HTML using simple <script> tags. This is described at http://esw.w3.org/N3inHTML. This gives you a file format that can contain multiple (optionally named) graphs. The document as a whole can "endorse" a graph by including:

<link rel="meta" href="#foo" />

Where "#foo" is a fragment identifier pointing to a graph.

<script type="text/turtle" id="foo"> ... </script>

The rel="meta" stuff is parsed using an RDFa parser, so equivalent RDFa works too.

This module parses HTML files containing graphs like these, and allows you to access them each individually; as a union of all graphs on the page; or as a union of just the endorsed graphs.

Despite the module name, this module supports a variety of <script type>s: text/turtle, application/turtle, application/x-turtle text/plain (N-Triples), text/n3 (Notation 3), application/x-rdf+json (RDF/JSON), application/json (RDF/JSON), and application/rdf+xml (RDF/XML).

The deprecated attribute "language" is also supported:

<script language="Turtle" id="foo"> ... </script>

Languages supported are (case insensitive): "Turtle", "NTriples", "RDFJSON", "RDFXML" and "Notation3".

Constructor

HTML::Embedded::Turtle->new($markup, $base_uri, \%opts)

Create a new object. $markup is the HTML or XHTML markup to parse; $base_uri is the base URI to use for relative references.

Options include:

  • markup

    Choose which parser to use: 'html' or 'xml'. The former chooses HTML::HTML5::Parser, which can handle tag soup; the latter chooses XML::LibXML, which cannot. Defaults to 'html'.

  • rdfa_options

    A set of options to be parsed to RDF::RDFa::Parser when looking for endorsements. See RDF::RDFa::Parser::Config. The default is probably sensible.

Public Methods

union_graph

A union graph of all graphs found in the document, as an RDF::Trine::Model. Note that the returned model contains quads.

endorsed_union_graph

A union graph of only the endorsed graphs, as an RDF::Trine::Model. Note that the returned model contains quads.

graph($name)

A single graph from the page.

graphs
all_graphs

A hashref where the keys are graph names and the values are RDF::Trine::Models. Some graph names will be URIs, and others may be blank nodes (e.g. "_:foobar").

graphs and all_graphs are aliases for each other.

endorsed_graphs

Like all_graphs, but only returns endorsed graphs. Note that all endorsed graphs will have graph names that are URIs.

endorsements

Returns a list of URIs which are the names of endorsed graphs. Note that the presence of a URI $x in this list does not imply that $het->graph($x) will be defined.

dom

Returns the page DOM.

uri

Returns the page URI.

BUGS

Please report any bugs to http://rt.cpan.org/.

Please forgive me in advance for inflicting this module upon you.

SEE ALSO

RDF::RDFa::Parser, RDF::Trine, RDF::TriN3.

http://www.perlrdf.org/.

AUTHOR

Toby Inkster <tobyink@cpan.org>.

COPYRIGHT AND LICENSE

Copyright (C) 2010-2011, 2013 by Toby Inkster.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

DISCLAIMER OF WARRANTIES

THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.