NAME Catalyst::Plugin::BigSitemap - Auto-generated Sitemaps for up to 2.5 billion URLs.
DESCRIPTION
A nearly drop-in replacement for Catalyst::Plugin::Sitemap that builds a Sitemap Index file as well as your normal Sitemap files (to support websites with more than 50,000 URLs). Some of the code for this plugin was forked from Catalyst::Plugin::Sitemap.
Additionally, this plugin can store your sitemap files on disk once they are built, and can automatically rebuild them for you at a specified interval.
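The 2.5 billion figure follows from the sitemap protocol's limits: a single sitemap file may list at most 50,000 URLs, and a sitemap index may reference at most 50,000 sitemap files. A minimal sketch of that arithmetic follows; the constant and function names are made up for illustration and are not part of this plugin.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Limits from the sitemap protocol (sitemaps.org)
use constant URLS_PER_SITEMAP   => 50_000;
use constant SITEMAPS_PER_INDEX => 50_000;

# Hypothetical helper: how many sitemap files a given URL count needs
sub sitemaps_needed {
    my ($url_count) = @_;
    return ceil( $url_count / URLS_PER_SITEMAP );
}

# Upper bound for one sitemap index: 50,000 * 50,000 URLs
my $max_urls = URLS_PER_SITEMAP * SITEMAPS_PER_INDEX;
print "max urls per index: $max_urls\n";

# 120,001 URLs spill over into a third sitemap file
print "files for 120001 urls: ", sitemaps_needed(120_001), "\n";
```

A site with 50,000 URLs or fewer fits in one sitemap file, which is the case where serving sitemap(0) on its own is enough.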
SYNOPSIS
#
# Actions you want included in your sitemap. In this example, there's a total of 10 urls that will be written
#
sub single_url_action :Local :Args(0) :Sitemap() { ... }
sub single_url_with_attrs : Local :Args(0) :Sitemap( loc => 'http://www.mysite/here', changefreq => 'daily', priority => '0.5' ) { ... }
sub multiple_url_action :Local :Args(1) :Sitemap('*') { ... }
sub multiple_url_action_sitemap {
    my ( $self, $c, $sitemap ) = @_;
    # $a is reserved for sort(), so use a descriptive name instead
    my $action = $c->controller('MyController')->action_for('multiple_url_action');
    for my $i ( 0 .. 7 ) {
        my $uri = $c->uri_for( $action, [ $i ] );
        $sitemap->add( $uri );
    }
}
#
# Action to rebuild your sitemap -- you want to protect this!
# Best thing to do would be manually instantiate an instance of your
# application from the cron job, mark this method private and call it.
# You could also go crazy and use WWW::Mechanize .. or hell.. leave it
# public and call it from your browser.. your call. I wouldn't do that,
# though ;)
# Your old sitemap files will automatically be overwritten.
#
sub rebuild_cache :Private {
    my ( $self, $c ) = @_;
    $c->write_sitemap_cache();
}
#
# Serving the sitemap files is best done directly through Apache.
# Newer versions of Catalyst have deprecated regex actions, which
# makes serving sitemap files a little more difficult (though you
# can still manually include support for regex actions).
#
# Also, if you only have a single sitemap, and want to use this like
# Catalyst::Plugin::Sitemap, see sub single_sitemap below.
#
sub sitemap_index :Private {
    my ( $self, $c ) = @_;
    my $smi_xml = $c->sitemap_builder->sitemap_index->as_xml;
    $c->response->body( $smi_xml );
}
sub single_sitemap :Private {
    my ( $self, $c ) = @_;
    my $sm_xml = $c->sitemap_builder->sitemap(0)->as_xml;
    $c->response->body( $sm_xml );
}
CONFIGURATION
There are a few configuration settings that must be set for this plugin to function properly. Additionally, unless you have a relatively small sitemap, I would HIGHLY recommend that you not serve the sitemaps directly from your application.
- cache_dir - required
  The absolute filesystem path to the directory where your sitemap files will be stored.
- url_base - optional: defaults to whichever base URL the request is made to
  This is the base URL that will be used when building the URLs for your application.
  Note: This is especially important if your rebuild is launched by a cron job that makes a request to localhost. In that case, if you fail to specify this setting, all your URLs will resolve to http://localhost/my-action-here/ ... This probably doesn't help you.
  Note: The trailing slash is important!
- sitemap_name_format - optional: defaults to sitemap%d.xml.gz
  A sprintf format string. Your sitemaps will be numbered beginning with 1 up through the total number of sitemaps needed to hold your data. By default, this produces names such as sitemap1.xml.gz, sitemap2.xml.gz, and so on.
  Note: The file extension should be either .xml or .xml.gz. The proper type of file will be built depending on which extension you specify.
- sitemap_index_name - optional: defaults to sitemap_index.xml
  Note: Just like with sitemap_name_format, .xml or .xml.gz should be specified as the file extension.
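To illustrate how a sprintf-style name format expands, here is a small sketch using only core Perl. The helpers sitemap_names and is_gzipped are hypothetical and written for this example; they only mirror the naming and extension rules described above and are not the plugin's own code.

```perl
use strict;
use warnings;

# The plugin's documented default format
my $format = 'sitemap%d.xml.gz';

# Hypothetical helper: expand the format for sitemaps 1 .. $count
sub sitemap_names {
    my ( $fmt, $count ) = @_;
    return map { sprintf( $fmt, $_ ) } 1 .. $count;
}

# Hypothetical helper: the extension decides between gzip and plain XML
sub is_gzipped {
    my ($name) = @_;
    return $name =~ /\.xml\.gz\z/ ? 1 : 0;
}

# Prints sitemap1.xml.gz, sitemap2.xml.gz and sitemap3.xml.gz, one per line
print "$_\n" for sitemap_names( $format, 3 );
```

Changing the format to sitemap%d.xml would yield sitemap1.xml and so on, and is_gzipped would then return 0 for those names.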
Config::General Example
<Plugin::BigSitemap>
    cache_dir /var/www/myapp/root/sitemaps
    url_base http://mywebsite/
    sitemap_name_format sitemap%d.xml.gz
    sitemap_index_name sitemap_index.xml
</Plugin::BigSitemap>
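If you keep your configuration in Perl rather than in a Config::General file, the same settings would presumably go under the 'Plugin::BigSitemap' key of the application config, following the usual Catalyst convention. The sketch below assumes a hypothetical application class named MyApp and reuses the values from the example above.

```perl
package MyApp;    # hypothetical application class

use Moose;
use namespace::autoclean;

# Loads Catalyst::Plugin::BigSitemap
use Catalyst qw/ BigSitemap /;
extends 'Catalyst';

# Same settings as the Config::General example, expressed as a Perl hash
__PACKAGE__->config(
    'Plugin::BigSitemap' => {
        cache_dir           => '/var/www/myapp/root/sitemaps',
        url_base            => 'http://mywebsite/',
        sitemap_name_format => 'sitemap%d.xml.gz',
        sitemap_index_name  => 'sitemap_index.xml',
    },
);

__PACKAGE__->setup;

1;
```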
ATTRIBUTES
sitemap_builder
A lazy-loaded Catalyst::Plugin::BigSitemap::SitemapBuilder object. If you want access to the individual WWW::Sitemap::XML objects or the WWW::SitemapIndex::XML object, you'll get them through this object.
METHODS
- write_sitemap_cache()
  Writes your sitemap_index and sitemap files to whichever cache_dir you've specified in your configuration.
INTERNAL USE METHODS
Methods you shouldn't be calling directly. They're listed here for documentation purposes.
- _get_sitemap_builder()
  Returns a sitemap builder object that's fully populated with all the registered sitemap URLs.
  You shouldn't ever need to call this directly -- it's set as the builder method for the sitemap_builder attribute.
  Note: This can take an incredibly long time, depending on how many URLs you're registering with the sitemap and how they're being generated. Use with care!
SEE ALSO
AUTHOR
Derek J. Curtis <djcurtis@summersetsoftware.com>
Summerset Software, LLC
http://www.summersetsoftware.com
COPYRIGHT
Derek J. Curtis 2013
LICENSE
This library is free software. You can redistribute it and/or modify it under the same terms as Perl itself.