
NAME

WWW::HtmlUnit - Inline::Java based wrapper of the HtmlUnit v2.7 library

SYNOPSIS

use WWW::HtmlUnit;
my $webClient = WWW::HtmlUnit->new;
my $page = $webClient->getPage("http://google.com/");
my $f = $page->getFormByName('f');
my $submit = $f->getInputByName("btnG");
my $query  = $f->getInputByName("q");
$page = $query->type("HtmlUnit");
$page = $query->type("\n");

my $content = $page->asXml;
print "Result:\n$content\n\n";

DESCRIPTION

This is a wrapper around the HtmlUnit library (HtmlUnit version 2.5 for this release). It includes the HtmlUnit jar itself and its dependencies. All this library really does is find the jars and load them using Inline::Java.
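As a rough illustration of the "find the jars" step, here is a minimal sketch that collects the .jar files in a directory and joins them into a classpath string. The helper names find_jars and build_classpath are hypothetical and are not part of WWW::HtmlUnit; the module's own lookup may work differently.

```perl
use strict;
use warnings;
use Config;        # provides the platform path separator
use File::Spec;

# Hypothetical helper: return the full path of every .jar file in $dir,
# sorted by file name.
sub find_jars {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = sort grep { /\.jar\z/i } readdir $dh;
    closedir $dh;
    return map { File::Spec->catfile($dir, $_) } @names;
}

# Hypothetical helper: join jar paths into one classpath string,
# using ":" on Unix-like systems and ";" on Windows.
sub build_classpath {
    my (@jars) = @_;
    return join $Config{path_sep}, @jars;
}
```

A string built this way is the kind of value that Inline::Java accepts through its CLASSPATH option, for example build_classpath(find_jars('/path/to/jars')).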

The reason all this is interesting? HtmlUnit has very good JavaScript support, so you can automate, scrape, or test websites that require JavaScript.

For deeper API documentation, see especially the HtmlUnit documentation on their site: http://htmlunit.sourceforge.net/apidocs/.

INSTALLING

There is one problem that I ran into when installing Inline::Java, and thus WWW::HtmlUnit: telling the installer where to find your Java home. It turns out this is really easy: just define the JAVA_HOME environment variable before you start your CPAN shell / installer. I do this in Debian/Ubuntu:

apt-get install sun-java6-jdk
JAVA_HOME=/usr/lib/jvm/java-6-sun cpan WWW::HtmlUnit

and everything works the way I want! I should submit a patch to the error message that Inline::Java spits out...
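If you want to catch a missing JAVA_HOME before the installer does, a small check along these lines may help. This is only a sketch: the helper name java_home_problem is hypothetical, and the test for a bin directory is an assumption about a typical JDK layout, not something Inline::Java is known to check.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: describe what looks wrong with the given
# JAVA_HOME value, or return undef if it looks usable.
sub java_home_problem {
    my ($java_home) = @_;
    return 'JAVA_HOME is not set'
        unless defined $java_home && length $java_home;
    return "JAVA_HOME ($java_home) is not a directory"
        unless -d $java_home;
    # Assumption: a JDK keeps its executables under "bin".
    my $bin = File::Spec->catdir($java_home, 'bin');
    return "no bin directory under $java_home" unless -d $bin;
    return;
}

# Warn (but do not die) if the current environment looks wrong.
if (defined(my $problem = java_home_problem($ENV{JAVA_HOME}))) {
    warn "$problem; set it before starting the CPAN shell\n";
}
```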

MODULE IMPORT PARAMETERS

If you need to include extra .jar files, you can do:

use WWW::HtmlUnit jars => ['/path/to/blah.jar'];

and that will be added to the list of jars for Inline::Java to autostudy.

METHODS

$webClient = WWW::HtmlUnit->new($browser_name)

This is just a shortcut for

$webClient = WWW::HtmlUnit::com::gargoylesoftware::htmlunit::WebClient->new;

The optional $browser_name allows you to specify which browser version to pass to the WebClient->new method. You could pass "FIREFOX_3" for example, to make the engine especially try to emulate Firefox 3 quirks, I imagine.
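The long package name in the shortcut above is the Java class name com.gargoylesoftware.htmlunit.WebClient with "." replaced by "::" and placed under the WWW::HtmlUnit namespace. Assuming other loaded classes follow the same pattern, a small helper can sketch the mapping. The name perl_package_for is hypothetical and not part of the module, and nested Java classes are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a Java class name into the Perl package
# name by prefixing WWW::HtmlUnit and swapping "." for "::".
sub perl_package_for {
    my ($java_class) = @_;
    (my $suffix = $java_class) =~ s/\./::/g;
    return "WWW::HtmlUnit::$suffix";
}

print perl_package_for('com.gargoylesoftware.htmlunit.WebClient'), "\n";
# prints WWW::HtmlUnit::com::gargoylesoftware::htmlunit::WebClient
```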

DEPENDENCIES

When installed using the CPAN shell, all dependencies besides Java itself will be installed. This includes the HtmlUnit jar files, and in fact those files make up the bulk of the distribution.

TIPS

How do I do HTTP authentication?

my $credentialsProvider = $webClient->getCredentialsProvider;
$credentialsProvider->addCredentials($username, $password);

How do I turn off SSL certificate checking?

$webClient->setUseInsecureSSL(1);

TODO

  • Capture HtmlUnit output to a variable

  • Use that to have a quiet-mode

SEE ALSO

http://htmlunit.sourceforge.net/, Inline::Java

AUTHOR

Brock Wilcox <awwaiid@thelackthereof.org> - http://thelackthereof.org/

COPYRIGHT

Copyright (c) 2009 Brock Wilcox <awwaiid@thelackthereof.org>. All rights
reserved.  This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

HtmlUnit library includes the following copyright:

    Copyright (c) 2002-2009 Gargoyle Software Inc.

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.