NAME
simple_scan - scan a set of Web pages for strings present/absent
ABSTRACT
App::SimpleScan - Mini-language for website testing
SYNOPSIS
simple_scan [--generate] [--run]
[--define key="value value ..." ] [--override] [--defer]
[--debug]
[--warn]
[--no-agent]
[--autocache]
[--status]
{file file file ...}
USAGE
# Run the tests in the files supplied on the command line.
# --run (or -run; we're flexible) is assumed if you give no switches.
% simple_scan file1 file2 file3
# Generate a set of tests and save them, then run them.
% <complex pipe> | simple_scan --generate > pipe_scan.t
# Run one simple test
% echo "http://yahoo.com yahoo Y Look for yahoo.com" | simple_scan -run
DESCRIPTION
simple_scan is an extensible "little language" for static web page testing. It allows you to define tests in terms of test specs (which tell simple_scan where to go and what to look for there) and pragmas (which define string substitutions, or alter the way that simple_scan runs its tests).
simple_scan is designed to be easy to use. If you know where your page is (what URL) and can write a basic regular expression to match text on that page, you can use simple_scan.
simple_scan itself is based on a pluggable Perl class; more sophisticated users can install plugins to extend the language itself, or even the command-line options that the simple_scan command accepts.
Low-level access to web pages is done via WWW::Mechanize::Pluggable and Test::WWW::Simple, so it's even possible to build new methods to access your data into the language by writing plugins for Mech and simple_scan.
simple_scan is meant to be a simple web testing language, so it doesn't implement any control structures. You declare what tests are to be run, and simple_scan then runs them all, telling you at the end which tests passed and which didn't. It uses TAP (Test Anything Protocol) to report on the tests, meaning that any Test::Harness-based program can read and interpret the output.
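To give a feel for the report format, here is a minimal Python sketch that formats pass/fail results as TAP. It is not simple_scan's own code (which is Perl); the function name tap_report and the sample results are assumptions made for illustration.

```python
def tap_report(results):
    """Format (passed, description) pairs as a TAP report.

    Illustrative only: the plan line comes first, then one
    "ok"/"not ok" line per test, numbered from 1.
    """
    lines = [f"1..{len(results)}"]
    for number, (passed, description) in enumerate(results, start=1):
        status = "ok" if passed else "not ok"
        lines.append(f"{status} {number} - {description}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical results for two test specs.
    print(tap_report([(True, "Look for yahoo.com"), (False, "look for blargh (es)")]))
```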
BASICS
simple_scan reads either the files supplied on the command line or standard input. It creates a Test::WWW::Simple test for the criteria supplied to it, and then runs the test, prints it, or both.
simple_scan test specs should be in the following format, starting in column 1:
<URL> <pattern> <Y|N> <comment>
The URL is any URL; pattern is a Perl regular expression, delimited by slashes; Y|N is Y if the pattern should match, or N if the pattern should not match; and comment is any arbitrary text you like (as long as it's all on the same line as everything else).
simple_scan will do its best to interpret your pattern; if it can't parse it as a regular expression, it will assume you meant to match against a literal character string instead; so a pattern like
/<b>this</b>/
would be interpreted as the literal string "<b>this</b>".
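The field layout above can be sketched in Python as follows. This is an illustration of the line format only, not simple_scan's actual parser: the names Spec, SPEC_RE, and parse_spec are hypothetical, and the sketch does not attempt the regex-or-literal fallback described above.

```python
import re
from typing import NamedTuple, Optional


class Spec(NamedTuple):
    url: str
    pattern: str        # kept exactly as written, slashes included
    should_match: bool  # True for Y, False for N
    comment: str


# <URL> <pattern> <Y|N> <comment>
# The pattern is either slash-delimited (and may contain spaces and
# trailing flags such as "i") or a single bare word.
SPEC_RE = re.compile(r"^(\S+)\s+(/.*/[a-z]*|\S+)\s+([YN])\s+(.*)$")


def parse_spec(line: str) -> Optional[Spec]:
    """Split one test-spec line into its four fields, or return None."""
    match = SPEC_RE.match(line)
    if match is None:
        return None
    url, pattern, flag, comment = match.groups()
    return Spec(url, pattern, flag == "Y", comment)


if __name__ == "__main__":
    print(parse_spec("http://yahoo.com yahoo Y Look for yahoo.com"))
```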
COMMAND-LINE SWITCHES
We use Getopt::Long to get the command-line options, so we're really very flexible as to how they're entered. You can use either one dash (as in -foo) or two (as in --bar). You only need to enter the minimum number of characters to match a given switch.
--run
--run tells simple_scan to immediately run the tests it's created. Can be abbreviated to -r.
This option is most useful for one-shot tests that you're not planning to run repeatedly.
--generate
--generate tells simple_scan to print the test it's generated on the standard output.
This option is useful to build up a test suite to be reused later.
Both -r and -g can be specified at the same time to run a test and print it simultaneously; this is useful when you want to save a test to be run later as well as right now without having to regenerate the test.
--define
--define allows you to predefine substitutions to be used during a simple_scan run. To define a substitution, use this syntax:
--define foo=bar --define baz="one two three"
The first example defines a substitution with a single value; the second defines a substitution with multiple values. In conjunction with --override, --define can make simple_scan ignore any definitions for variables in the simple_scan input file. Conversely, if --defer is specified, any definitions on the command line will be altered if a definition for the variable is found in the input file.
Note that %%forget can still make simple_scan forget a definition (if App::SimpleScan::Plugin::Forget is installed).
Also note that you define a variable with multiple values like this:
--define foo="bar baz quux"
but not like this:
--define foo=bar --define foo=baz --define foo=quux
since multiple definitions of a single substitution use only the last substitution defined; the example directly above (with the three "--define" entries) defines "foo" as "quux" and only as "quux".
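The "last definition wins" rule can be modeled with a short Python sketch. It assumes the arguments have already been split by the shell (so foo="bar baz quux" arrives as the single string foo=bar baz quux); parse_defines is a hypothetical name, and the real option handling is done in Perl by Getopt::Long.

```python
def parse_defines(args):
    """Collect --define key=value pairs from an argument list.

    Each value is split on whitespace into a list of substitution
    values. A later --define for the same key replaces the earlier one.
    """
    definitions = {}
    remaining = iter(args)
    for arg in remaining:
        if arg == "--define":
            key, _, value = next(remaining, "").partition("=")
            if key:
                definitions[key] = value.split()
    return definitions


if __name__ == "__main__":
    print(parse_defines(["--define", "foo=bar baz quux"]))
    print(parse_defines(["--define", "foo=bar", "--define", "foo=baz",
                         "--define", "foo=quux"]))
```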
--override
Makes any definitions entered on the command line override definitions found in the input file.
--defer
Makes any definitions entered on the command line defer to definitions found in the input file: the variables in question will be redefined by the input file.
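As a rough mental model of the two switches, the Python sketch below merges two sets of definitions with the precedence each switch describes. It is a simplification under stated assumptions: simple_scan processes its input file line by line rather than merging dictionaries, and merge_definitions and its mode argument are hypothetical names. What happens when neither switch is given is not covered here.

```python
def merge_definitions(cli_defs, file_defs, mode):
    """Combine command-line and input-file definitions.

    mode "override": command-line definitions win on conflict.
    mode "defer":    input-file definitions win on conflict.
    """
    if mode == "override":
        return {**file_defs, **cli_defs}
    if mode == "defer":
        return {**cli_defs, **file_defs}
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    cli = {"foo": ["from_cli"]}
    infile = {"foo": ["from_file"], "bar": ["x", "y"]}
    print(merge_definitions(cli, infile, "override"))
    print(merge_definitions(cli, infile, "defer"))
```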
--debug
Enables debugging for your simple_scan input file; this outputs a lot of extra code which, when executed by simple_scan --run, shows a lot more information as to what actually happened.
Currently, the only extra debugging information is a list of variables which were not altered by substitution pragmas when --override was specified on the command line.
--warn
Causes simple_scan to output code that gives you warnings (via diag()) in the run file about syntax errors, etc.
--no-agent
Tells simple_scan to not set up a default user agent. Some applications (e.g., mobile applications) actually go into a debug mode when talking to a detectable (known) browser. This turns off simple_scan's assumption that you want to look like a browser.
--autocache
Turns on caching immediately, whether or not the input file specifies %%cache. Note that a %%nocache in the input file will turn caching off again.
--status
Turns on status reporting. Sometimes simple_scan takes a while to run (especially if you've defined a lot of variables). This causes it to pop out a new status message as each input line is processed.
PRAGMAS
Pragmas are ways to influence what simple_scan does when generating tests. They are specified with %% in column 1 and the pragma name immediately following. Any arguments are supplied after a colon, like this:
%%foo: bar baz
This invokes the foo pragma with the argument bar baz. If you're really lazy, you can even leave out the colon.
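The pragma line shape can be sketched in Python like this. It only illustrates the syntax described above and is not simple_scan's parser; PRAGMA_RE and parse_pragma are hypothetical names, and tolerating whitespace between %% and the name is an assumption made so that lines such as "%% user dconway" are also accepted.

```python
import re

# "%%" in column 1, then the pragma name, an optional colon,
# and the rest of the line as the argument string.
PRAGMA_RE = re.compile(r"^%%\s*(\w+):?\s*(.*)$")


def parse_pragma(line):
    """Return (name, arguments) for a pragma line, or None otherwise."""
    match = PRAGMA_RE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


if __name__ == "__main__":
    print(parse_pragma("%%foo: bar baz"))
    print(parse_pragma("%%foo bar baz"))
```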
Substitutions
Any pragma that's otherwise unrecognized by simple_scan is treated as a substitution. Substitutions assume that you have a name and a set of strings following it; these strings will be substituted into the test specs occurring between this set of substitutions and the next set. Any variables not redefined will continue to have their old values.
Here's a basic example.
%% user dconway chromatic petdance
%% use_perl_id Ovid pemungkah
http://search.cpan.org/~<user>
http://use.perl.org/~<use_perl_id>/journal/
http://search.yahoo.com/
...
This would fetch the CPAN index page for the users dconway, chromatic, and petdance, and the use.perl journals for users Ovid and pemungkah. Finally, it would (just once) fetch the Yahoo! search page - because there are no substitutions in that line, it would only be evaluated once.
Substitutions can occur anywhere in the line, including in the comment.
Here's another example: internationalization. For instance, let's assume that you want to substitute each of a list of two-character country codes into a string (most likely somewhere in the URL, but possibly in the comment too).
simple_scan will do this for you, creating a test for each country code you specify. For instance:
%%xx: es au my jp
http://<xx>.mysite.com/ /blargh/ Y look for blargh (<xx>)
This would generate 4 tests, for es.mysite.com, au.mysite.com, my.mysite.com, and jp.mysite.com, all looking to match blargh somewhere on the page.
Multiple substitutions in a single line
If you define multiple variables and use them in a test spec, simple_scan will create all of the unique combinations of the values and substitute them into your test spec. For example:
%%foo bar baz
%%quux zorch thud
http://<foo>.yoursite.com?zz=<quux> /Search found/ Y check <quux> search
would generate all four alternatives and run tests for each one:
http://bar.yoursite.com?zz=zorch /Search found/ Y check zorch search
http://baz.yoursite.com?zz=zorch /Search found/ Y check zorch search
http://bar.yoursite.com?zz=thud /Search found/ Y check thud search
http://baz.yoursite.com?zz=thud /Search found/ Y check thud search
This makes it very easy to generate many tests from very few input lines. simple_scan's substitution engine tracks the values of the variables and ensures that, for any given line, the substitution values stay consistent.
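The combination step can be modeled as a Cartesian product over the variables that actually appear in the line. The Python sketch below is an illustration of that idea, not simple_scan's engine; the function name expand and the dictionary layout are assumptions, and the order in which the lines come out is not meant to match simple_scan's.

```python
from itertools import product


def expand(spec, variables):
    """Expand one test spec into one line per combination of values.

    Only variables whose <name> placeholder occurs in the spec take
    part, and every occurrence of a placeholder in a generated line
    receives the same value.
    """
    used = [name for name in variables if f"<{name}>" in spec]
    lines = []
    for combo in product(*(variables[name] for name in used)):
        line = spec
        for name, value in zip(used, combo):
            line = line.replace(f"<{name}>", value)
        lines.append(line)
    return lines


if __name__ == "__main__":
    variables = {"foo": ["bar", "baz"], "quux": ["zorch", "thud"]}
    spec = "http://<foo>.yoursite.com?zz=<quux> /Search found/ Y check <quux> search"
    for line in expand(spec, variables):
        print(line)
```

A spec with no placeholders yields a product over zero variables, which is a single empty combination, so the line is produced exactly once.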
Nested substitutions
Substitutions can also reference other substitutions, so something like this is also possible:
%%mirror blonk whiz thud crunch
%%welcome_msg 'Welcome to <mirror>'
http://<mirror>.yoursite.org/ /<welcome_msg>/ Y <mirror> welcome
When the test spec is expanded, the string 'Welcome to <mirror>' is substituted in first, then the test spec is expanded again to create a test for each one of the mirrors.
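One way to picture nested expansion is a recursive walk that remembers which value each variable was given, so that a <mirror> introduced by <welcome_msg> gets the same value as the <mirror> already in the URL. The Python sketch below is an illustration under that assumption; expand_nested is a hypothetical name, the substitution order differs from the one described above (the resulting lines are the same), and the sketch has no guard against circular definitions.

```python
import re

PLACEHOLDER = re.compile(r"<(\w+)>")


def expand_nested(line, variables, bound=None):
    """Expand placeholders, including ones introduced by other values.

    `bound` records the value already chosen for each variable on this
    line, which keeps repeated or nested uses consistent.
    """
    bound = bound or {}
    for match in PLACEHOLDER.finditer(line):
        name = match.group(1)
        if name not in variables:
            continue  # e.g. an HTML tag such as <b>, not a variable
        choices = [bound[name]] if name in bound else variables[name]
        expanded = []
        for value in choices:
            expanded.extend(
                expand_nested(
                    line.replace(f"<{name}>", value),
                    variables,
                    {**bound, name: value},
                )
            )
        return expanded
    return [line]


if __name__ == "__main__":
    variables = {
        "mirror": ["blonk", "whiz", "thud", "crunch"],
        "welcome_msg": ["Welcome to <mirror>"],
    }
    spec = "http://<mirror>.yoursite.org/ /<welcome_msg>/ Y <mirror> welcome"
    for expanded_line in expand_nested(spec, variables):
        print(expanded_line)
```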
Note that at present, checking for circular substitutions is not yet complete; if you write something like this:
%%foo <bar>
%%bar <foo>
http://<foo>.com /check/ Y Infinite loop
simple_scan will substitute "<bar>" for "<foo>", then "<foo>" for "<bar>", and will continue to happily do so until you kill the process. At the moment, try not to do this; we'll have a fix in an upcoming release.
Single-quotes, double-quotes, and backticks
You can use single-quoted strings in substitutions to get exact strings containing spaces or tabs:
%%searchtext 'this one' 'that one' 'another one'
The spaces will be preserved in the values assigned to searchtext.
If you want to eval the contents of a string as if it were Perl code and use that as the value of a substitution, put double quotes around it:
%%language "$ENV{LANGUAGE}"
%%now "@{[scalar localtime]}"
The first example allows you to pass in a value from the environment variable $LANGUAGE; the second gets the current date and time as a string (so its value would be something like "Tue Feb 14 14:21:56 2006").
Lastly, you can use backticked strings to denote a command to be executed by the shell; the command's output will be used in place of the quoted string.
As an example, if we have the script languages which looks like this:
#!/bin/sh
echo "perl java python ruby"
and the substitution
%%language `languages`
then the values finally assigned to language would be perl java python ruby.
All of the different forms can be mixed on one line, so
%%try `some_command` "value one" value2
would set try to the output of some_command, value one, and value2.
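The three quoting forms can be told apart with a small tokenizer. The Python sketch below only classifies the pieces of a substitution's value list; it does not evaluate Perl code or run shell commands, and the names TOKEN, KINDS, and tokenize, as well as the kind labels, are assumptions made for illustration.

```python
import re

# One alternative per form: 'single quoted', "double quoted",
# `backticked`, or a bare whitespace-delimited word.
TOKEN = re.compile(r"""'([^']*)'|"([^"]*)"|`([^`]*)`|(\S+)""")

KINDS = ("literal", "perl", "shell", "bare")


def tokenize(values):
    """Split a value list into (kind, text) pairs.

    literal: used exactly as written (spaces preserved)
    perl:    would be evaluated as Perl code
    shell:   would be run as a command, its output used as values
    bare:    a plain unquoted word
    """
    tokens = []
    for match in TOKEN.finditer(values):
        for kind, text in zip(KINDS, match.groups()):
            if text is not None:
                tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    print(tokenize('`some_command` "value one" value2'))
```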
Finally, since quoted strings are embedded exactly as provided, it's possible to parameterize your test specs by using environment variables, like this:
%%language $ENV{LANGUAGE}
http://<language>.org/ /language/i Y <language> should be on the page
Now setting the environment variable LANGUAGE in your shell to 'perl' will propagate 'perl' into the test spec as the language we're testing for.
OTHER PRAGMAS DEFINED BY SIMPLE_SCAN
There are a few other pragmas defined directly by simple_scan. These are not plugins, but are implemented directly in the code.
agent
The agent pragma allows you to switch user agents during the test. Test::WWW::Simple's default is Windows IE 6, but you can switch it to any of the other user agent aliases supported by WWW::Mechanize.
http://gemal.dk/browserspy/basic.html /Explorer/ Y Should be Explorer
%%agent: Mac Safari
http://gemal.dk/browserspy/basic.html /Safari/ Y Should be Safari
(Note: gemal.dk actually does tell you what browser you're running, so feel free to try this test yourself.)
cache
The cache pragma turns on URL caching; once enabled, the page fetched on the first access to a URL is kept in a memory cache, and later accesses to that URL return the cached copy instead of fetching it from the Web again.
Using cache can result in major speedups for tests which repeatedly hit the same page.
nocache
The nocache pragma turns off URL caching; this is useful if you have something like a REST interface that may return different values from repeated accesses to the same URL.
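The effect of the two pragmas can be pictured as a switchable in-memory cache in front of the page fetcher. The Python sketch below is a model of that behavior, not simple_scan's implementation: PageCache and its fetch callback are hypothetical, and whether turning caching off also discards stored pages is left open (this sketch keeps them).

```python
class PageCache:
    """In-memory URL cache that can be switched on and off."""

    def __init__(self, fetch):
        self.fetch = fetch    # callable taking a URL, returning page text
        self.enabled = False  # toggled by %%cache / %%nocache in this model
        self.pages = {}

    def get(self, url):
        if not self.enabled:
            return self.fetch(url)  # always go to the Web
        if url not in self.pages:
            self.pages[url] = self.fetch(url)  # first access fills the cache
        return self.pages[url]


if __name__ == "__main__":
    calls = []

    def fake_fetch(url):
        # Stand-in for a real HTTP request.
        calls.append(url)
        return f"<html>{url}</html>"

    cache = PageCache(fake_fetch)
    cache.enabled = True
    cache.get("http://example.com/")
    cache.get("http://example.com/")
    print(len(calls))  # the second access is served from memory
```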
PLUGINS
simple_scan is extended via plugins in the App::SimpleScan::Plugin namespace. Currently-released plugins:
App::SimpleScan::Plugin::Cache - disk-based caching
App::SimpleScan::Plugin::Snapshot - HTML snapshots of tests
App::SimpleScan::Plugin::Plaintext - check un-marked-up page text
App::SimpleScan::Plugin::Retry - retries HTTP failures
App::SimpleScan::Plugin::LinkCheck - link counting/presence/absence
App::SimpleScan::Plugin::Forget - discard a substitution
Read the documentation for these plugin classes for information on pragmas and/or command-line options that they provide.
BUGS AND LIMITATIONS
Substitutions, especially when there are large numbers of variables with multiple values, are slow. (Welcome to the world of combinatorial explosion.) A future release should use the dependency tree (which we're going to need anyway to detect circular references) to eliminate variables that cannot possibly be substituted into the current string, thereby decreasing the load on the combination checker.
AUTHOR
Joe McMahon <mcmahon@cpan.org>
COPYRIGHT AND LICENSE
Copyright (c) 2005, 2006 by Yahoo!
This script is free software; you can redistribute it or modify it under the same terms as Perl itself, either Perl version 5.6.1 or, at your option, any later version of Perl 5 you may have available.