NAME
URI::ParseSearchString - parse search engine referrer URLs and extract keywords used
VERSION
Version 2.8 (Lich King edition)
SYNOPSIS
use strict;
use warnings;
use URI::ParseSearchString;
my $uparse = URI::ParseSearchString->new();
my $ref = 'http://www.google.com/search?hl=en&q=a+simple+test&btnG=Google+Search';
my $query_terms = $uparse->se_term( $ref );   # search terms, e.g. 'a simple test'
my $canonical   = $uparse->se_name( $ref );   # canonical engine name, e.g. 'Google'
my $hostname    = $uparse->se_host( $ref );   # engine hostname, e.g. 'google.com'
FUNCTIONS
new
Creates a new parser object.
my $uparse = URI::ParseSearchString->new();
parse_search_string
This module provides a simple function to parse search engine referrer URLs and extract the query strings they contain. It was designed and tested with Apache referrer logs in mind. It can be used for a wide range of purposes, including tracking which keywords people search for on popular search engines before they land on a site. Although a number of modules and scripts already exist for this purpose, most of them are outdated and rely on obsolete search strings for each engine.
The main function is "parse_search_string", called as a method on the parser object. It accepts an unquoted referrer string as input and returns the search engine query contained within it. It works with both escaped and unescaped queries, decoding escaped search terms before returning them. It returns undef if the referrer is not recognized or an error occurs.
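Because the module targets Apache referrer logs, the referrer URL first has to be pulled out of each log line. The sketch below assumes Apache's standard "combined" log format; the helper name referrer_from_log_line is hypothetical and is not part of URI::ParseSearchString.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): pull the referrer field out
# of one line in Apache's "combined" log format, i.e.
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Returns undef when the line does not match or the referrer is "-".
sub referrer_from_log_line {
    my ($line) = @_;
    my ($referrer) = $line =~ m{
        ^\S+\s+\S+\s+\S+\s+      # remote host, ident, user
        \[[^\]]*\]\s+            # [timestamp]
        "(?:[^"\\]|\\.)*"\s+     # quoted request line
        \d{3}\s+\S+\s+           # status code, bytes sent
        "((?:[^"\\]|\\.)*)"      # quoted referrer (captured)
    }x or return undef;
    return $referrer eq '-' ? undef : $referrer;
}
```

Each defined value it returns could then be passed to parse_search_string().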
For example:
my $ref = 'http://www.google.com/search?hl=en&q=a+simple+test&btnG=Google+Search';
my $terms = $uparse->parse_search_string( $ref );
would return 'a simple test', whereas
my $ref = 'http://www.mamma.com/Mamma?utfout=1&qtype=0&query=a+more%21+complex_+search%24&Submit=%C2%A0%C2%A0Search%C2%A0%C2%A0';
my $terms = $uparse->parse_search_string( $ref );
would return 'a more! complex_ search$'.
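As a rough sketch of what such an extraction involves: look up which query parameter the engine uses, then URL-decode its value ('+' becomes a space, %XX the corresponding byte). The %QUERY_PARAM table and the url_decode and extract_terms subroutines are hypothetical; this is not the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical table: the query-string parameter holding the search terms
# for each engine host. Illustrative only; the real module covers far more.
my %QUERY_PARAM = (
    'google.com' => 'q',
    'mamma.com'  => 'query',
);

# URL-decode one query-string value: '+' -> space, %XX -> byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $s;
}

# Return the decoded search terms from a referrer URL, or undef.
sub extract_terms {
    my ($ref) = @_;
    # Capture the host (minus a leading "www.") and the query string.
    my ($host, $query) = $ref =~ m{^https?://(?:www\.)?([^/?#]+)[^?#]*\?([^#]*)}i
        or return undef;
    my $param = $QUERY_PARAM{ lc $host } or return undef;
    for my $pair (split /[&;]/, $query) {
        my ($k, $v) = split /=/, $pair, 2;
        next unless defined $v && $k eq $param;
        return url_decode($v);
    }
    return undef;
}
```

With these definitions, extract_terms() returns 'a simple test' for the Google referrer above and 'a more! complex_ search$' for the Mamma one.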
Currently supported search engines include:
Abacho
AOL (UK)
AOLSEARCH
AllTheWeb
ASK.com
Blueyonder (UK)
BBC search
Categorico (IT)
Conduit
Cuil
Fastweb (IT)
Feedster Blog Search
Fireball (DE)
Froogle
Froogle (UK)
Google and 231 other TLDs
Google Blog Search
Godado
Godado (IT)
HotBot
Ice Rocket Blog Search
ICQ.com
ilMotore.com
Ithaki.net
Kataweb (IT)
Lycos
Lycos (ES)
Lycos (IT)
Libero (IT)
Mamma
Mahalo
Megasearching.net
Mirago (UK)
MyWebSearch.com
MSN
Microsoft live.com
MyWay
Netscape
NTLworld
Orange
Ozu (ES)
Paglo
Starware
Sweetim
Simpatico (IT)
Soso
Sproose
T-Online (DE)
Technorati Blog Search
Tesco Google search
Terra (ES)
Tiscali (UK)
TheSpider (IT)
VirginMedia
Web.de (DE)
Yahoo
Yahoo Japan
se_term
Same as parse_search_string().
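The documentation only states that the two methods behave identically. One common Perl idiom for such an alias is a typeglob assignment, sketched here with a hypothetical package name and a placeholder parser body; whether URI::ParseSearchString defines se_term this way is an assumption.

```perl
package My::Parser;   # hypothetical package, for illustration only
use strict;
use warnings;

sub new { my ($class) = @_; return bless {}, $class }

# Placeholder body: a real parser would extract the search terms here.
sub parse_search_string { my ($self, $ref) = @_; return $ref }

# Make se_term another name for the very same subroutine.
{
    no warnings 'once';
    *se_term = \&parse_search_string;
}

package main;
```

Because both names point to one code reference, any later fix to parse_search_string automatically applies to se_term as well.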
findEngine
Returns a list with the hostname of the search engine as the first element and the canonical name as the second element.
my $ref = 'http://www.google.com/search?hl=en&q=a+simple+test&btnG=Google+Search';
my ($hostname, $canonical) = $uparse->findEngine( $ref ) ;
This will return 'google.com' as the search engine hostname and 'Google' as the canonical name. It returns undef on error.
se_host
Wrapper around findEngine that returns just the hostname. It returns undef on error.
se_name
Wrapper around findEngine that returns just the canonical name. It returns undef on error.
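The sketch below illustrates the idea behind findEngine and its two wrappers: match the referrer's hostname against a table of known engines, return (hostname, canonical name), and let each wrapper pick one element. The @ENGINES table and the find_engine, engine_host and engine_name subroutines are hypothetical and cover only two engines; the module's real matching rules may differ.

```perl
use strict;
use warnings;

# Hypothetical engine table: hostname regex => canonical name.
# The capture group is the hostname reported back (without "www.").
my @ENGINES = (
    [ qr/^(?:www\.)?(google\.[a-z.]+)$/i => 'Google' ],
    [ qr/^(?:www\.)?(mamma\.com)$/i      => 'Mamma'  ],
);

# Return (hostname, canonical name) for a referrer, or undef if unknown.
sub find_engine {
    my ($ref) = @_;
    my ($host) = $ref =~ m{^https?://([^/?#:]+)}i or return undef;
    for my $engine (@ENGINES) {
        my ($re, $name) = @$engine;
        if (my ($matched) = $host =~ $re) {
            return (lc $matched, $name);
        }
    }
    return undef;
}

# Thin wrappers in the spirit of se_host and se_name.
sub engine_host { my ($host)        = find_engine($_[0]); return $host }
sub engine_name { my (undef, $name) = find_engine($_[0]); return $name }
```

For the Google referrer used above, find_engine() returns ('google.com', 'Google'), matching the documented findEngine output; both wrappers yield undef for an unrecognized host.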
AUTHOR
Spiros Denaxas, <s.denaxas at gmail.com>
BUGS
This is my first CPAN module, so I encourage you to send all comments, especially negative ones, to my email address.
This would not have been possible without the support of my co-workers at http://nestoria.co.uk, the easiest way of finding UK property.
SUPPORT
For more information, you could also visit my blog:
http://idaru.blogspot.com
COPYRIGHT & LICENSE
Copyright 2008 Spiros Denaxas, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.