NAME

Win32::UrlCache - parse Internet Explorer's history/cache/cookies

SYNOPSIS

use Win32::UrlCache;
my $index = Win32::UrlCache->new( 'index.dat' );
foreach my $url ( $index->urls ) {
  print $url->url, "\n";
}

Or, you can use a callback function if you care about memory usage.

use Win32::UrlCache;
my $index = Win32::UrlCache->new( 'index.dat' );
$index->urls( callback => \&callback );

sub callback {
  my $entry = shift;
  my $url = $entry->url;
  $url =~ s/^Visited: //;   # strip the history prefix
  $entry->url( $url );

  print $entry->url, "\n";
  return;  # to prevent the entry from being kept in the object
}

If you want to know the title of the cached page (for Win32 only):

use Win32::UrlCache::Cache;
use Win32::UrlCache::Title;
use Encode;
my $cache = Win32::UrlCache::Cache->new;
$cache->urls( callback => \&callback );

sub callback {
  my $entry = shift;

  print $entry->url, "\n";
  my $title = Win32::UrlCache::Title->extract( $entry->filename );
  print encode( shiftjis => $title ), "\n\n" if $title;

  return;
}

DESCRIPTION

This parses so-called "Client UrlCache MMF Ver 5.2" index.dat files, which are used to store Internet Explorer's history, cache, and cookies. At the time of writing, I've only tested it on Win2K + IE 6.0, but I hope it also works with other versions of Windows and Internet Explorer. However, note that this is not based on an official/public MSDN specification, but on a hack published on the web. So, caveat emptor in every sense, especially for the redr entries ;)
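Because the format is identified by that "Client UrlCache MMF Ver 5.2" string, a cheap sanity check before handing a file to new can save some confusion. The sketch below assumes the signature appears as plain ASCII at offset 0 of the file; has_index_dat_signature and looks_like_index_dat are hypothetical helpers, not part of Win32::UrlCache.

```perl
use strict;
use warnings;

# Assumption: an index.dat file starts with this ASCII signature at offset 0.
my $SIGNATURE = 'Client UrlCache MMF Ver 5.2';

# Hypothetical helper (not part of Win32::UrlCache):
# true if $bytes begins with the signature.
sub has_index_dat_signature {
  my ($bytes) = @_;
  return defined($bytes) && index( $bytes, $SIGNATURE ) == 0;
}

# Hypothetical helper: reads just enough bytes from $path to check the
# signature; returns false if the file cannot be opened or is too short.
sub looks_like_index_dat {
  my ($path) = @_;
  open my $fh, '<:raw', $path or return 0;
  my $buf;
  my $read = read( $fh, $buf, length($SIGNATURE) );
  close $fh;
  return 0 unless defined($read) && $read == length($SIGNATURE);
  return has_index_dat_signature($buf);
}

# Usage: report which of the given paths look like index.dat files.
for my $path (@ARGV) {
  print "$path: ", ( looks_like_index_dat($path) ? 'ok' : 'not an index.dat' ), "\n";
}
```

A file that passes this check may still be one the module cannot parse (e.g. from an untested IE version); the check only rules out obviously wrong inputs.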

Patches and feedback are welcome.

METHODS

new

receives the path to an 'index.dat' file, and parses it to create an object.

urls

returns URL entries in the 'index.dat' file. Each entry has url, filename, headers, filesize, last_modified, last_accessed, and, optionally, title accessors (note that some of them may return meaningless values). As of 0.02, it can receive a callback function (see CALLBACK below). As of 0.04, you can also pass ( extract_title => 1 ) to extract titles. However, titles are extracted only after the callback has run. So, if you want to use a callback and also get titles, call the extraction code from inside the callback, as shown in the SYNOPSIS.

leaks

almost the same as urls, but returns LEAK entries (if any) in the 'index.dat' file.

redrs

returns REDR entries (if any) in the 'index.dat' file. Each entry has a url accessor. As of 0.02, it can receive a callback function.

CALLBACK

The three entry methods above (urls, leaks, and redrs) return all the entries found in the index by default, but this may consume a lot of memory, especially if you use IE as your main browser. As of 0.02, these methods may receive a callback function, which takes an entry as its first (and, at the time of writing, only) argument. If the callback returns true, the entry is stored in the Win32::UrlCache object; if it returns false, the entry is discarded after the callback has run.
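The keep/discard rule can be illustrated with a small self-contained sketch. collect_entries is a hypothetical stand-in for the loop the module runs while parsing; it is not Win32::UrlCache code, and its entries are plain hashrefs rather than objects with a url accessor. It only models the contract described above: no callback keeps everything, a true return keeps the entry, a false (or bare) return drops it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's parsing loop, written only to
# illustrate the documented callback contract. NOT Win32::UrlCache code;
# entries here are plain hashrefs, not objects with a url accessor.
sub collect_entries {
  my ( $entries, %options ) = @_;
  my $callback = $options{callback};
  my @kept;
  for my $entry (@$entries) {
    # No callback: keep everything (the memory-hungry default).
    # With a callback: keep the entry only if the callback returns true.
    push @kept, $entry if !$callback || $callback->($entry);
  }
  return @kept;
}

my @entries = (
  { url => 'Visited: user@http://www.example.com/' },
  { url => 'Visited: user@http://www.cpan.org/' },
);

# Print every URL as it is seen, but keep only the example.com entries.
my @kept = collect_entries( \@entries, callback => sub {
  my $entry = shift;
  print $entry->{url}, "\n";
  return $entry->{url} =~ /example\.com/ ? 1 : 0;
} );
```

With the real module, the same pattern means a callback that returns true only for the entries you care about keeps memory usage proportional to the matches rather than to the whole index.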

SEE ALSO

http://www.latenighthacking.com/projects/2003/reIndexDat/

AUTHOR

Kenichi Ishigaki, <ishigaki at cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2007 by Kenichi Ishigaki.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.