NAME

distlinks -- check URL links, with database cache

SYNOPSIS

distlinks [--options] filename-or-dirname...

DESCRIPTION

Distlinks checks URLs found in files or in a directory tree of files. An SQLite-3 database records results so that links are not rechecked on every program run. It's a bit rough, but good for checking everything in a software distribution or similar.

Various file types are recognised and read appropriately, extracting the text parts in which to look for URLs.

  • .gz and .bz2, compressed with gzip or bzip2.

  • .tar and .tar.gz, Unix tar archives.

  • .zip archives.

  • Text with a UTF-16 or UTF-32 byte-order mark.

  • Image files, via Image::ExifTool, so the text parts of PNG, JPEG, etc. are examined.

  • .mo message catalogues per gettext (recognised by content, so any filename).

  • Executables (ELF, MS-DOS, etc.), as identified by File::Type, are skipped.
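As an illustration of recognising these kinds of file, here is a minimal Python sketch that classifies a file by its leading "magic" bytes. It is not how distlinks itself works (distlinks is Perl and uses File::Type, Image::ExifTool etc.); the function name sniff_kind and its labels are hypothetical. The magic numbers themselves are the standard ones for each format.

```python
import codecs


def sniff_kind(data: bytes, filename: str = "") -> str:
    """Hypothetical helper: return a rough kind label for data.

    Checks well-known magic bytes, falling back on the filename only
    for old-style tar files that carry no magic string."""
    # Executables are to be skipped, so detect them first.
    if data.startswith(b"\x7fELF"):
        return "skip-elf"
    if data.startswith(b"MZ"):
        return "skip-msdos"
    # Compressed streams.
    if data.startswith(b"\x1f\x8b"):
        return "gzip"
    if data.startswith(b"BZh"):
        return "bzip2"
    # Zip local file header.
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    # POSIX ustar archives have "ustar" at offset 257.
    if data[257:262] == b"ustar":
        return "tar"
    # GNU gettext .mo magic 0x950412de, stored in either byte order.
    if data[:4] in (b"\xde\x12\x04\x95", b"\x95\x04\x12\xde"):
        return "mo"
    # Image formats whose text chunks may hold URLs.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # Test UTF-32 before UTF-16: the UTF-32-LE BOM begins with the
    # UTF-16-LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if filename.endswith(".tar"):
        return "tar"
    return "text"
```

Recognising content rather than filename is what lets a .mo catalogue be found under any name, as noted in the list above.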

URLs are distilled from text by free-form matching, so they can be in plain text, program code, etc. The following specific forms are recognised:

  • Angle brackets <http://foo.com> and <URL:http://foo.com>, as sometimes recommended for mail messages etc.

  • Quotes `http://foo.com', as in Emacs docstrings.

  • Bare foo.com/index.html, taken to be http:.

  • Texinfo @url{http://foo.com}.

  • HTML href="foo.html", interpreted relative to a <base> or to the file itself.

  • URLs containing variables like $FOO are skipped, being taken to be program code etc.
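The forms listed above can be approximated with regular expressions. The following Python sketch is only an illustration under stated assumptions, not the matcher distlinks actually uses: the function extract_urls, its file_url parameter (the URL of the file the text came from, used for relative hrefs) and the exact patterns are all hypothetical, and the bare-host pattern in particular is cruder than a real implementation would want.

```python
import re
from urllib.parse import urljoin

# Delimited forms; group 1 is the URL text.
DELIMITED = [
    re.compile(r"<URL:([^>\s]+)>"),     # <URL:http://foo.com>
    re.compile(r"<([a-z]+:[^>\s]+)>"),  # <http://foo.com>
    re.compile(r"`([a-z]+:[^'\s]+)'"),  # `http://foo.com' (Emacs docstrings)
    re.compile(r"@url\{([^}\s]+)\}"),   # @url{http://foo.com} (Texinfo)
]
# Free-form URLs with an explicit scheme, stopping at common delimiters.
SCHEME_URL = re.compile(r"\b(?:https?|ftp)://[^\s<>\"'`{}]+")
# Bare host/path like foo.com/index.html, not preceded by a word char,
# dot, slash, colon, @ or hyphen (so not the tail of a longer URL).
BARE_HOST = re.compile(
    r"(?<![\w./:@-])(?:[a-z0-9-]+\.)+[a-z]{2,}/[^\s<>\"'`{}]*", re.I)
HREF = re.compile(r'href="([^"]*)"', re.I)
BASE = re.compile(r'<base\s+href="([^"]*)"', re.I)


def extract_urls(text: str, file_url: str) -> set[str]:
    """Return the set of URLs found in text, which was read from file_url."""
    found = set()
    for pat in DELIMITED:
        found.update(m.group(1) for m in pat.finditer(text))
    # Trailing sentence punctuation is not taken to be part of a URL.
    for m in SCHEME_URL.finditer(text):
        found.add(m.group(0).rstrip(".,;:!?)"))
    # Bare host/path is taken to be http:.
    for m in BARE_HOST.finditer(text):
        found.add("http://" + m.group(0).rstrip(".,;:!?)"))
    # Relative hrefs resolve against <base href> if present, else the file.
    base = BASE.search(text)
    base_url = base.group(1) if base else file_url
    for m in HREF.finditer(text):
        found.add(urljoin(base_url, m.group(1)))
    # URLs containing $VAR are taken to be program code and skipped.
    return {u for u in found if "$" not in u}
```

Collecting into a set means a URL matched by both a delimited pattern and the free-form pattern (such as <http://foo.com>) is reported only once.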

COMMAND-LINE OPTIONS

The command line options are

-V
--verbose
--verbose=N

Print some diagnostics about what's being done. With --verbose=2 or --verbose=3, also print some technical details. For example,

    distlinks --verbose

--version

Print the distlinks program version number. With --verbose=2, also print the version numbers of some of the modules used.

CHECKING

news

Newsgroup references like "news:some.group.name" are checked by asking the news server whether the group exists. The news server is chosen as per Net::NNTP, meaning an NNTPSERVER or NEWSHOST environment variable, or a Net::Config setup. For convenience, distlinks tries "localhost" if none of these is set.
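A sketch of the two halves of a newsgroup check, in Python for illustration: choosing the server, and interpreting the server's reply to an NNTP "GROUP some.group.name" command. The function names are hypothetical. The lookup order (NNTPSERVER, then NEWSHOST, then "localhost") is an assumption modelled on the description above, and a Net::Config setup is not modelled. The reply codes 211 (group exists) and 411 (no such group) are those of the NNTP specification, RFC 3977.

```python
import os
from typing import Mapping, Optional


def news_server(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical: pick a news server host from the environment."""
    env = os.environ if env is None else env
    for var in ("NNTPSERVER", "NEWSHOST"):
        host = env.get(var)
        if host:
            return host
    # Fallback as described above (Net::Config is not consulted here).
    return "localhost"


def group_exists(reply_line: str) -> Optional[bool]:
    """Interpret a server reply to "GROUP some.group.name".

    211 means the group exists, 411 means there is no such group.
    Anything else (e.g. 480 authentication required) is unknown: None."""
    code = reply_line[:3]
    if code == "211":
        return True
    if code == "411":
        return False
    return None
```

Returning None for other replies keeps a server problem from being reported as a missing group.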

LWP comes with the usual http and ftp schemes, and their secure variants, built in. Other schemes can be checked with add-on protocol back-ends such as LWP::Protocol::ldap or LWP::Protocol::rsync.

ENVIRONMENT VARIABLES

NNTPSERVER
NEWSHOST

News server host name or IP number.

TMPDIR

Temporary directory, as used by File::Temp and File::Spec, for unpacking tar archives etc. and for rsync temporary files.
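For comparison, Python's tempfile module also honours TMPDIR. The sketch below unpacks a tar archive into a fresh directory under the temporary directory, roughly the kind of step described above. It uses Python's tarfile and tempfile rather than the Perl modules distlinks uses, and the function name unpack_tar and the "distlinks-sketch-" prefix are hypothetical.

```python
import tarfile
import tempfile


def unpack_tar(archive_path: str) -> str:
    """Hypothetical: extract archive_path into a new temporary directory.

    tempfile.mkdtemp() creates the directory under $TMPDIR when that is
    set, else a platform default such as /tmp.  The caller owns the
    returned directory and should remove it (e.g. shutil.rmtree) after use.
    """
    dest = tempfile.mkdtemp(prefix="distlinks-sketch-")
    # Mode "r:*" reads plain .tar as well as gzip or bzip2 compressed tar.
    with tarfile.open(archive_path, "r:*") as tar:
        # The "data" filter (Python 3.12, and some backports) rejects
        # absolute paths and ".." members; use it where available.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return dest
```

Extracting to real files lets problems be reported against actual file names inside the archive.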

FILES

~/.distlinks.sqdb

SQLite-3 database of information kept about checked URLs.
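The layout of ~/.distlinks.sqdb is not described here, so the following Python sketch only illustrates the general idea of a URL cache in SQLite: look a URL up first, and check it over the network only if there is no fresh entry. The table name, columns, the functions open_cache and check_cached, and the one-week max_age are all hypothetical assumptions, not distlinks' actual schema or policy.

```python
import sqlite3
import time


def open_cache(path: str) -> sqlite3.Connection:
    """Open (creating if needed) a hypothetical URL cache database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS checked (
               url        TEXT PRIMARY KEY,
               status     INTEGER NOT NULL,
               checked_at REAL NOT NULL)""")
    return conn


def check_cached(conn, url, check, max_age=7 * 86400, now=None):
    """Return a status for url, calling check(url) only when the cached
    entry is missing or older than max_age seconds (an assumed policy)."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT status, checked_at FROM checked WHERE url = ?",
        (url,)).fetchone()
    if row is not None and now - row[1] < max_age:
        return row[0]  # fresh cached result, no network access
    status = check(url)
    conn.execute(
        "INSERT OR REPLACE INTO checked (url, status, checked_at)"
        " VALUES (?, ?, ?)",
        (url, status, now))
    conn.commit()
    return status
```

Passing the checking function in as a parameter keeps the cache logic independent of how each URL scheme is actually checked.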

/etc/libnet.cfg
/etc/perl/Net/libnet.cfg

Net::Config configuration for the news server.

BUGS

A .tar or similar archive is extracted into a temporary directory (under /tmp, or per TMPDIR) so that the actual files can be reported on, but these temporary directories are never deleted.

SEE ALSO

Net::Config

chklinks(1), linkchecker(1)

HOME PAGE

http://user42.tuxfamily.org/distlinks/index.html

LICENSE

Copyright 2009, 2010, 2011, 2012, 2013, 2014 Kevin Ryde

Distlinks is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.

Distlinks is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Distlinks. If not, see http://www.gnu.org/licenses/.