NAME
distlinks -- check URL links, with database cache
SYNOPSIS
distlinks [--options] filename-or-dirname...
DESCRIPTION
Distlinks checks URLs found in files or a directory tree of files. An SQLite-3 database avoids rechecking links between multiple program runs. It's a bit rough but good for checking everything in a software distribution or similar.
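The caching idea can be illustrated with a minimal sketch. The table layout, the `RECHECK_SECONDS` interval and the function names here are assumptions for illustration only; they are not the actual schema of the distlinks database.

```python
import sqlite3
import time

RECHECK_SECONDS = 7 * 24 * 3600  # hypothetical: recheck a link after a week

def open_cache(path=":memory:"):
    """Open (or create) a cache database with one row per URL."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS link ("
               " url TEXT PRIMARY KEY, ok INTEGER NOT NULL, checked REAL NOT NULL)")
    return db

def check_cached(db, url, checker, now=None):
    """Return True/False for url, calling checker(url) only if no fresh row exists."""
    now = time.time() if now is None else now
    row = db.execute("SELECT ok, checked FROM link WHERE url = ?", (url,)).fetchone()
    if row is not None and now - row[1] < RECHECK_SECONDS:
        return bool(row[0])          # recent result, skip the network check
    ok = bool(checker(url))
    db.execute("INSERT OR REPLACE INTO link (url, ok, checked) VALUES (?, ?, ?)",
               (url, int(ok), now))
    db.commit()
    return ok
```

Passing a file path instead of `":memory:"` would keep the results between runs, which is the point of the cache.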
Various file types are recognised and read appropriately to extract text parts to find URLs.
.gz and .bz2 gzip or bzip2.
.tar and .tar.gz Unix tar.
.zip
Text with UTF-16 or UTF-32 byte-order marker.
Image files per Image::ExifTool, so the text parts of PNG, JPEG, etc.
.mo message catalogue per gettext (recognised by content, so any filename).
Skip executables ELF, MS-DOS, etc as identified by File::Type.
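As an illustration of the byte-order marker case, the following sketch shows one way such text could be detected and decoded. It is not taken from distlinks itself; the function name `decode_by_bom` is hypothetical.

```python
import codecs

# Longest marks first: the UTF-32-LE mark begins with the UTF-16-LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_by_bom(data):
    """Return decoded text if data starts with a UTF-16/32 BOM, else None."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Drop the marker itself and decode the rest.
            return data[len(bom):].decode(encoding, errors="replace")
    return None
```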
URLs are distilled from text with free-form matching so they can be in plain text, program code, etc. The following specific forms are recognised:
Angles <http://foo.com> and <URL:http://foo.com> as sometimes recommended for mail messages etc.
Quotes `http://foo.com' per Emacs docstrings.
Bare foo.com/index.html taken to be http:.
Texinfo @url{http://foo.com}.
HTML href="foo.html", interpreted relative to a <base> or the file itself.
Skip variables $FOO in URLs, taken to be program code etc.
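The specific forms listed above can be approximated with a few regular expressions. This is only a sketch under simplifying assumptions: the patterns and the `extract_urls` name are illustrative and not the matching distlinks actually performs, and plain unbracketed URLs with a scheme are left out to keep it short.

```python
import re
from urllib.parse import urljoin

# A scheme followed by anything up to whitespace or a closing delimiter.
SCHEME_URL = r"[A-Za-z][A-Za-z0-9+.-]*:[^\s<>'\"{}]+"

PATTERNS = [
    re.compile(r"<(?:URL:)?(" + SCHEME_URL + r")>"),   # <http://foo.com>, <URL:http://foo.com>
    re.compile(r"`(" + SCHEME_URL + r")'"),            # `http://foo.com'
    re.compile(r"@url\{(" + SCHEME_URL + r")\}"),      # @url{http://foo.com}
]
HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)    # href="foo.html"
# Bare host/path such as foo.com/index.html, not already part of a longer URL.
BARE = re.compile(r"(?<![\w/:.@$])((?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}/[^\s<>'\"{}]*)")

def extract_urls(text, base=None):
    """Return URLs found in text; href values are resolved against base if given."""
    found = []
    for pattern in PATTERNS:
        found += pattern.findall(text)
    found += ["http://" + m for m in BARE.findall(text)]   # bare form taken to be http:
    for href in HREF.findall(text):
        found.append(urljoin(base, href) if base else href)
    # A "$" suggests a program variable rather than a real URL, so skip it.
    return [u for u in found if "$" not in u]
```

For example, `extract_urls('<a href="foo.html">', base="http://foo.com/dir/page.html")` resolves the link relative to the page it was found in.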
COMMAND-LINE OPTIONS
The command line options are
- -V
- --verbose
- --verbose=N
-
Print some diagnostics about what's being done. With --verbose=2 or --verbose=3 print some technical details too. Eg.
distlinks --verbose
- --version
-
Print the distlinks program version number. With --verbose=2 also print version numbers of some modules used.
CHECKING
- news
-
Newsgroup references like "news:some.group.name" are checked by asking the news server whether the group exists. The news server used is per Net::NNTP, which means an NNTPSERVER or NEWSHOST environment variable or a Net::Config setup. For convenience distlinks tries "localhost" if none of those are set.
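The lookup order for the news server described above might be sketched as follows. The function name and the `configured_hosts` parameter (standing in for a Net::Config setup) are hypothetical and only illustrate the fallback sequence.

```python
def choose_news_host(environ, configured_hosts=()):
    """Pick a news server: environment variables, then configured hosts, then localhost."""
    for name in ("NNTPSERVER", "NEWSHOST"):
        host = environ.get(name)
        if host:                      # ignore unset or empty variables
            return host
    if configured_hosts:              # stand-in for a Net::Config setup
        return configured_hosts[0]
    return "localhost"                # the convenience fallback
```

It could be called as `choose_news_host(os.environ)` after `import os`.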
LWP comes with the usual http and ftp and secure variants built-in. Other schemes can be checked with add-on protocol back-ends, such as LWP::Protocol::ldap or LWP::Protocol::rsync.
ENVIRONMENT VARIABLES
- NNTPSERVER
- NEWSHOST
-
News server host name or IP number.
- TMPDIR
-
Temporary directory per File::Temp and File::Spec, used for untarring archives etc and rsync temporaries.
FILES
- ~/.distlinks.sqdb
-
SQLite-3 database of information kept about checked URLs.
- /etc/libnet.cfg
- /etc/perl/Net/libnet.cfg
-
Net::Config configuration for news server.
BUGS
A .tar or similar archive is extracted into a directory under /tmp so that actual files can be reported on, but those temporary directories are never deleted.
SEE ALSO
HOME PAGE
http://user42.tuxfamily.org/distlinks/index.html
LICENSE
Copyright 2009, 2010, 2011, 2012, 2013, 2014 Kevin Ryde
Distlinks is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.
Distlinks is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with Distlinks. If not, see http://www.gnu.org/licenses/.