NAME
CPAN::Search::Lite::Populate - create and populate database tables
DESCRIPTION
This module is responsible for creating the tables (if setup is passed as an option) and then for inserting, updating, or deleting (as appropriate) the relevant information from the indices of CPAN::Search::Lite::Info and CPAN::Search::Lite::PPM and the state information from CPAN::Search::Lite::State. It does this through the insert, update, and delete methods associated with each table.
Note that the tables are created with the setup argument passed into the new method when creating the CPAN::Search::Lite::Index object; existing tables will be dropped.
TABLES
The tables used are described below.
mods
This table contains module information, and is created as
mod_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
dist_id SMALLINT UNSIGNED NOT NULL
mod_name VARCHAR(100) NOT NULL
mod_abs TINYTEXT
doc bool
mod_vers VARCHAR(10)
dslip CHAR(5)
chapterid TINYINT(2) UNSIGNED
PRIMARY KEY (mod_id)
FULLTEXT (mod_abs)
KEY (dist_id)
KEY (mod_name(100))
mod_id
This is the primary (unique) key of the table.
dist_id
This key corresponds to the id of the associated distribution in the dists table.
mod_name
This is the module's name.
mod_abs
This is a description, if available, of the module.
doc
This value, if true, signifies that documentation for the module exists, and is located, e.g., in dist_name/Foo/Bar.pm for a module Foo::Bar in the dist_name distribution.
src
This value, if true, signifies that the source code for the module exists, and is located, e.g., in dist_name/Foo/Bar.pm for a module Foo::Bar in the dist_name distribution.
mod_vers
This value, if present, gives the version of the module.
dslip
This is a 5-character string expressing the dslip (development, support, language, interface, public license) information.
chapterid
This number corresponds to the chapter id of the module, if present.
dists
This table contains distribution information, and is created as
dist_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
stamp TIMESTAMP(8)
auth_id SMALLINT UNSIGNED NOT NULL
dist_name VARCHAR(90) NOT NULL
dist_file VARCHAR(110) NOT NULL
dist_vers VARCHAR(20)
dist_abs TINYTEXT
size MEDIUMINT UNSIGNED NOT NULL
birth DATE NOT NULL
readme bool
changes bool
meta bool
install bool
PRIMARY KEY (dist_id)
FULLTEXT (dist_abs)
KEY (auth_id)
KEY (dist_name(90))
dist_id
This is the primary (unique) key of the table.
stamp
This is a timestamp for the table indicating when the entry was either inserted or last updated.
auth_id
This corresponds to the CPAN author id of the distribution in the auths table.
dist_name
This corresponds to the distribution name (e.g., for My-Distname-0.22.tar.gz, dist_name will be My-Distname).
dist_file
This corresponds to the CPAN file name.
dist_vers
This is the version of the CPAN file (e.g., for My-Distname-0.22.tar.gz, dist_vers will be 0.22).
dist_abs
This is a description of the distribution. If not directly supplied, the description for, e.g., Foo::Bar, if present, will be used for the Foo-Bar distribution.
size
This corresponds to the size of the distribution, in bytes.
birth
This corresponds to the last modified time of the distribution, in the form YYYY/MM/DD.
readme
This value, if true, indicates that a README file for the distribution is available.
changes
This value, if true, indicates that a Changes file for the distribution is available.
meta
This value, if true, indicates that a META.yml file for the distribution is available.
install
This value, if true, indicates that an INSTALL file for the distribution is available.
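The relationship between dist_file, dist_name, and dist_vers described above can be sketched as follows. This is only an illustration in Python: the helper split_dist_file and its regular expression are hypothetical and much simpler than whatever parsing the module itself performs, and real CPAN file names have many more variations.

```python
import re

# Hypothetical, simplified pattern (an assumption for illustration):
# everything before the last "-<version>" is the name, followed by a
# common archive extension.
_DIST_RE = re.compile(
    r"^(?P<name>.+)-(?P<vers>v?\d[\w.]*?)\.(?:tar\.gz|tgz|zip)$"
)


def split_dist_file(filename):
    """Return (dist_name, dist_vers), or None if the name does not match."""
    match = _DIST_RE.match(filename)
    if match is None:
        return None
    return match.group("name"), match.group("vers")


print(split_dist_file("My-Distname-0.22.tar.gz"))
```

For the file name used in the text, this yields My-Distname as the name and 0.22 as the version; a file name without a recognizable version returns None.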
auths
This table contains CPAN author information, and is created as
auth_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
cpanid VARCHAR(20) NOT NULL
fullname VARCHAR(40) NOT NULL
email TINYTEXT
PRIMARY KEY (auth_id)
FULLTEXT (fullname)
KEY (cpanid(20))
auth_id
This is the primary (unique) key of the table.
cpanid
This gives the CPAN author id.
fullname
This is the full name of the author.
email
This is the supplied email address of the author.
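To show how mods, dists, and auths link together through dist_id and auth_id, here is a small sketch. It uses Python's sqlite3 with an in-memory database purely as a stand-in for the MySQL tables above, keeps only a few of the columns, and fills them with made-up rows (the author JDOE and the Foo-Bar distribution are invented for the example). The function modules_by_author is likewise hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL schema; column types are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auths (auth_id INTEGER PRIMARY KEY,
                        cpanid TEXT NOT NULL, fullname TEXT NOT NULL);
    CREATE TABLE dists (dist_id INTEGER PRIMARY KEY,
                        auth_id INTEGER NOT NULL,
                        dist_name TEXT NOT NULL, dist_vers TEXT);
    CREATE TABLE mods  (mod_id INTEGER PRIMARY KEY,
                        dist_id INTEGER NOT NULL,
                        mod_name TEXT NOT NULL, mod_vers TEXT);

    -- Made-up sample rows, for illustration only.
    INSERT INTO auths VALUES (1, 'JDOE', 'Jane Doe');
    INSERT INTO dists VALUES (1, 1, 'Foo-Bar', '0.22');
    INSERT INTO mods  VALUES (1, 1, 'Foo::Bar', '0.22');
    INSERT INTO mods  VALUES (2, 1, 'Foo::Bar::Util', '0.05');
""")


def modules_by_author(connection, cpanid):
    """List (mod_name, dist_name, fullname) for one CPAN author id."""
    rows = connection.execute(
        """
        SELECT mods.mod_name, dists.dist_name, auths.fullname
        FROM mods
        JOIN dists ON mods.dist_id = dists.dist_id
        JOIN auths ON dists.auth_id = auths.auth_id
        WHERE auths.cpanid = ?
        ORDER BY mods.mod_name
        """,
        (cpanid,),
    )
    return rows.fetchall()


for row in modules_by_author(conn, "JDOE"):
    print(row)
```

The same joins should carry over to the MySQL tables, since they rely only on the dist_id and auth_id keys described above.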
chaps
This table contains chapter information associated with distributions. PAUSE allows one, when registering modules, to associate a chapter id with each module (see the mods table). This information is used here to associate chapters (and subchapters) with distributions in the following manner. Suppose a distribution Quantum-Theory contains a module Beta::Decay with chapter id 55, and another module Laser with chapter id 87. The Quantum-Theory distribution will then have two entries in this table: chapterid of 55 and subchapter of Beta, and chapterid of 87 and subchapter of Laser.
The table is created as follows.
chap_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
chapterid TINYINT UNSIGNED NOT NULL
dist_id SMALLINT UNSIGNED NOT NULL
subchapter TINYTEXT
KEY (dist_id)
chap_id
This is the primary (unique) key of the table.
chapterid
This number corresponds to the chapter id.
dist_id
This is the id corresponding to the distribution in the dists table.
subchapter
This is the subchapter.
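The Quantum-Theory example can be worked through in a few lines of Python. The sketch assumes that the subchapter is the top-level component of the module name, which is inferred from the example (Beta::Decay gives Beta, Laser gives Laser) rather than stated by the module; the function chapter_entries is hypothetical.

```python
def chapter_entries(modules):
    """Derive (chapterid, subchapter) pairs for one distribution.

    `modules` is an iterable of (mod_name, chapterid) pairs; modules
    without a chapter id (None) contribute nothing.
    """
    entries = []
    seen = set()
    for mod_name, chapterid in modules:
        if chapterid is None:
            continue
        # Assumption: the subchapter is the first "::"-separated component.
        subchapter = mod_name.split("::")[0]
        key = (chapterid, subchapter)
        if key not in seen:  # one row per distinct pair
            seen.add(key)
            entries.append(key)
    return entries


quantum_theory = [("Beta::Decay", 55), ("Laser", 87)]
print(chapter_entries(quantum_theory))
```

For the two modules in the example this gives the pairs (55, Beta) and (87, Laser), matching the two rows described above.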
reqs
This table lists the prerequisites of the distribution, as found in the META.yml file (if supplied; note that only relatively recent versions of ExtUtils::MakeMaker or Module::Build generate this file when making a distribution). The table is created as
req_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
dist_id SMALLINT UNSIGNED NOT NULL
mod_id SMALLINT UNSIGNED NOT NULL
req_vers VARCHAR(10)
KEY (dist_id)
req_id
This is the primary (unique) key of the table.
dist_id
This corresponds to the id of the distribution in the dists table.
mod_id
This corresponds to the id of the prerequisite module in the mods table.
req_vers
This is the version of the prerequisite module, if specified.
ppms
This table contains information on Win32 ppm packages available in the repositories specified in $repositories of CPAN::Search::Lite::Util. The table is created as
ppm_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
dist_id SMALLINT UNSIGNED NOT NULL
rep_id TINYINT(2) UNSIGNED NOT NULL
ppm_vers VARCHAR(20)
KEY (dist_id)
ppm_id
This is the primary (unique) key of the table.
dist_id
This is the id of the distribution appearing in the dists table.
rep_id
This is the id of the repository appearing in the $repositories data structure.
ppm_vers
This is the version of the ppm package found.
reps
This table contains information on the Win32 ppm repositories specified in $repositories of CPAN::Search::Lite::Util. The table is created as
rep_id SMALLINT UNSIGNED NOT NULL
abs TINYTEXT
browse TINYTEXT
perl VARCHAR(10)
alias VARCHAR(20)
KEY (rep_id)
rep_id
This is the primary (unique) key of the table, and corresponds to the rep_id of the ppms table.
abs
This is a description of the repository.
browse
This is a URL where one can browse the repository.
perl
This specifies the perl version the repository corresponds to.
alias
This specifies a short alias for the repository.
chapters
This contains information on the chapters. The table is created as
chapterid SMALLINT UNSIGNED NOT NULL
chap_link TINYTEXT
KEY (chapterid)
chapterid
This is the primary (unique) key of the table, and corresponds to the chapterid of the dists, mods, and chaps tables.
chap_link
This is a description of the chapter that chapterid corresponds to (e.g., File_Handle_Input_Output).
CATEGORIES
When uploading a module to PAUSE, there exists an option to assign it to one of 24 broad categories. However, many modules have not been assigned such a category, for one reason or another. When populating the tables, the AI::Categorizer module is used to guess a possible category for those modules that haven't been assigned one, using a training set built from the modules that have been assigned a category (see AI::Categorizer for general details). If this guess scores above a configurable threshold (see CPAN::Search::Lite::Index), the guess is accepted and inserted into the database, and the categories associated with the module's distribution are updated accordingly.
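The acceptance rule can be sketched as below. This Python fragment does not use AI::Categorizer; it only illustrates the threshold step, assuming the categorizer yields a score per candidate chapter id. The function accept_category, the score values, and the threshold of 0.8 are all hypothetical.

```python
def accept_category(scores, threshold):
    """Return the best-scoring chapter id if its score exceeds the threshold.

    `scores` maps candidate chapter ids to categorizer scores (hypothetical
    values here). Returns None when no guess is good enough, in which case
    the module would simply be left without a category.
    """
    if not scores:
        return None
    best_id, best_score = max(scores.items(), key=lambda item: item[1])
    return best_id if best_score > threshold else None


# Made-up scores for a module with no category assigned on PAUSE.
print(accept_category({55: 0.91, 87: 0.40}, threshold=0.8))
print(accept_category({55: 0.50, 87: 0.40}, threshold=0.8))
```

With these made-up scores, the first call accepts chapter id 55 and the second accepts nothing, since no score exceeds the threshold.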