NAME
DBD::PgLite::MirrorPgToSQLite - Mirror tables from PostgreSQL to SQLite
SUMMARY
use DBD::PgLite::MirrorPgToSQLite qw(pg_to_sqlite);
pg_to_sqlite(
sqlite_file => '/var/pg_mirror/news.sqlite',
pg_dbh => $dbh,
schema => 'news',
tables => [ qw(news cat img /^x_news/)],
views => [ 'v_newslist' ],
indexes => 1,
verbose => 1,
snapshot => 1,
);
USAGE
The purpose of this module is to facilitate mirroring of tables from a PostgreSQL dataabse to a SQLite file. The module has only be tested with PostgreSQL 7.3 and SQLite 3.0-3.2. SQLite 2.x will probably not work; as for PostgreSQL, any version after 7.2 is supposed to work. If it doesn't, please let me know.
As seen above, options to the pg_to_sqlite() function (which is exported on request) are passed in as a hash. These options are described below. The default values can be changed by overriding the DBD::PgLite::MirrorPgToSQLite::defaults() subroutine.
Required options
Obviously, the mirroring function needs either a PosgtgreSQL database connection or enough information to be able to connect to the database by itself. It also needs the name of a target SQLite file, and a list of tables to copy between the two databases.
- pg_dbh, pg_user, pg_pass, pg_dsn
-
If a database handle is specified in pg_dbh, it takes precedence. Otherwise we try to connect using pg_dsn, pg_user, and pg_pass (which are assigned defaults based on the environment variables PGDATABASE, PGUSER and PGPASSWORD, if any of these is present).
- tables
-
The value of the required tables option should be an arrayref of strings or a string containing a comma-separated list of tablenames and tablename patterns. A tablename pattern is a string or distinct string portion delimited by forward slashes. To clarify: Suppose that a database contains the tables news, img, img_group, cat, users, comments, news_read_log, x_news_cat, x_news_img, and x_img_group; and that we want to mirror news, img, cat, x_news_img and x_news_cat, leaving the other tables alone. To achieve this, you would set the tables option to any of the following (there are of course also other possibilities):
(1) [qw(news img cat x_news_img x_news_cat)] (2) 'news, img, cat, x_news_img, x_news_cat' (3) [qw(news /img$/ /cat$/)] (4) 'news,/img$/,/cat/'
The purpose of this seemingly unneccesary flexibility in how the table list is specified is to make the functionality of the module more easily accessible from the command line.
Please note that the patterns between the slash delimiters are not Perl regular expressions but rather POSIX regular expressions, used to query the PostgreSQL system tables directly.
- sqlite_file
-
This should specify the full path to a SQLite file. While the mirroring takes place, the incoming data is not written directly to this file, but to a file with the same name except for a '.tmp' extension. When the operation has finished, the previous file with the name specified (if any) is renamed with a '.bak' extension, and the .tmp file is renamed to the requested filename. Unless you use the append option, the information previously in the file will be totally replaced.
Other options
- schema
-
This signifies the schema from which the tables on the PostgreSQL side are to be fetched. Default: 'public'. Only one schema can be specified at a time.
- where
-
A WHERE-condition appended to the SELECT-statement used to get data from the PostgreSQL tables.
- views
-
A list of views, specified in the same manner as the list of tables for the tables option. An attempt is made to define corresponding views on the SQLite side (though this functionality is far from reliable).
- indexes
-
A boolean option indicating whether to create indexes for the same columns in SQLite as in PostgreSQL. Default: false. (Normally only the primary key is created).
- functions
-
A boolean indicating whether to attempt to create functions on the SQLite side corresponding to any SQL language (NOT PL/pgSQL or other procedural language) functions in the PostgreSQL database. This is for use with DBD::PgLite only, since these functions are put into the pglite_functions table. Default: false.
- page_limit
-
Normally the information from the PostgreSQL tables is read into memory in one go and transferred directly to the SQLite file. This is, however, obviously not desireable for very large tables. If the PostgreSQL system tables report that the page count for the table is above the limit specified by page_limit, the table is instead transferred row-by-row. Default value: 5000; since each page normally is 8K, this represents about 40 MB on disk and perhaps 70-100 MB of memory usage by the Perl process. For page_limit to work, the table must have a primary key.
NB! Do not set this limit lower than necessary: it is orders of magnitude slower than the default "slurp into memory" mode.
- append
-
If this boolean option is true, then instead of creating a new SQLite file, the current contents of the sqlite_file are added to. If a table which is being mirrored existed previously in the file, it is dropped and recreated, but any tables not being copied from PostgreSQL in the current run are left alone. (This is primarily useful for mirroring some tables in toto, and others only in part, into the same file). Default: false. Incompatible with the snapshot option.
- snapshot
-
If this is true, then the copying from PostgreSQL takes place in serialized mode (transaction isolation level serializable), which should ensure consistency of relations between tables linked by foreign key constraints. Currently, foreign keys are not created on the SQLite side, however. Default: false. Incompatible with the append option.
- cachedir
-
The current method for getting information about table structure in PostgreSQL is somewhat slow, especially for databases with very many tables. To offset this, table definitions are cached in a temporary directory so that subsequent mirrorings of the same table will go faster. The downside is, of course, that if the table structure changes, the cache needs to be cleared manually. The cache directory can be specified using this option; the default is /tmp/sqlite_mirror_cache (with separate subdirectories for each user).
- verbose
-
If this is true, a few messages will be output to stderr during the mirroring process.
TODO
Support for foreign keys is missing.
The method used to read tables bigger than page_limit needs to be improved.
It would be nice to have a quick way of telling whether the cached table definition of a specific table is still valid.
Tests.
AUTHOR
Baldur Kristinsson (bk@mbl.is), 2004-2006.
Copyright (c) 2006 Baldur Kristinsson. All rights reserved.
This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.