NAME
Net::Wire10 - Pure Perl driver for MySQL, Sphinx and Drizzle.
DESCRIPTION
Net::Wire10 is a Pure Perl connector that talks to MySQL, Sphinx and Drizzle servers.
Net::Wire10 implements the low-level network protocol, alias the MySQL wire protocol version 10, necessary for talking to one of the aforementioned servers without using an external client library such as for example libmysqlclient or libdrizzle.
SYNOPSIS
use Net::Wire10;
my $wire = Net::Wire10->new(
host => 'localhost',
user => 'test',
password => 'test',
database => 'test'
);
$wire->connect;
# CREATE TABLE example
$wire->query(
"CREATE TABLE foo (id INT, message TEXT)"
);
# INSERT example
$wire->query(
"INSERT INTO foo (id, message) VALUES (1, 'Hello World')"
);
printf "Affected rows: %d\n", $wire->get_no_of_affected_rows;
# SELECT example
$wire->query("SELECT * FROM foo");
my $results = $wire->create_result_iterator;
while (my $row = $results->next_hash) {
printf
"Id: %s, Message: %s\n",
$row->{id},
$row->{message};
}
$wire->disconnect;
INSTALLATION
Net::Wire10 is installed like any other CPAN perl module:
$ perl -MCPAN -e shell
cpan> install Net::Wire10
For Perl installations where the CPAN module (used above) is missing, you can also just download the .tar.gz from this site and drop the Net folder in the same folder as the Perl file you want to use the driver from.
Some (particularly commercial) Perl distributions may have their own package management systems. Refer to the documentation that comes with your particular Perl distribution for details.
USAGE
From Perl you can begin using the driver like this:
use Net::Wire10;
After that you can connect to a server and start interacting.
Take a look at SYNOPSIS for a complete code example, or individual examples below.
Example: connect
Connection parameters are specified in the constructor to Net::Wire10. After constructing an instance using new(), call connect() to begin talking to the server.
$wire = Net::Wire10->new(
host => $host,
user => $user,
password => $password,
);
$wire->connect();
The most frequently used parameters are shown above. For additional parameters, refer to new.
Example: create table
$wire->query(
"CREATE TABLE foo (id INT, message TEXT)"
);
Example: insert data
$wire->query(
"INSERT INTO foo (id, message) VALUES (1, 'Hello World')"
);
Example: retrieve data
For retrieving results, you need to create an iterator:
$wire->query(
"SELECT id, message FROM foo"
);
my $results = $wire->create_result_iterator;
Data is traversed in a row-by-row fashion. To retrieve each row of data, call next_hash() or next_array():
my $row = $results->next_hash;
Do so in a loop to process all rows:
while (my $row = $results->next_hash) {
printf
"Id: %d, Message: %s",
$row->{id},
$row->{message};
}
Example: disconnect
$wire->disconnect;
FEATURES
Features in the Net::Wire10 driver
new
Creates a new driver instance using the specified parameters.
use Net::Wire10;
use strict;
use warnings;
# Connect to Sphinx server on localhost
my $wire_tcp = Net::Wire10->new(
host => 'localhost',
port => 3307,
user => 'test',
password => 'test',
);
The argument hash can contain the following parameters:
- host
-
Host name or IP address of the server to which a connection is desired.
- port
-
TCP port where the server daemon listens. The port differs depending on the server type, so that you can have more than one type of server installed on the same machine.
MySQL uses port 3306, Sphinx uses port 3307, Drizzle uses port 4427.
The default is 3306 (MySQL), which is the most commonly used at the moment.
Sphinx needs to be at least version 0.9.9-rc2 for SphinxQL to work. There's more information about SphinxQL here: http://sphinxsearch.com/docs/current.html#sphinxql.
- database
-
Name of the initial default database, eg the database used for a query when that query does not specifically mention a database. The server may deny login if a database is given and the user does not have the necessary privileges to access that database.
- user
-
Username for identifying the service or user to the database server. This will show up in the process list, which is useful if you need to see what or who is hogging resources on the server.
- password
-
Password for authenticating the service or user to the database server.
- timeout
-
How long to wait before a connection attempt fails, and also how long to wait before a query is aborted.
- debug
-
Various informational messages will be printed to the console if a value of 1 is given. The exchanged network protocol messages ("mackets") will be printed to the console if a value of 2 is given. The exchanged TCP packets will be printed to the console if a value of 4 is given.
connect
Establishes or re-establishes a connection to the server.
ping
Pings the daemon on the connected server over the wire protocol.
After connecting, a ping() is useful to ensure that the connection is still alive. The current status of the connection can be looked up using is_connected().
query
The query() method transmits the specified SQL string to the server and obtains the response, including the full set of results.
A result set can be obtained with create_resultset_iterator(). The obtained result set will be disconnected from the driver. Being disconnected means that after retrieving the result set, you can fire more queries or close the connection.
Use query() if you want a small amount of records that you can use at an arbitrary later point in time. If you want to stream data to a live result set, including large amounts of data, see stream().
stream
The stream() method transmits the specified SQL string to the server, and obtains initial information about the response, but does not begin downloading data.
A result set can be obtained with create_resultset_iterator(). The obtained result set will be live with the driver. After retrieving the result set, you must traverse all of its rows before you can fire another query using the driver it is associated with.
Use stream() for large result sets, as it has a smaller memory footprint compared to query(). If you want to download data to a disconnected result set, use query().
Note that stream() will lock the driver until the whole result set has been retrieved. To fetch another set of results while streaming to a live result set, create another driver object.
Also note that if you are using MySQL with the default storage engine, MyISAM, the entire table on the server will be locked for the duration of the live result set, that is until all rows has been retrieved.
cancel (thread safe)
Cancels the running query. Safe to call asynchronously (thread safe).
use threads;
# abort on some condition (sleep for demonstration purposes)
threads->create(sub {
$wire->cancel if sleep 2;
})->detach;
# run query for 10 seconds (will be interrupted by above thread)
query("SELECT SLEEP(10)");
Run the example above to see how cancel() works to interrupt the SLEEP(10) statement before it finishes, after only 2 seconds instead of the full 10.
Another common way to kill a running query is to create another connection to the server, if possible, and use KILL QUERY (or KILL CONNECTION for older servers). Using the KILL commands require a number of things, for example that the server has available connection slots, server privileges, etc.
has_results
Returns true if the previous query returned a set of results. When the return value is true, create_result_iterator() can be used to traverse the results.
create_result_iterator
When a type of SQL query is specified that returns a result set, eg. a SELECT type query, a Net::Wire10::Results object can be used to traverse the results.
$wire->query("SELECT * FROM foo");
my $results = $wire->create_result_iterator();
See "Features in the Net::Wire10::Results iterator" for more.
is_connected
Returns true after connect() has been called, for as long as no fatal errors has occurred.
After a fatal error, a successful call to connect() causes is_connected to return true once again.
is_error
Returns true if an error code and message is available from the server or the driver.
get_no_of_affected_rows
Returns the number of rows influenced by the UPDATE, DELETE or similar query.
my $affected = $wire->get_no_of_affected_rows;
get_no_of_selected_rows
Returns the number of rows in the result set of a SELECT or similar query. Supported only for disconnect result sets, live/streaming result sets are unaware of the total number of records.
my $selected = $wire->get_no_of_selected_rows;
get_insert_id
MySQL and Drizzle has the ability to choose unique key values automatically, by enabling auto_increment for a column. When this happens, the newly assigned id value for the last inserted row is stored in this attribute.
get_server_status
After a query, this returns various status flags from the server. If the query is streaming to a live result set, the status information is available after the last row of data has been read.
get_server_version
After connecting, this returns the server version.
get_connection_id
After connecting, this returns a unique identifier for the thread on the server that handles the current connection.
get_warning_count
After a query, this returns the number of warnings generated on the server. If the query is streaming to a live result set, the warning count is available after the last row of data has been read.
disconnect
Transmits a goodbye message to the server to indicate intentions, then closes the underlying socket.
Features in the Net::Wire10::Results iterator
A Net::Wire10::Results object is created by calling create_result_iterator(). Depending on whether query() or stream() was used to execute the SQL, create_result_iterator() will return either a disconnect result set or a live (streaming) result set.
next_array
The next_array() method returns a whole row, with individual field values packed into an array. undef
is returned once all rows has been extracted.
while (my $row = $results->next_array) {
printf
"Value 1: %s Value 2: %s Value 3: %s\n",
$row->[0],
$row->[1],
$row->[2];
}
When the retrieved columns has been specifically named in the SELECT statement (rather than using the SELECT *
wildcard), the position of each individual field in result set rows are known, and next_array() can be used to access field values based on their position.
Column name and order for a SELECT *
query can be retrieved using get_column_names().
next_hash
The next_hash() method returns a whole row, with individual field values packed into a hash. The key of the hash is the name of the column that each field belongs to. undef
is returned once all rows has been extracted.
while (my $row = $results->next_hash) {
printf
"Id: %s Name: %s Data: %s\n",
$row->{id},
$row->{name},
$row->{data};
}
Using next_hash() instead of next_array() usually makes the code a bit more readable, especially in cases where a SELECT with column names is not nearby.
flush
Reads the remaining rows of the result set and discards them. When done on a live result set, this frees the driver for use.
get_no_of_columns
Returns the number of columns in the set of results.
get_column_names
Return the names of the result set's columns as an array.
Error handling features
There are two kind of errors in Net::Wire10, fatal and non-fatal errors. Fatal errors causes the connection to close, while non-fatal errors do not.
catching errors
All errors can be caught with an eval {} construct. To differentiate between a fatal and a non-fatal error, use is_connected().
# Create driver and connect
$wire = Net::Wire10->new(host=>'localhost', user=>'test', password=>'test');
$wire->connect;
# Execute nonsensical query
eval { $wire->query('Argle-bargle, glyp-glof?!'); };
warn $@ if $@;
print ($wire->is_connected ? "is" : "is not") . " connected.";
recognizing fatal errors
Here's a query that causes a fatal error:
# Execute query that kills the current connection
eval { $wire->query('KILL CONNECTION CONNECTION_ID()'); };
warn $@ if $@;
print ($wire->is_connected ? "is" : "is not") . " connected.";
After running the above code, it is necessary to reconnect the driver before doing additional work.
reconnecting
Once a fatal error has happened, it is trivial to reestablish a connection to the server.
# Reconnect if necessary
$wire->connect unless $wire->is_connected;
Notice that the connection id changes after this.
multiple jobs
If stability is sought, always wrap your code with guard blocks in reasonable spots. For example:
foreach $job (@processing_items) {
# Process job
eval {
$wire->connect unless $wire->is_connected;
$wire->query("SELECT ... interesting data ...");
$wire->query("SELECT ... more interesting data ...");
... more processing ...
$wire->query("UPDATE ... whatever ...");
};
# Handle errors
warn "Failed to process item, continuing with next item: $@" if $@;
}
In the above, if any of the jobs in processing_items fail, a warning is printed, and the program reconnects (if necessary) and continues with the next job.
long term connections
If you have long-running jobs, always do a ping to check the connection after periods of inactivity. For example:
# Process some job, then sleep and repeat
while (1) {
eval { $wire->ping if $wire->is_connected; };
eval {
# If the ping (above) failed, reconnect
$wire->connect unless $wire->is_connected;
# Actual process code
$wire->query("SELECT ... interesting data ...");
$wire->query("SELECT ... more interesting data ...");
... more processing ...
$wire->query("UPDATE ... whatever ...");
};
# Handle errors
warn "Something went wrong: $@" if $@;
# Wait a while before going again
sleep 300;
}
In the above, some job is run once every 5 minutes. Before the job is run, the connection is checked with ping(). Should the connection have failed silently, the ping will provoke a fatal error and soon thereafter, connect() will reestablish the connection before processing starts.
These are much superior approaches to automatically reconnecting before firing each query (which some other drivers do). Automatic reconnect schemes hides error messages, looses connection state, drops transactions, can cause half a transaction to complete, can cause table locks to be silently dropped, and similar malfunctions.
If you really want, you can amend the driver object with an extra version of query() (give it a different name) that automatically reconnects, or you can wrap the entire driver in your own object to the same effect, but I much recommend against it. Proper guard blocks, with a connection check at the beginning (as above), is a much healthier practice.
TROUBLESHOOTING
Supported operating system and Perl versions
This module has been tested on these platforms.
Windows Server 2008
with ActivePerl 5.10.0 build 1004, 32 and 64-bit
Feel free to send in reports of success or failure using different platforms!
Unsupported features
The following features are not supported.
Protocols other than version 10
Only protocol version 10 is supported.
Transports other than TCP/IP
Shared memory is not supported. Might be easy to add, my guess would be that it's just a ring buffer and one or two locks to synchronize readers and writers.
Named pipe is not supported.
Unix socket is not supported.
Character sets other than UTF-8
Only the UTF-8 character set is supported. Query strings that are not representible in UTF-8 needs special treatment (see below). Result data is binary strings for binary data and UTF-8 strings for textual data.
The server considers characters in the query received from the client to be in the character set indicated by the character_set_client variable. This variable is implicitly set to UTF-8 during login. Subsequently, the driver automatically converts all query text to UTF-8, making it simple to execute queries without considering the character set.
The simplest method of circumventing this in order to send binary data to the server is to use a text-to-binary notation such as "0x010203". For example:
(Note: if the second character in "dæmons" below is not Unicode U+00E6, then the documentation compiler failed to correctly determine the character set of its input files. For comparison, see http://www.fileformat.info/info/unicode/char/00e6/.)
# Helper that converts binary to ASCII hex notation.
sub raw_to_hex { return unpack("H*", shift); }
# Create a demonstration table.
$wire->query("CREATE TABLE dæmons (name TEXT, raw BLOB)");
# Demonstration SQL query and binary data.
my $sql = "INSERT INTO dæmons (name, raw) VALUES ('dæmons be here', _binary x'!')";
my $bindata = pack("CCCCC", 1, 2, 3, 254, 255);
# Add hex transliteration of the binary data to query
# (in place of the ! exclamation mark).
my $hex = raw_to_hex($bindata);
$sql =~ s/!/$hex/;
# Run the query.
$wire->query($sql);
A more economical encoding than hex notation would be to use BASE64, although the server may not support this natively.
Another more complex method exists, which uses less network bandwidth. It involves stuffing binary data into the query string in raw form. The server will actually consider any string literal following a "_binary" token to be raw binary data, while the rest of the query is still considered UTF-8. Thus it is also possible to send raw binary data to the server like this:
use Encode;
# Create a demonstration table.
$wire->query("CREATE TABLE dæmons (name TEXT, raw BLOB)");
# Demonstration SQL query and binary data.
my $sql = "INSERT INTO dæmons (name, raw) VALUES ('dæmons be here', _binary '!')";
my $bindata = pack("CCCCC", 1, 2, 3, 254, 255);
# Make sure the SQL is in UTF-8, and the binary data is included untouched.
my @parts = split(/!/, $sql);
my $raw_query = join('',
Encode::encode_utf8($parts[0]),
$bindata,
Encode::encode_utf8($parts[1])
);
# Tell the driver that the string is already UTF-8'ified,
# so it will refrain from automatically upgrading it.
Encode::_utf8_on($raw_query);
# Run the delicately constructed query.
$wire->query($raw_query);
This notation works only for string literals. Schema literals such as table names are always in UTF-8, the server does not accept _binary (or other character set tokens, for that matter) in front of schema literals.
A subtle difference between the two methods is that some servers will not check whether the input is valid UTF-8 if the second method is used, but will check validity if the first method is used.
See also notes on prepared statement support.
Protocol compression via zlib
There is not much documentation regarding protocol compression, therefore it has not been implemented.
It is possible to add compression at the network level instead using a tunnel rather than the protocol's own support, similarly to stunnel described below, if network bandwidth is at a premium.
Another option is to selectively compress data with zlib before handing it off to the driver, and using the function DECOMPRESS() to expand the data again once it reaches the server. See notes on LOAD DATA LOCAL INFILE for an example related to zlib compression. See notes on character sets for examples related to transmitting binary (eg. compressed) data to the server.
Verification of authenticity using SSL certificates
Verifying the server requires SSL support which is not currently implemented.
Verifying the client is not supported by the wire protocol.
SSL/TLS encrypted communication
There is not much documentation regarding protocol support for SSL/TLS, therefore it has not been implemented.
It is possible to add encryption at the network level by using a SSL/TLS wrapper such as "stunnel".
Stunnel provides a richer set of features than the current MySQL protocol supports, such as certificate-based authentication of the client in addition to the server.
Integrated support would be desirable because of simpler error handling and the possibility of performing key creation and certificate signing tasks via SQL.
Server-side prepared statements
Server-side prepared statements would be very nice to have, but are currently not implemented. The protocol details are described in the manual.
High-granularity streaming
Streaming data along the way as it is consumed or delivered by the client application can lead to dramatical decreases in memory usage.
Streaming outgoing data could be accomplished with server-side prepared statements, because the wire protocol allows prepared statement parameters to be sent one at a time, and even in chunks. (See above.)
The driver API currently allows client applications to stream incoming data one row at a time using the iterator. The highest supported granularity for streaming is one whole row at a time. Streaming at a higher granularity is not part of the current protocol design.
Streaming incoming data in a row-by-row fashion is also known in other drivers as "use_result mode".
Multiple statements per query
Currently unsupported. Multiple statements can often cause more trouble than gain by making SQL injection (a security risk) much easier and in return providing diminutive performance gains when compared to other approaches.
If you want to run multiple queries, one method is to create two separate connections. This can also give a performance boost, because one query does not wait for the other to finish. Another related advantage is that multi-core servers can actually perform both queries simultaneously.
Two or more queries can be started simultaneously without resorting to multiple threads on the client. This is done by starting the queries using the stream() call, which does not wait for the server to return data.
See also notes on connection pooling, which can be a useful technique to avoid initialization overhead when creating multiple connections.
Multiple result sets per query
Currently unsupported. Support would be nice to have because MySQL Server requires this for executing stored procedures (but not stored functions) that digs out result set data: before the actual result set, MySQL Server will for any CALL statement generate an extra result set which contains some sort of status data about how the execution went.
Normally this kind of information is sent back in protocol status fields, but that differs when using CALL. The implicit extra result set is very poorly documented, so it is hard to say if it is useful or not. Many utilities seems to completely ignore it.
Multiple result sets per query is documented to be incompatible with prepared statements, but the reason why is not.
If you get the error message "can't return a result set in the given context", this is the server telling you that it would like to generate an extra result set, but can't because the connector does not support it.
Non-query related commands
Various non-query-related protocol commands that may be useful are not currently supported, for example COM_SHUTDOWN to initiate a server shutdown and COM_DEBUG to trigger the server into dumping a bunch of debug information.
Out of scope features
Connection pooling
Connection pooling can improve performance by removing the need for a TCP handshake and SSL/TLS handshake (for encrypted sessions) when opening a connection, and can also alleviate TIME_WAIT problems in environments with many short-lived connections.
There is no particular advantage to putting pooling code in the driver core itself, but a standard implementation that comes along with the driver would be nice. Currently no such thing exists for this driver.
The LOAD DATA LOCAL INFILE client statement
Poorly documented and therefore unsupported. If necessary, you can emulate the LOAD DATA LOCAL INFILE client statement using two other queries in unison, namely SELECT INTO DUMPFILE and LOAD DATA INFILE.
For optimal network performance, compress the data using zlib before sending it to the driver.
Binary data can be sent in raw form via the driver, see notes on character set support. In some situations it is less complex to send binary data in a BASE64 encoding. Here's an example that does just that.
First, add a BASE64 decoder to MySQL Server, for example using this stored procedure: http://wi-fizzle.com/downloads/base64.sql. Next, in the client, compress using zlib and then encode using base64 the file you want to upload. Last, upload the file and ask the server to decode and parse it:
use Compress::Zlib;
use MIME::Base64 qw(encode_base64);
use Net::Wire10;
# Connect to database server
$wire = Net::Wire10->new(host=>'localhost', user=>'test', password=>'test');
$wire->connect;
# Go!
upload_file('mydata.csv', '`test`.`sometable`');
upload_file('mydata2.csv', '`test`.`othertable`');
sub upload_file {
my $filename = shift;
my $table = shift;
# Load file
open(FILE, $filename) or die "$!";
@rawdata = <FILE>;
# Compress with zlib
my $compressed = compress(@rawdata);
# Encode with base64
my $textformat = encode_base64($compressed);
# Upload file to server
$wire->query("SELECT UNCOMPRESS(BASE64_DECODE('$textformat')) INTO DUMPFILE '/tmp/upload.csv'");
# Load data - notice that this is server-side, there is no LOCAL keyword.
$wire->query("LOAD DATA INFILE '/tmp/upload.csv' INTO TABLE $table");
# Reclaim disk space used by temporary file.
$wire->query("SELECT '' INTO DUMPFILE '/tmp/upload.csv'");
}
When a similar solution was benchmarked, the performance was identical to a client-side version using the LOCAL feature (tested using the C API).
Be careful to escape data in the CSV file. MySQL Server does not use the same CSV format as various Office applications. One difference is that backslashes in CSV data are considered escape characters by MySQL Server.
The DELIMITER client statement
Mostly useful for CLIs, and requires a lot of client-side SQL parsing. Can be implemented in a CLI if necessary, there is no advantage to putting this in the driver.
Miscellaneous other limitations
Due to the way the Perl interpreter works, the following limitations may also apply.
Result sets limited to ~2 billion results on 32-bit Perl or ~9 quintillion results on 64-bit Perl.
Field values limited to under 2 GiB on 32-bit Perl or under 8 EiB on 64-bit Perl.
get_insert_id() values limited to ~2 billion on 32-bit Perl or ~9 quintillion on 64-bit Perl.
If you need BIGINT UNSIGNED for AUTO_INCREMENT columns on 64-bit Perl (or INT UNSIGNED on 32-bit Perl), use "SELECT LAST_INSERT_ID()" rather than get_insert_id().
LAST_INSERT_ID() returns the sequence number as a string instead of as an integer, avoiding any overflow issues with Perl's built-in types. You may also need to disable parsing of the insert_id protocol field in the driver to avoid an overflow error being thrown.
Known bugs
There should be a bug tracker where you can find known bugs.
Bugs in the design of the over-the-wire protocol may affect how the driver works. For example:
No character set info is sent with metadata such as column names, forcing the driver to assume that the client and result character sets are constant, when in fact they are not.
Selecting data larger than max_packet_size server variable causes silent truncation (inserting does not).
Refer to the MySQL issue tracker for more server and protocol issues.
Bugs and design issues in Perl itself may also affect the driver. For example:
Using an alarm to invoke cancel() often does not work, especially on Windows (use a thread instead).
Integer scalars are always signed.
Bugs in IO::Socket or IO::Select may also affect the driver.
SEE ALSO
AUTHOR
Driver core hacked together by Kim Christensen and David Dindorp at Dubex A/S. Password hashing routines by Hiroyuki OYAMA E, Japan.
COPYRIGHT AND LICENCE
Copyright (C) 2002 and (C) 2009 as described in AUTHORS.
This is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
WARRANTY
Because this software is licensed free of charge, there is absolutely no warranty of any kind, expressed or implied.