NAME
MySQL::Replication - Decentralised, peer-to-peer, multi-master MySQL replication
DESCRIPTION
What is MySQL::Replication
MySQL::Replication is a replacement for MySQL's built-in replication. The reason for this module is that there are a number of issues with MySQL's built-in replication:
Multi-Master Replication
By design, MySQL can only replicate from a single master. There are some approaches to achieving multi-master replication, but they have problems:
Ring Topologies Are Fragile
Multi-master replication is possible by using a ring topology:
+-----------------+ +-----------------+ | db1.example.com | ----> | db2.example.com | +-----------------+ +-----------------+ ^ | | | | v +-----------------+ +-----------------+ | db4.example.com | <---- | db3.example.com | +-----------------+ +-----------------+
Each master replicates its queries to the connected slave. The way this achieves multi-master replication is that along with locally generated queries, incoming replicated queries are also replicated out to the connected slave e.g:
db1 replicates its queries to db2
db2 replicates its queries to db3 (including queries from db1)
db3 replicates its queries to db4 (including queries from db1 and db2)
db4 replicates its queries to db1 (including queries from db1, db2 and db3)
Queries replicate all the way around the ring until a master sees an incoming query that was generated locally. These are discarded to prevent queries infinitely replicating around the ring.
The problem with ring topologies is that if a master dies, queries don't progress around the ring and so replication lag builds:
+-----------------+ +-----------------+ | db1.example.com | ----> | db2.example.com | +-----------------+ +-----------------+ | | v +-----------------+ | db3.example.com | +-----------------+
We can fix a broken ring by bypassing the dead master:
+-----------------+ +-----------------+ | db1.example.com | ----> | db2.example.com | +-----------------+ +-----------------+ ^ | | | | v | +-----------------+ +--------------- | db3.example.com | +-----------------+
The problem here is that if there are queries still replicating around the ring from the dead master, the queries will not be filtered out and so an infinite replication loop occurs.
Since MySQL::Replication clients do not binlog incoming queries, there is no risk of serving non-locally generated queries and thus no risk of infinite replication loops.
Time Slicing Wastes Time
Multi-master replication is possible by time slicing:
@t1: +-----------------+ +-----------------+ +-----------------+ | db1.example.com | ----> | db2.example.com | | db3.example.com | +-----------------+ +-----------------+ +-----------------+ @t2: +-----------------+ +-----------------+ +-----------------+ | db1.example.com | | db2.example.com | <---- | db3.example.com | +-----------------+ +-----------------+ +-----------------+ @t3: +-----------------+ +-----------------+ +-----------------+ | db1.example.com | ----> | db2.example.com | | db3.example.com | +-----------------+ +-----------------+ +-----------------+
A timer is used to stop replicating from the currently connected master and to switch to another e.g.:
the timer is started
db2 replicates from db1 for a timeslice
the timer fires
db2 stops replicating from db1
db2 is reconfigured to replicate from db3
the timer is started
db2 replicates from db3 for a timeslice
The problem with time slicing is that time is wasted:
The slave does no work when the currently connected master doesn't have anything to replicate
Replication latency builds up on the blocked masters while they wait for their share of the timer i.e. given
N
masters with a timeslice oft
seconds each, the slave will be at least(N-1) * t
seconds behind each master.
Since MySQL::Replication achieves multi-master replication by running multiple instances of the client in parallel, there is no time slicing via a timer and thus no time wasted by time slicing.
Queries May Get Replayed After A Server Crash
A slave's master position is recorded in the
relay-log.info
file however writes to the InnoDB tablespace andrelay-log.info
are not atomically synced to disk. If a slave dies and comes back online, files may be in an inconsistent state. If the InnoDB tablespace was flushed to disk before the crash butrelay-log.info
wasn't, the slave will restart replication from a stale position and so will replay queries.A workaround can be found at http://bugs.mysql.com/bug.php?id=40337 but note:
"Although such solution has been proposed to reduce the probability of corrupted files due to a slave-crash, the performance penalty introduced by it has made the approach impractical for highly intensive workloads."
MySQL::Replication clients store their server positions inside the InnoDB tablespace (i.e. the
Replication.SourcePosition
table by default). Since updates are done within the same transaction as replicated queries are executed in, writes are atomic. If a slave dies and comes back online, we will still be in a consistent state since either the transaction was committed or it will be rolled back.Moving Slaves To Different Masters Is Hard
A slave's master position is relative to the directly connected master's binlogs. Given a multi-layer replication topology e.g. a tree topology, a slave's master position is still relative to the directly connected master's binlogs and not relative to the root master's binlogs. If a master in a middle layer dies, moving its slaves to a different master is non-trivial since they will all need their master positions translated to the new master's binlogs.
MySQL::Replication always deals with canonical binlog positions. In a multi-layer replication topology e.g. a tree topology, positions are always relative to the root server's binlogs. If a relay in a middle layer dies, moving its clients to a different relay is a simple configuration item change since no translation is needed.
How Does MySQL::Replication Work
A MySQL::Replication replication topology is made up of:
MySQL::Replication servers
Servers provide an API to retrieve queries from its local binlogs. A client can request queries from arbitrary binlogs and file positions.
MySQL::Replication clients
Clients retrieve queries from a server and applies them to the local database.
MySQL::Replication relays
Relays provide a transparent caching query proxy for clients to connect to. Relays are useful for reducing data transfers between multiple data centers and reducing load on the servers.
MySQL::Replication Servers
A MySQL master runs a MySQL::Replication server, which serves queries from its local binlogs e.g.:
db1.example.com:~$ MySQLReplicationServer.pl --binlog db1:/var/lib/mysql/binlogs/mysql-bin.index
The server running on db1.example.com
will serve queries from the binlogs listed in mysql-bin.index
.
See MySQLReplicationServer.pl for more information on servers.
MySQL::Replication Clients
A MySQL slave runs the MySQL::Replication client e.g.:
db2.example.com:~$ MySQLReplicationClient.pl --srchost db1.example.com --srcbinlog db1
The client running on db2.example.com
will:
Get the server position for
db1.example.com
from the local databaseConnect to the server running on
db1.example.com
Request queries, starting from its server position
Read the query response from the server
Execute the query on the local database
Update the server position in the local database
Wait for the next query response
+-----------------+ +-----------------+
| db1.example.com | ----> | db2.example.com |
+-----------------+ +-----------------+
To replicate from multiple masters, run multiple instances of the client e.g.:
db2.example.com:~$ MySQLReplicationClient.pl --srchost db1.example.com --srcbinlog db1
db2.example.com:~$ MySQLReplicationClient.pl --srchost db3.example.com --srcbinlog db3
+-----------------+ +-----------------+ +-----------------+
| db1.example.com | ----> | db2.example.com | <---- | db3.example.com |
+-----------------+ +-----------------+ +-----------------+
Note that there is no restriction on where the client and server run. e.g. having all databases replication to and from each other is possible:
+-----------------+ +-----------------+ +-----------------+
| db1.example.com | <---> | db2.example.com | <---> | db3.example.com |
+-----------------+ +-----------------+ +-----------------+
^ ^
| |
+---------------------------------------------------+
See MySQLReplicationClient.pl for more information on clients.
MySQL::Replication Relays
A MySQL::Replication relay acts as a proxy cache. In a multi-layer replication topology, middle layers run a MySQL::Replication relay e.g.:
relay.example.com:~$ MySQLReplicationRelay.pl
The relay running on relay.example.com
will:
Accept requests from connecting clients
If the relay can fulfill the request from its cache, it will serve them to the client
If the relay cannot fulfill the request from its cache, it will:
Connect directly to the server, or if specified, the next relay
Relay the request to the next layer
Read the query response
Cache the query response for future requests
Send the query response to the client
Wait for the next query response
+-----------------+ +-------------------+ +-----------------+
| db1.example.com | ----> | relay.example.com | ----> | db2.example.com |
+-----------------+ +-------------------+ +-----------------+
By using relays:
Bandwidth is saved since multiple clients in one data center need only connect to the local relay, while the relay goes over the WAN to fulfill requests
Load is saved on the server since the number of connecting clients is reduced
Note that there is no restriction on the number of layers of relays e.g. a tree of relays is possible:
+-----------------+
| db2.example.com | <---------------+
+-----------------+ |
|
+-----------------+ +--------------------+
| db3.example.com | <---- | relay2.example.com | <----------------+
+-----------------+ +--------------------+ |
|
+--------------------+ +-----------------+
| relay1.example.com | <---- | db1.example.com |
+--------------------+ +-----------------+
|
+-----------------+ +--------------------+ |
| db4.example.com | <---- | relay3.example.com | <----------------+
+-----------------+ +--------------------+
|
+-----------------+ |
| db5.example.com | <---------------+
+-----------------+
See MySQLReplicationRelay.pl for more information on relays.
FAQs
What Happens If There Is A Race Condition On A Record
e.g. An insert to the users
table occurs on two seperate databases at the same time for the username
'alfie'. The problem here is that username
is the primary key. Since the inserts happened at the same time, both inserts succeeded. It is only when they replicate will a primary key constraint fail. If this happens, replication will stop and manual intervention is necessary.
The only way to prevent this is to avoid the race in the first place:
Use an external arbiter to protect access to shared resources
e.g. before inserting into the
users
table, each program performing the insert contacts the arbiter and request the inserting of 'alfie'. The first request is granted the insert while the others fail.Shard your data so that race conditions cannot occur
e.g. the
users
table is sharded based on the first letter of the usename. Inserts for 'alfie' only happen on the database with write access to the 'a' records.Don't use
AUTO_INCREMENT
keys, use UUIDs insteadNot useful in the
users
example, but for tables whereAUTO_INCREMENT
ids are used, switch to UUIDs to avoid clashes.
BUGS
The relay is still in development and have not been released yet
Communication over the wire is in plain text. Only use MySQL::Replication over a secure channel (e.g. stunnel, IPSec etc)
Row-based replication is not supported yet
LOAD DATA events are not supported yet
Filtering on queries, tables and schemas are not supported yet
SEE ALSO
AUTHOR
Alfie John, alfiej@opera.com
LICENSE AND COPYRIGHT
Copyright (c) 2011, Opera Software Australia Pty. Ltd. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.