Overview
I recently needed to set up database replication inside RDS, with a MySQL instance as the master and an Aurora cluster writer as the slave. If I were using the AWS console for all of this, I could simply select the MySQL instance in the RDS console, drop down the Actions menu, choose Create Aurora read replica,
and that would be the end of this. However, because of reasons, I couldn't. What follows is the process I undertook to get this running.
This is purely written for my own edification and documents a process I recently undertook that worked just terrifically. I had my own reasons for undertaking this task in this specific manner, and you may have your own as well, or your own reasons why this isn't how you'd do things. Not my concern!
Anyway.
Assumptions
I am making a handful of assumptions of things outside the scope of this document. If your current setup doesn't cover these assumptions, this process may not work for you as written, and you may need to adjust commands accordingly (which will not be covered here) - proceed with caution.
- You already have an active and in-use RDS MySQL instance that is configured to be Multi-AZ
- You have already created an Aurora cluster to replace it, and it has no data in it yet
- Those two databases are in the same AWS VPC subnet, or if not, you have configured security groups to allow port 3306 access between them
- You know how to use at the very least a shell terminal and the MySQL CLI client
- You have an EC2 instance or other server that can connect to both databases on port 3306
- You have the MySQL client installed in that instance/server
- You understand that RDS costs money, and this process will add to that cost
- Your data set is contained inside a single database in MySQL
All good? Let's begin!
TL;DR
Quick overview of commands if you're lazy and enjoy flying by the seat of your pants. Code block lines prefixed with $ are shell commands; lines prefixed with mysql are run in the MySQL client and are tagged with either (aurora), (replica) or (master), which are respectively: the new Aurora cluster writer, the MySQL read replica you will create, and your original MySQL instance.
- Configure RDS instance to keep binary logs:
  mysql (master)> CALL mysql.rds_set_configuration('binlog retention hours', 24);
- Create a replication user on your master:
  mysql (master)> GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* to 'aurora'@'%' IDENTIFIED BY 'randompasswordhere'; FLUSH PRIVILEGES;
- Create a MySQL read replica via RDS:
  This is completed in the AWS RDS console, via the Actions menu => Create read replica
- Stop replication on the replica:
  mysql (replica)> CALL mysql.rds_stop_replication;
  mysql (replica)> SHOW SLAVE STATUS\G
- Take a database dump of the replica:
  $ mysqldump --set-gtid-purged=OFF --single-transaction --triggers --routines --events --allow-keywords --order-by-primary -u [adminuser] -p [rds_mysql_replica_hostname] [database_name] > dump.sql
- Delete the read replica (optional):
  This is completed in the AWS RDS console, via the Actions menu => Delete
- Import database dump on your Aurora writer instance:
  mysql (aurora)> CREATE DATABASE [database_name];
  ^D
  $ pv dump.sql | mysql -u [adminuser] -p [rds_aurora_writer_hostname] [database_name]
- Set the new master details on the Aurora writer instance to match the slave details from the replica (from step 4):
  mysql (aurora)> CALL mysql.rds_set_external_master('[rds_mysql_master_hostname]', 3306, '[replication_username]', '[replication_password]', '[Master_Log_File value from replica]', [Exec_Master_Log_Pos value from replica], 0);
- Start replication on the Aurora writer:
  mysql (aurora)> CALL mysql.rds_start_replication;
Setup
1. Configure RDS instance to keep binary logs
By default, RDS MySQL instances do not even generate binary logs. Fixing that is simple - just enable automated backups. I'm assuming that everyone is using this already, so I'm skipping that entirely.
Beyond that, RDS MySQL instances also do not keep their binary logs for any useful amount of time, and since we need to keep binary logs for replication slaves to use, the first step was fixing that.
There are two primary ways to go about this:
- Just create a read replica. RDS will then keep binary logs until the read replica has processed them, cleaning them up in 5 minute intervals. Once a read replica has started, you can simply tell it to stop replication, and as long as the replica is stopped and alive, the master instance will not clean up binary logs.
- Use the mysql.rds_set_configuration stored procedure to update the binlog retention hours setting, which is the method I used.
I wasn't sure how long it was going to take for the database dump to import on the Aurora writer, and I didn't particularly want to keep the read replica running the whole time ($$$) so asking the master to retain binlogs for 24 hours seemed the simplest method forward:
mysql> CALL mysql.rds_set_configuration('binlog retention hours', 24);
You can verify it has worked with the aptly named mysql.rds_show_configuration stored proc:
mysql> CALL mysql.rds_show_configuration\G
*************************** 1. row ***************************
name: binlog retention hours
value: 24
description: binlog retention hours specifies the duration in hours before binary logs are automatically deleted.
*************************** 2. row ***************************
name: source delay
value: 0
description: source delay specifies replication delay in seconds between current instance and its master.
*************************** 3. row ***************************
name: target delay
value: 0
description: target delay specifies replication delay in seconds between current instance and its future read-replica.
3 rows in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
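If you'd rather check that value from a script than eyeball the \G output, it parses easily. A minimal sketch; the here-doc below is a stand-in for real client output, so in practice you'd pipe the mysql command shown in the comment straight into awk:

```shell
# Parse the "binlog retention hours" value out of rds_show_configuration
# output. In real use, replace the here-doc with:
#   mysql -h [rds_mysql_master_hostname] -u [adminuser] -p \
#     -e 'CALL mysql.rds_show_configuration\G'
hours=$(awk '/name: binlog retention hours/ { getline; print $2 }' <<'EOF'
*************************** 1. row ***************************
        name: binlog retention hours
       value: 24
EOF
)
echo "binlog retention hours: ${hours}"
```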
2. Create a replication user on your master
Now we need to set up a user that the Aurora writer will use to connect to the MySQL master in order to retrieve replication logs. You need to pick a username and a password, and remember to grant both REPLICATION CLIENT and REPLICATION SLAVE. You can also choose to limit the address range this user can connect from, rather than using % as in the example below; I did, but I'm not showing that here. Don't forget to flush privileges after the user is created, so the user works immediately.
mysql> GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* to '[replication_username]'@'%' IDENTIFIED BY '[replication_password]';
Query OK, 0 rows affected (0.01 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.01 sec)
3. Create a MySQL read replica via RDS
Time for the easy bit! In the RDS console, find and drill down into your MySQL instance, then choose the Actions drop down and select Create read replica. Fill in all the details, give it an obvious name and create!
Wait a little bit and you'll see the replica appear in your RDS console. Give it time to fully create and start, and for the replication status to come up to date. This process took about 20 minutes all up for my database, but of course YMMV.
4. Stop replication on the replica
At this point, you will need to connect to your new MySQL read replica directly. Once you're in there, using one of the RDS stored procedures, stop replication:
mysql (replica)> CALL mysql.rds_stop_replication;
+---------------------------+
| Message |
+---------------------------+
| Slave is down or disabled |
+---------------------------+
1 row in set (1.01 sec)
Query OK, 0 rows affected (1.01 sec)
Now that the read replica has stopped replicating, ask for the current slave status, and take note of the values for Relay_Master_Log_File and Exec_Master_Log_Pos, as these will be used later for starting replication from Aurora.
mysql (replica)> SHOW SLAVE STATUS\G
... snip ...
Relay_Master_Log_File: mysql-bin-changelog.000955
Exec_Master_Log_Pos: 1355
... snip ...
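Those two values can also be captured from a script instead of copied by hand, which avoids transcription mistakes when you get to step 8. A rough sketch; the status variable below is a stand-in for the real SHOW SLAVE STATUS\G output named in the comment:

```shell
# Pull the two replication coordinates out of SHOW SLAVE STATUS\G output.
# In real use, replace the stand-in variable with:
#   status=$(mysql -h [rds_mysql_replica_hostname] -u [adminuser] -p \
#     -e 'SHOW SLAVE STATUS\G')
status='      Relay_Master_Log_File: mysql-bin-changelog.000955
        Exec_Master_Log_Pos: 1355'
log_file=$(printf '%s\n' "$status" | awk -F': ' '/Relay_Master_Log_File/ { print $2 }')
log_pos=$(printf '%s\n' "$status" | awk -F': ' '/Exec_Master_Log_Pos/ { print $2 }')
echo "Master_Log_File=${log_file} Exec_Master_Log_Pos=${log_pos}"
```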
5. Take a database dump of the replica
Now you'll need to be somewhere you can connect to the replica database instance, and also write an SQL dump file that is at least as large as your dataset, probably much more. The same EC2 instance would be ideal; just make sure you have the storage available!
We're going to take this dump using the mysqldump tool, which is commonly available.
$ mysqldump --set-gtid-purged=OFF --single-transaction --triggers --routines --events --allow-keywords --order-by-primary -u [adminuser] -p [rds_mysql_replica_hostname] database_name > dump.sql
Let's break down those parameters, shall we?
- --set-gtid-purged=OFF - Regardless of whether or not you are using GTIDs, not providing this option will cause your dump to fail during import, as the variable it attempts to set in the dump requires the SUPER permission and RDS does not provide it. Required.
- --single-transaction - Dumps all tables in a single transaction, ensuring consistency of the data snapshot. This isn't really required here, since this read replica isn't in use and not receiving writes, so the data should be consistent regardless, but it's still good practice. Optional, but recommended.
- --triggers --routines --events - Includes all triggers, stored routines, and events in the dump. My dataset has both triggers and events, so this was required for me. Optional.
- --allow-keywords - Ensures a valid dump if any of your table column names are reserved keywords, e.g. if a column is named show, this is required. Optional.
- --order-by-primary - Makes mysqldump sort each table's rows by the primary key, or the first unique key if available. Causes the dump to take considerably longer, but usually makes the import faster. Optional.
- -u [adminuser] -p - Username and password prompt; fill these in with the adminuser you configured when setting up RDS.
- [rds_mysql_replica_hostname] - Should be the endpoint address of the read replica instance.
- database_name - Your database name.
Once you've finished running this command, you should have a file called dump.sql that contains all your data.
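Before moving on (and especially before deleting the replica), it's worth a quick sanity check that the dump wasn't truncated. With default options, mysqldump writes a "-- Dump completed" comment as its final line, so its absence is a red flag. A rough sketch, using a tiny generated stand-in file; point the same check at your real dump.sql:

```shell
# Check that a mysqldump file ends with the "-- Dump completed" marker,
# which a successful default-options dump writes as its last line.
# A stand-in file is generated here so the check is self-contained.
dump=$(mktemp)
printf '%s\n' 'CREATE TABLE t (id INT);' '-- Dump completed on 2019-01-01  0:00:00' > "$dump"

if tail -n 1 "$dump" | grep -q '^-- Dump completed'; then
  result="complete"
else
  result="possibly truncated"
fi
echo "dump status: ${result}"
rm -f "$dump"
```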
6. Delete the read replica (optional)
You can now delete the read replica. You don't really have to do this, it just saves some money. It has served its purpose.
Feel free to keep it around if you're worried your dump is corrupt or won't work; otherwise, if the dump turns out to be bad, you'll have to start again at step 3.
7. Import database dump on your Aurora writer instance
Time to import! For this step, you'll still need to be on your server or instance with the new dump.sql file available.
Connect to your Aurora cluster writer instance via the MySQL client, and create your database(s):
mysql> CREATE DATABASE [database_name];
Query OK, 0 rows affected (0.01 sec)
Log out of there, and do a simple import of the data. Here, I'm using the pv tool in place of cat simply because I like to see the import progress. It's a great tool.
$ pv dump.sql | mysql -u [adminuser] -p [rds_aurora_writer_hostname] [database_name]
Once this is done, your Aurora database should be ready to complete the replication setup and full of data. Feel free to poke around in there and verify that your data set is intact.
8. Set the new master details on Aurora writer instance
Once you're happy that the data import was successful, log back into the Aurora writer instance via the MySQL client and, using yet another RDS stored procedure, configure replication:
mysql> CALL mysql.rds_set_external_master (
-> '[rds_mysql_master_hostname]',
-> 3306,
-> '[replication_username]',
-> '[replication_password]',
-> '[Master_Log_File value from replica]',
-> [Exec_Master_Log_Pos value from replica],
-> 0
-> );
Query OK, 0 rows affected (0.14 sec)
Breaking it down again:
- The stored procedure is mysql.rds_set_external_master which, as the name states, configures the database to use an external master for replication. This effectively replaces calling CHANGE MASTER TO and uses the same values, so if you're familiar with that, you'll figure this out before finishing reading this sentence.
- [rds_mysql_master_hostname] - Replace this with the instance endpoint for your MySQL master database.
- 3306 - This is the MySQL port; you probably won't need to change it.
- [replication_username] and [replication_password] - These are the username and password details you set up in step 2.
- [Master_Log_File value from replica] - This is the value of Relay_Master_Log_File from step 4.
- [Exec_Master_Log_Pos value from replica] - This is the value of Exec_Master_Log_Pos from step 4. As this is a number, ensure it is NOT wrapped in quotes.
- 0 - This value determines whether replication will use an SSL encrypted connection: 0 means no, 1 means yes. I chose not to encrypt, for reasons.
Once you run this, you can verify the slave configuration has been added by querying the slave status via the MySQL client:
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: [rds_mysql_master_hostname]
Master_User: [replication_username]
Master_Port: 3306
... snip ...
Master_Log_File: [Master_Log_File value from replica]
Read_Master_Log_Pos: [Exec_Master_Log_Pos value from replica]
... snip ...
Make sure those details all line up with what you configured, and you're good to continue and finish up.
9. Start replication on the Aurora writer
Final steps, let's kick replication off. Yet again, this is achieved using an RDS stored procedure, this time mysql.rds_start_replication. Of note, there's also mysql.rds_stop_replication if at any point in the future you need to, well, stop replication.
mysql> CALL mysql.rds_start_replication;
+-------------------------+
| Message |
+-------------------------+
| Slave running normally. |
+-------------------------+
1 row in set (1.00 sec)
Query OK, 0 rows affected (1.00 sec)
This stored proc will output a message which you want to verify says that the slave is running normally. You can then check the slave status again.
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
... snip ...
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
... snip ...
The two rows you're mainly looking for here are Slave_IO_State and Slave_SQL_Running_State, which give you status information on how replication is running. Running correctly and up to date should look like the output above: the slave is waiting for more queries from the master to write into the relay log, and everything written to the relay log so far has been read and applied.
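If you want to keep watching after that, Seconds_Behind_Master in the same output is the usual number to track. A rough sketch of pulling it out; the here-doc is a stand-in for the real client output named in the comment:

```shell
# Extract Seconds_Behind_Master to keep an eye on replication lag.
# In real use, replace the here-doc with:
#   mysql -h [rds_aurora_writer_hostname] -u [adminuser] -p \
#     -e 'SHOW SLAVE STATUS\G'
lag=$(awk -F': ' '/Seconds_Behind_Master/ { print $2 }' <<'EOF'
                Slave_IO_State: Waiting for master to send event
         Seconds_Behind_Master: 0
EOF
)
echo "replication lag: ${lag} seconds"
```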
All done. Your Aurora instance is now replicating live from your MySQL master.
What you do from here is up to you.
Credit
AWS documentation used while researching this process:
- MySQL on Amazon RDS SQL Reference https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.MySQL.SQLRef.html
- Replication Between Aurora and MySQL or Between Aurora and Another Aurora DB Cluster https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Replication.MySQL.html