At Meta, we run one of the largest deployments of MySQL in the world. The deployment powers the social graph along with many other services, like Messaging, Ads, and Feed. Over the last few years, we have implemented MySQL Raft, a Raft consensus engine that was integrated with MySQL to build a replicated state machine. By Anirban Rahut, Abhinav Sharma, Yichen Shen, Ahsanul Haque.
To provide high availability, fault tolerance, and read scaling, Meta’s MySQL datastore is a massively sharded, geo-replicated deployment with millions of shards holding petabytes of data. The deployment includes thousands of machines spread across several regions and data centers on multiple continents.
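The summary only mentions that Raft turns MySQL into a replicated state machine. The sketch below shows what that means in textbook Raft: a leader treats a log entry as committed once a majority of the cluster holds it, and only committed entries are applied. This is the generic Raft commit rule from the Raft paper, not code from Meta's plugin. `LogEntry`, `advance_commit_index`, and the idea that each payload stands for one MySQL transaction are hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    term: int      # Raft term in which the leader created this entry
    payload: str   # hypothetical stand-in for one replicated MySQL transaction


def advance_commit_index(log, match_index, commit_index, current_term, cluster_size):
    """Return a Raft leader's new commit index (textbook rule, illustrative only).

    log          -- list of LogEntry; log[i - 1] holds Raft index i
    match_index  -- follower id -> highest log index known to be stored on that follower
    The leader always counts itself as holding every entry in its own log.
    """
    majority = cluster_size // 2 + 1
    new_commit = commit_index
    for n in range(commit_index + 1, len(log) + 1):
        replicas = 1 + sum(1 for m in match_index.values() if m >= n)
        # Raft only commits entries from the leader's current term by counting
        # replicas; older-term entries become committed indirectly before them.
        if replicas >= majority and log[n - 1].term == current_term:
            new_commit = n
    return new_commit


if __name__ == "__main__":
    log = [LogEntry(1, "trx1"), LogEntry(2, "trx2"), LogEntry(2, "trx3")]
    # Five-member cluster: the leader plus four followers at different positions.
    match_index = {"f1": 3, "f2": 2, "f3": 1, "f4": 0}
    print(advance_commit_index(log, match_index, 0, current_term=2, cluster_size=5))  # → 2
```

Index 3 stays uncommitted because only the leader and `f1` hold it, which is two of five members and short of a majority. Index 1 is never counted directly because it belongs to an older term, but it becomes committed when index 2 does.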
The article then explains:
- Why was MySQL Raft necessary?
- The Raft library and the MySQL Raft plugin
- MySQL Raft replication topologies
- Replicated log
- Write transaction on MySQL primary using Raft
- Crash recovery
- Raft-initiated state machine transitions
- Monitoring the MySQL Raft rollout
- Performance
… and more. The biggest win of MySQL Raft was simplifying operations by making the MySQL servers themselves handle promotions and membership changes. This brought the provable safety of Raft and significantly reduced operational pain. Our goals of hands-off management of MySQL consistency, and of having tools for the rare cases of availability loss, have mostly been met. Good read!