Understanding Galera Synchronous Replication

Galera Cluster implements virtually synchronous multi-master replication: a transaction executes optimistically on one node and is replicated as a write-set at commit time. Unlike traditional asynchronous binlog shipping, Galera ensures that the write-set has been delivered to every node in a single global order and has passed certification before control returns to the client; peer nodes then apply it shortly afterwards. This architecture largely removes replication lag as a failure vector, since flow control bounds how far any node can fall behind, but it introduces strict latency and ordering constraints that infrastructure teams must engineer around. The operational model is deeply rooted in the foundational principles outlined in MariaDB Galera Core Architecture & Fundamentals, where the wsrep (Write-Set Replication) API intercepts InnoDB/XtraDB transaction boundaries and translates them into globally ordered write-sets.

The Synchronous Commit Lifecycle & Parameter Tuning

When a client issues a COMMIT, the local node does not immediately finalize the transaction. Instead, the wsrep layer assembles a write-set containing the row-level changes, the keys of the modified rows, and the position in the global order that the transaction last observed. This payload is broadcast via the Group Communication System (GCS), which delivers it to all cluster members in the same total order. Each node then runs a deterministic certification check against its own index of recently ordered write-sets. The Write-Set Certification Process Explained details how Galera uses a global transaction ID (GTID) and a deterministic ordering algorithm to detect conflicts before the write-set is applied. Because every node certifies the same write-sets in the same order, all nodes reach the same verdict without a vote. If certification succeeds, the originating node commits and acknowledges the client, while peers queue the write-set for apply. If the write-set touches a key that a concurrent, earlier-ordered transaction also modified, certification fails: peers discard the write-set and the originating node rolls the transaction back, ensuring strict consistency across the cluster.

Figure: the synchronous commit handshake — the write-set is broadcast and ordered before the client is acknowledged.

sequenceDiagram
    actor App as Client
    participant O as Origin node
    participant G as GCS total order
    participant P as Peer nodes
    App->>O: COMMIT
    O->>O: Build write-set
    O->>G: Broadcast write-set
    G->>O: Deliver in global order
    G->>P: Deliver in global order
    O->>O: Deterministic certification
    P->>P: Deterministic certification
    O-->>App: COMMIT acknowledged
    Note over O,P: Each node reaches the same verdict independently
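
The certification step can be illustrated with a toy model. This is a simplified sketch, not Galera's implementation: the WriteSet and Certifier names, the key representation, and the last_seen_seqno field are all hypothetical. It assumes a write-set fails certification when any of its keys was modified by a write-set ordered after the snapshot the transaction observed.

```python
from dataclasses import dataclass, field


@dataclass
class WriteSet:
    keys: frozenset        # row keys modified by the transaction (hypothetical representation)
    last_seen_seqno: int   # highest seqno the origin had applied when the transaction ran


@dataclass
class Certifier:
    """Toy certification index: remembers the seqno that last modified each key."""
    seqno: int = 0
    last_writer: dict = field(default_factory=dict)

    def certify(self, ws: WriteSet) -> bool:
        # Every delivered write-set takes the next position in the total order.
        self.seqno += 1
        # Conflict: a write-set the origin had not yet seen already changed one of these keys.
        for key in ws.keys:
            if self.last_writer.get(key, 0) > ws.last_seen_seqno:
                return False
        for key in ws.keys:
            self.last_writer[key] = self.seqno
        return True


def replay(write_sets):
    """Feed the same ordered stream to a fresh certifier and return the verdicts."""
    certifier = Certifier()
    return [certifier.certify(ws) for ws in write_sets]


stream = [
    WriteSet(frozenset({("accounts", 1)}), last_seen_seqno=0),
    WriteSet(frozenset({("accounts", 1)}), last_seen_seqno=0),  # concurrent with the first
    WriteSet(frozenset({("accounts", 2)}), last_seen_seqno=0),
]
node_a = replay(stream)
node_b = replay(stream)
```

Both replays see the same stream in the same order, so they produce identical verdicts, which is why no inter-node vote is needed in this model.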

This synchronous handshake adds commit latency of at least one network round trip to the most distant node in the group, and a slow node can throttle the entire cluster through flow control. Platform teams must tune wsrep_slave_threads to parallelize apply operations and enforce explicit primary keys on every table at the schema level (leaving wsrep_certify_nonPK at its default ON), which keeps certification deterministic and efficient. When a node's receive queue grows beyond gcs.fc_limit, it signals flow control and the cluster pauses replication of new write-sets until the lagging member catches up, which bounds how far any node can fall behind. Monitor wsrep_flow_control_paused (a 0 to 1 ratio of time spent paused since the last FLUSH STATUS) and wsrep_flow_control_sent to quantify backpressure. Sustained wsrep_flow_control_paused values above 0.5 indicate network saturation, disk I/O bottlenecks, too few applier threads, or an undersized gcs.fc_limit threshold.
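
Because wsrep_flow_control_paused is a cumulative ratio, a per-interval figure is often easier to alert on. The sketch below assumes the provider also exposes the wsrep_flow_control_paused_ns counter (total nanoseconds paused) and that the caller samples it at a known interval; the function name and the reset handling are illustrative choices, not part of Galera.

```python
def paused_fraction(prev_paused_ns: int, curr_paused_ns: int, interval_s: float) -> float:
    """Fraction of the sampling interval that replication spent paused by flow control."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_ns = curr_paused_ns - prev_paused_ns
    if delta_ns < 0:
        # Counter went backwards (for example after a restart): count from zero again.
        delta_ns = curr_paused_ns
    # Clamp to 1.0 to absorb small timing skew between samples.
    return min(delta_ns / (interval_s * 1e9), 1.0)
```

For instance, a counter that advances by 2.5 seconds' worth of nanoseconds over a 5-second window corresponds to a fraction of 0.5.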

Certification, Conflict Resolution & Deadlock Handling

Synchronous replication fundamentally changes how application traffic should be routed. Because every node accepts writes, connection pooling strategies must account for certification overhead and network round-trip times. As covered in Designing Multi-Master Topologies, engineers must align node placement with network topology to minimize inter-node latency. Cross-datacenter deployments typically call for asynchronous replicas or other dedicated replication bridges rather than an extended synchronous group, as WAN latency adds directly to every commit and can trigger persistent flow control that degrades throughput.

Concurrent writes to the same rows from different nodes will trigger certification failures. The transaction that loses certification is aborted on its originating node, and the client typically sees ER_LOCK_DEADLOCK (error 1213); ER_LOCK_WAIT_TIMEOUT (error 1205) can also appear when a local transaction waits on row locks. Application layers must implement transaction retry logic with exponential backoff and jitter. Python automation builders can rely on the Python Database API Specification v2.0 (PEP 249) to wrap transaction execution in retry decorators that inspect the driver's error number for 1213 and 1205. The exception class that carries these codes varies by driver, so match on the error number rather than on a specific class such as mysql.connector.errors.OperationalError. Avoid application-level read-modify-write loops; instead, use INSERT ... ON DUPLICATE KEY UPDATE or UPDATE ... WHERE, and touch rows in a deterministic order to minimize certification collisions.
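
A minimal sketch of such a retry decorator follows. It assumes the driver's exceptions expose the MySQL error number as an errno attribute (mysql.connector.Error does; other drivers may store it elsewhere), and the decorator name, default delays, and the injectable sleep parameter are illustrative choices.

```python
import functools
import random
import time

RETRYABLE_ERRNOS = {1213, 1205}  # deadlock and lock wait timeout, as listed above


def retry_on_conflict(max_attempts=5, base_delay=0.05, max_delay=2.0, sleep=time.sleep):
    """Retry the wrapped callable when it raises a retryable database error."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Assumption: the driver exposes the error number as `errno`.
                    errno = getattr(exc, "errno", None)
                    if errno not in RETRYABLE_ERRNOS or attempt == max_attempts:
                        raise
                    # Full jitter: sleep a random time up to the capped exponential delay.
                    cap = min(max_delay, base_delay * 2 ** (attempt - 1))
                    sleep(random.uniform(0, cap))
        return wrapper
    return decorator
```

The wrapped callable should contain the whole transaction, from the first statement through COMMIT, and roll back on failure, because a certification failure aborts the entire transaction rather than a single statement.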

Network Latency, Flow Control & Topology Constraints

Galera’s GCS layer uses TCP unicast by default, with UDP multicast available as an option, for group membership and write-set replication; state transfers run over separate connections. As a rule of thumb, production deployments should target inter-node round-trip latency under 2 ms for synchronous writes and provision dedicated high-bandwidth links (10GbE or faster) for inter-node traffic. The How Galera Handles Concurrent Writes in Multi-Master documentation outlines how the cluster arbitrates write ordering when multiple nodes submit conflicting transactions simultaneously. To keep flow control predictable, start from wsrep_provider_options="gcs.fc_limit=100;gcs.fc_factor=0.8" and tune wsrep_max_ws_size to cap write-set payloads. Oversized transactions (bulk INSERT/UPDATE without batching) will fill apply queues and can stall the entire cluster.
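
One way to keep write-sets small is to split bulk writes into fixed-size batches and commit each batch separately. The helper below is a generic sketch; the function name and the default batch size of 1000 are arbitrary assumptions, and the right size depends on row width and wsrep_max_ws_size.

```python
from itertools import islice


def batched_rows(rows, batch_size=1000):
    """Yield lists of at most batch_size rows so each commit produces a small write-set."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With a DB-API connection, each batch would typically be passed to cursor.executemany() and followed by its own commit, so that no single transaction carries the full payload.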

Network partitioning triggers automatic quorum evaluation. If a node loses connectivity to the majority, it transitions to Non-Primary state and rejects writes (and, by default, reads). Platform teams must keep pc.ignore_sb=false (the default) to prevent split-brain scenarios and implement automated fencing driven by health probes on wsrep_cluster_status. State Snapshot Transfer (SST) and Incremental State Transfer (IST) require precise firewall alignment: TCP 3306 (client), 4567 (GCS replication traffic, plus UDP 4567 when multicast is used), 4568 (IST), and 4444 (SST) must remain open between all cluster members. Restrict SST methods to mariabackup or rsync for production stability, and avoid mysqldump SST due to table locking and extended downtime.
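
The majority rule can be sketched as follows. This is a simplified model of quorum evaluation: it assumes a partition stays Primary only when it holds strictly more than half of the previous membership's total weight, and it ignores refinements such as nodes that left gracefully. The function name and the optional weights mapping are hypothetical.

```python
def has_quorum(previous_members, reachable_members, weights=None):
    """Return True if the reachable nodes hold a strict majority of the previous weight."""
    weights = weights or {}
    total = sum(weights.get(node, 1) for node in previous_members)
    reachable = sum(
        weights.get(node, 1) for node in previous_members if node in reachable_members
    )
    # A strict majority is required, so an even split leaves both halves without quorum.
    return reachable * 2 > total
```

Under this model a three-node cluster tolerates the loss of one node, while a two-node cluster that splits evenly leaves neither side Primary, which is why odd member counts are usually preferred.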

Automation, Telemetry & Operational Dependencies

Infrastructure automation must treat Galera nodes as stateful, consensus-bound entities rather than interchangeable database instances. Python-based orchestration scripts should query SHOW GLOBAL STATUS LIKE 'wsrep%' at 5-second intervals to track wsrep_local_state, wsrep_cluster_status, and wsrep_ready. A robust health check validates wsrep_ready=ON, wsrep_cluster_size >= 2, and wsrep_flow_control_paused < 0.1. During rolling upgrades or maintenance, execute SET GLOBAL wsrep_desync=ON so that the node stops participating in flow control and can fall behind without throttling the cluster. The node keeps receiving write-sets and remains a cluster member, so quorum is unaffected; set wsrep_desync=OFF once maintenance is complete.
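
The health check described above might look like the following sketch. It assumes the status rows have already been collected into a dict of variable name to string value (for example with dict(cursor.fetchall()) after SHOW GLOBAL STATUS LIKE 'wsrep%'); the function name and parameter defaults are illustrative.

```python
def is_healthy(status, min_cluster_size=2, max_paused=0.1):
    """Apply the three checks from the text to a dict of wsrep status values."""
    try:
        return (
            status.get("wsrep_ready", "").upper() == "ON"
            and int(status["wsrep_cluster_size"]) >= min_cluster_size
            and float(status["wsrep_flow_control_paused"]) < max_paused
        )
    except (KeyError, ValueError):
        # Missing or malformed values are treated as unhealthy rather than raising.
        return False
```

Treating missing values as unhealthy is a deliberate choice here: a node whose provider is not loaded returns no wsrep rows at all, and that should fail the probe.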

Prometheus exporters and OpenTelemetry collectors should scrape wsrep_ metrics from the information_schema.GLOBAL_STATUS table (or SHOW GLOBAL STATUS). Alert thresholds must be calibrated to production baselines: wsrep_cluster_status != Primary triggers P1, wsrep_local_state != 4 (Synced) triggers P2, and sustained wsrep_flow_control_paused > 0.3 triggers capacity review. For detailed protocol specifications and GCS tuning matrices, reference the official Galera Cluster Documentation.
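
The alert thresholds above can be expressed as a small classifier. This sketch assumes samples is a chronological list of status dicts and interprets "sustained" as the last few consecutive samples all exceeding the threshold; the function name, the return labels, and the three-sample window are assumptions to calibrate against production baselines.

```python
def classify(samples, paused_threshold=0.3, sustained_samples=3):
    """Return the most severe alert for the latest sample, or None if nothing fires."""
    if not samples:
        raise ValueError("at least one sample is required")
    latest = samples[-1]
    if latest.get("wsrep_cluster_status") != "Primary":
        return "P1"
    if str(latest.get("wsrep_local_state")) != "4":  # 4 means Synced
        return "P2"
    recent = samples[-sustained_samples:]
    # Only flag flow control when every sample in the window is above the threshold.
    if len(recent) == sustained_samples and all(
        float(s.get("wsrep_flow_control_paused", 0)) > paused_threshold for s in recent
    ):
        return "capacity-review"
    return None
```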

Platform teams must enforce strict dependency ordering: network fabric and DNS resolution must stabilize before mysqld initialization. Use wsrep_cluster_address=gcomm://node1,node2,node3, substituting explicit IP addresses for the placeholder node names, and bootstrap the first node with the --wsrep-new-cluster option (or the galera_new_cluster wrapper script) only during initial provisioning or full cluster recovery. Never run wsrep_on=OFF in production unless performing isolated schema migrations with explicit application read-only routing.
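
Provisioning code can build the address string from validated IPs rather than hand-edited hostnames. The sketch below is limited to IPv4 for simplicity, and the function name and the example addresses in the usage note are hypothetical.

```python
import ipaddress


def build_cluster_address(node_ips):
    """Return a gcomm:// address from a list of IPv4 strings, rejecting invalid entries."""
    # IPv4Address raises a ValueError subclass on malformed input.
    validated = [str(ipaddress.IPv4Address(ip)) for ip in node_ips]
    if not validated:
        raise ValueError("at least one node address is required")
    return "gcomm://" + ",".join(validated)
```

For example, build_cluster_address(["10.0.0.1", "10.0.0.2", "10.0.0.3"]) yields a value suitable for wsrep_cluster_address, and a malformed entry fails fast before any configuration file is written.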