How Galera Handles Concurrent Writes in Multi-Master

MariaDB Galera Cluster operates as a true multi-master replication system with synchronous (strictly speaking, virtually synchronous) semantics, meaning any node can accept write traffic simultaneously without relying on a designated primary. The architectural guarantee of consistency across distributed nodes hinges on a deterministic certification process rather than traditional asynchronous binary log shipping. When multiple nodes ingest concurrent transactions targeting overlapping rows, Galera’s replication layer intercepts the commit phase, bundles the row-level changes into a write-set, and broadcasts it across the cluster for validation. Platform engineers designing resilient database topologies must understand that Galera does not serialize writes at the network edge; instead, it lets each node execute transactions optimistically and enforces consistency at commit time by testing every write-set against a certification index. For a deeper breakdown of the underlying replication guarantees and state machine transitions, refer to Understanding Galera Synchronous Replication.

The Certification Engine and Write-Set Ordering

At the core of concurrent write handling is the Galera certification index. When a transaction reaches commit, the originating node builds a write-set containing the row changes together with the keys they touch (primary keys, unique keys, and foreign key references). The wsrep provider broadcasts this write-set to all nodes through the group communication layer using total order broadcast (TOB), and delivery assigns it a global sequence number (seqno) that is identical on every node. Each node, including the originator, then runs the write-set through the same deterministic certification test, checking its keys against write-sets that were ordered after the transaction took its snapshot. Because the test and its inputs are identical everywhere, all nodes reach the same verdict without a further round of voting. If no conflict exists, the write-set is queued for apply on the peers and committed on the originator. If a key collision is detected, every node discards the write-set, and the originating node rolls the transaction back and returns a deadlock error to the client. The cluster’s foundation relies on strict ordering to prevent lost updates, which is thoroughly documented in MariaDB Galera Core Architecture & Fundamentals.

The certification process is entirely in-memory and executes before the actual row data is applied. This means that high-throughput environments with heavy concurrent writes to the same logical partition will experience predictable certification rejections rather than silent data divergence. The wsrep_certify_nonPK parameter dictates whether tables lacking explicit primary keys participate in this index. When set to 0, Galera skips certification for tables that lack a primary key, so conflicting writes to those tables can go undetected and silently diverge across nodes. Production deployments must keep the default wsrep_certify_nonPK=1 and mandate explicit primary keys across all write-heavy schemas.
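The certification test described above can be illustrated with a toy model. This is a minimal sketch, not Galera's implementation: the WriteSet and CertificationIndex classes, their fields, and the (table, primary key) key format are all hypothetical names chosen for illustration. It assumes each write-set carries the seqno it last saw when its transaction started, and that a conflict means some key was written by a later write-set.

```python
from dataclasses import dataclass, field


@dataclass
class WriteSet:
    """Hypothetical, simplified write-set: touched keys plus the snapshot it ran against."""
    source: str
    keys: frozenset   # e.g. {("accounts", 42)} meaning (table, primary key)
    last_seen: int    # seqno of the last write-set visible when the transaction started


@dataclass
class CertificationIndex:
    """Toy certification index mapping each key to the seqno that last wrote it."""
    last_writer: dict = field(default_factory=dict)
    next_seqno: int = 1

    def certify(self, ws: WriteSet) -> tuple:
        # Every delivered write-set consumes a seqno, whether or not it passes.
        seqno = self.next_seqno
        self.next_seqno += 1
        # Conflict: a key was written by a write-set this transaction never saw.
        for key in ws.keys:
            if self.last_writer.get(key, 0) > ws.last_seen:
                return seqno, False
        # Passed: record this write-set as the latest writer of each key.
        for key in ws.keys:
            self.last_writer[key] = seqno
        return seqno, True


index = CertificationIndex()
a = WriteSet("node1", frozenset({("accounts", 42)}), last_seen=0)
b = WriteSet("node2", frozenset({("accounts", 42)}), last_seen=0)
c = WriteSet("node3", frozenset({("accounts", 7)}), last_seen=0)
results = [index.certify(ws) for ws in (a, b, c)]
print(results)  # [(1, True), (2, False), (3, True)]
```

Because every node would feed the same write-sets in the same order into the same function, each node reaches the same verdict independently, which is the property the total order broadcast exists to provide.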

Conflict Resolution Mechanics and First-Committer-Wins

Galera employs a strict first-committer-wins policy. When two concurrent transactions on different nodes modify the same primary key or unique index, the one whose write-set is ordered first in the cluster-wide total order passes certification and commits. The competing write-set, ordered later, fails certification and triggers an automatic rollback at its originating node. The client receives MySQL error code 1213 (ER_LOCK_DEADLOCK). The same error is returned when an applier thread aborts a still-running local transaction that holds locks needed by an incoming write-set (a brute-force abort). Error 1205 (ER_LOCK_WAIT_TIMEOUT), by contrast, is a local InnoDB lock wait timeout and does not indicate a certification failure.

Crucially, Galera certification failures are distinct from InnoDB row-level deadlocks. InnoDB deadlocks occur during lock acquisition while statements execute, whereas Galera conflicts surface at commit, after local execution has finished. Both reach the client as error 1213, so DevOps teams cannot separate them by error code alone; correlating application errors with the wsrep_local_cert_failures and wsrep_local_bf_aborts status counters avoids misrouting traffic or triggering unnecessary failovers. When a certification failure occurs, the originating node rolls the transaction back, releases its locks, and returns the error to the client. No partial writes persist. Note that Galera does not provide serializable isolation across nodes: each node runs transactions at its local isolation level (REPEATABLE READ by default), and certification adds first-committer-wins conflict detection on writes, which prevents lost updates but not every cross-node anomaly.
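A deadlock error on its own does not say which layer rejected the transaction. One hedged way to tell the two apart is to compare the wsrep_local_cert_failures and wsrep_local_bf_aborts status counters before and after the failure. The helper below is a hypothetical sketch that takes two status snapshots as plain dictionaries; how the snapshots are collected is left to the caller.

```python
def classify_deadlock(before: dict, after: dict) -> str:
    """Guess the cause of a 1213 error from two status snapshots (hypothetical helper).

    Both arguments map status variable names to their values, as returned by
    SHOW GLOBAL STATUS (values arrive as strings, hence the int() calls).
    """
    cert = int(after["wsrep_local_cert_failures"]) - int(before["wsrep_local_cert_failures"])
    bf = int(after["wsrep_local_bf_aborts"]) - int(before["wsrep_local_bf_aborts"])
    # Either counter moving suggests the replication layer caused the rollback.
    if cert > 0 or bf > 0:
        return "galera_conflict"
    return "innodb_deadlock"


before = {"wsrep_local_cert_failures": "10", "wsrep_local_bf_aborts": "2"}
after = {"wsrep_local_cert_failures": "11", "wsrep_local_bf_aborts": "2"}
print(classify_deadlock(before, after))
```

These counters are global to the node, so on a busy server another session can move them in the same window. Treat the result as a hint for log correlation, not as proof.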

Diagnostic Telemetry for Concurrent Write Failures

Accurate root-cause analysis requires isolating certification bottlenecks from network latency or storage I/O constraints. DBAs should monitor the following status variables using exact diagnostic queries:

SHOW GLOBAL STATUS LIKE 'wsrep_local_cert_failures';
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';
SHOW GLOBAL STATUS LIKE 'wsrep_local_send_queue_avg';
  • wsrep_local_cert_failures: Increments when a write-set that originated on this node is rejected during certification. A sustained upward trend indicates heavy key-space contention, typically hot rows written from several nodes at once.
  • wsrep_local_state_comment: Should read Synced. If it shows Joining, Joined, or Donor/Desynced, the node is still catching up or serving a state transfer and should not receive application writes.
  • wsrep_last_committed: The sequence number (seqno) of the last committed write-set. A node whose value lags persistently behind its peers is applying slowly or has lost contact with the cluster.
  • wsrep_local_send_queue_avg: Measures the average queue depth for outgoing write-sets. Values consistently above 10 suggest network saturation or flow-control pauses caused by slow appliers on peer nodes; any value that stays well above zero deserves a look.
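The checks in this list can be folded into a small evaluation function. The sketch below assumes the status rows have already been fetched as (name, value) pairs, and that the certification counter appears in the status output as wsrep_local_cert_failures. The function names, the finding messages, and the default limit of 10 (taken from the guideline above) are illustrative choices, not part of any Galera tooling.

```python
def rows_to_status(rows):
    """Turn (Variable_name, Value) rows from SHOW GLOBAL STATUS into a dict."""
    return {name: value for name, value in rows}


def evaluate_node(status, prev_cert_failures, send_queue_limit=10.0):
    """Return a list of human-readable findings for one node (hypothetical helper)."""
    findings = []
    state = status.get("wsrep_local_state_comment", "unknown")
    if state != "Synced":
        findings.append(f"node state is {state}, not Synced")
    # Status values arrive as strings, so convert before comparing.
    cert_failures = int(status.get("wsrep_local_cert_failures", 0))
    if cert_failures > prev_cert_failures:
        findings.append(f"{cert_failures - prev_cert_failures} new certification failures")
    if float(status.get("wsrep_local_send_queue_avg", 0.0)) > send_queue_limit:
        findings.append("send queue average above limit")
    return findings


sample = rows_to_status([
    ("wsrep_local_state_comment", "Synced"),
    ("wsrep_local_cert_failures", "42"),
    ("wsrep_local_send_queue_avg", "0.25"),
])
print(evaluate_node(sample, prev_cert_failures=40))  # ['2 new certification failures']
```

Passing the previous counter value in explicitly keeps the function stateless, so a monitoring loop can store the last reading wherever it already keeps its state.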

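To corroborate the wsrep counters with engine-level rollback activity, query the InnoDB metrics table:

SELECT NAME, COUNT, MAX_COUNT, AVG_COUNT
FROM information_schema.innodb_metrics
WHERE NAME LIKE 'trx_%';

High trx_rollbacks counts paired with rising wsrep_local_cert_failures point to application-level contention rather than infrastructure degradation. These counters are instance-wide, so they do not name the offending table, and the wsrep counters themselves are exposed only through SHOW GLOBAL STATUS, not through this table. Some trx_ counters may also need to be switched on through innodb_monitor_enable before they report values.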

Automation Patterns for Conflict Mitigation & Recovery

Python automation builders can implement resilient retry logic and dynamic routing to absorb certification spikes without manual intervention. The following pattern demonstrates production-safe conflict handling using mysql-connector-python:

import mysql.connector
import time
import logging
from mysql.connector import errorcode

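def execute_with_cert_retry(conn, query, params, max_retries=3):
    """Run one write statement, retrying when the server reports error 1213."""
    for attempt in range(max_retries):
        cursor = conn.cursor()
        try:
            cursor.execute(query, params)
            conn.commit()
            return cursor.lastrowid
        except mysql.connector.Error as err:
            # Only deadlock/certification conflicts are retryable; re-raise the rest.
            if err.errno != errorcode.ER_LOCK_DEADLOCK:
                raise
            # Discard the failed transaction before waiting, not after.
            conn.rollback()
            if attempt < max_retries - 1:
                backoff = 0.1 * (2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
                logging.warning("Galera conflict or deadlock (attempt %d). Retrying in %.2fs...", attempt + 1, backoff)
                time.sleep(backoff)
        finally:
            cursor.close()
    raise RuntimeError("Max retries exceeded for concurrent write")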

For platform teams managing connection pools, integrate wsrep_local_recv_queue polling into health checks. When the queue exceeds a threshold (e.g., >50), temporarily mark the node as read-only in your proxy layer (ProxySQL, HAProxy, or custom API gateway) to prevent new writes from hitting a saturated applier thread. This fallback routing strategy prevents cascading certification failures during peak ingestion windows.
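The routing decision described above benefits from hysteresis, so that a node hovering around the threshold does not flap in and out of the write pool. The following is a minimal sketch of that decision logic only. The WriteGate class is hypothetical, the high-water mark of 50 comes from the example above, and the low-water mark of 10 is an assumed value; wiring the result into ProxySQL or HAProxy is not shown.

```python
class WriteGate:
    """Decide whether a node should receive writes, based on its receive queue.

    Hypothetical helper: the node leaves the write pool when the queue exceeds
    `high` and returns only after the queue has drained below `low`.
    """

    def __init__(self, high=50, low=10):
        if low >= high:
            raise ValueError("low must be smaller than high")
        self.high = high
        self.low = low
        self.accepting = True

    def update(self, recv_queue):
        """Feed one wsrep_local_recv_queue sample; return True if writes are allowed."""
        if self.accepting and recv_queue > self.high:
            self.accepting = False
        elif not self.accepting and recv_queue < self.low:
            self.accepting = True
        return self.accepting


gate = WriteGate()
decisions = [gate.update(q) for q in (5, 60, 30, 8)]
print(decisions)  # [True, False, False, True]
```

The third sample (30) is below the high-water mark but the node stays out of the pool, because the queue has not yet drained to the low-water mark.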

Production Tuning & Topology Safeguards

Optimizing concurrent write throughput requires aligning Galera parameters with workload characteristics. The following configuration directives should be applied uniformly across all nodes:

[mysqld]
wsrep_slave_threads = 4
wsrep_certify_nonPK = 1
wsrep_retry_autocommit = 2
innodb_autoinc_lock_mode = 2
wsrep_provider_options = "gcs.fc_limit=16; gcs.fc_factor=0.8"
  • wsrep_slave_threads: Controls parallel applier concurrency. Set to roughly 2–4x the number of CPU cores as a starting point for write-heavy workloads, then tune empirically.
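  • wsrep_retry_autocommit: Number of times the node transparently retries an autocommit=1 statement that fails certification before returning the error to the client. The retry count is always bounded; a value of 2 absorbs transient conflicts while letting structurally conflicting workloads fail quickly. It does not cover multi-statement transactions, which the application must retry itself.
  • innodb_autoinc_lock_mode = 2: Enables interleaved auto-increment locking, preventing AUTO_INCREMENT bottlenecks during multi-master inserts. Galera expects this mode, because table-level auto-increment locks can deadlock with parallel appliers.
  • gcs.fc_limit / gcs.fc_factor: Flow control thresholds. A node asks the cluster to pause replication when its receive queue exceeds gcs.fc_limit write-sets, and allows it to resume once the queue drops below gcs.fc_limit × gcs.fc_factor.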
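As a worked example of the flow control settings above, the pause and resume points follow directly from the two values. This sketch assumes the documented meaning of gcs.fc_limit (the pause threshold) and gcs.fc_factor (the fraction of the limit at which replication resumes), and it ignores any dynamic scaling of the limit by cluster size. The function name is hypothetical.

```python
def flow_control_thresholds(fc_limit, fc_factor):
    """Return (pause_above, resume_below) receive-queue lengths, in write-sets."""
    if not 0.0 <= fc_factor <= 1.0:
        raise ValueError("fc_factor must be between 0 and 1")
    pause_above = fc_limit
    resume_below = fc_limit * fc_factor
    return pause_above, resume_below


pause_above, resume_below = flow_control_thresholds(16, 0.8)
print(pause_above, resume_below)  # 16 12.8
```

With the values from the configuration above, a node pauses the cluster once more than 16 write-sets are queued and releases it when the queue falls below roughly 13. Raising gcs.fc_limit tolerates more applier lag at the cost of staler reads on the lagging node.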

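Network security and firewall rules must permit bidirectional TCP traffic on ports 4567 (Galera replication), 4568 (IST), and 4444 (SST) across all cluster members; port 4567 also needs UDP if multicast replication is enabled. Asymmetric routing or stateful firewall idle timeouts do not reorder write-sets, but they can silently drop group communication connections, which leads to node evictions, loss of quorum, and non-Primary components that reject writes. For environments that need analytical reads isolated from the write path, route them to dedicated read-only nodes fed by asynchronous replication from a cluster member. (wsrep_sst_method only selects the state transfer mechanism, and wsrep_on=OFF stops changes from being replicated; neither creates a read replica.) This isolates analytical workloads from certification overhead while preserving the multi-master write surface.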