Why do I get Error 1213 deadlocks on COMMIT under concurrent load?

In Galera, Error 1213 on COMMIT is usually a certification conflict, not a single-node lock deadlock: another node committed an overlapping write-set that won the total order, so your transaction was aborted. Wrap writes in a bounded retry with jittered backoff that catches error codes 1213 and 1205, and route hot-row writes to a single node so they serialize locally.

Why does a read on one node not immediately reflect a write acknowledged on another?

Galera guarantees consistency and global ordering, not zero-latency read visibility across every node. A write can be durably ordered before it has finished applying on the node you read from. Set wsrep_sync_wait=1 on sessions that need read-your-writes semantics, or pin the read to the same node that took the write.

Understanding Galera Synchronous Replication

This page builds on the concepts introduced in MariaDB Galera Core Architecture & Fundamentals and solves one specific operational problem: how to reason about, and safely tune, the synchronous commit path so that a multi-master group stays consistent without collapsing under latency or flow control. Galera implements virtually synchronous replication by decoupling storage-engine execution from network consensus. Unlike traditional asynchronous binlog shipping, a transaction is either delivered to every member in one global order and certified, or it is rejected on the originating node before control returns to the client. That guarantee removes replication lag as a silent data-loss vector, but it replaces it with hard latency and ordering constraints that database administrators, DevOps engineers, and platform teams must engineer around rather than ignore.

Concept: What “Synchronous” Actually Means in Galera

Strictly speaking, Galera provides certification-based, virtually synchronous replication, and the distinction matters operationally. When a client issues a COMMIT, the local node does not immediately persist the transaction. The storage engine (InnoDB) collects the transaction’s row-level changes into a write-set: a compact payload of the modified rows, their primary-key references, and the dependency metadata needed to detect conflicts. The wsrep (Write-Set Replication) API intercepts the commit boundary and hands that write-set to the Group Communication System (GCS), which broadcasts it to every member and assigns it a global, total-order sequence number (seqno).

The word “synchronous” refers to replication and ordering, not to apply. Every node is guaranteed to have received and ordered the write-set before the originating node acknowledges the commit — this is what makes the group virtually synchronous. The actual apply on remote nodes happens asynchronously afterward, which is why flow control exists: it is the mechanism that stops fast writers from outrunning slow appliers. The conflict decision itself is made by the Write-Set Certification Process, which runs deterministically and independently on every node against the same globally ordered stream, so all members reach an identical verdict without any node-to-node negotiation.
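The deterministic verdict can be illustrated with a toy model. This is a minimal sketch, not Galera's implementation: the WriteSet fields and the certify function are hypothetical names, and it assumes each write-set carries the seqno its transaction last saw plus the set of primary-key references it touches.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WriteSet:
    seqno: int        # position in the global total order
    last_seen: int    # highest seqno applied when the transaction executed
    keys: frozenset   # primary-key references touched by the transaction

def certify(stream):
    """Replay an ordered stream and return {seqno: passed?} verdicts."""
    last_writer = {}  # key -> seqno of the last certified write-set touching it
    verdicts = {}
    for ws in sorted(stream, key=lambda w: w.seqno):
        # Conflict: a key was changed by a write-set this transaction never saw.
        conflict = any(last_writer.get(k, 0) > ws.last_seen for k in ws.keys)
        verdicts[ws.seqno] = not conflict
        if not conflict:
            for k in ws.keys:
                last_writer[k] = ws.seqno
    return verdicts

stream = [
    WriteSet(seqno=1, last_seen=0, keys=frozenset({"accounts:42"})),
    WriteSet(seqno=2, last_seen=0, keys=frozenset({"accounts:42"})),  # raced with seqno 1
    WriteSet(seqno=3, last_seen=0, keys=frozenset({"accounts:7"})),
]
```

Because the function depends only on the ordered stream, every node that replays the same stream reaches the same verdicts, which is why no node-to-node negotiation is needed.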

Figure: the synchronous commit handshake — the write-set is broadcast and ordered before the client is acknowledged.

The critical consequence is that commit latency is bounded below by the round-trip time to the slowest reachable member of the primary component. A three-node group on a 0.3 ms LAN behaves very differently from the same group stretched across a 40 ms WAN link. This is the single most important fact to internalize before designing topology or tuning any parameter on this page.
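A rough back-of-the-envelope calculation makes the difference concrete. The sketch below assumes the round trip to the slowest peer is the only cost (it ignores certification, fsync, and application time), and the function names are illustrative.

```python
def commit_floor_ms(peer_rtts_ms):
    """Lower bound on commit latency: the round trip to the slowest peer."""
    return max(peer_rtts_ms)

def max_serial_commits_per_sec(peer_rtts_ms):
    """Upper bound on back-to-back commits issued by ONE connection."""
    return 1000.0 / commit_floor_ms(peer_rtts_ms)

lan = max_serial_commits_per_sec([0.3, 0.3])   # about 3333 commits/s
wan = max_serial_commits_per_sec([0.3, 40.0])  # 25 commits/s
```

The bound applies per connection: many concurrent connections can overlap their round trips, so aggregate throughput can be far higher, but any single serialized writer is capped by the slowest link.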

Prerequisites & Environment Requirements

Reasoning about the synchronous path assumes a correctly provisioned group. Confirm the following before tuning anything described here:

MariaDB 10.6 LTS or later (11.4 LTS recommended for new builds) with the Galera 4 provider libgalera_smm.so installed from the server package on every node. Mixed minor versions can change provider option defaults and produce asymmetric behavior.
An odd number of voting members (typically 3 or 5), or an added garbd arbitrator, so the group can always compute a majority. Quorum arithmetic is the foundation of the write-availability guarantee discussed below.
Explicit primary keys on every table. Certification is keyed on primary-key references; a table without one forces Galera to synthesize keys and degrades conflict detection. Keep wsrep_certify_nonPK at its default ON only as a safety net, not as a license to ship PK-less schemas.
The Galera port set reachable between all members. Client SQL on TCP 3306, group communication on 4567 (TCP and UDP), Incremental State Transfer (IST) on 4568, and State Snapshot Transfer (SST) on 4444. Locking these to known peers is covered in Network Security & Firewall Rules for Galera; a blocked port turns a routine node rejoin into a stalled SST.
Low, predictable inter-node latency. Sub-millisecond RTT on a dedicated link is the target for a co-located synchronous group. WAN spans should terminate at a read replica or a separate cluster rather than extending the synchronous membership.
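The quorum arithmetic behind the odd-member rule can be sketched in a few lines. This simplified model assumes every member carries equal weight and that the expected member count is the size of the last primary component; the function names are illustrative.

```python
def has_quorum(expected_members: int, reachable_members: int) -> bool:
    """True when the reachable partition holds a strict majority."""
    return 2 * reachable_members > expected_members

def tolerated_failures(expected_members: int) -> int:
    """How many members can be lost while a strict majority remains."""
    return (expected_members - 1) // 2
```

Under these assumptions a four-node group tolerates one failure, exactly like a three-node group, and an even split (two of four) leaves neither side with a majority. That is the reason for adding a garbd arbitrator rather than a fourth data node.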

A first-time deployment that satisfies these prerequisites is walked through in Bootstrapping Your First Galera Cluster; this page assumes a group already exists and focuses on the replication mechanics on top of it.

Step-by-Step: Trace and Tune the Synchronous Commit Path

The goal of this procedure is to make the commit path observable first, then tune it from evidence rather than folklore. Each step explains why it matters, not just the command.

1. Confirm the node is a full, synced member of the primary component. A node only participates in synchronous commits when it is Synced inside a Primary component. Anything else means your latency numbers are measuring the wrong thing.

SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_cluster_status', 'wsrep_local_state_comment',
   'wsrep_cluster_size', 'wsrep_ready');

You want wsrep_cluster_status = Primary, wsrep_local_state_comment = Synced, wsrep_ready = ON, and a wsrep_cluster_size matching your expected member count.
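For scripted checks, the four conditions can be folded into one predicate. This is a sketch that assumes the status rows have already been fetched into a dict of strings; is_synced_member and expected_size are illustrative names.

```python
def is_synced_member(status: dict, expected_size: int) -> bool:
    """True only when all four membership conditions hold."""
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_local_state_comment") == "Synced"
        and status.get("wsrep_ready") == "ON"
        and int(status.get("wsrep_cluster_size", 0)) == expected_size
    )
```

Checking the cluster size as well as the node state catches the case where the node is healthy but the group has quietly shrunk.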

2. Establish a commit-latency baseline. Because commit latency is dominated by consensus, measure it with a realistic write, not a SELECT. Run a short write loop and read the replication timing counters before and after:

SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_replicated', 'wsrep_replicated_bytes',
   'wsrep_repl_data_bytes', 'wsrep_evs_repl_latency');

wsrep_evs_repl_latency, which reports group-communication latency as min/avg/max/stddev/sample-count, is the clearest signal of the network cost of a commit. A baseline lets you tell “the app got slower” apart from “the network got slower.”
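The before/after comparison reduces to simple counter arithmetic. The sketch below assumes two snapshots of the cumulative counters wsrep_replicated and wsrep_replicated_bytes taken a known number of seconds apart; replication_delta is an illustrative helper.

```python
def replication_delta(before: dict, after: dict, elapsed_s: float) -> dict:
    """Turn two snapshots of cumulative counters into per-interval rates."""
    writesets = int(after["wsrep_replicated"]) - int(before["wsrep_replicated"])
    nbytes = int(after["wsrep_replicated_bytes"]) - int(before["wsrep_replicated_bytes"])
    return {
        "writesets_per_s": writesets / elapsed_s,
        # Guard against an idle interval with no replicated write-sets.
        "avg_writeset_bytes": nbytes / writesets if writesets else 0.0,
    }
```

Tracking the average write-set size alongside the rate helps separate "more transactions" from "bigger transactions" when the baseline shifts.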

3. Parallelize apply on the receiving side. Remote apply is the part of the path that flow control throttles, so widening it directly reduces backpressure. Set wsrep_slave_threads to roughly the number of CPU cores available for apply, but never below the number of independent write streams your workload generates:

[mysqld]
wsrep_slave_threads = 8

More apply threads let non-conflicting write-sets apply concurrently on peers while commit order is still preserved; they do nothing for a workload that serializes on one hot row.

4. Size flow control so it protects consistency without stalling writers. Flow control pauses new replication when a node’s receive queue exceeds a limit, giving slow appliers time to catch up. Set the limit through the provider option string:

[mysqld]
wsrep_provider_options = "gcs.fc_limit=64;gcs.fc_factor=0.8"

The full anatomy of this string — how it merges, why a second declaration silently replaces the first — is covered in the wsrep.cnf Configuration Deep Dive. Keep every provider option in a single declaration.
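The interaction between gcs.fc_limit and gcs.fc_factor is a hysteresis loop. The following is a simplified single-node model rather than the provider's actual code: it pauses when the receive queue exceeds the limit and resumes once the queue has drained back to limit × factor. The FlowControl class is a hypothetical name.

```python
class FlowControl:
    def __init__(self, fc_limit: int = 64, fc_factor: float = 0.8):
        self.upper = fc_limit              # pause above this queue depth
        self.lower = fc_limit * fc_factor  # resume at or below this depth
        self.paused = False

    def observe(self, recv_queue_len: int) -> bool:
        """Feed one queue-depth sample; return whether replication is paused."""
        if not self.paused and recv_queue_len > self.upper:
            self.paused = True
        elif self.paused and recv_queue_len <= self.lower:
            self.paused = False
        return self.paused

fc = FlowControl()
trace = [fc.observe(q) for q in (10, 64, 65, 60, 52, 51, 30)]
# trace == [False, False, True, True, True, False, False]
```

The gap between the two thresholds is what damps oscillation: with a factor of 1.0 the node would flip between paused and running on every sample around the limit.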

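5. Cap write-set size to prevent one transaction from stalling the group. A single multi-gigabyte transaction serializes the entire apply pipeline. The values below are the most permissive settings (wsrep_max_ws_rows = 0 disables the row limit, and 2147483647 bytes is the 2 GB maximum for wsrep_max_ws_size); declare them explicitly so the ceiling is visible in configuration, and lower them if your group cannot absorb write-sets that large: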

[mysqld]
wsrep_max_ws_rows = 0
wsrep_max_ws_size = 2147483647

Then fix the workload: batch large INSERT/UPDATE/DELETE operations into chunks of a few thousand rows so no single write-set dominates the total order.
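One way to chunk a bulk delete from the application side is sketched below. The events table, its id column, and both function names are hypothetical; conn is assumed to be a DB-API connection using the %s parameter style, as mysql.connector does.

```python
from itertools import islice

def chunked(ids, size=2000):
    """Yield successive lists of at most `size` ids."""
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def delete_in_chunks(conn, ids, size=2000):
    """Delete rows batch by batch; each commit produces one small write-set."""
    cur = conn.cursor()
    for batch in chunked(ids, size):
        placeholders = ", ".join(["%s"] * len(batch))
        cur.execute(f"DELETE FROM events WHERE id IN ({placeholders})", batch)
        conn.commit()
```

Committing per batch gives up all-or-nothing atomicity for the whole operation, so the job should be safe to resume after a partial run.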

6. Re-measure and iterate. Repeat step 2 and confirm wsrep_flow_control_paused trends toward zero under load. Tuning the synchronous path is a loop, not a one-shot edit: apply one change, restart or reload where required, and compare against the baseline.

Parameter Deep-Dive: The Knobs That Move Synchronous Behavior

These are the parameters that actually change commit-path behavior, with production-tuned reasoning. The exact defaults track your MariaDB release, so confirm them with SHOW GLOBAL VARIABLES LIKE 'wsrep_%'.

Parameter	Type	Typical value	Why it matters
`wsrep_slave_threads`	integer	4–16 (≈ apply cores)	Width of the parallel apply pipeline; the primary lever against flow-control pauses on receivers.
`gcs.fc_limit`	provider option	32–128	Receive-queue depth that triggers flow control. Higher tolerates bursts but widens the consistency/visibility window; lower reacts sooner.
`gcs.fc_factor`	provider option	0.8	Fraction of `fc_limit` the queue must drain back to before replication resumes, damping oscillation between paused and running.
`wsrep_sync_wait`	bitmask	0, or 1 for RYW	Forces a session to wait for prior write-sets to apply before reading, trading latency for causal read-your-writes consistency.
`wsrep_max_ws_size`	integer (bytes)	≤ 2 GB	Hard ceiling on a single write-set; prevents one giant transaction from monopolizing the total order.
`evs.suspect_timeout`	provider option	PT5S	How long a silent peer is tolerated before it is suspected; too low evicts healthy nodes on transient jitter, too high delays failure detection.

The subtle one is wsrep_sync_wait. By default (0), a read on any node may not yet reflect a write acknowledged elsewhere, even though the write is durably ordered — Galera guarantees consistency, not instantaneous global read visibility. Setting wsrep_sync_wait = 1 per session buys read-your-writes semantics at the cost of an extra apply-wait, which is the correct trade for read-after-write flows and the wrong default for bulk analytics. Routing reads to lagging or intentionally desynced members is handled by Fallback Routing with Read-Only Nodes.
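Scoping the setting to a single read keeps the extra wait away from the rest of the session. The helper below is a sketch: read_your_writes is a hypothetical name, conn is assumed to be a DB-API connection, and it assumes the session's normal value is the default 0.

```python
def read_your_writes(conn, sql, params=()):
    """Run one causally consistent read, then restore the session default."""
    cur = conn.cursor()
    # Wait for previously ordered write-sets to apply before reading.
    cur.execute("SET SESSION wsrep_sync_wait = 1")
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        # Assumption: the session otherwise runs with wsrep_sync_wait = 0.
        cur.execute("SET SESSION wsrep_sync_wait = 0")
        cur.close()
```

With pooled connections, resetting the variable matters: a connection returned to the pool with wsrep_sync_wait still set would silently slow every later read that borrows it.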

Verification & Health Checks

A node that is “up” (process running, port 3306 open) can still be useless to the synchronous group — desynced, joining, or in a non-primary component. Verify membership state explicitly.

Quick shell probe suitable for a load-balancer health endpoint:

mariadb -N -B -e "SHOW GLOBAL STATUS WHERE Variable_name IN \
  ('wsrep_ready','wsrep_cluster_status','wsrep_local_state_comment')"

A healthy synchronous member returns wsrep_ready ON, wsrep_cluster_status Primary, and wsrep_local_state_comment Synced.

A Python 3.9+ probe that a monitor or orchestrator can call, with explicit handling for error 1213 (deadlock, which is how a certification conflict surfaces) and error 1205 (lock wait timeout):

import mysql.connector
from mysql.connector import errors

def galera_health(host: str) -> dict:
    conn = mysql.connector.connect(
        host=host, user="monitor", password="secret",
        database="information_schema", connection_timeout=3,
    )
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT VARIABLE_NAME, VARIABLE_VALUE FROM GLOBAL_STATUS "
            "WHERE VARIABLE_NAME IN "
            "('wsrep_ready','wsrep_cluster_status',"
            " 'wsrep_local_state','wsrep_flow_control_paused')"
        )
        s = {name.lower(): val for name, val in cur.fetchall()}
        healthy = (
            s.get("wsrep_ready") == "ON"
            and s.get("wsrep_cluster_status") == "Primary"
            and s.get("wsrep_local_state") == "4"        # 4 == Synced
            and float(s.get("wsrep_flow_control_paused", 1)) < 0.1
        )
        return {"host": host, "healthy": healthy, "status": s}
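    except errors.DatabaseError as exc:
        # 1213 = deadlock (how a cert conflict surfaces), 1205 = lock wait timeout.
        # Connector/Python raises InternalError for 1213 and DatabaseError for 1205,
        # so catch their shared base class instead of OperationalError.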
        if exc.errno in (1213, 1205):
            return {"host": host, "healthy": False, "conflict": exc.errno}
        raise
    finally:
        conn.close()

The wsrep_local_state = 4 check is the one most operators miss: a node can be Primary and wsrep_ready = ON while sitting in state 2 (Donor/Desynced) during an SST, at which point it is serving stale reads and should be drained. Continuous, alerting-grade collection of these variables is the subject of Automated Node Health Monitoring.
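The numeric codes can be mapped to a routing decision as in the sketch below. The state names follow the values Galera reports in wsrep_local_state_comment; the routing_action function and its "serve"/"drain" return values are illustrative.

```python
LOCAL_STATES = {1: "Joining", 2: "Donor/Desynced", 3: "Joined", 4: "Synced"}

def routing_action(wsrep_local_state) -> str:
    """Map the numeric node state to a load-balancer action."""
    try:
        code = int(wsrep_local_state)
    except (TypeError, ValueError):
        return "drain"  # missing or unparsable state: fail closed
    return "serve" if LOCAL_STATES.get(code) == "Synced" else "drain"
```

Failing closed on a missing value is deliberate: a probe that cannot read the state should not keep a node in rotation.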

Automation Integration

Treat every synchronous-path parameter as version-controlled configuration, never a live SET GLOBAL that drifts away at the next restart.

Ansible — render the provider options and apply threads from group variables so every member is provably identical, and notify a rolling restart handler:

- name: Render Galera replication tuning
  ansible.builtin.template:
    src: galera-repl.cnf.j2
    dest: /etc/mysql/mariadb.conf.d/60-galera-repl.cnf
    owner: root
    group: root
    mode: "0644"
  notify: rolling restart galera

The template body sets wsrep_slave_threads and a single wsrep_provider_options line from galera_fc_limit / galera_fc_factor variables. The rolling-restart handler must restart one node at a time and wait for wsrep_local_state_comment = Synced before proceeding — the same discipline described in Graceful Node Join and Leave Procedures.
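The wait step of such a handler could look like the following sketch. wait_until_synced is a hypothetical helper; get_state is assumed to be any callable that returns the node's current wsrep_local_state_comment, and the sleep and clock parameters exist only so the loop can be tested without real delays.

```python
import time

def wait_until_synced(get_state, timeout_s=600.0, interval_s=2.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the node reports Synced; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if get_state() == "Synced":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A False return should stop the rollout rather than move on to the next node, since restarting a second member while the first is still joining can cost the group its quorum.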

CI gate — validate the merged configuration before it ever reaches a node, so a malformed provider string fails the pipeline rather than the group:

mariadbd --defaults-file=/etc/mysql/mariadb.cnf --validate-config \
  && echo "config OK"

Application layer — wrap writes in retry logic that specifically catches certification conflicts. Because any node accepts writes, concurrent updates to the same row on different nodes surface as error 1213 on COMMIT, and the correct response is a bounded retry with jittered backoff, following the standard Python DB-API 2.0 exception model:

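import random
import time
from mysql.connector import errors

RETRYABLE = (1213, 1205)  # deadlock / certification conflict, lock wait timeout

def commit_with_retry(cursor, conn, sql, params, attempts=5):
    """Execute and commit one statement, retrying on conflict errors."""
    for i in range(attempts):
        try:
            cursor.execute(sql, params)
            conn.commit()
            return
        except errors.DatabaseError as exc:
            # Connector/Python maps 1213 to InternalError and 1205 to plain
            # DatabaseError, so catch the shared base class and filter on errno.
            if exc.errno in RETRYABLE and i < attempts - 1:
                conn.rollback()
                # Exponential backoff plus jitter so retries do not re-collide.
                time.sleep((2 ** i) * 0.05 + random.random() * 0.05)
                continue
            raise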

Prefer INSERT ... ON DUPLICATE KEY UPDATE and deterministically ordered UPDATE ... WHERE statements over application-side read-modify-write loops; they collapse the window in which two nodes can certify against the same rows.
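As an illustration, a counter increment can be expressed as a single upsert instead of a SELECT followed by an UPDATE. The page_hits table (with page_id as its primary key) and the add_hits helper are hypothetical.

```python
UPSERT_SQL = (
    "INSERT INTO page_hits (page_id, hits) VALUES (%s, %s) "
    "ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits)"
)

def add_hits(cursor, page_id: int, delta: int = 1) -> None:
    """Increment a counter in one statement, with no read-modify-write gap."""
    cursor.execute(UPSERT_SQL, (page_id, delta))
```

This narrows the conflict window but does not remove it: two nodes incrementing the same page_id at the same moment can still fail certification, so the statement should still be wrapped in retry logic.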

Troubleshooting

ER_LOCK_DEADLOCK (Error 1213) on COMMIT under concurrent load. This is not a classic single-node deadlock — it is a certification conflict, meaning another node committed an overlapping write-set that won the total order. The originating transaction is aborted. Fix: implement the retry-with-backoff pattern above, and steer writes for hot rows to a single node (or a single sharded key range) so they serialize locally instead of colliding across the group.

wsrep_flow_control_paused stuck near 1.0 and throughput collapses. One member cannot apply fast enough, so it is holding back the whole group with flow control. Fix: identify the slow node by comparing SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_sent' across members (the struggling node is the one sending the pause requests, whereas wsrep_flow_control_recv only counts pauses a node received), raise wsrep_slave_threads on it, check for a storage I/O bottleneck, and break up oversized transactions. Persistent pauses across a WAN link mean the topology, not the tuning, is wrong.
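When status has been collected from every member, the receive-queue average is a convenient tie-breaker for spotting the slow applier. The sketch assumes a dict of per-node status dicts keyed by node name; slowest_applier is an illustrative helper, while wsrep_local_recv_queue_avg is a standard Galera status variable.

```python
def slowest_applier(per_node_status: dict) -> str:
    """Return the node whose receive queue has been deepest on average."""
    return max(
        per_node_status,
        key=lambda node: float(
            per_node_status[node].get("wsrep_local_recv_queue_avg", 0)
        ),
    )
```

A node with a persistently deep receive queue is the one that needs more apply threads or faster storage; raising limits on the other members only hides the symptom.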

WSREP has not yet prepared node for application use on connect. The node is not Synced — it is joining, donating, or in a non-primary component. Fix: check wsrep_local_state_comment; if it is Donor/Desynced, an SST is in progress and the node will recover on its own; if it is Initialized with wsrep_cluster_status = non-Primary, the node lost quorum and must rejoin a primary component.

Reads return stale data immediately after a write acknowledged on another node. Expected default behavior — Galera guarantees ordering, not zero-latency global read visibility. Fix: set wsrep_sync_wait = 1 on sessions that require read-your-writes, or pin the read to the same node that took the write.

A single large transaction stalls every node at once. The write-set exceeded a healthy size and is serializing the total order during apply. Fix: batch the DML into chunks, confirm wsrep_max_ws_size is enforced, and never run an unbounded bulk load against a live synchronous group without chunking.

Related

Write-Set Certification Process Explained — how the deterministic conflict verdict that ends the commit path is computed.
Designing Multi-Master Topologies — node placement and latency budgeting that bound synchronous commit cost.
Fallback Routing with Read-Only Nodes — steering reads and using wsrep_sync_wait for consistency control.
Network Security & Firewall Rules for Galera — locking down the 3306/4567/4568/4444 port set the replication path depends on.
Initial Data Synchronization Methods — SST/IST, the state-transfer side of keeping members synchronous.
