How does Galera decide which node wins when two nodes write the same row at once?

Galera uses first-committer-wins based on position in the global total order, not wall-clock time. Both write-sets are broadcast and assigned sequence numbers; the one with the lower seqno certifies and applies on every node, and the later write-set fails certification everywhere and is rolled back on its originating node.

How should an application handle Galera error 1213 under concurrent writes?

Catch error codes 1213 and 1205, roll back, and retry the transaction with jittered exponential backoff, since the losing write-set usually certifies successfully once the conflicting key set has settled. For persistently hot rows, route their writes to a single node so they serialize locally instead of racing across nodes.

How Galera Handles Concurrent Writes in Multi-Master

This page builds on the commit path described in Understanding Galera Synchronous Replication and answers one focused question: when two nodes accept writes to the same rows at the same instant, how does Galera decide which one wins, and what does the losing client actually see? MariaDB Galera Cluster is a true multi-master system — every node accepts writes simultaneously with no designated primary — so consistency cannot come from routing writes through a single leader. It comes instead from a deterministic certification test that every node runs against the same globally ordered stream of write-sets. Understanding that test is the difference between treating concurrent-write errors as random failures and engineering an application that absorbs them cleanly.

Context: Why Concurrent Writes Are a Distinct Problem Here

In a single-server or primary-replica deployment, two transactions that touch the same row are serialized by InnoDB row locks on one machine, and the second transaction simply waits. In a multi-master group there is no shared lock manager across nodes: node A and node B can each acquire local locks on the same primary key and each execute to the point of COMMIT before either has heard of the other. Something has to break that tie without a network negotiation on every row, because per-row coordination would reintroduce the very latency synchronous replication is designed to bound.

Galera resolves this at the commit boundary rather than the lock-acquisition boundary. Each node builds a write-set — the modified rows, their primary-key references, and the metadata needed to detect overlap — and broadcasts it through the Group Communication System, which stamps every write-set with a global total-order sequence number. Conflict detection is then a local, deterministic comparison against that ordered stream, performed identically on every member by the Write-Set Certification Process. Because the ordering is global and the test is deterministic, all nodes independently reach the same verdict — no node ever asks another “did you also change this row?”

Solution: First-Committer-Wins Through Deterministic Certification

The rule Galera applies is first-committer-wins, decided by position in the total order rather than wall-clock time. When two write-sets modify the same primary key or unique index, the one that received the lower seqno certifies successfully and applies everywhere. The later write-set fails certification on every node, including its own origin, and the originating transaction is rolled back before control returns to the client.

The sequence for two colliding writes looks like this:

1. Node A and node B each execute a transaction that updates WHERE id = 42 and reach COMMIT.
2. Both build a write-set and hand it to GCS. GCS assigns A sequence N and B sequence N+1.
3. Every node certifies N first: no earlier write-set touched id = 42, so A passes and is queued for apply.
4. Every node then certifies N+1 against the now-recorded key set: it overlaps A’s key, so B fails certification.
5. On node B, the transaction is aborted, its locks are released, the applier thread is freed, and the client receives error 1213 (ER_LOCK_DEADLOCK), or 1205 (ER_LOCK_WAIT_TIMEOUT) depending on where the abort landed. No partial write persists.
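
The sequence above can be modeled in a few lines of Python. This is a toy simulation under stated assumptions, not Galera's implementation: `WriteSet`, `certify_stream`, and the `last_seen` field (the highest seqno the origin node had applied when the transaction executed) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSet:
    origin: str        # node that executed the transaction
    seqno: int         # position in the global total order
    last_seen: int     # highest seqno applied on the origin when the trx ran
    keys: frozenset    # primary/unique keys the transaction modified


def certify_stream(write_sets):
    """Return {seqno: passed} for write-sets processed in total order."""
    last_writer = {}   # key -> seqno of the last certified write-set touching it
    verdicts = {}
    for ws in sorted(write_sets, key=lambda w: w.seqno):
        # Conflict: a certified write-set the origin had not yet seen
        # modified at least one of the same keys.
        conflict = any(last_writer.get(k, 0) > ws.last_seen for k in ws.keys)
        verdicts[ws.seqno] = not conflict
        if not conflict:
            for k in ws.keys:
                last_writer[k] = ws.seqno
    return verdicts


a = WriteSet("A", seqno=101, last_seen=100, keys=frozenset({("t", 42)}))
b = WriteSet("B", seqno=102, last_seen=100, keys=frozenset({("t", 42)}))
print(certify_stream([b, a]))  # → {101: True, 102: False}
```

Arrival order of the list does not matter: the function sorts by seqno, which is why every node that runs it over the same stream reaches the same verdict. A retry of B's transaction that has seen seqno 101 would certify, because nothing newer than its snapshot touches the key.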

A critical distinction for anyone reading application logs: a Galera certification conflict is not an InnoDB deadlock. A classic InnoDB deadlock happens during lock acquisition, before the transaction is prepared, and is detected by the local lock graph. A Galera conflict happens after local execution, during certification against the replicated total order. Both surface as error 1213, so teams that assume 1213 always means “retry the local deadlock” will misdiagnose a multi-node contention pattern as a single-node one.

Absorbing conflicts in application code

The correct response to a certification failure is a bounded retry with backoff, not a failover. Because the conflict is caused by ordering rather than by a fault, retrying the losing transaction usually succeeds: its write-set now certifies against a settled key set. The pattern below targets Python 3.9+ and handles both error codes explicitly:

import logging
import random
import time

import mysql.connector
from mysql.connector import errorcode

# Galera certification conflicts surface as standard lock errors.
RETRYABLE = (errorcode.ER_LOCK_DEADLOCK,        # 1213
             errorcode.ER_LOCK_WAIT_TIMEOUT)     # 1205

def execute_with_cert_retry(conn, query, params, max_retries=3):
    """Run a write, retrying Galera certification conflicts with jittered backoff."""
    for attempt in range(max_retries):
        cursor = conn.cursor()
        try:
            cursor.execute(query, params)
            conn.commit()
            return cursor.lastrowid
        except mysql.connector.Error as err:
            if err.errno not in RETRYABLE:
                raise
            conn.rollback()
            if attempt == max_retries - 1:
                break  # out of attempts: give up without a pointless sleep
            # Exponential base (50ms, 100ms, ...) scaled by random jitter
            # so concurrent losers do not retry in lockstep.
            backoff = 0.05 * (2 ** attempt) * random.uniform(0.5, 1.5)
            logging.warning(
                "Galera cert conflict (errno=%s, attempt %d) - retry in %.2fs",
                err.errno, attempt + 1, backoff,
            )
            time.sleep(backoff)
        finally:
            cursor.close()
    raise RuntimeError("Max retries exceeded for concurrent write")

Two design choices matter here. The exponential-with-jitter backoff prevents a thundering herd of retries from re-colliding in lockstep. And because retries cost latency, the durable fix for a genuinely hot key is to route all writes to that key to a single node so they serialize locally under InnoDB rather than fighting for the total order across nodes — a pattern reinforced by Designing Multi-Master Topologies.
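
One way to pin a hot key to a single writer is a deterministic key-to-node mapping in the application or proxy layer. The sketch below is illustrative only: the node names and the `writer_for` helper are hypothetical, and it assumes a stable node list (simple modulo hashing remaps most keys whenever membership changes).

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]   # hypothetical node names


def writer_for(key, nodes=NODES):
    """Map a row key to one node so all writes to that key serialize there."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the same key always yields
    # the same index for a given node list.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


target = writer_for("accounts:42")
print(target in NODES)  # → True
```

Every client that calls `writer_for` with the same key and node list picks the same node, so contention on that key is resolved by InnoDB row locks locally instead of by certification across the group.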

Parameter Reference: Knobs That Govern Concurrent-Write Behavior

These are the directives that shape how the group tolerates and reports concurrent writes. Apply them uniformly across every node, under a [mysqld] section:

Parameter	Type	Default	Recommended	Effect on concurrent writes
`wsrep_certify_nonPK`	boolean	`1` (ON)	`1`	When ON, Galera appends a key for tables without an explicit primary key so they can still be certified. Setting `0` lets conflicting writes to PK-less tables silently diverge.
`wsrep_retry_autocommit`	integer	`1`	`1`–`2`	Number of times the server itself retries an `autocommit` statement that fails certification, before returning the error to the client. Cap it low to avoid masking structural contention.
`wsrep_slave_threads`	integer	`1`	≈ number of CPU cores	Parallel applier threads. More threads drain the receive queue faster, lowering the window in which a remote write-set can conflict with a local one.
`innodb_autoinc_lock_mode`	integer	`1`	`2`	Interleaved auto-increment; required so concurrent multi-node inserts do not serialize on the `AUTO_INCREMENT` table lock. The MariaDB default is `1`, so set `2` explicitly.
`wsrep_provider_options` (`gcs.fc_limit`, `gcs.fc_factor`)	string	`gcs.fc_limit=16`	tune to workload	Flow-control thresholds that throttle fast writers when appliers fall behind, keeping certification decisions timely under burst load.

A representative baseline:

[mysqld]
wsrep_certify_nonPK      = 1
wsrep_retry_autocommit   = 2
wsrep_slave_threads      = 4
innodb_autoinc_lock_mode = 2
wsrep_provider_options   = "gcs.fc_limit=16; gcs.fc_factor=0.8"

Note that wsrep_provider_options is a single semicolon-delimited string read only at provider load, so a second declaration replaces the whole value rather than merging — the loading-order details and low-latency tuning live in configuring wsrep_provider_options for low latency.
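
Because a later declaration replaces the whole string, it can help to build the final value from fragments before writing it to the config file. The helpers below are a minimal sketch with hypothetical names (`parse_provider_options`, `merge_provider_options`); they only manipulate strings and do not talk to the server.

```python
def parse_provider_options(value):
    """Split 'a=1; b=2' into a dict of option name -> value (as strings)."""
    options = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate trailing or doubled semicolons
        name, _, setting = part.partition("=")
        options[name.strip()] = setting.strip()
    return options


def merge_provider_options(*values):
    """Combine several option strings into one; later strings win per option."""
    merged = {}
    for value in values:
        merged.update(parse_provider_options(value))
    return "; ".join(f"{name}={setting}" for name, setting in merged.items())


base = "gcs.fc_limit=16; gcs.fc_factor=0.8"
override = "gcs.fc_limit=64"
print(merge_provider_options(base, override))
# → gcs.fc_limit=64; gcs.fc_factor=0.8
```

Emitting one merged string avoids the failure mode where an override file sets only `gcs.fc_limit` and silently drops `gcs.fc_factor`.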

Verification: Confirm the Conflict Behavior You Expect

After tuning, confirm that conflicts are being certified — not silently swallowed — by reading the wsrep status counters on each node:

SHOW GLOBAL STATUS WHERE Variable_name IN (
  'wsrep_local_cert_failures',
  'wsrep_local_bf_aborts',
  'wsrep_local_state_comment',
  'wsrep_last_committed',
  'wsrep_local_send_queue_avg'
);

wsrep_local_cert_failures: local transactions whose write-sets failed the certification test on this node. A steady climb signals key-space contention or a misconfigured wsrep_certify_nonPK.
wsrep_local_bf_aborts: local transactions aborted because a higher-priority replicated write-set won the total order; this is the counter that rises when your own writes lose a first-committer race.
wsrep_local_state_comment: must read Synced. A node showing Donor/Desynced or Joiner is not certifying concurrent writes.
wsrep_last_committed: the seqno of the last applied write-set; it should advance in lockstep across nodes. Persistent divergence points to a backlog or partition.
wsrep_local_send_queue_avg: sustained values above ~10 indicate the replication path, not certification, is the bottleneck.
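
The conflict counters are cumulative, so a trend only shows up when two samples are compared. The sketch below assumes each sample is a dict built from the rows of the status query (for example `dict(cursor.fetchall())`), with values as strings; `conflict_rates` and `is_certifying` are hypothetical helper names.

```python
CONFLICT_COUNTERS = ("wsrep_local_cert_failures", "wsrep_local_bf_aborts")


def conflict_rates(before, after, interval_s):
    """Per-second increase of each conflict counter between two samples."""
    return {
        name: (int(after[name]) - int(before[name])) / interval_s
        for name in CONFLICT_COUNTERS
    }


def is_certifying(status):
    """A node takes part in certification only while it reports Synced."""
    return status.get("wsrep_local_state_comment") == "Synced"


before = {"wsrep_local_cert_failures": "10", "wsrep_local_bf_aborts": "4",
          "wsrep_local_state_comment": "Synced"}
after = {"wsrep_local_cert_failures": "40", "wsrep_local_bf_aborts": "19",
         "wsrep_local_state_comment": "Synced"}
print(conflict_rates(before, after, 60))
# → {'wsrep_local_cert_failures': 0.5, 'wsrep_local_bf_aborts': 0.25}
```

Alerting on the rate rather than the raw counter keeps a long-running node from looking worse than a recently restarted one.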

To attribute conflicts to a specific workload rather than infrastructure, cross-check the InnoDB metrics table:

SELECT NAME, COUNT
FROM information_schema.innodb_metrics
WHERE NAME LIKE 'trx_%' AND STATUS = 'enabled';

A high trx_rollbacks count that tracks with rising wsrep_local_bf_aborts points to application-level hot-row contention rather than a network or storage fault.

Edge Cases & Gotchas

Tables without a primary key certify on a synthetic key. With wsrep_certify_nonPK=1 (the default), Galera can still certify PK-less tables, but the appended key is coarse and produces false conflicts under concurrency. Never disable it to “fix” those conflicts — add real primary keys to every write-heavy table instead. Disabling it lets overlapping writes diverge with no error at all.

Large transactions widen the conflict window. A single multi-megabyte write-set takes longer to certify and apply, during which more concurrent transactions pile up behind it in the total order and lose their certification races. Chunk bulk DML into small batches so no one transaction serializes the stream; enforce a ceiling with wsrep_max_ws_size rather than trusting callers.
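
Chunking can be as simple as committing after every fixed-size slice of keys. The sketch below is illustrative: the `events` table, the `delete_in_batches` helper, and the batch size of 500 are assumptions, and `conn` is expected to be a DB-API connection that uses `%s` placeholders (as mysql.connector does).

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


def delete_in_batches(conn, ids, batch_size=500):
    """Delete rows in small transactions; returns the number of batches committed."""
    batches = 0
    for chunk in batched(ids, batch_size):
        placeholders = ", ".join(["%s"] * len(chunk))
        cursor = conn.cursor()
        try:
            # `events` is a hypothetical table name.
            cursor.execute(
                f"DELETE FROM events WHERE id IN ({placeholders})", tuple(chunk)
            )
            conn.commit()  # each batch replicates as its own small write-set
        finally:
            cursor.close()
        batches += 1
    return batches
```

Each commit produces a separate, small write-set, so other transactions interleave in the total order between batches instead of queuing behind one large one. The trade-off is that the operation as a whole is no longer atomic.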

Docker and systemd nodes must agree on identical parameters. Certification is only deterministic if every member evaluates the same rules. A containerized node that inherits a different innodb_autoinc_lock_mode from an image default, or a systemd drop-in that overrides wsrep_slave_threads, will apply and abort at a different rate and can surface as phantom conflicts. Confirm firewall state on ports 4567/4568/4444 too: a stateful timeout that drops group traffic can cut a node off from the group and stall replication, which is covered in Network Security & Firewall Rules for Galera.
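
Parameter drift is easy to detect once each node's settings are collected side by side. The sketch below assumes the settings were already gathered into one dict per node (for example from `SHOW GLOBAL VARIABLES`); the node names and the `find_drift` helper are hypothetical. Pass only the variables that are meant to match, since values such as the node name legitimately differ.

```python
def find_drift(node_settings):
    """Return {variable: {node: value}} for variables that differ between nodes."""
    variables = set().union(*(s.keys() for s in node_settings.values()))
    drift = {}
    for var in sorted(variables):
        # A variable missing on a node shows up as None and counts as drift.
        values = {node: settings.get(var) for node, settings in node_settings.items()}
        if len(set(values.values())) > 1:
            drift[var] = values
    return drift


nodes = {
    "bare-metal-1": {"innodb_autoinc_lock_mode": "2", "wsrep_slave_threads": "4"},
    "container-1": {"innodb_autoinc_lock_mode": "1", "wsrep_slave_threads": "4"},
}
print(find_drift(nodes))
# → {'innodb_autoinc_lock_mode': {'bare-metal-1': '2', 'container-1': '1'}}
```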

Related

Understanding Galera Synchronous Replication: the parent topic, covering how the synchronous commit path is ordered and acknowledged.
Write-Set Certification Process Explained: the deterministic test that computes every concurrent-write verdict.
Designing Multi-Master Topologies: routing hot keys to a single node to serialize contention away.
Fallback Routing with Read-Only Nodes: steering reads and using wsrep_sync_wait for consistency control.
