wsrep.cnf Configuration Deep Dive
The wsrep.cnf file serves as the operational control plane for MariaDB Galera’s synchronous multi-master replication architecture. Unlike traditional asynchronous replication setups, Galera’s parameter set governs consensus protocols, state transfer mechanics, flow control thresholds, and certification boundaries. For database administrators, DevOps engineers, and platform teams building automated infrastructure, treating wsrep.cnf as a declarative configuration artifact is essential for predictable cluster behavior, deterministic rollouts, and rapid incident resolution. Comprehensive topology planning and baseline parameter alignment are documented in the foundational Galera Cluster Setup & Node Management guidelines, which establish the architectural prerequisites before tuning begins.
Configuration Architecture & Loading Precedence
MariaDB does not natively parse wsrep.cnf as an isolated file. It is injected into the primary configuration tree via !include or !includedir directives within mariadb.cnf or my.cnf. Understanding load precedence prevents silent overrides and configuration drift:
- Base server defaults (/etc/mysql/mariadb.conf.d/50-server.cnf)
- Galera-specific overrides (/etc/mysql/wsrep.cnf)
- Runtime injection via systemd EnvironmentFile, MYSQLD_OPTS, or container entrypoint arguments
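Within the option files, the last value read for an option wins, so parameters in wsrep.cnf override earlier declarations as long as the file is included after them. Command-line flags, including those injected through environment variables such as MYSQLD_OPTS, take final precedence. Platform teams should enforce a strict hierarchy and validate parsing before any service restart: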
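# Verify effective wsrep parameters before restart (the variables table prints names with dashes)
mariadbd --verbose --help 2>/dev/null | grep -E "^wsrep[-_]"
# Parse the option files without starting the daemon; unknown options are reported as errors.
# --defaults-file must be the first argument on the command line.
mariadbd --defaults-file=/etc/mysql/mariadb.cnf --verbose --help > /dev/null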
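The last-wins rule described above can be modelled in a few lines. This is a simplified sketch, not MariaDB's actual parser: it assumes each source has already been reduced to a dict of option names and values, and the source labels and sample values are illustrative only.

```python
from typing import Dict, List, Tuple

Source = Tuple[str, Dict[str, str]]


def normalize(name: str) -> str:
    """Option names may be written with dashes or underscores; compare them alike."""
    return name.strip().lower().replace("-", "_")


def resolve_effective(sources: List[Source]) -> Dict[str, Tuple[str, str]]:
    """Apply sources in load order; a later source overrides an earlier one.

    Returns {option: (value, label_of_winning_source)}.
    """
    effective: Dict[str, Tuple[str, str]] = {}
    for label, options in sources:
        for name, value in options.items():
            effective[normalize(name)] = (value, label)
    return effective


if __name__ == "__main__":
    # Hypothetical values, listed in the order the server would read them.
    load_order = [
        ("50-server.cnf", {"wsrep-slave-threads": "1"}),
        ("wsrep.cnf", {"wsrep_slave_threads": "4"}),
        ("command line", {"wsrep-slave-threads": "8"}),
    ]
    for name, (value, origin) in sorted(resolve_effective(load_order).items()):
        print(f"{name} = {value}  (from {origin})")
```

Recording which source supplied the winning value makes silent overrides visible in a review, which is usually the point of checking precedence at all.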
When integrating with infrastructure-as-code pipelines, avoid hardcoding paths. Instead, leverage systemd drop-ins or templated configuration management. Detailed patterns for idempotent deployment are covered in Automating Node Provisioning with Ansible, which demonstrates how to render wsrep.cnf dynamically from inventory variables.
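As a rough illustration of rendering from inventory variables, the sketch below uses only Python's standard library. The template contents, the `render_wsrep_cnf` helper, and the shape of the inventory mapping are assumptions made for this example; a real pipeline would more likely use its configuration management tool's own templating.

```python
from string import Template
from typing import Dict

# Hypothetical template covering only the identity parameters.
WSREP_TEMPLATE = Template(
    "[mysqld]\n"
    'wsrep_cluster_name="$cluster_name"\n'
    'wsrep_cluster_address="gcomm://$peers"\n'
    'wsrep_node_name="$node_name"\n'
    'wsrep_node_address="$node_address"\n'
)


def render_wsrep_cnf(cluster_name: str, inventory: Dict[str, str], node_name: str) -> str:
    """Render the file for one node from a {node_name: ip_address} inventory."""
    if node_name not in inventory:
        raise KeyError(f"{node_name} is not in the inventory")
    # Sort by node name so every node renders the same peer list.
    peers = ",".join(inventory[name] for name in sorted(inventory))
    return WSREP_TEMPLATE.substitute(
        cluster_name=cluster_name,
        peers=peers,
        node_name=node_name,
        node_address=inventory[node_name],
    )


if __name__ == "__main__":
    inventory = {"db-node-01": "10.0.1.10", "db-node-02": "10.0.1.11", "db-node-03": "10.0.1.12"}
    print(render_wsrep_cnf("prod-galera-primary", inventory, "db-node-01"))
```

Sorting the peers keeps the rendered output deterministic, so repeated runs against an unchanged inventory produce no diff.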
Core Parameter Matrix & Production Tuning
Galera parameters are logically grouped by operational domain. Misalignment across these groups is a common cause of certification failures, SST timeouts, and loss of quorum.
Cluster Identity & Topology
These parameters establish the consensus namespace and node addressing. Incorrect values prevent quorum formation or force nodes into standalone mode.
wsrep_cluster_name="prod-galera-primary"
wsrep_cluster_address="gcomm://10.0.1.10,10.0.1.11,10.0.1.12"
wsrep_node_name="db-node-01"
wsrep_node_address="10.0.1.10"
Production Notes:
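- wsrep_cluster_address requires the gcomm:// prefix. Omitting it disables replication.
- wsrep_node_name must be deterministic and map directly to infrastructure inventory (e.g., Ansible hostnames, Kubernetes pod labels, or cloud instance IDs).
- During initial cluster formation, the first node is bootstrapped as a new cluster, either with an empty address (wsrep_cluster_address="gcomm://") or, more safely, by running galera_new_cluster. Do not leave the empty address in the persisted configuration: a node restarted with it forms a new cluster instead of rejoining the existing one. Detailed sequencing for this phase is covered in Bootstrapping Your First Galera Cluster.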
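The identity rules above lend themselves to a small pre-deployment check. The following is a sketch under stated assumptions: `check_identity` and `host_of` are hypothetical helpers, peers are IPv4 addresses or hostnames, and requiring the node's own address to appear in the peer list is a team convention rather than a Galera requirement.

```python
from typing import Dict, List

PREFIX = "gcomm://"


def host_of(peer: str) -> str:
    """Drop an optional :port suffix (IPv4 addresses and hostnames only)."""
    return peer.rsplit(":", 1)[0] if peer.count(":") == 1 else peer


def check_identity(settings: Dict[str, str]) -> List[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    problems: List[str] = []
    address = settings.get("wsrep_cluster_address", "").strip().strip('"')
    if not address.startswith(PREFIX):
        return ["wsrep_cluster_address must start with gcomm://"]

    # Strip the scheme and any trailing "?option=value" part, then split peers.
    peer_part = address[len(PREFIX):].split("?", 1)[0]
    peers = [host_of(p.strip()) for p in peer_part.split(",") if p.strip()]
    if not peers:
        problems.append("empty gcomm:// address: only valid while bootstrapping")

    node_address = host_of(settings.get("wsrep_node_address", "").strip().strip('"'))
    if peers and node_address not in peers:
        problems.append("wsrep_node_address is not listed in wsrep_cluster_address")
    if not settings.get("wsrep_node_name", "").strip().strip('"'):
        problems.append("wsrep_node_name is not set")
    return problems


if __name__ == "__main__":
    settings = {
        "wsrep_cluster_address": "gcomm://10.0.1.10,10.0.1.11,10.0.1.12",
        "wsrep_node_name": "db-node-01",
        "wsrep_node_address": "10.0.1.10",
    }
    print(check_identity(settings) or "no findings")
```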
State Transfer & Synchronization Mechanics
State Snapshot Transfer (SST) and Incremental State Transfer (IST) dictate how new or lagging nodes synchronize with the cluster.
wsrep_sst_method=mariabackup
wsrep_sst_auth="sstuser:SecurePassphrase!"
wsrep_provider_options="ist.recv_addr=10.0.1.10:4568; gcache.size=2G; gcache.page_size=256M"
Production Notes:
- mariabackup is the recommended SST method for MariaDB 10.3+. It supports parallel streaming and avoids full table locks during transfer.
- gcache.size must exceed the maximum write volume expected during a node outage. If the write-sets a rejoining node needs are no longer in the cache, IST is not possible and the node falls back to full SST, causing significant I/O and network saturation.
- ist.recv_addr should explicitly bind to the node’s primary replication interface to prevent routing conflicts in multi-homed environments.
- wsrep_provider_options is a single variable. If it is declared more than once, the last declaration replaces the earlier ones, so the provider settings shown in the sections of this guide belong in one semicolon-separated string.
- Node lifecycle operations, including controlled SST triggers and cache preservation, are detailed in Graceful Node Join and Leave Procedures.
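The gcache sizing rule above is simple arithmetic: write volume per second, multiplied by the longest outage that should still be recoverable through IST, plus headroom. The sketch below assumes the write rate has been measured separately (for example from deltas of the wsrep_replicated_bytes and wsrep_received_bytes status counters); the function names and the 1.5 headroom factor are illustrative choices, not provider recommendations.

```python
import math


def estimate_gcache_bytes(write_bytes_per_sec: float,
                          outage_seconds: float,
                          headroom: float = 1.5) -> int:
    """Bytes of write-sets to retain so a node absent for outage_seconds can use IST."""
    if write_bytes_per_sec < 0 or outage_seconds < 0 or headroom < 1.0:
        raise ValueError("rate and duration must be >= 0 and headroom >= 1.0")
    return math.ceil(write_bytes_per_sec * outage_seconds * headroom)


def to_gcache_option(size_bytes: int) -> str:
    """Format as a gcache.size value, rounded up to whole mebibytes."""
    mib = max(1, math.ceil(size_bytes / (1024 * 1024)))
    return f"gcache.size={mib}M"


if __name__ == "__main__":
    # 2 MiB/s of replicated writes and a 15-minute maintenance window.
    needed = estimate_gcache_bytes(2 * 1024 * 1024, 15 * 60)
    print(to_gcache_option(needed))  # gcache.size=2700M
```

Measuring the rate at peak rather than on average gives a safer figure, since the cache has to cover the busiest window a node might miss.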
Flow Control & Certification Boundaries
Galera uses certification-based replication. When write rates exceed apply capacity, flow control throttles incoming transactions to prevent queue exhaustion.
wsrep_slave_threads=4
wsrep_certify_nonPK=1
wsrep_provider_options="gcs.fc_limit=256; gcs.fc_factor=0.8; gcs.fc_master_slave=YES"
Production Notes:
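- wsrep_slave_threads should roughly match the number of CPU cores available for parallel apply. On versions that still honor a nonzero innodb_thread_concurrency, keep it at or below that value. Over-provisioning increases lock contention.
- gcs.fc_limit defines the write-set queue threshold before throttling triggers. The default (16) is too low for high-throughput OLTP workloads. Values between 128–512 are typical for production.
- gcs.fc_factor (0.0–1.0) determines when throttling releases. 0.8 means flow control lifts once the queue drops to 80% of fc_limit.
- wsrep_certify_nonPK=1 lets Galera certify writes to tables that lack a primary key by generating an internal key for them; it does not enforce primary keys. Such tables remain prone to replication problems, so define an explicit primary key on every InnoDB table.
- gcs.fc_master_slave=YES tells the provider that only one node accepts writes. Leave it at the default (NO) when several nodes take writes.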
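The interaction between gcs.fc_limit and gcs.fc_factor is a hysteresis loop, which the sketch below models on a series of receive-queue samples. It is a deliberately simplified model: the real provider may adjust the effective limit (for example with cluster size), and `flow_control_states` is a hypothetical helper, not part of any Galera API.

```python
from typing import Iterable, List


def flow_control_states(queue_lengths: Iterable[int],
                        fc_limit: int = 256,
                        fc_factor: float = 0.8) -> List[bool]:
    """Return, per sample, whether replication would be paused.

    Pause when the receive queue exceeds fc_limit; resume once it has
    drained to fc_limit * fc_factor or below.
    """
    resume_at = fc_limit * fc_factor
    paused = False
    states: List[bool] = []
    for length in queue_lengths:
        if not paused and length > fc_limit:
            paused = True
        elif paused and length <= resume_at:
            paused = False
        states.append(paused)
    return states


if __name__ == "__main__":
    samples = [100, 257, 240, 210, 204, 300]
    print(flow_control_states(samples))
    # [False, True, True, True, False, True]
```

With fc_limit=256 and fc_factor=0.8 the resume point is 204.8, so the queue must drain to 204 before writes flow again. A lower factor lengthens each pause but makes rapid pause/resume flapping less likely.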
Network & Timeout Hardening
Cross-AZ and cloud deployments require explicit timeout tuning to prevent false-positive node evictions.
wsrep_provider_options="evs.suspect_timeout=PT5S; evs.inactive_timeout=PT15S; evs.join_retrans_period=PT1S; evs.send_window=1024; evs.user_send_window=512"
Production Notes:
- evs.suspect_timeout and evs.inactive_timeout control how long the cluster waits before declaring a node unresponsive. The values shown (PT5S and PT15S) are the provider defaults, which are often too aggressive for high-latency networks; cross-AZ links usually need larger values, with evs.inactive_timeout kept above evs.suspect_timeout.
- evs.send_window and evs.user_send_window cap the number of packets in flight. Increasing these improves throughput but raises memory consumption. Keep evs.user_send_window at or below evs.send_window.
- Reference the official MariaDB wsrep_provider_options Documentation for provider-specific EVS (Extended Virtual Synchrony) parameter matrices.
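Because the provider settings in this guide are spread across several wsrep_provider_options fragments, a rendering step has to merge them into one string and can sanity-check the EVS values while doing so. The sketch below is one way to do that; the helper names are hypothetical, the duration parser handles only the PT#H#M#S subset used here, and the ordering checks are treated as sanity rules rather than as the provider's exact validation logic.

```python
import re
from typing import Dict, Iterable

_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def parse_duration(value: str) -> float:
    """Convert a subset of ISO 8601 durations (PT#H#M#S) to seconds."""
    match = _DURATION.match(value.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def merge_provider_options(fragments: Iterable[str]) -> Dict[str, str]:
    """Merge 'key=value; key=value' fragments into one mapping; later fragments win."""
    merged: Dict[str, str] = {}
    for fragment in fragments:
        for pair in fragment.split(";"):
            if pair.strip():
                key, _, value = pair.partition("=")
                merged[key.strip()] = value.strip()
    return merged


def check_evs(options: Dict[str, str]) -> None:
    """Raise ValueError if the EVS settings contradict each other."""
    suspect = parse_duration(options["evs.suspect_timeout"])
    inactive = parse_duration(options["evs.inactive_timeout"])
    if inactive <= suspect:
        raise ValueError("evs.inactive_timeout should exceed evs.suspect_timeout")
    if int(options["evs.user_send_window"]) > int(options["evs.send_window"]):
        raise ValueError("evs.user_send_window should not exceed evs.send_window")


def render(options: Dict[str, str]) -> str:
    """Emit the single wsrep_provider_options line for the config file."""
    body = "; ".join(f"{key}={value}" for key, value in options.items())
    return f'wsrep_provider_options="{body}"'


if __name__ == "__main__":
    fragments = [
        "ist.recv_addr=10.0.1.10:4568; gcache.size=2G; gcache.page_size=256M",
        "gcs.fc_limit=256; gcs.fc_factor=0.8",
        "evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M; "
        "evs.send_window=1024; evs.user_send_window=512",
    ]
    options = merge_provider_options(fragments)
    check_evs(options)
    print(render(options))
```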
Automation & Validation Pipelines
Platform teams must treat wsrep.cnf as a version-controlled, testable artifact. Manual edits introduce drift and invalidate cluster consistency guarantees.
Python-Based Configuration Validation
The following pattern demonstrates how to parse, validate, and enforce parameter alignment before deployment:
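import configparser
import subprocess
import sys

REQUIRED_KEYS = {
    "wsrep_cluster_name", "wsrep_cluster_address",
    "wsrep_node_name", "wsrep_node_address", "wsrep_sst_method",
}

def load_options(config_path: str) -> configparser.ConfigParser:
    # MariaDB option files allow !include lines, valueless flags, repeated keys,
    # and literal % characters; configparser rejects all of these by default.
    config = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    # Treat dashes and underscores in option names as equivalent.
    config.optionxform = lambda name: name.lower().replace("-", "_")
    with open(config_path, encoding="utf-8") as handle:
        lines = [line for line in handle if not line.lstrip().startswith("!")]
    config.read_string("".join(lines), source=config_path)
    return config

def validate_wsrep_config(config_path: str,
                          main_config: str = "/etc/mysql/mariadb.cnf") -> bool:
    try:
        config = load_options(config_path)
    except (OSError, configparser.Error) as exc:
        print(f"ERROR: Cannot parse {config_path}: {exc}")
        return False
    if not config.has_section("mysqld"):
        print("ERROR: Missing [mysqld] section")
        return False
    missing = REQUIRED_KEYS - set(config.options("mysqld"))
    if missing:
        print(f"ERROR: Missing required parameters: {', '.join(sorted(missing))}")
        return False
    # Let the server binary parse the full option tree without starting up.
    # --defaults-file must be the first argument.
    result = subprocess.run(
        ["mariadbd", f"--defaults-file={main_config}", "--verbose", "--help"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        print(f"CONFIG SYNTAX ERROR:\n{result.stderr}")
        return False
    return True

if __name__ == "__main__":
    if validate_wsrep_config("/etc/mysql/wsrep.cnf"):
        print("Configuration validated. Proceeding with deployment.")
        sys.exit(0)
    sys.exit(1)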
CI/CD Integration & Drift Detection
- Store wsrep.cnf templates in Git with parameterized placeholders for wsrep_node_address and wsrep_node_name.
- Use CI pipelines to run syntax validation and unit tests against a staging Galera container.
- Implement periodic drift detection by comparing live SHOW GLOBAL VARIABLES LIKE 'wsrep_%'; output against the rendered configuration.
- Official MariaDB system variable documentation provides authoritative defaults and scope definitions: MariaDB Galera System Variables.
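The drift comparison described above reduces to a dictionary diff once both sides are normalized. The sketch below assumes the live rows have already been fetched with whatever database client the pipeline uses; `detect_drift` and `_norm` are hypothetical helpers. Variables whose live value is not directly comparable, such as wsrep_provider_options (reported with every provider option expanded) and wsrep_sst_auth (masked by the server), should be excluded or handled separately.

```python
from typing import Dict, Iterable, Tuple


def _norm(value: str) -> str:
    """Normalize quoting and boolean spellings before comparing."""
    text = value.strip().strip('"').strip("'")
    lowered = text.lower()
    if lowered in ("1", "on", "yes", "true"):
        return "on"
    if lowered in ("0", "off", "no", "false"):
        return "off"
    return text


def detect_drift(rendered: Dict[str, str],
                 live_rows: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Return {variable: (expected, actual)} for each managed variable that differs.

    live_rows is the (Variable_name, Value) result set of
    SHOW GLOBAL VARIABLES LIKE 'wsrep_%'.
    """
    live = {name.lower(): value for name, value in live_rows}
    drift: Dict[str, Tuple[str, str]] = {}
    for name, expected in rendered.items():
        actual = live.get(name.lower())
        if actual is None:
            drift[name] = (expected, "<missing>")
        elif _norm(actual) != _norm(expected):
            drift[name] = (expected, actual)
    return drift


if __name__ == "__main__":
    rendered = {"wsrep_cluster_name": '"prod-galera-primary"', "wsrep_slave_threads": "4"}
    live = [("wsrep_cluster_name", "prod-galera-primary"), ("wsrep_slave_threads", "8")]
    for name, (expected, actual) in detect_drift(rendered, live).items():
        print(f"DRIFT {name}: expected {expected}, live {actual}")
```

Only variables present in the rendered configuration are checked, so server defaults that the template does not manage never show up as drift.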
Production Hardening & Incident Response
Common Failure Modes & Remediation
| Symptom | Root Cause | Immediate Action |
|---|---|---|
| WSREP: Failed to read uuid:seqno from joiner | SST authentication mismatch or wsrep_sst_method unsupported | Verify wsrep_sst_auth credentials and ensure mariabackup is installed on all nodes |
| WSREP: Flow control paused | gcs.fc_limit too low or wsrep_slave_threads undersized | Increase gcs.fc_limit to 256+, monitor wsrep_local_recv_queue_avg |
| WSREP: view(view_id(NON_PRIM,...)) | Network partition or evs.inactive_timeout too aggressive | Verify inter-node connectivity, adjust EVS timeouts, check firewall rules for ports 4567-4568 |
| WSREP: gcache page size mismatch | gcache.page_size changed without full cluster restart | Revert to original value or perform rolling restart with wsrep_provider_options aligned |
Log Analysis & Telemetry
Galera writes to the MariaDB error log (for example mysqld.log) or, under systemd, to the journal (journalctl -u mariadb). Filter critical events:
journalctl -u mariadb -f | grep -E "WSREP:|Galera|gcomm"
Monitor these status variables in real-time:
- wsrep_cluster_size: Active node count
- wsrep_local_state_comment: Node state (Synced, Donor/Desynced, Joining)
- wsrep_flow_control_paused_ns: Cumulative flow control stall duration, in nanoseconds
- wsrep_last_committed: Sequence number of the last committed transaction
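Align monitoring thresholds with your SLA requirements. Because wsrep_flow_control_paused_ns is a cumulative counter, alert on how fast it grows over a polling interval rather than on its absolute value. Alert as well when wsrep_cluster_size drops below the expected node count, and escalate when it drops below quorum.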
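The alerting rules above can be sketched as a comparison of two status samples. Everything here is illustrative: the `Sample` structure, the 5% pause threshold, and the alert wording are assumptions, and the status values are expected to come from whatever collector already polls SHOW GLOBAL STATUS.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One poll of the status variables, reduced to the fields used here."""
    timestamp_s: float
    cluster_size: int
    flow_control_paused_ns: int


def paused_fraction(previous: Sample, current: Sample) -> float:
    """Share of the polling interval spent in flow control (0.0 to 1.0)."""
    elapsed_ns = (current.timestamp_s - previous.timestamp_s) * 1e9
    if elapsed_ns <= 0:
        raise ValueError("samples must be in chronological order")
    delta = current.flow_control_paused_ns - previous.flow_control_paused_ns
    # A negative delta means the counter was reset, for example by a restart.
    return max(0.0, min(1.0, delta / elapsed_ns))


def evaluate(previous: Sample, current: Sample,
             expected_nodes: int, max_paused: float = 0.05) -> List[str]:
    """Return alert messages for the current sample; empty means healthy."""
    alerts: List[str] = []
    quorum = expected_nodes // 2 + 1
    if current.cluster_size < quorum:
        alerts.append(f"CRITICAL: cluster size {current.cluster_size} is below quorum ({quorum})")
    elif current.cluster_size < expected_nodes:
        alerts.append(f"WARNING: cluster size {current.cluster_size} of {expected_nodes}")
    fraction = paused_fraction(previous, current)
    if fraction > max_paused:
        alerts.append(f"WARNING: flow control paused {fraction:.1%} of the interval")
    return alerts


if __name__ == "__main__":
    before = Sample(timestamp_s=0.0, cluster_size=3, flow_control_paused_ns=0)
    after = Sample(timestamp_s=60.0, cluster_size=2, flow_control_paused_ns=6_000_000_000)
    for alert in evaluate(before, after, expected_nodes=3):
        print(alert)
```

Comparing against the expected node count, not only against quorum, surfaces a degraded cluster while it can still accept writes.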