wsrep.cnf Configuration Deep Dive

The wsrep.cnf file serves as the operational control plane for MariaDB Galera’s synchronous multi-master replication architecture. Unlike traditional asynchronous replication setups, Galera’s parameter set governs consensus protocols, state transfer mechanics, flow control thresholds, and certification boundaries. For database administrators, DevOps engineers, and platform teams building automated infrastructure, treating wsrep.cnf as a declarative configuration artifact is essential for predictable cluster behavior, deterministic rollouts, and rapid incident resolution. Comprehensive topology planning and baseline parameter alignment are documented in the foundational Galera Cluster Setup & Node Management guidelines, which establish the architectural prerequisites before tuning begins.

Configuration Architecture & Loading Precedence

MariaDB does not natively parse wsrep.cnf as an isolated file. It is injected into the primary configuration tree via !include or !includedir directives within mariadb.cnf or my.cnf. Understanding load precedence prevents silent overrides and configuration drift:

  1. Base server defaults (/etc/mysql/mariadb.conf.d/50-server.cnf)
  2. Galera-specific overrides (/etc/mysql/wsrep.cnf)
  3. Runtime injection via systemd EnvironmentFile, MYSQLD_OPTS, or container entrypoint arguments

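Option files are read in order and the last occurrence of a setting wins, so parameters in wsrep.cnf override earlier declarations only when the file is included after the base files. Command-line flags, including those injected through MYSQLD_OPTS or a container entrypoint, take final precedence. Platform teams should enforce a strict hierarchy and validate parsing before any service restart: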

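# Show the wsrep options collected from the option files, in read order
my_print_defaults --defaults-file=/etc/mysql/mariadb.cnf mysqld | grep -i wsrep
# Parse the full option tree without starting the daemon; unknown options are
# reported on stderr. --defaults-file must be the first argument, and MariaDB
# has no --validate-config flag (that option belongs to MySQL 8.0).
mariadbd --defaults-file=/etc/mysql/mariadb.cnf --help --verbose > /dev/null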
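The last-occurrence-wins rule described above can be sketched in a few lines of Python. This is a simplified model rather than MariaDB's actual option parser: the layer contents are hypothetical values chosen for illustration, and resolve_effective is a helper name introduced here.

```python
from typing import Dict, List, Tuple


def resolve_effective(
    sources: List[Tuple[str, Dict[str, str]]]
) -> Dict[str, Tuple[str, str]]:
    """Return {option: (value, winning_source)}; later sources override earlier ones."""
    effective: Dict[str, Tuple[str, str]] = {}
    for source_name, options in sources:
        for key, value in options.items():
            # MariaDB treats dashes and underscores in option names as equivalent.
            effective[key.replace("-", "_")] = (value, source_name)
    return effective


# Hypothetical layers, listed in the order the server would read them.
layers = [
    ("50-server.cnf", {"wsrep_on": "OFF", "wsrep-slave-threads": "1"}),
    ("wsrep.cnf", {"wsrep_on": "ON", "wsrep_slave_threads": "4"}),
    ("command line", {"wsrep_slave_threads": "8"}),
]

for option, (value, origin) in sorted(resolve_effective(layers).items()):
    print(f"{option} = {value}  (from {origin})")
```

Recording the winning source next to each value makes silent overrides visible in a pipeline log, which is usually the first thing needed when a node behaves differently from its peers.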

When integrating with infrastructure-as-code pipelines, avoid hardcoding paths. Instead, leverage systemd drop-ins or templated configuration management. Detailed patterns for idempotent deployment are covered in Automating Node Provisioning with Ansible, which demonstrates how to render wsrep.cnf dynamically from inventory variables.
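As a rough illustration of rendering wsrep.cnf from inventory data, the sketch below uses only Python's standard string.Template. The inventory dictionary, the db-node-02 and db-node-03 names, and the render_wsrep helper are assumptions made for this example; a real pipeline would more likely use its configuration management tool's own templating.

```python
from string import Template
from typing import Dict

# Only the identity parameters discussed in this guide are templated here.
WSREP_TEMPLATE = Template(
    "[mysqld]\n"
    'wsrep_cluster_name="$cluster_name"\n'
    'wsrep_cluster_address="gcomm://$peers"\n'
    'wsrep_node_name="$node_name"\n'
    'wsrep_node_address="$node_address"\n'
)


def render_wsrep(cluster_name: str, nodes: Dict[str, str], node_name: str) -> str:
    """Render the identity block for one node; nodes maps node name -> IP address."""
    if node_name not in nodes:
        raise KeyError(f"{node_name} is not in the inventory")
    return WSREP_TEMPLATE.substitute(
        cluster_name=cluster_name,
        peers=",".join(nodes.values()),
        node_name=node_name,
        node_address=nodes[node_name],
    )


# Hypothetical inventory; in practice this comes from the IaC tool.
NODES = {
    "db-node-01": "10.0.1.10",
    "db-node-02": "10.0.1.11",
    "db-node-03": "10.0.1.12",
}

print(render_wsrep("prod-galera-primary", NODES, "db-node-02"))
```

Because the member list and the node address come from the same dictionary, a node cannot be rendered with an address that is missing from the cluster list.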

Core Parameter Matrix & Production Tuning

Galera parameters are logically grouped by operational domain. Misalignment across these groups is a common cause of certification failures, SST timeouts, and split-brain scenarios.

Cluster Identity & Topology

These parameters establish the consensus namespace and node addressing. Incorrect values prevent quorum formation or force nodes into standalone mode.

wsrep_cluster_name="prod-galera-primary"
wsrep_cluster_address="gcomm://10.0.1.10,10.0.1.11,10.0.1.12"
wsrep_node_name="db-node-01"
wsrep_node_address="10.0.1.10"

Production Notes:

  • wsrep_cluster_address requires the gcomm:// prefix. Omitting it disables replication.
  • wsrep_node_name must be deterministic and map directly to infrastructure inventory (e.g., Ansible hostnames, Kubernetes pod labels, or cloud instance IDs).
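  • During initial cluster formation, the first node bootstraps from an empty member list (wsrep_cluster_address="gcomm://"). Prefer starting that node with galera_new_cluster, which passes --wsrep-new-cluster for that one start, because an empty gcomm:// left in wsrep.cnf makes the node form a new cluster on every restart. Detailed sequencing for this phase is covered in Bootstrapping Your First Galera Cluster.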
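The notes above lend themselves to a small pre-flight check. The following is a minimal sketch, assuming IPv4 addresses or hostnames with an optional :port suffix (IPv6 literals are not handled); parse_cluster_address and check_identity are names introduced here for illustration.

```python
from typing import List

PREFIX = "gcomm://"


def parse_cluster_address(value: str) -> List[str]:
    """Split a gcomm:// URL into its member list; an empty list is the bootstrap form."""
    if not value.startswith(PREFIX):
        raise ValueError(f"wsrep_cluster_address must start with {PREFIX!r}: {value!r}")
    # Drop optional ?key=value provider options that may follow the member list.
    member_part = value[len(PREFIX):].split("?", 1)[0]
    return [m.strip() for m in member_part.split(",") if m.strip()]


def check_identity(cluster_address: str, node_address: str) -> List[str]:
    """Return human-readable findings; an empty list means nothing suspicious."""
    members = parse_cluster_address(cluster_address)
    if not members:
        return ["empty member list: this node would form a new cluster when started"]
    hosts = {m.split(":", 1)[0] for m in members}  # strip the optional port
    findings = []
    if node_address not in hosts:
        # Galera does not require this, but it often signals a templating mistake.
        findings.append(f"{node_address} is not in the member list {sorted(hosts)}")
    return findings


print(check_identity("gcomm://10.0.1.10,10.0.1.11,10.0.1.12", "10.0.1.10"))
print(check_identity("gcomm://", "10.0.1.10"))
```

Treating the empty list as a finding rather than an error keeps the check usable during a deliberate bootstrap while still flagging the case in routine deployments.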

State Transfer & Synchronization Mechanics

State Snapshot Transfer (SST) and Incremental State Transfer (IST) dictate how new or lagging nodes synchronize with the cluster.

wsrep_sst_method=mariabackup
wsrep_sst_auth="sstuser:SecurePassphrase!"
wsrep_provider_options="ist.recv_addr=10.0.1.10:4568; gcache.size=2G; gcache.page_size=256M"

Production Notes:

  • mariabackup is the recommended SST method for MariaDB 10.3+. It supports parallel streaming and avoids full table locks during transfer.
  • gcache.size must exceed the maximum write volume expected during a node outage. If IST fails due to cache overflow, the node falls back to full SST, causing significant I/O and network saturation.
  • ist.recv_addr should explicitly bind to the node’s primary replication interface to prevent routing conflicts in multi-homed environments.
  • wsrep_provider_options is a single semicolon-separated string. If it is assigned more than once in the option files, only the last assignment takes effect, so the provider settings shown in the following sections must be merged into this one line rather than added as separate entries.
  • Node lifecycle operations, including controlled SST triggers and cache preservation, are detailed in Graceful Node Join and Leave Procedures.
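The gcache.size guidance above reduces to simple arithmetic: write rate multiplied by the longest outage that IST should cover. The sketch below assumes the write rate is estimated from two samples of the wsrep_replicated_bytes and wsrep_received_bytes status counters added together; the 1.5 headroom factor and the sample figures are illustrative assumptions, not recommendations from this guide.

```python
def estimate_gcache_bytes(
    bytes_start: int,
    bytes_end: int,
    sample_seconds: float,
    outage_seconds: float,
    headroom: float = 1.5,
) -> int:
    """Estimate gcache.size from two samples of the replicated-byte counters.

    bytes_start and bytes_end are wsrep_replicated_bytes + wsrep_received_bytes
    read sample_seconds apart, ideally during peak write load.
    """
    if sample_seconds <= 0 or outage_seconds < 0:
        raise ValueError("sample_seconds must be positive and outage_seconds non-negative")
    write_rate = (bytes_end - bytes_start) / sample_seconds  # bytes per second
    return int(write_rate * outage_seconds * headroom)


# Illustrative figures: 1.2 GB replicated in 10 minutes, 15-minute outage to cover.
needed = estimate_gcache_bytes(0, 1_200_000_000, 600, 900)
print(f"gcache.size should be at least {needed / 2**30:.2f} GiB")
```

If the result exceeds the configured gcache.size (2G in the example above), either the cache needs to grow or the tolerated outage window needs to shrink.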

Flow Control & Certification Boundaries

Galera uses certification-based replication. When write rates exceed apply capacity, flow control throttles incoming transactions to prevent queue exhaustion.

wsrep_slave_threads=4
wsrep_certify_nonPK=1
wsrep_provider_options="gcs.fc_limit=256; gcs.fc_factor=0.8; gcs.fc_master_slave=YES"

Production Notes:

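  • wsrep_slave_threads sets the number of parallel applier threads. A common starting point is the number of CPU cores available for apply work. The wsrep_cert_deps_distance status variable indicates how much parallelism the workload actually allows, so values far above it add lock contention without adding throughput. Do not tie the value to innodb_thread_concurrency, which defaults to 0 (no limit) and is deprecated in recent MariaDB releases.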
  • gcs.fc_limit defines the write-set queue threshold before throttling triggers. Default (16) is too low for high-throughput OLTP workloads. Values between 128–512 are typical for production.
  • gcs.fc_factor (0.0–1.0) determines when throttling releases. 0.8 means flow control lifts once the queue drops to 80% of fc_limit.
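  • wsrep_certify_nonPK=1 (the default) lets Galera generate an internal key for rows in tables that lack a primary key so that they can still be certified. It does not enforce primary keys, and such tables can still behave unpredictably under replication, so define an explicit primary key on every replicated table.
  • gcs.fc_master_slave=YES tells the provider that only one node accepts writes. Leave it at NO (the default) when several nodes take writes.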
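To make the interaction between gcs.fc_limit and gcs.fc_factor concrete, the sketch below computes the receive-queue lengths at which flow control would engage and release. It assumes that the provider scales the limit by the square root of the node count when gcs.fc_master_slave is NO and uses it unscaled when YES; treat that scaling as an assumption to verify against your Galera version. The function name is introduced here for illustration.

```python
import math
from typing import Tuple


def flow_control_thresholds(
    fc_limit: int, fc_factor: float, cluster_size: int, single_writer: bool
) -> Tuple[float, float]:
    """Return (pause_at, resume_at) as receive-queue lengths in write-sets."""
    if not 0.0 <= fc_factor <= 1.0:
        raise ValueError("gcs.fc_factor must be between 0.0 and 1.0")
    if fc_limit < 1 or cluster_size < 1:
        raise ValueError("fc_limit and cluster_size must be positive")
    # Assumed behavior: no scaling for single-writer, sqrt(N) scaling otherwise.
    scale = 1.0 if single_writer else math.sqrt(cluster_size)
    pause_at = fc_limit * scale
    resume_at = pause_at * fc_factor
    return pause_at, resume_at


pause_at, resume_at = flow_control_thresholds(256, 0.8, cluster_size=3, single_writer=True)
print(f"pause above {pause_at:.1f} queued write-sets, resume below {resume_at:.1f}")
```

With the values from the example configuration, the node would ask the cluster to pause at 256 queued write-sets and release at roughly 205.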

Network & Timeout Hardening

Cross-AZ and cloud deployments require explicit timeout tuning to prevent false-positive node evictions.

wsrep_provider_options="evs.suspect_timeout=PT5S; evs.inactive_timeout=PT15S; evs.join_retrans_period=PT1S; evs.send_window=1024; evs.user_send_window=512"

Production Notes:

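  • evs.suspect_timeout and evs.inactive_timeout control how long the cluster waits before declaring a node unresponsive. The values shown above (PT5S and PT15S) are the defaults, which are often too aggressive for high-latency networks. Raise them for cross-AZ or WAN links, keeping evs.inactive_timeout longer than evs.suspect_timeout.
  • evs.send_window and evs.user_send_window cap the number of packets in replication at a time. Increasing them improves throughput on high-latency links but raises memory consumption. evs.user_send_window must not exceed evs.send_window.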
  • Reference the official MariaDB wsrep_provider_options Documentation for provider-specific EVS (Extended Virtual Synchrony) parameter matrices.
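Because the provider settings from the state transfer, flow control, and network sections all live in one wsrep_provider_options string, it can help to build that string programmatically. The sketch below merges fragments and runs two sanity checks on the EVS values. The helper names are introduced here, and the duration parser handles only the PT…H…M…S subset of ISO 8601 used in this guide.

```python
import re
from typing import Dict, List


def parse_provider_options(fragment: str) -> Dict[str, str]:
    """Parse 'key=value; key=value' into a dict, preserving order."""
    options: Dict[str, str] = {}
    for item in fragment.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        options[key.strip()] = value.strip()
    return options


def merge_provider_options(fragments: List[str]) -> str:
    """Merge fragments into one string; later fragments override earlier keys."""
    merged: Dict[str, str] = {}
    for fragment in fragments:
        merged.update(parse_provider_options(fragment))
    return "; ".join(f"{key}={value}" for key, value in merged.items())


_NUM = r"(\d+(?:\.\d+)?)"
_DURATION = re.compile(rf"^PT(?:{_NUM}H)?(?:{_NUM}M)?(?:{_NUM}S)?$")


def duration_seconds(value: str) -> float:
    """Convert a PT…H…M…S duration such as PT15S or PT1M to seconds."""
    match = _DURATION.match(value)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def check_evs(options: Dict[str, str]) -> List[str]:
    """Return findings for EVS settings that contradict each other."""
    findings = []
    suspect = options.get("evs.suspect_timeout")
    inactive = options.get("evs.inactive_timeout")
    if suspect and inactive and duration_seconds(suspect) > duration_seconds(inactive):
        findings.append("evs.suspect_timeout is longer than evs.inactive_timeout")
    send = options.get("evs.send_window")
    user = options.get("evs.user_send_window")
    if send and user and int(user) > int(send):
        findings.append("evs.user_send_window exceeds evs.send_window")
    return findings


fragments = [
    "ist.recv_addr=10.0.1.10:4568; gcache.size=2G; gcache.page_size=256M",
    "gcs.fc_limit=256; gcs.fc_factor=0.8; gcs.fc_master_slave=YES",
    "evs.suspect_timeout=PT5S; evs.inactive_timeout=PT15S; evs.join_retrans_period=PT1S; "
    "evs.send_window=1024; evs.user_send_window=512",
]
merged = merge_provider_options(fragments)
print(f'wsrep_provider_options="{merged}"')
print(check_evs(parse_provider_options(merged)))
```

Emitting a single merged line avoids the situation where a later wsrep_provider_options assignment silently discards the gcache or flow control settings from an earlier one.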

Automation & Validation Pipelines

Platform teams must treat wsrep.cnf as a version-controlled, testable artifact. Manual edits introduce drift between nodes and make cluster behavior harder to reproduce.

Python-Based Configuration Validation

The following pattern demonstrates how to parse, validate, and enforce parameter alignment before deployment:

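import configparser
import subprocess
import sys

REQUIRED_KEYS = {
    "wsrep_cluster_name", "wsrep_cluster_address",
    "wsrep_node_name", "wsrep_node_address", "wsrep_sst_method",
}

def validate_wsrep_config(config_path: str,
                          main_config: str = "/etc/mysql/mariadb.cnf") -> bool:
    # MariaDB option files allow valueless and repeated keys, and values may
    # contain '%', so relax configparser's defaults accordingly.
    config = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    try:
        if not config.read(config_path):
            print(f"ERROR: Cannot read {config_path}", file=sys.stderr)
            return False
    except configparser.Error as exc:
        print(f"ERROR: Cannot parse {config_path}: {exc}", file=sys.stderr)
        return False

    if not config.has_section("mysqld"):
        print("ERROR: Missing [mysqld] section", file=sys.stderr)
        return False

    # MariaDB treats '-' and '_' in option names as equivalent.
    present = {key.replace("-", "_") for key in config.options("mysqld")}
    missing = REQUIRED_KEYS - present
    if missing:
        print(f"ERROR: Missing required parameters: {', '.join(sorted(missing))}",
              file=sys.stderr)
        return False

    # Let the server binary parse the full option tree. --defaults-file must be
    # the first argument. MariaDB has no --validate-config flag (that option
    # belongs to MySQL 8.0), so --help --verbose is used to read the option
    # files and exit; unknown options are reported on stderr.
    try:
        result = subprocess.run(
            ["mariadbd", f"--defaults-file={main_config}", "--help", "--verbose"],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        print("ERROR: mariadbd binary not found in PATH", file=sys.stderr)
        return False
    if result.returncode != 0 or "[ERROR]" in result.stderr:
        print(f"CONFIG SYNTAX ERROR:\n{result.stderr}", file=sys.stderr)
        return False

    return True

if __name__ == "__main__":
    if validate_wsrep_config("/etc/mysql/wsrep.cnf"):
        print("Configuration validated. Proceeding with deployment.")
        sys.exit(0)
    sys.exit(1)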

CI/CD Integration & Drift Detection

  • Store wsrep.cnf templates in Git with parameterized placeholders for wsrep_node_address and wsrep_node_name.
  • Use CI pipelines to run syntax validation and unit tests against a staging Galera container.
  • Implement periodic drift detection by comparing live SHOW GLOBAL VARIABLES LIKE 'wsrep_%'; output against the rendered configuration.
  • Official MariaDB system variable documentation provides authoritative defaults and scope definitions: MariaDB Galera System Variables.
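The drift check described in the list above could look roughly like the following. It assumes the live values were captured as tab-separated text, for example with mariadb -N -B -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_%'", and that the expected values come from the rendered configuration. The helper names and the sample data are illustrative.

```python
from typing import Dict, Iterable, Optional, Tuple

_TRUE = {"1", "on", "yes", "true"}
_FALSE = {"0", "off", "no", "false"}

# wsrep_provider_options is reported with every provider default expanded and
# wsrep_sst_auth is masked in live output, so exact comparison is not meaningful.
DEFAULT_SKIP = ("wsrep_provider_options", "wsrep_sst_auth")


def normalize(value: str) -> str:
    """Strip quotes and fold boolean spellings so 1/ON/yes compare equal."""
    cleaned = value.strip().strip("\"'")
    lowered = cleaned.lower()
    if lowered in _TRUE:
        return "ON"
    if lowered in _FALSE:
        return "OFF"
    return cleaned


def parse_show_variables(batch_output: str) -> Dict[str, str]:
    """Parse 'name<TAB>value' lines into a dict keyed by lowercase name."""
    live: Dict[str, str] = {}
    for line in batch_output.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition("\t")
        live[name.strip().lower()] = value
    return live


def detect_drift(
    expected: Dict[str, str],
    live: Dict[str, str],
    skip: Iterable[str] = DEFAULT_SKIP,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, live)} for every setting that differs."""
    skipped = set(skip)
    drift: Dict[str, Tuple[str, Optional[str]]] = {}
    for key, want in expected.items():
        name = key.replace("-", "_").lower()
        if name in skipped:
            continue
        have = live.get(name)
        if have is None or normalize(have) != normalize(want):
            drift[name] = (want, have)
    return drift


# Illustrative data standing in for the rendered config and the live server.
expected = {
    "wsrep_cluster_name": '"prod-galera-primary"',
    "wsrep_slave_threads": "4",
    "wsrep_certify_nonPK": "1",
}
live_text = (
    "wsrep_cluster_name\tprod-galera-primary\n"
    "wsrep_slave_threads\t2\n"
    "wsrep_certify_nonPK\tON\n"
)
print(detect_drift(expected, parse_show_variables(live_text)))
```

Provider options are better compared key by key after parsing both strings, since the live value always contains more keys than the configured one.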

Production Hardening & Incident Response

Common Failure Modes & Remediation

| Symptom | Root Cause | Immediate Action |
| --- | --- | --- |
| WSREP: Failed to read uuid:seqno from joiner | SST authentication mismatch or wsrep_sst_method unsupported | Verify wsrep_sst_auth credentials and ensure mariabackup is installed on all nodes |
| WSREP: Flow control paused | gcs.fc_limit too low or wsrep_slave_threads undersized | Increase gcs.fc_limit to 256+, monitor wsrep_local_recv_queue_avg |
| WSREP: view(view_id(NON_PRIM,...)) | Network partition or evs.inactive_timeout too aggressive | Verify inter-node connectivity, adjust EVS timeouts, check firewall rules for ports 4567 (replication), 4568 (IST), and 4444 (SST) |
| WSREP: gcache page size mismatch | gcache.page_size changed without full cluster restart | Revert to original value or perform rolling restart with wsrep_provider_options aligned |

Log Analysis & Telemetry

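Galera writes its messages to the MariaDB error log (the file set by log_error, for example mysqld.log) or, on systemd hosts, to the journal (journalctl -u mariadb). Filter critical events: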

journalctl -u mariadb -f | grep -E "WSREP:|Galera|gcomm"

Monitor these status variables in real-time:

  • wsrep_cluster_size: Active node count
  • wsrep_local_state_comment: Node state (Synced, Donor/Desynced, Joining)
  • wsrep_flow_control_paused_ns: Cumulative flow control stall duration
  • wsrep_last_committed: Sequence number of the last committed transaction

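Align monitoring thresholds with your SLA requirements. Because wsrep_flow_control_paused_ns is cumulative, alert on its growth rate rather than its absolute value. Automated alerting should trigger when that rate exceeds baseline, when wsrep_cluster_size drops below a majority of the expected node count, or when wsrep_cluster_status is anything other than Primary.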
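A minimal sketch of such alert logic is shown below. It works on two snapshots of the status variables taken a known interval apart. The 5% pause threshold, the evaluate function, and the sample snapshots are assumptions for illustration, and the quorum test assumes all nodes carry equal weight.

```python
from typing import Dict, List


def flow_control_paused_fraction(prev_ns: int, now_ns: int, interval_seconds: float) -> float:
    """Fraction of the interval spent paused, from the cumulative nanosecond counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = max(0, now_ns - prev_ns)  # the counter restarts from zero after a restart
    return min(1.0, delta / (interval_seconds * 1e9))


def has_quorum(cluster_size: int, expected_nodes: int) -> bool:
    """True when the visible nodes form a strict majority of the expected nodes."""
    return cluster_size * 2 > expected_nodes


def evaluate(
    prev: Dict[str, str],
    now: Dict[str, str],
    interval_seconds: float,
    expected_nodes: int,
    max_paused_fraction: float = 0.05,
) -> List[str]:
    """Return alert messages for one node, given two status snapshots."""
    alerts = []
    paused = flow_control_paused_fraction(
        int(prev.get("wsrep_flow_control_paused_ns", 0)),
        int(now.get("wsrep_flow_control_paused_ns", 0)),
        interval_seconds,
    )
    if paused > max_paused_fraction:
        alerts.append(f"flow control paused {paused:.1%} of the last interval")
    size = int(now.get("wsrep_cluster_size", 0))
    if not has_quorum(size, expected_nodes):
        alerts.append(f"cluster size {size} is below a majority of {expected_nodes}")
    state = now.get("wsrep_local_state_comment", "unknown")
    if state != "Synced":
        alerts.append(f"node state is {state}")
    return alerts


# Illustrative snapshots taken 60 seconds apart on a three-node cluster.
previous = {"wsrep_flow_control_paused_ns": "0"}
current = {
    "wsrep_flow_control_paused_ns": "6000000000",
    "wsrep_cluster_size": "3",
    "wsrep_local_state_comment": "Synced",
}
for message in evaluate(previous, current, interval_seconds=60, expected_nodes=3):
    print("ALERT:", message)
```

Working from counter deltas keeps the alert meaningful on long-running nodes, where the cumulative value alone says little about current behavior.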