Designing Multi-Master Topologies

Designing multi-master topologies in MariaDB Galera requires a disciplined approach to synchronous replication mechanics, network topology constraints, and automated lifecycle management. Unlike traditional primary-replica architectures, Galera’s active-active model exposes every node to concurrent write workloads, shifting the operational burden from failover orchestration to deterministic state synchronization and conflict resolution. For database administrators, DevOps engineers, and platform teams, the design phase must account for latency boundaries, certification throughput, and automated recovery workflows before the first node boots. This guide outlines production-grade topology design, configuration tuning, and implementation patterns tailored for infrastructure automation and high-availability database platforms.

Topology Design Principles

A resilient multi-master topology begins with strict latency budgeting. Galera replicates write-sets synchronously at commit, so commit latency is bounded below by the round-trip time to the slowest node in the cluster. Geographic distribution across availability zones or regions requires careful node placement to keep intra-cluster latency under 10 ms for optimal throughput. When designing node distribution, prefer an odd number of nodes: a network partition can then never split the cluster into two equal halves, so one side always retains quorum and split-brain is avoided. The foundational behavior of these topologies is governed by the MariaDB Galera Core Architecture & Fundamentals, which dictates how state machines synchronize across distributed nodes.
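The quorum arithmetic behind the odd-node recommendation can be sketched as follows. This is a minimal illustration, assuming every node carries equal weight and that nodes fail abruptly rather than leaving gracefully; the function names are hypothetical and not part of any Galera API.

```python
def quorum_size(node_count: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if node_count < 1:
        raise ValueError("node_count must be positive")
    return node_count // 2 + 1


def tolerated_failures(node_count: int) -> int:
    """Nodes that can be lost while the survivors still hold a majority."""
    return node_count - quorum_size(node_count)


if __name__ == "__main__":
    # Assumption: equal node weights; sizes 2..7 chosen only for illustration.
    for n in range(2, 8):
        print(f"{n} nodes: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under these assumptions a four-node cluster tolerates no more failures than a three-node cluster, which is why the extra even-numbered node adds cost without adding fault tolerance.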

For cross-datacenter deployments, avoid stretching a single synchronous cluster across high-latency WAN links. Instead, deploy a three-node synchronous cluster in the primary site and attach an asynchronous replica in the secondary site. This architecture absorbs WAN latency without impacting write certification or triggering flow control. Network segmentation must enforce strict firewall rules: TCP ports 3306 (MySQL client traffic), 4444 (SST), 4567 (Galera replication), and 4568 (IST) must be reachable between all cluster members in both directions. Multicast is rarely used in modern deployments; list explicit unicast peer addresses in wsrep_cluster_address so peer discovery is deterministic, and set gmcast.listen_addr (within wsrep_provider_options) when the node must bind replication traffic to a specific interface.

Figure: a three-node synchronous cluster in the primary site, with an asynchronous replica absorbing WAN latency in the secondary site.

flowchart LR
    subgraph P["Primary site (synchronous)"]
        N1["Galera node 1"]
        N2["Galera node 2"]
        N3["Galera node 3"]
        N1 --- N2
        N2 --- N3
        N1 --- N3
    end
    subgraph SEC["Secondary site"]
        R["Async replica"]
    end
    N1 -. "async replication" .-> R

Core Configuration & Tuning

Multi-master synchronization depends on precise tuning of the wsrep parameter set. The wsrep_provider_options string is the primary control surface for replication behavior. A key parameter is gcache.size, which must be large enough that a restarted node can catch up through Incremental State Transfer (IST) instead of falling back to a full SST. A common production baseline is 2–4 GB, but calculate the value in bytes from peak write throughput, average write-set size in bytes, and maximum expected downtime: gcache.size = peak_writes_per_sec * avg_ws_size * max_downtime_sec.
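The sizing formula above can be worked through in a few lines. This is a minimal sketch: the function names, the optional headroom multiplier, and the sample workload figures are assumptions for illustration only.

```python
import math


def gcache_size_bytes(peak_writes_per_sec: float,
                      avg_ws_size_bytes: float,
                      max_downtime_sec: float,
                      headroom: float = 1.0) -> int:
    """gcache.size = peak_writes_per_sec * avg_ws_size * max_downtime_sec.

    `headroom` is a hypothetical safety multiplier (1.0 = none).
    """
    return math.ceil(
        peak_writes_per_sec * avg_ws_size_bytes * max_downtime_sec * headroom
    )


def as_provider_option(size_bytes: int) -> str:
    """Format the size in whole MiB, rounded up, as a provider option."""
    mib = math.ceil(size_bytes / (1024 * 1024))
    return f"gcache.size={mib}M"


if __name__ == "__main__":
    # Assumed workload: 2000 write-sets/s, 1 KiB each, 30 minutes of downtime.
    size = gcache_size_bytes(2000, 1024, 1800)
    print(size, as_provider_option(size))
```

With the assumed figures the result is roughly 3.4 GiB, which falls inside the 2–4 GB baseline mentioned above.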

Flow control is observed through two status variables: wsrep_flow_control_paused, the fraction of time since the last FLUSH STATUS that replication was paused, and wsrep_flow_control_sent, a counter of the flow-control pause messages this node has sent. When wsrep_flow_control_paused consistently exceeds 0.1, or wsrep_flow_control_sent keeps climbing on a particular node, the cluster is experiencing write throttling due to slow appliers or network congestion. Understanding Galera Synchronous Replication covers how transaction ordering and global transaction IDs (GTIDs) are maintained across concurrent writers. To prevent certification bottlenecks, ensure every table has a primary key; wsrep_certify_nonPK=ON (the default) lets Galera generate surrogate keys so PK-less tables can still be certified deterministically, but explicit primary keys remain strongly recommended. Increase wsrep_slave_threads to match available CPU cores, but cap it at 16–32 to avoid thread contention. For heavy write workloads, cap write-set size with wsrep_max_ws_rows=131072 and wsrep_max_ws_size=1073741824 (1 GiB) so that oversized transactions are rejected on the originating node instead of stalling replication for the whole cluster.

Automation & Lifecycle Management

Platform teams must treat Galera configuration as immutable infrastructure. Configuration drift between nodes can lead to inconsistent replication behavior, failed joins, and unexpected fallbacks from IST to SST. Use configuration management tools (Ansible, SaltStack, or Terraform with cloud-init) to enforce identical my.cnf and wsrep.cnf files across all nodes, apart from per-node values such as wsrep_node_address and wsrep_node_name. Bootstrap sequences require strict ordering: the first node is started with the --wsrep-new-cluster flag (via the galera_new_cluster wrapper) so it forms the initial Primary Component, while subsequent nodes join normally with wsrep_cluster_address="gcomm://node1,node2,node3".
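The ordering constraint can be captured as data that an orchestration tool consumes. The sketch below is illustrative only: the helper names are hypothetical, and the `systemctl start mariadb` command for joining nodes is an assumption about the service unit name on the target hosts.

```python
def cluster_address(nodes: list[str]) -> str:
    """Build the wsrep_cluster_address value from explicit peer names."""
    if not nodes:
        raise ValueError("at least one node is required")
    return "gcomm://" + ",".join(nodes)


def bootstrap_plan(nodes: list[str]) -> list[tuple[str, str]]:
    """Return (node, command) pairs in the order they must be executed.

    The first node forms the Primary Component; the rest join normally.
    Assumption: MariaDB runs as a systemd unit named 'mariadb'.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    plan = [(nodes[0], "galera_new_cluster")]
    plan += [(node, "systemctl start mariadb") for node in nodes[1:]]
    return plan


if __name__ == "__main__":
    members = ["node1", "node2", "node3"]
    print(cluster_address(members))
    for node, command in bootstrap_plan(members):
        print(f"{node}: {command}")
```

Keeping the plan as a plain list makes it straightforward to insert a health gate between steps, so that each joiner is started only after the previous node reports a healthy state.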

Python automation builders should implement pre-flight validation before executing rolling restarts or configuration pushes. A production-ready script must query SHOW STATUS LIKE 'wsrep_%' to verify wsrep_cluster_size matches the expected node count and wsrep_ready=ON before proceeding. Use the Python subprocess module to safely execute mysqladmin commands with timeouts and structured error handling. The Write-Set Certification Process Explained demonstrates how conflicting transactions are resolved deterministically, meaning automation must never force node joins during active write bursts. Implement exponential backoff in join scripts and validate wsrep_local_state_comment=Synced before routing production traffic to newly provisioned nodes.
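A pre-flight gate along these lines might look like the sketch below. It makes several assumptions that are not from the text above: it invokes the `mysql` command-line client in batch mode rather than mysqladmin (tab-separated output is simpler to parse), it expects credentials to come from a client option file, and all function names and retry defaults are hypothetical.

```python
import subprocess
import time


def parse_status(output: str) -> dict[str, str]:
    """Parse tab-separated 'name<TAB>value' lines into a dict."""
    status = {}
    for line in output.splitlines():
        name, sep, value = line.partition("\t")
        if sep:
            status[name.strip()] = value.strip()
    return status


def fetch_wsrep_status(host: str, timeout: float = 5.0) -> dict[str, str]:
    """Query wsrep status via the mysql client (assumed to be on PATH)."""
    cmd = ["mysql", "--host", host, "--batch", "--skip-column-names",
           "--execute", "SHOW GLOBAL STATUS LIKE 'wsrep_%'"]
    result = subprocess.run(cmd, capture_output=True, text=True,
                            timeout=timeout, check=True)
    return parse_status(result.stdout)


def preflight_ok(status: dict[str, str], expected_nodes: int) -> bool:
    """True when the node sees the full cluster, is ready, and is Synced."""
    return (status.get("wsrep_cluster_size") == str(expected_nodes)
            and status.get("wsrep_ready") == "ON"
            and status.get("wsrep_local_state_comment") == "Synced")


def wait_until_synced(host: str, expected_nodes: int, attempts: int = 5,
                      base_delay: float = 1.0, fetch=fetch_wsrep_status,
                      sleep=time.sleep) -> bool:
    """Poll with exponential backoff; give up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            if preflight_ok(fetch(host), expected_nodes):
                return True
        except (subprocess.SubprocessError, OSError):
            pass  # timeout, non-zero exit, or missing client: retry
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False
```

The `fetch` and `sleep` parameters are injectable so the gate can be exercised in CI without a live cluster; a rolling-restart script would call `wait_until_synced` for each node before moving on to the next.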

Operational Dependencies & Validation

Multi-master topologies introduce operational dependencies that must be codified into monitoring and alerting pipelines. Deploy exporters like mysqld_exporter or prometheus-mariadb-exporter to scrape wsrep_flow_control_paused, wsrep_cert_deps_distance, and wsrep_evs_state. Alert thresholds should trigger at wsrep_flow_control_paused > 0.05 and wsrep_cluster_size < expected_nodes. Network partitions require automated fallback routing: configure HAProxy or ProxySQL with health checks that route writes only to nodes reporting wsrep_local_state_comment=Synced and wsrep_ready=ON.
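The alert thresholds and the write-routing rule can be expressed as two small predicates. This is a minimal sketch that assumes the status values have already been collected into a dictionary of strings, as SHOW STATUS returns them; the function names and alert messages are hypothetical.

```python
def evaluate_node(status: dict[str, str], expected_nodes: int,
                  paused_threshold: float = 0.05) -> list[str]:
    """Return alert messages for one node's wsrep status snapshot."""
    alerts = []
    paused = float(status.get("wsrep_flow_control_paused", "0"))
    if paused > paused_threshold:
        alerts.append(f"flow control paused {paused:.3f} > {paused_threshold}")
    size = int(status.get("wsrep_cluster_size", "0"))
    if size < expected_nodes:
        alerts.append(f"cluster size {size} < expected {expected_nodes}")
    return alerts


def accepts_writes(status: dict[str, str]) -> bool:
    """Health-check rule: route writes only to Synced, ready nodes."""
    return (status.get("wsrep_local_state_comment") == "Synced"
            and status.get("wsrep_ready") == "ON")


if __name__ == "__main__":
    # Assumed sample snapshot, not real cluster output.
    snapshot = {
        "wsrep_flow_control_paused": "0.071",
        "wsrep_cluster_size": "2",
        "wsrep_local_state_comment": "Synced",
        "wsrep_ready": "ON",
    }
    print(evaluate_node(snapshot, expected_nodes=3))
    print(accepts_writes(snapshot))
```

A proxy health-check endpoint could wrap `accepts_writes` and return an HTTP status accordingly, while `evaluate_node` maps onto alerting rules in the monitoring pipeline.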

Validation scripts must run after every deployment or network maintenance window. A minimal validation routine should:

  1. Verify bidirectional connectivity on ports 4567/4568 using nc -zv or Python socket checks.
  2. Confirm wsrep_provider_version matches across all nodes to prevent protocol mismatches.
  3. Execute a synthetic write transaction and verify replication latency via wsrep_last_committed deltas.
  4. Validate SST fallback behavior by temporarily disabling IST and forcing a full state transfer in a staging environment.
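Steps 1 and 2 of the routine above could be sketched with the standard library as follows. The helper names are hypothetical, and the check is one-directional: it must be run from every node to establish bidirectional reachability. A refused connection on 4568 may also simply mean that no IST is in progress on the target node, so treat that result with care.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a TCP connection; True if something accepts it."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_ports(hosts: list[str],
                      ports: tuple[int, ...] = (4567, 4568)) -> list[tuple[str, int]]:
    """List every (host, port) pair that refused or timed out."""
    return [(h, p) for h in hosts for p in ports if not port_open(h, p)]


def version_mismatch(versions: dict[str, str]) -> bool:
    """True when nodes report differing wsrep_provider_version values."""
    return len(set(versions.values())) > 1


if __name__ == "__main__":
    # Assumed host names; replace with the real cluster members.
    print(unreachable_ports(["node1", "node2", "node3"]))
```

The `versions` mapping is expected to hold one wsrep_provider_version string per node, gathered by whatever status collection the pipeline already uses.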

Refer to the official Galera Cluster documentation for parameter matrices, version compatibility tables, and certified network driver configurations. Platform teams must integrate these validation steps into CI/CD pipelines to enforce topology integrity before merging infrastructure changes.