Galera Cluster Hardware Requirements: Production Sizing & Validation Workflows
Synchronous multi-master replication fundamentally shifts the performance bottleneck from local storage to network latency and CPU-bound certification. Unlike asynchronous replication, where a primary node commits independently, MariaDB Galera Cluster requires every transaction to be certified, replicated, and applied across all nodes before returning success to the client. This architectural reality means hardware sizing is not a capacity exercise; it is a deterministic latency engineering problem. For database administrators, DevOps engineers, and platform teams building automation pipelines, establishing baseline hardware profiles before cluster formation prevents cascading flow control events, certification backlogs, and split-brain recovery scenarios. Properly dimensioned infrastructure directly dictates the stability of your Galera Cluster Setup & Node Management workflows.
CPU Topology & NUMA Alignment
Galera’s certification and apply threads scale linearly with available physical cores. Hyperthreading introduces unpredictable context-switching latency during high-concurrency write bursts. Production deployments should prioritize dedicated physical cores and disable simultaneous multithreading (SMT) where sub-millisecond commit latency is required. The wsrep_slave_threads parameter must align with physical core counts to avoid thread thrashing during parallel apply.
NUMA topology misalignment is a frequent source of intermittent latency spikes. When the InnoDB buffer pool spans multiple NUMA nodes, memory access penalties compound during write-set certification. Validate topology alignment before provisioning:
#!/usr/bin/env bash
# validate_cpu_topology.sh
# Checks physical cores, SMT status, and NUMA node distribution
set -euo pipefail
PHYSICAL_CORES=$(lscpu | awk -F: '/^Core\(s\) per socket/ {print $2}' | tr -d ' ')
SOCKETS=$(lscpu | awk -F: '/^Socket\(s\)/ {print $2}' | tr -d ' ')
SMT_ACTIVE=$(lscpu | awk -F: '/^Thread\(s\) per core/ {print $2}' | tr -d ' ')
echo "Physical Cores: $((PHYSICAL_CORES * SOCKETS))"
echo "SMT Threads per Core: $SMT_ACTIVE"
if [ "$SMT_ACTIVE" -gt 1 ]; then
echo "WARNING: SMT detected. Disable via BIOS or 'echo off > /sys/devices/system/cpu/smt/control' for latency-sensitive workloads."
fi
NUMA_NODES=$(numactl --hardware | grep "available:" | awk '{print $2}')
echo "NUMA Nodes Available: $NUMA_NODES"
if [ "$NUMA_NODES" -gt 1 ]; then
echo "ACTION REQUIRED: Pin mysqld and galera processes via numactl --cpunodebind --membind in systemd unit file."
fi
For platform teams automating node provisioning, integrate this check into your configuration management pipeline. Fail the deployment if SMT is active or if numactl cannot isolate memory nodes.
Memory Allocation & gcache Sizing
Memory provisioning must account for three distinct consumers: the InnoDB buffer pool, the operating system page cache, and Galera’s in-memory write-set cache (gcache). The gcache.size parameter dictates how many pending transactions can be queued during network partitions or slow node apply cycles. If the write-set volume exceeds gcache.size, the lagging node triggers an Incremental State Transfer (IST) failure and falls back to a full State Snapshot Transfer (SST), which blocks the donor node and degrades cluster throughput.
Calculate baseline gcache.size using peak write throughput and expected apply latency:
gcache.size (MB) = Peak_Write_MB/s × Max_Apply_Latency_s × 1.5 (Safety Factor)
For example, a cluster sustaining 250 MB/s writes with a 3-second worst-case apply latency requires a minimum gcache.size of 1125 MB. Round up to 2G or 4G to accommodate burst traffic. Memory limits must be explicitly defined in wsrep_provider_options to prevent OOM kills during certification storms. Detailed parameter mapping and provider-level tuning are covered in the wsrep.cnf Configuration Deep Dive.
Reserve 20% of total RAM for OS page cache and Galera overhead before allocating innodb_buffer_pool_size. Never allocate 100% of system memory to InnoDB.
Network Fabric & Latency Constraints
Synchronous replication demands a low-latency, high-bandwidth network fabric. Maximum round-trip time (RTT) must remain under 1ms for geographically localized clusters. Cross-DC deployments require WAN-optimized topologies or asynchronous fallback; synchronous Galera across >50ms latency will trigger constant flow control and certification timeouts.
Jumbo frames (MTU 9000) reduce packet overhead for large write-sets and significantly lower CPU interrupt load. Validate end-to-end MTU consistency to prevent fragmentation:
ping -M do -s 8972 <peer_ip>
TCP stack tuning prevents head-of-line blocking during certification bursts. Apply these kernel parameters across all cluster nodes:
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_congestion_control = bbr
Reference the Linux Kernel IP/TCP sysctl Documentation for congestion algorithm behavior under sustained throughput. Network readiness is a hard prerequisite before executing Bootstrapping Your First Galera Cluster, as bootstrap failures frequently trace back to asymmetric routing or MTU mismatches.
Storage I/O & Write-Ahead Log Alignment
Disk I/O directly impacts SST performance, local transaction commit latency, and recovery speed. NVMe storage is mandatory for production Galera nodes. Use hardware RAID 10 or enterprise SSDs with battery-backed write cache (BBWC). Format volumes with XFS or ext4 using noatime,nodiratime mount options to eliminate unnecessary metadata writes.
Align InnoDB log sizing with gcache capacity to prevent double-write bottlenecks during heavy load:
innodb_log_file_size >= gcache.size / 4
Set innodb_flush_log_at_trx_commit=1 and sync_binlog=1 for strict ACID compliance. Bypassing these settings sacrifices data integrity for marginal throughput gains, which defeats the purpose of synchronous multi-master architecture. Validate storage performance using fio before deployment:
fio --name=galera_validation --ioengine=libaio --rw=randwrite --bs=16k --direct=1 --size=10G --numjobs=4 --runtime=60 --time_based --group_reporting --output-format=json
Target sustained 4k random write IOPS > 50,000 and latency < 0.5ms at p99.
Pre-Deployment Validation & Automation Hooks
Platform teams should embed hardware validation into Infrastructure-as-Code pipelines. The following Python script automates baseline checks for CPU cores, memory availability, network RTT, and storage latency, returning structured JSON for CI/CD gating:
#!/usr/bin/env python3
import subprocess
import json
import sys
def run_cmd(cmd):
return subprocess.check_output(cmd, shell=True).decode().strip()
def validate_galera_hardware():
report = {}
# CPU Check — nproc reports logical CPUs (including SMT threads), not
# physical cores. Derive physical cores from `lscpu` if a stricter gate is needed.
cpus = int(run_cmd("nproc"))
report["logical_cpus"] = cpus
if cpus < 4:
report["cpu_status"] = "FAIL: Minimum 4 logical CPUs required"
else:
report["cpu_status"] = "PASS"
# Memory Check
mem_kb = int(run_cmd("awk '/MemTotal/ {print $2}' /proc/meminfo"))
mem_gb = mem_kb / 1024 / 1024
report["total_memory_gb"] = round(mem_gb, 2)
if mem_gb < 16:
report["memory_status"] = "FAIL: Minimum 16GB RAM required"
else:
report["memory_status"] = "PASS"
# Network RTT Check (example peer)
peer = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
ping_out = run_cmd(f"ping -c 3 -W 1 {peer} | grep rtt")
avg_rtt = float(ping_out.split("/")[4])
report["avg_rtt_ms"] = avg_rtt
if avg_rtt > 1.0:
report["network_status"] = "WARN: RTT > 1ms. Verify local network topology."
else:
report["network_status"] = "PASS"
print(json.dumps(report, indent=2))
sys.exit(0 if all(v == "PASS" for v in [report.get("cpu_status"), report.get("memory_status")]) else 1)
if __name__ == "__main__":
validate_galera_hardware()
Execute this validation during node provisioning, before MariaDB package installation. Integrate exit codes into your orchestration tool to halt deployment if hardware baselines are unmet. Automated validation eliminates guesswork and ensures every node entering the cluster meets deterministic latency thresholds.