How to Bootstrap MariaDB Galera on Ubuntu 22.04
Bootstrapping a MariaDB Galera cluster on Ubuntu 22.04 requires strict adherence to state initialization protocols, deterministic network topology, and precise systemd service orchestration. For platform engineers and database administrators, the bootstrap sequence is not merely a startup routine; it is the foundational transaction that establishes the Primary Component (PC) and dictates the cluster’s initial Global Transaction ID (GTID) baseline. Missteps during this phase routinely manifest as split-brain scenarios, nodes that never reach the Synced value of wsrep_local_state_comment, or a grastate.dat whose safe_to_bootstrap flag no longer matches the real cluster state. This guide details the execution path, diagnostic verification, and automated recovery workflows required for multi-primary synchronization on Ubuntu 22.04.
Pre-Flight: System Hardening & Network Topology
Before initiating the bootstrap sequence, Ubuntu 22.04’s default AppArmor profiles and systemd resource limits must be reconciled with Galera’s synchronous replication requirements. The MariaDB service runs under strict confinement; verify that /etc/apparmor.d/usr.sbin.mysqld permits read/write access to /var/lib/mysql/ and /var/lib/mysql/ib_logfile*. If SST (State Snapshot Transfer) stalls with Permission denied errors, reload the AppArmor cache:
sudo systemctl reload apparmor
sudo systemctl restart mariadb
Systemd resource limits must accommodate high-concurrency page cache operations. Create a drop-in override to prevent Too many open files during initial SST:
sudo systemctl edit mariadb.service
Insert:
[Service]
LimitNOFILE=1048576
LimitMEMLOCK=infinity
LimitNPROC=65535
Reload the daemon: sudo systemctl daemon-reload.
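To confirm that the drop-in actually took effect, the limits can be read back from systemd. The following is a minimal sketch, assuming the unit is named mariadb.service and that systemctl show reports the three properties as Name=value lines; the helper names are illustrative and not part of any MariaDB tooling.

```python
import subprocess

# Values from the drop-in override above.
EXPECTED_LIMITS = {
    "LimitNOFILE": "1048576",
    "LimitMEMLOCK": "infinity",
    "LimitNPROC": "65535",
}


def parse_show_output(output: str) -> dict[str, str]:
    """Turn 'Name=value' lines from `systemctl show` into a dict."""
    pairs = (line.split("=", 1) for line in output.splitlines() if "=" in line)
    return {key.strip(): value.strip() for key, value in pairs}


def limit_mismatches(actual: dict[str, str],
                     expected: dict[str, str] = EXPECTED_LIMITS) -> dict[str, tuple]:
    """Return {name: (actual, expected)} for every limit that differs."""
    return {
        name: (actual.get(name), want)
        for name, want in expected.items()
        if actual.get(name) != want
    }


def read_unit_limits(unit: str = "mariadb.service") -> dict[str, str]:
    """Ask systemd for the effective limits of the unit (assumed unit name)."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=" + ",".join(EXPECTED_LIMITS)],
        capture_output=True, text=True, check=True,
    )
    return parse_show_output(result.stdout)
```

Calling limit_mismatches(read_unit_limits()) on the node returns an empty dict when all three limits match the override.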
Network prerequisites demand bidirectional TCP/UDP port 4567 (Galera group communication), TCP 4568 (Incremental State Transfer), TCP 4444 (State Snapshot Transfer), and TCP 3306 (client traffic). Ubuntu’s ufw must permit these ranges without NAT interference. Crucially, hostname resolution must be deterministic; /etc/hosts or internal DNS must map each node’s wsrep_node_address to a static IP. Ambiguous DNS caching during bootstrap frequently triggers WSREP: failed to open gcomm backend connection: 110 (Connection timed out) errors. When configuring the initial node topology, reference the established patterns in Galera Cluster Setup & Node Management to ensure wsrep_provider_options align with your underlying storage IOPS and network MTU.
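The port requirements above can be probed from a peer before bootstrap. This is a rough sketch using only the standard library; the function names are illustrative, and it checks TCP only (the UDP use of 4567 is not covered). Note that 4444 and 4568 are normally open only while a state transfer is running, and that a closed result cannot distinguish a firewall drop from a service that is simply not listening yet.

```python
import socket
from collections.abc import Iterable

# TCP ports named in the text; the labels are descriptive only.
GALERA_TCP_PORTS = {
    3306: "client traffic",
    4444: "State Snapshot Transfer",
    4567: "group communication",
    4568: "Incremental State Transfer",
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_peer(host: str, ports: Iterable[int] = GALERA_TCP_PORTS,
               timeout: float = 2.0) -> dict[int, bool]:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For example, check_peer("192.168.10.10") returns a dict keyed by port number, which an automation job can log or compare against the expected set for the current phase.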
Deterministic wsrep.cnf Configuration
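Galera settings belong in a dedicated file under /etc/mysql/mariadb.conf.d/ that explicitly defines the cluster address, node address, and state transfer method. Files in that directory are read in alphabetical order, so a 99- prefix loads after the packaged 60-galera.cnf and overrides it. Use wsrep_sst_method=mariabackup rather than xtrabackup-v2: Percona XtraBackup does not support current MariaDB releases, and MariaDB 10.6 (the version in Ubuntu 22.04's repositories) provides its own backup utility, packaged separately as mariadb-backup.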
Create /etc/mysql/mariadb.conf.d/99-galera.cnf:
[mysqld]
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://"
wsrep_node_address="192.168.10.10"
wsrep_node_name="galera-node-01"
wsrep_sst_method=mariabackup
wsrep_sst_auth="sstuser:StrongSSTPassword!"
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
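Note the empty gcomm:// address. An empty address lets the node start without contacting any peers, which is appropriate only on the seed node during bootstrap. Once the other nodes have joined, replace it with the full member list (for example gcomm://192.168.10.10,192.168.10.11,192.168.10.12, where the second and third addresses are placeholders); otherwise a routine restart of this node could form a separate cluster instead of rejoining the existing one. For comprehensive parameter validation, consult Bootstrapping Your First Galera Cluster before proceeding.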
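A quick lint pass over the file can catch a missing or mistyped setting before the first start. The sketch below is one possible approach, assuming the file contains a plain [mysqld] section like the one above; it uses configparser, which does not understand !include directives or inline comments, and the function names are hypothetical.

```python
import configparser

# Values the configuration above sets explicitly; compared case-insensitively.
EXPECTED_VALUES = {
    "wsrep_on": "ON",
    "binlog_format": "ROW",
    "default_storage_engine": "InnoDB",
    "innodb_autoinc_lock_mode": "2",
}
# Settings that must be present and non-empty.
REQUIRED_KEYS = (
    "wsrep_provider",
    "wsrep_cluster_address",
    "wsrep_node_address",
    "wsrep_sst_method",
)


def read_mysqld_section(text: str) -> dict[str, str]:
    """Parse the [mysqld] section into a dict, stripping surrounding quotes."""
    parser = configparser.ConfigParser(
        delimiters=("=",), allow_no_value=True, interpolation=None, strict=False
    )
    parser.read_string(text)
    if not parser.has_section("mysqld"):
        return {}
    return {
        key.replace("-", "_"): (value or "").strip().strip("\"'")
        for key, value in parser.items("mysqld")
    }


def lint_galera_cnf(text: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    settings = read_mysqld_section(text)
    if not settings:
        return ["no settings found in a [mysqld] section"]
    problems = []
    for key in REQUIRED_KEYS:
        if not settings.get(key):
            problems.append(f"{key} is missing or empty")
    for key, expected in EXPECTED_VALUES.items():
        actual = settings.get(key, "")
        if actual.upper() != expected.upper():
            problems.append(f"{key} is {actual!r}, expected {expected!r}")
    address = settings.get("wsrep_cluster_address", "")
    if address and not address.startswith("gcomm://"):
        problems.append("wsrep_cluster_address must start with gcomm://")
    return problems


def cluster_peers(text: str) -> list[str]:
    """List the peers in wsrep_cluster_address; empty for a bare gcomm://."""
    address = read_mysqld_section(text).get("wsrep_cluster_address", "")
    peers = address.removeprefix("gcomm://").split("?")[0]
    return [peer.strip() for peer in peers.split(",") if peer.strip()]
```

An empty result from cluster_peers() is a convenient signal that the file is still in its bootstrap-only form.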
Bootstrap Execution: Systemd & CLI Orchestration
The bootstrap process must be executed on exactly one node — the seed node that forms the initial Primary Component (not to be confused with a donor, which is the role a node takes when serving an SST/IST to a joiner). On Ubuntu 22.04, systemd manages MariaDB via mariadb.service. Direct invocation of mariadbd --wsrep-new-cluster bypasses systemd’s environment variable injection and socket activation, which can cause AppArmor denials or missing MYSQLD_OPTS. Instead, use the official wrapper:
sudo systemctl stop mariadb
sudo galera_new_cluster
This wrapper starts the service with the --wsrep-new-cluster option (it exports _WSREP_NEW_CLUSTER='--wsrep-new-cluster' for the systemd unit), instructing the daemon to form a new Primary Component instead of attempting to join an existing one. The safe_to_bootstrap flag in /var/lib/mysql/grastate.dat is managed by the server itself on clean shutdown — the wrapper does not rewrite it. Monitor the journal for successful PC formation:
sudo journalctl -u mariadb -f --no-pager | grep -E "WSREP|gcomm|Primary"
Expected output sequence:
[Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '192.168.10.10:'
[Note] WSREP: declaring 192.168.10.10 at tcp://192.168.10.10 stable
[Note] WSREP: New cluster view: global state: 1234abcd-0000-0000-0000-000000000000:0, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 3
State Validation & GTID Baseline Verification
Immediately after execution, verify the bootstrap state using:
SHOW GLOBAL STATUS LIKE 'wsrep_%';
The critical indicators are:
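- wsrep_cluster_size: must equal 1
- wsrep_ready: ON
- wsrep_local_state_comment: Synced
- wsrep_cluster_status: Primary
- wsrep_gtid_domain_id and wsrep_gtid_mode: these are system variables rather than status counters, so confirm the GTID baseline separately with SHOW GLOBAL VARIABLES LIKE 'wsrep_gtid%';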
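The four status indicators above lend themselves to a single pass/fail check. As a minimal sketch, assuming the status rows have already been fetched into a dict (for example dict(cur.fetchall()) after the SHOW GLOBAL STATUS query), a hypothetical helper could report every indicator that is off:

```python
# Expected values on a freshly bootstrapped single-node cluster, per the list above.
EXPECTED_STATUS = {
    "wsrep_cluster_size": "1",
    "wsrep_ready": "ON",
    "wsrep_local_state_comment": "Synced",
    "wsrep_cluster_status": "Primary",
}


def bootstrap_problems(status: dict[str, str]) -> list[str]:
    """Compare reported wsrep status values against the expected bootstrap state."""
    problems = []
    for name, expected in EXPECTED_STATUS.items():
        actual = status.get(name)
        if actual is None:
            problems.append(f"{name} not reported")
        elif str(actual) != expected:
            problems.append(f"{name} is {actual}, expected {expected}")
    return problems
```

Returning the full list, rather than stopping at the first failure, makes the journal entry from an automation run more useful when several indicators are wrong at once.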
Cross-verify the filesystem state:
sudo cat /var/lib/mysql/grastate.dat
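While the node is running, expect seqno: -1 and safe_to_bootstrap: 0; Galera keeps the on-disk position marked as unknown until shutdown, so a value of 0 on a live node is normal. The flag is rewritten to 1 only when this node shuts down cleanly as the last member of the cluster. Confirm instead that the uuid in the file matches wsrep_cluster_state_uuid from the status output.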
Automated Recovery & Python Integration Hooks
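Platform teams and DevOps engineers should wrap bootstrap logic in idempotent automation. The following Python 3.10+ snippet sketches a bootstrap validator using MariaDB Connector/Python (the mariadb module) and pathlib. It assumes it runs with enough privilege to read grastate.dat and to authenticate as root over the local unix socket: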
import pathlib
import subprocess
import sys

import mariadb  # MariaDB Connector/Python

GRSTATE = pathlib.Path("/var/lib/mysql/grastate.dat")


def is_safe_to_bootstrap() -> bool:
    """Return True only if grastate.dat exists and marks this node safe to bootstrap."""
    if not GRSTATE.exists():
        return False
    for line in GRSTATE.read_text().splitlines():
        if line.startswith("safe_to_bootstrap:"):
            return line.split(":")[1].strip() == "1"
    return False


def verify_cluster_state() -> bool:
    """Return True if the local node reports wsrep_cluster_status = Primary."""
    conn = None
    cur = None
    try:
        conn = mariadb.connect(user="root", unix_socket="/run/mysqld/mysqld.sock")
        cur = conn.cursor()
        cur.execute("SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'")
        # Rows are (Variable_name, Value) pairs, so dict() maps name to value.
        status = dict(cur.fetchall()).get("wsrep_cluster_status", "")
        return status == "Primary"
    except mariadb.Error as e:
        print(f"DB connection failed: {e}", file=sys.stderr)
        return False
    finally:
        if cur is not None:
            cur.close()
        if conn is not None:
            conn.close()


if __name__ == "__main__":
    if is_safe_to_bootstrap():
        # check=True raises CalledProcessError if the wrapper exits non-zero.
        subprocess.run(["sudo", "galera_new_cluster"], check=True)
        if verify_cluster_state():
            print("Bootstrap successful. Primary Component established.")
        else:
            print("Bootstrap failed. Check journalctl -u mariadb.", file=sys.stderr)
            sys.exit(1)
    else:
        print("Node not safe to bootstrap. Verify grastate.dat or force recovery.", file=sys.stderr)
        sys.exit(2)
Diagnostic Root-Cause Analysis & Production Recovery
Split-Brain & safe_to_bootstrap Corruption
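If more than one node runs galera_new_cluster, each forms its own Primary Component and the data sets diverge; that is the split-brain case. Separately, after a crash or power loss of the whole cluster, every node can be left with safe_to_bootstrap: 0, and Galera refuses to bootstrap from any of them. Recovery requires manual intervention: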
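- Identify the node with the most advanced committed state by comparing seqno on every node (the uuid must be identical across nodes):
sudo grep -E 'uuid|seqno' /var/lib/mysql/grastate.dat
If seqno reads -1 after a crash, recover the real position first with sudo galera_recovery while the service is stopped.
- Force the flag on the authoritative node:
sudo sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
- Re-run sudo galera_new_cluster. Do not force this on multiple nodes; it guarantees data divergence.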
AppArmor & SST Failures
If mariabackup SST fails with WSREP_SST: [ERROR] xtrabackup finished with error: 1, check /var/log/mysql/error.log. Common causes include missing sstuser privileges or incorrect wsrep_sst_auth syntax. Grant exact SST permissions:
CREATE USER IF NOT EXISTS 'sstuser'@'localhost' IDENTIFIED BY 'StrongSSTPassword!';
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
FLUSH PRIVILEGES;
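New grants take effect immediately; the donor needs a restart only if wsrep_sst_auth was changed in its configuration file. Then trigger a fresh SST on each joining node via sudo systemctl start mariadb.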
gcomm Timeout & Network Partitioning
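Ubuntu 22.04’s default net.ipv4.tcp_keepalive_time is 7200 seconds, so a dead but idle TCP connection can linger for hours. Galera’s own failure detection is governed by the evs.* timeouts in wsrep_provider_options (for example evs.keepalive_period and evs.suspect_timeout), but lowering the kernel keepalive values helps client and proxy connections notice a partition sooner: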
sudo sysctl -w net.ipv4.tcp_keepalive_time=30
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=10
sudo sysctl -w net.ipv4.tcp_keepalive_probes=3
Persist in /etc/sysctl.d/99-galera-net.conf. Verify connectivity before bootstrap: nc -zv <peer_ip> 4567.
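To keep the runtime values and the persisted file in step, both can be derived from one table. This is a small sketch under the assumption that the three targets above are the desired values; it reads the live settings from /proc/sys/net/ipv4 and renders the text for the sysctl.d file, leaving the actual write to the operator. The function names are hypothetical.

```python
from pathlib import Path

# Target values from the sysctl commands above.
KEEPALIVE_TARGETS = {
    "tcp_keepalive_time": 30,
    "tcp_keepalive_intvl": 10,
    "tcp_keepalive_probes": 3,
}


def keepalive_drift(proc_root: Path = Path("/proc/sys/net/ipv4")) -> dict[str, tuple[int, int]]:
    """Return {name: (current, target)} for every value that differs from its target."""
    drift = {}
    for name, target in KEEPALIVE_TARGETS.items():
        current = int((proc_root / name).read_text().strip())
        if current != target:
            drift[name] = (current, target)
    return drift


def render_sysctl_conf() -> str:
    """Render the contents for /etc/sysctl.d/99-galera-net.conf."""
    lines = [f"net.ipv4.{name} = {value}" for name, value in KEEPALIVE_TARGETS.items()]
    return "\n".join(lines) + "\n"
```

An empty dict from keepalive_drift() means the running kernel already matches the targets.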
Graceful Rollback Path
If bootstrap fails and data integrity is uncertain, isolate the node, stop the service, and restore from a verified backup before re-attempting initialization. Never run galera_new_cluster on a node with active client connections or pending transactions. For detailed SST recovery and point-in-time alignment, review official MariaDB backup documentation at MariaDB Backup Overview and systemd execution constraints at systemd.exec Resource Limits.