High Availability with HAproxy, Apache2 and MariaDB Galera Cluster on Ubuntu 18.04 LTS & Centos 7
In this post we create a highly available web & database cluster using a combination of:
- MariaDB Galera Cluster of three nodes on Ubuntu 18.04
- HAproxy for load balancing (requests enter at this endpoint) on Centos 7
- Two Apache2 instances for the web servers
This configuration was used in a real-world scenario for the UK television show Dragons' Den, where we kept one of the contestants' websites running smoothly while sustaining over 133,000 visits in the space of 15 minutes. The next contestant wasn't so prepared!
MariaDB Galera Cluster
You need a minimum of three instances/servers to form a Galera Cluster.
If you're using a VPS provider, consider putting your MariaDB instances on their own virtual private network. Locality is also important here to reduce latency: the closer together the instances are, the lower the latency, at the cost of availability.
The latest versions of MariaDB include the needed wsrep API for Galera Cluster, so there's no need to install the wsrep provider separately.
Create three MariaDB instances
If you need a VPS provider, you can deploy this entire config on Vultr, and create three instances from there for your MariaDB Galera Cluster.
Install MariaDB on all your MariaDB Instances
On all three MariaDB instances, install and configure MariaDB.
Update & upgrade your new instance:
sudo apt update; sudo apt upgrade -y
Install MariaDB Server:
sudo apt install -y mariadb-server mariadb-client
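You can confirm the Galera provider library (referenced later in cluster.cnf) was pulled in as a dependency:
dpkg -l | grep -i galera
ls /usr/lib/galera/libgalera_smm.so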
Set MariaDB Cluster client config
Comment out bind-address in /etc/mysql/mariadb.conf.d/50-server.cnf:
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
#bind-address = 127.0.0.1
This is replaced by the bind-address in /etc/mysql/conf.d/cluster.cnf, which is not exposed to the internet (only the load balancer will be). Why? Because our /etc/mysql/conf.d/cluster.cnf config asks MariaDB to bind to an internal network address (by default MariaDB will only listen on localhost).
On each MariaDB instance, enter the following config, changing:
- bind-address
- wsrep_cluster_name (the same on all instances)
- wsrep_cluster_address (comma-separated list of each MariaDB instance in your cluster)
- wsrep_node_address
- wsrep_node_name
vim /etc/mysql/conf.d/cluster.cnf
[mysqld]
query_cache_size=0
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_type=0
bind-address=<this_node_ip>
#Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
#wsrep_provider_options="gcache.size=32G"
#Galera Cluster Configuration
wsrep_cluster_name="our_cluster"
wsrep_cluster_address="gcomm://first_ip,second_ip,third_ip"
#Galera Synchronization Configuration
wsrep_sst_method=rsync
#wsrep_sst_auth=user:pass
#Galera Node Configuration
wsrep_node_address="this_node_ip"
wsrep_node_name="maria1"
Netplan Networking Config for Maria1, 2 and 3
sudo vim /etc/netplan/10-ens3.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    ens7:
      match:
        macaddress: <mac-address>
      mtu: 1450
      dhcp4: no
      addresses: [ip-address/mask]
Apply netplan configuration.
netplan apply
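To confirm the private interface has come up with the expected address (the interface name and the 10.8.96.x address below are illustrative; use your own):
ip addr show ens7
ping -c 3 10.8.96.3   # another MariaDB node on the private network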
Run mysql_secure_installation
mysql_secure_installation
Allow access from apache2 instances
In preparation for Apache, grant the Apache instances access to the database instances.
Be aware that the MariaDB mysql.user table cannot be replicated because of the storage engine it uses (this might make you think that user account creation cannot be replicated, but it can!). At a higher level, DDL statements are replicated across the cluster, so statements like CREATE USER are replicated, whereas statements like INSERT INTO mysql.user are not. For the full story read "User changes not replicating" at: http://galeracluster.com/documentation-webpages/userchanges.html
For example, on your private network:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'10.8.96.%'
IDENTIFIED BY 'your-password' WITH GRANT OPTION;
Remember your individual applications will need their own database users and grants. For example, a user tom with a database called tomdb:
CREATE DATABASE tomdb;
CREATE USER 'tom'@'10.8.96.%' IDENTIFIED BY 'your-password';
GRANT ALL ON tomdb.* TO 'tom'@'10.8.96.%';
The % allows any node on the 10.8.96.0/20 network to connect to the database instances (note this is an application-layer rule; you still need network connectivity to the database instances, and they must be listening on the correct interface via the bind-address setting above).
Verify access
SELECT User, Host FROM mysql.user WHERE Host <> 'localhost';
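Later, once the Apache instances are up, you can also test remote access from one of them with the MySQL client (the 10.8.96.2 address is illustrative; use one of your database instance IPs):
mysql -h 10.8.96.2 -u tom -p tomdb -e 'SELECT 1;'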
Configure Ports
Enable the ufw firewall:
ufw enable
Allow the following ports:
ufw allow 22/tcp
ufw allow 3306/tcp
ufw allow 4444/tcp
ufw allow 4567/tcp
ufw allow 4568/tcp
ufw allow 4567/udp
For a complete list of MariaDB ports see: http://galeracluster.com/documentation-webpages/firewallsettings.html
What are these ports for?
- 3306 For MySQL client connections and State Snapshot Transfer that use the mysqldump method.
- 4567 For Galera Cluster replication traffic, multicast replication uses both UDP transport and TCP on this port.
- 4568 For Incremental State Transfer.
- 4444 For all other State Snapshot Transfer.
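You can double-check the firewall rules took effect with:
sudo ufw status numbered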
Create Swapfile
Memory requirements for Galera Cluster are difficult to predict with any precision. The particular amount of memory it uses can vary significantly, depending upon the load the given node receives. In the event that Galera Cluster attempts to use more memory than the node has available, the mysqld instance crashes.
The way to protect your node from such crashing is to ensure that you have sufficient swap space available on the server, either in the form of a swap partition or swap files. Docs
(Check you have enough space) then:
fallocate -l 10G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
Add the swap file to /etc/fstab to make it permanent:
echo "/swapfile none swap defaults 0 0" >> /etc/fstab
Verify swap is on:
swapon --summary
Start the Galera database cluster
Stop all mysql services on all nodes
sudo systemctl stop mysql
On node1 only
Because Ubuntu 18.04 uses systemd (not Upstart), MariaDB provides the galera_new_cluster wrapper script, stored in /usr/bin/galera_new_cluster by default, to bootstrap the first node.
On your first database instance only, issue:
galera_new_cluster
Verify the first node (currently the primary) is OK:
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
You should see value 1.
Now you can start mariadb on the remaining nodes, and they will attempt to join the cluster.
On each remaining node (2 & 3):
systemctl start mysql
You will see:
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
And then finally:
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
Check your logs if you don't see this. `tail -f /var/log/mysql/error.log`
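Beyond wsrep_cluster_size, a couple of other status variables are worth checking on each node to confirm it has joined and synced:
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_ready';"                # expect ON
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_connected';"            # expect ON
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"  # expect Synced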
Set up apache2 instances
We're going to create two apache instances, and use HAproxy to load balance between them.
Set-up apache instances
Update & upgrade each apache instance:
sudo apt update; sudo apt upgrade -y
Install apache2 and mariadb-client (the client is needed to access the cluster):
sudo apt install -y apache2 mariadb-client
Enable Apache modules:
sudo a2enmod ssl rewrite; sudo systemctl restart apache2
Install PHP if needed:
sudo apt install -y php libapache2-mod-php php-mysql; sudo systemctl restart apache2
Each Apache instance will of course need the same virtual hosts configured. You might use rsync for this, or, even more simply, scp -r /var/www root@ip-of-other-machine:/var/ to copy over your web files.
In addition, you'll need to copy over the virtual host definitions, e.g.
scp -r /etc/apache2/sites-available/ root@ip-of-other-machine:/etc/apache2/sites-available/
Finally, make sure each virtual host you want to load balance is enabled using a2ensite, e.g. sudo a2ensite example.conf
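As a sketch of the rsync approach mentioned above (the 10.8.96.7 address is illustrative; note --delete removes files on the target that no longer exist on the source, so use with care):
rsync -az --delete /var/www/ root@10.8.96.7:/var/www/
rsync -az --delete /etc/apache2/sites-available/ root@10.8.96.7:/etc/apache2/sites-available/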
Setup HAproxy
Centos 7
yum install haproxy -y
Configure the IP address. Your HAproxy instance needs to be able to reach the Apache instances over the private network, so configure another virtual interface:
cp /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE="eth1"
BOOTPROTO="static"
ONBOOT="yes"
TYPE="Ethernet"
NM_CONTROLLED="no"
IPADDR=<internal-net-address>
NETMASK=<netmask>
NOZEROCONF="yes"
ZONE=public
MTU=1450
Open port 80 up on your Centos box:
sudo firewall-cmd --permanent --zone=public --add-service=http
Open up port 443 on your Centos box (for ssl):
sudo firewall-cmd --permanent --zone=public --add-service=https
Reload Centos firewall:
sudo firewall-cmd --reload
Turn on HAproxy logging on Centos 7
https://www.ostechnix.com/setup-centralized-rsyslog-server-centos-7/
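As a minimal sketch (assuming the stock rsyslog on Centos 7 and the local2 facility used in the HAproxy config below), enable UDP log reception on localhost and send local2 to a file:
# /etc/rsyslog.d/haproxy.conf
$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514
local2.*    /var/log/haproxy.log
Then restart rsyslog:
systemctl restart rsyslog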
Configure SSL on HAproxy using letsencrypt
yum -y install yum-utils
# certbot ships in the EPEL repository on Centos 7
yum -y install epel-release
# (on RHEL, enable the extras/optional repos instead:)
# yum-config-manager --enable rhui-REGION-rhel-server-extras rhui-REGION-rhel-server-optional
sudo yum install -y certbot
systemctl stop haproxy
sudo certbot certonly --standalone -d example.co.uk -d www.example.co.uk
systemctl start haproxy
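HAproxy expects the certificate and private key in a single file (the bind line in the example config below points at /etc/haproxy/certs/example.co.uk.pem), so after the first issuance concatenate them before starting HAproxy with the SSL config:
mkdir -p /etc/haproxy/certs
cat /etc/letsencrypt/live/example.co.uk/fullchain.pem /etc/letsencrypt/live/example.co.uk/privkey.pem > /etc/haproxy/certs/example.co.uk.pem
chmod 600 /etc/haproxy/certs/example.co.uk.pem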
Auto renew SSL for HAproxy on Centos 7
This is inspired by this tutorial.
Create renew-cert.sh in the root directory. Add --dry-run to the command to test your script.
vim /root/renew-cert.sh
#!/bin/bash
# Renew the certificate; HAproxy is stopped so certbot can bind to port 80
/usr/bin/certbot renew --pre-hook "systemctl stop haproxy" --post-hook "systemctl start haproxy"
# Rebuild the combined cert + key file HAproxy expects, then reload HAproxy so it picks it up
DOMAIN='example.co.uk' sudo -E bash -c 'cat /etc/letsencrypt/live/$DOMAIN/fullchain.pem /etc/letsencrypt/live/$DOMAIN/privkey.pem > /etc/haproxy/certs/$DOMAIN.pem'
systemctl reload haproxy
Add as a cron entry:
crontab -e
Add the following (certbot renew only replaces certificates that are close to expiry, so running the script daily is safe):
0 3 * * * /root/renew-cert.sh
Wordpress Specific SSL Redirects
Edit your wp-config.php
file to include:
define('FORCE_SSL_ADMIN', true);
define('FORCE_SSL_LOGIN', true);
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) && $_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https')
    $_SERVER['HTTPS'] = 'on';
Example HAproxy config
On Centos the default location of the HAproxy config is /etc/haproxy/haproxy.cfg. An example config:
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
# to have these messages end up in /var/log/haproxy.log you will
# need to:
#
# 1) configure syslog to accept network log events. This is done
# by adding the '-r' option to the SYSLOGD_OPTIONS in
# /etc/sysconfig/syslog
#
# 2) configure local2 events to go to the /var/log/haproxy.log
# file. A line like the following can be added to
# /etc/sysconfig/syslog
#
# local2.* /var/log/haproxy.log
#
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
tune.ssl.default-dh-param 2048
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch #Allow proxy to redistribute persistent session to a working server.
retries 3
timeout http-request 5s
timeout queue 1m
timeout connect 10s
timeout client 1m # recommended client timeout equal to server timeout to avoid complex debug.
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
errorfile 408 /dev/null # https://www.haproxy.com/blog/haproxy-and-http-errors-408-in-chrome/
#---------------------------------------------------------------------
# main frontend which proxies to the backends
#---------------------------------------------------------------------
frontend www-http
# Redirect HTTP to HTTPS
bind *:80
# Adds an HTTP header to the end of the HTTP request
reqadd X-Forwarded-Proto:\ http
# Sets the default backend to use, which is defined below with name 'www-backend'
default_backend www-backend
frontend main
bind *:443 ssl crt /etc/haproxy/certs/example.co.uk.pem
# set X-SSL in case of ssl_fc <- explained below
http-request set-header X-SSL %[ssl_fc]
# set x-forward to https
reqadd X-Forwarded-Proto:\ https
default_backend www-backend
#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend www-backend
# Redirect with code 301 so the browser understands it is a permanent redirect, if the connection is not already SSL (ssl_fc).
# ssl_fc: Returns true when the front connection was made via an SSL/TLS transport
# layer and is locally deciphered. This means it has matched a socket declared
# with a "bind" line having the "ssl" option.
redirect scheme https code 301 if !{ ssl_fc }
balance roundrobin
server apache1 10.8.96.6:80 check maxconn 15
server apache2 10.8.96.7:80 check maxconn 15
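After any change to haproxy.cfg it's worth validating the file before reloading the service:
haproxy -c -f /etc/haproxy/haproxy.cfg   # configuration syntax check only
systemctl reload haproxy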
Performance tune HAproxy with apache2
With one change to our current HAproxy config we can improve the longest request time from 31.71 seconds to 1.645 seconds. How?
Perhaps counterintuitively, by reducing the maximum number of connections (maxconn) per backend server to a lower number, e.g. maxconn 10. Rather than have Apache struggle to manage many concurrent requests (the time it takes Apache to respond is proportional to the number of concurrent requests it must serve), reducing this number to ten ensures that each Apache instance is only ever handling ten requests at a time; additional connections get queued up at HAProxy. Consider how many requests your site needs to load its most complicated page, for example one with many embedded images. This should influence your maxconn number.
The aim is to prevent too many requests from being forwarded to an application server, by adding a limit on simultaneous requests for each server in the backend. Fortunately, HAProxy does not reject requests over the limit, unlike some other load balancers.
For the full story read: https://www.haproxy.com/blog/play_with_maxconn_avoid_server_slowness_or_crash/
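Concretely, the only difference between the two benchmark runs below is the per-server maxconn in the backend, e.g.:
backend www-backend
    redirect scheme https code 301 if !{ ssl_fc }
    balance roundrobin
    server apache1 10.8.96.6:80 check maxconn 10
    server apache2 10.8.96.7:80 check maxconn 10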
Here's example ApacheBench output after adjusting just the maxconn value for each apache instance in the HAproxy config from 30 to 10.
With maxconn 30:
ab -i -n 100 -c 10 https://example.co.uk/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking example.co.uk (be patient).....done
...
Connection Times (ms)
min mean[+/-sd] median max
Connect: 103 862 1744.3 380 15173
Processing: 247 5405 6890.9 1267 30117
Waiting: 246 5350 6881.1 1266 30117
Total: 349 6267 7147.7 2462 31710
Percentage of the requests served within a certain time (ms)
50% 2462
66% 5759
75% 11289
80% 13220
90% 17158
95% 21414
98% 23157
99% 31710
100% 31710 (longest request)
With maxconn 10:
Notice the ApacheBench call is exactly the same; we've simply improved our HAproxy config with performance tuning specific to our two apache instances:
ab -i -n 100 -c 10 https://example.co.uk/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking example.co.uk (be patient).....done
...
Connection Times (ms)
min mean[+/-sd] median max
Connect: 97 128 32.6 122 332
Processing: 276 903 338.9 1024 1517
Waiting: 268 902 339.1 1024 1516
Total: 384 1031 341.4 1134 1645
Percentage of the requests served within a certain time (ms)
50% 1134
66% 1267
75% 1308
80% 1356
90% 1437
95% 1581
98% 1629
99% 1645
100% 1645 (longest request)
Other steps we took
- Informed Vultr of the upcoming traffic spike, which they appreciated
- Not letting them know may have caused automated suspension of the service, as it may have otherwise looked like an attack on their infrastructure