Graylog is one of the best OSS projects I've ever come across. For a simple setup (all nodes on the same machine) there's a pretty good guide in the Graylog documentation, but I've been looking for a guide on how to set it up as a proper HA cluster, with every part highly available. That's what this blog post is about. I will be setting up the following:
- 3x ElasticSearch in a sharding cluster
- 3x MongoDB in a replicating cluster
- 2x Graylog server nodes
- 2x HAProxy and keepalived nodes for TCP (HAProxy) and UDP (keepalived) load balancing
- 1x Graylog Web interface
This guide focuses on CentOS 6.x and below, and that really only matters for the firewall configuration. Everything else should be more or less universal.
ElasticSearch
To start with, install ElasticSearch of the same version on all three nodes. In my cluster I'm using version 1.4.4.
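How you install it depends on your environment; a minimal sketch, assuming you have downloaded the 1.4.4 RPM from the ElasticSearch download page (the file name is an example):
yum localinstall elasticsearch-1.4.4.noarch.rpm
chkconfig elasticsearch on   # start at boot on CentOS 6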
Now to the configuration. Change the following lines on all nodes in elasticsearch.yml:
cluster.name: graylog2
node.name: "[name of node, example elasticsearch01]"
If needed, change the number of shards and replicas. I'm using 5 and 1 (the defaults):
index.number_of_shards: 5
index.number_of_replicas: 1
I use unicast ping; to do this, disable multicast ping and then specify the cluster nodes:
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["IP_of_node1:9300", "IP_of_node2:9300", "IP_of_node3:9300"]
ElasticSearch firewall configuration
Now, for ElasticSearch to be able to communicate, the firewall has to allow this:
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9200 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9300 -j ACCEPT
If you want, you can restrict these rules to the source IPs of the other cluster nodes, as sketched below. When this is done, start ElasticSearch on all nodes.
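A source-restricted rule could look something like this (10.0.0.12 is a hypothetical address of another cluster node):
-A INPUT -s 10.0.0.12/32 -m state --state NEW -m tcp -p tcp --dport 9300 -j ACCEPT
To check cluster status, use this command: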
[root@es02 ~]# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "graylog2",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 5,
"number_of_data_nodes" : 3,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}
Here I have 5 nodes (two of which are Graylog nodes, covered later) and 3 data nodes, the ones we just set up. The number of shards corresponds to what was configured earlier: 5 primary shards and 5 replica shards, 10 in total.
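If you want to see how the shards are spread across the nodes, the _cat API can help (available in this version of ElasticSearch as far as I recall):
curl -XGET 'http://localhost:9200/_cat/shards?v'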
MongoDB
Start by installing MongoDB on all three nodes. MongoDB will be configured as a three-node replica set. Let's have a look at the config changes in mongod.conf:
replSet = graylog
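Depending on your packaged defaults, you may also need to make sure mongod listens on the interface the other nodes will use; I'm assuming here that the default config binds to localhost only:
bind_ip = [IP of this node]  # or comment the line out to listen on all interfaces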
MongoDB firewall config
The firewall has to allow port 27017.
-A INPUT -m state --state NEW -m tcp -p tcp --dport 27017 -j ACCEPT
Initiating the cluster and adding members
Issue the following command on the first node:
mongo
Now initiate the replica set:
rs.initiate()
Add the new members:
rs.add("mongodb02.example.com")
rs.add("mongodb03.example.com")
Now, use the rs.status() command to make sure that everything looks alright.
graylog:PRIMARY> rs.status()
{
"set" : "graylog",
"date" : ISODate("2015-06-01T11:07:08Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "mongodb01.example.com:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 165272,
"optime" : Timestamp(1433156828, 1),
"optimeDate" : ISODate("2015-06-01T11:07:08Z"),
"electionTime" : Timestamp(1432991818, 1),
"electionDate" : ISODate("2015-05-30T13:16:58Z"),
"self" : true
},
{
"_id" : 1,
"name" : "mongodb02.example.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 165018,
"optime" : Timestamp(1433156826, 2),
"optimeDate" : ISODate("2015-06-01T11:07:06Z"),
"lastHeartbeat" : ISODate("2015-06-01T11:07:06Z"),
"lastHeartbeatRecv" : ISODate("2015-06-01T11:07:07Z"),
"pingMs" : 0,
"syncingTo" : "mongodb01.example.com:27017"
},
{
"_id" : 2,
"name" : "mongodb03.example.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 165172,
"optime" : Timestamp(1433156826, 2),
"optimeDate" : ISODate("2015-06-01T11:07:06Z"),
"lastHeartbeat" : ISODate("2015-06-01T11:07:06Z"),
"lastHeartbeatRecv" : ISODate("2015-06-01T11:07:06Z"),
"pingMs" : 0,
"syncingTo" : "mongodb01.example.com:27017"
}
],
"ok" : 1
}
MongoDB is now set up with three-way replication.
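If you want to convince yourself that the replication actually works, a quick test is to write a document on the primary and read it back on a secondary. This is just a sketch; the test_ha database and the document are made-up examples:
// On the primary:
use test_ha
db.ping.insert({ ok: 1 })
// On a secondary:
rs.slaveOk()
use test_ha
db.ping.find()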
Graylog
Install graylog-server on your nodes; I'll be using two nodes as graylog-server nodes.
Configure the nodes like this:
password_secret = [something very secret]
root_password_sha2 = [password hash]
root_timezone = CET # Or whatever your time zone is
rest_listen_uri = http://[node IP]:12900/
elasticsearch_shards = 5
elasticsearch_replicas = 1
elasticsearch_discovery_zen_ping_multicast_enabled = false
elasticsearch_discovery_zen_ping_unicast_hosts = [elasticsearch node 01 IP]:9300,[elasticsearch node 02 IP]:9300,[elasticsearch node 03 IP]:9300
mongodb_uri = mongodb://[MongoDB IP 01]:27017,[MongoDB IP 02]:27017,[MongoDB IP 03]:27017/graylog?replicaSet=graylog
Also check the config file for any other options you might need.
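If you're unsure how to generate the secrets, something along these lines should work (assuming pwgen and sha256sum are available on the node):
pwgen -N 1 -s 96                           # random string for password_secret
echo -n yourpassword | sha256sum           # hash for root_password_sha2
Also worth noting: in a multi-node setup only one graylog-server node should act as the master. If your Graylog version has the is_master option in server.conf, set it to true on one node and false on the other.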
Graylog firewall config
Graylog needs some firewall configuration as well.
-A INPUT -m state --state NEW -m tcp -p tcp --dport 12900 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9350 -j ACCEPT
Don't forget to add firewall configuration for your inputs, like 514 TCP/UDP for syslog.
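For a plain syslog input on 514, that could look something like this (adjust to whatever inputs you actually use):
-A INPUT -m state --state NEW -m tcp -p tcp --dport 514 -j ACCEPT
-A INPUT -p udp --dport 514 -j ACCEPT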
HAProxy
HAProxy will take care of the TCP load balancing. I've configured it to listen on port 514 with my two graylog nodes added as backend servers.
global
    log 127.0.0.1 local0
    pidfile /var/run/haproxy.pid
    debug

defaults
    option dontlognull
    option redispatch
    option contstats
    retries 3
    timeout connect 5s
    timeout queue 30s
    timeout tarpit 1m
    backlog 10000
    option tcplog
    log global
    timeout client 300s
    timeout server 300s
    default-server inter 3s rise 2 fall 3

# HTTP server for admin status check
listen stats 0.0.0.0:8000 # Listen on all IPs on port 8000
    mode http
    balance
    timeout client 5000
    timeout connect 4000
    timeout server 30000
    # This is the virtual URL to access the stats page
    stats uri /haproxy_stats
    # Authentication realm. This can be set to anything. Escape space characters with a backslash.
    stats realm HAProxy\ Statistics
    # The user/pass you want to use. Change this password!
    stats auth admin:secret123
    # This allows you to take down and bring up back end servers.
    # This will produce an error on older versions of HAProxy.
    stats admin if TRUE

# Graylog input
listen graylogsys_1 172.16.0.26:514
    mode tcp
    balance roundrobin
    option tcplog
    option tcpka
    option httpchk GET /system/lbstatus
    http-check expect string ALIVE
    server graylog_1 [graylog-server ip 1]:514 check port 12900
    server graylog_2 [graylog-server ip 2]:514 check port 12900
    maxconn 10000
To check whether a backend server is up or down, HAProxy requests [graylog IP]:12900/system/lbstatus and checks if the response says "ALIVE"; if it does, the node is OK, otherwise it is removed and won't get any traffic. I've also changed the name of the input to graylogsys_2 for the second HAProxy instance, so that I know which one is active when I connect to 172.16.0.26:8000/haproxy_stats with my browser.
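You can also hit the load balancer status endpoint yourself to see what HAProxy sees; the expected response is simply the string ALIVE:
curl http://[graylog-server ip 1]:12900/system/lbstatus
ALIVE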
HAProxy firewall config
More firewall rules you might need:
-A INPUT -m state --state NEW -m tcp -p tcp --dport 8000 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 514 -j ACCEPT
HAProxy SELinux config
You need to allow this SELinux boolean:
setsebool -P haproxy_connect_any 1
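You can verify that the boolean stuck with getsebool:
getsebool haproxy_connect_any
haproxy_connect_any --> on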
HAProxy status
Visit http://[HA_IP]:8000/haproxy_stats and log in with admin and your password.
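If you prefer the command line, the same stats can be fetched as CSV using the credentials from the HAProxy config (admin/secret123 in my example):
curl -u admin:secret123 'http://[HA_IP]:8000/haproxy_stats;csv'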
Keepalived
Keepalived will be used for moving the cluster IP between the two load balancing servers (172.16.0.26 in my case, as you've seen in the HAProxy config). It will also be responsible for UDP load balancing with a Linux Virtual Server.
global_defs {
    lvs_id LoadBalancer01   # Load balancer name
}

vrrp_script check_haproxy {
    script "/usr/bin/killall -0 haproxy"   # make sure haproxy is running
    interval 2                             # check every 2 seconds
    weight 2                               # add weight if OK
}

vrrp_instance FloatIP01 {
    state MASTER
    interface eth0          # Change this if needed
    virtual_router_id 10
    priority 101            # Change this to 100 on node 2
    advert_int 2
    virtual_ipaddress {
        172.16.0.26         # Floating IP
    }
    track_script {
        check_haproxy
    }
}

virtual_server 172.16.0.26 514 {
    delay_loop 6
    lb_algo rr
    protocol UDP
    real_server [Graylog node 1] 514 {
        weight 100
    }
    real_server [Graylog node 2] 514 {
        weight 100
    }
}
Keepalived firewall config
Keepalived uses VRRP, which needs to be able to talk to the other nodes in the HA cluster via a multicast address and the VRRP protocol.
-A INPUT -d 224.0.0.0/8 -j ACCEPT
-A INPUT -p vrrp -j ACCEPT
Keepalived status
Use ipvsadm to check the status of the keepalived UDP load balancer.
[root@lb01 keepalived]# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
UDP 172.16.0.26:syslog rr
-> graylog01.nickebo.net:514 Local 100 0 1
-> graylog02.nickebo.net:514 Masq 100 0 2
If you shut down keepalived, the IP should hopefully move to the other node and a virtual server for UDP should be started there. The same thing happens if you shut down HAProxy, since we don't want a node to hold the cluster IP if HAProxy has crashed.
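A simple way to test the failover is to stop keepalived on the active node and check that the floating IP shows up on the other one (eth0 is whatever interface you configured above):
# On the active load balancer:
service keepalived stop
# On the other load balancer:
ip addr show eth0 | grep 172.16.0.26
ipvsadm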
Graylog web interface
The Graylog web interface doesn't need very much configuration. I've changed these lines:
graylog2-server.uris="http://[graylog node 01]:12900/,http://[graylog node 02]:12900/"
application.secret="your_secret"
timezone="Europe/Stockholm" # Change to where you are!
Graylog web interface firewall config
Another rule:
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9000 -j ACCEPT
All done
That should be it. I'm sure I've missed something here, please give me a shout if you feel there's something I should add.