With Elasticsearch set up, we can now install and configure Logstash. As Logstash will be performing the log collection role, it needs to be installed on the same server as NGINX. For larger installations, you could use Filebeat (also part of the Elastic family) to act as a lightweight log forwarder and have Logstash on its own instance to parse the logs.
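If you do go down the Filebeat route, the forwarding side only takes a few lines of configuration. Here's a minimal sketch for a Filebeat 5.x filebeat.yml; the log path matches our NGINX setup, while the Logstash hostname and port are placeholders you'd swap for your own:
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/nginx/*-access.log
output.logstash:
  hosts: ["logstash.example.com:5044"]
On the Logstash side, you'd then swap the file input used later in this recipe for the beats input listening on the same port.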
As with Elasticsearch, Logstash requires Java 8 or higher. Since this demo recipe also uses Ubuntu 16.04, we can install it via the following:
apt install openjdk-8-jre
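To confirm the correct Java version is on the path before proceeding, you can run:
java -version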
Next, we download the latest copy of Logstash and install it:
wget https://download.elastic.co/logstash/logstash/packages/debian/logstash-5.0.0-alpha4.deb
dpkg -i logstash-5.0.0-alpha4.deb
Once the installation is complete, we'll need to create the configuration file. This file defines where to look for the logs, how to parse them, and where to send them. While the complexity can be a bit daunting at first, it's mostly a set-and-forget scenario once it's working. Here's a configuration that will work for most of the setups already outlined in this book:
input {
    file {
        type => "nginx_access"
        path => ["/var/log/nginx/*-access.log"]
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}
output {
    elasticsearch {
        hosts => ["192.168.50.5:9200"]
    }
    stdout { codec => rubydebug }
}
For most installations, this needs to be placed in the /etc/logstash/conf.d/ directory with a filename ending in .conf (for example, nginx.conf).
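Before starting the service, it's worth confirming that the file parses cleanly. With the Debian package, the Logstash binary typically lives under /usr/share/logstash, so a configuration test looks something like the following (flag names have shifted between Logstash releases, so check --help if this is rejected):
/usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/nginx.conf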
In the input section, we use the file plugin to monitor all logs matching the naming pattern *-access.log. The type definition simply allows for easy filtering if your Elasticsearch server receives logs from more than one source.
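For example, if the same Logstash instance were also shipping logs with a different type, you could wrap the grok filter in a conditional so it only touches the NGINX events. This is purely an illustrative sketch of the conditional syntax:
filter {
    if [type] == "nginx_access" {
        grok {
            match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
    }
}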
Then, in the filter section, we use the grok plugin to turn the plain text data into a structured format. If you haven't used grok patterns before, they're built on top of regular expressions and can look quite complex to start with. Because we have NGINX using the combined log format (which is already defined as a grok pattern), the hard work has been done for us.
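To give a feel for what grok is doing under the hood, here's a rough sketch of how you might match a simpler, made-up log line by combining built-in patterns yourself; the field names (clientip, verb, and so on) are just labels we choose:
filter {
    grok {
        # Would match a line such as: 192.168.50.1 GET /index.html 200
        match => { "message" => "%{IPORHOST:clientip} %{WORD:verb} %{URIPATHPARAM:request} %{NUMBER:response}" }
    }
}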
Lastly, the output section defines where we're sending the data. Logstash can send to multiple destinations (about 30 different output plugins), but in this instance, we're simply sending it to our Elasticsearch server. The hosts setting can take an array of Elasticsearch servers, so that in a larger scenario you can load balance the push of the data.
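As a sketch, a multi-node output would look like the following; the hostnames are placeholders for your own Elasticsearch nodes:
output {
    elasticsearch {
        hosts => ["es-node1.example.com:9200", "es-node2.example.com:9200", "es-node3.example.com:9200"]
    }
}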
To start Logstash, we can simply call the standard init system, which for our example is systemd:
systemctl start logstash
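You can also enable it at boot and confirm that it's running:
systemctl enable logstash
systemctl status logstash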
If you now load a page on any of the monitored sites, you should have data within Elasticsearch. We can run a simple test by querying the logstash index and returning a single record. To do this from the command line, run the following:
curl http://192.168.50.5:9200/logstash-*/_search -d '{ "size": 1 }' | python -m json.tool
I pipe the cURL command through Python to quickly format the output; by default, Elasticsearch returns it in a compact format without whitespace. While that cuts down on the packet size, it also makes it harder to read. Here's what the output should look like:
While the exact syntax won't make a lot of sense yet, the important bit to note is that our single log line has been parsed into separate fields. The power of this will become evident once we've completed the installation of Kibana.
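As a quick sanity check before moving on, you can also ask Elasticsearch how many events have been indexed so far; the index pattern matches the default logstash-* naming used in the query above:
curl http://192.168.50.5:9200/logstash-*/_count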