Grok is a filter plugin available in Logstash; it takes unstructured data from sources such as system logs, MySQL, Apache, and other web server logs and transforms it into structured, queryable data for easy ingestion into Elasticsearch.
Grok works by combining text patterns into an expression that matches your logs; each pattern matches a piece of text such as a number or an IP address. The syntax for a pattern is as follows:
%{SYNTAX:SEMANTIC}
Here, SYNTAX is the name of the pattern that matches the text, and SEMANTIC is the identifier you give to the matched segment of text.
An example of an HTTP log event would be as follows:
55.3.244.1 GET /index.html 15824 0.043
One pattern match for this could be the following:
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
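To see how this pattern pulls fields out of the sample line, here is a minimal Python sketch that expands each %{SYNTAX:SEMANTIC} into a named regex capture group. The GROK_PATTERNS dictionary is a simplified, hypothetical stand-in for the much larger pattern library that ships with Logstash.

```python
import re

# Hypothetical, simplified stand-ins for the real grok pattern library;
# the definitions bundled with Logstash are far more thorough.
GROK_PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"\S+",
    "NUMBER": r"\d+(?:\.\d+)?",
}

def grok_to_regex(expr: str) -> str:
    """Expand each %{SYNTAX:SEMANTIC} into a named capture group."""
    return re.sub(
        r"%\{(\w+):(\w+)\}",
        lambda m: f"(?P<{m.group(2)}>{GROK_PATTERNS[m.group(1)]})",
        expr,
    )

pattern = grok_to_regex(
    "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} "
    "%{NUMBER:bytes} %{NUMBER:duration}"
)
event = re.match(pattern, "55.3.244.1 GET /index.html 15824 0.043").groupdict()
# event now maps each SEMANTIC name to the matched text, e.g.
# event["client"] == "55.3.244.1"
```

This mirrors what grok does internally: the SYNTAX half selects a predefined regex, and the SEMANTIC half becomes the field name on the resulting event.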
So, putting it all together in an actual pipeline configuration looks like this:
input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
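To complete the picture of shipping the structured events to Elasticsearch, a pipeline like this would typically end with an output stage. A minimal sketch, where the host address and index name are placeholders you would adjust for your own deployment:

```conf
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "http-logs"
  }
}
```

With this in place, each parsed line arrives in Elasticsearch as a document carrying the client, method, request, bytes, and duration fields alongside the original message.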