Selecting entries by date

Having seen how we can display the date, we should look at how to print entries from just one day. To do this, we can use the match operator in awk, which is written as a tilde (~). As we only need the date element, we can match against field 4 alone; there is no need to use both the date and time zone fields. The following command shows how to print entries from September 10, 2014:

$ awk ' ( $4 ~ /10\/Sep\/2014/ ) ' access.log
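To see why matching on field 4 alone is enough, the following is a minimal sketch run against three made-up log lines rather than the demonstration file. The sample data and the variable names sample and matched are assumptions for illustration only; the lines are assumed to follow the usual Apache log layout, where field 4 starts with an opening bracket followed by the date.

```shell
# Hypothetical sample lines; these are not taken from the demonstration access.log
sample='192.168.0.1 - - [10/Sep/2014:00:00:01 +0100] "GET / HTTP/1.1" 200 512
192.168.0.2 - - [11/Sep/2014:09:15:42 +0100] "GET /about HTTP/1.1" 200 1024
192.168.0.3 - - [10/Sep/2014:23:59:59 +0100] "GET /contact HTTP/1.1" 404 128'

# Field 4 looks like "[10/Sep/2014:00:00:01"; the regular expression only has
# to match somewhere inside it, so the bracket and the time can be ignored.
matched=$(printf '%s\n' "$sample" | awk ' ( $4 ~ /10\/Sep\/2014/ ) ')

# Show the lines that were selected
printf '%s\n' "$matched"
```

Only the first and third sample lines are selected, because the tilde tests whether the pattern occurs anywhere within the field rather than requiring the whole field to be equal to it.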

For completeness, this command and its partial output are shown in the following screenshot:

The round brackets, or parentheses, enclose the condition that selects the lines we are looking for. We have omitted the main block, so awk falls back to its default action and prints each matching line in full. There is nothing stopping us from further filtering on the fields to print from the matching lines. For example, if we want to print out just the client IP address that is being used to access the web server, we can print field 1. This is shown in the following command example:

$ awk ' ( $4 ~ /10\/Sep\/2014/ ) { print $1 } ' access.log

If we want to be able to print the total number of accesses on a given date, we could pipe the entries through to the wc command. This is demonstrated in the following:

$ awk ' ( $4 ~ /10\/Sep\/2014/ ) { print $1 } ' access.log | wc -l

However, if we let awk do the counting for us, we avoid starting a new process, which is more efficient. The built-in variable NR is of no help here, because it counts every line read from the file, not just those that match the date. It is best to increment our own variable in the main block, which runs only for the matching lines. The END block can then be used to print the count variable. The following command line acts as an example:

$ awk ' ( $4 ~ /10\/Sep\/2014/ ) { print $1; COUNT++ }  END { print COUNT }' access.log
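The difference between NR and our own counter can be checked with a short sketch. The three sample lines and the shell variable names total, matches, and none are assumptions made for this illustration; they are not part of the demonstration file. The sketch also prints COUNT + 0 rather than COUNT, a small variation on the command above, so that a date with no matches prints 0 instead of an empty line.

```shell
# Hypothetical sample lines; these are not taken from the demonstration access.log
sample='192.168.0.1 - - [10/Sep/2014:00:00:01 +0100] "GET / HTTP/1.1" 200 512
192.168.0.2 - - [11/Sep/2014:09:15:42 +0100] "GET /about HTTP/1.1" 200 1024
192.168.0.3 - - [10/Sep/2014:23:59:59 +0100] "GET /contact HTTP/1.1" 404 128'

# NR in the END block is the total number of lines read, matching or not
total=$(printf '%s\n' "$sample" | awk ' END { print NR } ')

# Our own counter is incremented only for lines that match the date
matches=$(printf '%s\n' "$sample" |
  awk ' ( $4 ~ /10\/Sep\/2014/ ) { COUNT++ } END { print COUNT + 0 } ')

# A date that never occurs: COUNT is never set, so COUNT + 0 forces a numeric 0
none=$(printf '%s\n' "$sample" |
  awk ' ( $4 ~ /12\/Sep\/2014/ ) { COUNT++ } END { print COUNT + 0 } ')

echo "total=$total matches=$matches none=$none"
```

With this sample, NR reports all three lines while the counter reports only the two from September 10, which is why we keep a counter of our own.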

Both wc and the internal counter give us 16205 as the result from the demonstration file. If we want the count and nothing else, we keep only the variable increment within the main block:

$ awk ' ( $4 ~ /10\/Sep\/2014/ ) { COUNT++ }  END { print COUNT }' access.log

We can see this in the following output: