Searching your site’s access logs can be difficult! There’s a lot of information to take in visually, and viewing entire log entries can be overwhelming. In this article we’ll discuss ways to parse your access logs to better understand trends on your site!
Requirements
Before we get started, let’s talk prerequisites. To use the bash commands we’ll discuss, you’ll need:
- Apache and/or Nginx
- SSH access to your server
Using grep, cut, sort, uniq, and awk
Before we dive into the logs themselves, let’s talk about how we’ll be parsing them. First, you’ll want to understand the bash commands we use to collect specific pieces of information from the logs.
cat
Used to “concatenate” two or more files together. When searching access logs, we’d typically use it to print the contents of two files together so we can search for information in both, for example, today’s and yesterday’s logs.
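For example, to print today’s and yesterday’s logs together (a minimal sketch, assuming they’re named access.log and access.log.1):
cat access.log access.log.1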
zcat
Works the same as cat, but concatenates and prints out results of gzip (.gz) compressed files specifically.
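For example, to print an older, rotated log that has been gzipped (assuming a file named access.log.2.gz):
zcat access.log.2.gz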
grep
Used to search for a specific string or pattern in a file.
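For example, to find every line mentioning a given string (wp-login.php here is just a hypothetical search term):
grep "wp-login.php" access.log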
ack-grep
Used to find strings in one file or several files. The ack-grep command is more efficient than grep when looking for results in a directory or across multiple files. And, unlike a standard grep, ack-grep prints the line number on which the text string was found. It also makes it easy to search only files of a specific kind, PHP files for example.
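For example, to search only PHP files under the current directory (a sketch, assuming your distribution installs the command as ack-grep rather than ack, and using get_header as a hypothetical search string):
ack-grep --php "get_header"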
cut
Used to cut out specific pieces of information, useful for sorting through grep results. With the -d flag you set a “delimiter” (the character that divides fields), and with the -f flag you choose which field(s), as split by that delimiter, to print.
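For example, to print just the first space-separated field of each line (the client IP in the Apache log format shown later in this article):
cut -d' ' -f1 access.log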
sort
Sorts output lines in ascending order, useful for arranging grep and cut results. You can use sort -rn to sort numerically from highest to lowest instead.
uniq
Filters out repeated lines, most often used as uniq -c to get a count of occurrences per unique entry. Note that uniq only collapses adjacent duplicate lines, which is why it’s almost always run on sorted output.
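Putting sort and uniq together, for example, to count how many times each IP appears in a log and list the busiest first (assuming the IP is the first space-separated field):
cut -d' ' -f1 access.log | sort | uniq -c | sort -rn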
awk
A text-processing language for Unix. Here we mostly use it to split lines into fields with awk -F (-F sets the “field separator”) and print or filter on those fields.
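For example, to print the 3rd pipe-separated field of each line (a sketch; adjust the delimiter and field number to your own log format):
awk -F'|' '{ print $3 }' access.log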
head
Prints the first 10 lines of the specified file(s) by default. You can adjust the count with the -n flag, for example head -n 20.
tail
Prints the last 10 lines of the specified file(s) by default. As with head, you can adjust the count with the -n flag. You can also live-monitor access logs using tail -f logname.log.
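For example, to print the first 20 lines of a log, the last 20, or to watch new entries as they arrive:
head -n 20 access.log
tail -n 20 access.log
tail -f access.log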
find
Used to find files by a particular name, date, or extension type. You can use -type d to match directories, or -type f to match files.
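For example, to list all gzipped, rotated logs under /var/log:
find /var/log -type f -name "*.gz"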
Apache access logs
So how do we put all the information above together? Let’s start by looking at the Apache access logs. Apache is the most commonly used web server, so it’s a good place to start.
Before you get started, be sure to locate your access logs. Depending on the version of Linux you’re running, they could be in a slightly different location. My server runs Ubuntu, so my logs are found in the /var/log/apache2/ directory. Here’s a sample of my Apache access logs for a reference point:
104.54.207.40 techgirlkb.guru - [18/Aug/2017:21:52:08 +0000] "GET /wp-content/uploads/2017/08/lynx_server_status_2.png HTTP/1.1" 200 223989 "http://techgirlkb.guru/2017/08/troubleshooting-high-server-load/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
104.54.207.40 techgirlkb.guru - [18/Aug/2017:21:52:08 +0000] "GET /wp-content/uploads/2017/08/sar-q-output-768x643.png HTTP/1.1" 200 357286 "http://techgirlkb.guru/2017/08/troubleshooting-high-server-load/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
104.54.207.40 techgirlkb.guru - [18/Aug/2017:21:52:08 +0000] "GET /wp-content/cache/autoptimize/js/autoptimize_a60c72b796d777751fdd13d6f0375f9c.js HTTP/1.1" 200 49995 "http://techgirlkb.guru/2017/08/troubleshooting-high-server-load/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
104.54.207.40 techgirlkb.guru - [18/Aug/2017:21:52:09 +0000] "GET /wp-content/uploads/2017/08/htop_mysql_usage.png HTTP/1.1" 200 456536 "http://techgirlkb.guru/2017/08/troubleshooting-high-server-load/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
104.54.207.40 techgirlkb.guru - [18/Aug/2017:21:52:09 +0000] "GET /wp-content/uploads/2017/08/mytop.jpg HTTP/1.1" 200 417613 "http://techgirlkb.guru/2017/08/troubleshooting-high-server-load/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
Totals of HTTP response codes
cut -d' ' -f9 techgirlkb.access.log | sort | uniq -c | sort -rn
This command says: using a space (' ') as the delimiter, print the 9th field from techgirlkb.access.log, which is the HTTP response code. Then sort arranges the results so duplicates sit next to each other, uniq -c collapses each response code into a single entry with a count, and sort -rn orders the counts from highest to lowest. You should get an output like this:
1663 200
78 499
58 302
33 404
29 301
18 444
12 206
3 304
2 403
1 416
Hits to Apache per hour
awk -F'[' '{print $2 }' techgirlkb.access.log | awk -F: '{ print "Time: " $1,$2 ":00" }' | sort | uniq -c | more
This command says: using '[' as the delimiter, print the 2nd field from the Apache access log. Then, with that output, using ':' as the delimiter, print the word "Time:" followed by the 1st and 2nd fields, followed by ":00". This groups the requests from the same hour together so you can get a count. You should get an output that looks like this:
293 Time: 18/Aug/2017 00:00
78 Time: 18/Aug/2017 01:00
188 Time: 18/Aug/2017 02:00
79 Time: 18/Aug/2017 03:00
27 Time: 18/Aug/2017 04:00
14 Time: 18/Aug/2017 05:00
40 Time: 18/Aug/2017 06:00
4 Time: 18/Aug/2017 07:00
74 Time: 18/Aug/2017 08:00
Top requests to Apache for today and yesterday
cat techgirlkb.access.log techgirlkb.access.log.1 | cut -d' ' -f7 | sort | uniq -c | sort -rn | head -20
This command says to concatenate today’s and yesterday’s access logs. Then, using space as the delimiter, print the 7th field (the request path). With that output, sort from lowest to highest, group like entries together with a count, sort by the count, and print the top 20 results. Your output should look something like this:
202 /wp-admin/admin-ajax.php
59 /feed/
49 /2017/08/troubleshooting-high-server-load/
37 /2017/08/the-anatomy-of-a-ddos/
33 /
Top IPs to hit the site today and yesterday
cat techgirlkb.access.log techgirlkb.access.log.1 | cut -d' ' -f1 | sort | uniq -c | sort -rn | head -25
This command says to concatenate the logs for today and yesterday. Then, with the output, using space as the delimiter, print the 1st field (the client IP). Sort and group your results together to get the list of the top 25 IPs. Here’s an example output:
343 104.54.207.40
341 104.196.38.166
114 66.162.212.19
75 38.140.212.19
56 173.212.242.97
46 5.9.106.230
45 173.212.203.245
Top User-Agents to hit the site today and yesterday
cat techgirlkb.access.log techgirlkb.access.log.1 | cut -d'"' -f6 | sort | uniq -c | sort -rn | head -20
This command says to concatenate today’s and yesterday’s logs. Then, with the output, using double quotes (") as the delimiter, print the 6th field. Sort and combine the results, printing the top 20 unique User-Agent strings.
628 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
339 WordPress/4.8.1; http://techgirlkb.guru
230 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36
230 Mozilla/5.0 (compatible; MJ12bot/v1.4.7; http://mj12bot.com/)
209 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Top HTTP referrers to the site today
cut -d'"' -f4 techgirlkb.access.log | sort | uniq -c | sort -rn | head -20
This command says to print the 4th field, using double quotes (") as the delimiter, then sort, combine into unique entries with a count, and print the top 20 results. Here’s what your output should look like:
714 -
278 http://techgirlkb.guru/2017/08/the-anatomy-of-a-ddos/
151 http://techgirlkb.guru/wp-admin/post-new.php
100 http://techgirlkb.guru/wp-admin/options-general.php?page=NextScripts_SNAP.php
87 http://techgirlkb.guru/wp-admin/
Nginx access logs
If you use Nginx, you’ll need to adjust your commands to match its log format. Here’s what my Nginx logs look like, for reference:
18/Aug/2017:22:31:41 +0000|v1|104.196.38.166|techgirlkb.guru|499|0|127.0.0.1:6789|-|0.996|POST /wp-cron.php?doing_wp_cron=1503095500.1473810672760009765625 HTTP/1.1
18/Aug/2017:22:31:42 +0000|v1|104.54.207.40|techgirlkb.guru|302|4|127.0.0.1:6788|2.941|3.109|POST /wp-admin/post.php HTTP/1.1
18/Aug/2017:22:31:43 +0000|v1|104.54.207.40|techgirlkb.guru|200|60419|127.0.0.1:6788|0.870|0.870|GET /wp-admin/post.php?post=182&action=edit&message=10 HTTP/1.1
18/Aug/2017:22:31:44 +0000|v1|104.54.207.40|techgirlkb.guru|200|1321|-|-|0.000|GET /wp-content/plugins/autoptimize/classes/static/toolbar.css?ver=1503095503 HTTP/1.1
18/Aug/2017:22:31:44 +0000|v1|104.54.207.40|techgirlkb.guru|404|564|-|-|0.000|GET /wp-content/plugins/above-the-fold-optimization/admin/css/admincp-global.min.css?ver=2.7.12 HTTP/1.1
Notice that in these access logs, most fields are separated by a pipe (|). We also have some additional information in these logs, like how long the request took and what port it was routed to. We can use this information to look at even more contextual data. Usually your Nginx access logs can be found in /var/log/nginx/.
Totals of HTTP response codes today
cut -d'|' -f5 techgirlkb.access.log | sort | uniq -c | sort -rn
This command says to print the 5th field, using a pipe (|) as the delimiter; in this format, that field holds the response code. Then sort and combine the results into unique entries with a count, and show the results sorted from highest to lowest frequency. Your output should look something like this:
1751 200
113 499
59 302
34 404
29 301
18 444
12 206
3 304
2 403
1 416
Top requests to Nginx today
cut -d' ' -f3 techgirlkb.access.log | sort | uniq -c | sort -rn | head -20
This command says: using space as the delimiter, print the 3rd field (the request path). Then sort and combine the results, and print the top 20. Your output should look like this:
142 /wp-admin/admin-ajax.php
79 /wp-content/uploads/2017/08/staircase-600468_1280.jpg
54 /2017/08/the-anatomy-of-a-ddos/
53 /
41 /?url=https%3A//www.google-analytics.com/analytics.js&type=js&abtf-proxy=fd0703a0b1e757ef151b57e8dec02b32
32 /wp-includes/js/wp-emoji-release.min.js?ver=4.8.1
30 /robots.txt
30 /feed/
Show requests that took over 60 seconds to complete today
awk -F\| '{ if ($9 >= 60) print $0 }' techgirlkb.access.log
This command says: using a pipe (|) as the delimiter, print the entire line only if the 9th field is greater than or equal to 60. In our Nginx logs, the 9th field is the total request time in seconds. Your output should look something like this:
18/Aug/2017:00:30:45 +0000|v1|104.54.207.40|techgirlkb.guru|200|643|127.0.0.1:6788|63.400|67.321|POST /wp-admin/async-upload.php HTTP/1.1
18/Aug/2017:00:30:49 +0000|v1|104.54.207.40|techgirlkb.guru|200|644|127.0.0.1:6788|62.343|63.828|POST /wp-admin/async-upload.php HTTP/1.1
18/Aug/2017:00:30:56 +0000|v1|104.54.207.40|techgirlkb.guru|200|642|127.0.0.1:6788|61.613|64.402|POST /wp-admin/async-upload.php HTTP/1.1
Bash Tips and Tricks
When creating commands for searching your logs, there are a few best practices to keep in mind. We’ll cover some tips that apply to searching access logs below.
cat filename | grep pattern – never cat a single file and pipe it to grep. The cat command is made to concatenate the contents of multiple files; for a single file it just adds an unnecessary extra process. It’s much more efficient to use the format: grep "pattern" filename
expr and seq – don’t use outdated scripting methods like expr and seq. Counting is now built into bash, so seq is no longer needed, and expr is inefficient because it was written for older shells: it starts a separate process that calls other programs to do the requested work. In general, prefer the shell’s modern built-ins when writing scripts, as sketched below.
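A quick sketch of the built-in replacements:
# instead of: for i in $(seq 1 5)
for i in {1..5}; do echo "$i"; done
# instead of: expr 3 + 4
echo $(( 3 + 4 ))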
ls -l | awk '{ print $8 }' – if you’re searching for a filename within a directory, never parse the output of ls! Its output isn’t consistent across platforms, so the same command can print very different results in different environments. And since file names can contain newlines, parsing ls can produce garbled output as well.
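Instead, let the shell expand the file names for you. A minimal sketch, assuming Apache logs under /var/log/apache2/:
# glob expansion handles odd file names safely
for f in /var/log/apache2/*.log; do
    echo "$f"
done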
For more best practices when writing scripts, check out this Bash FAQ.
Happy log crunching! I hope these tips and quick commands help you understand the output of your access logs, and help you build your own scripts to do the same. Have any other quick one-liners you use to search your logs? Let me know in the comments, or contact me.