Simple website visitor stats and location with GoAccess

I do not use cookies or browser trackers (like Google Analytics) on any of my personal websites to track visitor information.

From time to time, however, it is nice to get an idea of the amount of traffic to my sites, including which pages get viewed the most.

I use the following one-liner for this:

sudo zcat -f /var/log/nginx/access.log* | goaccess --log-format=VCOMBINED --geoip-database dbip-city-lite-2022-09.mmdb

Read on below to see how I made this work.

Introduction

I host my websites on a small Hetzner VM and use nginx as the web server and/or as a reverse proxy for PHP (for WordPress sites).

The default server logs for nginx are enough to count basic server visits, and for this purpose, I use GoAccess.

GoAccess usually works out of the box like this:

goaccess /var/log/nginx/access.log

For me, this command is a bit limited though, specifically:

  1. It only includes the latest access log. The nginx logs are rotated (by logrotate on most Linux distributions), and the older ones are stored as gzip’ed files. Ideally, these should be included too.
  2. The default combined log format in nginx does not include the domain (e.g. “davidlebech.com”), so if there are multiple sites on the same server, their views are mixed together.
  3. It is not possible to see where people are from in the world.

I solved all three earlier this year by making some simple changes to the nginx config and the command itself.

Use the virtual host combined log format

First, in the http block of /etc/nginx/nginx.conf, replace the existing access_log line (if there is one) with the following lines:

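# Combined log format, prefixed with the virtual host and port
log_format vcombined '$host:$server_port '
        '$remote_addr - $remote_user [$time_local] '
        '"$request" $status $body_bytes_sent '
        '"$http_referer" "$http_user_agent" "$gzip_ratio"';
# Write the access log using the format defined above
access_log /var/log/nginx/access.log vcombined;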

This tells nginx to use the “combined log format with virtual host” for the access log files. The logs now include the domain, thereby solving issue 2 from above.
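Once nginx has been reloaded with the new configuration, new log lines should start with the host and port. As a quick check, here is a minimal sketch, assuming a POSIX shell with awk; count_by_host is a hypothetical helper, not part of nginx or GoAccess. It counts requests per virtual host by stripping the :port suffix from the first field of each line. Lines written before the change begin with the client IP instead, so they will not be counted meaningfully.

```shell
# Count requests per virtual host in logs written with the "vcombined"
# format, where each line starts with "$host:$server_port".
# count_by_host is a hypothetical helper; it reads log lines on stdin.
count_by_host() {
  awk '{
    host = $1
    sub(/:[0-9]+$/, "", host)   # drop the trailing ":port"
    count[host]++
  }
  END {
    for (h in count) print count[h], h
  }' | sort -rn                 # busiest host first
}
```

For example, on the current log:

sudo cat /var/log/nginx/access.log | count_by_host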

Ensure correct visitor IP addresses behind Cloudflare

To get a location, the logged IP address of each visitor can be looked up in a local database. When Cloudflare is used as a proxy, there is an important extra step: by default, the logged IP addresses are those of Cloudflare's servers, not those of the visitors. The server needs to know where to get the real IP address of the visitor from, which nginx's realip module handles given the list of trusted proxy addresses and the header that carries the real IP.

Here’s the official guide from Cloudflare. Here’s my short version:

  1. Copy the list of Cloudflare IP ranges, both IPv4 and IPv6.
  2. Create a new file, /etc/nginx/cloudflare.conf, and paste all the ranges into it, one per line.
  3. Prefix each line with set_real_ip_from and end it with a semicolon.
  4. Add to the end of the file:
    real_ip_header CF-Connecting-IP;
  5. In the http block of /etc/nginx/nginx.conf, add:
    include /etc/nginx/cloudflare.conf;

This is the entire content of my current /etc/nginx/cloudflare.conf file, which works as of September 2022:

set_real_ip_from 103.21.244.0/22;
set_real_ip_from 103.22.200.0/22;
set_real_ip_from 103.31.4.0/22;
set_real_ip_from 104.16.0.0/13;
set_real_ip_from 104.24.0.0/14;
set_real_ip_from 108.162.192.0/18;
set_real_ip_from 131.0.72.0/22;
set_real_ip_from 141.101.64.0/18;
set_real_ip_from 162.158.0.0/15;
set_real_ip_from 172.64.0.0/13;
set_real_ip_from 173.245.48.0/20;
set_real_ip_from 188.114.96.0/20;
set_real_ip_from 190.93.240.0/20;
set_real_ip_from 197.234.240.0/22;
set_real_ip_from 198.41.128.0/17;
set_real_ip_from 2400:cb00::/32;
set_real_ip_from 2606:4700::/32;
set_real_ip_from 2803:f800::/32;
set_real_ip_from 2405:b500::/32;
set_real_ip_from 2405:8100::/32;
set_real_ip_from 2c0f:f248::/32;
set_real_ip_from 2a06:98c0::/29;

# use one of the following two
real_ip_header CF-Connecting-IP;
# real_ip_header X-Forwarded-For;
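Cloudflare changes its ranges from time to time, so regenerating the file can be easier than editing it by hand. Below is a minimal sketch, assuming a POSIX shell; make_cloudflare_conf is a hypothetical helper name. It reads one CIDR range per line on stdin and prints a file in the same shape as the one above, using the CF-Connecting-IP header. Cloudflare publishes the plain-text lists at https://www.cloudflare.com/ips-v4 and https://www.cloudflare.com/ips-v6; check the official guide in case these have moved.

```shell
# Turn CIDR ranges (one per line on stdin) into nginx set_real_ip_from
# directives, followed by the real_ip_header line.
# make_cloudflare_conf is a hypothetical helper name.
make_cloudflare_conf() {
  # "|| [ -n "$cidr" ]" also handles a last line without a trailing newline
  while IFS= read -r cidr || [ -n "$cidr" ]; do
    cidr=$(printf '%s' "$cidr" | tr -d '[:space:]')  # strip spaces and CRs
    [ -z "$cidr" ] && continue                       # skip blank lines
    printf 'set_real_ip_from %s;\n' "$cidr"
  done
  printf '\nreal_ip_header CF-Connecting-IP;\n'
}
```

A possible way to use it, assuming curl is installed and the function is defined in the current shell:

{ curl -fsS https://www.cloudflare.com/ips-v4; echo; curl -fsS https://www.cloudflare.com/ips-v6; } | make_cloudflare_conf | sudo tee /etc/nginx/cloudflare.conf
sudo nginx -t && sudo systemctl reload nginx

The echo between the two downloads keeps the last IPv4 range and the first IPv6 range on separate lines, and nginx -t checks the configuration before the reload (systemctl assumes a systemd-based system).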

Use geolocation for IP

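To translate an IP address into a rough location, I am currently using db-ip.com’s free geolocation database in MMDB format, which GoAccess can read when it is built with MaxMind DB support (--enable-geoip=mmdb). The database is updated monthly; the September 2022 version, for example, can be fetched like this: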

wget https://download.db-ip.com/free/dbip-city-lite-2022-09.mmdb.gz
gunzip dbip-city-lite-2022-09.mmdb.gz
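Because the file name contains the year and month, the download can be scripted. This sketch assumes the naming pattern stays the same as in the September 2022 URL above; dbip_url is a hypothetical helper, and date +%Y-%m is the standard way to print the current year and month.

```shell
# Build the DB-IP "city lite" download URL for a given year-month (YYYY-MM),
# following the naming pattern of the September 2022 file.
# dbip_url is a hypothetical helper name.
dbip_url() {
  ym=${1:-$(date +%Y-%m)}   # default: current year and month
  printf 'https://download.db-ip.com/free/dbip-city-lite-%s.mmdb.gz\n' "$ym"
}
```

For example:

wget "$(dbip_url)" && gunzip "dbip-city-lite-$(date +%Y-%m).mmdb.gz"

Early in a month, the new file may not be published yet, in which case passing the previous month explicitly (e.g. dbip_url 2022-09) is the fallback. The --geoip-database argument then needs the matching file name.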

With all these pieces in place (real visitor IPs, the virtual host log format, and a local geolocation database), the last 14 days of visitor information, which is everything the rotated logs hold, can be analyzed with the one-liner from the beginning of the post:

sudo zcat -f /var/log/nginx/access.log* | goaccess --log-format=VCOMBINED --geoip-database dbip-city-lite-2022-09.mmdb
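The zcat -f part is what pulls in the rotated logs. With -f, zcat decompresses the .gz files and passes uncompressed files through unchanged, so the current access.log and any rotated file that has not been compressed yet are read too (on Debian and Ubuntu, logrotate's delaycompress option typically leaves access.log.1 uncompressed). The sketch below, assuming GNU gzip and a POSIX shell, demonstrates this on throwaway files in a temporary directory; demo_zcat_f and the file contents are made up for illustration.

```shell
# Show that "zcat -f" reads plain and gzip-compressed files alike,
# mimicking a rotated nginx log directory in a temporary location.
# demo_zcat_f is a hypothetical helper name.
demo_zcat_f() {
  dir=$(mktemp -d)
  printf 'newest line\n' > "$dir/access.log"              # current log, plain text
  printf 'older line\n'  > "$dir/access.log.1"            # rotated, not compressed yet
  printf 'oldest line\n' | gzip > "$dir/access.log.2.gz"  # rotated and compressed
  zcat -f "$dir"/access.log*    # same pattern as in the one-liner
  rm -rf "$dir"
}
```

Running demo_zcat_f prints all three lines, one from each file.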

Conclusion

In the last 14 days, my server received 98 thousand requests, and 57% of them came from crawlers, so my site is not popular :-)

As a final note: I don’t aggregate the above information, and the data is deleted automatically after 14 days.
