{"id":2959,"date":"2022-09-24T22:03:28","date_gmt":"2022-09-24T20:03:28","guid":{"rendered":"https:\/\/davidlebech.com\/thoughtflow\/?p=2959"},"modified":"2022-09-24T22:03:32","modified_gmt":"2022-09-24T20:03:32","slug":"simple-website-visitor-stats-and-location-with-goaccess","status":"publish","type":"post","link":"https:\/\/davidlebech.com\/thoughtflow\/simple-website-visitor-stats-and-location-with-goaccess\/","title":{"rendered":"Simple website visitor stats and location with GoAccess"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">I do not use cookies or browser trackers (like Google Analytics) on any of my personal websites to track visitor information.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">From time to time, however, it is nice to get an idea of the amount of traffic to my sites, including which pages get viewed the most.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I use the following one-liner for this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo zcat -f \/var\/log\/nginx\/access.log* | goaccess --log-format=VCOMBINED --geoip-database dbip-city-lite-2022-09.mmdb<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Read on below to see how I made this work.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">I host my websites on a small <a href=\"https:\/\/www.hetzner.com\/\">Hetzner VM<\/a> and use nginx as a web server and\/or reverse proxy for PHP (for WordPress sites).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The default server logs for nginx are enough to count basic server visits, and for this purpose, I use <a href=\"https:\/\/goaccess.io\/\" data-type=\"URL\" data-id=\"https:\/\/goaccess.io\/\">GoAccess<\/a>.<\/p>\n\n\n\n<p 
class=\"wp-block-paragraph\">GoAccess usually works out of the box like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>goaccess \/var\/log\/nginx\/access.log<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">For me, though, this command is a bit limited. Specifically:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>It only includes the latest access log. nginx rotates the logs and the older ones get stored in gzip&#8217;ed files. Ideally, these should be included.<\/li><li>The default nginx log format (<code>combined<\/code>) does not include the domain (e.g. &#8220;davidlebech.com&#8221;), so if there are multiple sites on the same server, their views get mixed together.<\/li><li>It is not possible to see where in the world people are from.<\/li><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">I solved all three earlier this year by making some simple changes to the nginx config and the command itself.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Use combined log format<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">First, add the following lines to the nginx configuration in \/etc\/nginx\/nginx.conf, inside the <code>http<\/code> block:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>log_format vcombined '$host:$server_port '\n        '$remote_addr - $remote_user &#91;$time_local] '\n        '\"$request\" $status $body_bytes_sent '\n        '\"$http_referer\" \"$http_user_agent\" \"$gzip_ratio\"';\naccess_log \/var\/log\/nginx\/access.log vcombined;<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This tells nginx to use the &#8220;combined log format with virtual host&#8221; for the access log files. The logs now include the domain, thereby solving issue 2 from above.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Ensure IP addresses are correct from Cloudflare<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To get a visitor&#8217;s location, we can look up the logged IP address in a local database. To enable this, there is an <strong>important extra step to take when using Cloudflare as a proxy<\/strong>. 
By default, the logged IPs are those of Cloudflare&#8217;s servers. nginx needs to know where to get the visitor&#8217;s real IP address from.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Here&#8217;s <a rel=\"noreferrer noopener\" href=\"https:\/\/support.cloudflare.com\/hc\/en-us\/articles\/200170786-Restoring-original-visitor-IPs\" target=\"_blank\">the official guide from Cloudflare<\/a>. Here&#8217;s my short version:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Copy the list of IP addresses from Cloudflare, both the <a rel=\"noreferrer noopener\" href=\"https:\/\/www.cloudflare.com\/ips-v4\" target=\"_blank\">IPv4<\/a> and <a rel=\"noreferrer noopener\" href=\"https:\/\/www.cloudflare.com\/ips-v6\" target=\"_blank\">IPv6<\/a> addresses.<\/li><li>Create a new file at \/etc\/nginx\/cloudflare.conf and paste all the addresses, one per line.<\/li><li>Prepend each address with <code>set_real_ip_from<\/code> and end each line with a semicolon.<\/li><li>Add to the end of the file:<br><code>real_ip_header CF-Connecting-IP;<\/code><\/li><li>In \/etc\/nginx\/nginx.conf, add:<br><code>include \/etc\/nginx\/cloudflare.conf;<\/code><\/li><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">This is the entire content of my current \/etc\/nginx\/cloudflare.conf file, which works <strong><em>as of September 2022<\/em><\/strong>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>set_real_ip_from 103.21.244.0\/22;\nset_real_ip_from 103.22.200.0\/22;\nset_real_ip_from 103.31.4.0\/22;\nset_real_ip_from 104.16.0.0\/13;\nset_real_ip_from 104.24.0.0\/14;\nset_real_ip_from 108.162.192.0\/18;\nset_real_ip_from 131.0.72.0\/22;\nset_real_ip_from 141.101.64.0\/18;\nset_real_ip_from 162.158.0.0\/15;\nset_real_ip_from 172.64.0.0\/13;\nset_real_ip_from 173.245.48.0\/20;\nset_real_ip_from 188.114.96.0\/20;\nset_real_ip_from 190.93.240.0\/20;\nset_real_ip_from 197.234.240.0\/22;\nset_real_ip_from 198.41.128.0\/17;\nset_real_ip_from 2400:cb00::\/32;\nset_real_ip_from 2606:4700::\/32;\nset_real_ip_from 
2803:f800::\/32;\nset_real_ip_from 2405:b500::\/32;\nset_real_ip_from 2405:8100::\/32;\nset_real_ip_from 2c0f:f248::\/32;\nset_real_ip_from 2a06:98c0::\/29;\n\n# use either of the following two\nreal_ip_header CF-Connecting-IP;\n# real_ip_header X-Forwarded-For;<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Use geolocation for IP<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To translate an IP address to a rough location, I am currently using <a rel=\"noreferrer noopener\" href=\"https:\/\/db-ip.com\/db\/download\/ip-to-city-lite\" target=\"_blank\">db-ip.com&#8217;s free geolocation database<\/a>. It is updated monthly; the September 2022 version, for example, can be fetched like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>wget https:\/\/download.db-ip.com\/free\/dbip-city-lite-2022-09.mmdb.gz\ngunzip dbip-city-lite-2022-09.mmdb.gz<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">With all these pieces in place (real visitor IPs, the full log format and a local geolocation database), the last 14 days of visitor information can be analyzed with the one-liner from the beginning of the post:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo zcat -f \/var\/log\/nginx\/access.log* | goaccess --log-format=VCOMBINED --geoip-database dbip-city-lite-2022-09.mmdb<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Here, <code>zcat -f<\/code> reads both the current plain-text log and the rotated gzip&#8217;ed ones, which solves issue 1, and the geolocation database solves issue 3.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In the last 14 days, I had 98 thousand requests to my server, and 57% of these came from crawlers, so my site is not popular :-)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As a final note: I don&#8217;t aggregate the above information, and the data is deleted automatically after 14 days.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I do not use cookies or browser trackers (like Google Analytics) on any of my personal websites to track visitor information. 
From time to time, however, it is nice to get an idea of the amount of traffic to my sites, including which pages get viewed the most. I use the following one-liner for this: Read [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[29],"tags":[242,243,189,32,72],"class_list":["post-2959","post","type-post","status-publish","format-standard","hentry","category-tips","tag-geolocation","tag-goaccess","tag-nginx","tag-tip","tag-ubuntu"],"_links":{"self":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/2959","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/comments?post=2959"}],"version-history":[{"count":0,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/2959\/revisions"}],"wp:attachment":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/media?parent=2959"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/categories?post=2959"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/tags?post=2959"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}