Measure your audience with your access logs and Matomo

Spoiler: can you measure the audience of your website without JavaScript? Yes, with your access logs! Today we show you how to use them with Matomo. After retrieving these log files, we use the command line to send their contents to Matomo and then ask it to archive its statistics.

To measure the audience of our application, we installed Matomo. After preparing the system, then installing and setting up Matomo, we were asked to insert a JavaScript tracker…

As you can imagine, we are not fans of these intrusive methods, so we royally snubbed this step of the installation. The problem: no measurements reach Matomo, and our dashboard therefore stays hopelessly empty…

ddzphoto @ pixabay

We therefore needed another method, more passive and more respectful of our users. Fortunately, Matomo offers one: reading the server logs.

Access logs

For our dear readers who do not know what they are: web servers (Apache, IIS, nginx, …) keep logs. These are registers that record every event of interest that occurred. Errors go to the error log and visitor requests go to the access log.

Thus, each time a visitor requests a resource (e.g. a web page) in your web application, the server logs this request along with information about its response (the status code and the amount of data sent). As this file contains all the activity on your website, you can use it to build visit statistics (e.g. with GoAccess).
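To make this concrete, here is a minimal sketch that counts requests per status code straight from an access log. It assumes the widespread "combined" layout, where the status code is the ninth whitespace-separated field; the function name count_statuses is mine, purely for illustration.

```shell
#!/bin/sh
# Count requests per HTTP status code in an access log.
# Assumption: "combined" (or "common") log format, so that the
# status code is the 9th whitespace-separated field of each line.
count_statuses() {
    awk '{ hits[$9]++ } END { for (s in hits) print s, hits[s] }' "$1" | sort
}
```

Calling count_statuses access.log prints one line per status code followed by its number of hits. If your server uses a custom log format, the field number will need adjusting.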

These logs contain the visitor’s IP address, which is covered by the GDPR, so you will need to anonymize them before using them to measure your audience.
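One possible approach, and not necessarily the one you will end up using, is to mask the addresses before the logs go anywhere else. The sketch below zeroes the last octet of the IPv4 address at the start of each line. The function name anonymize_ips is hypothetical, IPv6 addresses are left untouched, and whether masking a single octet is enough for your situation is for you to check.

```shell
#!/bin/sh
# Replace the last octet of the leading IPv4 address with 0.
# Assumption: the client address is the first field of each line.
# Lines starting with anything else (e.g. an IPv6 address) pass through unchanged.
anonymize_ips() {
    sed -E 's/^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3} /\1.0 /'
}
```

It reads standard input and writes standard output, so it can be used as anonymize_ips < access.log > access.anon.log.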

Depending on the server and the system on which it is running, these files are stored in different directories.

If you use a third-party host, you will have to look through its interface to find out how to retrieve these files (for example, how to retrieve them from Kimsufi).

For the rest, I will assume that your logs are available on the server where you installed Matomo, either because Matomo is installed directly on your web server or because you retrieve the logs with a scheduled task.

Send logs to Matomo

Now that the logs are available, they need to be sent to Matomo so it can read them, update its database and produce audience reports. The outline is described in the official documentation.

Get site ID

Since Matomo is designed to measure audiences for multiple websites, you are going to need the site ID that your logs belong to.

This information is shown when you set up a new site, but if you haven’t noted it down, you can find it through the administration menus.

To see the site ID, go to the Matomo administration interface via the menu at the top right (cogwheel icon if your screen is large, or simply “Administration” otherwise). Then, in the menu on the left, open the “Websites” section and click on “Manage”. You will then see the list of measured sites, with the ID shown under each site name.

Website list

Integrate logs

Now that we have the access logs and the site ID, we can use the official script designed to send the logs to Matomo.

The version of Python used by this script depends on the version of Matomo:

  • Matomo 3 requires Python 2,
  • Matomo 4 requires Python 3.

The script comes with help (via the --help argument or its page on GitHub). As I use it locally, I only need a few arguments:

sudo -u www-data                                      \
    /var/www/matomo/misc/log-analytics/import_logs.py \
        --url=https://votredomaine                    \
        --idsite=1                                    \
        --enable-static                               \
        access.log

The sudo is optional but handy. Without it, I would have to supply a Matomo username and password (in clear text in the arguments 😢). With sudo, the script does not need these credentials and looks for a token in Matomo’s files instead. And to avoid using administrator rights when they are not needed, I pass -u www-data so that the script only runs with the web server’s rights.

Archiving of statistics

Once the logs have been inserted into the database, Matomo must archive them. Usually, it does this on the fly when you view the statistics pages. For small sites, or if you import your stats very often (every 5 minutes), the process is quick and no one notices.

With our logs, it doesn’t work so well anymore. Since the logs contain a lot of events inserted at the same time, the archiving takes a long time. Believe me, it is noticeable and it’s kinda painful having to wait to get the dashboard.

Fortunately, you can trigger this process directly from the command line. By doing it right after import, Matomo will no longer need to do it during your visits. The corresponding script has plenty of options but only one is needed in our case:

sudo -u www-data                         \
    /var/www/matomo/console core:archive \
        --url='https://votredomaine'

Since we are doing the archiving after the import, we do not need Matomo to trigger it when we visit its dashboards. To turn that behavior off, go to the administration menu, then in the “System” section click on “General settings” and disable the option that archives reports when they are viewed from the browser.

Archiving stats

What next?

Ideally, these two command lines should be called automatically by a scheduled task (a cron job, for instance). Every night, for example, it would collect the logs, send them to Matomo and archive the statistics.
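As a rough sketch of such a task, the function below chains the two commands and only archives if the import succeeded. Everything configurable here is an assumption to adapt: the variable names, the function name import_and_archive, and in particular the log path /var/log/apache2/access.log.1, which is where a Debian-style log rotation would typically leave the previous day's log.

```shell
#!/bin/sh
# Nightly Matomo job: import the rotated access log, then archive the reports.
# All values below are placeholders; override them from the environment.
MATOMO_DIR="${MATOMO_DIR:-/var/www/matomo}"
MATOMO_URL="${MATOMO_URL:-https://votredomaine}"
SITE_ID="${SITE_ID:-1}"
LOG_FILE="${LOG_FILE:-/var/log/apache2/access.log.1}"   # assumed rotation layout
# Command prefix. Set RUN_AS=echo to print the commands instead of running them.
RUN_AS="${RUN_AS:-sudo -u www-data}"

import_and_archive() {
    # $RUN_AS is left unquoted on purpose so that it splits into separate words.
    $RUN_AS "$MATOMO_DIR/misc/log-analytics/import_logs.py" \
        --url="$MATOMO_URL" --idsite="$SITE_ID" --enable-static "$LOG_FILE" \
    && $RUN_AS "$MATOMO_DIR/console" core:archive --url="$MATOMO_URL"
}
```

Saved as a script that ends with a call to import_and_archive (say /usr/local/bin/matomo-nightly.sh, a name I made up), it could be scheduled from root's crontab with a line such as 30 2 * * * /usr/local/bin/matomo-nightly.sh to run at 2:30 every night.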

Each morning, you will then be able to see the previous day’s visit statistics: which topics are doing better than others, your visitor trends and that sort of thing.

And since you have avoided JavaScript, take the opportunity to anonymize IP addresses. Not only is it more respectful, it also saves you from having to display a consent popup.