Measure your audience with your access logs and Matomo
This page is also available in French.
As you can imagine, we are not fans of these intrusive methods, so we royally snubbed this step of the installation. The problem: no measurements reach Matomo, and our dashboard stays hopelessly empty…
We therefore needed another method, more passive and more respectful of our users. Luckily, Matomo offers one: reading the server logs.
For our dear readers who do not know what these are: web servers (Apache, IIS, nginx, …) keep logs. These are registers that record every event of interest as it occurs. Errors go to the error log and visitor requests go to the access log.
Thus, each time a visitor requests a resource (e.g. a web page) from your web application, the server logs the request along with information about its response (the status code and the amount of data sent). As this file contains all the activity on your website, you can use it to build visit statistics (e.g. with GoAccess).
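To give an idea of the kind of statistic an access log can yield, here is a minimal sketch that counts requests per HTTP status code. It assumes the common/combined log format, where the status code is the ninth whitespace-separated field; `count_by_status` is a hypothetical helper name, not part of Matomo or GoAccess.

```shell
#!/bin/sh
# Hypothetical helper: count requests per HTTP status code.
# Assumes the common/combined log format, in which the status code is
# the 9th whitespace-separated field of each line read from stdin.
count_by_status() {
    awk '{ counts[$9]++ } END { for (s in counts) print s, counts[s] }' | sort
}

# Example usage (the path is an assumption, adapt it to your server):
# count_by_status < /var/log/nginx/access.log
```

Tools such as GoAccess or Matomo do far more than this, but the principle is the same: every line is one request, and the statistics are aggregates over those lines.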
These logs contain each visitor’s IP address, which is covered by the GDPR, so you will need to anonymize the logs before using them to measure your audience.
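One possible way to do this, shown here as a minimal sketch rather than a recommendation, is to zero the last byte of each address before the logs go anywhere else. The example assumes that the client address is the first field of each line and only handles IPv4; `anonymize_ipv4` is a hypothetical helper name, and whether masking a single byte is enough for your situation is for you to decide.

```shell
#!/bin/sh
# Hypothetical helper: replace the last octet of the leading IPv4 address
# with 0 on every access-log line read from standard input.
# Lines that do not start with an IPv4 address (e.g. IPv6) pass unchanged.
anonymize_ipv4() {
    sed -E 's/^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3} /\1.0 /'
}

# Example usage (both paths are assumptions):
# anonymize_ipv4 < /var/log/nginx/access.log > /tmp/access.anon.log
```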
Depending on the server and the system on which it is running, these files are stored in different directories:
- Apache 2: in /var/log/apache2 on Debian and its derivatives, in /var/log/httpd on Red Hat and its derivatives,
- Nginx: in /var/log/nginx most of the time,
- IIS: in
- Otherwise, your installation is heavily customized: either you already know what we are talking about, or you have set things up haphazardly.
If you use a third-party host, you will have to dig through its interface to find out how to retrieve these files (e.g. how to retrieve them from Kimsufi).
For the rest of this article, I will assume that your logs are available on the server where you installed Matomo, either because Matomo runs directly on your web server or because you retrieve the logs with a scheduled task.
Send logs to Matomo
Now that the logs are available, they need to be sent to Matomo so it can read them, update its database and produce audience reports. The outline is described in the official documentation.
Get site ID
Since Matomo is designed to measure audiences for multiple websites, you are going to need the ID of the site your logs belong to.
This information is shown when you set up a new site, but if you haven’t noted it down, you can find it through the administration menus.
To see the site ID, go to the Matomo administration interface, via the menu at the top right (cogwheel icon if your screen is large, or simply “Administration” otherwise). Then, using the menu on the left, open the “Websites” section and click on “Manage”. You will then see the list of measured sites, with each ID shown under the site name.
Now that we have the access logs and the site ID, we can use the official script designed to send the logs to Matomo.
The version of Python used by this script depends on the version of Matomo:
- Matomo 3 requires Python 2,
- Matomo 4 requires Python 3.
The script has built-in help (via the --help argument or its GitHub page). As I use it locally, I only need a few arguments:
- --url to give the HTTP(S) address of my Matomo,
- --idsite to tell which site the logs correspond to,
- --enable-static because I like having stats on some particular files (style.css, some PDFs and, of course, the TXT translations of Phrack),
- and then the names of the files to import.
    sudo -u www-data \
        /var/www/matomo/misc/log-analytics/import_logs.py \
        --url=https://votredomaine \
        --idsite=1 \
        --enable-static \
        access.log
sudo is optional but handy. Without it, I would have to supply a Matomo username and password (in clear text in the arguments 😢). With sudo, the script does not need these credentials and looks for a token in Matomo’s files instead. And to avoid using administrator rights when they are not needed, I use the argument -u www-data so that the script only runs with the web server’s rights.
Archiving of statistics
Once the logs have been inserted into the database, Matomo must archive them. Usually, it does this on the fly when you view the statistics pages. For small sites, or if you import your stats very often (every 5 minutes), the process is quick and no one notices.
With our logs, it doesn’t work so well anymore. Since the logs contain a lot of events inserted at the same time, the archiving takes a long time. Believe me, it is noticeable and it’s kinda painful having to wait to get the dashboard.
Fortunately, you can trigger this process directly from the command line. If we run it right after the import, Matomo no longer needs to do it during our visits. The corresponding script has plenty of options, but only one is needed in our case:
- --url to tell where to reach our Matomo installation.
    sudo -u www-data \
        /var/www/matomo/console core:archive \
        --url='https://votredomaine'
Since we are doing the archiving after the import, we do not need Matomo to trigger it when we visit its dashboards. To turn this off, go to the administration menu, then, in the “System” section, click on “General settings”.
- The first block concerns archiving; select “No” there.
What next?
Ideally, these two command lines should be run automatically by a scheduled task (such as a cron job): every night, for example, to collect the logs, send them to Matomo and archive the statistics.
Each morning, you will then be able to see the visit statistics for the day before: which topics are doing better than others, your visitor trends and that sort of thing.
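The nightly job described above could look like the following sketch. It simply chains the two commands from this article; the default values (Matomo directory, URL, site ID) are the ones used in the examples, while the rotated log path /var/log/nginx/access.log.1, the `run` and `nightly_import` helper names and the `DRY_RUN` switch are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a nightly job: import the previous day's log, then archive.
# Defaults are taken from the examples in this article; LOG_FILE assumes
# a daily rotation that leaves yesterday's log uncompressed as access.log.1.
MATOMO_DIR="${MATOMO_DIR:-/var/www/matomo}"
MATOMO_URL="${MATOMO_URL:-https://votredomaine}"
SITE_ID="${SITE_ID:-1}"
LOG_FILE="${LOG_FILE:-/var/log/nginx/access.log.1}"

# run CMD...: print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Import the log file, and archive only if the import succeeded.
nightly_import() {
    run sudo -u www-data "$MATOMO_DIR/misc/log-analytics/import_logs.py" \
        --url="$MATOMO_URL" --idsite="$SITE_ID" --enable-static "$LOG_FILE" &&
    run sudo -u www-data "$MATOMO_DIR/console" core:archive --url="$MATOMO_URL"
}

# To use it as a script, call nightly_import on the last line, then
# schedule it, e.g. with a crontab entry such as (hypothetical path):
#   30 3 * * * /usr/local/bin/matomo-nightly.sh
```

Setting DRY_RUN=1 before calling nightly_import prints the two commands instead of running them, which is a cheap way to check the paths before letting cron loose on them.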