[ code. keyboards. terminals. cyber. ]



Some of you may have noticed that the Click! Clack! Hack! podcast is available. Be sure to subscribe if you haven't already.
Of course, if you run a podcast, you want to know whether and how often it is heard.
One option is WordPress or a similar CMS which, combined with the Podlove plugins, provides everything ready-made.
Well, anyone who knows the Nerdbude knows that this is out of the question.
(Podlove is of course still worth supporting, no question.)

I started a PODCAST REPO on GitHub.
There you can find everything you need for DIY podcasting.
Among other things, it includes a way to get statistics.


To collect statistics (downloads, players, etc.) for a podcast, WordPress and co. use databases.
You can do that, but you don't have to. I want to keep everything here as small as possible, so no WordPress and no database.
So what else is available for getting statistics?
If the webspace provider already writes logs, we can simply use those.
(Up front: the IP addresses are anonymized.)
For the visualization we additionally need a separate piece of software, GoAccess.


Thanks to the law, the provider has to write access logs.
These logs naturally also contain the requests for the individual podcast episodes.
They just need to be filtered out.
This is what a log entry of the kind we are after looks like:

anon - - [04/Mar/2021:23:23:00 +0100] "GET /podcast/audio/CCH_001.mp3 HTTP/2.0" 200 11376985 "-" "Pocket Casts"

The Structure

This structure will matter later for the analysis.
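
To make the structure concrete, here is a minimal sketch that pulls the individual fields out of the sample entry above with awk. It assumes exactly the layout shown (host, timestamp, quoted request, status, size, quoted referrer, quoted user agent); the variable names are only for illustration and do not come from the repository.

```shell
#!/bin/sh
# Sample entry from above (host already anonymized by the provider).
line='anon - - [04/Mar/2021:23:23:00 +0100] "GET /podcast/audio/CCH_001.mp3 HTTP/2.0" 200 11376985 "-" "Pocket Casts"'

# Splitting on double quotes gives:
#   $1 = host, ident, user and timestamp
#   $2 = request line
#   $3 = status code and size in bytes
#   $4 = referrer
#   $6 = user agent
request=$(printf '%s\n' "$line" | awk -F'"' '{ print $2 }')
status=$(printf '%s\n' "$line" | awk -F'"' '{ split($3, a, " "); print a[1] }')
bytes=$(printf '%s\n' "$line" | awk -F'"' '{ split($3, a, " "); print a[2] }')
agent=$(printf '%s\n' "$line" | awk -F'"' '{ print $6 }')

echo "request: $request"
echo "status:  $status"
echo "bytes:   $bytes"
echo "agent:   $agent"
```

The status code and the size are the two fields that decide later whether an entry counts as a complete download.
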
The log file also contains other requests that are not relevant to the podcast.
For this I have a small shell script in the repository that splits the log file.

First, we keep only the entries that refer to the directory with the episodes:

grep "/podcast/audio" "$newlog" > filtered.log

Here grep writes all entries from the log file that contain "/podcast/audio" into a temporary filtered.log.
This way we keep all entries that refer to the episodes and throw away the rest.
After that we filter by episode, collecting all entries for each one.

grep "GET /podcast/audio/CCH_000.mp3" filtered.log > f000.log

Here grep fetches the lines from filtered.log that contain the episode and writes them into a new, episode-specific temporary log file (f000.log).
This is done for each episode, so you end up with a set of logs covering all episodes (f000.log, f001.log, etc.).
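
Repeating the same grep by hand for every episode gets tedious, so the step can be sketched as a loop. This is a sketch and not the script from the repository: it assumes the episodes are named CCH_000.mp3, CCH_001.mp3 and so on, as above, and it builds a tiny made-up filtered.log in a temporary directory so that it can be run anywhere.

```shell
#!/bin/sh
# Work in a throwaway directory with a small sample filtered.log (made-up entries).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > filtered.log <<'EOF'
anon - - [04/Mar/2021:23:23:00 +0100] "GET /podcast/audio/CCH_000.mp3 HTTP/2.0" 200 234223 "-" "Pocket Casts"
anon - - [04/Mar/2021:23:24:00 +0100] "GET /podcast/audio/CCH_001.mp3 HTTP/2.0" 200 11376985 "-" "Pocket Casts"
anon - - [04/Mar/2021:23:25:00 +0100] "GET /podcast/audio/CCH_001.mp3 HTTP/2.0" 206 4096 "-" "AntennaPod"
EOF

# One log per episode: f000.log, f001.log, ...
for n in 000 001; do
    # -F treats the pattern as a fixed string, so the dot in ".mp3" is literal.
    grep -F "GET /podcast/audio/CCH_${n}.mp3" filtered.log > "f${n}.log"
done

wc -l f000.log f001.log
```

For a new episode, only the list after "for n in" has to grow.
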
Next come the downloads.

  grep "200 234223" f000.log > fs000.log

Here grep takes all lines with the HTTP status code 200 (OK, i.e. a successful download) and the complete file size in bytes (234223) from f000.log and writes them to the temporary fs000.log.
This way we get only the entries for completed downloads.
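
A plain substring match would also hit a line where the same text happens to appear somewhere else, for example inside a user agent. A slightly stricter variant compares the status and size fields directly. A minimal sketch, assuming the log layout shown earlier and the file size from the grep example; the sample entries are made up:

```shell
#!/bin/sh
# Throwaway directory with a made-up f000.log.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > f000.log <<'EOF'
anon - - [04/Mar/2021:23:23:00 +0100] "GET /podcast/audio/CCH_000.mp3 HTTP/2.0" 200 234223 "-" "Pocket Casts"
anon - - [04/Mar/2021:23:24:00 +0100] "GET /podcast/audio/CCH_000.mp3 HTTP/2.0" 206 65536 "-" "AntennaPod"
anon - - [04/Mar/2021:23:25:00 +0100] "GET /podcast/audio/CCH_000.mp3 HTTP/2.0" 200 234223 "-" "gPodder"
EOF

size=234223   # full size of the episode file in bytes (value from the grep example)

# Split on double quotes: field 3 holds " <status> <bytes> ".
# Keep a line only if the status is 200 and the size equals the full file size.
awk -F'"' -v size="$size" '
    { split($3, f, " ") }
    f[1] == 200 && f[2] == size
' f000.log > fs000.log

wc -l < fs000.log
```

The partial request (status 206) is dropped, the two complete downloads remain.
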
Now the content of the temporary fs000.log is appended to episode0.log:

  cat fs000.log >> episode0.log

cat reads the content of fs000.log, and >> appends it to the end of episode0.log instead of overwriting the file. This way episode0.log holds all log entries since the beginning.
Finally, we merge all available episode logs into one master:

    cat fs000.log fs001.log fs002.log > fsmaster.log
    cat master.log fsmaster.log > final_master.log

Here all temporary episode logs (fs000.log, fs001.log, etc.) are first written to a temporary master log file (fsmaster.log), which is then combined with the existing master.log into final_master.log, so that you have one log file containing all episode downloads.

Subsequently, the temporary log files are deleted, because we no longer need them.
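
The cleanup deserves a little care, because a broad pattern such as f*.log would also match final_master.log. A minimal sketch with explicit patterns, assuming the file names used in the steps above and three-digit episode numbers; the files here are empty stand-ins created in a temporary directory:

```shell
#!/bin/sh
# Throwaway directory with empty stand-ins for the files from the steps above.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch filtered.log f000.log f001.log fs000.log fs001.log fsmaster.log \
      episode0.log master.log final_master.log

# Delete only the temporary files; the episode and master logs stay.
rm -f filtered.log fsmaster.log f[0-9][0-9][0-9].log fs[0-9][0-9][0-9].log

ls
```
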

If all this is too much effort, use the "" in the repository.

At this point, the log files are ready.
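
For a quick look without any dashboard, the downloads per episode can be counted straight from the merged log. A minimal sketch, assuming the log layout shown earlier; the file name sample_master.log and its entries are made up for illustration:

```shell
#!/bin/sh
# Throwaway directory with a made-up merged log.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > sample_master.log <<'EOF'
anon - - [04/Mar/2021:23:23:00 +0100] "GET /podcast/audio/CCH_001.mp3 HTTP/2.0" 200 11376985 "-" "Pocket Casts"
anon - - [05/Mar/2021:08:00:00 +0100] "GET /podcast/audio/CCH_000.mp3 HTTP/2.0" 200 234223 "-" "AntennaPod"
anon - - [05/Mar/2021:09:30:00 +0100] "GET /podcast/audio/CCH_001.mp3 HTTP/2.0" 200 11376985 "-" "gPodder"
EOF

# Field 2 (between the first pair of quotes) is the request line.
# Print its path, count identical paths, and sort with the most downloaded first.
awk -F'"' '{ split($2, r, " "); print r[2] }' sample_master.log \
    | sort | uniq -c | sort -rn > counts.txt

cat counts.txt
```

Each output line is a count followed by the episode path.
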


Statistics look better in a nice dashboard.
There are ways to display the whole thing with Grafana and similar tools, but those again come with a database attached, and that is overkill for my little podcast.
That's why we use a tool called GoAccess, a fine little log file viewer that runs primarily in the terminal but can also produce an HTML dashboard.
For GoAccess to be able to read the log files, it needs a few parameters. In my case (see the example entry above) it looks like this:

goaccess master.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"' --date-format=%d/%b/%Y --time-format=%T --no-ip-validation

This starts GoAccess in the terminal with master.log. In the log format, %b is the response size in bytes, %R the referrer and %u the user agent; --no-ip-validation is needed because the host field contains "anon" instead of an IP address.
The HTML dashboard can be started with the following parameters:

goaccess master.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"' --date-format=%d/%b/%Y --time-format=%T --no-ip-validation -o /home/ph_0x17/dashboard.html --real-time-html

In the terminal you can now see that a small dashboard server is started.
The dashboard can be opened in the browser at /home/ph_0x17/dashboard.html.
Taddaaaa! Statistics!

Have fun with it.