NERDBUDE

[ code. keyboards. terminals. cyber. ]


Podstats

Some of you may have noticed that the Click! Clack! Hack! podcast is now available. Be sure to subscribe if you haven't already.
Of course, if you run a podcast, you want to know whether and how often it is listened to.
One option is WordPress or a similar CMS which, combined with the Podlove plugins, provides all of this ready-made.
Well, anyone who knows Nerdbude knows that this is out of the question.
(Podlove is of course still worth supporting, no question.)

I started a PODCAST REPO on GitHub.
There you can find everything you need for DIY podcasting.
Among other things, it includes a way to get statistics.

Overview

To get statistics (downloads, players, etc.) for a podcast, WordPress and co. use databases.
You can do that, but you don't have to. I want to keep everything as small as possible here, so no WordPress and no DB.
So what else is available for getting statistics?
If the webspace provider already writes logs, we can simply use those.
(Up front: the IP addresses are anonymized.)
In addition, separate software is needed for the visualization, in this case GoAccess.

Log-Files

The provider is required by law to write access logs.
These logs naturally also contain the requests for the individual podcast episodes.
You only have to filter them out.
This is what an access log entry of the kind we want looks like:

anon - - [04/Mar/2021:23:23:00 +0100] "GET /podcast/audio/CCH_001.mp3 HTTP/2.0" 200 11376985 "-" "Pocket Casts"

The Structure
[IP] - - [DATE:TIME] "[REQUEST]" [HTTP CODE] [FILESIZE] "[REFERRER]" "[CLIENT]"

The structure will be important for the analysis later.
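
To make the structure a bit more tangible, here is a minimal sketch that pulls the individual fields out of the sample entry with awk. It is not part of the repository scripts; the variable names are made up for illustration, and it assumes the quoted parts of the line (request, referrer, client) contain no double quotes themselves.

```shell
#!/usr/bin/env bash
# Sample entry copied from the access log example above.
line='anon - - [04/Mar/2021:23:23:00 +0100] "GET /podcast/audio/CCH_001.mp3 HTTP/2.0" 200 11376985 "-" "Pocket Casts"'

# Splitting on double quotes gives:
#   $2 = request, $3 = " status size ", $4 = referrer, $6 = client
request=$(printf '%s\n' "$line" | awk -F'"' '{ print $2 }')
status=$(printf '%s\n' "$line" | awk -F'"' '{ split($3, f, " "); print f[1] }')
size=$(printf '%s\n' "$line" | awk -F'"' '{ split($3, f, " "); print f[2] }')
client=$(printf '%s\n' "$line" | awk -F'"' '{ print $6 }')

printf 'request: %s\nstatus:  %s\nsize:    %s\nclient:  %s\n' \
    "$request" "$status" "$size" "$client"
```

Splitting on the quotes instead of on spaces keeps a client name like "Pocket Casts" in one piece.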
The log file also contains other requests that are not relevant to the podcast.
For this I have a small shell script in the repository that splits the log file.

First, we keep only the entries that refer to the directory with the episodes:

BASH
grep "/podcast/audio" "$newlog" > filtered.log


Here grep writes all entries from the log file that contain "/podcast/audio" into a temporary filtered.log.
This way we keep all entries that refer to the episodes and throw away the rest.
After that we filter by episode, collecting all entries for each one.

BASH
grep "GET /podcast/audio/CCH_000.mp3" filtered.log > f000.log


Here grep fetches the lines from filtered.log that contain the episode and writes them to a new temporary, episode-specific log file (f000.log).
This is done for each episode, so you end up with a set of logs covering all episodes (f000.log, f001.log, etc.).
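
Doing this by hand for every episode gets tedious, so here is a rough sketch of the same step as a loop. The episode list and the sample log entries are made up for the demo (including the second client name), and the whole thing runs in a throwaway directory; this is not necessarily how sortlog.sh does it.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory with a tiny made-up filtered.log.
workdir=$(mktemp -d)
cd "$workdir"
cat > filtered.log <<'EOF'
anon - - [04/Mar/2021:23:23:00 +0100] "GET /podcast/audio/CCH_000.mp3 HTTP/2.0" 200 234223 "-" "Pocket Casts"
anon - - [04/Mar/2021:23:24:00 +0100] "GET /podcast/audio/CCH_001.mp3 HTTP/2.0" 200 11376985 "-" "Pocket Casts"
anon - - [04/Mar/2021:23:25:00 +0100] "GET /podcast/audio/CCH_000.mp3 HTTP/2.0" 206 1024 "-" "AntennaPod"
EOF

# Hypothetical episode list; extend it as new episodes are published.
for ep in 000 001 002; do
    # One temporary log per episode: f000.log, f001.log, ...
    grep "GET /podcast/audio/CCH_${ep}.mp3" filtered.log > "f${ep}.log"
done

# Show how many entries ended up in each episode log.
grep -c "" f000.log f001.log f002.log
```

An episode without any hits simply gets an empty file, which does no harm in the later steps.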
Next come the downloads.

BASH
  grep "200 234223" f000.log > fs000.log
  


Here grep grabs all lines from "f000.log" that have the HTTP code 200 (the request succeeded) followed by the complete file size in bytes (234223) and writes them to the temporary "fs000.log".
This way we get only the entries for completed downloads.
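
One caveat: the grep pattern is a plain substring match, so it would also hit a line whose size merely starts with the same digits. If that bothers you, a stricter variant could compare the status and size fields exactly with awk. This is only a sketch with made-up sample entries in a throwaway directory; the size variable is an assumption and has to be set to the real size of the episode file.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir"
cat > f000.log <<'EOF'
anon - - [04/Mar/2021:23:23:00 +0100] "GET /podcast/audio/CCH_000.mp3 HTTP/2.0" 200 234223 "-" "Pocket Casts"
anon - - [04/Mar/2021:23:25:00 +0100] "GET /podcast/audio/CCH_000.mp3 HTTP/2.0" 206 1024 "-" "Pocket Casts"
anon - - [04/Mar/2021:23:26:00 +0100] "GET /podcast/audio/CCH_000.mp3 HTTP/2.0" 200 2342230 "-" "Pocket Casts"
EOF

size=234223   # full size of the episode file in bytes, as in the grep command above

# Field 3 (split on double quotes) is " status size "; keep a line only
# if the status is exactly 200 and the size is exactly $size.
awk -F'"' -v size="$size" '{
    split($3, f, " ")
    if (f[1] == "200" && f[2] == size) print
}' f000.log > fs000.log

echo "completed downloads: $(grep -c "" fs000.log)"
```

With the sample data only the first entry survives: the 206 line is a partial request, and the third line has a different size that just happens to begin with the same digits.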
Now the content of the temporary "fs000.log" is appended to the episode0.log:

BASH
  cat fs000.log >> episode0.log
  


cat reads the content of "fs000.log" and ">>" appends it to the end of "episode0.log", so the entries that are already in there are kept. (With a single ">" the shell would empty "episode0.log" before cat could read it.) This way you get all log entries since the beginning.
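
To see the append step in isolation, here is a small sketch with made-up file contents in a throwaway directory. The thing to keep in mind is that a single ">" pointing at a file that cat is also supposed to read would truncate that file first, so appending with ">>" is the safe way.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir"
printf 'old entry\n' > episode0.log   # entries from earlier runs
printf 'new entry\n' > fs000.log      # entries from the current run

# ">>" appends to the end of episode0.log and leaves existing lines alone.
cat fs000.log >> episode0.log

cat episode0.log
```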
Finally, we merge all available episode logs into one master:


BASH
    cat fs000.log fs001.log fs002.log > fsmaster.log
    cat master.log fsmaster.log > final_master.log
  


Here again, all temporary episode logs (fs000.log, fs001.log, etc.) are written to a temporary master log file (fsmaster.log), which is then combined with master.log into final_master.log, so that you have one log file containing all episode downloads.
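
If you don't want to list every episode log by hand, a glob can do the collecting. The following is only a sketch with made-up file contents in a throwaway directory; the pattern assumes the temporary logs are named fs followed by digits, as above.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir"
printf 'episode 0 download\n' > fs000.log
printf 'episode 1 download\n' > fs001.log
printf 'earlier download\n'   > master.log

# fs[0-9]*.log matches fs000.log, fs001.log, ... but not fsmaster.log,
# so the output file is never one of its own inputs.
cat fs[0-9]*.log > fsmaster.log
cat master.log fsmaster.log > final_master.log

cat final_master.log
```

A plain fs*.log would also match fsmaster.log itself, which is why the pattern insists on a digit after "fs".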

Afterwards, the temporary log files are deleted because we no longer need them.
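
The cleanup could look roughly like this. It is a sketch that assumes the file names used in this post and runs on empty dummy files in a throwaway directory; check the patterns against your own file names before letting rm loose on real logs.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir"
# Empty dummy files standing in for the real logs.
touch filtered.log f000.log f001.log fs000.log fs001.log fsmaster.log \
      episode0.log final_master.log

# Remove only the intermediate files; episode0.log and final_master.log stay.
rm -f filtered.log f[0-9][0-9][0-9].log fs[0-9][0-9][0-9].log fsmaster.log

ls
```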

If all this is too much effort, use "sortlog.sh" from the repository.

At this point, the log files are ready.


Dashboard

Statistics look better in a nice dashboard.
There are ways to display the whole thing with Grafana and similar tools, but those come with a DB attached, and that's a bit much for my little podcast.
That's why we use a tool called "GoAccess", a fine little log file viewer that runs primarily in the terminal but can also generate an HTML dashboard.
For GoAccess to be able to parse the log files, it needs a few parameters. In my case (see the example entry above) it looks like this:

BASH
goaccess master.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"' --date-format=%d/%b/%Y --time-format=%T --no-ip-validation


This starts GoAccess in the terminal with master.log.
The HTML dashboard can be started with the following parameters:


BASH
goaccess master.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"' --date-format=%d/%b/%Y --time-format=%T --no-ip-validation -o /home/ph_0x17/dashboard.html --real-time-html


You will now see in the terminal that a small dashboard server has started.
The dashboard can be opened in the browser at /home/ph_0x17/dashboard.html.
Taddaaaa! Statistics!


Have fun with it.



//EOF