For some time now, the idea of taking a look behind the websites of the public sector has been buzzing around in our heads. Our tax money funds countless website launches and updates across the public sector. But what technology powers these sites – open-source solutions or proprietary software? Where are they hosted? Are there regional differences between states or trends we can track in municipalities?
In Q4 2023, I submitted this project idea to the CFP of the CloudFest Hackathon. The project was accepted, and we travelled by train from Berlin to Europa-Park in Rust, an eight-hour journey.
Tim Heide (CEO of versionmanager.io, whom we met at the last FrOSCon) and Stephan Luckow took on the role of team leads.
This year's CloudFest Hackathon was the biggest yet in terms of participants. 11 project leaders presented their ideas to more than 150 participants in order to find a team of hackers to work on their project for three days. We got 15 hackers to sit at our table, and it felt right from the start: self-organised people willing to do their best to create a prototype and present it to a jury 48 hours later. To be fair, we lost 3 of them after the first day, and we didn't get the chance to ask them why they left the team.
Day 1
The first half of day 1 served to sharpen the scope of the project. We identified a big problem: where could we collect the relevant domains together with their metadata, such as the kind of website and the responsible entity? We found some GitHub repositories with thousands of public sector domains. From there, we concluded that Wikidata should be our source of truth. Some of us read the Wikidata documentation on how to query the API endpoint and started writing a script to automate the queries.
Although we knew that our prototype should end up as a website, we decided to build a toolchain that meets the requirements for a sustainable infrastructure. We looked at the apps on our self-hosted app platform and found Directus, an app that is covered by the automatic update process of our sponsor Cloudron. Fun fact: no one at the table had any experience with building headless CMS projects. Almost a third of us worked in front-end development, and they chose React as our front-end framework. The Wikidata query people worked with some other backend developers to set up more scripts for the various API endpoints:
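To give an idea of what such a query script can look like, here is a minimal sketch in Python. It is not the script the team wrote. It assumes the public Wikidata SPARQL endpoint, the properties P31 (instance of) and P856 (official website), and the class Q262166 as "municipality of Germany"; please verify that ID before relying on it. The function names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WIKIDATA_SPARQL_URL = "https://query.wikidata.org/sparql"


def build_query(class_qid: str = "Q262166", limit: int = 1000) -> str:
    """Build a SPARQL query for items of one class that have an official website.

    Assumption: Q262166 stands for "municipality of Germany".
    """
    return f"""
SELECT ?item ?itemLabel ?website WHERE {{
  ?item wdt:P31 wd:{class_qid} ;
        wdt:P856 ?website .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "de,en". }}
}}
LIMIT {limit}
""".strip()


def parse_bindings(response: dict) -> list[dict]:
    """Turn a SPARQL JSON result into flat rows with a lower-cased host name."""
    rows = []
    for binding in response["results"]["bindings"]:
        website = binding["website"]["value"]
        rows.append({
            "entity": binding["item"]["value"],
            "name": binding.get("itemLabel", {}).get("value", ""),
            "domain": urllib.parse.urlparse(website).netloc.lower(),
        })
    return rows


def fetch_domains(query: str) -> list[dict]:
    """Send the query to Wikidata and return the parsed rows (needs network)."""
    url = WIKIDATA_SPARQL_URL + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "ftm-sketch/0.1 (illustrative example)",  # hypothetical
    })
    with urllib.request.urlopen(request, timeout=60) as resp:
        return parse_bindings(json.load(resp))
```

Calling `fetch_domains(build_query(limit=10))` would return a handful of rows to inspect before running the full harvest.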
- Wikidata to collect the domains with metadata
- versionmanager to obtain information about the CMS
- Directus to save the results
The backend developers preferred Python, so it became the main language.
Ah, and because nobody could really remember the title of the project (Public Sector Website Funding Transparency project), we came up with a catchier one. Welcome: Follow the money.
After almost 10 hours of hacking, we had fun with pizza, beer and rollercoasters. Wow, what an impressive first day.
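The three building blocks can be wired together roughly like this. This is a sketch, not the code from the repo. The only real API it relies on is the Directus REST convention of creating an item with `POST /items/<collection>` and a bearer token. The collection name `websites`, the base URL and the `detect_cms` callable (a stand-in for the versionmanager lookup, whose API is not described here) are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Iterable


def build_directus_request(base_url: str, collection: str, token: str,
                           record: dict) -> urllib.request.Request:
    """Prepare (but do not send) a request that creates one item in Directus."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/items/{collection}",
        data=json.dumps(record).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_pipeline(domains: Iterable[dict],
                 detect_cms: Callable[[str], str],
                 save: Callable[[dict], None]) -> int:
    """For each harvested domain, look up its CMS and hand the record to `save`.

    `detect_cms` is a hypothetical stand-in for the versionmanager lookup.
    Returns the number of records that were saved.
    """
    saved = 0
    for row in domains:
        record = {**row, "cms": detect_cms(row["domain"])}
        save(record)
        saved += 1
    return saved
```

In a real run, `save` would send the prepared request with `urllib.request.urlopen`. Keeping it a parameter makes the pipeline easy to try out with an in-memory list first.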
Day 2
Collecting the public sector domains raised some questions. How can quality assurance be carried out on more than 10,000 domains? Not in one weekend. For the prototype, we therefore decided to analyse just under 1,000 domains from the category of municipal websites.
We harvested them directly from Wikidata, got their metadata (which city belongs to which state), examined the versionmanager results, and set up a feedback loop to add about 10 small CMSs from companies we'd never heard of.
At the same time, Tim added Lighthouse to versionmanager. As a result, we can now display accessibility metrics in our results list.
Later in the day, we decided not to publish the detailed versions of the identified CMSs: nearly 30% of the websites analysed run on outdated software and are potentially vulnerable.
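How versionmanager integrates Lighthouse internally is not covered here. As a rough illustration of where such a metric comes from: a Lighthouse JSON report stores each category score as a number between 0 and 1 (or null) under `categories`. The helper below, with a made-up name, turns the accessibility score into a percentage for a results list.

```python
from typing import Optional


def accessibility_percent(report: dict) -> Optional[int]:
    """Return the Lighthouse accessibility score as a percentage.

    Returns None when the category is missing or could not be scored.
    """
    score = report.get("categories", {}).get("accessibility", {}).get("score")
    if score is None:
        return None
    return round(score * 100)
```

A report would typically be produced by running Lighthouse with JSON output and loading the file with `json.load` before passing it to this helper.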
While the results were being honed and QA'd, our front-end team got to grips with design and user experience. After a brief round of questions, we suddenly had a logo for the project.
Welcome: (project logo)
Day 3 - the final countdown
We had to present our results to a jury. Remember: a hackathon is not for fun. It is a competition. Many sponsors support the event and expect results, and we didn't want to disappoint them.
We therefore used the last few hours before the final countdown to fine-tune the prototype. A small team prepared the presentation, while the others reviewed the final results and suggested improvements.
Our first conclusions from the findings:
- 70% of all websites rely on a FLOSS CMS (great)
- 1st place among FLOSS CMSs goes to TYPO3
- almost 30% of FLOSS CMS websites lack security updates
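Figures like these boil down to a few counts over the result records. The sketch below shows one way to compute them. The record fields `cms` and `outdated` and the function name are assumptions for illustration, not the schema used in the project, and the example records are invented, not project data.

```python
from collections import Counter


def summarise(records: list[dict], floss_cms: set[str]) -> dict:
    """Compute the FLOSS share, the most common FLOSS CMS and the outdated share.

    Each record is assumed to look like {"cms": "TYPO3", "outdated": False}.
    """
    total = len(records)
    floss = [r for r in records if r["cms"] in floss_cms]
    counts = Counter(r["cms"] for r in floss)
    outdated = sum(1 for r in floss if r["outdated"])
    return {
        "floss_share": len(floss) / total if total else 0.0,
        "top_floss_cms": counts.most_common(1)[0][0] if counts else None,
        "outdated_floss_share": outdated / len(floss) if floss else 0.0,
    }


# Invented example records, purely to show the call.
example = [
    {"cms": "TYPO3", "outdated": False},
    {"cms": "TYPO3", "outdated": True},
    {"cms": "WordPress", "outdated": False},
    {"cms": "SomeProprietaryCMS", "outdated": False},
]
summary = summarise(example, floss_cms={"TYPO3", "WordPress", "Drupal"})
```

Passing the set of FLOSS systems in as a parameter keeps the classification reviewable, which matters once more of the small, lesser-known CMSs are added.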
Then the final presentations took place.
To cut a long story short: out of 6 possible prizes, we won the most important one, the dream team of the heart.
Wow! What a blast. Thanks to the team, the organisers and the sponsors.
And now what?
We have a prototype. We have a repo. We know what we want to do next. If you want to join the "Follow the money" project, leave a ping.
GitHub repo of the project: https://github.com/CMS-Garden/ftm
Public website: https://www.follow-the-money.org
Low-hanging fruit
- Clean up the hackathon prototype
- Add more domains
- Check and improve results
- Add more categories (we have identified almost 20) and make them visible
- Ask questions and analyse the results to provide answers
- Offer the project to our European friends to compare FLOSS usage across countries
- Tell the world about this project