The last few weeks, I’ve been developing an app for the substitution plan of my school (hence the name: “Vertretungsplan” is German for substitution plan).
Before the development of the Android app, students of my school were able to view the substitution plan…
- in the school building
- online (currently limited to students of the MSS)
The online substitution plan was introduced this school year, allowing students to view it “comfortably” after school, or right before at about 7:30am.
Albeit a nice feature, it also has its problems:
- Ugly user interface
- Bad readability, thanks to the poorly chosen contrast of text and background
- No mobile interface, and a bad user experience for mobile users (too much scrolling, hardly reachable control elements)
Basically, the idea was to develop an app that shows users only the data relevant to them. Currently, this means they are only shown the courses for their class level.
To keep things clear, the user would only see the most relevant data at a glance, further information being available at a single click.
In theory, receiving the necessary data would be the easiest part: I would contact the responsible AG (“Arbeitsgemeinschaft”, working group) and ask them to either give me access to the source code and let me implement a machine-readable API, or to implement the API themselves. But I didn’t even get a response to my request from the responsible developers.
So as it turns out, it wasn’t that easy. Cooperation was pretty much non-existent and communication was sparse. And having some pride myself, I thought I could do this on my own, without having their support.
Thankfully, I am more or less experienced in web scraping, as a result of past jobs involving data gathering from multiple websites. This was no different. The idea is that you would read and use the website as if you were a user. So in this case, I worked with the jsoup library and CSS selectors. Jsoup describes itself as a “convenient API for extracting and manipulating data”. Of course, users don’t parse websites by using CSS selectors, but it’s a good idea to begin with.
The steps for gathering the plan data are:
- Get the HTML code for the plan we want (/heute for today, /morgen for tomorrow)
- Parse the code using jsoup
- Get all table rows using a CSS selector
- For each row, get all `td` elements, parse the data, and feed it to a list
Having parsed all the data, we could display it as intended.
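The steps above can be sketched with jsoup roughly as follows. This is a minimal sketch: the table structure, column order, and the `Substitution` model are assumptions for illustration, not the real plan’s markup or the app’s actual classes.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class PlanScraper {

    // Minimal row model; the field names are assumptions.
    static class Substitution {
        final String lesson, course, substitute;
        Substitution(String lesson, String course, String substitute) {
            this.lesson = lesson;
            this.course = course;
            this.substitute = substitute;
        }
    }

    static List<Substitution> parse(String html) {
        // Step 2: parse the fetched HTML with jsoup.
        Document doc = Jsoup.parse(html);
        List<Substitution> result = new ArrayList<>();
        // Step 3: select all table rows via a CSS selector.
        // The real plan would need a more specific selector.
        Elements rows = doc.select("table tr");
        for (Element row : rows) {
            // Step 4: for each row, grab the td cells and feed them to a list.
            Elements cells = row.select("td");
            if (cells.size() < 3) continue; // skip header rows
            result.add(new Substitution(
                    cells.get(0).text(),
                    cells.get(1).text(),
                    cells.get(2).text()));
        }
        return result;
    }
}
```

In the real app the first step would be an HTTP request; here the HTML is passed in as a string so the parsing logic stays self-contained.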
One more problem I ran into was authentication. The online substitution plan uses PHP session ids and cookies. The authentication flow works as follows: enter your credentials into the login form, send a POST request to the index page to have your session id authenticated, and from there on, request the data you want from /heute or /morgen.
But this meant an extra request for authentication each time we needed to log in. And even when authentication failed, I still had to parse the page to check for errors, since this “beast of a product” makes use of practically no HTTP features whatsoever. The HTML code is malformed as well: ids, which ought to be unique, are reused multiple times, and there are typos, too.
As I later discovered, it does not matter which page you send the POST request to. So to save traffic, the authentication information is simply sent with each request to either /heute or /morgen. At first glance, this might look like a security issue, but the requests are performed over TLS and no sensitive data is connected to the accounts.
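A minimal sketch of what “credentials with every request” could look like, using the JDK’s `java.net.http` types. The base URL and the form field names are made-up placeholders, not the real site’s values:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class PlanRequest {

    // Placeholder base URL; the real plan lives elsewhere.
    static final String BASE = "https://plan.example-school.invalid";

    // Builds a POST request carrying the login data to /heute or /morgen.
    // Since the server authenticates the session on any page, no separate
    // login request is needed.
    static HttpRequest authedRequest(String path, String user, String pass) {
        // Form field names "user" and "pass" are assumptions.
        String body = "user=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                    + "&pass=" + URLEncoder.encode(pass, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Sending the request (and handling the session cookie) would then be a matter of passing it to an `HttpClient`; the sketch only builds the request so it stays self-contained.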
The biggest feature planned for the future is notifications based on a list of your courses. At a fixed interval, the app would refresh the plan data, generate a changeset, and show notifications for changes concerning your courses. I am not sure yet, however, whether I will include a cloud sync feature for this, since the app currently runs without any additional server software, save the online substitution plan itself.
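The changeset idea could be sketched roughly like this; the pipe-separated entry format and the course filter are placeholder assumptions, not the planned implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ChangeSet {

    // Each plan entry is encoded as "lesson|course|substitute" for brevity;
    // the real app would use a proper model class.
    static List<String> changes(List<String> oldPlan, List<String> newPlan,
                                Set<String> myCourses) {
        List<String> changed = new ArrayList<>();
        for (String entry : newPlan) {
            String course = entry.split("\\|")[1];
            // Notify only about entries for the user's courses that were
            // not present in the previously fetched plan.
            if (myCourses.contains(course) && !oldPlan.contains(entry)) {
                changed.add(entry);
            }
        }
        return changed;
    }
}
```

At each refresh interval the app would fetch the plan, diff it against the last stored version, and raise one notification per entry in the resulting changeset.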
The complete app’s source code is available on GitHub under the BSD 2-Clause license.