Thursday, April 20, 2017

Air Quality In Poland #12 - What is this? Is this .... API?

Surprise from GIOS

Image from
When I was starting playing with air quality data from GIOS, only reasonable way to obtain it was download pre-packed archives with data for each year separately. Also when I'm publishing this post, last data archive is dated 2015, so there is no data from whole 2016 and 2017-now available. 

But recently, GIOS with collaboration with ePaństwo Foundation released initial and experimental version of data RESTful API. It is very simple, but you don't have to identify yourself with private key and there are no significant limitations of how one can use it. Lets see what can we do with it.

Currently, four types of request are handled which can give us following data: measurement stations, sensors, measured data and air quality index.

Measurement stations

First API request we should check is station/findAll. This API request should give us JSON response with list of measuring stations and some information about them. Most important field from this response is top level id, which contains id of station. We will need that value as part of further requests. To receive data from this request, parse it as data frame, and (for example) select interesting place we can do those simple operations:
We will receive following data frame:
In my example I picked station AM5 Gdańsk Szadółki which has id 733.


Since we have station id we can explore its sensors now. Overall idea of sending request and transforming its response is the same as in measurement stations example:
As result we have neat data frame with information about sensors located in this measurement station:
We can easily see that we have 5 sensors located there. Lets then explore data from PM10 sensor which has id 4727.

Measured data

Now we arrived to probably most interesting part of API. We can get actual measurement data here. As we can expect, complexity of getting this data is similar as above, with one distinction. We are receiving list of dictionaries, and each dictionary contains two pairs of key/value. So if we want nice data frame we have to add additional transformation. But fear no more - it is quite simple and gives us wanted results immediately:
Example results:
But as we can expect, there are some gotchas here. If you have "trained" eye, you probably spotted fact, that date time data is written in 12h format without AM/PM distinction. Well, this is because ... there is no such information provided in API response. I'm assuming that received data is sorted, so first 01 is one hour after midnight and second occurrence of 01 during the same date will correspond to 13 in 24h time format. For now I didn't bother to recalculate it according to above assumption - I'm hoping that this will be fixed soon so I don't have to deal with it. Second gotcha here is about range of data. Received data points are from range of three calendar days including current day, so it will contain at most 24 * 3 points. There is no way to modify that range, so if our data retrieving application crash, and we fail to notice it over three days, we will have data gap, which would not be filled until yearly data package will be released. Also, if someone is interested only in current values, he will always receive unneeded data which basically wastes bandwidth. Apart of those little flaws I didn't found other problems. Here's plot of this data:

Air quality index

Last data which we can get with API is current air quality. It doesn't seems to be very interesting - it just gives current air quality category for each sensor in station and overall air quality for that station. If you like to see how to access it I invite you to check my notebook dedicated to operations with API. It also contains all mentioned API requests and data processing.


It's great we can access so valuable and important data trough API. Despite its simplicity and flaws it still provides good point for analysis of current air quality situation. If I could add something to that API, I would enable modifying time frame for measurements data, so users could fill the gaps in their copies of data for different time frames analysis. If only other public government data would be so nice...

No comments:

Post a Comment