Surprise from GIOS
[Image from memegenerator.net]
When I started playing with air quality data from GIOS, the only reasonable way to obtain it was to download pre-packaged archives with data for each year separately. And as I'm publishing this post, the last data archive is dated 2015, so no data from the whole of 2016, or from 2017 up to now, is available.
But recently GIOS, in collaboration with the ePaństwo Foundation, released an initial, experimental version of a RESTful data API. It is very simple, but you don't have to identify yourself with a private key and there are no significant limitations on how one can use it. Let's see what we can do with it.
Currently, four types of request are handled, giving us the following data: measurement stations, sensors, measured data, and the air quality index.
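For reference, all requests used in this post live under one base URL; collected in one place they look like this (the first three paths appear below exactly as used, while the air quality index path is my reading of the API, so treat it as an assumption):

BASE_URL = 'http://api.gios.gov.pl/pjp-api/rest'

STATIONS_URL = BASE_URL + '/station/findAll'      # list of all measurement stations
SENSORS_URL  = BASE_URL + '/station/sensors/{}'   # sensors of one station (stationId)
DATA_URL     = BASE_URL + '/data/getData/{}'      # measured data of one sensor (sensorId)
INDEX_URL    = BASE_URL + '/aqindex/getIndex/{}'  # air quality index of one station (stationId); assumed path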
Measurement stations
The first API request we should check is station/findAll. It gives us a JSON response with a list of measuring stations and some information about them. The most important field in this response is the top-level id, which contains the station's id; we will need that value in further requests. To fetch the data, parse it into a data frame, and (for example) select an interesting place, these simple operations are enough:
import requests
from pandas.io.json import json_normalize

r = requests.get('http://api.gios.gov.pl/pjp-api/rest/station/findAll')
allStations = json_normalize(r.json())
print(allStations[allStations["city.name"] == u"Gdańsk"])
We will receive the following data frame:
                  addressStreet city city.commune.communeName  \
2   ul. Powstańców Warszawskich  NaN                    Gdańsk
14                 ul. Kaczeńce  NaN                    Gdańsk
26               ul. Wyzwolenia  NaN                    Gdańsk
67                 ul. Leczkowa  NaN                    Gdańsk
75                ul. Ostrzycka  NaN                    Gdańsk

   city.commune.districtName city.commune.provinceName  city.id city.name  \
2                     Gdańsk                 POMORSKIE    218.0    Gdańsk
14                    Gdańsk                 POMORSKIE    218.0    Gdańsk
26                    Gdańsk                 POMORSKIE    218.0    Gdańsk
67                    Gdańsk                 POMORSKIE    218.0    Gdańsk
75                    Gdańsk                 POMORSKIE    218.0    Gdańsk

   dateEnd            dateStart    gegrLat    gegrLon   id  \
2     None  1996-10-01 12:00:00  54.353336  18.635283  729
14    None  1996-10-01 12:00:00  54.367778  18.701111  730
26    None  1998-09-01 12:00:00  54.400833  18.657497  731
67    None  1998-10-01 12:00:00  54.380279  18.620274  736
75    None  1998-05-01 12:00:00  54.328336  18.557781  733

                stationName
2    AM1 Gdańsk Śródmieście
14         AM2 Gdańsk Stogi
26     AM3 Gdańsk Nowy Port
67      AM8 Gdańsk Wrzeszcz
75      AM5 Gdańsk Szadółki
In my example I picked station AM5 Gdańsk Szadółki, which has id 733.
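By the way, instead of reading the id off the printed frame, it can be picked out programmatically (a small sketch; the szadolki name is mine):

# Pull the id of the Szadółki station straight from the frame
szadolki = allStations[allStations["stationName"].str.contains(u"Szadółki")]
stationId = int(szadolki["id"].iloc[0])  # 733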
Sensors
Since we have a station id, we can now explore its sensors. The overall idea of sending the request and transforming its response is the same as in the measurement stations example:
stationId = 733
r = requests.get('http://api.gios.gov.pl/pjp-api/rest/station/sensors/' + str(stationId))
sensors = json_normalize(r.json())
print(sensors)
As a result we get a neat data frame with information about the sensors located at this measurement station:
     id  param.idParam param.paramCode param.paramFormula  \
0  4720              8              CO                 CO
1  4727              3            PM10               PM10
2  4723              6             NO2                NO2
3  4725              5              O3                 O3
4  4730              1             SO2                SO2

       param.paramName sensorDateEnd      sensorDateStart  stationId
0         tlenek węgla          None  1998-05-01 12:00:00        733
1  pył zawieszony PM10          None  1998-05-01 12:00:00        733
2      dwutlenek azotu          None  1998-05-01 12:00:00        733
3                 ozon          None  1998-05-01 12:00:00        733
4     dwutlenek siarki          None  1998-05-01 12:00:00        733
We can easily see that there are 5 sensors located there. Let's then explore data from the PM10 sensor, which has id 4727.
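As with the station, the sensor id can be looked up instead of copied by hand (a sketch):

# Select the row of the PM10 sensor and take its id
sensorId = int(sensors.loc[sensors["param.paramCode"] == "PM10", "id"].iloc[0])  # 4727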
Measured data
Now we arrive at probably the most interesting part of the API: here we can get actual measurement data. As we might expect, the complexity of getting this data is similar to the above, with one distinction. Inside the response we receive a list of dictionaries, each containing two key/value pairs (a date and a value), so if we want a nice data frame we have to add an additional transformation. But fear not: it is quite simple and gives us the wanted result immediately:
import pandas as pd

sensorId = 4727
r = requests.get('http://api.gios.gov.pl/pjp-api/rest/data/getData/' + str(sensorId))
concentration = json_normalize(r.json())
concentrationFrame = pd.DataFrame()
concentrationFrame["dates"] = [d[u'date'] for d in concentration["values"].values.item()]
concentrationFrame["values"] = [d[u'value'] for d in concentration["values"].values.item()]
concentrationFrame.set_index(["dates"], inplace=True)
concentrationFrame = concentrationFrame.iloc[::-1]  # reverse so the oldest reading comes first
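As a side note, since the list of dictionaries sits under the 'values' key of the JSON, the same frame can be built more directly (a sketch; the directFrame name is mine, and the column names then follow the JSON, i.e. 'date' and 'value'):

# Build the frame straight from the list of {date, value} dictionaries
directFrame = pd.DataFrame(r.json()['values'])
directFrame.set_index('date', inplace=True)
directFrame = directFrame.iloc[::-1]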
Example results:
                      values
dates
2017-04-16 01:00:00  18.64540
2017-04-16 02:00:00   8.53258
2017-04-16 03:00:00   3.52958
2017-04-16 04:00:00   2.12867
2017-04-16 05:00:00   1.00000
2017-04-16 06:00:00  12.96610
2017-04-16 07:00:00  13.20580
2017-04-16 08:00:00   2.32258
2017-04-16 09:00:00  12.68400
2017-04-16 10:00:00  16.42370
2017-04-16 11:00:00   2.36208
2017-04-16 12:00:00  18.84620
2017-04-16 01:00:00   2.88683
2017-04-16 02:00:00   1.00000
2017-04-16 03:00:00   4.57983
2017-04-16 04:00:00   1.00000
2017-04-16 05:00:00   1.56867
2017-04-16 06:00:00   5.09883
2017-04-16 07:00:00   6.82033
2017-04-16 08:00:00  14.69580
2017-04-16 09:00:00   6.48150
2017-04-16 10:00:00   9.68442
2017-04-16 11:00:00  13.97220
2017-04-17 12:00:00  21.47280
2017-04-17 01:00:00  23.68470
2017-04-17 02:00:00  18.86320
2017-04-17 03:00:00  16.08500
2017-04-17 04:00:00  17.16610
2017-04-17 05:00:00  25.23930
2017-04-17 06:00:00  23.35970
...                       ...
2017-04-17 02:00:00   1.00000
2017-04-17 03:00:00   9.16675
2017-04-17 04:00:00   4.61542
2017-04-17 05:00:00  12.47840
2017-04-17 06:00:00  10.25780
2017-04-17 07:00:00   1.00000
2017-04-17 08:00:00   6.54967
2017-04-17 09:00:00   6.25992
2017-04-17 10:00:00   5.65067
2017-04-17 11:00:00   1.00000
2017-04-18 12:00:00   6.41642
2017-04-18 01:00:00   4.60872
2017-04-18 02:00:00   1.00000
2017-04-18 03:00:00   8.82750
2017-04-18 04:00:00   4.15767
2017-04-18 05:00:00   7.86589
2017-04-18 06:00:00   4.47778
2017-04-18 07:00:00   8.80628
2017-04-18 08:00:00  16.62710
2017-04-18 09:00:00  20.00600
2017-04-18 10:00:00  12.23940
2017-04-18 11:00:00   8.05267
2017-04-18 12:00:00   8.93228
2017-04-18 01:00:00  13.93350
2017-04-18 02:00:00  12.10550
2017-04-18 03:00:00  20.18720
2017-04-18 04:00:00  17.33160
2017-04-18 05:00:00  20.82570
2017-04-18 06:00:00   9.27978
2017-04-18 07:00:00  16.89930

[67 rows x 1 columns]
But as we can expect, there are some gotchas here. If you have a "trained" eye, you probably spotted that the date-time data is written in 12h format without AM/PM distinction. Well, this is because... no such information is provided in the API response. I'm assuming the received data is sorted, so the first 01 is one hour after midnight, and the second occurrence of 01 on the same date corresponds to 13 in 24h format. For now I didn't bother to recalculate it according to this assumption; I'm hoping it gets fixed soon so I don't have to deal with it.
The second gotcha is about the range of the data. The received data points span three calendar days including the current one, so a response contains at most 24 * 3 points. There is no way to modify that range, so if our data-retrieving application crashes and we fail to notice for three days, we end up with a data gap which won't be filled until the yearly data package is released. Also, anyone interested only in the current values will always receive unneeded data, which basically wastes bandwidth. Apart from those little flaws I didn't find other problems. Here's a plot of this data, together with a sketch of the time recalculation described above:
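A minimal sketch under my sortedness assumption: count the occurrences of each (date, hour) pair and shift the second one into the afternoon. The to_24h helper name is mine, matplotlib is assumed to be installed, and this misfires if the three-day window happens to start in the afternoon:

from datetime import datetime

import matplotlib.pyplot as plt

def to_24h(labels):
    """Rebuild 24h timestamps from sorted 12h labels lacking AM/PM."""
    seen = {}
    fixed = []
    for label in labels:
        t = datetime.strptime(label, '%Y-%m-%d %H:%M:%S')
        key = (t.date(), t.hour)
        seen[key] = seen.get(key, 0) + 1
        hour = t.hour % 12  # a first '12' on a given date is taken as midnight
        if seen[key] > 1:
            hour += 12      # second occurrence of an hour -> afternoon
        fixed.append(t.replace(hour=hour))
    return fixed

concentrationFrame.index = to_24h(concentrationFrame.index)
concentrationFrame["values"].plot()
plt.ylabel(u'PM10 concentration')
plt.show()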
Air quality index
The last piece of data we can get from the API is the current air quality index. It doesn't seem very interesting: it just gives the current air quality category for each sensor in a station and the overall air quality for that station. If you would like to see how to access it, I invite you to check my notebook dedicated to operations with the API; it also contains all the mentioned API requests and data processing.
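Accessing it follows exactly the same pattern as the other requests (a quick sketch; the aqindex/getIndex path is my reading of the API, so double-check it against the notebook):

# Current air quality index for the whole station and each of its sensors
r = requests.get('http://api.gios.gov.pl/pjp-api/rest/aqindex/getIndex/' + str(stationId))
airIndex = json_normalize(r.json())
print(airIndex)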
Conclusion
It's great that we can access such valuable and important data through an API. Despite its simplicity and flaws, it still provides a good starting point for analyzing the current air quality situation. If I could add something to this API, it would be the ability to modify the time frame of the measurement data, so users could fill gaps in their copies of the data and analyze different time frames. If only other public government data were this nice...