Thursday, April 20, 2017

Air Quality In Poland #12 - What is this? Is this .... API?

Surprise from GIOS

When I started playing with air quality data from GIOS, the only reasonable way to obtain it was to download pre-packed archives, one for each year. As I publish this post, the latest archive is dated 2015, so no data is available for the whole of 2016 or for 2017 so far.

But recently GIOS, in collaboration with the ePaństwo Foundation, released an initial, experimental version of a RESTful data API. It is very simple, but you don't have to identify yourself with a private key, and there are no significant limitations on how one can use it. Let's see what we can do with it.

Currently, four types of requests are handled, which give us the following data: measurement stations, sensors, measured data and air quality index.

Measurement stations

The first API request we should check is station/findAll. It should return a JSON response with a list of measurement stations and some information about them. The most important field in this response is the top-level id, which contains the id of the station. We will need that value as part of further requests. To fetch the response, parse it into a data frame, and (for example) select an interesting place, we can do these simple operations:
import requests
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
r = requests.get('http://api.gios.gov.pl/pjp-api/rest/station/findAll')
allStations = json_normalize(r.json())
print(allStations[allStations["city.name"] == u"Gdańsk"])
We will receive the following data frame:
addressStreet city city.commune.communeName \
2 ul. Powstańców Warszawskich NaN Gdańsk
14 ul. Kaczeńce NaN Gdańsk
26 ul. Wyzwolenia NaN Gdańsk
67 ul. Leczkowa NaN Gdańsk
75 ul. Ostrzycka NaN Gdańsk
city.commune.districtName city.commune.provinceName city.id city.name \
2 Gdańsk POMORSKIE 218.0 Gdańsk
14 Gdańsk POMORSKIE 218.0 Gdańsk
26 Gdańsk POMORSKIE 218.0 Gdańsk
67 Gdańsk POMORSKIE 218.0 Gdańsk
75 Gdańsk POMORSKIE 218.0 Gdańsk
dateEnd dateStart gegrLat gegrLon id \
2 None 1996-10-01 12:00:00 54.353336 18.635283 729
14 None 1996-10-01 12:00:00 54.367778 18.701111 730
26 None 1998-09-01 12:00:00 54.400833 18.657497 731
67 None 1998-10-01 12:00:00 54.380279 18.620274 736
75 None 1998-05-01 12:00:00 54.328336 18.557781 733
stationName
2 AM1 Gdańsk Śródmieście
14 AM2 Gdańsk Stogi
26 AM3 Gdańsk Nowy Port
67 AM8 Gdańsk Wrzeszcz
75 AM5 Gdańsk Szadółki
In my example I picked the station AM5 Gdańsk Szadółki, which has id 733.
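The same selection can be done on the raw JSON list before it ever reaches pandas. The sketch below is only an illustration: `stations_in_city` is a hypothetical helper, and the sample records are hand-written to mirror the fields visible in the data frame above (they are not a real API response). The nested `city` object is inferred from the `city.name` column, and the all-NaN `city` column suggests that some records may carry no city object at all, so the helper tolerates that case.

```python
def stations_in_city(stations, city_name):
    """Return (id, stationName) pairs for stations whose city name matches."""
    matches = []
    for station in stations:
        # Assumption: "city" may be missing or null for some stations.
        city = station.get("city") or {}
        if city.get("name") == city_name:
            matches.append((station["id"], station["stationName"]))
    return matches


# Hand-written sample shaped like the response described above.
sample_stations = [
    {"id": 733, "stationName": "AM5 Gdańsk Szadółki",
     "city": {"id": 218, "name": "Gdańsk"}},
    {"id": 736, "stationName": "AM8 Gdańsk Wrzeszcz",
     "city": {"id": 218, "name": "Gdańsk"}},
    {"id": 1, "stationName": "Hypothetical station without a city",
     "city": None},
]

print(stations_in_city(sample_stations, "Gdańsk"))
```

With a live response, `r.json()` from the station/findAll request would take the place of `sample_stations`.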

Sensors

Since we have the station id, we can now explore its sensors. The overall idea of sending the request and transforming its response is the same as in the measurement stations example:
stationId = 733
r = requests.get('http://api.gios.gov.pl/pjp-api/rest/station/sensors/' + str(stationId))
sensors = json_normalize(r.json())
print(sensors)
As a result we have a neat data frame with information about the sensors located at this measurement station:
id param.idParam param.paramCode param.paramFormula \
0 4720 8 CO CO
1 4727 3 PM10 PM10
2 4723 6 NO2 NO2
3 4725 5 O3 O3
4 4730 1 SO2 SO2
param.paramName sensorDateEnd sensorDateStart stationId
0 tlenek węgla None 1998-05-01 12:00:00 733
1 pył zawieszony PM10 None 1998-05-01 12:00:00 733
2 dwutlenek azotu None 1998-05-01 12:00:00 733
3 ozon None 1998-05-01 12:00:00 733
4 dwutlenek siarki None 1998-05-01 12:00:00 733
We can easily see that there are 5 sensors located there. Let's then explore data from the PM10 sensor, which has id 4727.
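Instead of reading the sensor id off the printed table, it can be looked up by pollutant code. This is a rough sketch: `sensor_ids_by_code` is a hypothetical helper, and the sample list is hand-written with a nested `param` object inferred from the `param.paramCode` column above.

```python
def sensor_ids_by_code(sensors):
    """Map each pollutant code (param.paramCode) to its sensor id."""
    return {sensor["param"]["paramCode"]: sensor["id"] for sensor in sensors}


# Hand-written sample shaped like the sensors response described above.
sample_sensors = [
    {"id": 4720, "stationId": 733, "param": {"idParam": 8, "paramCode": "CO"}},
    {"id": 4727, "stationId": 733, "param": {"idParam": 3, "paramCode": "PM10"}},
]

pm10_sensor_id = sensor_ids_by_code(sample_sensors)["PM10"]
print(pm10_sensor_id)
```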

Measured data

Now we arrive at probably the most interesting part of the API. We can get actual measurement data here. As we can expect, the complexity of getting this data is similar to the above, with one distinction. We receive a list of dictionaries, and each dictionary contains two key/value pairs. So if we want a nice data frame, we have to add an additional transformation. But fear not - it is quite simple and gives us the wanted results immediately:
sensorId = 4727
r = requests.get('http://api.gios.gov.pl/pjp-api/rest/data/getData/' + str(sensorId))
concentration = json_normalize(r.json())
# The single "values" cell holds the whole list of date/value dictionaries
readings = concentration["values"].values.item()
concentrationFrame = pd.DataFrame()
concentrationFrame["dates"] = [d[u'date'] for d in readings]
concentrationFrame["values"] = [d[u'value'] for d in readings]
concentrationFrame.set_index(["dates"], inplace=True)
# Reverse the rows so that the oldest reading comes first
concentrationFrame = concentrationFrame.iloc[::-1]
Example results:
values
dates
2017-04-16 01:00:00 18.64540
2017-04-16 02:00:00 8.53258
2017-04-16 03:00:00 3.52958
2017-04-16 04:00:00 2.12867
2017-04-16 05:00:00 1.00000
2017-04-16 06:00:00 12.96610
2017-04-16 07:00:00 13.20580
2017-04-16 08:00:00 2.32258
2017-04-16 09:00:00 12.68400
2017-04-16 10:00:00 16.42370
2017-04-16 11:00:00 2.36208
2017-04-16 12:00:00 18.84620
2017-04-16 01:00:00 2.88683
2017-04-16 02:00:00 1.00000
2017-04-16 03:00:00 4.57983
2017-04-16 04:00:00 1.00000
2017-04-16 05:00:00 1.56867
2017-04-16 06:00:00 5.09883
2017-04-16 07:00:00 6.82033
2017-04-16 08:00:00 14.69580
2017-04-16 09:00:00 6.48150
2017-04-16 10:00:00 9.68442
2017-04-16 11:00:00 13.97220
2017-04-17 12:00:00 21.47280
2017-04-17 01:00:00 23.68470
2017-04-17 02:00:00 18.86320
2017-04-17 03:00:00 16.08500
2017-04-17 04:00:00 17.16610
2017-04-17 05:00:00 25.23930
2017-04-17 06:00:00 23.35970
... ...
2017-04-17 02:00:00 1.00000
2017-04-17 03:00:00 9.16675
2017-04-17 04:00:00 4.61542
2017-04-17 05:00:00 12.47840
2017-04-17 06:00:00 10.25780
2017-04-17 07:00:00 1.00000
2017-04-17 08:00:00 6.54967
2017-04-17 09:00:00 6.25992
2017-04-17 10:00:00 5.65067
2017-04-17 11:00:00 1.00000
2017-04-18 12:00:00 6.41642
2017-04-18 01:00:00 4.60872
2017-04-18 02:00:00 1.00000
2017-04-18 03:00:00 8.82750
2017-04-18 04:00:00 4.15767
2017-04-18 05:00:00 7.86589
2017-04-18 06:00:00 4.47778
2017-04-18 07:00:00 8.80628
2017-04-18 08:00:00 16.62710
2017-04-18 09:00:00 20.00600
2017-04-18 10:00:00 12.23940
2017-04-18 11:00:00 8.05267
2017-04-18 12:00:00 8.93228
2017-04-18 01:00:00 13.93350
2017-04-18 02:00:00 12.10550
2017-04-18 03:00:00 20.18720
2017-04-18 04:00:00 17.33160
2017-04-18 05:00:00 20.82570
2017-04-18 06:00:00 9.27978
2017-04-18 07:00:00 16.89930
[67 rows x 1 columns]
But as we can expect, there are some gotchas here. If you have a "trained" eye, you probably spotted that the date-time data is written in 12h format without an AM/PM distinction. Well, this is because ... there is no such information provided in the API response. I'm assuming that the received data is sorted, so the first 01 is one hour after midnight and the second occurrence of 01 on the same date corresponds to 13 in 24h format. For now I didn't bother to recalculate it according to this assumption - I'm hoping this will be fixed soon so I don't have to deal with it.

The second gotcha is about the range of data. Received data points span three calendar days including the current day, so a response contains at most 24 * 3 points. There is no way to modify that range, so if our data-retrieving application crashes and we fail to notice it for three days, we will have a data gap that will not be filled until the yearly data package is released. Also, anyone interested only in current values will always receive unneeded data, which basically wastes bandwidth. Apart from those little flaws I didn't find other problems.

Here's a plot of this data:
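Returning to the first gotcha: the AM/PM assumption described above can be sketched in a few lines. This is only an illustration of that assumption, not a fix endorsed by the API. The function name `to_24h` is hypothetical. It expects timestamps in chronological order, treats 12 as the first hour of each half-day (which matches the output above, where 12:00 follows 11:00), and switches a date to PM as soon as the hour stops increasing. If a gap in the data hides that wrap-around, or the first date starts in the afternoon, the result will be wrong.

```python
from datetime import datetime


def to_24h(stamps):
    """Convert chronologically ordered 12h timestamps (no AM/PM) to datetimes."""
    converted = []
    current_date = None
    last_hour = -1
    is_pm = False
    for stamp in stamps:
        parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        hour = parsed.hour % 12  # 12 becomes 0, the first hour of a half-day
        if parsed.date() != current_date:
            # Assumption: every date starts in the AM half.
            current_date = parsed.date()
            is_pm = False
        elif hour <= last_hour:
            # The hour went backwards within one date: we crossed noon.
            is_pm = True
        last_hour = hour
        converted.append(parsed.replace(hour=hour + (12 if is_pm else 0)))
    return converted


sample_stamps = [
    "2017-04-16 11:00:00",
    "2017-04-16 12:00:00",
    "2017-04-16 01:00:00",
    "2017-04-16 11:00:00",
    "2017-04-17 12:00:00",
    "2017-04-17 01:00:00",
]
converted = to_24h(sample_stamps)
for original, fixed in zip(sample_stamps, converted):
    print(original, "->", fixed.isoformat(sep=" "))
```

Applied to the data frame above, the input would be the index values after the rows have been reversed into chronological order.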

Air quality index

The last data we can get from the API is the current air quality. It doesn't seem very interesting - it just gives the current air quality category for each sensor in a station and the overall air quality for that station. If you'd like to see how to access it, I invite you to check my notebook dedicated to operations with the API. It also contains all the mentioned API requests and data processing.

Conclusion

It's great that we can access such valuable and important data through an API. Despite its simplicity and flaws, it still provides a good starting point for analysis of the current air quality situation. If I could add something to the API, I would make the time frame for measurement data adjustable, so users could fill the gaps in their copies of the data and analyze different time frames. If only other public government data were so nice...
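As long as the time frame is fixed, a polling client has to keep its own history and merge every new response into it. The following is a minimal sketch of that idea, with several assumptions: `merge_readings` is a hypothetical helper, readings are held as plain dictionaries keyed by timestamp, and the timestamps have already been disambiguated into 24h form (with the raw 12h strings, morning and afternoon readings would collide on the same key). The values are illustrative.

```python
def merge_readings(stored, fresh):
    """Merge fresh readings into stored ones; fresh values win on overlap."""
    merged = dict(stored)
    merged.update(fresh)
    # 'YYYY-MM-DD HH:MM:SS' strings in 24h form sort chronologically as text.
    return dict(sorted(merged.items()))


stored = {
    "2017-04-16 13:00:00": 2.88683,
    "2017-04-16 14:00:00": 1.0,
}
fresh = {
    "2017-04-16 14:00:00": 1.0,
    "2017-04-16 15:00:00": 4.57983,
}
history = merge_readings(stored, fresh)
print(history)
```

Run more often than once per three days, such a merge keeps the local copy free of gaps as long as the polling job itself does not stay down for longer than the window.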
