As I promised in the last post, I should add a calculation of overall air quality for each time point. This quality is defined by the worst category among all measured pollutants. To determine it, we need to examine each row and assign the proper category based on the descriptive values:
for quality in qualities:
    reducedDataFrame.loc[(reducedDataFrame[["C6H6.desc", "CO.desc", "NO2.desc", "O3.desc", "PM10.desc",
        "PM25.desc", "SO2.desc"]] == quality).any(axis=1), "overall"] = quality
It might not be the optimal procedure, but it seems to be quite fast, at least on the reduced data frame. Since our qualities are sorted from best to worst, a worse value found in a later iteration overwrites the previous value in the overall column:
qualities = sorted(descriptiveFrame.index.get_level_values(1).unique().tolist())
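To make the overwrite trick easier to follow, here is a minimal sketch on a toy data frame. The frame, its two columns and the three category labels are made up for illustration; only the loop itself mirrors the code above. It relies on the labels starting with a number, so that sorting them puts the best category first.

```python
import pandas as pd

# Hypothetical toy data: two pollutants, four hourly readings.
qualities = sorted(["2 Good", "3 Moderate", "1 Very good"])
frame = pd.DataFrame({
    "NO2.desc": ["1 Very good", "2 Good", "1 Very good", None],
    "PM10.desc": ["3 Moderate", "1 Very good", "1 Very good", None],
})
desc_columns = ["NO2.desc", "PM10.desc"]

# Go from best to worst; a later (worse) category overwrites an earlier one.
for quality in qualities:
    mask = (frame[desc_columns] == quality).any(axis=1)
    frame.loc[mask, "overall"] = quality

print(frame["overall"].tolist())
```

Rows where no pollutant has a category (the last row here) never match any mask, so their overall value stays missing.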
After generating the additional column, we also need to concatenate it with the descriptive data frame:
overall = reducedDataFrame.groupby(level="Station")["overall"].value_counts(
    dropna=False).apply(lambda x: (x/float(hours))*100)
descriptiveFrame = pd.concat([descriptiveFrame, overall], axis=1)
descriptiveFrame.rename(columns={0: "overall"}, inplace=True)
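The percentage step can be sketched on a small series as well. The station names, the readings and the value of hours below are hypothetical, and I assume that every station has exactly hours rows, which is what dividing by a single constant implies.

```python
import pandas as pd

# Hypothetical toy data: two stations with four hourly readings each.
index = pd.MultiIndex.from_product(
    [["StationA", "StationB"], range(4)], names=["Station", "Hour"]
)
overall = pd.Series(
    ["2 Good", "2 Good", "3 Moderate", "1 Very good",
     "1 Very good", "2 Good", "2 Good", "2 Good"],
    index=index, name="overall",
)
hours = 4  # rows per station in this toy example

# Count each category per station, then turn the counts into percentages.
counts = overall.groupby(level="Station").value_counts(dropna=False)
share = counts / float(hours) * 100

print(share)
```

Dividing the whole series of counts by hours gives the same numbers as applying the lambda to each element, and the shares of each station should add up to 100.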
And what are the results?
LuZarySzyman NaN NaN
1 Very good 9.601553
2 Good 57.266811
3 Moderate 26.955132
4 Sufficient 3.482133
5 Bad 0.890513
6 Very bad 0.308254
MzLegZegrzyn NaN NaN
1 Very good 1.255851
2 Good 50.941888
3 Moderate 31.693116
4 Sufficient 8.425619
5 Bad 3.950223
6 Very bad 2.580203
MzPlocKroJad NaN NaN
1 Very good 21.965978
2 Good 60.806028
3 Moderate 15.983560
4 Sufficient 0.947597
5 Bad 0.102751
6 Very bad 0.011417
OpKKozBSmial NaN NaN
1 Very good 2.922708
2 Good 54.446855
3 Moderate 30.117593
4 Sufficient 6.302089
5 Bad 4.144309
6 Very bad 2.009362
PmStaGdaLubi NaN NaN
1 Very good 43.155611
2 Good 38.075123
3 Moderate 12.204590
4 Sufficient 3.539217
5 Bad 1.758192
6 Very bad 1.164516
SlRybniBorki NaN NaN
1 Very good 1.541272
2 Good 56.444800
3 Moderate 27.662975
4 Sufficient 6.781596
5 Bad 3.242379
6 Very bad 3.014043
Name: overall, dtype: float64
It seems that the share of very bad data points in Rybnik is unchanged. But, for example, the OpKKozBSmial station has 2.009362 percent of very bad data points overall, while the individually worst pollutant there accounts for 1.198767 percent of very bad air quality time. So it seems that other pollutants are also significant, which is true: their values there are 0.913346 and 0.399589 percent.
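A quick sanity check on the OpKKozBSmial numbers: the overall share of very bad hours cannot be smaller than the largest single-pollutant share and cannot be larger than the sum of all of them. The sketch below uses only the three values quoted above; the variable names are mine.

```python
# Shares of "very bad" hours (percent) quoted above for OpKKozBSmial.
individual_shares = [1.198767, 0.913346, 0.399589]
overall_share = 2.009362

# The overall share is a union of hours, so it sits between these two bounds.
lower_bound = max(individual_shares)
upper_bound = sum(individual_shares)

# Whatever the sum exceeds the overall share by was counted more than once.
double_counted = upper_bound - overall_share

print(round(upper_bound, 6), round(double_counted, 6))
```

The sum is about 2.51 percent, so roughly 0.5 percentage points come from hours in which more than one pollutant was in the very bad category at the same time.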
Next post: looking for the place with the best air quality in Poland. I hope that my laptop will not explode.