Wednesday, March 15, 2017

Air Quality In Poland #04

Hey! Welcome to the 4th post dedicated to messing with air quality data for Poland. In the last post we prepared the data files for easy manipulation. Now it is time to actually analyze something.

For the first analysis I decided to look for the most extreme values of each of the most important pollutants across the year 2015. So basically I will look for dead zones across Poland ;). And what are those important pollutants? In Poland, air quality is calculated as the worst of the classes assigned to "PM10", "PM25", "O3", "NO2", "SO2", "C6H6" and "CO".

In order to find such places we need to locate the proper files. To do that we need a simple selection from the data frame:

 importantPollutants = ["PM10", "PM25", "O3", "NO2", "SO2", "C6H6", "CO"]  
 pollutants2015 = dataFiles[(dataFiles["year"] == "2015") & (dataFiles["resolution"] == "1g") &   

which gives us a much smaller data frame listing only the interesting files.
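The selection above is cut off in this copy after the second condition. Here is a minimal sketch of how such a filter could be finished, assuming the third condition keeps only rows whose "pollutant" column is in importantPollutants (that condition is my guess, not the original line). The small dataFiles frame and its file names are made up so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the dataFiles frame built in the previous post;
# the file names below are invented for illustration only.
dataFiles = pd.DataFrame({
    "filename": ["2015_PM10_1g", "2015_PM10_24g", "2014_PM10_1g", "2015_Pb_1g"],
    "year": ["2015", "2015", "2014", "2015"],
    "pollutant": ["PM10", "PM10", "PM10", "Pb"],
    "resolution": ["1g", "24g", "1g", "1g"],
})

importantPollutants = ["PM10", "PM25", "O3", "NO2", "SO2", "C6H6", "CO"]

# Assumed third condition: keep only the pollutants from the list above
pollutants2015 = dataFiles[
    (dataFiles["year"] == "2015")
    & (dataFiles["resolution"] == "1g")
    & (dataFiles["pollutant"].isin(importantPollutants))
]
print(pollutants2015)
```

Each condition sits in its own parentheses because `&` binds tighter than `==` in Python, so leaving them out would change what gets compared.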

Since we have the relevant list of files, we can write a simple (and probably not super efficient) loop over them, which will find the maximum value of each pollutant and the corresponding measurement station. It looks like this:

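 import pandas as pd  
 from tqdm import tqdm  
 worstStation = {}  
 for index, dataRow in tqdm(pollutants2015.iterrows(), total=len(pollutants2015.index)):  
   # skiprows is 0-indexed, so this drops the second and third rows of each sheet  
   dataFromFile = pd.read_excel("../input/" + dataRow["filename"] + ".xlsx", skiprows=[1,2])  
   # Rename the first column from "Kod stacji" to "Godzina" and use it as the index  
   dataFromFile = dataFromFile.rename(columns={"Kod stacji":"Godzina"})  
   dataFromFile = dataFromFile.set_index("Godzina")  
   # Per-station maxima, sorted descending; the first label is the worst station  
   worstStation[dataRow["pollutant"]] = dataFromFile.max().sort_values(ascending = False).index[0]  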
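The key step, picking the station with the highest value, can be tried on a tiny in-memory frame without reading any Excel files. This is only a sketch: the station names and readings are invented, and `idxmax()` is shown as a shorter alternative that I would expect to give the same answer as the sort-based version.

```python
import pandas as pd

# Made-up hourly readings; station names and values are purely illustrative
dataFromFile = pd.DataFrame(
    {
        "StationA": [10.0, 55.0, 20.0],
        "StationB": [30.0, 40.0, 90.0],
        "StationC": [5.0, None, 15.0],  # a missing reading, skipped by max()
    },
    index=pd.Index(
        ["2015-01-01 01:00", "2015-01-01 02:00", "2015-01-01 03:00"],
        name="Godzina",
    ),
)

# One maximum per station (column)
columnMaxima = dataFromFile.max()

# Approach used in the loop: sort descending and take the first label
worstBySort = columnMaxima.sort_values(ascending=False).index[0]

# Alternative: ask directly for the label of the largest value
worstByIdxmax = columnMaxima.idxmax()

print(worstBySort, worstByIdxmax)
```

Both approaches return a column label, so the dictionary in the loop ends up storing station codes rather than the measured values.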

The loop over the files takes about 2 minutes on my ThinkPad X200s and produces just a dictionary with pollutants as keys and station codenames as values. We can easily count how often each station occurs and see the worst "dead zone":

 Counter({u'LuZarySzyman': 1,  
      u'MzLegZegrzyn': 1,  
      u'MzPlocKroJad': 1,  
      u'OpKKozBSmial': 1,  
      u'PmStaGdaLubi': 1,  
      u'SlRybniBorki': 2})  
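A count like the one above can be obtained with `collections.Counter` applied to the dictionary values. The snippet below is a sketch with a made-up dictionary (placeholder station names, arbitrary pollutant assignments), not the real result of the loop.

```python
from collections import Counter

# Hypothetical loop result: pollutant -> station with the highest reading.
# The station names are placeholders, not real station codes.
worstStation = {
    "PM10": "StationA",
    "PM25": "StationA",
    "O3": "StationB",
    "NO2": "StationC",
}

# Count how many pollutants each station "wins"
stationCounts = Counter(worstStation.values())

# The most frequent station is the candidate "dead zone"
print(stationCounts.most_common(1))
```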

Since "SlRybniBorki" doesn't say much, we must consult the "Metadane_wer20160914.xlsx" file, which lets us decode this station as "Rybnik, ul. Borki 37 d". Whoever lives there, I feel sorry for you!
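If you would rather not look the code up by hand, the decoding can be done with a small lookup table. This is just a sketch: the column names "stationCode" and "address" are hypothetical, and the one-row frame stands in for the real metadata sheet, whose actual layout I am not assuming here.

```python
import pandas as pd

# Tiny stand-in for the metadata sheet; the column names are hypothetical
stationMetadata = pd.DataFrame({
    "stationCode": ["SlRybniBorki"],
    "address": ["Rybnik, ul. Borki 37 d"],
})

# Series indexed by station code, so a code maps straight to its address
codeToAddress = stationMetadata.set_index("stationCode")["address"]

print(codeToAddress["SlRybniBorki"])
```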

That's all for today, thanks for reading! ;)
