Saturday, March 4, 2017

Air Quality In Poland #01

Dear reader! Welcome to my first post tagged "Get Noticed 2017!". As you can see, this is not the first post on this blog. But as you can also see, I don't publish regularly. This is my attempt to change that. By that I mean publishing at least two posts a week: one dedicated to my open source project and another related to IT in general. By doing so, I will fulfill the requirements of the "Get Noticed!" competition, so I will try to kill two birds with one stone. Or something like that. I was inspired to compete in "Get Noticed!" by Michał and encouraged by Wojtek, who are both competing in it as well. We will see how it goes.

[Figure: It looks like there is a difference in air quality between the two central points ...]

What open source project will I develop here? Since there were no restrictions on technology, programming language, topic or purpose, I decided to perform exploratory data analysis on air quality data for Poland. What exactly do I mean by that? I would like to build a Jupyter Notebook in which I will carry out the analysis step by step. My goal is to build a usable analysis that is worthwhile, scientifically correct and sound from an engineering standpoint. And it will be fully reproducible, of course!

[Figure: ... but the yellow point shows only data for PM10 ...]

My first step will be downloading and preparing GIOŚ air pollution data. I will try to estimate missing values, so that every point on the map has estimates for the various pollutants. Then I will work on visualizing those estimates. The next step will be dedicated to gathering and discussing various facts related to the analysis. Once I complete these steps, I will try to build a predictive model for finding the best time for outdoor physical activity in various places in Poland. Those are my initial ideas for the project's development.
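
To give a rough idea of what "estimating missing values" could look like, here is a minimal sketch using inverse distance weighting (IDW), where a missing reading is estimated from nearby stations that do measure the pollutant. This is only one possible approach and not necessarily the one the project will end up using. The function names, the `power` parameter and the station coordinates and values are all made up for illustration; they are not real GIOŚ data.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def idw_estimate(target, stations, power=2.0):
    """Estimate a pollutant value at `target` (lat, lon).

    `stations` is a list of (lat, lon, value) tuples; a value of None
    means the station does not measure this pollutant and is skipped.
    Returns None when no station has a usable value.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, value in stations:
        if value is None:
            continue  # never treat a missing reading as zero
        distance = haversine_km(target[0], target[1], lat, lon)
        if distance == 0.0:
            return value  # target sits exactly on a measuring station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Hypothetical stations: (latitude, longitude, PM2.5 reading or None)
stations = [
    (50.0, 19.0, 30.0),
    (51.0, 20.0, 60.0),
    (50.2, 19.3, None),  # this station has no PM2.5 sensor
]
estimate = idw_estimate((50.2, 19.3), stations)
print(f"Estimated PM2.5 at the station without a sensor: {estimate:.1f}")
```

Closer stations get a larger weight, so the estimate always stays between the lowest and highest neighbouring readings. IDW ignores wind, terrain and emission sources, so it should be treated as a baseline at best.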

[Figure: ... and the orange one also shows PM2.5. Is the air really better at the yellow one?]

Which technologies will I use? I'm a big fan of Python, so I will use it exclusively. I'm thinking about using the Pandas, NumPy, Matplotlib and Folium modules, but this list will probably change over time. My product will take the form of a Jupyter Notebook, but I may not restrict myself to only one Notebook. If my work goes well, I might consider building standalone Python scripts to perform some parts of the analysis. Time will tell what ideas pop up.

What was my motivation for picking this project idea? Recently in Poland we have had discussions about poor air quality around a couple of big cities known for their heavy industry. I found that some people draw wrong conclusions from the data points. The worst case was when a reporter compared the air quality index (AQI) at two points not far from each other. One point was "safe" and the other "unsafe", and the reporter clearly highlighted that. But the problem with the "safe" point was that the measuring station there wasn't measuring all pollutants, so the system probably filled the missing values with zeros. This leads to a situation where someone thinks they are relaxing in a "safe" air quality zone, while actually they know nothing about it. This problem pushed me to think about a better way to estimate air quality at points where no direct measurements are available. I don't have any experience with air quality or geospatial data, but I think I will be able to perform at least some basic analysis and produce some useful pieces of code.
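
The zero-filling problem described above can be shown with a small sketch. It compares an index computed with missing readings replaced by zeros against one that refuses to give a verdict when a pollutant is not measured. Everything here is an assumption made for illustration: the limit values, the "worst ratio" index and the readings are hypothetical and are not the official AQI formula or real station data.

```python
# Hypothetical concentration limits, for illustration only.
LIMITS = {"PM10": 50.0, "PM2.5": 25.0}


def index_zero_filled(readings):
    """Worst reading-to-limit ratio, with missing readings treated as 0."""
    ratios = []
    for pollutant, limit in LIMITS.items():
        value = readings.get(pollutant)
        if value is None:
            value = 0.0  # the suspected flaw: missing becomes "perfectly clean"
        ratios.append(value / limit)
    return max(ratios)


def index_missing_aware(readings):
    """Return (index, missing pollutants); index is None if anything is missing."""
    missing = [p for p in LIMITS if readings.get(p) is None]
    if missing:
        return None, missing
    return max(readings[p] / LIMITS[p] for p in LIMITS), []


# Made-up readings for the two points from the pictures.
yellow = {"PM10": 40.0, "PM2.5": None}   # no PM2.5 sensor
orange = {"PM10": 45.0, "PM2.5": 60.0}

for name, readings in (("yellow", yellow), ("orange", orange)):
    naive = index_zero_filled(readings)
    aware, missing = index_missing_aware(readings)
    verdict = "safe" if naive < 1.0 else "unsafe"
    print(f"{name}: zero-filled index {naive:.2f} ({verdict}), "
          f"missing-aware index {aware}, not measured: {missing}")
```

With zero-filling, the yellow point looks "safe" only because its PM2.5 value was never measured. The missing-aware version reports that no full verdict is possible, which is the gap an estimate from nearby stations would need to fill.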

I also hope that participating in the "Get Noticed 2017!" competition will be a lot of fun and possibly bring some feedback on my work. Michał is very enthusiastic about it, so I need to at least try to maintain my development and writing momentum and see what I will build. Last but not least: source code for my project is located here.

Ideas to implement:
  • Fill missing values for measuring stations
  • Interpolate pollutant values over Poland
Random ideas:
  • Scrape current data from the GIOŚ server
Ideas graveyard:
  • Build mobile application based on my work
