Dear reader! Welcome to my first post tagged "Get Noticed 2017!". As you can see, this is not the first post on this blog. But as you can also see, I don't publish regularly. This is my attempt to change that. By that I mean publishing at least two posts a week: one dedicated to my open source project and another related to IT. By doing so, I will fulfill the requirements of the "Get Noticed!" competition, so I will try to kill two birds with one stone. Or something like that. I was inspired to compete in "Get Noticed!" by Michał and encouraged by Wojtek, who are both competing in it too. We will see how it goes.
It looks like there is a difference in air quality between two central points ...
Which open source project will I develop here? Since there were no restrictions on technology, programming language, topic or purpose, I decided to perform exploratory data analysis on air quality data for Poland. What exactly do I mean by that? I would like to build a Jupyter Notebook in which I will carry out the analysis step by step. My goal is to build a usable analysis that is worthwhile, scientifically correct and sound from an engineering standpoint. And it will be fully reproducible, of course!
... but the yellow point shows only data for PM10 ...
... and the orange one also shows PM2.5. Is the air really better at the yellow one?
What was my motivation for picking this project idea? Recently in Poland we have had some discussions about poor air quality around a couple of big cities known for their heavy industry. I found that some people draw wrong conclusions from the data points. The worst case was when a reporter compared two points that are not far apart and the air quality index (AQI) at each of them. One point was "safe" and the other "unsafe", and the reporter clearly highlighted that. But the problem with the "safe" point was that the measuring station there wasn't measuring all pollutants, so the system probably filled the missing values with zeros. This leads to a situation where someone thinks they are relaxing in a "safe" air quality zone, while they actually know nothing about it. This problem got me thinking about a better way to estimate air quality at points where no direct measurements are available. I don't have any experience working with air quality or geospatial data, but I think I will be able to perform at least some basic analysis and produce some useful pieces of code.
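To make the problem concrete, here is a minimal sketch of how zero-filling can make a station look "safe". It rests on assumptions of mine: I don't know how the real system computes its index, so I assume the overall index is simply the worst per-pollutant value, and the station names, the numbers and both helper functions are made up for illustration.

```python
from typing import Dict, Optional, Sequence

# Hypothetical readings (higher = worse air). The values are invented;
# None means the station does not measure that pollutant at all.
stations: Dict[str, Dict[str, Optional[float]]] = {
    "yellow": {"PM10": 40.0, "PM2.5": None},
    "orange": {"PM10": 45.0, "PM2.5": 90.0},
}


def index_zero_filled(readings: Dict[str, Optional[float]]) -> float:
    """Assumed faulty behaviour: replace missing values with 0, take the worst."""
    return max(0.0 if value is None else value for value in readings.values())


def index_missing_aware(
    readings: Dict[str, Optional[float]],
    required: Sequence[str] = ("PM10", "PM2.5"),
) -> Optional[float]:
    """Return None (unknown) if any required pollutant is not measured."""
    if any(readings.get(pollutant) is None for pollutant in required):
        return None
    return max(readings[pollutant] for pollutant in required)


for name, readings in stations.items():
    print(name, index_zero_filled(readings), index_missing_aware(readings))
```

With zero-filling, the yellow station gets a better score than the orange one only because PM2.5 is not measured there. The missing-aware version reports "unknown" for it instead, which is the honest answer.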
I also hope that participating in the "Get Noticed 2017!" competition will be a lot of fun and possibly bring me some feedback on my work. Michał is very enthusiastic about it, so I need to at least try to keep up the development and writing momentum and see what I end up building. Last but not least: source code for my project is located here.
Ideas to implement:
- Fill missing values for measuring stations
- Interpolate pollutant values over Poland
Random ideas:
- Scrape current data from the GIOS server
Ideas graveyard:
- Build mobile application based on my work