Can we predict air quality?
One of the fields covered by machine learning is the prediction of future values based on time series. Those values might be, for example, stock market prices, number of products sold, temperature, clients visiting a store and so on. The goal is to predict values for future time points based on historical data. The problem here is that the available data consist only of a time point and a value, without any additional information. In other words, in this type of problem you have to predict the shape of a plot having only the plot (and the actual data frame) of historical data. How to do that? I don't have many ideas, but maybe a naive, brute force approach will give me something.
Yes, we can!
What kind of brute force do I have in mind? Well, even if the only data we have are the pollutant measurement values and the date and time over which they were averaged, we can still treat it as a regression problem ... with a very small number of features. Actually there will be only one feature: the number of hours since the first available measurement. So in addition to the date-time and value columns I will add a calculated relative time column. That was the "naive" part. Time for brute force.
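To make the single feature concrete, here is a minimal sketch of the relative time calculation using only the standard library. The function name `hours_since_first` and the timestamps are made up for illustration; they are not taken from my actual data set.

```python
from datetime import datetime


def hours_since_first(timestamps):
    """Return, for each timestamp, the number of hours since the earliest one."""
    start = min(timestamps)
    return [(ts - start).total_seconds() / 3600.0 for ts in timestamps]


# Made-up measurement times, for illustration only.
stamps = [
    datetime(2018, 1, 1, 0),
    datetime(2018, 1, 1, 1),
    datetime(2018, 1, 1, 5),
]
relative_hours = hours_since_first(stamps)
print(relative_hours)
```

The resulting list is the only input column the regressor will see; the measured values are the target.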
In this case, by the "brute force" approach I mean pushing the data into TPOTRegressor and seeing what the result will be. It is quite lazy and not too smart, but since I don't have too much time now, it has to be enough.
After about 10 minutes of model mutating we can use it to predict values for the next 24 hours and plot them to see if it makes any sense.
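Roughly, the fit-and-forecast step could look like the sketch below. It is not a copy of my notebook: the helper names `future_hours` and `fit_and_forecast` are hypothetical, `max_time_mins=10` is only my guess at matching the "about 10 minutes" budget, and `random_state=42` is an arbitrary choice. TPOT and NumPy are imported inside the function so the small helper works even without them installed.

```python
def future_hours(last_hour, horizon=24):
    """Relative-time values for the `horizon` hours after the last measurement."""
    return [last_hour + step for step in range(1, horizon + 1)]


def fit_and_forecast(hours, values, horizon=24, max_time_mins=10):
    """Fit TPOTRegressor on one feature (relative hours) and predict ahead.

    Assumes `hours` is sorted ascending and TPOT is installed.
    """
    # Imported lazily: TPOT is a heavy, optional dependency.
    import numpy as np
    from tpot import TPOTRegressor

    X = np.asarray(hours, dtype=float).reshape(-1, 1)  # single feature column
    y = np.asarray(values, dtype=float)

    # Time budget and seed are assumptions, not values from the original run.
    model = TPOTRegressor(max_time_mins=max_time_mins, random_state=42)
    model.fit(X, y)

    X_future = np.asarray(
        future_hours(hours[-1], horizon), dtype=float
    ).reshape(-1, 1)
    return model.predict(X_future)
```

Calling `fit_and_forecast(relative_hours, measured_values)` would return 24 predicted values, one per future hour, ready to be plotted next to the history.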
Well ... it's something.
As you can see on the plot above, we were able to generate "some" predictions of future pollutant concentration. Are they valid? I don't know, because I didn't bother with much cross-validation or comparison with actual measurements. But even without them, we can see that the shape of the predictions is not what it is supposed to be. I know that this approach doesn't make much sense, but I wanted to test how quickly I could make at least a small step toward time series prediction. If you would like to check my code - here it is.
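For the comparison I skipped, one simple option would be to hold out the last 24 hours, train on the rest, and measure the error on the held-out part. The sketch below only shows the split and the error metric; both function names are hypothetical and nothing here was actually run on my data.

```python
def chronological_split(hours, values, holdout=24):
    """Split a time-ordered series into a training part and the last `holdout` points."""
    if holdout <= 0 or holdout >= len(hours):
        raise ValueError("holdout must be between 1 and len(hours) - 1")
    train = (hours[:-holdout], values[:-holdout])
    test = (hours[-holdout:], values[-holdout:])
    return train, test


def mean_absolute_error(actual, predicted):
    """Average absolute difference between measured and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

The split is chronological on purpose: shuffling the rows would let the model see measurements from both sides of the hours it is asked to predict, which makes the error look better than it really is.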