In this blog post, our data scientist Andrew Rowlinson explains why successful machine learning is not a one-off project, but a long-term roadmap to make your business more efficient through data. Machine learning generally finds patterns in historical data. You then use these patterns learned from the old data to make predictions with your new data. In the real world, new data is constantly changing and responding to our actions. This means machine learning models have to be actively maintained to keep up.
At Bluugo, we provide Machine Learning-as-a-Service (MLaaS) so you can keep your models up to date and accurate. Our systems monitor your data and predictions, and we can retrain the models automatically or step in manually when necessary. Our Machine Learning Kickstart Package will give you a comprehensive understanding of how to improve your business with machine learning in a few weeks.
This change in the data over time is called concept drift. One concrete example we ran into was the unusually mild winter weather in Finland at the beginning of 2020.
We run an award-winning solution at Helsinki-Vantaa Airport to predict flight delay times. Generally, the weather in Helsinki is cold, snowy, and icy in the winter, but this year it was unusually mild. You would expect that with milder weather conditions, there would be fewer delays. However, the model may have learned that flight delays are higher in December and January, rather than associating the higher delays with the colder weather conditions. To ensure the models stay accurate, we had to retrain them with the latest data. This helps because the training data now contains similar examples of mild winter conditions to learn from.
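The month-versus-weather effect can be illustrated with a toy sketch. It is not the model used at the airport: the history rows, the freezing threshold, and the helper names `fit_group_means` and `is_freezing` are all invented for illustration. Each "model" simply predicts the average historical delay for its group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history: (month, temperature in °C, delay in minutes).
# All numbers are invented for illustration.
history = [
    (1, -12, 30), (1, -8, 26), (12, -10, 28), (12, -6, 24),
    (4, 5, 10), (7, 20, 8), (10, 6, 11), (5, 12, 9),
]


def fit_group_means(rows, key):
    """Return the mean delay for each group produced by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {group: mean(delays) for group, delays in groups.items()}


def is_freezing(row):
    """Group rows by whether the temperature is below 0 °C (an assumed threshold)."""
    return row[1] < 0


# One "model" keys on the calendar month, the other on the weather.
by_month = fit_group_means(history, key=lambda row: row[0])
by_weather = fit_group_means(history, key=is_freezing)

# A mild January day at 4 °C, which the history has never seen.
mild_january = (1, 4, None)
month_prediction = by_month[mild_january[0]]
weather_prediction = by_weather[is_freezing(mild_january)]

print(f"month-based: {month_prediction} min, weather-based: {weather_prediction} min")
```

In this toy data every January was freezing, so the month-based lookup keeps predicting long delays on a mild day, while the weather-based lookup predicts short ones. Adding recent mild-winter rows to the history and refitting is the equivalent of the retraining described above.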
A tweet by Peter Skomoroch / Jordan Smoller.
You also have to think carefully about the historical data used to train the machine learning models. The outbreak of coronavirus has meant that 90% of flights in Finland are cancelled. This will directly impact flight delays. These are highly unusual times. In the future, if and when flight services return to normal, we will need to investigate how to use data from this period, as its delay patterns are likely to be very different from normal. For example, we might want to exclude it from the training process or weight it to reduce its importance.
Data changes rapidly almost everywhere machine learning is applied. Here are some examples:
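Excluding or down-weighting an unusual period could look roughly like the sketch below. The dates, delay values, and the helper names `period_weights` and `weighted_mean` are hypothetical. A weighted average stands in for a real model here; many training libraries accept per-sample weights in a similar way (for example, the `sample_weight` argument that many scikit-learn estimators take in `fit`).

```python
from datetime import date


def period_weights(dates, start, end, weight_inside=0.1):
    """Return one training weight per date.

    Dates within [start, end] get weight_inside, all others get 1.0.
    A weight_inside of 0.0 excludes the period entirely.
    """
    if not 0.0 <= weight_inside <= 1.0:
        raise ValueError("weight_inside must be between 0 and 1")
    return [weight_inside if start <= d <= end else 1.0 for d in dates]


def weighted_mean(values, weights):
    """Weighted average, used here as a stand-in for a trained model."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical flights and delays in minutes; the middle two fall in the unusual period.
flight_dates = [date(2020, 2, 10), date(2020, 4, 10), date(2020, 5, 10), date(2021, 2, 10)]
delays = [20.0, 2.0, 4.0, 22.0]

# Assumed boundaries for the unusual period, chosen only for this example.
unusual_start, unusual_end = date(2020, 3, 15), date(2020, 12, 31)

excluded = period_weights(flight_dates, unusual_start, unusual_end, weight_inside=0.0)
downweighted = period_weights(flight_dates, unusual_start, unusual_end, weight_inside=0.1)

print(weighted_mean(delays, excluded))      # unusual period ignored
print(weighted_mean(delays, downweighted))  # unusual period counts a little
```

Down-weighting keeps some signal from the unusual period without letting it dominate; which option works better is something to test on held-out data rather than assume.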
- You are predicting house prices. The government implements a scheme to help first-time buyers, and house prices rise in response. The old data on house prices now differs from the new data.
- You deploy predictive maintenance for your machinery. Your maintenance staff now respond proactively to prevent machinery from breaking down. The old data on breakdowns now differs from the current data, as there are fewer unplanned maintenance activities.
- You are predicting spam posts on a forum. After you implement the model, the people posting spam change their behaviour to avoid your spam filter, and your model no longer works.
Here is another great example on the subject by Ville Tuulos from Netflix, where he talks about hypothetically losing millions of dollars because of missing data on Mexico (the part starts at 09:12).
The solution is to audit your data so you know when it changes, audit your models so you know how well they are performing, and retrain them when they are underperforming. There is a great technical paper by Google developers about hidden maintenance costs in machine learning systems. It highlights that deploying machine learning systems is relatively fast and cheap, but maintaining them over time takes effort.
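The audit-and-retrain loop can be sketched in a few lines. This is a minimal illustration, not a description of any production system: the function names, the 1.25 tolerance, and all the numbers are assumptions, and real monitoring would usually track many features and metrics over time.

```python
from statistics import mean, stdev


def drift_score(training_values, recent_values):
    """Data audit: how far the recent mean has moved from the training mean,
    measured in training standard deviations."""
    spread = stdev(training_values)
    if spread == 0:
        raise ValueError("training values have no spread")
    return abs(mean(recent_values) - mean(training_values)) / spread


def mean_absolute_error(actual, predicted):
    """Model audit: average size of the prediction errors."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def should_retrain(baseline_error, recent_error, tolerance=1.25):
    """Flag retraining when the recent error exceeds the baseline by the tolerance factor."""
    return recent_error > baseline_error * tolerance


# Hypothetical winter temperatures (°C) seen in training versus this winter.
training_temps = [-10, -8, -12, -6, -9]
recent_temps = [2, 4, 3]
score = drift_score(training_temps, recent_temps)

# Hypothetical recent delays (minutes) versus what the model predicted.
recent_error = mean_absolute_error([10, 12, 8], [25, 27, 26])
retrain = should_retrain(baseline_error=8.0, recent_error=recent_error)

print(f"drift score: {score:.1f}, recent error: {recent_error} min, retrain: {retrain}")
```

The data check can raise a warning as soon as inputs start to look different, while the model check needs the actual outcomes to arrive first, so the two audits complement each other.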
Machine Learning is a powerful tool that can be used in countless ways to create a competitive advantage for companies across all industries.
If you’re interested in exploring the possibilities and potential of machine learning for your business case, take a look at our Machine Learning Kickstart Package – a completely risk-free way to get started in a few weeks.