Machine learning enables many innovative and nontraditional ways to improve business processes, but executing these projects often proves harder than anticipated. Find out what the most common pitfalls of machine learning projects are and how to avoid them.
Think before you do
Machine learning is a field that uses statistical models and algorithms to perform a specific task without explicit instructions, relying on patterns and inference instead. When executed properly, it can be applied to many business cases to create added value in innovative, nontraditional ways. Many companies are showing interest in these cutting-edge solutions and are eager to get started as fast as possible to gain an edge over their competitors. However, before diving head-first into a new project and running pilots, it is crucial to understand what it takes to embed these powerful tools in your business landscape. Otherwise there's a risk that you end up repeating the age-old IT project anecdote: multiplying your budget by pi.
The process of applying machine learning to business use from start to finish usually consists of 5 key phases:
- Defining KPIs
- Collecting Data
- Developing infrastructure
- Optimizing models
- Integrating the solution
To really understand the outline of a machine learning project, it is important to understand each phase individually: what type of expertise is needed, and how much time and effort does each phase require? Answering these two questions is pivotal to understanding what the ROI of a project can be.
Ville Tuulos, a Machine Learning Architect from Netflix, shared some great insights about the execution and effort allocation of machine learning projects at North Star AI’s Applied Machine Learning & Data Science conference in Tallinn a few weeks ago. One of Ville’s key insights was that Data Science is more about the Data than the Science. The same subject was also brought up by André Karpištšenko, the Head of Data Science from Taxify, who did a marvelous job visualizing the common misconceptions about the effort allocation of machine learning projects. We at Bluugo couldn’t agree more with these gentlemen and are delighted to see industry-leading companies share our thoughts.
The picture below represents how many companies perceive the amount of effort that goes into the different phases of a machine learning project:
…and this is how the amount of effort actually divides:
You can probably see why this gap is highly problematic.
It is quite a common misconception that most of the effort behind a machine learning solution goes into optimizing the algorithms and models. Define your KPIs, spend a few weeks optimizing some awesome models, slap them on top of some data and get amazing results? Nope, it is not quite that simple.
It’s all about the Data and Infrastructure
Even though optimizing models and algorithms is a crucial part of any machine learning solution, most of the effort goes into collecting data and building infrastructure. Before any algorithms and models can be optimized, a set of base data must be gathered. In most cases this means hundreds of thousands of lines of data, and not just any data. The base set must consist of high-quality data that is relevant to the defined KPIs.
Refining heaps of raw data from different sources into a functional base data set, and optimizing models based on it, requires not only a high level of expertise but also a great deal of time and effort. If every project is started from scratch, the potential of machine learning solutions is directly limited by the number of Data Scientists and engineers available. To avoid this, it is essential to build a strong infrastructure of standardized workflows and uniform, reproducible pipelines that can reliably produce high-quality training data across various models. As with any other tool, it is also important to constantly monitor and maintain those pipelines to guarantee high-quality output. Building and maintaining a strong infrastructure in machine learning projects is a long-term investment that will pay for itself over and over again when executed well.
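To make the idea of a reproducible, monitored pipeline a little more concrete, here is a minimal sketch in Python. It is purely illustrative and not a description of any particular production setup: the function names (`clean_rows`, `run_pipeline`), the `QualityReport` class, the column names and the `min_kept_ratio` threshold are all assumptions made up for this example. The sketch cleans a batch of raw CSV data the same way every time and raises an error if too many rows fail the quality check, so a degrading data source is noticed before it reaches model training.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class QualityReport:
    """Hypothetical summary of one pipeline run, used for monitoring."""
    total_rows: int
    kept_rows: int

    @property
    def kept_ratio(self) -> float:
        # Share of raw rows that survived cleaning; 0.0 for empty input.
        return self.kept_rows / self.total_rows if self.total_rows else 0.0


def clean_rows(rows, required_fields):
    """Keep only rows where every required field is present and non-empty."""
    cleaned = []
    total = 0
    for row in rows:
        total += 1
        if all((row.get(field) or "").strip() for field in required_fields):
            cleaned.append(row)
    return cleaned, QualityReport(total_rows=total, kept_rows=len(cleaned))


def run_pipeline(raw_csv_text, required_fields, min_kept_ratio=0.8):
    """Parse, clean and monitor one batch of raw data.

    Raises ValueError when too many rows are dropped. The default
    threshold of 0.8 is an arbitrary value chosen for illustration.
    """
    rows = csv.DictReader(io.StringIO(raw_csv_text))
    cleaned, report = clean_rows(rows, required_fields)
    if report.kept_ratio < min_kept_ratio:
        raise ValueError(
            f"only {report.kept_rows}/{report.total_rows} rows passed quality checks"
        )
    return cleaned, report


# Made-up sample data: the second row is missing its amount.
SAMPLE = "order_id,amount\nA1,10\nA2,\nA3,7\n"
```

Calling `run_pipeline(SAMPLE, ["order_id", "amount"], min_kept_ratio=0.5)` keeps the two complete rows and returns a report that can be logged for monitoring, while a stricter threshold such as 0.9 makes the same batch fail loudly. Because the same steps run on every batch, the result is repeatable, and the engineers' time goes into improving the pipeline rather than redoing the cleaning by hand for each project.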
Integration is integral
This might sound obvious, but before starting a machine learning project with a selected partner, make sure you know what you are actually paying for. Because the common perception of effort allocation in machine learning projects is heavily weighted towards creating and optimizing models, it is natural that most of the attention is focused there, even to the point where the integration of the finished solution is all but forgotten during the project. It’s easy to get caught up in the technical hype and lose sight of the ultimate objective behind the project: improving your business. A cutting-edge machine learning solution created by the best IT company around doesn’t do much good if they can’t integrate it with your business and existing platforms. Learn from the past and keep your eyes on the prize!
In the end, behind every machine learning solution there is a need derived from an actual business problem – the solution itself is not worth a penny if it can’t produce value for your company. Business comes first, and the solution is just a tool to make it better. At Bluugo, we understand this and have seen some great results from our machine learning projects. Our solution for Swissport Finland was chosen as the winner of an Innovation Award at the Pride of Ground Handling Awards in Gothenburg last fall. Read more about the award-winning solution here.
If you’re looking for a partner for a machine learning project or would like to know more about our solutions, don’t hesitate to contact us!
CEO, Bluugo Oy