A Guide to the Machine Learning Journey
Learn how to start and finish the Machine Learning Journey successfully, in a guide prepared with Gilberto Titericz, the world’s best Kaggle Grandmaster.
Nearly every industry report and analyst recommends incorporating AI into your business.
Harvard Business Review estimates that AI will add $13 trillion to the global economy over the next ten years. Finding new opportunities, increasing operational efficiency, and reducing costs are well-known financial benefits of AI. As a result, more and more companies are turning to their existing data science teams, or building new ones, to implement AI in their business. However, many business leaders declare victory once a data science team is in place and never follow up with the team to measure the return on investment (ROI).
Based on discussions with business and data science leaders and experts from Ericsson, DuPont, Zappos, and many more, it became clear that many organizations do not see ROI because, although they know how to start an AI project, they don’t know how to finish it and realize the business impact. The main reason for this challenge is the disconnect between business and data science teams throughout the AI or ML Journey.
For simplicity, this guide focuses on Supervised Machine Learning projects.
Every successful AI project starts with a clearly defined business problem that Supervised Machine Learning can solve. A business problem can come from any business user, such as a business analyst, marketing manager, data engineer, or logistics manager: subject matter experts who understand the business and the data.
A business analyst could ask questions such as “Given the current market trend, how many units of Item A should be ordered to meet customer demand?” or “Which customers are likely to churn, given their history with our company?” The answer must be something that can be measured objectively, because it becomes the labeled target variable the model learns to predict. Subjective questions such as “Is this room family-friendly?” cannot be used. Once a labeled target variable is identified, the next step is to look for data.
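To make “measured objectively” concrete: the churn question becomes a label once “churned” is defined by a rule anyone can check against company records. The sketch below assumes a hypothetical rule, namely that a customer renewed if a renewal was recorded within `grace_days` of the contract end date; the function name, fields, and 30-day default are illustrative only.

```python
from datetime import date
from typing import Optional


def churn_label(contract_end: date, renewal_date: Optional[date], grace_days: int = 30) -> int:
    """Turn raw renewal records into an objective target: 1 = churned, 0 = renewed.

    Hypothetical rule for illustration: a renewal recorded no later than
    `grace_days` after the contract end date counts as renewed.
    """
    if renewal_date is None:
        return 1  # no renewal on record -> churned
    days_late = (renewal_date - contract_end).days
    return 0 if days_late <= grace_days else 1


# Every row in the training data gets the same reproducible label.
print(churn_label(date(2020, 1, 31), None))               # → 1 (churned)
print(churn_label(date(2020, 1, 31), date(2020, 2, 10)))  # → 0 (renewed 10 days after end)
```

Because the rule is written down, two analysts labeling the same records will always produce the same target column.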
To solve a business case using supervised machine learning, a business analyst needs a dataset that can be used to train a model. If the business question originated from exploring an existing dataset, the analyst may use that dataset as a starting point. However, if the dataset is not yet defined, the analyst may collaborate with colleagues, such as data scientists, data engineers, or other cross-functional team members, to find sources of data that can be combined to create the dataset. In some cases, the desired data may not be available, and the cost of obtaining it should be considered: some data can be purchased, and other data must be collected. It is important to remember that implementing AI in a business is not a one-person job. It’s a highly collaborative process that requires participation from many team members, like an orchestra.
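Combining sources usually means joining them on a shared key. A minimal sketch, assuming two hypothetical sources keyed by `customer_id`: billing records (one row per customer, carrying the churn label) and support tickets (zero or more per customer). It performs a left join, so every labeled customer stays in the dataset and gets a ticket count of 0 when no tickets exist.

```python
from collections import Counter
from typing import Dict, List


def build_dataset(billing: List[Dict], tickets: List[Dict]) -> List[Dict]:
    """Left-join billing rows with a per-customer support-ticket count.

    Every billing row is kept; customers with no tickets get a count of 0.
    Tickets for customers without a billing row are dropped.
    """
    ticket_counts = Counter(t["customer_id"] for t in tickets)
    return [
        {**row, "support_tickets": ticket_counts.get(row["customer_id"], 0)}
        for row in billing
    ]


# Hypothetical example rows.
billing = [
    {"customer_id": "A", "monthly_spend": 120.0, "churned": 0},
    {"customer_id": "B", "monthly_spend": 45.0, "churned": 1},
]
tickets = [{"customer_id": "B"}, {"customer_id": "B"}, {"customer_id": "C"}]
for row in build_dataset(billing, tickets):
    print(row)
```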
Once the initial dataset is identified, we recommend establishing a repeatable process for sourcing the data so other team members can use it consistently. Documenting where the data is located, how it is collected, and how the whole collection pipeline works is a good place to start for reusability and data governance.
Depending on the size and structure of the company, the data may be managed by different teams in different repositories. The Ople.AI Platform provides various options for users to transfer data into the platform. Furthermore, by keeping records of the datasets used to build different models, users can quickly iterate and test different theories while learning which data is most relevant to a machine learning model.
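One lightweight way to document a data source is a small manifest stored alongside the data. The sketch below is hypothetical: `DatasetManifest` and its fields are illustrative rather than any standard schema, and the SHA-256 fingerprint of the rows lets a teammate confirm they are working with exactly the same data.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class DatasetManifest:
    """Hypothetical record of where a dataset came from and how it was built."""
    name: str
    location: str               # e.g. a warehouse table or file path
    collection_steps: List[str]
    fingerprint: str            # SHA-256 of the rows, for reproducibility


def fingerprint_rows(rows: List[Dict]) -> str:
    # Serialize deterministically (sorted keys) so identical rows in the
    # same order always produce the same hash.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


rows = [{"customer_id": "A", "churned": 0}, {"customer_id": "B", "churned": 1}]
manifest = DatasetManifest(
    name="churn_training_v1",
    location="warehouse.sales.customers",  # hypothetical location
    collection_steps=["export billing table", "join support tickets", "label churn"],
    fingerprint=fingerprint_rows(rows),
)
print(json.dumps(asdict(manifest), indent=2))
```

Note that row order affects the fingerprint, so the pipeline should sort or otherwise fix the order before hashing.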
Once the right dataset is defined, the next step is to translate the business problem into a data science task. For example, is it a regression task, which predicts a number (such as how many units of Item A to order), or a classification task, which predicts a category (such as whether a customer will churn)?
Assume that a business analyst approaches the data science team and asks for help predicting whether a customer will churn. The business objective, in this case, is to prevent customers from churning and to secure revenue from renewals instead. Success could be measured by checking customers’ actual renewal status in the next period.
To the data science team, this business problem is a binary classification task: churn or not churn (renew). The data scientist may choose to evaluate the model’s predictions using the Area Under the ROC Curve (AUC) metric. These concepts are specific to machine learning, and understanding them typically requires some study.
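For readers curious what AUC actually measures, here is a minimal pure-Python sketch using the pairwise (Mann-Whitney) definition. It is O(n²) and meant only for illustration; scikit-learn’s `roc_auc_score` computes the same quantity efficiently. The labels and scores are made-up example values.

```python
from itertools import product
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    AUC equals the probability that a randomly chosen positive example
    (a churner, label 1) gets a higher score than a randomly chosen
    negative example (a renewer, label 0). Ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 1, 0]              # 1 = churned, 0 = renewed
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1]  # model's churn scores (made up)
print(round(auc(labels, scores), 3))     # 8 of 9 pairs ranked correctly → 0.889
```

An AUC of 1.0 means every churner is ranked above every renewer; 0.5 is no better than a random ordering.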
Since the Ople.AI Platform is designed for business users, you don’t need to understand machine learning concepts like regression vs. classification or AUC. Based on the data you provide and the column you want to predict, Ople.AI chooses the type of machine learning model for you (Regression, Binary Classification, or Multi-Classification) and presents the model’s prediction performance in terms that business users can understand.
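To illustrate the kind of decision being automated, here is a simple, hypothetical heuristic that guesses the task type from the values in the target column. It is not Ople.AI’s logic; `infer_task_type` and the `max_classes` cutoff of 20 are arbitrary choices made for this sketch.

```python
from typing import Hashable, Sequence


def infer_task_type(target: Sequence[Hashable], max_classes: int = 20) -> str:
    """Guess the task type from the values in the column to predict.

    Hypothetical heuristic for illustration; `max_classes` is an arbitrary
    cutoff, not a value used by any particular product.
    """
    values = [v for v in target if v is not None]
    distinct = set(values)
    if len(distinct) < 2:
        raise ValueError("the target needs at least two distinct values")
    if len(distinct) == 2:
        return "binary_classification"
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    has_fractions = any(isinstance(v, float) and not v.is_integer() for v in values)
    if numeric and (has_fractions or len(distinct) > max_classes):
        return "regression"
    return "multi_classification"


print(infer_task_type(["churn", "renew", "churn"]))  # binary_classification
print(infer_task_type([12.5, 40.0, 7.25, 19.0]))     # regression
print(infer_task_type(["red", "green", "blue"]))     # multi_classification
```

A rule this simple would misread a whole-number target with only a few distinct values (say, an order quantity) as categories, which shows why the choice needs care.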
With the dataset and target identified, the next step is to build models. A data scientist typically builds multiple models, using techniques such as feature engineering, hyperparameter optimization, ensembling, and stacking, to find the most accurate one. They compare different versions and test modifications to the machine learning pipeline to keep improving it. It’s important at this stage for the data scientist to communicate results regularly to the business stakeholders.
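Ensembling is the easiest of these techniques to show in a few lines. The sketch below averages predicted churn probabilities from three models. The scores are made up so that each model makes a different single mistake, which is the situation in which averaging helps; with real models the gain is usually smaller and not guaranteed. Stacking goes further by training a second model on the first models’ outputs and is not shown here.

```python
from typing import List, Sequence


def average_ensemble(*model_scores: Sequence[float]) -> List[float]:
    """Simple ensemble: average the probabilities predicted by several models."""
    return [sum(row) / len(row) for row in zip(*model_scores)]


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Share of rows where the thresholded score matches the label."""
    hits = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return hits / len(labels)


labels = [1, 0, 1, 0, 1]
model_a = [0.9, 0.6, 0.8, 0.2, 0.7]  # wrong on row 2
model_b = [0.4, 0.3, 0.7, 0.1, 0.8]  # wrong on row 1
model_c = [0.8, 0.2, 0.3, 0.4, 0.9]  # wrong on row 3

for name, scores in [("A", model_a), ("B", model_b), ("C", model_c)]:
    print(name, accuracy(labels, scores))  # 0.8 each
print("ensemble", accuracy(labels, average_ensemble(model_a, model_b, model_c)))  # 1.0
```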
The Ople.AI Platform builds and tests various models using the same techniques as a data scientist, but does this on behalf of the business users so the users don’t need to be familiar with the science of machine learning. The resulting model is performant and lightweight every time. With the Automated Machine Learning technology running behind the scenes, our platform offers an end-to-end solution.
It may seem logical to define project success as building the model with the highest accuracy, but in reality that is not the right goal. Depending on the business objective and sensitivity, a more accurate model may not generate more revenue than a slightly less accurate one.
Assume that the data science team has built a model that predicts whether a customer will churn with 80% accuracy. The business analyst, who is the project owner, faces two options: (1) use the 80% accurate model to reduce churn and increase customer lifetime value today, or (2) spend more time increasing the model’s accuracy. To make this decision, the business analyst and the data scientist must be fully aligned on the objective and the limitations. If the data scientist needs an extra three months to increase accuracy by one percentage point, the business owner may be better off using the current model and realizing revenue gains for those three months.
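The arithmetic behind this trade-off is easy to sketch. The model below is deliberately crude and entirely hypothetical: it assumes business value scales linearly with accuracy, and the customer volume and dollar value per correct prediction are illustrative numbers, not figures from the example above.

```python
def cumulative_benefit(accuracy: float, go_live_month: int, horizon_months: int,
                       customers_per_month: int, value_per_correct: float) -> float:
    """Benefit earned over the horizon by a model that goes live at go_live_month.

    Simplifying assumption: each live month the model scores
    customers_per_month at-risk customers, and each correct prediction
    is worth value_per_correct.
    """
    live_months = max(0, horizon_months - go_live_month)
    return live_months * customers_per_month * accuracy * value_per_correct


def breakeven_horizon(current_acc: float, improved_acc: float, delay_months: int) -> float:
    """Months of use after which waiting for the improved model pays off.

    Solves (H - delay) * improved_acc = H * current_acc for H.
    """
    return delay_months * improved_acc / (improved_acc - current_acc)


# Hypothetical figures: 1,000 at-risk customers a month, $50 per correct prediction.
now = cumulative_benefit(0.80, 0, 12, 1000, 50.0)
later = cumulative_benefit(0.81, 3, 12, 1000, 50.0)
print(f"deploy now: ${now:,.0f}  wait 3 months: ${later:,.0f}")
print(f"break-even after about {breakeven_horizon(0.80, 0.81, 3):.0f} months")
```

Under these assumptions, the first year yields $480,000 from deploying now versus $364,500 from waiting, and the one-point improvement only pays for its three-month delay if the model stays in use for roughly 243 months (about 20 years), which supports using the current model now.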
The Ople.AI Platform empowers users to build and evaluate AI models quickly and iteratively, then deploy those models with the click of a button. The models are explained in business terms, not textbook statistical terms, helping users understand how effective a model will be in delivering meaningful business results. Evaluating model performance in business terms also fosters better communication between stakeholders.
Each time a model is built, its performance needs to be evaluated by both technical and business stakeholders. In fact, Steps 4 through 6 form an iterative process to improve performance and meet the organization’s requirements.
Once the model is built, the Ople.AI Platform provides further explanations so users can evaluate its performance with just a few clicks. Detailed information, such as a confusion matrix and benchmarks against popular machine learning models, is provided to help technical experts evaluate the model.
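To see what “business terms” can mean in practice, a churn model’s correct flags and false alarms can be priced in dollars. The sketch below assumes a hypothetical retention campaign; `net_retention_value`, the offer cost, the save rate, and the customer value are illustrative parameters you would replace with your own. Missed churners (false negatives) simply earn nothing in this simple model.

```python
def net_retention_value(true_positives: int, false_positives: int,
                        value_per_saved_customer: float, offer_cost: float,
                        save_rate: float) -> float:
    """Translate churn-model hits and false alarms into dollars.

    Hypothetical business model: every customer the model flags gets a
    retention offer costing offer_cost; a flagged customer who really was
    about to churn is saved with probability save_rate.
    """
    flagged = true_positives + false_positives
    expected_saves = true_positives * save_rate
    return expected_saves * value_per_saved_customer - flagged * offer_cost


# Illustrative numbers only.
print(net_retention_value(true_positives=80, false_positives=40,
                          value_per_saved_customer=500.0, offer_cost=20.0,
                          save_rate=0.3))
```

Framed this way, a model with slightly lower accuracy but fewer false alarms can be worth more, and that is a comparison a business stakeholder can weigh directly.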
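A confusion matrix simply counts the four possible outcomes of a yes/no prediction. A minimal sketch with made-up churn labels and predictions; precision and recall, two common summaries, are derived from the counts.

```python
from typing import Dict, Sequence


def confusion_counts(labels: Sequence[int], predictions: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for a binary classifier (1 = churn)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(labels, predictions):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["fn" if actual == 1 else "tn"] += 1
    return counts


labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 1, 0, 1, 0]
c = confusion_counts(labels, predictions)
precision = c["tp"] / (c["tp"] + c["fp"])  # of customers flagged, share who churned
recall = c["tp"] / (c["tp"] + c["fn"])     # of churners, share the model caught
print(c, precision, recall)
```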
When the final model is built, the last step is to actually use it in your business. For example, if your business analyst is using AI to predict next quarter’s sales, they can go further and optimize sales and marketing activities to maximize the effect. If the model predicts a drop in sales, they can dig into why the drop is predicted and act accordingly to turn it into an increase.
Everything has to be tied back to business value. The purpose of the model is to use its predictions to achieve beneficial business changes.
Once the model is built, the Ople.AI Platform automatically deploys it and lets users run simulations. A user can see which variables are most important, try different values, and watch how the predicted target changes in real time. In addition, the model can easily be integrated into familiar tools like Google Sheets and Tableau, so more stakeholders can use AI in their day-to-day activities.
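Under the hood, a what-if simulation re-scores a record while changing one input and holding the others fixed. The sketch below uses a hypothetical logistic churn model whose weights and feature names are invented for illustration; it is not how any particular platform implements simulations.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical churn model: a logistic function of two made-up inputs.
WEIGHTS = {"support_tickets": 0.6, "months_as_customer": -0.05}
BIAS = -1.0


def churn_probability(customer: Dict[str, float]) -> float:
    """Score one customer with the hypothetical logistic model."""
    z = BIAS + sum(w * customer[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def what_if(customer: Dict[str, float], feature: str,
            values: Sequence[float]) -> List[Tuple[float, float]]:
    """Re-score one customer while varying a single input, holding the rest fixed."""
    return [(v, churn_probability({**customer, feature: v})) for v in values]


customer = {"support_tickets": 3, "months_as_customer": 24}
for tickets, p in what_if(customer, "support_tickets", [0, 3, 5]):
    print(f"{tickets} tickets -> churn probability {p:.2f}")
```

The original record is never modified; each scenario is a copy with one value changed.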
Once AI is implemented to solve your business problems, you should continue to use it and improve it. New data might come in, or your company’s strategy might shift. When that happens, you should reevaluate the current AI model and adjust accordingly to maximize the benefits.
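One common way to check whether new data has drifted away from the data a model was trained on is the Population Stability Index (PSI). A minimal sketch, assuming a feature has already been bucketed into bins; the bins and proportions are made up, and the thresholds in the comment are a widely cited industry rule of thumb rather than a hard standard.

```python
import math
from typing import Sequence


def population_stability_index(expected: Sequence[float], actual: Sequence[float],
                               eps: float = 1e-6) -> float:
    """PSI between two binned distributions given as proportions per bin.

    `expected` is the share of rows per bin in the training data, `actual`
    the share in recent data. `eps` avoids log(0) for empty bins.
    """
    if len(expected) != len(actual):
        raise ValueError("both distributions need the same bins")
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


training = [0.25, 0.25, 0.25, 0.25]  # share of customers per spend quartile at training time
recent = [0.10, 0.20, 0.30, 0.40]    # the same bins in this quarter's data (made up)
score = population_stability_index(training, recent)
# Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
print(f"PSI = {score:.3f}")
```

A rising PSI on important inputs is a signal to reevaluate the model, not proof that it has degraded.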
With the Ople.AI Platform, you always have a record of how a model was built, making it easier for anyone on your team to iterate on and update a model when necessary. Datasets and models can be shared between team members to facilitate collaboration and refinement of problem definitions as the business matures in its use of AI.
The Ople.AI Platform
The easiest way to accelerate decision making and reduce risk with predictive analytics
With the Ople.AI Platform, you are only a few clicks away from building predictive models to derive optimal business recommendations with reduced risk.