Fundamentals of efficient ML monitoring

Oren Razon

December 9th, 2020

Today’s enterprises rely on machine learning-powered predictions to guide business strategies, such as forecasting demand and mitigating risk. For an increasing number of businesses, machine learning (ML) underpins their core business model, like financial institutions that use ML models to approve or reject loan applications.

Because ML is drastically different from other software and traditional IT, models risk degrading the moment they are pushed into production, where the hyper-dynamic nature of the data meets the hypersensitivity of the models. These “drifts” in the data’s structure, or in other properties that cause model degradation, are too often silent and unobservable.

In the last few months, triggered by the COVID-19 crisis, we have all witnessed companies struggling to fix corrupted, business-critical models. One of the best-documented cases was Instacart, whose inventory prediction model’s accuracy declined from 93% to 61%, leaving a sour aftertaste for its customers and teams.

Few data science and engineering teams are prepared for this “Day 2”, the day their models meet the real world, because they invest the majority of their time researching, training, and evaluating models. While it’s clear that teams want to address potential issues before they arise, there is a lack of clear processes, tools, and requirements for production systems. Today, the industry still lacks guidelines on what an optimal ML infrastructure should look like.

That’s why we’ve gathered best practices for data science and engineering teams into an eBook on building an efficient framework to monitor ML models. It provides a framework for anyone interested in building, testing, and implementing a robust monitoring strategy in their organization. The article below briefly covers some of the key points.

The fundamentals

1. Look beyond performance measurements

It’s instinctive to focus on model performance as the key metric, but degraded performance may be detected too late or not at all, because measuring it requires collecting the ground truth, which in many cases is missing or only arrives once your business has already suffered a blow.

It’s crucial to monitor the stability of the entire ML flow, including input, inferences, and output. This allows you to spot the earliest indication of drifts in real time. Additionally, monitoring input delivers useful insights into the cause of drifts, which speeds up diagnostics and remediation.
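For example, a minimal sketch of input-level monitoring might compare the distribution of a numeric feature in recent traffic against its training-time reference, without waiting for labels. The feature values, window sizes, and the 0.05 significance threshold below are illustrative assumptions, not part of the original article.

```python
# A minimal sketch of input-level drift monitoring: compare a production
# window of a numeric feature against its training-time reference with a
# two-sample Kolmogorov-Smirnov test. Threshold and data are illustrative.
import numpy as np
from scipy.stats import ks_2samp

def input_drift_alert(reference: np.ndarray, production: np.ndarray,
                      alpha: float = 0.05) -> bool:
    """Return True when the production distribution drifts from the reference."""
    statistic, p_value = ks_2samp(reference, production)
    return p_value < alpha

# Example: a feature whose production values have shifted upward
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)   # training data
production = rng.normal(loc=0.6, scale=1.0, size=1_000)  # last day of traffic
print(input_drift_alert(reference, production))  # True -> investigate before labels arrive
```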

2. Use different metrics for different features

Tracking multiple models comes with the challenge of building relevant metrics for each case. Therefore, a centralized monitoring solution should take into account different types of data and use multiple metrics. For instance, for numeric features, the mean, standard deviation, min, max, outliers, and so on must be monitored, while for categorical features, the number of unique values, new values, entropy, and the share of the most frequent values matter most.
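To make this concrete, here is a hedged sketch of per-type feature profiling: moments and an IQR-based outlier count for numeric columns, and cardinality, new-value detection, and entropy for categorical ones. The column names and the outlier rule are illustrative assumptions.

```python
# Sketch of per-type feature metrics: numeric columns get moments and an
# IQR-based outlier count, categorical columns get cardinality, new-value
# checks, entropy, and the share of the most frequent value.
import numpy as np
import pandas as pd
from scipy.stats import entropy

def numeric_profile(s: pd.Series) -> dict:
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = ((s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)).sum()
    return {"mean": s.mean(), "std": s.std(), "min": s.min(),
            "max": s.max(), "outliers": int(outliers)}

def categorical_profile(s: pd.Series, known_values: set) -> dict:
    freqs = s.value_counts(normalize=True)
    return {"n_unique": s.nunique(),
            "new_values": sorted(set(s.unique()) - known_values),
            "entropy": float(entropy(freqs)),
            "top_value_share": float(freqs.iloc[0])}

batch = pd.DataFrame({"amount": [10.5, 12.0, 11.8, 250.0],
                      "country": ["US", "US", "DE", "BR"]})
print(numeric_profile(batch["amount"]))                               # flags 250.0 as an outlier
print(categorical_profile(batch["country"], known_values={"US", "DE"}))  # flags "BR" as new
```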

3. Use a granular point of view

Overall data can obscure serious segmental drift, since many data changes are quite subtle and difficult to detect. Sometimes drift affects only a certain subset of your dataset or appears only in specific seasons, making it easy to miss if you stick to a high-level overview.
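As a minimal illustration, the sketch below assumes a hypothetical "segment" column and an approval-rate metric: the global rate can look stable while one sub-population quietly degrades.

```python
# Segment-level monitoring sketch: compare each segment's metric to its own
# training-time baseline instead of one global number. Data, baselines, and
# the 0.15 threshold are illustrative assumptions.
import pandas as pd

production = pd.DataFrame({
    "segment":  ["web", "web", "web", "mobile", "mobile", "mobile"],
    "approved": [1,      1,     0,     0,        0,        1],
})
baseline_rate = {"web": 0.65, "mobile": 0.60}  # rates observed at training time

for segment, group in production.groupby("segment"):
    delta = group["approved"].mean() - baseline_rate[segment]
    if abs(delta) > 0.15:  # illustrative threshold
        print(f"{segment}: approval rate moved by {delta:+.2f} -- segmental drift")
```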

4. Avoid overflow and detect events automatically

There’s no efficient way to monitor your ML models manually. Every model can have dozens of features across multiple segments or sub-populations, each requiring multiple metrics and each with its own natural distribution changes over time. An effective monitoring framework needs to be both automated and smart. With time series anomaly detection and causality analysis methods, teams can aggregate metrics in a way that indicates the urgency of a detected deviation and promptly identify its potential root cause.
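One simple way to approximate this, assuming you already log a daily time series for each monitored metric, is a rolling z-score that flags only statistically unusual days instead of paging someone for every wiggle; the 14-day window and the threshold of 3 below are illustrative.

```python
# Sketch of automated anomaly detection on a monitored metric: a rolling
# z-score over a daily series of the feature mean stands in for richer
# time series anomaly detection methods.
import numpy as np
import pandas as pd

daily_mean = pd.Series(
    np.r_[np.random.default_rng(1).normal(100, 2, 30), [120.0]],  # last day spikes
    index=pd.date_range("2020-11-09", periods=31, freq="D"),
)
rolling = daily_mean.rolling(window=14)
z_scores = (daily_mean - rolling.mean().shift(1)) / rolling.std().shift(1)
anomalies = z_scores[z_scores.abs() > 3]  # surface only urgent deviations
print(anomalies)  # flags the final day, not the normal day-to-day noise
```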

5. Compare different versions in parallel

If you’re validating new versions or benchmarking your production model, you’ll need to monitor them alongside your existing version for some time, using either a shadow model, A/B testing, or the multi-armed bandit (MAB) approach.

Whichever approach you use, ensure that you can test a number of use cases in advance and gather and analyze the status and KPIs of both versions, checking for serious divergences between the two: where those divergences take place, whether they are the differences you expected, and so on. You can consult our extensive article on the topic here.
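For illustration, a shadow deployment can be as simple as scoring every request with both versions while serving only the production prediction and logging the pair for later comparison. The stub models, record schema, and 0.1 divergence threshold below are assumptions, not a prescribed implementation.

```python
# Shadow-model sketch: the candidate version scores the same traffic as the
# production model, but only the production prediction is served.
import numpy as np

class _StubModel:
    """Hypothetical stand-in for a trained model object."""
    def __init__(self, bias: float):
        self.bias = bias
    def predict(self, features: np.ndarray) -> float:
        return float(features.mean()) + self.bias

def serve_with_shadow(features, production_model, shadow_model, log):
    prod_pred = production_model.predict(features)
    shadow_pred = shadow_model.predict(features)   # scored and logged, never served
    log.append({"prod": prod_pred, "shadow": shadow_pred})
    return prod_pred                               # business logic sees only this

def divergence_report(log, threshold=0.1):
    diffs = np.array([abs(r["prod"] - r["shadow"]) for r in log])
    return {"mean_abs_diff": float(diffs.mean()),
            "pct_diverging": float((diffs > threshold).mean())}

rng = np.random.default_rng(0)
log = []
for _ in range(1_000):
    serve_with_shadow(rng.normal(size=5), _StubModel(0.0), _StubModel(0.15), log)
print(divergence_report(log))  # shows how often and how far the versions disagree
```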

6. Monitor protected features as proxies to ensure fairness

The first step in preventing bias in models is to look for it in the training data during the research phase, before training and deploying to production. However, no matter how careful the data scientist is, and even if the model was developed and evaluated to ensure it is free of bias, the data in production changes continuously. There is always a risk that a bias not seen so far will show up and grow, or even be amplified by the model itself. Therefore, it is important to continuously test your model for bias on live production data.
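A lightweight way to do this, sketched below with illustrative column names and the common "four-fifths" rule as a threshold, is to periodically compare the positive-prediction rate across groups of a protected attribute (or its proxy) on recent traffic.

```python
# Fairness-monitoring sketch: compare the positive-prediction rate across
# groups of a protected (or proxy) attribute on a recent production window.
import pandas as pd

recent = pd.DataFrame({
    "gender":    ["F", "F", "F", "M", "M", "M", "M", "M"],
    "predicted": [0,    0,   1,   1,   1,   0,   1,   1],
})
rates = recent.groupby("gender")["predicted"].mean()
disparate_impact = rates.min() / rates.max()
if disparate_impact < 0.8:  # illustrative "four-fifths" rule
    print(f"Potential bias: positive-rate ratio of {disparate_impact:.2f} across groups")
```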

7. Ensure your monitoring is platform agnostic

Frequently, data science and ML engineering teams within the same organization use different platforms to develop and deploy their ML models (custom Python microservices, SageMaker, TensorFlow Serving, and so on). Your ML monitoring framework needs to be decoupled from the ML platform so that you can apply it to Python, Java, R, and the other technologies backing your models.
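One way to achieve this decoupling, sketched below with an assumed record schema and a simple file-based sink, is to have every serving platform emit the same prediction record to a shared destination that the monitoring layer consumes.

```python
# Platform-agnostic logging sketch: any serving stack writes the same
# prediction record to a shared sink, keeping monitoring decoupled from serving.
import json
import time
import uuid

def log_prediction(model_id: str, version: str, features: dict,
                   prediction, sink: str = "predictions.jsonl") -> None:
    record = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_id": model_id,
        "version": version,
        "features": features,
        "prediction": prediction,
    }
    with open(sink, "a") as f:  # could equally be Kafka, S3, or an HTTP endpoint
        f.write(json.dumps(record) + "\n")

# Any serving platform can call this right after scoring:
log_prediction("loan-approval", "v3", {"amount": 1200, "country": "US"}, 0.82)
```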

8. Empower all stakeholders

Most AI models involve multiple stakeholders, but while they’re all invested in the model operating at peak accuracy, they often struggle to communicate with each other.

Every team needs independent access and visibility, including the operational team, which has to be able to monitor models without the data science team’s involvement. Your teams need a single dashboard and a common language to track and discuss ongoing issues.

9. Use your production insights for other stages of your ML process

To truly maximize the value of your monitoring framework, you need to feed its results back into the entire ML pipeline. For instance, your monitoring framework should guide you to the most reliable datasets, so when you come to retrain models, you can exclude data that shows anomalies or use only data that reflects a long-term change.
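As a simple illustration, assuming your monitoring layer reports which days were anomalous, selecting a retraining set can be as direct as filtering those days out; the column names and the anomalous_dates set below are hypothetical.

```python
# Feedback-loop sketch: exclude monitoring-flagged anomalous days from the
# retraining set, keeping data that reflects genuine long-term change.
import pandas as pd

production_data = pd.DataFrame({
    "date": pd.to_datetime(["2020-11-01", "2020-11-02", "2020-11-03"]),
    "amount": [105.0, 940.0, 110.0],
    "label": [1, 0, 1],
})
anomalous_dates = {pd.Timestamp("2020-11-02")}  # e.g. a broken upstream pipeline

retraining_set = production_data[~production_data["date"].isin(anomalous_dates)]
print(len(retraining_set), "of", len(production_data), "rows kept for retraining")
```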


Download our eBook to learn more about the best practices out there to refine your ML monitoring strategy and avoid the pitfalls of “Day 2.”
