Thinking about building your own ML monitoring solution?

Superwise team

April 19th, 2021

“We already have one!” That’s the first thing most of our customers say when we meet to discuss AI assurance solutions. Most AI-savvy organizations today have some form of monitoring. Yet, as they scale their activities, they find themselves at a crossroads: should they invest more in their homegrown solution or turn to a vendor solution? And if they do choose to invest more, for how long will their DIY solution be “good enough”?

In this blog, we explore how far homegrown solutions can take you and what you need to think about when planning to scale your use of machine learning.

DIY tools are (only) a start when monitoring your AI

Data science teams spend months researching and training their best models. The production phase, and the MLOps/monitoring work that comes with it, is sometimes only an afterthought. In this context, many data science and engineering teams develop initial AI monitoring tools in-house. DIY tools may be a decent approach for businesses with a contained use of AI, but when the time comes to expand the use of modeling, homegrown tools fall short of supporting the diversity and complexity of the models and the data used. Here is a short list of lessons we have learned from customers scaling their AI.

As organizations grow, the number of models and use cases grows

Guess what? Homegrown solutions don’t scale in sync with the models; they require more and more maintenance, tweaks, and attention. This is especially true as organizations adopt AI for a variety of use cases, from marketing to core activities embedded in their product.

Model monitoring is not a one-off task. As organizations adopt new models, they need to create a new monitoring paradigm that caters to different types of data (structured, text, image, video, and so on), each of which requires different measures and techniques to analyze the incoming data. In other words, what works for a classification model probably won’t work for a regression or clustering one, and a new set of tools will need to be developed. Even within a single structured use case, different features of the model require different KPIs to assess the health of the process: numerical, categorical, time-based, and so on.
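To make this concrete, here is a minimal sketch (in Python) of how drift checks can differ by feature type. It is illustrative only: the KS statistic for numerical features and total variation distance for categorical features are common choices rather than any specific product’s method, and the data is synthetic.

```python
import numpy as np
from scipy import stats

def numeric_drift(reference: np.ndarray, current: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic for a numerical feature."""
    statistic, _ = stats.ks_2samp(reference, current)
    return statistic

def categorical_drift(reference, current) -> float:
    """Total variation distance between category frequency distributions."""
    categories = sorted(set(reference) | set(current))
    ref_freq = np.array([np.mean(np.asarray(reference) == c) for c in categories])
    cur_freq = np.array([np.mean(np.asarray(current) == c) for c in categories])
    return 0.5 * np.abs(ref_freq - cur_freq).sum()

# A numerical feature and a categorical feature need different KPIs.
ref_age = np.random.normal(35, 8, 1_000)
cur_age = np.random.normal(39, 8, 1_000)          # mean has shifted
print("age KS statistic:", numeric_drift(ref_age, cur_age))

ref_plan = ["basic"] * 700 + ["pro"] * 300
cur_plan = ["basic"] * 500 + ["pro"] * 500        # category mix has shifted
print("plan TV distance:", categorical_drift(ref_plan, cur_plan))
```

Multiply this by every feature of every model, plus text or image inputs that need their own embedding-level checks, and the maintenance burden becomes clear.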

Regardless of the sophistication of the models, monitoring is an ongoing task that requires 25%-40% of a data science team’s time. The inefficiency and frustration that come with this heavy investment in homegrown monitoring are among the first reasons organizations turn to vendor solutions, along with the fact that they would much rather have their teams focus on creating models that have an impact on the business.

You don’t know what you don’t know

This is perhaps the most critical point. Organizations that have already engineered a solution to compute specific KPIs for their models often find themselves struggling to proactively detect concept drift or biases as they start to develop. More often than not, homegrown solutions look only at what is already known and the issues that were already anticipated, and so they catch events beyond this scope too late. This is often the point where organizations realize the limitations of their own solution, however sophisticated they engineered it to be, as it fails to bring value to the whole ML process.

In environments where data is extremely dynamic, assuring the health of models in production means leveraging expertise and best practices to be proactive: being alerted to issues that affect the health of the models, gaining insights, and diagnosing issues promptly.
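As a rough sketch of what proactive can look like in practice, the snippet below scans a live stream of a numerical feature in sliding windows and raises an alert when a window’s distribution departs from the training-time reference. The window size, significance level, and data are illustrative assumptions, not recommendations.

```python
import numpy as np
from scipy import stats

def monitor_stream(reference, stream, window=500, alpha=0.01):
    """Return (offset, p-value) alerts where a window drifts from the reference.

    Runs a two-sample KS test per window; `window` and `alpha` are
    illustrative knobs that a real system would tune per feature.
    """
    alerts = []
    for start in range(0, len(stream) - window + 1, window):
        batch = stream[start:start + window]
        _, p_value = stats.ks_2samp(reference, batch)
        if p_value < alpha:  # shift is unlikely to be sampling noise
            alerts.append((start, p_value))
    return alerts

reference = np.random.normal(0, 1, 5_000)                  # training-time data
live = np.concatenate([np.random.normal(0, 1, 2_000),      # healthy traffic
                       np.random.normal(0.8, 1, 2_000)])   # drift sets in
for start, p in monitor_stream(reference, live):
    print(f"drift alert at offset {start}: p={p:.2e}")
```

The hard part is not this loop; it is running something like it continuously, per feature and per model, with sensible thresholds and without drowning in false alarms.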

Multiple stakeholders

As mentioned in a previous post, scaling AI poses the question of who owns it when it’s in production: data science teams? data engineering? business analysts? hybrid creatures? Ultimately, as AI use grows, the stakeholders involved also change, regardless of the number of models. Think about the fraud detection and cybersecurity space where analysts are the predominant users of the AI predictions and need to make sure the models are always tuned to a very dynamic data landscape.

For a monitoring solution to be useful, all the stakeholders involved need to derive insights and an understanding of the health of the predictions:

  • Data science teams need to understand if, when, and how they should retrain the model, and the cases in which the model doesn’t perform well.
  • Business analysts want to know what drives decisions and to be alerted as soon as there is high uncertainty about the quality of the model’s decisions.
  • Data engineers need to know about the quality of the data streaming through the system: whether it has outliers, missing values, or strange data distributions (see the sketch after this list).
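As a minimal illustration of the data engineering bullet above, the sketch below reports missing-value and outlier rates per column. The column names and the z-score cutoff are hypothetical; a real pipeline would tune checks per feature and track them over time.

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame, z_threshold: float = 4.0) -> dict:
    """Flag missing values and gross outliers per column.

    `z_threshold` is an illustrative cutoff, not a recommendation.
    """
    report = {}
    for col in df.columns:
        entry = {"missing_rate": round(df[col].isna().mean(), 3)}
        if pd.api.types.is_numeric_dtype(df[col]):
            z = (df[col] - df[col].mean()) / df[col].std()
            entry["outlier_rate"] = round((z.abs() > z_threshold).mean(), 3)
        report[col] = entry
    return report

# A hypothetical batch of incoming feature data
batch = pd.DataFrame({"amount": [10.0, 12.5, None, 14.0],
                      "country": ["US", "US", None, "DE"]})
print(data_quality_report(batch))
```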

To do so, organizations need to create and maintain a view of the ML predictions that everyone involved can access and extract value from, without creating unnecessary noise. Beyond determining whether there are sufficient resources, there is also a matter of skill set, as stakeholders often have different perceptions that need to be bridged under one enterprise-wide view. Ultimately, the complexity of these tasks is what drives AI practitioners who are scaling their activities to select a best-of-breed solution for assuring their models in production.

The amount of data grows exponentially!

In industries such as Adtech, where models process terabytes of data each day, the velocity of the data alone makes it a challenge to obtain a clear picture. Do you have the time and tools necessary to continuously extract, compare, and analyze statistical metrics for your ML process without impacting your core activities?

Scaling your AI? Here’s what you need to ask yourself

Here’s a quick list of considerations to think over as you weigh the best way to assure the health of your models in production. At the end of the day, it boils down to a question of resource management and efficiency: how much time should you invest in developing a set of tools to monitor your models in production today? And what will it cost you tomorrow as you add more and more models and use cases?

  • How much will a homegrown solution cost?
  • How efficient will this be in the long run?
  • Is it really what my team needs to focus on, or is it better to buy and use such a capability?
  • How can I foster an enterprise-wide understanding of the models’ health?
  • How can I make my monitoring solution a proactive one?

Don’t play around with your growth

At Superwise, we specialize in accompanying our customers as they transition from homegrown solutions (or even no monitoring at all!) to a rich model observability solution that helps them achieve business impact and grow their AI practice, enabling them to focus on what they do best: developing and deploying models that help their business grow.


