Data-driven retraining with production observability insights

We all know that our model’s best day in production will be its first day in production. It’s simply a fact of life that, over time, model performance degrades. ML attempts to predict real-world behavior based on observed patterns it has trained on and learned. But the real world is dynamic and always in motion; sooner or later, depending on your use case and data velocity, your model will decay and begin to exhibit concept drift, data drift, or even both. 

Your best day in production is your first day in production

When models misbehave, we often turn to retraining to fix the problem. As a rule of thumb, data science teams take the most recent data from production, say two or three months’ worth, and retrain their model on it, assuming that “refreshing” the model’s training observations will enable it to predict future results better. But is the most recent data the best data to resolve a model’s performance issues and get it back on track? Think about it this way: if an end-to-end software test failed, would you accept it as fixed just because you reran the test? Most likely not. You’d troubleshoot to pinpoint the root cause and apply an exact fix. ML teams do precisely this with model monitoring: they pinpoint anomalies and uncover their root cause to resolve issues quickly, before they impact business outcomes. But when the resolution requires retraining, “fresh is best” is not exactly a data-driven approach.

This article will demonstrate how data science and ML engineering teams can leverage ML monitoring to find the best data and retraining strategy mix to resolve machine learning performance issues. This data-driven, production-first approach enables more thoughtful retraining selections and shorter and leaner retraining cycles and can be integrated into MLOps CI/CD pipelines for continuous model retraining upon anomaly detection.

Matching production insights to retraining strategies

The insights explained below are based on anomalies detected in the Superwise model observability platform and analyzed in a corresponding Jupyter notebook that extracts retraining insights. All the assets are open for use under the Superwise community edition, and you can use them to run the notebook on your own data.

* It’s important to note that the value of this approach lies in identifying how to best retrain once you have eliminated other possible issues in your root cause investigation. 

Identifying retraining groups 

The question? What data should I use for the next retraining?

Models are subject to temporality and seasonality. Selecting a dataset impacted by a temporal anomaly or flux can result in model skew. An important insight from production data is data DNA, the similarity between the data distributions of different days. Understanding how the data changes between dates (the drift score between each pair of dates) lets you group dates by their similarities and differences. With this information, you can create a combination of data-retraining groups that reflect or exclude the temporal behavior of your production data.

Here we can see a heatmap matrix of dates by dates, where each cell represents the change between two dates. Strongly colored cells mark pairs of dates that are very different from each other, while lightly colored cells mark dates that are very similar to each other.

Data DNA
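To make the date-by-date comparison concrete, here is a minimal sketch of how such a drift matrix could be computed for a single numeric feature. It is not the Superwise notebook's implementation: the Jensen-Shannon distance is swapped in as one possible drift score, and the function names, the bin count, and the sample values are all assumptions made for illustration.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values in each bin (assumes at least one value per day)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect_right(edges, v) - 1
        # Clamp so that the maximum value lands in the last bin.
        counts[min(max(idx, 0), len(counts) - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance, base 2: 0 = identical, 1 = no overlap."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def drift_matrix(
    daily_values: Dict[str, List[float]], bins: int = 10
) -> Tuple[List[str], List[List[float]]]:
    """Pairwise drift score between every two dates, using shared bins."""
    dates = sorted(daily_values)
    all_values = [v for d in dates for v in daily_values[d]]
    lo, hi = min(all_values), max(all_values)
    width = ((hi - lo) / bins) or 1.0  # avoid zero-width bins
    edges = [lo + i * width for i in range(bins + 1)]
    hists = [histogram(daily_values[d], edges) for d in dates]
    return dates, [[js_distance(a, b) for b in hists] for a in hists]


# Hypothetical feature values for three production days.
daily = {
    "2022-06-01": [1, 2, 2, 3, 3, 3, 4, 4, 5],
    "2022-06-02": [1, 2, 2, 3, 3, 3, 4, 4, 5],
    "2022-06-03": [6, 7, 7, 8, 8, 8, 9, 9, 10],
}
dates, matrix = drift_matrix(daily)
for date, row in zip(dates, matrix):
    print(date, [round(score, 2) for score in row])
```

The first two days share the same distribution, so their cell is 0, while the third day does not overlap with them at all and scores close to 1. With several features, one option would be to average the per-feature scores into a single cell value.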

As you can see in this example, the data is divided into three main groups (orange, red, and green) that represent three candidate datasets for the next retraining.

  • Red – the red group reflects a recurring event in our data that we want to train on. This could be, for example, behavior over the weekends. 
  • Orange – the orange group is normal data behavior in production.
  • Green – the green group represents a unique behavioral event. For example, this could be a marketing campaign in a click-through rate optimization use case. 

Depending on your domain insights, the include/exclude decisions may differ. If the marketing campaign was successful and the insights will be rolled out to all marketing campaigns, you may decide to retrain on green and red. If the campaign was a one-time event or a failed experiment, orange and red would be a better retraining data group selection.

Retraining groups
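One way to turn a date-by-date drift matrix into retraining groups is sketched below. It assumes a simple rule that is not taken from the Superwise notebook: two dates belong to the same group when a chain of pairwise drift scores below a threshold connects them. The matrix values, the 0.3 threshold, and the group labels are hypothetical.

```python
from typing import Dict, List


def group_dates(
    dates: List[str], matrix: List[List[float]], threshold: float
) -> List[List[str]]:
    """Group dates linked by pairwise drift scores below the threshold."""
    parent = list(range(len(dates)))

    def find(i: int) -> int:
        # Follow parent links to the group's representative date.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(dates)):
        for j in range(i + 1, len(dates)):
            if matrix[i][j] < threshold:
                parent[find(j)] = find(i)

    groups: Dict[int, List[str]] = {}
    for i, date in enumerate(dates):
        groups.setdefault(find(i), []).append(date)
    return sorted(groups.values())


# Hypothetical drift scores: Friday, Saturday, Sunday, Monday, campaign day.
dates = ["2022-06-03", "2022-06-04", "2022-06-05", "2022-06-06", "2022-06-07"]
matrix = [
    [0.0, 0.6, 0.6, 0.1, 0.8],
    [0.6, 0.0, 0.1, 0.6, 0.8],
    [0.6, 0.1, 0.0, 0.6, 0.8],
    [0.1, 0.6, 0.6, 0.0, 0.8],
    [0.8, 0.8, 0.8, 0.8, 0.0],
]
groups = group_dates(dates, matrix, threshold=0.3)

# Labels come from domain knowledge, not from the grouping itself.
labeled = {"normal": groups[0], "weekend": groups[1], "campaign": groups[2]}

# Assume the campaign was a one-time event, so it is left out.
include = ["normal", "weekend"]
retraining_dates = sorted(d for name in include for d in labeled[name])
print(retraining_dates)
```

The include list is where the domain decision lives: swapping "normal" for "campaign" would reproduce the other choice described above without touching the grouping logic.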

Identifying drifted segments

The question? Which populations are impacted?

A model’s purpose is to generalize predictions across your population. That said, you still need to monitor your model’s behavior at the segment level to detect whether a specific segment is drifting. When segment drift is detected, we can consider the following resolutions, or a combination of them, together with retraining.

  • Model split – create a specific model for the segment.
  • Optimize the model – suit the model to handle the current segment.
  • Resample the data – change the distribution of the training data so that the model learns more from the specific segment.

Here we can see the segment drift value where each bar shows the drift score of each segment. Before taking action, it is important to understand the relationship of the segment size to the segment drift to determine the extent of the segment’s effect on the model.

Mean segment drift

Plotting segment size against segment drift makes this relationship visible and helps determine whether a segment warrants its own model.

Relationship between segment size and segment drift
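The relationship between segment size and segment drift can be sketched in a few lines. The example below assumes that each segment already has a drift score and a row count. The segment names, the numbers, and the size-weighted score (share of rows multiplied by drift) are illustrative assumptions, not a metric defined by Superwise.

```python
from typing import Dict, List, NamedTuple


class SegmentStats(NamedTuple):
    rows: int      # number of production rows in the segment
    drift: float   # drift score of the segment, higher means more drift


class RankedSegment(NamedTuple):
    name: str
    share: float           # segment size as a share of all rows
    drift: float
    weighted_drift: float  # share * drift, a rough proxy for model impact


def rank_segments(stats: Dict[str, SegmentStats]) -> List[RankedSegment]:
    """Order segments by size-weighted drift, largest impact first."""
    total_rows = sum(s.rows for s in stats.values())
    ranked = [
        RankedSegment(name, s.rows / total_rows, s.drift,
                      s.rows / total_rows * s.drift)
        for name, s in stats.items()
    ]
    return sorted(ranked, key=lambda r: r.weighted_drift, reverse=True)


# Hypothetical segments of a click-through rate model.
stats = {
    "mobile": SegmentStats(rows=7000, drift=0.10),
    "desktop": SegmentStats(rows=2500, drift=0.15),
    "tablet": SegmentStats(rows=500, drift=0.60),
}
ranked = rank_segments(stats)
most_drifted = max(ranked, key=lambda r: r.drift)

for seg in ranked:
    print(f"{seg.name}: share={seg.share:.2f} drift={seg.drift:.2f} "
          f"weighted={seg.weighted_drift:.3f}")
```

In this made-up data, the tablet segment drifts the most but covers only 5% of the rows, so it ranks last by weighted drift. A small, heavily drifted segment like this might be a candidate for a dedicated model or for resampling, while a large segment with moderate drift affects the overall model more.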

Identifying days with integrity issues

The question? Which days should be excluded from retraining on principle?

Some data should be excluded from retraining on principle, namely days when we experienced data integrity issues due to some pipeline or upstream source issue. If this data is taken into consideration during retraining, it can cause our model to misinterpret the ‘normal’ distribution, which can result in a further decline in model performance. 

Top 10 days with integrity incidents

Here we can see a bar graph of the days with data integrity incidents. This lets us quickly identify ‘bad’ data that we should exclude from the next retraining.
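Excluding these days from the retraining set is straightforward once the incidents are listed per day. The sketch below assumes that incidents are available as (day, incident type) pairs. The incident names, the dates, and the rule of dropping any day with at least one incident are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical integrity incidents reported by monitoring: (day, type).
incidents: List[Tuple[str, str]] = [
    ("2022-06-02", "null_spike"),
    ("2022-06-02", "schema_change"),
    ("2022-06-05", "missing_feature"),
]

# Count incidents per day; the ten worst days match the bar graph's view.
incidents_per_day = Counter(day for day, _ in incidents)
top_days = incidents_per_day.most_common(10)


def days_to_exclude(counts: Counter, min_incidents: int = 1) -> Set[str]:
    """Days with at least min_incidents integrity incidents."""
    return {day for day, n in counts.items() if n >= min_incidents}


candidate_days = [
    "2022-06-01", "2022-06-02", "2022-06-03", "2022-06-04", "2022-06-05",
]
bad_days = days_to_exclude(incidents_per_day)
clean_days = [day for day in candidate_days if day not in bad_days]

print(top_days)
print(clean_days)
```

Raising min_incidents keeps days with only isolated incidents in the retraining set, which may be reasonable when an incident touched a feature that the model does not rely on.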

Smarter, leaner retraining

Retraining isn’t free. It takes up resources, both in terms of training runs and your team’s focus and efforts. So anything that we can do to improve the probability of finishing a retraining cycle with higher-performing results is crucial. That is the value of data-driven retraining with production insights: smarter, leaner retraining that leverages model observability to take you from detection to resolution quickly and effectively.

Ready to get started with Superwise?

Head over to the Superwise platform and get started with easy, customizable, scalable, and secure model observability for free with our community edition.
