MLflow & Superwise integration

Superwise team

May 12th, 2022

Part I: Connecting Superwise with MLflow

MLflow and Superwise are two powerful MLOps platforms that help manage the training, monitoring, and logging of ML models. The two systems have different strengths: MLflow Experiments is mostly geared toward tracking metrics, while Superwise offers a deeper, more comprehensive analysis of your models and data. This post describes the concepts behind integrating MLflow with Superwise. Check out our documentation on how to integrate and our end-to-end tutorial notebook.


Machine learning development can become disorganized very quickly, and to address this, different tools for each phase of the machine learning lifecycle have been, and still are being, developed. MLflow is one of the popular libraries that help organize the machine learning development process. It consists of 3 main components: Tracking, Projects, and Models. This post focuses only on the Tracking component.

MLflow Components

MLflow Tracking provides both an API and a UI that help visualize metadata related to training sessions. For each experiment, you can log and store hyperparameters, metrics, artifacts, the source code used to build the model, and the trained model itself. With a few lines of code, you can compare the output of multiple training sessions and quickly understand which model performed best.
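As an illustration, a minimal Tracking call might look like the sketch below (the experiment name and logged values are illustrative and not taken from the demo notebook):

# A minimal MLflow Tracking sketch; the experiment name and values are illustrative
import mlflow

mlflow.set_experiment("diamond-price-experiments")

with mlflow.start_run():
    mlflow.log_param("n_estimators", 100)           # a hyperparameter of the training session
    mlflow.log_metric("rmse", 550.0)                # an evaluation metric
    mlflow.set_tag("model_type", "random_forest")   # free-form metadata about the run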

Integrate Superwise with MLflow

Superwise offers a simple way to integrate with MLflow. To establish the integration, you just need to align 3 types of parameters between the two systems:

  1. Match the names of the models.
  2. Set MLflow tags to match versions with Superwise.
  3. Use MLflow to track the experiments:
    1. Loss and parameters continue to be collected automatically by MLflow’s autologging.
    2. Metrics calculated by Superwise are sent to MLflow as custom metrics.
Integrate Superwise with MLflow

Model name

To reference the models identically on the two platforms, make sure the model names match. To do so, define a global name for the model, then pass it to both Superwise and MLflow, as this snippet shows:

import mlflow
from superwise.models.model import Model

# Setting up global names
model_name = "Diamond Model"

# Using the model name in MLflow’s experiment
mlflow.set_experiment(f"/Users/{databricks_username}/{model_name}")

# Using the same model name to create a Superwise model
sw_model = Model(
    name=model_name,
    description="..."
)
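For the model to actually appear in Superwise, it also needs to be registered through the Superwise client. Below is a hedged sketch, assuming the SDK’s Superwise client and its model.create call; the credentials are placeholders:

from superwise import Superwise

# Placeholder credentials; replace with your own client ID and secret
sw = Superwise(client_id="<CLIENT_ID>", secret="<SECRET>")

# Register the model so it appears in Superwise under the same name used in MLflow
sw_model = sw.model.create(sw_model)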

The models on both platforms will have the same name, making them easily identifiable, as the screenshots show:

Model in MLflow
Model in Superwise

Model version

Similar to setting model_name on both platforms, take the version name used for the Superwise model version and pass it as a tag to MLflow when starting the run:

superwise_version_name = "version_1"

tags = {
    "Superwise_model": model_name,
    "Superwise_version": superwise_version_name,
}
mlflow_run = mlflow.start_run(tags=tags)

Custom metrics

During the experiment run, MLflow automatically collects some metrics from the artifacts and logs generated by the flavor you are using. You can also add custom metrics to the run using the log_metrics function.

# Calculating the weighted average of feature drift from Superwise
# (`features` and `results_df` are assumed to hold feature importances and
# per-feature drift values retrieved from Superwise)
input_drift_value = 0.0
for feature in features['name'].tolist():
    importance = features.set_index('name').loc[feature]['feature_importance']
    input_drift = results_df[results_df['entity_name'] == feature]['value'].mean()
    input_drift_value += importance * input_drift

input_drift_value /= features['feature_importance'].sum()

# Logging the calculated metric to MLflow
mlflow.log_metrics({"input_drift": input_drift_value})
MLflow experiment tracking dashboard

In your MLflow experiment tracking dashboard, the tags and metrics will appear, allowing you to identify the models and versions and to benefit from new types of metrics.
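The same comparison can also be done programmatically. Here is a minimal sketch, reusing the experiment path and variables from the snippets above:

import mlflow

# Fetch the runs of the experiment and compare the Superwise-derived metric across versions
experiment = mlflow.get_experiment_by_name(f"/Users/{databricks_username}/{model_name}")
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    filter_string=f"tags.Superwise_model = '{model_name}'",
)
print(runs[["tags.Superwise_version", "metrics.input_drift"]])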

Part II: Detect data corruption in an ML experiment

Let’s put the concepts from Part I into practice and use the integration to generate more insightful experiment tracking. In our demo notebook, we build a simple ML pipeline to predict the price of diamonds based on a set of numeric and categorical features.
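As a rough sketch of what such a pipeline might look like (this is not the notebook’s exact code; the estimator, encoding, and tag values below are illustrative):

import mlflow
import pandas as pd
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Load the classic diamonds dataset: numeric and categorical features, "price" as target
diamonds = sns.load_dataset("diamonds")
X = pd.get_dummies(diamonds.drop(columns=["price"]))  # one-hot encode cut/color/clarity
y = diamonds["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Autologging captures hyperparameters and training metrics;
# the tags tie the MLflow run to the Superwise model and version
mlflow.sklearn.autolog()
with mlflow.start_run(tags={"Superwise_model": "Diamond Model",
                            "Superwise_version": "version_1"}):
    regressor = RandomForestRegressor(n_estimators=100)
    regressor.fit(X_train, y_train)
    mlflow.log_metric("test_r2", regressor.score(X_test, y_test))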

The notebook tells the story of a machine learning experiment that went bad due to data corruption. 

An ML team was working on the diamonds dataset and trained their first model. They logged the training data and the predictions to Superwise and used MLflow to track the experiments.  


After a while, the team wanted to train another model, only this time some of the features got corrupted accidentally 😱. Thankfully, the team was using Superwise together with MLflow. They synced the input_drift metric, and on the experiment page in MLflow they could easily see that the elevated input drift was related to the reduced accuracy of the newly trained model.

Using this information, they ran a more comprehensive analysis of each of the features and found the culprit immediately. The feature drift for “carat” showed significantly higher numbers than the rest of the features.

Elevation in the input drift

Conclusion

In order to perform a deep and comprehensive analysis of the training process, it’s important to log and analyze not only loss metrics but also the data itself. Superwise is optimized for storing training and production data and offers continuous analysis of crucial metrics like drift and feature importance. With a few method calls using our client, it’s easy to enrich MLflow with Superwise metrics and better understand differences in model performance.

Other integrations you might like:

  1. Sagify and Superwise integration
  2. Datadog and Superwise integration
  3. New Relic and Superwise integration

Ready to get started with Superwise?

Head over to the Superwise platform and get started with easy, customizable, scalable, and secure model observability for free with our community edition.


