Maxim Khalilov

Head of R&D
Glovo

Can you share a bit about your current use cases?

For a little less than a year, I have been the Head of R&D at Glovo. I focus on managing the machine learning platform team: we are building a platform so that our data scientists don’t have to care about anything except modeling. When I joined, a significant percentage of the current engineering department was not here yet. Machine learning has found its biggest application in the logistics department, because in our business, efficiency is one of the main differentiators. It is also a starting point for customer satisfaction, carrier hourly and weekly earnings, and the profitability of the company.

That’s why there are lots of different use cases for machine learning in the food delivery industry. For example, predicting how long it takes a carrier to move from A to B, such as from a restaurant to the delivery address, or estimating the time to prepare food for a particular order at a specific time of day in a specific restaurant. Also, how does that look best for guests? And of course, we have lots of different applications that are more generic for e-commerce: recommendations and sorting, and different models related to growth.

What are your operational challenges? How did Glovo manage to scale its use of AI?

Initially, it was done on an ad hoc basis. Models would be running on a local or cloud server, and that was it. There was no visibility into how these models were doing, nor a process to retrain them periodically or to see whether something had happened with the input data or how the model output fluctuated. In addition, there was no standard, well-oiled process for productizing machine learning models, which led to delays and errors.

So, all of these considerations led to the idea that we needed a machine learning platform that would serve multiple data scientists within the company, with service standards and reusable tooling, to help them be much more efficient.

While we have the platform already, we are at the beginning of the journey. Almost all the models that we have are hosted on the platform, but we are still missing quite a lot of functionality. Just to give you a more concrete example: we have adopted open-source solutions for training, hosting, serving… Given our size, we realize that we need something more scalable and built specifically for ML.

That’s why we started investing heavily in the model serving part of the platform. We installed our own Kubernetes cluster, which allows us to share resources, to be more agile and flexible, and to have more visibility into the models. If something goes wrong, we immediately get an alarm and we can fix it. On the training side, it’s more of an ongoing effort because we use Jenkins. The problem is that it’s not designed for machine learning, so we miss quite a lot of functionality, and that’s one of our main focuses right now.

The nearest priority in terms of time is monitoring, because we don’t have enough visibility: partly into the technical characteristics of the models, but primarily into what happens with the data, how the data flows through our pipeline, and most importantly, how our model behaves and how it reacts to changes in the data.

So what we want is basically two levels of defense: the first is data validation before we feed the data into the model, and the second is a deeper analysis of the model input data (features) and how it changes over time. And it’s not just that: we also want to understand how those changes influence the output of the online model. If one of the features starts fluctuating, we need to determine how bad it is.
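As an illustration of the two levels described above, here is a minimal sketch in Python. It is not Glovo's implementation: the feature names, the allowed ranges in `SCHEMA`, and the choice of a mean-shift score as the drift measure are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class FeatureRange:
    low: float
    high: float


# Hypothetical schema: feature names and bounds are invented for illustration.
SCHEMA = {
    "distance_km": FeatureRange(0.0, 50.0),
    "prep_time_min": FeatureRange(0.0, 180.0),
}


def validate_row(row: dict) -> list[str]:
    """First level: flag missing or out-of-range features before inference."""
    problems = []
    for name, allowed in SCHEMA.items():
        value = row.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not (allowed.low <= value <= allowed.high):
            problems.append(
                f"{name}: {value} outside [{allowed.low}, {allowed.high}]"
            )
    return problems


def drift_score(reference: list[float], recent: list[float]) -> float:
    """Second level: shift of the recent mean, in reference standard deviations."""
    spread = stdev(reference)
    shift = abs(mean(recent) - mean(reference))
    if spread == 0:
        return 0.0 if shift == 0 else float("inf")
    return shift / spread
```

A row that fails `validate_row` would be rejected or routed to a fallback before it reaches the model, while `drift_score` would be computed per feature over a time window and alerted on above some chosen threshold. Judging "how bad it is" would additionally require comparing model outputs before and after the shift, which this sketch does not cover.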

Where in the organization do the data science platform engineers sit?

In my team right now, I don’t have any data scientists; we are all software engineers, and of course that creates a bit of a cultural gap between us and our customers, who are data scientists by training. Currently we have a team in between: data scientists who have experience using the machine learning platform and who know many aspects of software engineering. They help us triage new cases, they advise the data scientists on which platform use cases fit their tasks, and they also educate the data scientists on the machine learning platform and on some other technical aspects.

What is your opinion on the “full stack data scientists” persona?

Our world is steadily changing towards a much bigger need for full stack data scientists. It’s driven by different aspects: if we look, for example, at MLOps as an analogy to DevOps, data scientists are supposed to be responsible for the full data science cycle, starting from modeling in notebooks, then putting the model into production, and afterwards operating it.

Operating machine learning models doesn’t mean looking at model performance only, but also looking at some technical metrics, which requires software engineering skills.

On the other hand, hiring only data scientists with software engineering knowledge is utopian, as many brilliant data scientists simply don’t know this stuff but can still deliver lots of value by creating great models and doing analysis. I believe that there is room for both.

How is the business involved with the live predictions?

Usually, “business” articulates the requirements. So once the first version of a model is developed, it’s time to decide on the next priority: either invest in developing a new model or iterate on this particular model. There is another option: when you open Glovo, looking for deliveries, you will see the ETA appear so that users know how long it should take to get the food delivered. This is an end-to-end model. If we want to increase the precision of this model, we need to split it into smaller pieces: How long will it take to cover the distance between the courier’s current location and the restaurant? And from the restaurant to the user’s door? How long will it take to find parking? How much will the delivery time change depending on the weather, or maybe on some local event? All of these models can significantly increase the quality of the final prediction.

That’s also the kind of requirement that the business can articulate: “We need more accuracy, but we also need more granularity,” or even, “We care much less about accuracy than about explainability.”
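To make the idea of splitting an end-to-end ETA into smaller pieces concrete, here is a hedged sketch in Python. The component names, the weather multiplier, and the rule for combining the pieces are assumptions for illustration only; in practice each field would be the output of its own model, and the interview does not say how Glovo combines them.

```python
from dataclasses import dataclass


@dataclass
class EtaComponents:
    """Hypothetical per-leg predictions, all in minutes."""
    courier_to_restaurant_min: float
    food_prep_min: float
    restaurant_to_customer_min: float
    parking_min: float
    weather_factor: float = 1.0  # assumed multiplier applied to travel legs

    def total_minutes(self) -> float:
        # Assumption: the courier travels while the food is being prepared,
        # so pickup happens when both the courier and the food are ready.
        courier_arrival = self.courier_to_restaurant_min * self.weather_factor
        pickup = max(courier_arrival, self.food_prep_min)
        last_leg = self.restaurant_to_customer_min * self.weather_factor
        return pickup + last_leg + self.parking_min


estimate = EtaComponents(
    courier_to_restaurant_min=6.0,
    food_prep_min=10.0,
    restaurant_to_customer_min=8.0,
    parking_min=2.0,
    weather_factor=1.25,
)
print(estimate.total_minutes())  # 22.0
```

One side effect of this structure is that each component can be inspected on its own, which speaks to the granularity and explainability requirements mentioned above.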

How did you manage to stay on top of things during the pandemic?

The majority of the machine learning models in the world stopped working properly. At Glovo, the problem became less relevant over time, because we collected more and more data, and retrained the models more often. What we’ve seen from the platform perspective is an increase and a change in demand: people started to order more or they started to order at a different time of the day.

What do you think will be the next big trend in the world of ML?

We’re living in the deep learning world, which is full of opportunities for us because we sit on quite a lot of data. One of the ideas that we are trying to push right now is to create our own system for internal embeddings that can describe an order or a particular customer. Once you create this embedding level, it can be used by multiple teams for various models, which can significantly speed up and standardize the way models are created. And of course, the expectations in terms of accuracy are higher for deep learning than for regular shallow models.

On the automation side, looking at feature stores will probably be our next big plan. I still don’t know whether we will build one ourselves or look at external solutions.

We need it for two purposes:

(1) It will guarantee the same processing at training and at inference, so that the models don’t get unexpected inputs.

(2) It will enable data scientists to start training models in a much easier way.
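The first purpose can be sketched in a few lines of Python: a single feature-building function is the only path to model inputs, for both historical and live data. This is a simplified illustration of the consistency idea, not a feature store; the order fields and feature names are hypothetical.

```python
import math
from datetime import datetime


def build_features(order: dict) -> dict:
    """Single source of truth for feature processing (hypothetical fields)."""
    placed_at = datetime.fromisoformat(order["placed_at"])
    return {
        "hour_of_day": placed_at.hour,
        "is_weekend": placed_at.weekday() >= 5,  # Saturday=5, Sunday=6
        "log_basket_value": math.log1p(order["basket_value"]),
    }


def training_rows(historical_orders: list[dict]) -> list[dict]:
    # Training path: features for past orders.
    return [build_features(order) for order in historical_orders]


def inference_row(live_order: dict) -> dict:
    # Inference path: exactly the same processing for a live order.
    return build_features(live_order)


order = {"placed_at": "2021-03-06T20:15:00", "basket_value": 24.0}
assert training_rows([order])[0] == inference_row(order)
```

Because both paths call `build_features`, a change to the processing logic cannot reach training without also reaching inference, which is the guarantee described in point (1).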

As an AI practitioner, are there any types of failure that are keeping you up at night?

There is a lot of interesting work in the machine learning space on different biases. One that sticks out for me is the “excellence bias”. For example, when I was a PhD student, we used quite a lot of data to train, improve, and evaluate machine translation systems. A lot of that data was produced by professional translators. But when a user sends an email with a complaint, they most probably don’t use the same language as translators do. A model trained on professionally created data will expect the same kind of input and will perform poorly once the input is different. So this “excellence bias” is an unsolved problem, not just for NLP tasks but for many other machine learning tasks. I’m looking forward to seeing what a solution for this would be.