Back to ML Talks

Zoumana Keita

Machine Learning Engineer
Lincoln

Tell us a bit about yourself, your background, where you work, and what you do there. 

My responsibility involves working closely with the consulting department to assist and support the sales team. Because we don’t have direct interaction with the client, we work with the sales department to support them with their business development processes for different customers. We have customers in various fields, like banking, retail, etc.

This support involves different tasks. The first consists of training them, because few business people have a background in data. I help them acquire the minimum data literacy needed for their day-to-day work. The second part consists of developing use cases that demonstrate our expertise in data science and machine learning, based on innovation and research methods. The goal is to help the business team show clients what we can do and demonstrate our strength and expertise in those fields.

I am currently a machine learning engineer at Lincoln, working in the innovation department. Previously, I worked as a data scientist for an AI startup based in Paris, and before that, I was a machine learning consultant at IBM. I hold a bachelor’s and a master’s degree in computer science with a concentration in artificial intelligence and data science.

My team has also implemented several use cases, mainly in natural language processing and computer vision, and very few in the field of time series.

I also like to write articles about data science. I relax by running, and I’m also part of a Tae Kwon Do club where I train four times a week, which helps me keep my feet on the ground.

Why is machine learning important to your business, and what does it help you achieve?

Consider the amount of data we have nowadays; many companies are trying to use that data for a competitive advantage. As a company or industry, we should try to take advantage of such an opportunity. Given the sheer volume of data available and current computing power, companies can leverage machine learning to stay ahead of the competition.

What disadvantages would you face if you weren’t using machine learning?

One of the drawbacks would be the latency in understanding our data and making the right decisions, because the sooner we discover trends in the data, the sooner we can explore new ideas and act on them. At big data scale, the human intervention needed to understand that data would be very time-consuming. Moreover, these manual interventions most often do not bring value to the business activity and thus need to be automated.

We had an insurance use case where the goal was to help insurance companies process client requests quickly. The key problem was that agents were spending too much time on tasks such as checking whether the documents required to process a request were complete, when their main goal was to process client requests quickly. Machine learning helped automatically identify the type of request made by a given client so that it could be routed to the right department within the insurance company for quick processing.

How is your company structured, and who oversees what aspect of the machine learning process?

My team works end-to-end on machine learning alongside the business team. We work with them to better scope the projects we're working on, because building models without a good understanding of the business goal is a complete waste of time. We also have a data engineering team responsible for data acquisition. Both the data engineering and data science teams are responsible for building models and deploying them into production.

We try to involve everybody in the whole machine learning process. Because being able to develop the machine learning model is one thing, but maintaining it in production is a whole different story. 

Now we’re involving everybody — data scientists, data engineers, and machine learning engineers — as a horizontal set of people who are not only focused on their own area of expertise.

For example, the data engineers are not only responsible for acquiring the data, but are also involved when it comes to maintaining the model in production. That’s the philosophy we’re trying to shift to currently.

What does that look like in practice? Have you developed any best practices along the way?

We’ve implemented some functional templates explaining the best practices of working on machine learning products within the company. This includes all the steps from scoping the business goal to building, deploying, and training the final users of the products.

Who is responsible for the models once they’re in production?

That’s a team effort. Everybody is involved, including the business team, though not as much as the technical teams. When the model is deployed into production, many things need to be taken into consideration to maintain its performance, such as retraining the models so they stay aligned with the business goal. Understanding those things helps us decide what kind of strategy to capitalize on in order to put the right monitoring process in place. That goes along with acquiring the right data, finding the right metrics to track, etc.

Because when the business goal changes, the metrics and KPIs previously used to deploy the model into production might also change.

Those are some of the things we have to monitor, and everybody is involved in the process.

Once the model is in the real world, what is important to you to monitor, and why?

We don’t only monitor the model’s performance, such as AUC, F1 score, etc. We also monitor the system that is hosting the model.

For instance, consider a machine learning model that generates predictions in a few milliseconds. If the model’s prediction latency starts to increase, say from milliseconds to seconds, we might conclude that the issue is coming from another part of the system, not the model itself. That’s why it is important to have a global overview of the system hosting the model in order to understand where the problem is coming from.
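The kind of latency check described here can be sketched in a few lines of Python. Everything below is illustrative: the stand-in "model," the 200-call sample, and the 50 ms alert threshold are assumptions, not values from the interview.

```python
import statistics
import time

def timed_predict(model_fn, x, latencies):
    """Run one prediction and record its latency in milliseconds."""
    start = time.perf_counter()
    result = model_fn(x)
    latencies.append((time.perf_counter() - start) * 1000)
    return result

latencies = []
for i in range(200):
    # Trivial stand-in for a real model call (hypothetical).
    timed_predict(lambda x: x * 2, i, latencies)

# 95th-percentile latency over the recent window, in milliseconds.
p95 = statistics.quantiles(latencies, n=100)[94]
if p95 > 50:  # assumed alerting threshold (SLO); tune per system
    print(f"ALERT: p95 prediction latency {p95:.1f} ms")
```

Tracking a tail percentile rather than the mean is the usual choice here, since a latency regression in one downstream dependency shows up in the tail first.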

Another issue could be model drift, data drift, or both. Another important aspect is model explainability: explainable models may not initially be a business requirement, but the business might later come back with the requirement of making the model explainable. We always have to be aware of what’s being done and what’s state of the art so we can quickly adapt and make things explainable.

What are the primary metrics you use to determine model health?

The metrics change depending on the machine learning task we’re working on, but we mostly track the F1 score and the AUC score. There are other metrics, like accuracy, but we don’t use them often because of the class imbalance we might have in the datasets; on imbalanced data, accuracy can be highly misleading. Ninety percent of the time we use the AUC score, because most of the tasks we’ve been working on are classification tasks.
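A quick sketch of why accuracy misleads on imbalanced data: a "model" that always predicts the majority class scores 95% accuracy on a 95/5 split, while its F1 score is zero. The 95/5 split and the degenerate classifier below are illustrative, not from any real project.

```python
# 95 negatives, 5 positives -- a typical imbalanced binary task.
y_true = [0] * 95 + [1] * 5
# Degenerate classifier: always predict the majority class.
y_pred = [0] * 100

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)                # 0.95 -- looks great
precision = tp / (tp + fp) if (tp + fp) else 0.0  # 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0     # 0.0
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) else 0.0)           # 0.0 -- reveals the problem

print(f"accuracy={accuracy:.2f} f1={f1:.2f}")     # accuracy=0.95 f1=0.00
```

F1 and AUC both incorporate how the model treats the minority class, which is why they are the preferred reporting metrics when the classes are skewed.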

What does your resolution process look like? Are you a free agent, or do you need to get other teams involved (depending on the issue) to solve issues?

When it comes to maintenance, it’s not only about the machine learning model. When we deploy the model into production, it’s integrated into other applications, so the first thing we have to do is understand which applications the machine learning model has been integrated into. We then proceed in an iterative manner, for instance performing A/B testing to see what kinds of problems we might encounter. Let’s say the model depends on two other applications. We have to evaluate the consequence of shutting down one application, then evaluate the impact on the remaining system, and so on. Once we have that global overview, we’re able to put the right maintenance strategies in place.

The machine learning engineers are most involved in that, even though the whole team — data science, data engineering — is also involved. But the machine learning team is most involved in the maintenance aspect.

What keeps you up at night? What monitoring issues are challenging to detect and resolve?

I think the challenge is to put accurate models into production and keep their performance high over time, because a model’s performance changes. So the main goal is to keep the production model as accurate as possible.

The second aspect would be model explainability. That’s something I keep tracking: being able to deploy models that are explainable and not biased, given the quality of data we have. Another challenge is the quality of that data itself: being able to get quality data and develop transparent, explainable machine learning.

We have a dilemma when it comes to choosing machine learning models. Most of the time, we use deep learning models, which are very accurate but not easy to explain. And on the other side, we have simple machine learning models, like decision trees, which might not be as accurate as deep learning models but are easy to explain. So the goal is to build models with the right balance between performance and explainability.

In what ML areas are you investing the most heavily to improve in over the next two years?

Mainly natural language processing. Most of the clients we’ve been working with have a lot of textual data, and it’s also something I have a lot of personal interest in. Some areas still lack attention, especially low-resource languages. Today, most NLP models have been developed for English. For low-resource languages like my mother tongue (Bambara), we don’t have many datasets available for creating accurate models. Researchers have been working a lot on that, but I think there is still much work to do.

What aspect of the industry, either an opportunity or a risk, isn’t getting enough attention?

From my personal experience and point of view, I think the field of healthcare isn’t getting enough attention. Most people aren’t willing to make their data available so that researchers can work on building relevant machine learning models. There is a lot of sensitive data, and a lot of regulation has to be put in place to convince people to make that data available. If we can build accurate and responsible machine learning models, that will benefit society.

I think people are just scared. Some people consider machine learning something that people with bad intentions could use to collect personal information. This could be why people aren’t willing to make their data open. 

What career advice would future you give to your past self?

I would say remember your business goal because it’s how you make an impact and create value. We develop models based on what the business is expecting. And the second piece of advice is to ask questions. It’s only by asking questions that you’ll get answers. Also, there are so many people that will be more than happy to answer and help you out.

Start simple. Think of simple solutions before you reach for deep learning models. Some things might seem complicated at first glance, but if you take the time to understand them and try some simple models first, you gain a global vision of how to approach the problem in the long run. Sometimes simple statistical analysis can be better than complicated machine learning models.

Also, never be ashamed of saying you don’t know. It will save a lot of time not only for you, but also for the people you’re working with.

Finally, if you learn something new, teach it to someone else with your own words. You don’t have to wait until you get a Ph.D. to start sharing your knowledge with others.