
Aviv Ben Arie

Staff Data Scientist
Intuit

Hi, I’m Aviv, a Data Scientist and certified pastry chef (who is also gluten-free, an interesting and challenging combination).

I am currently working at Intuit as a Staff Data Scientist. Our products, including TurboTax, QuickBooks, Credit Karma, and Mint, are used by millions of consumers, small businesses, and self-employed customers in their day-to-day lives, at work, and at home. Our company’s mission is to power prosperity around the world.

Before joining Intuit, I was a Lead Data Scientist at PayPal. At both companies, I’ve specialized in fraud prevention and security models. I also have a rich background in cyber security from my army service and from my work at the Israeli Prime Minister’s Office. This background helps me build better models that draw on domain knowledge, not only on my machine learning skills.

Congrats! You are amongst the happy few who have scaled their use of AI. Please share your main operational challenges and best practices.

In my opinion, the key to our success in implementing AI across Intuit product lines is tight collaboration within our business segments between product managers, data scientists, data analysts, and ML engineers (MLEs) as one Intuit team.

Because we share a common data science mission across the company, we take the time together to deeply understand the problems customers face and to carefully plan how we intend to solve them with machine learning before we approach the technical design.

We work closely with internal stakeholders throughout the research phase to make sure that we optimize our models for the correct metrics and set suitable expectations.

Our business partners have a good understanding of what machine learning can offer, and we collaborate with them on prioritizing the tasks that require our attention. This ensures that our models are used to solve high-impact problems.

In your organization, who is in charge of assuring that the models in production behave as they should?

At Intuit, we are all accountable for the models we deliver – the data scientists are in charge of monitoring model and feature distribution, MLEs ensure that the production environment is stable, and the business tracks the impact on their KPIs. We work as one team to make sure that all aspects are covered, and indeed we are able to identify and correct issues quickly, should they come up.
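The interview does not say which statistic is used to monitor feature distributions; one widely used option is the Population Stability Index (PSI), which compares a feature’s production distribution with its training baseline. The sketch below is a minimal illustration under stated assumptions: equal-width bins over the baseline range, a small `eps` floor for empty bins, and a hypothetical function name. It is not Intuit’s implementation.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a baseline sample (expected) and a production sample (actual).

    Assumption for illustration: equal-width bins over the baseline range;
    production values outside that range are clipped into the edge bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant baseline

    def bin_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clip out-of-range values
            counts[idx] += 1
        total = len(values)
        # Floor each share at eps so the log term stays finite for empty bins.
        return [max(c / total, eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(1000)]
shifted = [x + 5 for x in baseline]
print(population_stability_index(baseline, baseline))  # identical samples
print(population_stability_index(baseline, shifted))    # drifted samples
```

A commonly cited rule of thumb (an assumption here, not a threshold from the interview) treats PSI below 0.1 as stable and above 0.25 as a significant shift worth investigating.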

The “Full Stack Data Scientist” is an approach claiming that the Data Scientist is the owner of the model from development to maintenance in production. What’s your view on that?

I had the pleasure of hosting a round table on this important topic at the recent Women in Data Science (WiDS) TLV 2021 conference. In my round table, we discussed the concept of “Full Stack Data Science” and whether or not we think this is the right direction for our industry. We examined real job descriptions and noted the vast skill set now required of data scientists: from project management, data analysis, and engineering to MLOps and, of course, ML theory and practice.

Our conclusion was that this concept of having a one-person show, although tempting, has many drawbacks that need to be considered.

I proposed an alternative that I strongly believe in: educating our non-data-science colleagues in data science and machine learning, thus creating diverse teams with a shared understanding of the basics.

This means each person can bring their own expertise and skillset to the table. At Intuit, we’ve put this into practice by providing data science training opportunities at different levels to any employee interested in building this skill set.

2020 was about COVID-19 and drifts. How did it affect your models in production? How did you manage to stay on top of things?

COVID-19 left countless people in financial hardship, under endless stress, and afraid of what the future might bring. Throughout the pandemic, running or starting a business could feel like a free fall with no safety net. So our company’s mission to help “power prosperity around the world” has been especially pertinent over the past year.

In addition to this, we developed AI-driven solutions that were quickly incorporated into Intuit’s products. One problem we were able to solve is that some Intuit customers in the U.S. had problems with their tax refund because they mistyped their bank account numbers. We applied a deep learning algorithm that predicts the likelihood that the bank account number is invalid. This solution prevented the delay of over $140M in tax refunds this past tax season.

Additionally, since most companies do not state their industry, we had previously developed a US-only ML service that infers a business’s industry by analyzing its financial documents. When the UK and Canadian governments asked Intuit to help them assess the financial impact of the COVID-19 pandemic on different industries, we expanded the machine learning model to include businesses outside the US and managed to do so within days and with high accuracy.

Do you have current needs for xAI? How do you see this trend impacting your work?

Explainable AI is one of my top personal interests, and I think every data scientist should have good theoretical and practical knowledge of this field. Understanding model decisions helps us debug and optimize our models in our day-to-day work.

At Intuit, we strive for the right level of transparency in our AI – taking into consideration the intended use of the AI and its impact on customers.

What are the main technology trends that are most likely to impact your work in the near future?

As AI is integrated into more and more domains and products, I expect we will be solving new and challenging problems using ML and data science. I personally don’t believe that AutoML-type solutions will take over the industry, as most of the models we create require more than just running an automated pipeline. Domain knowledge, correct data handling, business understanding, etc., are all critical to the success of our models. However, parts of the process could be automated to speed things up and enable teams to support more use cases.

What are the main improvements and investment areas planned for your AI/ML activities in the next two years?

We experiment with and develop AI and emerging technologies to create a platform that evens the odds for customers while upholding our ethical commitment to using data for good. Machine learning empowers our customers with insights so they can make data-driven decisions.

Nowadays, there is more data being created than ever before, and this data is the fuel to the fire of data-driven algorithms.

We’re also pioneering different methodologies in predictive intelligence. We are one of the first tech companies to research time series modeling using confidence intervals, which enables us to make more accurate predictions for our customers. Using natural language processing, we’re also humanizing our technology, so people can interact with it more naturally and make better decisions about managing their money.

As we keep hearing about more and more “AI fails,” what’s keeping you up at night?

I have to say that thankfully, throughout my career as a data scientist, I haven’t experienced many “AI fails.” That is not to say that there aren’t challenges – of course, there are. But we’ve been able to overcome most of these challenges quickly and come out stronger and wiser (and with better monitoring systems).

In general, the fact that no model is 100% accurate (or at least no real-world model) must be taken into account when incorporating AI into products. This can be mitigated by adding decision layers, manual or automated, that work alongside the model score to optimize performance even for the toughest edge cases.
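To make the idea of a decision layer concrete, here is a minimal sketch of how a model score might be combined with an automated rule layer and a manual-review fallback. Everything in it is hypothetical and chosen only for illustration: the `Transaction` fields, the rule, the action labels, and the 0.3 / 0.9 thresholds are not from the interview or from any Intuit system.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical fields, for illustration only.
    amount: float
    account_age_days: int


def decide(
    score: float,
    txn: Transaction,
    approve_below: float = 0.3,  # hypothetical threshold
    decline_above: float = 0.9,  # hypothetical threshold
) -> str:
    """Combine a model risk score with rule-based and manual-review layers."""
    # Automated rule layer: a hard policy check that overrides the model.
    if txn.amount > 10_000 and txn.account_age_days < 7:
        return "manual_review"
    # Model layer: act automatically only when the score is confident.
    if score >= decline_above:
        return "decline"
    if score < approve_below:
        return "approve"
    # Uncertain middle band: route to a human reviewer.
    return "manual_review"


print(decide(0.1, Transaction(amount=50, account_age_days=400)))
print(decide(0.1, Transaction(amount=20_000, account_age_days=2)))
```

The design choice is that the model acts on its own only at the confident extremes of the score range, while policy rules and human review absorb the edge cases the model is most likely to get wrong.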