Rishabh Mehrotra is currently Director of Machine Learning at ShareChat and instructor of Personalized Recommendations at Scale on Uplimit (formerly CoRise). Previously, he was a tech lead at Spotify and worked on machine learning research at various companies and universities. Rishabh recently sat down with Uplimit co-founder Sourabh Bajaj to discuss how he leverages user behavior data and insights to build complex recommender systems.
The following excerpts from their conversation have been edited and condensed for clarity.
Sourabh: Rishabh, one thing I’m excited to hear your thoughts on is domain understanding. When you're learning ML, you learn about all these mathematical techniques. But when you go to apply those techniques, the problems and solutions are often very domain-specific. How do you approach domain understanding and how do you think it impacts your success in this field?
Rishabh: I think a lot of people start developing machine learning models just for the love of the math. They try to throw an ML model at a problem without actually taking the time to understand what the problem is. And most of the time that doesn’t get you the results you want. You find that if you don’t understand the problem domain, you can’t formulate the problem very well. Especially if you’re working with a user-centric system, you have to be very thoughtful about the objectives you’re optimizing for. When you get the objective wrong or take shortcuts in understanding user intent, you can end up with some pretty bad impacts.
But if you spend time understanding the different parts of the problem domain (users, content, suppliers, etc.), that’s going to give you wins. There's an example I ran into when I was at Spotify that I think illustrates this really well. Imagine you have a user who’s streaming music and they keep skipping songs. The immediate conclusion you might draw from that is that they don’t like any of those songs. But that’s only true if their intent is to listen to songs all the way through right now. Maybe their intent is different: maybe they’re creating a playlist for a party later tonight, so they’re sampling songs to add to the playlist. If you’re not thinking about those nuances in user intent, and interpreting that data in the context of what you know about this user, you’re going to make some bad decisions in terms of the content you recommend to them down the road.
Sourabh: That’s a great example. My next question is, how do you think about data collection, especially when it comes to content and user activity? There’s just so much data you could collect and use — what are some systems that have worked for you?
Rishabh: There's a big problem in our industry as a whole in that log data is very biased by whatever your model is currently optimizing for. So if you want to try out a new idea, there’s potentially going to be a big difference between what you see in working with your log data and the actual impact of what you’re trying to evaluate. The log data isn’t going to give you any real insight. You might want to find out if your new idea is going to surface some niche content to the right user, but your current model sucks at handling that content, so you just don’t have enough data.
This is one place where randomized data can be really powerful. If you have, say, a one percent test collecting randomized data, you can start to see some nuances. You’ll see situations where you would have shown tail content, or more popular content, or more or less discovery. That can give you a way to dive deeper and extract useful insights. It’s worth it to collect that randomized data, and in the same vein, it’s worth it to spend time developing good offline-online correlation. That gives you a lot to work with.
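As a rough illustration of the randomized-data idea, the sketch below logs a small uniformly random slice of traffic along with each item's propensity, then uses inverse propensity scoring (IPS) to estimate how a candidate policy would have performed. IPS is one common estimator for this kind of data; the interview does not say which estimator Rishabh's teams used. The function names, the item names, the `reward_fn` callback, and the simplification to a context-free policy that always serves one item are all assumptions made for the example.

```python
import random


def log_randomized_slice(candidates, reward_fn, n_requests, explore_rate=0.01, seed=0):
    """Simulate logging for the randomized slice of traffic only.

    Requests outside the slice are served by the production model and are
    skipped here. Inside the slice an item is drawn uniformly at random, so
    its propensity is exactly 1 / len(candidates).
    reward_fn(item, rng) is a hypothetical stand-in for observed user feedback.
    """
    rng = random.Random(seed)
    propensity = 1.0 / len(candidates)
    logs = []
    for _ in range(n_requests):
        if rng.random() >= explore_rate:
            continue  # production traffic, not part of the randomized log
        item = rng.choice(candidates)
        logs.append({"item": item, "propensity": propensity,
                     "reward": reward_fn(item, rng)})
    return logs


def ips_value(logs, policy_item):
    """IPS estimate of the average reward if a policy always served policy_item.

    Each logged row that matches the policy's choice is up-weighted by
    1 / propensity; rows that do not match contribute zero.
    """
    if not logs:
        return 0.0
    total = sum(row["reward"] / row["propensity"]
                for row in logs if row["item"] == policy_item)
    return total / len(logs)
```

Because the slice is uniform, niche items receive as many impressions as popular ones, which is what makes the estimate usable for content the current model rarely surfaces. The price is variance: with a one percent slice, the estimate for any single item rests on relatively few rows.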
Sourabh: I have one follow-up question to that, which is that there are a lot of companies that don’t have that much data, probably because they don’t have many users. How do all these recommendations apply in that case? Or is there a different approach you recommend?
Rishabh: That's a great question. I'm not sure if I'm well-placed to answer it, because I’ve mostly worked with massive datasets. I will say that if you don’t have that much data, it’s even more important to have domain insights. That gives you more options and a better foundation to work with whatever data you do have.
For example, if you know enough about your domain, you can encode that domain in a structural causal graph. Then you have the foundation for something called the do-calculus, which was developed in the mid-1990s by a computer scientist named Judea Pearl. It’s basically three causal rules that allow you to reason about the effects of interventions on variables without actually having experimental data. It’s not the easiest method to follow, and to be honest I spent a lot of time during my PhD trying to understand it. But if you can find a course on it, I’d sign up for that.
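To make the causal-graph point a little more concrete, here is a minimal sketch of the backdoor adjustment, one identity that the do-calculus licenses: P(Y=1 | do(X=x)) = Σ_z P(Y=1 | X=x, Z=z) · P(Z=z). It assumes a hypothetical graph in which a single observed confounder Z (say, whether a user is a heavy listener) influences both X (whether an item was shown) and Y (whether it was streamed). The variable meanings and the toy rows are invented for illustration and are not from the interview.

```python
from collections import Counter


def backdoor_effect(rows, x_value):
    """Estimate P(Y=1 | do(X=x_value)) by adjusting for Z.

    rows is an iterable of (z, x, y) tuples with y in {0, 1}. The result is
    only causal under the assumption that Z blocks every backdoor path
    from X to Y in the assumed graph.
    """
    rows = list(rows)
    total = len(rows)
    z_counts = Counter(z for z, _, _ in rows)
    effect = 0.0
    for z, n_z in z_counts.items():
        stratum = [y for (zz, x, y) in rows if zz == z and x == x_value]
        if not stratum:
            raise ValueError(f"no rows with X={x_value!r} in stratum Z={z!r}")
        # P(Y=1 | X=x, Z=z) weighted by P(Z=z)
        effect += (sum(stratum) / len(stratum)) * (n_z / total)
    return effect


def naive_conditional(rows, x_value):
    """Plain P(Y=1 | X=x_value), which mixes the causal effect with confounding."""
    ys = [y for (_, x, y) in rows if x == x_value]
    return sum(ys) / len(ys)


# Made-up observational data as (z, x, y): heavy users (z=1) are shown the
# item more often and also stream more regardless of what is shown.
EXAMPLE_ROWS = (
    [(1, 1, 1)] * 6 + [(1, 1, 0)] * 2 + [(1, 0, 1)] * 1 + [(1, 0, 0)] * 1
    + [(0, 1, 1)] * 1 + [(0, 1, 0)] * 1 + [(0, 0, 1)] * 2 + [(0, 0, 0)] * 6
)
```

On these toy rows the naive comparison suggests that showing the item lifts streaming from 0.3 to 0.7, while the adjusted estimate is a smaller lift from 0.375 to 0.625. The gap comes entirely from the domain knowledge encoded in the graph, which is the sense in which domain insight can substitute for data you cannot collect.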
Sourabh: Speaking of your PhD — I’m curious, given all the new developments in ML today. If you went back to do your PhD again, what would you work on?
Rishabh: If I had to do my PhD again, I would work on microeconomic theory. That might be a little surprising given my background, but it ties in if you think about the economic impact of recommendation systems at a societal scale.
For example, think about a food delivery company like Postmates. Those companies are generating a lot of revenue. How much of that revenue goes to a specific restaurant? That’s largely decided by the company’s homepage recommendation model. And is that model being developed with input from people who understand its economic impact on society? Maybe, maybe not.
The recommendations community has evolved to be very user-centric, but we need to wake up to the fact that it’s not just a user-centric world. This whole system has to be sustainable for creators and suppliers, too. So we have to start thinking about how to design multi-objective, multi-stakeholder systems. How do we really create an economic ecosystem, and support it for the long term, using the right methodology?
The other thing I’d want to look at is making recommendation systems more globally inclusive. That’s a really hard problem. It’s hard enough to build one good recommendation engine; how do you build dozens of recommendation engines across different languages? How do you account for cultural differences in user motivations and how users perceive and act on recommendations? Do you even have enough content in certain languages for the engine to produce meaningful results? There are so many nuances, and we need to be thinking about the next billion users coming onto these platforms: how can we be more inclusive to them?
Sourabh: We've talked a lot about the importance of multiple stakeholders. So — one model or multiple models? I’m curious where you land in that debate.
Rishabh: I think we’re kidding ourselves if we think these complex user systems can be governed by one model. I don’t think you can ever just have “one” of anything, honestly. You need to account for multiple stakeholders, multiple metrics, multiple tasks, multiple objectives. We’ve seen what happens if you don’t have all that nuance. If you just optimize for clicks, you get clickbait, you get polarized content. There are a lot of bad things that happen when you over-trivialize these problems. Personally, I just don't want to live in a single-model world.
From the individual user perspective, I can also provide a much better experience if I can pivot and make multiple predictions and really personalize your journey. When you come onto my platform, I want to immediately ask — what’s your intent? Maybe you generally like popular content, but is that what you’re looking for right now? Should I show you long-tail content instead, based on your behavior or what you’re typing into the search bar? How long are you going to stay? If it seems like you’re going to stay for a while, maybe I can defer this ad I was going to show you. I would much rather be making all of those predictions as I’m taking in information on the fly. I don’t know how we could live with one model, one prediction.
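One simple way to picture the multi-objective point in this answer is linear scalarization: each item gets a score per objective (in practice, each could come from its own model), and the ranker sorts by a weighted sum. This is only one of several ways to combine objectives and is not necessarily what Spotify or ShareChat use. The objective names, weights, and scores below are hypothetical.

```python
def rank_multi_objective(items, weights):
    """Rank items by a weighted sum of per-objective scores.

    items: list of dicts with a "name" and a "scores" dict.
    weights: dict mapping objective name to its weight; objectives missing
    from weights are ignored.
    """
    def blended(item):
        return sum(w * item["scores"][name] for name, w in weights.items())

    return sorted(items, key=blended, reverse=True)


# Hypothetical per-objective predictions for three items.
ITEMS = [
    {"name": "clickbait",
     "scores": {"click": 0.9, "satisfaction": 0.2, "creator_exposure": 0.1}},
    {"name": "niche_gem",
     "scores": {"click": 0.4, "satisfaction": 0.9, "creator_exposure": 0.8}},
    {"name": "popular_hit",
     "scores": {"click": 0.7, "satisfaction": 0.6, "creator_exposure": 0.2}},
]

# Optimizing for clicks alone versus blending in user and creator objectives.
CLICKS_ONLY = {"click": 1.0}
BLENDED = {"click": 0.4, "satisfaction": 0.4, "creator_exposure": 0.2}
```

With `CLICKS_ONLY` the clickbait item ranks first; with `BLENDED` it drops to last and the niche item leads. Choosing the weights is itself a product and stakeholder decision, which is where the domain understanding discussed earlier comes back in.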
Sourabh: I’ll go ahead and take a question from the audience here. Let’s say you’re starting from scratch with a new user — what are your thoughts on just asking them directly for their preferences or any other data you need to start feeding them recommendations?
Rishabh: I love this topic. One way we’ve approached this at Spotify and ShareChat is that if you’re a new user, we show you some topics, some creators, some artists, and you choose which one you want to consume. And then that kicks off a kind of dynamic conversation. You select a few options, and the rest of the list dynamically changes. We’re trying to adapt in real time so we get as much information as we can from this one set of interactions.
I think there's a lot of value in asking users in various forms. At Spotify, we did a lot of in-app surveys, trying to understand what our users were trying to do and whether they were happy with their experience. At ShareChat, we do a lot of user studies with both consumers and creators. But we’re also looking at implicit indicators. Maybe a user never directly tells us they’re unhappy, but we’re seeing their dwell time decrease and their session success decrease. Then I know they’re not happy.
At the end of the day you want to have both. There's going to be some explicit feedback, wherein you are explicitly giving a thumbs up or thumbs down, or explicitly selecting something to tell us your preferences. And then there’s going to be the implicit feedback in your behavior and the way you interact with the app. You need both to get a full picture. Often you’re going to get a lot of insight just by observing users, and then you’ll formulate a hypothesis based on that, and then you’ll start asking direct questions to drill down and verify your hypothesis. All of that data is going to guide your ML model journey.
Sourabh: You've been passionate about teaching and sharing knowledge, which is of course amazing. I would love to hear more about what got you into teaching.
Rishabh: One reason I love to teach is that it’s a way to make sure I really understand the nuances of whatever I’m working on. You have to understand something really well to be able to explain it in a simple way. Another reason is that I can see how much machine learning systems are impacting people, and I would so much rather live in a world where the people who are impacted understand the systems. It’s the responsibility of people in my position to make it easier for everyone to understand this technology, because otherwise we can’t have the kind of thoughtful regulation and policy development that we need.
I also think there are some huge gaps in the way machine learning is taught, and I’m motivated to help bridge those gaps. For example, when I went from doing my PhD to working at Spotify, I was thrown into real-world evaluation and it was such a nightmare. There was no course I could have taken as part of my PhD to understand evaluation metrics and the nuances of feature engineering. And you can’t make a model work in the real world without those things. I have so much respect for companies like Uplimit that are trying to provide people with more hands-on real-world experience. It’s in everyone’s interest to upskill students and employees on these topics so they can convert their theoretical machine learning knowledge into actual applications that will impact millions of users.