An interview with mParticle's Shafiq Shivji

Shafiq Shivji, mParticle
Shafiq Shivji is the Group Product Marketing Manager at mParticle, where he leads the developer experience and data integrity domains. He has over a decade of experience in product and sales engineering roles across various industries, including cybersecurity, education technology, healthcare and telecommunications.
How did you get into your current position as Group Product Marketing Manager at mParticle?
I’m a bit of a startup junkie, and as you know, in the startup world you take off one cap and put on another pretty much daily. After college, I took a few years off to volunteer at a fledgling non-profit organization, leading an early childhood education program for impoverished areas in South Asia. After this life-changing experience, I changed gears and got a “real” job in business. I have a technical background, so I started as a sales engineer, selling mobile apps and telecom solutions to pharmacies. After that, I spent about four years in product management at an early-stage machine learning (ML) technology company, where I had an amazing opportunity to work first-hand with data scientists and ML engineers. I then took a position in the product marketing organization at Auth0, and when Okta acquired the company, I pivoted and joined the mParticle product team.
Can you tell us about mParticle and how data teams are using it today?
mParticle is a customer data platform, or CDP. In my opinion, that term has become convoluted and can mean different things depending on who you ask. Essentially, a CDP is an infrastructure tool that enables real-time personalization use cases without the heavy engineering lift of instrumenting and maintaining data pipelines. More specifically, our core competence is simplifying data intake, association and activation. We provide simple ways to manage data pipelines and flow data to downstream destinations, along with observability into that flow: think of how Arize monitors model operation and performance, but applied to the data flow side.
Can you give me a specific customer case?
Different teams use us for different purposes. Marketing and product are the most common teams we work with. One example is a marketer who wants to send real-time push notifications to people who have abandoned a shopping cart. Product managers typically use us for analytics and growth, and we often hear that they can’t trust the data coming in through their analytics tool. A CDP provides a way to ensure high data security and quality. ML engineers in particular benefit from clean data pipelines because they ensure that models are usable both during training and in production. Recently, we added built-in ML capabilities to our platform. ML engineers are often asked by marketers to produce a model that predicts what customers will do next, and that’s not something they’re usually happy to do. Now we have built-in capabilities to handle these types of requests, for example, deciding whether to send a coupon or an ad to a customer who is likely to buy a product. For a marketer, that is great because it saves money.
Finally, we’re a first-party data tool, which means we don’t rely on third-party data. We provide the tools to maximize the value of your own first-party data coming from your app and data stacks.
What makes it difficult for teams to build data quality pipelines?
We’ve all seen the statistics showing that data engineers, data scientists and others spend most of their time cleaning data. From my perspective, it’s difficult because technology and business realities are constantly changing. Your IT team can build the perfect solution at a given point in time, saying “these are the parameters, this is what I need to do,” but when business realities change, those solutions either fail or require expensive engineering resources to adapt. For example, say a new application developer mistakenly changes an event name in an app from “item_purchased” to “bought.” They accidentally skipped the prefix. When that payload comes in, it breaks the entire data model because what was expected has changed. And the problem could remain undetected until it affects all sorts of downstream applications, including ML models. Then comes the struggle of trying to understand what is happening, which can feel like finding a needle in a haystack. Meanwhile, the business suffers, and the team is pinged to solve the problem and put under a lot of stress due to potential loss of revenue.
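The renamed-event scenario above is the kind of failure a schema check at ingestion is meant to catch before it reaches downstream models. The Python sketch below validates incoming events against a small tracking plan and reports violations instead of silently passing them on. The plan format, the event and attribute names, and the `validate_events` function are hypothetical illustrations, not mParticle's data plan API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical tracking plan: each expected event name maps to the attributes
# it must carry. Illustrative only; not mParticle's actual data plan format.
EXPECTED_EVENTS: dict[str, set[str]] = {
    "item_purchased": {"item_id", "price"},
    "cart_abandoned": {"cart_id"},
}


@dataclass
class ValidationResult:
    valid: list[dict] = field(default_factory=list)       # events that match the plan
    violations: list[str] = field(default_factory=list)   # human-readable problems


def validate_events(events: list[dict]) -> ValidationResult:
    """Split incoming events into plan-conforming events and violation messages."""
    result = ValidationResult()
    for event in events:
        name = event.get("name")
        attrs = event.get("attributes", {})
        if name not in EXPECTED_EVENTS:
            # A renamed event (e.g. "bought") is flagged here instead of
            # silently breaking the downstream data model.
            result.violations.append(f"unknown event name: {name!r}")
            continue
        missing = EXPECTED_EVENTS[name] - set(attrs)
        if missing:
            result.violations.append(f"{name}: missing attributes {sorted(missing)}")
            continue
        result.valid.append(event)
    return result


if __name__ == "__main__":
    incoming = [
        {"name": "item_purchased", "attributes": {"item_id": "sku-1", "price": 9.99}},
        {"name": "bought", "attributes": {"item_id": "sku-2", "price": 4.50}},
    ]
    report = validate_events(incoming)
    print(report.violations)
```

Running this flags the `"bought"` event as unknown while the correctly named purchase passes through. In practice, violations would be routed to an alert or a quarantine store so the team learns about the rename before models are affected.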
To summarize, due to constant changes, many companies cannot afford to invest their time or hire resources to keep adapting. A CDP enables data teams to roll out real-time data pipelines and create a system that ensures quality and enables use cases that keep internal stakeholders happy.
Why is personalization so crucial to the user experience?
Let’s start by defining what personalization means. If a marketer asks a data engineer to search a database and the engineer hands back a CSV flat file with the relevant information, that could technically count as personalization. But what we’re talking about here is personalization in the form of real-time customer experiences: instead of a mass email, say, very targeted one-to-one personalization. Take Amazon, for example. They have proven that personalization works in a visceral way. When I go to their website, I always look and say, “oh, I like that, and I want that,” and that makes me buy the things that appear on “my” website. I add one thing to my cart (as a big basketball fan, let’s say a pair of Air Jordans), and then, as if by magic, the perfect set of socks to match those sneakers appears in the window. From a customer’s perspective, this is benign and perhaps expected. From a business perspective, you see double-digit increases in conversions and revenue.
Personalization is essential because it makes it easier for me as a consumer to discover the products I like, whether it’s content on Netflix or products on Amazon. Personalization drives sales, loyalty, brand affinity and a better customer experience. While crucial, it is also difficult, and unless you have resources like Amazon’s and Netflix’s, it’s an even harder problem to solve. And it’s not just about having the financial resources, but also about having a sophisticated data team and mature data practices.
There is a debate as to whether one should prioritize speed or accuracy when delivering personalization to customers. Why is it so hard to have both?
I think it starts with scale. When dealing with customers in a store, you can have that one-on-one experience, and you can train your staff in customer service to make sure the customer feels good (think of Walmart greeters). But how do you do that digitally, at scale, with customers engaging with your brand through different interfaces? They can visit your store, open your mobile app or browse your web app and do all sorts of random things. Let’s say a customer uses your mobile app but then switches to the web app. All of this data is typically siloed, making it difficult to maintain a personalized experience in real time. Personalization at scale takes significant engineering to stitch data together and build profiles that can then be fed into, for example, an ML model that produces accurate recommendations. Implementing such a system and keeping it running is very expensive. Companies like Netflix and Amazon have dedicated significant resources to these systems and built tools in-house; most other companies don’t have the same resources to break down data silos and build the customer profiles that unlock personalized experiences.
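The stitching step described here can be illustrated with a deliberately simple, deterministic identity-resolution sketch in Python: a device is linked to a customer once any event from that device carries a customer ID (for example, after a login), and all events, whether from mobile or web, are then grouped per customer. The field names and the `stitch_profiles` function are hypothetical; production identity systems typically use richer matching rules and conflict handling than this.

```python
from __future__ import annotations

from collections import defaultdict


def stitch_profiles(events: list[dict]) -> dict[str, list[dict]]:
    """Group events from different channels into per-customer profiles.

    Deterministic matching only: a device_id is linked to a customer_id as soon
    as any event from that device carries one. If a device reports several
    customer_ids, the last one seen wins (a simplification). Events from
    devices that were never linked stay under an "anon:<device_id>" key.
    """
    # Pass 1: learn which device belongs to which known customer.
    device_to_customer: dict[str, str] = {}
    for e in events:
        if e.get("customer_id"):
            device_to_customer[e["device_id"]] = e["customer_id"]

    # Pass 2: attach every event, including pre-login ones, to its owner.
    profiles: dict[str, list[dict]] = defaultdict(list)
    for e in events:
        owner = device_to_customer.get(e["device_id"], f"anon:{e['device_id']}")
        profiles[owner].append(e)
    return dict(profiles)


if __name__ == "__main__":
    events = [
        {"device_id": "ios-1", "channel": "mobile", "event": "view_item", "customer_id": None},
        {"device_id": "ios-1", "channel": "mobile", "event": "login", "customer_id": "c42"},
        {"device_id": "web-7", "channel": "web", "event": "login", "customer_id": "c42"},
        {"device_id": "web-7", "channel": "web", "event": "add_to_cart", "customer_id": None},
        {"device_id": "web-9", "channel": "web", "event": "view_item", "customer_id": None},
    ]
    for owner, evs in stitch_profiles(events).items():
        print(owner, [e["event"] for e in evs])
```

Here the mobile browsing session and the later web cart activity end up in one `c42` profile, which is the unified view a recommendation model would need; the unlinked `web-9` device stays anonymous.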
Can you tell us what the build versus buy considerations look like in the CDP area?
The build versus buy conversation starts with awareness. Many IT teams don’t even realize that there are tools out there that can solve so many use cases. When I was at Auth0, we told developers not to focus on low-impact work, the 80% of the effort that only solves 20% of the problem, and instead to focus on core competencies and the interesting, unique aspects of their app. I apply the same logic to a CDP: first, be aware of what tools are available to quickly resolve data issues, and second, make sure any effort to build and maintain data pipelines aligns with your company’s core value proposition.
When companies start thinking about building in-house, they often talk about the modern data stack as the holy grail that solves all data pipeline problems. On the surface, it sounds great and very promising. But when you dig deeper into where it can actually have an impact, this approach falls short. For example, I don’t think anyone uses Snowflake for real-time use cases, and even if you try to build on top of Snowflake, what happens in cases like an abandoned cart? Recently, I put toothpaste in an online shopping cart because we had run out, got distracted with the kids and forgot about it. Two days later, I was wondering why my toothpaste hadn’t arrived. If I had received a push notification about my abandoned cart, I would have made the purchase and had a better experience. Now imagine that at scale: how many conversions are just waiting to happen! But you can’t do that unless you have the tools and invest in the technical resources to build and maintain a solution. With a CDP, you can deploy real-time infrastructure quickly and efficiently instead of pouring resources into data pipelines that only take you so far. You can then invest those resources in more interesting tasks, like building the next model or improving model metrics (e.g., PR-AUC), that have a big impact on your business.
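To make the abandoned-cart case concrete, here is a minimal Python sketch of the real-time logic: record when each user last added to their cart, clear the entry on checkout, and periodically collect users whose cart has been idle past a timeout. The `CartEvent` shape, the timeout value and the `AbandonedCartDetector` class are assumptions for illustration; actually sending the push notification is left to whatever messaging service you use.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CartEvent:
    user_id: str
    kind: str          # assumed event kinds: "add_to_cart" or "checkout"
    timestamp: float   # seconds since epoch


class AbandonedCartDetector:
    """Hypothetical in-memory detector for idle carts (not a real CDP API)."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout = timeout_seconds
        self.open_carts: dict[str, float] = {}  # user_id -> time of last add

    def on_event(self, event: CartEvent) -> None:
        """Update cart state as events stream in."""
        if event.kind == "add_to_cart":
            self.open_carts[event.user_id] = event.timestamp
        elif event.kind == "checkout":
            self.open_carts.pop(event.user_id, None)

    def due_notifications(self, now: float) -> list[str]:
        """Return users whose cart has been idle for at least the timeout,
        and stop tracking them so each abandonment is notified only once."""
        due = [u for u, t in self.open_carts.items() if now - t >= self.timeout]
        for u in due:
            del self.open_carts[u]
        return due


if __name__ == "__main__":
    detector = AbandonedCartDetector(timeout_seconds=3600)  # one hour, arbitrary
    detector.on_event(CartEvent("toothpaste-buyer", "add_to_cart", 0))
    detector.on_event(CartEvent("quick-buyer", "add_to_cart", 0))
    detector.on_event(CartEvent("quick-buyer", "checkout", 120))
    print(detector.due_notifications(now=3600))
```

Only the user who never checked out is returned, so a push service would be asked to nudge them once. A production pipeline would keep this state in a durable store and evaluate it continuously rather than in a single process.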
Machine learning operations (MLOps) is relatively new, and there has been an increase in hiring for ML engineers and in relevant machine learning certification programs. What is mParticle doing to help data teams take their first steps toward personalization and more mature data pipelines?
We provide the data infrastructure layer for real-time personalization. We could talk for hours about all the business use cases you unlock when you have real-time capabilities. One thing that is relevant to MLOps professionals is the ability to build ML models and use them in production with real-time data. There is nothing more frustrating than deploying a model that nobody uses or that was built on data you later realize was corrupt, incomplete or broken in some way.
When discussing data maturity, it all starts with developing a data strategy that outlines how a company wants to use data to achieve specific business goals. I see many organizations that do not have a sophisticated data strategy in place; instead, each team is left to create its own. When it’s time to activate data, teams spend countless hours stitching together data from different silos and sources. A CDP helps an immature team mature, for two reasons. First, we’ve been doing this for almost a decade and have a lot of in-house expertise. We work with big-name companies such as Burger King and Airbnb and have real-world experience in maximizing the value of customer data. Second, we have tools that help you solve low-hanging-fruit problems for immediate impact without requiring many resources or mature teams. Many of our customers approach us with one problem in mind, such as broken data pipelines that can’t move data from one place to another. Once they use mParticle, they see its overall value and impact and end up using it for all sorts of other use cases.