Better budgeting with Bayesian models: Bolt’s story with PyMC-Marketing
Jun 30, 2023
This article was written by Carlos Eduardo Trujillo Agostini, Bolt’s Data Analyst.
Imagine you’re on a hike. Your goal? To conquer the mountain of marketing analytics. Your equipment? Tools such as PyMC-Marketing. The trail? That’s the path we’ll follow today — navigating the rough terrains of media mix modelling (MMM), exploring the capabilities of PyMC-Marketing, and questioning the role of attribution models. Your companion? Bolt’s experience in the field.
Together, we’ll aim to demystify the age-old question: “Which half of my advertising budget is really making a difference?”
As we start our ascent, let’s get familiar with the map. Ready for the climb?
What is media mix modelling (MMM)?
MMM is like a compass for businesses, helping them understand the influence of their marketing investments across multiple channels. It sorts through a wealth of data from these media channels, pinpointing the role each one plays in achieving specific goals, such as sales or conversions. This knowledge empowers businesses to streamline their marketing strategies and, in turn, optimise their ROI through efficient resource allocation.
With PyMC-Marketing at hand
Open-source tools are the unsung heroes of our journey. One of them is PyMC-Marketing, which offers an API designed for Bayesian MMM. This API uses a regression model to estimate the contribution of media channels to a target variable. The model accommodates transformation functions that account for lingering effects from past advertisements (adstock or carry-over effects) and decreasing returns at high spending levels (saturation effects).
By doing so, PyMC-Marketing gives us a more accurate and comprehensive understanding of the influence of different media channels.
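To make the two transformations concrete, here is a minimal NumPy sketch. It assumes a geometric adstock and a logistic saturation, which are common choices for these effects; the function names, parameter values, and toy spend series are illustrative and not taken from our production model.

```python
import numpy as np


def geometric_adstock(spend, alpha, max_lag):
    """Carry a share `alpha` of each period's effect into later periods."""
    spend = np.asarray(spend, dtype=float)
    weights = alpha ** np.arange(max_lag)  # 1, alpha, alpha^2, ...
    # Full convolution, truncated to the original length (no look-ahead).
    return np.convolve(spend, weights)[: len(spend)]


def logistic_saturation(x, lam):
    """Map spend to [0, 1) with diminishing returns; `lam` sets how fast."""
    x = np.asarray(x, dtype=float)
    return (1.0 - np.exp(-lam * x)) / (1.0 + np.exp(-lam * x))


# Toy example: a single burst of spend in week one.
spend = np.array([100.0, 0.0, 0.0, 0.0])
carried = geometric_adstock(spend, alpha=0.5, max_lag=3)
saturated = logistic_saturation(carried / carried.max(), lam=2.0)

print(carried)    # the burst keeps having an effect in later weeks
print(saturated)  # the effect after applying diminishing returns
```

The adstock step spreads one week of spend over the following weeks, and the saturation step compresses large values so that doubling the spend does not double the effect.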
What about attribution models?
Think of attribution in marketing like a detective story. In this story, you have a conversion or a sale, and that’s your ‘whodunit’ — something good happened, and you want to know who or what to credit for it.
Attribution is like the detective — a method used to identify which marketing activities, or ‘suspects’, contributed to this outcome. It’s all about tracing the customer’s journey and determining which marketing touch-points they interacted with before purchasing.
It can get tricky, though. Say a customer saw a TV ad, clicked on a social media post, and got an email before finally making a purchase. Should the credit go to the TV ad, the social media post, or the email? That’s where different attribution models come into play, setting rules for how credit for sales and conversions is assigned to touch-points in conversion paths.
For instance, a simple model might give all the credit to the last touch-point (in this case, the email), while a more complex model might split the credit between all the touch-points. The challenge lies in deciding which model paints the most accurate picture of what’s driving your conversions. At the end of the day, though, attribution measures contact with a channel, not the channel’s genuine contribution to conversions (the actual incremental value).
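The two rules just described can be sketched in a few lines of Python. The function names and the touch-point labels are made up for illustration; real attribution tools work on tracked user paths rather than a hand-written list.

```python
from collections import defaultdict


def last_touch(path):
    """Give all credit for one conversion to the final touch-point."""
    return {path[-1]: 1.0}


def linear(path):
    """Split credit for one conversion equally across all touch-points."""
    credit = defaultdict(float)
    for touch in path:
        credit[touch] += 1.0 / len(path)
    return dict(credit)


# The journey from the example above: TV ad, then social post, then email.
path = ["tv_ad", "social_post", "email"]
print(last_touch(path))
print(linear(path))
```

Both rules hand out exactly one conversion's worth of credit. They only differ in who receives it, which is why neither tells you what would have happened without a given channel.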
At first glance, attribution models seem handy when dealing with multiple digital advertising channels like Google, Facebook, or other social media platforms. However, these models only decipher half the enigma, as their application is limited to digital channels. What happens when you have channels like TV, radio, print, etc.? Do we have to forget them?
Moreover, the phasing out of the cookie has made it exceedingly difficult, if not outright impossible, to trace a single sale (or user signup) to a specific advertising channel.
Hence, how do we genuinely gauge the effectiveness of all channels and smartly allocate our future advertising budgets? That’s where media mix models take the stage.
Driving Bolt’s success through PyMC-Marketing
Our MMM journey at Bolt unfolds in three stages:
First, we construct a model teeming with control variables specific to our business, typically considering trends and seasonality at various levels and other marketing activities;
Next, we put the model to the test, evaluating it against a source of truth built from regular A/B tests or quasi-experiments based on Google’s CausalImpact;
Finally, with our model in sync with the real world, we contrast different response curves to decide where to allocate our budget best to amplify the target variable.
Creating MMM with PyMC-Marketing
When you start building an MMM with PyMC-Marketing, the first step is to get the necessary data into the correct format. You need one row per date (you can choose whether it should be daily, weekly, or monthly), one column for each channel you want to analyse, and one column for each variable you want to use as a control.
How should my DataFrame look? A quick example
In the following DataFrame, x1 and x2 are channels (which could measure impressions or spend on your media platforms). The target variable is y, and the information is split by week. We found that our models fit better on weekly data than on daily data. Still, this can change based on the vertical and industry.
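As a rough illustration of that layout, the snippet below builds a small weekly DataFrame with pandas. All values are randomly generated, and the control column event_1 is a hypothetical name; only x1, x2, y, and ds come from the text.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_weeks = 8

df = pd.DataFrame(
    {
        # One row per week, starting on a Monday.
        "ds": pd.date_range("2023-01-02", periods=n_weeks, freq="W-MON"),
        # Channel columns: spend (or impressions) per week.
        "x1": rng.uniform(0, 1000, n_weeks).round(2),
        "x2": rng.uniform(0, 500, n_weeks).round(2),
        # Hypothetical control column, e.g. a one-off promotion flag.
        "event_1": [0, 0, 1, 0, 0, 0, 0, 0],
        # Target variable, e.g. weekly conversions.
        "y": rng.uniform(5000, 8000, n_weeks).round(0),
    }
)
print(df.head())
```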
What variables do you find in our real datasets to generate these models?
The target column is y, representing our goal variable: sales, conversions, or any other marketing outcome we track. The date column ds helps us to monitor the time factor. Our Channel columns encompass different platforms, such as Facebook, Google, and TikTok, each split between Android and iOS. This granularity enables us to measure the effectiveness of each marketing channel across various operating systems.
Next, we import the DelayedSaturatedMMM class, the heart of our media mix model, which handles the delayed and saturating effects common in advertising. We must define the required data columns, including the target, date, channel, and control columns.
After defining our model, we sail into the fitting stage, where the model learns from the data. Here, we use `target_accept` and `chains` as examples.
There are also other parameters you can fiddle with, such as tune (the number of iterations to adjust the proposal distribution for optimal sampling) or draws (the number of sample draws), among other parameters. Each of these can influence your final results, so it’s vital to understand their roles.
However, it’s essential to remember that our specific settings — a target_accept rate of 0.95 and 6 chains — were tailored for our example and worked well in our context. They don’t represent a one-size-fits-all solution. Depending on your specific needs and data, you might need to tweak these parameters to suit your unique circumstances, emphasising the adaptability and flexibility inherent in PyMC’s design.
By calling mmm.fit(), the model applies a Markov Chain Monte Carlo (MCMC) method to estimate the posterior distribution of the model parameters.
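To build some intuition for what chains, tune, and draws mean, here is a deliberately simple sampler written from scratch. It is a random-walk Metropolis sampler on a one-parameter toy model, not the sampler PyMC runs (PyMC defaults to NUTS for continuous parameters, which is where target_accept comes in). In this sketch the tune iterations are simply thrown away, whereas PyMC also uses them to adapt the sampler. The data and every name here are made up for illustration.

```python
import numpy as np


def metropolis(log_post, n_chains=4, tune=500, draws=2000, step=0.2, seed=0):
    """Random-walk Metropolis for one parameter; returns (n_chains, draws)."""
    rng = np.random.default_rng(seed)
    samples = np.empty((n_chains, draws))
    for c in range(n_chains):
        current = rng.normal()  # independent starting point per chain
        current_lp = log_post(current)
        for i in range(tune + draws):
            proposal = current + step * rng.normal()
            proposal_lp = log_post(proposal)
            # Accept with probability min(1, posterior ratio).
            if np.log(rng.uniform()) < proposal_lp - current_lp:
                current, current_lp = proposal, proposal_lp
            if i >= tune:  # warm-up iterations are discarded
                samples[c, i - tune] = current
    return samples


# Synthetic data: y ~ Normal(mu, 1), with a flat prior on mu.
data_rng = np.random.default_rng(1)
y = data_rng.normal(loc=2.0, scale=1.0, size=200)


def log_post(mu):
    """Log posterior of mu up to a constant."""
    return -0.5 * np.sum((y - mu) ** 2)


posterior = metropolis(log_post)
print(posterior.mean(), posterior.std())
```

Running several chains from different starting points is a cheap sanity check: if they all settle on the same region, the sampler has probably found the posterior.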
So, we’ve got our model set up, but are we good to go? How can we ensure that it genuinely mirrors the nuances of our data? This is where the validation process comes into play.
Checking model results with experimentation
By calling the method compute_channel_contribution_original_scale, we obtain a DataArray holding the channel contributions in their original scale. Next, we apply a mean aggregation to this DataArray, which gives us the average response of each channel relative to its spend.
The outcome is a scatter plot with the cost on the x-axis and the corresponding response (the incremental impact) on the y-axis. This visual guide can then be cross-checked against our experiments.
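The aggregation step can be sketched with plain NumPy. The array below is a synthetic stand-in for the DataArray returned by the model, and its dimension order (chain, draw, date, channel) is an assumption made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
channels = ["x1", "x2"]
n_chains, n_draws, n_weeks = 2, 100, 12

# Synthetic weekly spend per channel, shape (date, channel).
spend = rng.uniform(100.0, 1000.0, size=(n_weeks, len(channels)))

# Synthetic stand-in for posterior contribution samples,
# shape (chain, draw, date, channel): a saturating curve times noise.
base = 500.0 * (1.0 - np.exp(-spend / 400.0))
noise = rng.lognormal(
    mean=0.0, sigma=0.1, size=(n_chains, n_draws, n_weeks, len(channels))
)
contributions = base[None, None, :, :] * noise

# Average over chains and draws: one mean response per week and channel.
mean_response = contributions.mean(axis=(0, 1))

# Each (spend, mean response) pair is one point of the scatter plot.
for j, name in enumerate(channels):
    order = np.argsort(spend[:, j])
    points = np.column_stack([spend[order, j], mean_response[order, j]])
    print(name, points[:3])
```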
You’ve probably noticed the graph’s legend carries vital information that we still need to unpack. Let’s delve into some terms we commonly encounter at Bolt when analysing response curves:
Plateau Point: Picture this as the point where each additional euro we spend ceases to add extra value to the contribution.
Optimal Point: This is akin to the “elbow” of the curve, the point in the 2D space where the curve’s direction changes most noticeably. It partitions the curve into two segments: one where we can achieve results at a “premium cost”, and another where we can reap cost-effective results but in smaller quantities. This point is a handy reference on the response curve graph, indicating the point at which we might pay a reasonable price per acquired user. Although “reasonable” is subjective and depends on several factors, the curve indicates this value in a very general way.
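One possible way to locate both points on a curve is sketched below. The response curve, the marginal-gain threshold for the plateau, and the "farthest from the chord" rule for the elbow are all assumptions chosen for illustration; they are not necessarily the exact definitions behind our graphs.

```python
import numpy as np


def response(spend, scale=1000.0, rate=0.002):
    """Hypothetical saturating response curve (conversions vs spend)."""
    return scale * (1.0 - np.exp(-rate * spend))


spend = np.linspace(0.0, 5000.0, 501)
resp = response(spend)

# Plateau: first spend level where one extra euro adds less than `min_gain`.
min_gain = 0.01
marginal = np.gradient(resp, spend)
plateau_spend = spend[np.argmax(marginal < min_gain)]

# Elbow: point farthest from the straight line joining the curve's endpoints,
# measured after scaling both axes to [0, 1].
x = spend / spend[-1]
y = resp / resp[-1]
elbow_spend = spend[np.argmax(y - x)]

print(f"optimal (elbow) point: {elbow_spend:.0f}")
print(f"plateau point: {plateau_spend:.0f}")
```

Note that the elbow found this way depends on the spend range you plot, so it is best read as a rough guide rather than a hard number.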
As evident, our recent experiments align closely with the model curve. This comparison aids in understanding how well our model corresponds with reality. But what happens when our model doesn’t reflect the real world?
Aligning our model to reality 🔬
It’s no secret that media mix models, in their enthusiasm to provide a comprehensive view of all conversion drivers, can initially be too optimistic. Without proper calibration using accurate data, they might lead us to assumptions that diverge from reality.
Acknowledging this, the creators of PyMC have ingeniously devised a way to incorporate this prior knowledge right into the model training process.
In principle, you could use any kind of distribution to encode this information. Right now, we create beta distributions centred around the contribution values that each channel gets according to our experiments. The alpha and beta values of these priors are stored in two lists, and together they determine the shape of the distribution for each channel.
In this case, for channels like Google, we firmly believe that it should be contributing around 0.05, and we try to convey to the model that this should be its distribution. However, for other channels like TikTok, we’re not so sure, so we give a looser, less informative distribution that the model does not have to follow strictly.
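One simple way to turn an experiment result into alpha and beta values is to fix the mean of the Beta distribution and choose how concentrated it should be. The sketch below does exactly that. The concentration values, the TikTok mean, and the helper names are hypothetical; only the 0.05 figure for Google comes from the text.

```python
def beta_params(mean, concentration):
    """Return (alpha, beta) for a Beta distribution with the given mean.

    `concentration` equals alpha + beta; larger values give a tighter prior.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    return mean * concentration, (1.0 - mean) * concentration


def beta_std(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return (alpha * beta / (total**2 * (total + 1.0))) ** 0.5


# Hypothetical numbers: a confident prior for Google, a loose one for TikTok.
experiments = {"google": (0.05, 400.0), "tiktok": (0.05, 20.0)}

alphas, betas = [], []
for channel, (mean, concentration) in experiments.items():
    a, b = beta_params(mean, concentration)
    alphas.append(a)
    betas.append(b)
    print(channel, a, b, round(beta_std(a, b), 4))
```

Both priors share the same mean, but the TikTok prior is several times wider, so the data has more room to pull the estimate away from it.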
Yes, I’m sure you have several questions.
How do you perform valid experiments for comparison with PyMC-Marketing?
How do we define the values of alpha and beta that we add to the model?
Why are we using Beta Distributions?
How do we determine that these priors actually help and don’t bias our model?
If you want to learn more about this, stay tuned for the upcoming articles we’ll publish on Bolt Labs.
In the meantime, we can confirm that we use several experimentation methods, sometimes conversion lift and sometimes A/B testing. Often, though, it’s not possible to run proper A/B tests if you work with new users (because you cannot split users into test and control groups), and there are tracking issues as well. That’s why we’re fans of quasi-experimentation and heavy users of Google’s CausalImpact.
If you’re not entirely familiar with the topic, I recommend reading my latest article on Medium, where I talk about this and use a similar method to analyse marketing actions.
From models to real-life decisions
Equipped with data about how each channel reacts to our investments, we’re all set to strategise our budget allocation to garner maximum returns. We can start painting a picture of the ideal distribution by showcasing the results of each channel on the same graph, accompanied by their respective projections.
In a perfect world, we’d always have the budget to spend up to the optimal point for each channel, but reality often pens a different story. With a finite budget, we may need to sacrifice some gains in lesser-contributing channels to amplify the impact of the heavy hitters.
We must also weigh our model’s projections — these are unvalidated predictions at the outset. Depending on our confidence in the model, we must decide whether to bank on these projections or trust only the reality within our grasp.
At Bolt, our approach is all about resourcefulness. That’s why we’ve tailored specific strategies for each scenario, dictated by the unique demands of the market where we’re allocating the budget.
Without diving too deep, we’ve designed budget distribution functions that adapt to each specific objective we aim to hit, rather than only maximising the total contribution to the target variable.
Picture the following — sometimes, you aim for market growth, cost notwithstanding. At other times, sustainability becomes your winning hand, requiring you to make the most of the resources you have. Occasionally, you may need to prioritise quick profitability, and the constraints for each scenario vary, influencing your budget allocation strategy accordingly.
Say you have a robust advertising budget of 15M euros to expand in the upcoming month. Typically, your advertising team might hustle to utilise the entire budget, fine-tuning each channel to squeeze out maximum value. However, this time around, you have a trusted ally in the form of an MMM with PyMC. You choose to trust its projections for the plateau point and minimum channel spending to thoughtfully distribute your substantial budget.
As you can see, with such a large budget, the model decides to spend the maximum possible on each channel without exceeding the plateau point. This is far from the optimal point of each channel. Still, it’s the allocation that yields the maximum contribution.
With a smaller budget, such as 10,000 euros, the model will adjust to spending only on those channels that maximise the contribution, even if it means cutting spending entirely on others.
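The behaviour in both scenarios can be reproduced with a small greedy allocator. This is a simplified sketch and not the budget distribution function we run at Bolt: the response curves, their parameters, and the spend caps (which play the role of the plateau points) are invented for illustration, and minimum channel spending is left out.

```python
import numpy as np


def response(spend, scale, rate):
    """Hypothetical saturating response curve for one channel."""
    return scale * (1.0 - np.exp(-rate * spend))


def allocate(budget, channels, step=100.0):
    """Greedily give each `step` of budget to the channel with the best gain.

    `channels` maps name -> (scale, rate, max_spend), where `max_spend`
    plays the role of the plateau point.
    """
    spend = {name: 0.0 for name in channels}
    remaining = budget
    while remaining >= step:
        gains = {
            name: response(spend[name] + step, s, r) - response(spend[name], s, r)
            for name, (s, r, cap) in channels.items()
            if spend[name] + step <= cap
        }
        if not gains:
            break  # every channel is at its cap; leave the rest unspent
        best = max(gains, key=gains.get)
        spend[best] += step
        remaining -= step
    return spend


channels = {
    "google": (5000.0, 0.0005, 8000.0),
    "facebook": (3000.0, 0.0004, 6000.0),
    "tiktok": (300.0, 0.0002, 4000.0),
}
small = allocate(2_000.0, channels)
large = allocate(1_000_000.0, channels)
print("small budget:", small)
print("large budget:", large)
```

With the small budget, the weakest channel receives nothing. With the large one, every channel is filled up to its cap and the remainder stays unspent. Because each curve has diminishing returns, handing out the budget one step at a time to the best marginal gain is a reasonable approximation of the optimal split.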
We have opened a pull request on PyMC-Marketing to bring a similar budget allocation model (which can be customised based on the user’s needs) to a great open-source library we use at Bolt. If you believe you have things you can contribute, we invite you to do so and collaborate directly in their repository.
Wrapping up 💡
PyMC has earned its place among Bolt’s treasured toolkits, thanks to the malleability it offers in crafting models perfectly suited to our needs. In the same vein, PyMC-Marketing offers a compelling edge in devising flexible yet standard MMMs, serving both as a springboard and a muse for our analyses.
Armed with this tool, and with prudent calibration and essential reality checks to trust its outcomes, we’ve successfully automated our decision-making process, giving rise to more data-driven actions.
We’re thrilled to hear your comments! You can contact me directly or my teammates in Bolt’s Martech team.
We offer a unique opportunity for individuals to learn and develop while making a meaningful impact on millions of people across the globe in a hyper-growth environment.
If you feel inspired by our approach and want to join our journey, check out our vacancies. We have hundreds of open roles, including Data and Marketing positions to help us work on our ads experiments.
If you’re ready to work in an exciting, dynamic, fast-paced industry and aren’t afraid of a challenge, we’re waiting for you!