This article was written by Anton Bugaev, with contributions from Juan Carlos Medina Serrano and Garret O’Connell. It was supported by the work of Bolt Marketing Technology and Performance Marketing Teams: Ameer Goh, Sabrina Craffitasari, Reimo Ärm, Marleen Kubits, Carelin Tuul, Karina Kochanovskaya, Brian Furnari, and Carlos Eduardo Trujillo Agostini.
In the previous chapter, we focused on experimentation in advertising and introduced quasi-experiments. Here, we dive deeper into the specific preparations required for counterfactual experiments.
Previously, we discussed that a counterfactual represents what our target metric (such as signups, sales, or revenue) would have been had we not run the ads. In technical terms, a counterfactual is typically a prediction from a statistical model. To begin our discussion on counterfactuals, let’s explore the various predictors you could use to build such a model.
Structural counterfactuals
Structural counterfactuals use historical data to identify patterns within the target variable itself. These patterns can include trends, seasonal variations (such as yearly, monthly, or weekly fluctuations), and the metric’s response to holidays, events, or previous interventions. If you accurately extract these patterns and assume they will replicate in the future, you can leverage them as a counterfactual.
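As a minimal sketch of this idea, the snippet below fits a linear trend plus day-of-week effects to a daily series and extrapolates them one week ahead. The function names, the synthetic data, and the choice of a plain least-squares fit are assumptions made for illustration; a real structural model would usually also handle holidays, yearly seasonality, and noise.

```python
import numpy as np

def design_matrix(start, stop):
    """Columns: day index (linear trend) plus seven day-of-week indicators."""
    t = np.arange(start, stop)
    day_of_week = np.eye(7)[t % 7]  # one-hot rows; they also act as intercepts
    return np.column_stack([t, day_of_week])

def fit_structural(y):
    """Least-squares fit of trend + weekly seasonality to a daily series."""
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(design_matrix(0, len(y)), y, rcond=None)
    return coef

def predict_structural(coef, start, stop):
    """Extrapolate the fitted pattern to days start..stop-1."""
    return design_matrix(start, stop) @ coef

# Synthetic, noise-free history: 8 weeks of daily signups (illustrative numbers).
weekly_pattern = np.array([0, 5, 8, 6, 10, 25, 20], dtype=float)
days = np.arange(56)
history = 100 + 0.5 * days + weekly_pattern[days % 7]

coef = fit_structural(history)
counterfactual = predict_structural(coef, 56, 63)  # the following week

# What the series would really have done with no intervention.
next_week = np.arange(56, 63)
expected = 100 + 0.5 * next_week + weekly_pattern[next_week % 7]
```

Because the synthetic history contains only a trend and a weekly cycle, the extrapolation matches the no-ads series exactly; with real data, the gap between the two is the model error you need to account for.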
Control counterfactuals
These don’t rely on the patterns of the target variable itself; instead, they use the control units whose metrics exhibit similar behaviour. Remember that in this case, only the treatment unit should be exposed to ads, while control units don’t receive any advertising. You should have a clear rationale for believing that the impact of your activity on control units is either zero or negligible.
Selecting the control units may be the most engaging aspect of the process. Analysts need to identify the units within the business that are most similar to the treatment group (greater similarity leads to more accurate results) while ensuring these units don’t influence each other. This requires a blend of product thinking and customer insights. Some examples of potential control groups include:
Geo-based controls: cities and countries
This method is among the most commonly used for selecting control groups. Since most businesses operate in multiple locations, running ads in one city or country while comparing the results to a similar location is often feasible. Most often, you can safely assume that audiences don’t overlap.
Product-based controls
This approach is particularly useful for e-commerce businesses with a diverse portfolio of products. Demand for certain products may closely resemble each other, allowing these products to effectively control one another. However, ensure there is no spillover effect; if the products are too similar, the users might learn about an advertised product but then prefer to buy a similar one. In this case, your ads might inadvertently boost sales for both.
OS-based controls
Since the introduction of App Tracking Transparency (ATT) in 2021, many advertisers have had to separate their app campaigns for Android and iOS. One effective strategy is to run the campaign under test exclusively on Android while keeping spending stable on iOS. iOS users then serve as a control unit for measuring the impact of your ads on Android (or vice versa).
Language-based controls
In countries where people speak multiple languages, running ads in one language can effectively stimulate sales on the corresponding version of your website. The versions in the languages you don’t advertise in can then serve as control units for measuring the impact of the ads.
There is often debate about whether to use a single control unit or multiple control units (like in the Synthetic Control method).
Relying on a single control may result in less accurate predictions, but allows you to transparently check that your control unit didn’t have unexpected biases (like a reaction to an event).
Conversely, utilising multiple controls can enhance prediction accuracy, and is sometimes the only way to obtain a decent prediction. However, it may also reduce transparency in your model: a bias in one or several control units can affect your results and is harder to detect.
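The accuracy side of this trade-off can be illustrated with a small comparison on a holdout period. This is a sketch under stated assumptions: the data is synthetic, the three control units are hypothetical, and plain least squares stands in for whatever model you actually use (it is not the Synthetic Control method, which constrains the weights).

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; X has one column per control."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Apply fitted coefficients (intercept first) to new control values."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def mape(actual, predicted):
    """Mean absolute percentage error."""
    return float(np.mean(np.abs((actual - predicted) / actual)))

# Synthetic data: the target depends on two of three hypothetical control units.
rng = np.random.default_rng(0)
controls = 100 + rng.normal(0, 10, size=(120, 3))
target = 10 + 2 * controls[:, 0] + 3 * controls[:, 1] + rng.normal(0, 1, 120)

train, holdout = slice(0, 90), slice(90, 120)

single = fit_ols(controls[train, :1], target[train])  # first control only
multi = fit_ols(controls[train], target[train])       # all three controls

mape_single = mape(target[holdout], predict_ols(single, controls[holdout, :1]))
mape_multi = mape(target[holdout], predict_ols(multi, controls[holdout]))
print(f"single control: {mape_single:.2%}, three controls: {mape_multi:.2%}")
```

In this constructed case the multi-control model wins on accuracy because the single control misses half of the signal. The transparency cost does not show up in the error metric, which is why it is worth inspecting each control unit separately as well.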
Choose a relevant training period
When training your model, it’s important to select the right period for model fitting. A common instinct might be to use an arbitrary timeframe like 5–6 months or simply use all available data (because more data seems better, right?). However, if we consider our goal — predicting the “what-if-no-ads-were-launched” scenario for a particular period — the choice of training data becomes much more strategic.
First, determine the relationship between your target variable and the control. In other words, your target variable (e.g., sales in City 1) can be explained by your control variable (e.g., sales in City 2) through a certain pattern, like: sales_in_city1 = sales_in_city2 * 5 + 100.
The essence of building a counterfactual is twofold:
Extract this relationship from historical data;
Apply it to predict the behaviour of your target during the test period.
Since your test occurs over a specific timeframe (say, 1 to 21 September), select training data where the relationship between your target and control variables is most similar to what you expect during the test period. This helps your model reflect the dynamics of the period you’re trying to evaluate.
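The two steps can be sketched in a few lines. The numbers below are made up and noise-free, generated from the illustrative pattern sales_in_city1 = sales_in_city2 * 5 + 100, and the uplift of 20 sales per day during the test is a hypothetical value chosen so the result is easy to check.

```python
import numpy as np

# Step 1: extract the relationship from historical (training) data.
city2_train = np.array([40.0, 42.0, 45.0, 43.0, 47.0, 50.0, 48.0, 44.0])
city1_train = city2_train * 5 + 100
slope, intercept = np.polyfit(city2_train, city1_train, 1)

# Step 2: apply it to the test period to get the counterfactual for City 1.
city2_test = np.array([46.0, 49.0, 51.0])
counterfactual = slope * city2_test + intercept

# Observed City 1 sales during the test, with a hypothetical ad uplift of 20/day.
city1_observed = city2_test * 5 + 100 + 20

# Estimated impact is the gap between what happened and the counterfactual.
estimated_lift = float((city1_observed - counterfactual).sum())
print(f"slope={slope:.2f}, intercept={intercept:.2f}, lift={estimated_lift:.1f}")
```

With real data the fit will not be exact, so the estimated lift carries the model's prediction error along with the true effect.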
Here is an example of how you might reason:
Suppose you plan to run an ad campaign in City 1 starting 1 September, with City 2 serving as the predictor for your counterfactual model. Since you’re planning this test in late August, you observe that the relationship between City 1 and City 2 sales has been stable over June, July, and August. Therefore, using these most recent months as your training period to extract the relationship makes sense.
However, you notice that in May, City 2 experienced a systematic spike in sales due to a discount campaign. Including this data would likely distort the relationship between the two cities and add bias to your model. As a result, you exclude any data from before June to ensure your model is trained on data that closely mirrors the conditions of the test period.
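To see how much a distorted month can matter, here is a sketch on synthetic data. The first 30 days play the role of May, where City 2 gets a boost of 20 sales per day that City 1 does not share; all series, names, and magnitudes are assumptions made for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Fit y = slope * x + intercept by least squares."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

def mean_abs_error(x, y, slope, intercept):
    """Average absolute gap between actual y and the line's prediction."""
    return float(np.mean(np.abs(y - (slope * x + intercept))))

t = np.arange(120)
city2_organic = 50 + 5 * np.sin(t / 4.0)  # City 2 sales without any discount
city1 = 5 * city2_organic + 100           # City 1 follows the stable relationship
city2 = city2_organic.copy()
city2[:30] += 20                          # discount campaign inflates City 2 in "May"

all_data = slice(0, 100)    # May through August
recent = slice(30, 100)     # June through August only
holdout = slice(100, 120)   # a quiet period used to compare the two fits

fit_all = fit_line(city2[all_data], city1[all_data])
fit_recent = fit_line(city2[recent], city1[recent])

mae_all = mean_abs_error(city2[holdout], city1[holdout], *fit_all)
mae_recent = mean_abs_error(city2[holdout], city1[holdout], *fit_recent)
print(f"trained on all data: {mae_all:.2f}, trained on recent data: {mae_recent:.2f}")
```

The model trained on all data learns a flatter, biased relationship because the discounted days sit far from the rest, while the model trained on the recent window recovers the pattern that holds during the holdout.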
How long should a test run?
Many might assume that the test duration should simply match the length of the campaign, but determining the right timeframe for a quasi-experiment requires a more thoughtful approach. The goal is to find a balance between two factors:
External bias: The longer your test runs, the more likely external factors (such as holidays, competitor actions, or market fluctuations) will influence the results, adding bias to your counterfactual predictions.
Customer decision-making time: If the test period is too short, it may not allow customers enough time to see your ads, remember your products, and make purchasing decisions. Ads also typically roll out gradually, meaning it takes time for your campaign to reach its full audience and for the effects to become visible.
Make your metrics more sensitive
Often, the KPIs you aim to measure — like sales or revenue — aren’t immediately impacted by ads and don’t reflect short-term changes or the immediate effects of advertising. If that’s the case, consider focusing on an upper-funnel metric, such as installs or signups. However, you’ll need to assume a consistent conversion rate from this upper-funnel metric to your desired KPI.
Another approach is to modify your metrics to enhance sensitivity. For instance, you can use “cohort metrics” that link early-stage actions (e.g., signups) to later-stage behaviours (e.g., purchases). In this case, you’d track whether a user purchased within a specific time frame after signing up, then attribute that purchase to the original signup date. This provides a clearer picture of how your ads influence later conversions.
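A cohort metric of this kind could be computed as in the sketch below. The function name, the 7-day window, the inclusive window boundary, and the toy records are all assumptions for illustration; in practice this would typically be a query against your event tables.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

def cohort_conversions(signups, purchases, window_days=7):
    """Count users who purchased within window_days of signing up,
    attributed to their signup date.

    signups: dict mapping user id -> signup date
    purchases: iterable of (user id, purchase date) pairs
    """
    purchase_days = defaultdict(list)
    for user, day in purchases:
        purchase_days[user].append(day)

    window = timedelta(days=window_days)
    converted = Counter()
    for user, signup_day in signups.items():
        days = purchase_days.get(user, ())
        # A user counts once, if any purchase falls inside the window (inclusive).
        if any(signup_day <= d <= signup_day + window for d in days):
            converted[signup_day] += 1
    return dict(converted)

signups = {
    "u1": date(2024, 9, 1),
    "u2": date(2024, 9, 1),
    "u3": date(2024, 9, 2),
    "u4": date(2024, 9, 2),
}
purchases = [
    ("u1", date(2024, 9, 3)),   # 2 days after signup: counted
    ("u2", date(2024, 9, 20)),  # 19 days after signup: outside the window
    ("u3", date(2024, 9, 9)),   # exactly 7 days after signup: counted
    ("u4", date(2024, 9, 4)),   # 2 days after signup: counted
]

result = cohort_conversions(signups, purchases)
```

Note that the value for a given signup date is only final once the window has fully elapsed, so the most recent days of a test need to be excluded or treated as incomplete.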
Additionally, consider segmenting your users into groups with more similar behaviours. For example, you could evaluate new users separately from returning users or distinguish between local users and tourists.
Which tool should we choose to build a model?
Remember that a counterfactual is just a predictive model. So you can choose any algorithm that fits your needs, from simple Ordinary Least Squares (OLS) regression to more advanced options like XGBoost.
However, also consider the ability to evaluate uncertainty in your predictions. We’ll delve into the topic of uncertainty in more detail in a subsequent chapter, but for now, it’s worth noting that Bayesian models are particularly effective for this purpose. Some tools we frequently explore include Google’s CausalImpact and the CausalPy package from PyMC Labs.
Evaluate your counterfactual
When searching for the best counterfactual, you’ll navigate multiple dimensions — choosing between predictors, training periods, and other factors. But how do you determine which option is the best?
Test your model on historical data. You can simulate the intervention and see how well your model predicts a known period before the actual test. The simplest option is to test the model on a period just before your campaign launch. If you’re fortunate enough to have a stable, long-term relationship between your target variable and its predictors, you might consider running multiple tests across different periods.
In some ways, testing a counterfactual resembles A/A testing. You test whether the model predicts a significant impact during a period when you know there were no interventions. If the model shows an effect where none existed, it suggests bias in your counterfactual. In this case, you might want either to choose another control unit or to wait until the relationship between treatment and control units stabilises.
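One way to run such a check is sketched below: fit the model on an early window, predict a later window with no intervention, and compare the average gap to the spread of the training residuals. The function name, the threshold of 2.0, and the assumption of independent residuals are simplifications made for illustration (real daily series are usually autocorrelated, which makes this rough score optimistic), and the alternating wiggle is a deterministic stand-in for noise.

```python
import numpy as np

def placebo_check(target, control, train_end, placebo_end, threshold=2.0):
    """Fit on [0, train_end), then test for a spurious effect on
    [train_end, placebo_end), where no intervention took place."""
    slope, intercept = np.polyfit(control[:train_end], target[:train_end], 1)

    def predict(x):
        return slope * x + intercept

    train_resid = target[:train_end] - predict(control[:train_end])
    placebo_gap = (target[train_end:placebo_end]
                   - predict(control[train_end:placebo_end]))

    # Rough standard error of the mean gap, assuming independent residuals.
    se = train_resid.std(ddof=2) / np.sqrt(len(placebo_gap))
    z = float(placebo_gap.mean() / se)
    return z, abs(z) > threshold

t = np.arange(100)
control = 50 + 5 * np.sin(t / 4.0)
wiggle = np.where(t % 2 == 0, 1.0, -1.0)  # deterministic stand-in for noise
target = 5 * control + 100 + wiggle

# Stable relationship: the placebo window should show no effect.
z_stable, flagged_stable = placebo_check(target, control, 80, 100)

# Same data, but an unrelated event lifts the target in the placebo window.
shifted = target.copy()
shifted[80:] += 10
z_shifted, flagged_shifted = placebo_check(shifted, control, 80, 100)

print(f"stable: z={z_stable:.2f}, flagged={flagged_stable}")
print(f"shifted: z={z_shifted:.2f}, flagged={flagged_shifted}")
```

A flagged placebo window does not tell you what went wrong, only that this pairing of target, control, and training period would have reported an effect that was not caused by your ads.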
Summary
In this chapter, we explored how to prepare and evaluate a counterfactual for your next quasi-experiment. In the next chapter, we’ll delve into uncertainty estimation and uncover how it ties into planning your ad budget.