Which MMM Priors Actually Matter: A Sensitivity Analysis

Categories: Bayesian Statistics, Marketing Analytics, MMM
I ran 14 prior configurations across six parameters on a synthetic MMM with known ground truth. Most priors are inert. One of them, adstock carryover, stays borderline prior-dependent even on clean data. And a ~20% ROAS error sits underneath all of them.
Author: Luca Fiaschi

Published: May 10, 2026

There’s a familiar ritual in Bayesian MMM work. The model won’t converge. You tighten a prior. Now the posterior is implausibly narrow. You loosen another one. Now the chains are diverging. Three hours in, you have no idea which of your changes actually moved the answer, and your stakeholder is asking, again, why the model says to spend $0 on the best-performing channel.

Most of that ritual is wasted effort. The priors that feel like they should matter usually don’t. The one that quietly decides whether your model converges to a sensible answer is a much smaller target than people think. I wanted a clean read on which priors actually move a pymc-marketing MMM, and by how much. So I ran the experiment.

I systematically varied 14 prior configurations across six parameters (grouped into five families) on 104 weeks of synthetic data with known ground truth, and measured how much each one moved the posterior. The full code and notebooks are in advanced-pymc-marketing-examples; the specific analysis lives in notebooks/05_comprehensive_prior_sensitivity.ipynb.

Some of what I found agrees with the conventional wisdom. The most interesting finding doesn’t.

Why priors are doing more work than you think

Media Mix Models are an awkward inference problem. You’re trying to disentangle correlated channels, capture delayed effects that spill past the observation window, and pin down saturation curves you can’t see directly. In a setup like that, priors quietly carry three loads at once:

  1. Computational tractability. Wrong priors send the sampler chasing geometry that doesn’t exist, and you pay for it in wall clock.
  2. Identifiability. When channels move together, priors are often the thing telling the model “these are not the same channel.”
  3. Business alignment. They’re how you encode that brand has long memory and search saturates fast, when nothing in the data actively says so.

Most practitioners either lean on defaults or wiggle priors until the model “behaves.” Both routes produce fragile inference. The point of a sensitivity analysis isn’t to tune priors for accuracy. That hides misspecification under a better-looking fit. The point is to find out which priors are doing real work, so you spend your domain knowledge where it counts.

The experimental setup

I used pymc-marketing’s MMM class with GeometricAdstock and LogisticSaturation. The synthetic dataset has 104 weekly observations, 4 channels, 2 control variables, and known ground-truth ROAS. Crucially, the ground truth was generated with a Hill saturation curve while the model fits logistic. That’s a deliberate structural mismatch I’ll come back to.
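To make the mismatch concrete, here is a minimal sketch of the three transformations in plain Python. The function names and the numbers in the usage lines are illustrative assumptions, not the notebook's code. The adstock here is an untruncated, unnormalized recursion, which may differ from the library's implementation in detail. The saturation formulas are the standard logistic and Hill forms.

```python
import math


def geometric_adstock(spend, alpha):
    """Each period keeps a fraction `alpha` of the previous period's adstocked spend."""
    carried, out = 0.0, []
    for x in spend:
        carried = x + alpha * carried
        out.append(carried)
    return out


def logistic_saturation(x, lam):
    """Logistic form: (1 - exp(-lam * x)) / (1 + exp(-lam * x))."""
    return (1.0 - math.exp(-lam * x)) / (1.0 + math.exp(-lam * x))


def hill_saturation(x, half_sat, slope):
    """Hill form: x^slope / (half_sat^slope + x^slope)."""
    return x**slope / (half_sat**slope + x**slope)


# A one-off pulse of spend decays geometrically.
pulse = geometric_adstock([1.0, 0.0, 0.0], alpha=0.5)  # [1.0, 0.5, 0.25]

# Illustrative values only: force both curves through 0.5 at the same spend,
# then compare them elsewhere on the scaled spend range.
half_sat, slope = 0.5, 2.0
lam = math.log(3.0) / half_sat  # the logistic curve reaches 0.5 at x = ln(3) / lam
gaps = {
    x: logistic_saturation(x, lam) - hill_saturation(x, half_sat, slope)
    for x in (0.1, 0.25, 0.5, 0.75)
}
print(pulse, gaps)
```

Even when the two curves are matched at the half-saturation point, they disagree below and above it. That is the kind of gap no prior on lam can close, because the logistic family simply does not contain the Hill shape.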

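# Prior lives in pymc_extras.prior in recent releases and in
# pymc_marketing.prior in older ones; adjust to your installed version.
try:
    from pymc_extras.prior import Prior
except ImportError:
    from pymc_marketing.prior import Prior

# Six parameters, grouped into five families.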
parameter_categories = {
    "transformation": ["adstock_alpha", "saturation_lam"],
    "effect_size":    ["saturation_beta"],
    "baseline":       ["intercept"],
    "noise":          ["likelihood_sigma"],
    "control":        ["gamma_control"],
}

# Fourteen prior configurations spanning tight / default / loose
# across all six parameters.
prior_specs = {
    "tight_saturation_beta":  Prior("HalfNormal", sigma=0.5),
    "default":                None,                            # PyMC-Marketing defaults
    "loose_saturation_beta":  Prior("HalfNormal", sigma=5),
    # ... 11 more configurations across adstock_alpha,
    #     saturation_lam, intercept, likelihood_sigma,
    #     and gamma_control
}

I quantified sensitivity three ways:

  • Coefficient of Variation (CV%): relative spread of ROAS point estimates across prior configs.
  • Prior Sensitivity Index (PSI): between-prior variance divided by within-posterior variance. PSI < 0.1 is the conventional threshold for “data dominates the prior.”
  • Posterior contraction: 1 − Var(posterior) / Var(prior). Higher means the data is overriding the prior more.
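The three metrics above can be sketched in a few lines of standard-library Python. This is one plausible reading of the definitions, not the notebook's implementation: it assumes CV uses the population standard deviation of point estimates, and that PSI is the variance of posterior means across configurations divided by the average posterior variance within a configuration. The function names and the toy draws are hypothetical.

```python
import statistics


def cv_percent(point_estimates):
    """Relative spread of point estimates across prior configurations, in percent."""
    return 100.0 * statistics.pstdev(point_estimates) / statistics.fmean(point_estimates)


def prior_sensitivity_index(posterior_draws_by_config):
    """Between-prior variance of posterior means over average within-posterior variance."""
    draws = list(posterior_draws_by_config.values())
    between = statistics.pvariance([statistics.fmean(d) for d in draws])
    within = statistics.fmean([statistics.pvariance(d) for d in draws])
    return between / within


def posterior_contraction(prior_draws, posterior_draws):
    """1 - Var(posterior) / Var(prior); values near 1 mean the data dominates."""
    return 1.0 - statistics.pvariance(posterior_draws) / statistics.pvariance(prior_draws)


# Toy draws for one parameter under three prior configurations (made-up numbers).
toy = {
    "tight": [1.9, 2.0, 2.1],
    "default": [2.0, 2.1, 2.2],
    "loose": [2.1, 2.2, 2.3],
}
print(cv_percent([statistics.fmean(d) for d in toy.values()]))
print(prior_sensitivity_index(toy))
```

In practice the draws would come from each fitted model's posterior (and from prior predictive sampling for the contraction), flattened across chains.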

All 14 configurations converged.

One thing to flag up front: this is a deliberately simple setup. Four channels, clean synthetic data, no missing weeks, no holiday effects, no measurement noise beyond what the simulator introduces, and a model that’s close enough to the data-generating process to actually fit. The point is to find out, in the most favorable possible conditions, which priors still move the answer. Anything that matters here matters more on real data, where you have 52 noisy weeks, correlated channels, regime shifts, and reporting gaps. I’ll come back to what changes in those regimes later in the post.

What actually mattered

Category-level prior sensitivity. Only transformation priors land in the “moderate” band; the rest are negligible.

The category view: one family stands out

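| Parameter family                      | Avg CV% across configs |
|---------------------------------------|------------------------|
| Transformation (adstock + saturation) | 1.77%                  |
| Baseline (intercept)                  | 0.31%                  |
| Effect size (saturation_beta)         | 0.25%                  |
| Noise (likelihood)                    | 0.15%                  |
| Control (gamma)                       | 0.10%                  |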

At the category level the story looks neat: transformation priors are roughly 6× more influential than the next family (1.77% against 0.31%), and everything below baseline rounds to noise. If you only had time for this view, your action would be to spend effort on adstock and saturation and leave the rest alone.

That captures most of the story. The per-parameter view sharpens it.

The per-parameter view: it’s really one prior

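| Parameter        | CV%  | ROAS range | Avg ROAS error |
|------------------|------|------------|----------------|
| adstock_alpha    | 2.8% | 0.76       | 21.5%          |
| saturation_lam   | 0.8% | 0.17       | 21.1%          |
| intercept        | 0.3% | 0.10       | 20.8%          |
| saturation_beta  | 0.2% | 0.07       | 20.9%          |
| likelihood_sigma | 0.1% | 0.05       | 20.8%          |
| gamma_control    | 0.1% | 0.05       | 20.9%          |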

The category average of 1.77% for “transformation” is hiding the actual headline. adstock_alpha is the only parameter with CV above 1%, at 2.8%. Its category mate saturation_lam (0.8%) is closer in behavior to baseline and effect-size priors than to adstock.

The practical implication is sharper than “spend time on transformation priors.” The adstock carryover prior is the one prior that consistently moves the answer on this problem. If you have one hour to spend on prior specification, spend it there.

Even on clean data, one prior is borderline prior-dependent

PSI and posterior contraction by parameter. adstock_alpha sits at the prior-dependent threshold; everything else is firmly in the data-dominates region.

This is the result I underestimated when I first looked at the analysis. The notebook reports per-parameter PSI and contraction:

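| Parameter       | PSI   | Contraction | Notebook label              |
|-----------------|-------|-------------|-----------------------------|
| intercept       | 0.002 | 0.00        | Strong data (converges)     |
| saturation_lam  | 0.004 | 0.48        | Strong data (converges)     |
| saturation_beta | 0.026 | 0.09        | Moderate data influence     |
| adstock_alpha   | 0.100 | 0.16        | Weak data (prior-dependent) |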

PSI = 0.1 is the threshold where data stops dominating the prior. On 104 weeks of clean, fully-specified synthetic data, adstock_alpha is sitting exactly on that line. The notebook itself flags it as Weak data (prior-dependent).
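As a small illustration of that rule, the snippet below applies the 0.1 threshold to the PSI values from the table. Treating a value exactly at the threshold as prior-dependent is an assumption, chosen to match the notebook's label for adstock_alpha. The variable names are hypothetical.

```python
PSI_THRESHOLD = 0.1  # below this, the data is taken to dominate the prior

# PSI values copied from the table above.
psi = {
    "intercept": 0.002,
    "saturation_lam": 0.004,
    "saturation_beta": 0.026,
    "adstock_alpha": 0.100,
}

# Parameters at or above the threshold deserve a deliberately chosen prior.
prior_dependent = [name for name, value in psi.items() if value >= PSI_THRESHOLD]
print(prior_dependent)  # ['adstock_alpha']
```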

This is a benchmark designed to favor the data. There’s no collinearity, no regime shift, no reporting lag, and no measurement noise beyond the simulator’s own. Real MMM datasets typically have 52 noisy weeks with correlated channels and structural breaks, so adstock_alpha will be more prior-dependent there, not less.

Treat this analysis as a lower bound on how much the carryover prior matters in practice.

A ~20% ROAS error sits underneath every configuration

ROAS MAPE across all 14 prior configurations. The whole distribution lives in a 20–22% band; the floor is structural, not prior-driven. (This figure reconstructs the per-config MAPEs reported in the notebook; values match within 0.1%.)

Across all 14 priors I tried (tight, loose, asymmetric, deliberately wrong), ROAS MAPE landed between 20.3% and 21.9% versus ground truth. Every configuration converged. No prior tweak moved that floor.

The natural inference is that the floor is structural: the data was generated with a Hill saturation curve and the model fits logistic. I didn’t refit with HillSaturation to verify this directly, so treat the causal attribution as a strong hypothesis rather than a tested result. What is established is that no prior in this 14-configuration sweep removes the floor. If you’re seeing systematic ROAS bias in your own MMM, prior tuning is unlikely to fix it. Vary the functional forms instead.
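For reference, a ROAS MAPE of this kind can be computed as the mean absolute percentage error across channels. The sketch below assumes that definition (the notebook may aggregate differently), and the channel names and values are made up for illustration.

```python
def roas_mape(estimated, truth):
    """Mean absolute percentage error of per-channel ROAS against ground truth, in percent."""
    errors = [abs(estimated[ch] - truth[ch]) / abs(truth[ch]) for ch in truth]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical numbers, not the notebook's results.
true_roas = {"tv": 2.0, "search": 4.0}
estimated_roas = {"tv": 2.4, "search": 3.2}
print(roas_mape(estimated_roas, true_roas))  # both channels are off by 20%
```

Running the same function over every prior configuration, and then over alternative functional forms, is the quickest way to see whether an error floor moves with the prior or with the structure.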

The marketing attribution paradox

Marketing attribution across four very different intercept priors. The posterior converges to the same split every time.

One result was sharper than I expected. I varied the intercept prior from “mostly organic sales” (μ = −1) to “mostly marketing-driven” (μ = +1) to weakly informative (σ = 5). The posterior didn’t care: marketing attribution landed in a narrow 35.1–35.4% band across all four configurations, with the posterior intercept converging to 0.403–0.404 every time.

This is the part of the analysis where the data really is fully in charge. The split between baseline and marketing-driven sales is identifiable from 104 weeks of channel and target data alone. With enough observations and the right model structure, the intercept prior simply doesn’t matter, even when the prior mean sits far from where the posterior lands (−1 or +1 against a posterior of about 0.40). The convergence number is reassuring about the split. It says nothing about whether the individual channel ROAS values underneath are right.

A practical playbook

Priority 1: adstock carryover (the one that genuinely matters)

This is the prior to think hardest about. CV of 2.8%, PSI at the prior-dependent threshold, the largest ROAS swing in the sweep.

if brand_focused:
    # Brand advertising has long memory
    adstock_alpha = Prior("Beta", alpha=5, beta=2)   # mean ~0.7
elif direct_response:
    # Performance marketing decays quickly
    adstock_alpha = Prior("Beta", alpha=3, beta=7)   # mean ~0.3
else:
    # Weakly informative if you genuinely don't know
    adstock_alpha = Prior("Beta", alpha=1, beta=3)   # mean ~0.25

If you have channel-specific knowledge (TV vs. paid search, brand vs. performance), encode it here. The data alone won’t reliably override a wrong choice, even on 104 clean weeks.
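Before committing to one of these Beta priors, it can help to translate it into something a stakeholder can react to. The sketch below uses the closed-form Beta mean and standard deviation, plus the half-life implied by a geometric decay rate (the number of weeks until a pulse of spend has decayed to half its initial effect). The helper names are hypothetical, and the half-life formula assumes pure geometric decay with weekly data.

```python
import math


def beta_mean_sd(a, b):
    """Closed-form mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


def half_life_weeks(alpha):
    """Weeks until a geometric carryover with rate `alpha` decays to one half."""
    return math.log(0.5) / math.log(alpha)


for label, (a, b) in {"brand": (5, 2), "direct response": (3, 7), "unknown": (1, 3)}.items():
    mean, sd = beta_mean_sd(a, b)
    print(f"{label}: mean={mean:.2f}, sd={sd:.2f}, half-life at mean={half_life_weeks(mean):.2f} weeks")
```

If the implied half-life looks wrong to the person who runs the channel, that is a cheaper signal than a diverging sampler.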

Priority 2: saturation point (mild but worth anchoring)

saturation_lam moves things less than adstock but more than the rest. Anchor it on the spend range you actually observe. One gotcha: pymc-marketing scales channel spend internally (default MaxAbsScaler, so each channel is divided by its own max), and saturation_lam is fit in that scaled space, not raw dollars. The prior has to match:

# Scaled "typical" spend lives in (0, 1] when using MaxAbsScaler.
scaled_typical = X[channel].median() / X[channel].max()

if expect_early_saturation:
    saturation_lam = Prior("Gamma", alpha=5, beta=2 / scaled_typical)
else:
    saturation_lam = Prior("Gamma", alpha=2, beta=0.5 / scaled_typical)

If you’ve disabled internal scaling, use raw spend instead and accept that the prior is now in dollar units. The thing to avoid is mixing the two (raw median() with a parameter that lives in scaled space), which gets you a prior off by a factor of X[channel].max().
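A quick sanity check for whichever Gamma prior you settle on is to convert its mean into the spend level at which the curve reaches half of its ceiling. The sketch assumes the logistic form (1 − exp(−lam·x)) / (1 + exp(−lam·x)), under which half-saturation occurs at x = ln(3) / lam, so a larger lam means earlier saturation. It also assumes the rate parametrization of the Gamma (mean = alpha / beta). The helper names and the example prior are hypothetical.

```python
import math


def logistic_saturation(x, lam):
    """Assumed logistic form: (1 - exp(-lam * x)) / (1 + exp(-lam * x))."""
    return (1.0 - math.exp(-lam * x)) / (1.0 + math.exp(-lam * x))


def half_saturation_spend(lam):
    """Scaled spend at which the logistic curve reaches 0.5."""
    return math.log(3.0) / lam


def gamma_prior_mean(alpha, beta):
    """Mean of a Gamma prior in the rate parametrization."""
    return alpha / beta


# Hypothetical prior, in scaled units where 1.0 is the channel's maximum spend.
lam_at_prior_mean = gamma_prior_mean(alpha=3.0, beta=1.0)
x_half = half_saturation_spend(lam_at_prior_mean)
print(f"prior-mean lam={lam_at_prior_mean:.2f} -> half-saturation at {x_half:.2f} of max spend")
```

If the implied half-saturation point lands above 1.0, the prior is saying the channel never gets halfway to saturation within the observed spend range, which is worth a second look.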

Priority 3: effect size, intercept, noise, control (defaults are fine)

CV all under 0.35%. The data overrides reasonable priors on these without effort. Don’t spend hours tuning them.

saturation_beta   = Prior("HalfNormal", sigma=2)   # default
intercept         = Prior("Normal",     mu=0, sigma=2)
likelihood_sigma  = Prior("HalfNormal", sigma=1)
gamma_control     = Prior("Normal",     mu=0, sigma=2)

A validation protocol worth running

For any MMM informing real spend decisions, three checks are non-negotiable.

Prior predictive checks. Before fitting, sample from the priors alone and ask: do the implied ROAS values, saturation curves, and carryover patterns look plausible? If your prior allows ROAS = 1000, your prior is wrong.

# X, y: the same feature frame and target you will later pass to mmm.fit
mmm.sample_prior_predictive(X, y, samples=1000)

Sensitivity analysis. At least three configurations (conservative, default, optimistic), with the variation concentrated on adstock_alpha and, if you have spare cycles, saturation_lam. Compare ROAS estimates (a healthy model agrees within ~20%), convergence diagnostics (R-hat < 1.01, ESS > 400), and out-of-sample predictions on a holdout. Document which priors changed which estimates and report the range to stakeholders. This is quality control, not optional.
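One way to make the "agrees within ~20%" check mechanical is sketched below. It assumes agreement is measured as (max − min) / mean of each channel's ROAS across configurations; other definitions are equally defensible. The configuration names and ROAS values are invented for illustration.

```python
def relative_spread(values):
    """(max - min) / mean of a list of estimates."""
    return (max(values) - min(values)) / (sum(values) / len(values))


# Hypothetical posterior-mean ROAS per channel under three prior configurations.
roas_by_config = {
    "conservative": {"tv": 1.9, "search": 3.1},
    "default": {"tv": 2.0, "search": 3.0},
    "optimistic": {"tv": 2.6, "search": 3.2},
}

channels = list(roas_by_config["default"])
spread = {
    ch: relative_spread([cfg[ch] for cfg in roas_by_config.values()])
    for ch in channels
}
# Channels whose estimates disagree by more than 20% across configurations.
unstable = [ch for ch, s in spread.items() if s > 0.20]
print(spread, unstable)
```

The per-channel spread is also the number to report to stakeholders alongside the point estimate.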

Posterior predictive checks. After fitting, score the model on CRPS, R², RMSE, Durbin-Watson, and visually inspect prediction intervals. Catch systematic over/under-prediction here, not in the boardroom.

# X: the feature frame to predict on (training data or a holdout)
mmm.sample_posterior_predictive(X)
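Three of the scores named above are simple enough to compute by hand from observed values and posterior-mean predictions. The sketch below covers RMSE, R², and the Durbin-Watson statistic using their standard definitions; CRPS needs the full predictive draws and is left out. The function names and the toy series are illustrative.

```python
import math


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r_squared(y, y_hat):
    """1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals.

    Values near 2 suggest no first-order autocorrelation in the residuals.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e**2 for e in residuals)
    return num / den


# Toy series (made up); in practice use observed sales and posterior-mean predictions.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]
resid = [a - b for a, b in zip(y_obs, y_pred)]
print(rmse(y_obs, y_pred), r_squared(y_obs, y_pred), durbin_watson(resid))
```

A Durbin-Watson value far from 2 on weekly data usually points at missing seasonality or trend terms rather than at the priors.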

When prior choice goes from “nice to have” to load-bearing

The clean-data results suggest one prior matters. Several regimes make more of them matter:

  1. Limited data (< 104 weeks). Prior influence grows fast. Lean on informative priors from previous campaigns or comparable products.
  2. High multicollinearity (channel correlations > 0.7). Priors are often the only thing separating two channels that move together (think TV + radio).
  3. New channels (< 13 weeks of history). The data can’t yet distinguish saturation from linear effects. Domain knowledge has to fill the gap.
  4. Tight compute budgets. Bad transformation priors don’t just shift estimates, they make the sampler work harder. The 10× sampling-time figure people quote is heuristic, not measured here, but the direction is right.
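The multicollinearity regime is the easiest of these to screen for before fitting. A minimal sketch, assuming weekly spend per channel is available as plain lists and using the 0.7 cutoff from the list above; the channel names and spend values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up weekly spend for three channels.
spend = {
    "tv": [1.0, 2.0, 3.0, 4.0],
    "radio": [2.0, 4.0, 6.0, 8.5],
    "search": [4.0, 1.0, 3.0, 2.0],
}

# Channel pairs whose spend moves together strongly enough that the
# priors, not the data, will end up separating their effects.
collinear_pairs = [
    (a, b) for a, b in combinations(spend, 2)
    if abs(pearson(spend[a], spend[b])) > 0.7
]
print(collinear_pairs)
```

Any pair flagged here is a candidate for channel-specific informative priors, or for a designed spend experiment that breaks the correlation.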

Where to actually spend effort

If you’re investing time in an MMM and want the best return on that time, the order is:

  1. Verify your functional forms first. The 20% ROAS error in this analysis sat under every prior I tried. The most plausible explanation is the Hill / logistic structural mismatch. Whatever the cause, no prior tweak removed the floor. The lesson generalizes: if your saturation, adstock, or seasonality specification doesn’t match reality, prior tuning won’t recover the bias.
  2. Then specify adstock carefully. It’s the one prior the data won’t reliably override.
  3. Then use sensitivity analysis as a discipline, not a deliverable. Document a range and report it.
  4. Structural hyperparameters (lag length, seasonality terms, control selection) can be tuned systematically. I wrote about doing this with Bayesian optimization in Let Bayes tune Bayes: hyperparameter optimization for causal MMMs with Optuna.

Final thoughts on automating prior sensitivity using AI

What surprised me here was not that most priors are inert. I expected that. It was that on a clean, well-specified benchmark (104 weeks, no collinearity, no noise beyond the simulator’s own) one prior was still doing real work. On a real dataset, that prior is doing more, and so are several others. Channels are correlated, spend windows shift, reporting changes, and measurement noise scales with budget. Each of those makes more priors prior-dependent, not just adstock.

Under those conditions, the honest workflow is to vary not just priors but model structures (saturation form, adstock form, lag length, control selection) and check whether the resulting decisions actually converge. If three reasonable specifications give you three different budget recommendations, the model isn’t the decision-maker. You are, and you’re guessing. Doing this by hand on every engagement is the part that scales worst. Two or three model variants is feasible. Ten is a week of someone’s time.

We built Decision Lab at PyMC Labs to take that part of the loop off your hands: an agent that runs the analysis multiple ways in parallel, checks whether the conclusions hold across specifications, and tells you what it doesn’t know when they don’t. That’s the role I expect AI to play in data science over the next few years. The judgment about whether a model is fit for a decision still belongs to a human. The mechanical work of “did we check this enough?” is what agents are good at.

So: stop tuning priors that don’t move. Tune the one that does. Verify your model structure. And on real data, where the sensitivity check is the hard part, let the agent do the work that doesn’t need you.


Full code, data, and notebooks: github.com/lfiaschi/advanced-pymc-marketing-examples. The sensitivity analysis is in notebooks/05_comprehensive_prior_sensitivity.ipynb; figures generated by generate_blog_figures.py. All numeric claims in this post are cross-checked against the notebook’s output cells (per-spec MAPE in Cell 12, per-parameter CV in Cell 14, attribution in Cell 18, PSI / contraction in Cell 20, category aggregates in Cell 24).