Agentic MMM: Skills, Guardrails, and the New Job of the Data Scientist

Data Science

Podcast

MMM

On the Measure Up podcast, I broke down how AI agents can build and run marketing mix models end to end: why opinionated ‘skills’ take correctness from near zero to ~70%, why deterministic guardrails beat prompts, and how stakeholders can finally just talk to the model.

Author

Luca Fiaschi

Published

June 3, 2026

I joined Jim and Simon on the Measure Up podcast to talk about agentic marketing mix modeling: what it actually means, what holds it back, and how the data scientist’s role changes once agents do the building. Here’s the episode, with a summary of the ground we covered below it.

Watch on YouTube → | More from Measure Up →

What “Agentic MMM” Actually Means

Measurement has a long history in marketing, and MMM has become one of the primary frameworks companies reach for, partly because cookie deprecation pushed everyone back toward modeling aggregate effects. A recent eMarketer survey found that roughly 62% of advertisers already use MMM and that around 90% are at least working on it. And yet most companies still struggle to use these models in their day-to-day decisions.

Our core belief at PyMC Labs is that the real bottleneck isn’t the math. It’s the language barrier between the people who build the models and the people who have to act on them. LLMs are a kind of universal translator, so they’re unusually well suited to closing that gap. That’s what agentic data science is to me: a way to connect sophisticated models to an interface that makes them usable by everyone who needs them.

There are two stakeholders in the life of an MMM: the data scientist who builds it and the marketer who uses it to allocate budget. Agents help both. Across the end-to-end chain there’s a job for them at every step. They merge messy marketing data and run the validation checks. They fold priors and business knowledge into the model instead of fitting blindly. They keep the deployment and retraining pipeline current. And at the end they translate the model’s output into something a stakeholder understands, then answer the next five questions that get thrown at it.

That last step is the one people underestimate. The interactive back-and-forth, the “okay, but what happens if I change this?”, is where most of the value gets stuck today, because every new scenario has to go back to a human.

Why Agents Ace Code but Stumble on Models

It helps to be precise about what an agent even is. My working definition: an LLM that can act on its environment and remember what happened. A plain API call doesn’t qualify. An agent calls tools, collects the results, and adapts. By that definition Claude Code is a powerful agent: it writes code, runs it, reads the error, and iterates until the code works.

Coding is a verifiable domain. The program runs or it doesn’t, and the agent gets a clean signal either way. Data science is a different animal. Causal inference problems like MMM are only weakly verifiable, for two reasons. First, they have degenerate solutions: you can fit ten models that all match the data equally well but tell completely different stories about the world, and only one is the one you’d actually want for the business. Second, the number of choices along the way is enormous, what people call the “garden of forking paths,” where every preprocessing decision cascades into the modeling decisions downstream.
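To make the first problem concrete, here is a minimal sketch with made-up numbers. Two channels move in lockstep, so every split of credit between them fits the sales data equally well. The data, channel names, and coefficients are all hypothetical and chosen only to illustrate the point.

```python
# Hypothetical weekly spend: the two channels always move together (tv = 2 * search).
search = [10.0, 20.0, 30.0, 40.0, 50.0]
tv = [2.0 * s for s in search]

# Hypothetical sales, generated here as 100 + 3 * search + 1 * tv.
sales = [100.0 + 3.0 * s + 1.0 * t for s, t in zip(search, tv)]


def sum_squared_error(beta_search, beta_tv, intercept=100.0):
    """Sum of squared residuals of a linear model with the given coefficients."""
    return sum(
        (y - (intercept + beta_search * s + beta_tv * t)) ** 2
        for s, t, y in zip(search, tv, sales)
    )


# Three attribution stories, all satisfying beta_search + 2 * beta_tv = 5.
stories = {
    "search does everything": (5.0, 0.0),
    "tv does everything": (0.0, 2.5),
    "mixed": (3.0, 1.0),
}
errors = {name: sum_squared_error(*betas) for name, betas in stories.items()}

for name, err in errors.items():
    print(f"{name}: squared error = {err:.6f}")
```

All three stories fit the data perfectly, yet they imply opposite budget decisions. The data alone cannot pick among them, which is why priors and business knowledge have to do that work.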

So what happens when you point a coding agent at a hard Bayesian model, say stochastic volatility, with no extra guidance? Almost 100% of the time it produces code that runs and returns an inference. Then you look closely and the fit is quietly wrong: some parameters are badly estimated because of numerical subtleties, or the agent used an outdated version of the library, or it picked a parameterization that isn’t right for that class of model. It has read every statistics textbook ever written, so it lands somewhere reasonable-looking in that enormous space of choices. Reasonable-looking, and garbage.

Skills: Opinionated Knowledge That Picks the Right Path

This is what “skills” are for. A skill is a codified, opinionated way of using a specific technique or library: the accumulated judgment about which of the five plausible ways to write a model is the right one for this kind of problem, and why.

We measured the effect directly. On some of the hardest problems, the pass rate for a correct inference (not just runnable code) goes from close to 0% without skills to around 70% with them. The skill works less by teaching the agent to reason and more by nudging it toward the right branch in that forking world of choices: if there are five ways to write the model, the one that fits this class of problem is the fourth, and here’s why.

This is the same thing I keep coming back to in our work on agentic systems for Bayesian MMM, and it’s why so much of what we build is really about encoding the right priors and modeling choices rather than chasing raw model horsepower.

Guardrails Beat Prompts: Designing the Dance

Skills aren’t enough on their own, because an agent is still a probabilistic model and it will sometimes ignore the instruction. Larger context windows help, since you can hand it longer skill files, but “usually follows the rule” isn’t good enough for a model someone is about to bet a marketing budget on.

The fix comes from the part of the definition that makes an agent an agent: it acts on its environment. So you put the check in the environment, not in the prompt. A concrete example from Bayesian modeling is R-hat, a statistic that tells you whether the sampler actually converged. Rather than asking the agent to remember to check R-hat, you add it as a deterministic callback that runs on every model the agent produces, a hook in the codebase that always fires. The agent reads that feedback and course-corrects: “I followed the best practice, but the environment says the result still looks wrong, let me pivot.”
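A rough sketch of what such a hook could look like, in plain Python. It uses the classic Gelman-Rubin form of R-hat rather than the rank-normalized variants that libraries such as ArviZ provide, and the function names, the 1.01 threshold, and the feedback wording are all assumptions made for illustration, not our production code.

```python
import random
import statistics


def r_hat(chains):
    """Classic Gelman-Rubin R-hat for one parameter.

    `chains` is a list of equal-length lists of draws, one list per chain.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


def convergence_guardrail(draws_by_param, threshold=1.01):
    """Deterministic check meant to run on every fit, whatever the prompt said.

    The threshold is an assumed cutoff; pick the one your team trusts.
    """
    failures = {}
    for name, chains in draws_by_param.items():
        value = r_hat(chains)
        if value > threshold:
            failures[name] = value
    if failures:
        worst = ", ".join(f"{k} (R-hat={v:.2f})" for k, v in failures.items())
        return {"ok": False, "feedback": f"Not converged: {worst}. Reparameterize or sample longer."}
    return {"ok": True, "feedback": "All parameters passed the R-hat check."}


# Simulated draws: one parameter mixes well, the other has chains stuck in different places.
rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
stuck = [[rng.gauss(3.0 * i, 1.0) for _ in range(500)] for i in range(4)]

report = convergence_guardrail({"beta_tv": mixed, "adstock_alpha": stuck})
print(report["feedback"])
```

The design choice is that the agent never decides whether to run the check. The environment returns the feedback string after every fit, and the agent only decides what to do about it.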

That reframes the job. The data scientist stops being the person at the keyboard typing “no, the R-hat is too high, try again,” because that whole loop is now a script. The job becomes designing the dance between the model and the environment so it lands in the right spot, rather than dancing every step yourself. We dig into what that looks like in practice in Agentic Systems for Bayesian MMM and Consumer Testing.

From 72-Page Decks to Talking to the Model

Here’s the bottleneck everyone in this field knows. You build a good model, one with solid fit that validates well out of sample, and then you spend a week assembling a 72-page deck for the CMO. You present it, and either nothing happens, or they ask a handful of sharp follow-up questions and the team spends another three weeks answering them. By the time the answers land, the decision has moved on.

The way out is to put an agent on top of the model so the stakeholder can interrogate it directly. The head of marketing asks, “I was thinking of increasing spend in this channel, what do you think?” or “why are you telling me to put more into Facebook when this other channel has a better ROI?” and the agent reasons over the diminishing-returns curves and answers. It’s chatting with the model. I wrote about why letting stakeholders talk to Bayesian models changes the dynamic, and we showed a live version of it in our recent Ask Your MMM Anything webinar.
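The Facebook question has a simple shape once you look at the curves: average ROI and the return on the next dollar are different quantities. The sketch below assumes a simple saturating response curve and made-up parameters for two channels. The functional form, the parameter names (`vmax`, `half`), and every number are illustrative assumptions, not output from a fitted model.

```python
def response(spend, vmax, half):
    """Saturating response: incremental sales attributed to a channel at a given spend."""
    return vmax * spend / (half + spend)


def marginal_return(spend, vmax, half):
    """Derivative of `response` with respect to spend: the return on the next dollar."""
    return vmax * half / (half + spend) ** 2


# Hypothetical current spend and curve parameters per channel.
channels = {
    "facebook": {"spend": 50.0, "vmax": 500.0, "half": 200.0},
    "other_channel": {"spend": 100.0, "vmax": 300.0, "half": 20.0},
}

summary = {}
for name, c in channels.items():
    summary[name] = {
        "average_roi": response(c["spend"], c["vmax"], c["half"]) / c["spend"],
        "marginal_roi": marginal_return(c["spend"], c["vmax"], c["half"]),
    }

# The next dollar should go where the marginal return is highest.
recommended = max(summary, key=lambda name: summary[name]["marginal_roi"])

for name, s in summary.items():
    print(f"{name}: average ROI {s['average_roi']:.2f}, marginal ROI {s['marginal_roi']:.2f}")
print("next dollar goes to:", recommended)
```

With these made-up numbers the other channel has the higher average ROI (2.5 versus 2.0), but it is already deep into saturation. Facebook returns about 1.60 on the next dollar against roughly 0.42, so that is where the extra budget goes.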

This sits inside a broader stack we’ve been building under the banner of Decision AI, our name for the whole field of agentic data science. It has a few open pieces: a repository of agent skills, the agent-driven model builder, and the interactive layer that lets people query a finished model in plain language. PyMC Labs comes out of the open-tools community; that’s the DNA Thomas Wiecki founded it on, and our bet is to open-source as much of this as we can. In a world where building software is cheap, a closed-source MMM platform is hard to defend. The lasting value is the expertise to tell you which solution you actually need and to stand it up for your business, so our position here is advisory rather than a SaaS box.

Agent-to-Agent, and Where the Limits Are

The step after insight is action. The last component of an agentic MMM stack is the ability to act on the recommendation, adjusting live campaigns or launching new ones, and we have prototypes of this for clients. In our Decision AI community there’s an orchestrator agent that sits across the builder and the insight agents and can push changes downstream, like trimming budget on a set of campaigns. Bring the insights, the actions, and the full context of the business together under one set of coordinating agents, and that’s the real differentiator.

I don’t think it looks like “just hand over your credit card,” though. Agents will make mistakes at scale, and there’s a reason I expect “agentic insurance” to become a real cottage industry. There’s also a structural catch: the stock-exchange analogy people reach for breaks down because the platforms keep changing the rules. The financial system doesn’t rewrite how dollars clear; Meta changes its algorithm constantly, so a curve that held historically guarantees nothing, even at spend levels where you feel confident. And there’s a permanent misalignment of incentives between advertisers and walled gardens: the last real lever you hold is moderating the measurement signal they get, so they know how hard to push.

What Stays Human: Intent and Taste

Jim asked the question I can’t quiet either: at what point do we stop needing the data scientist? My honest answer is that two things separate us from the agents, and I don’t see them going away. Intent: knowing what to point the agents at, what you’re optimizing for, where the boundaries sit. And taste: looking at an output and knowing whether it’s right, whether it’s missing something fundamental, or whether it’s just slop. I unpacked both of these on a previous podcast, and this conversation only sharpened my view.

Counterintuitively, I think this grows demand for data scientists rather than shrinking it. Free up the person who used to do nothing but build one company’s MMMs, and with the same headcount they can take on supply-chain optimization, or a financial model the investment team would otherwise have paid a consultancy to build. The demand for analytics is going to grow far faster than our ability to build models, even with agents. There’s always a shifting frontier: the vanilla case gets solved, so people start asking harder questions, and someone with judgment has to validate the answer.

Where This Is Going: Foundational Models for MMM

A couple of years ago a paper showed you could use neural networks to represent causal structure in marketing data and fold in heterogeneous inputs like creative embeddings, which is genuinely hard to do with traditional Bayesian models. I think the more interesting trajectory sits at a slight angle to that one: foundational models for MMM. Train a network on very large datasets, much of the data synthetic, so that when a new advertiser’s data shows up you don’t refit anything. You apply the model and it returns the most plausible causal structure: MMM-quality results from essentially a single line of code.
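To give a sense of what “synthetic” could mean here, the sketch below simulates one advertiser with a known ground truth: spend is carried over with a geometric adstock, passed through a saturating curve, and added to a baseline with noise. Everything in it is an assumption for illustration (the functional forms, the parameter ranges, the single channel); it is not a description of any existing training pipeline.

```python
import random


def geometric_adstock(spend, decay):
    """Carry over a fraction `decay` of the previous period's adstocked spend."""
    carried, out = 0.0, []
    for x in spend:
        carried = x + decay * carried
        out.append(carried)
    return out


def simulate_advertiser(n_weeks, seed):
    """Generate one synthetic single-channel dataset together with its true parameters."""
    rng = random.Random(seed)
    # Hypothetical parameter ranges; a real pipeline would choose these with care.
    truth = {
        "decay": rng.uniform(0.1, 0.8),
        "vmax": rng.uniform(50.0, 500.0),
        "half": rng.uniform(20.0, 200.0),
        "baseline": rng.uniform(100.0, 1000.0),
    }
    spend = [rng.uniform(0.0, 100.0) for _ in range(n_weeks)]
    adstocked = geometric_adstock(spend, truth["decay"])
    sales = [
        truth["baseline"] + truth["vmax"] * a / (truth["half"] + a) + rng.gauss(0.0, 5.0)
        for a in adstocked
    ]
    return {"spend": spend, "sales": sales, "truth": truth}


# A tiny training corpus: each advertiser comes with the answer a network should recover.
corpus = [simulate_advertiser(n_weeks=104, seed=s) for s in range(3)]
print(corpus[0]["truth"])
```

Because the true parameters are stored next to each dataset, a network can be trained to map spend and sales directly to the causal quantities, which is what would let it skip the refit on new data.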

That would collapse the barrier to entry for the entire industry. Foundational models already exist for adjacent problems: Chronos from Amazon and TimeGPT from Nixtla do this for time series, so there’s no fundamental reason it can’t be done for MMM. Someone just has to do the research. We already have agents working on exactly this problem as we speak.

One Piece of Advice

If you’re getting into this, my tip is unglamorous: just try things and break stuff. What separates people who thrive from people who don’t is the number of repetitions they get with the new tools, and how early they understand where those tools are headed. I was nervous about giving an assistant agent access to my own digital life, so I set it up carefully. It has its own email and a few specific channels, with no access to my personal or work inboxes, and from there I started tinkering. Find your own comfortable version and get the reps in.

If you want to go deeper, PyMC Labs runs an Agentic Data Science course, which you can find on our site, along with the Decision AI community where a lot of these conversations happen. You can also reach me on LinkedIn or through my website.