Agentic Systems for Bayesian MMM and Consumer Testing
I gave this talk at the Databricks Data + AI Summit on how we’re using agentic AI to change the way marketing science operates. Below is a summary of the two systems I presented. Watch the full talk here:
The Problem AI Hasn’t Solved Yet
AI is already automating a lot of marketing workflows. 71% of marketers use AI tools weekly for content generation. Personalization platforms claim 40% boosts in average order value (AOV). But when it comes to the harder questions of what we should actually market and how we should allocate our spend, we’re still lagging. A Gartner survey found that 61% of CMOs lack consumer testing insights in their plans, and 70% face challenges improving campaign ROI.
From my experience leading data teams at HelloFresh, Stitch Fix, and Rocket Internet, three things hold marketing science back:
An expertise gap. Building sophisticated models requires advanced data scientists, and 76% of data and AI professionals report a talent shortage.
An iteration lag. Media mix models and LTV models traditionally run once or twice a year because collecting data, building models, and delivering reports takes weeks or months.
A translation barrier. Business stakeholders and data scientists speak different languages. The result: important decisions get made on gut feeling instead of proper modeling.
Solution 1: The Innovation Lab (What Should We Market?)
This is a multi-agent system that rapidly validates new product concepts using synthetic consumer panels. The workflow has two components:
Agentic product ideation. A team of AI agents (supervisor, marketing expert, designer, product manager) collaborates to generate product concepts from a brief. Each agent is specialized: the designer creates product images, the marketing expert refines positioning, the product manager evaluates feasibility. The multi-agent approach produces higher-quality output than a single LLM prompt because the agents refine each other’s work.
Synthetic consumer testing. The generated concepts are then shown to a synthetic panel representative of the US population. Each synthetic consumer has a demographic profile (age, income, location, occupation) and is wrapped in a multimodal model that can view product images and understand product characteristics.
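A synthetic consumer can be sketched as a demographic profile plus a persona prompt sent, along with the product image, to a multimodal model. The class, field names, and prompt wording below are illustrative assumptions, not the system's actual implementation (the model call itself is omitted):

```python
from dataclasses import dataclass

@dataclass
class SyntheticConsumer:
    """One panelist with a demographic profile (field names are illustrative)."""
    age: int
    income_usd: int
    location: str
    occupation: str

    def to_prompt(self, concept: str, price: float) -> str:
        # Persona prompt that would accompany the product image when
        # querying a multimodal model; the wording here is a sketch.
        return (
            f"You are a {self.age}-year-old {self.occupation} in "
            f"{self.location} earning ${self.income_usd:,}/year. "
            f"Rate your likelihood (0-100) of buying '{concept}' "
            f"at ${price:.2f}, and explain why."
        )

panelist = SyntheticConsumer(34, 68_000, "Austin, TX", "teacher")
prompt = panelist.to_prompt("charcoal whitening toothpaste", 6.99)
```

Sampling thousands of such profiles to match census distributions is what makes the panel representative rather than a single "average consumer" prompt.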
What you get back: purchase likelihood scores, price elasticity curves, and estimates of market size at different price points. For CPG companies, this means having a data point before going to market that tells you what demand to expect at a given price point.
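Given purchase likelihoods at several test prices, the elasticity and market-size outputs fall out of standard demand arithmetic. A minimal sketch with made-up panel numbers (the log-log fit is one common way to estimate elasticity; it is not necessarily the system's exact method):

```python
import numpy as np

# Hypothetical panel output: mean purchase probability at each test price.
prices = np.array([3.99, 4.99, 5.99, 6.99, 7.99])
purchase_prob = np.array([0.42, 0.35, 0.28, 0.22, 0.17])

# Log-log fit: the slope approximates price elasticity of demand
# (% change in demand per % change in price; negative = demand falls).
elasticity, _ = np.polyfit(np.log(prices), np.log(purchase_prob), 1)

# Expected revenue per 1,000 panel-representative consumers at each price.
market_size = 1_000
revenue = prices * purchase_prob * market_size
best_price = prices[revenue.argmax()]
```

The same curves, segmented by the panelists' demographic profiles, give the per-segment market-size estimates mentioned above.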
Validation
In our published study with Colgate-Palmolive (57 products, 9,300 unique consumers), we found a 0.7 correlation with human panels, which is 90% of what two human panels would achieve with each other. The distribution similarity scored 0.88.
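The headline 0.7 figure is a standard Pearson correlation between matched human and synthetic product scores. A toy illustration with invented ratings (not the study data):

```python
import numpy as np

# Illustrative mean product ratings (0-10) from a human panel and the
# matched synthetic panel; these numbers are made up for demonstration.
human = np.array([6.1, 7.4, 5.2, 8.0, 4.5, 6.8, 7.1, 5.9])
synthetic = np.array([5.8, 7.0, 5.5, 7.6, 4.1, 6.5, 7.4, 5.6])

r = np.corrcoef(human, synthetic)[0, 1]  # Pearson correlation
```

The study's 0.88 distribution-similarity score is a separate check that the synthetic ratings match the *shape* of human response distributions, not just their product-level means.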
The synthetic responses are twice as long as human ones on average. That’s actually useful: you get reasoning about why a product was liked or disliked, not just a number. And when we tested absurd products (a “dental microbiome fortifying elixir”), the synthetic consumers rated them lower than real humans did; humans tend to be overly polite in surveys, while the synthetic consumers called the products out as gimmicky and untrustworthy.
Price sensitivity patterns also replicate: higher-income segments show higher purchase probability, and premium products show lower purchase probability than mass-market ones, both in human and synthetic data.
Solution 2: MMM Co-pilot (How Should We Market?)
The second system is a conversational co-pilot for building Bayesian Media Mix Models. The analogy: Databricks Genie lets you talk to your data (descriptive analytics). Our co-pilot lets you talk to your models (predictive and causal analytics).
The typical scenario: your CMO needs to know how to allocate next quarter’s budget across media channels to hit an ROAS target. Your data science team says “give us a few weeks.” Maybe the data isn’t in the right format. Maybe a new channel was added. Maybe the question is framed differently than the last model was built for. But the CMO needs the answer by Tuesday.
The co-pilot has three phases:
Configure. Upload data to Unity Catalog, validate it, set up MLflow tracking.
Explore. Ask the agent to build plots and do exploratory analysis. Because it has context about what good media mix modeling looks like, it proactively flags things like trend breaks or seasonality patterns that could influence results.
Understand. Once the model is fit (training runs on Databricks Workflows with GPU compute), ask causal questions: plot saturation curves, interpret adstock effects, test scenarios like “what if competition increases by X%?” or “what’s the optimal allocation given a budget cap?”
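The adstock and saturation effects the co-pilot interprets are the two core transforms in any media mix model: carryover (this week's sales reflect decayed past spend) and diminishing returns (response flattens as spend grows). A minimal NumPy sketch; the geometric adstock and Hill saturation forms are standard MMM choices, and the decay, half-saturation, and spend values are illustrative, not from any client model:

```python
import numpy as np

def geometric_adstock(spend, decay=0.6):
    """Carryover: each week's effect includes a geometrically decayed
    tail of all previous spend."""
    out = np.zeros_like(spend, dtype=float)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        out[t] = carry
    return out

def hill_saturation(x, half_sat=50.0, shape=1.0):
    """Diminishing returns: response approaches 1 as effective spend grows,
    reaching 0.5 at half_sat."""
    return x**shape / (x**shape + half_sat**shape)

# Six weeks of channel spend; note weeks 5-6 have zero spend but
# still produce (decaying) response thanks to adstock carryover.
spend = np.array([10.0, 40.0, 80.0, 120.0, 0.0, 0.0])
effect = hill_saturation(geometric_adstock(spend, decay=0.6))
```

In the Bayesian model, `decay`, `half_sat`, and per-channel coefficients get priors and are fit to the sales data; the co-pilot's saturation-curve plots and scenario answers are read off the fitted posterior.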
Results from Early Clients
We tested this with 10-15 clients:
- 10-50% reduction in modeling effort
- Junior data scientists can now carry the full process with minimal senior oversight
- Richer insights because the agent has domain knowledge baked in (things the team wouldn’t have thought to check)
- Clients running MMMs report 10-20% increases in ROI effectiveness, which is significant when you’re managing hundreds of millions in marketing spend
How the Two Connect
The Innovation Lab answers “what should we market?” The MMM co-pilot answers “how should we allocate spend?” The missing link: after you launch a product, you can use synthetic consumer panels to test how new media channels and ad creatives resonate with different audiences, then translate that feedback into Bayesian priors that inform your media mix model. We’re actively building this closed-loop system with clients now.
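One way to make that translation concrete is to map a creative's synthetic-panel resonance score into the mean and spread of a channel's ROI prior: creative that tests well centers the prior higher and tightens it. The mapping below is a hypothetical sketch of this idea, not the system's actual prior-elicitation scheme, and all scores are invented:

```python
import numpy as np

# Hypothetical synthetic-panel resonance scores (0-1) per channel's creative.
resonance = {"tv": 0.72, "social": 0.55, "podcast": 0.31}

def roi_prior(score, base=1.0, spread=0.5):
    """Map a resonance score to (mu, sigma) of a lognormal ROI prior:
    higher score -> higher prior median and tighter uncertainty."""
    mu = np.log(base + score)             # better-testing creative shifts the prior up
    sigma = spread * (1.0 - 0.5 * score)  # and expresses more confidence in it
    return mu, sigma

priors = {channel: roi_prior(s) for channel, s in resonance.items()}
```

The MMM then starts from these informed priors instead of flat ones, so early in-market data updates a belief the panel already shaped.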
Built on Databricks
Both systems run on Databricks: Unity Catalog for data, MLflow for model versioning and diagnostics, Databricks Workflows for training, and Databricks Apps for the conversational interface. The multi-agent systems use specialized agents for code interpretation, plot analysis, and model building. We’re open-sourcing accelerators so you can deploy these on your own Databricks environment.
Key Takeaways
We’re moving past AI as task automation in marketing toward AI as a cognitive partner for strategic decisions. The combination of agentic workflows, Bayesian modeling, and synthetic consumer panels lets teams go from question to answer in hours instead of months. And the tools to do this on Databricks exist today.
If you’re interested in applying any of this, get in touch.