Experimenting with Aampe opens the door to a more personalized, thoughtful way of connecting with your customers. Because Aampe adapts to each individual rather than relying on one-size-fits-all campaigns, experimenting within this new model can feel a little different at first. This guide is here to help you understand what to expect, why the process works the way it does, and how to get the most out of your testing experience.

A Fundamental Difference

Traditional CRM systems operate through campaigns: fixed messages sent to defined audiences at specific times. Each campaign typically has its own random holdout group, and attribution is straightforward because all treated users receive the same message simultaneously. Agentic AI systems like Aampe operate differently. Instead of campaigns, they create personalized message sequences for each user. This level of personalization is great for customers, but it creates difficulties for traditional measurement techniques.

Path Dependency and Its Implications

In true 1:1 personalized systems like Aampe, actions become path-dependent: each message influences the messages that follow, depending on how it performs. This creates several measurement challenges for an individual message:
  • Non-Random Assignment: Recipients and non-recipients of a given message are chosen strategically, not randomly, which biases naive comparisons between the two groups (see the sketch after this list).
  • Cumulative Impact: Traditional A/B tests measure the immediate response to a message. Personalized sequences build impact over time as each message learns and improves from prior messages.
  • Treatment Timing: Measuring impact is difficult when some users receive a message on a Tuesday morning, while other users receive the same message on a Saturday evening.
  • Learning Phases: Aampe agents learn in real time from how users respond to different messages. It can take several customer interactions to discover strong patterns, so messages at the start of a sequence are typically less impactful than messages later on.
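
To see why the assignment point matters, here is a minimal, self-contained Python sketch. It is not Aampe's implementation; the propensities, lift value, and selection rule are illustrative assumptions. It shows how strategic (non-random) assignment inflates a naive sent-vs-not-sent comparison relative to the true per-message lift.

```python
# Minimal sketch: strategic assignment vs. a naive sent-vs-not-sent comparison.
# All numbers (baseline, lift, selection threshold) are made-up assumptions.
import random

random.seed(42)
N = 100_000

# Each user has a latent engagement propensity the agent partly observes.
users = [random.random() for _ in range(N)]

def converts(propensity, messaged):
    # Assume the message adds a small, fixed lift on top of baseline propensity.
    baseline = 0.05 + 0.10 * propensity
    lift = 0.02 if messaged else 0.0
    return random.random() < baseline + lift

# Strategic assignment: the agent messages the users it predicts will engage.
messaged = [p > 0.5 for p in users]
outcomes = [converts(p, m) for p, m in zip(users, messaged)]

def rate(group_flags):
    hits = [o for o, g in zip(outcomes, group_flags) if g]
    return sum(hits) / max(len(hits), 1)

naive_lift = rate(messaged) - rate([not m for m in messaged])
print(f"Naive sent-vs-not-sent lift: {naive_lift:.3f}")  # well above 0.02
print("True per-message lift used in the simulation: 0.020")
```

The naive comparison mixes the message's real effect with the agent's targeting, which is exactly why a per-message holdout readout breaks down here.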

Implications for Testing

Standard campaign-level holdouts lose meaning when users receive different messages at different times. But 1:1 personalization is exactly that: different messages at different times. So rather than evaluating individual messages, we need to step back and evaluate the message-generation process. Instead of asking, “does message A beat message B?”, we start asking, “does an adaptive, learning system outperform static rules?”

This requires longer test periods, different group structures, and metrics that capture cumulative rather than immediate effects.
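
As a rough illustration of that shift, the sketch below assumes users are split once into an agent group and a user-level holdout for the full test window, and that conversion events are available as a simple export. The field names, group sizes, and dates are hypothetical, not an Aampe API. The readout is a cumulative conversion rate per group over the whole window rather than a per-message click rate.

```python
# Minimal sketch, under assumed data shapes, of a system-level holdout readout.
from collections import defaultdict
from datetime import date

events = [
    # (user_id, group, converted_at) -- toy rows standing in for an export
    ("u1", "agent",   date(2024, 5, 3)),
    ("u2", "agent",   date(2024, 6, 11)),
    ("u3", "holdout", date(2024, 6, 20)),
]
group_sizes = {"agent": 9_000, "holdout": 1_000}  # fixed at assignment time

cumulative = defaultdict(int)
for _, group, _ in events:
    cumulative[group] += 1

# Compare cumulative conversion over the whole window, not message-level CTR.
for group, size in group_sizes.items():
    print(f"{group}: {cumulative[group] / size:.4%} cumulative conversion rate")
```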
What does this look like in practice? Learn how to set up your first holdout test in Aampe.

Experiment FAQs