Comparing Systems
When 1:1 personalization makes traditional message-level testing impossible, we need to evaluate the entire system instead.
- This is true when comparing Aampe’s agentic personalization against non-personalized customer experiences.
- This is also true when comparing different agentically-personalized systems, such as when A/B testing various Aampe configurations.
Group Structure
When comparing agentic personalization against a non-agentic system, it is helpful to create three user groups:
- No-Message Group: Receives no marketing messages. This establishes your true baseline behavior: what happens without any messaging influence. While it may feel counterintuitive to leave out some users, this group reveals which behaviors are truly driven by messages.
- Business-as-Usual Group: Continues receiving existing campaign messages through current tools. This group provides the benchmark for what you’re currently achieving.
- Aampe Group: Receives messages exclusively from Aampe’s agents. As much as possible, no business-as-usual marketing messages should influence this group. This isolation allows you to measure what the AI system can achieve when given full control.
Another consideration is the size of the Aampe group. Because agents can learn from each other, agents in a small Aampe group have less to learn from than agents in a large one, so a small group tends to understate what the system can do. To get the most accurate comparison of Aampe and a traditional system, it makes sense to allocate a larger share of users to the Aampe group.
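To make the role of each group concrete, here is a minimal sketch of how the three groups could be compared on a single KPI. The function name `compare_groups`, the group keys, and the input shape (one KPI value per user) are assumptions for illustration, not part of Aampe’s tooling. It returns point estimates only and does not test for statistical significance.

```python
from statistics import mean

def compare_groups(kpi_by_group):
    """Compare group means on one KPI.

    kpi_by_group: dict mapping a group name to a list of per-user KPI values.
    The three keys used below are hypothetical names for the groups above.
    """
    baseline = mean(kpi_by_group["no_message"])      # behavior without messaging
    bau = mean(kpi_by_group["business_as_usual"])    # current benchmark
    aampe = mean(kpi_by_group["aampe"])              # agentic personalization
    return {
        # What existing campaigns add over no messaging at all.
        "bau_vs_no_message": bau - baseline,
        # What the agentic system adds over no messaging at all.
        "aampe_vs_no_message": aampe - baseline,
        # The head-to-head comparison between the two systems.
        "aampe_vs_bau": aampe - bau,
    }
```

The first two differences show how much of the behavior each system actually drives; the third is the direct system-versus-system comparison.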
User Group Assignment
Users should be randomly assigned to groups in a way that is easy to extend to new users who join during the experiment. For this example we’ll randomize over user IDs. The assignment is permanent and should be consistent everywhere it is used in your system:
- Data warehouse (for analysis)
- Existing marketing tools (for excluding the Aampe-only and no-message groups)
- Aampe (for excluding the business-as-usual and no-message groups)
- Any other messaging systems
Example: Business-as-usual sends coupons for a product that has limited supply, and the product runs out of stock. The coupon recipients buy more of the product than usual, while the no-message group buys less of it because it is out of stock.
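One common way to get a random, permanent assignment that extends to new users is to hash the user ID. The sketch below is an illustration under stated assumptions, not Aampe’s implementation: the group shares, the salt string, and the function name `assign_group` are all hypothetical. Because the result depends only on the user ID and the salt, every system listed above can compute or store the same answer.

```python
import hashlib

# Hypothetical group shares (must sum to 1). The larger Aampe share is an
# assumption that follows the group-size advice; choose your own split.
GROUP_WEIGHTS = [
    ("no_message", 0.10),
    ("business_as_usual", 0.30),
    ("aampe", 0.60),
]

def assign_group(user_id: str, salt: str = "system-test-v1") -> str:
    """Deterministically map a user ID to an experiment group.

    The same (salt, user_id) pair always yields the same group, so users
    who join mid-experiment are assigned by the same rule as everyone else.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # Take the first 8 hex characters as an integer in [0, 2**32)
    # and scale it to a bucket in [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    cumulative = 0.0
    for group, weight in GROUP_WEIGHTS:
        cumulative += weight
        if bucket < cumulative:
            return group
    # Guard against floating-point rounding in the cumulative sum.
    return GROUP_WEIGHTS[-1][0]
```

Keep the salt fixed for the life of the experiment: changing it reshuffles every user, which breaks the permanence requirement.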
Content Requirements
For the AI to learn effectively, it needs sufficient material to work with:
- Diverse Labels: Create content across multiple distinct label categories representing different value propositions. If you only provide discount messages, the agents will only learn discount preferences.
- Balanced Channels: If business-as-usual messages apply to several channels, ensure the same channels are available to the agentic personalization group.
- Comprehensive Audiences: If the goal is to learn if agentic personalization performs better or worse than another approach, the Aampe audiences should be as diverse as the business-as-usual audiences.
Message Delivery
To ensure the test is running as expected, you need visibility into which messages are delivered and when.
- The primary concern is that business-as-usual messages are sent only to the business-as-usual group, and that Aampe messages are sent only to the Aampe group.
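A delivery check of this kind can be sketched as a pass over a delivery log. The record fields (`user_id`, `sender`, `sent_at`), the sender names, and the functions `find_leaks` and `summarize` are assumptions for illustration; adapt them to whatever your messaging tools actually export.

```python
from collections import Counter

# Which sender may reach which group. The no-message group appears under
# no sender, so any delivery to it is reported. Senders missing from this
# mapping (other messaging systems) are reported for every delivery.
ALLOWED_GROUPS = {
    "business_as_usual": {"business_as_usual"},
    "aampe": {"aampe"},
}

def find_leaks(deliveries, group_of):
    """Return deliveries that reached a user outside the sender's group.

    deliveries: iterable of dicts with "user_id", "sender", and "sent_at".
    group_of:   dict mapping user_id to that user's assigned group.
    """
    leaks = []
    for delivery in deliveries:
        # None means the user was never assigned; that is reported too.
        group = group_of.get(delivery["user_id"])
        if group not in ALLOWED_GROUPS.get(delivery["sender"], set()):
            leaks.append({**delivery, "group": group})
    return leaks

def summarize(leaks):
    """Count leaked deliveries per (sender, recipient group) pair."""
    return Counter((leak["sender"], leak["group"]) for leak in leaks)
```

Running a check like this on a schedule during the test, rather than once at the end, makes it possible to catch a misconfigured exclusion before it contaminates much of the data.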
Pre-Test Planning
Before launching your experiment, establish your measurement framework:
- Primary KPI: Select your primary metric before starting. This prevents post-hoc rationalization and keeps analysis focused.
- Statistical Power: Understand how long you’ll need to run the test given your metric’s natural variance and the effect size you hope to detect. Higher variance metrics and smaller expected effects require longer test periods.
- Burn-in Period: Plan to exclude an initial period from primary analysis. Early results often reflect adjustment periods rather than steady-state performance—users adapting to no messages, agents still learning preferences, etc.
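The relationship between variance, effect size, and required sample can be sketched with the standard normal-approximation formula for comparing two group means. This is a rough planning aid under stated assumptions, not a prescribed method: it assumes two equal-sized groups with the same standard deviation and a two-sided test, and the function name `users_per_group` and its defaults (5% significance, 80% power) are illustrative choices. Translating the user count into a test duration depends on how your KPI accrues over time, which this sketch does not model.

```python
import math
from statistics import NormalDist

def users_per_group(std_dev, min_effect, alpha=0.05, power=0.80):
    """Approximate users needed in each of two equal-sized groups.

    std_dev:    standard deviation of the per-user KPI (assumed equal
                across groups).
    min_effect: smallest difference in group means you want to detect,
                in the same units as the KPI.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    # n = 2 * (z_alpha + z_power)^2 * sigma^2 / delta^2
    n = 2 * ((z_alpha + z_power) * std_dev / min_effect) ** 2
    return math.ceil(n)
```

Because the required sample grows with the square of `std_dev / min_effect`, halving the effect you want to detect roughly quadruples the number of users needed, which is why noisy metrics and small expected effects lead to longer tests.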