Measuring Customer Usage Retention Rate (CURR) lift through A/B testing involves a disciplined approach to experimentation, focused on understanding user behavior and the long-term impact of product changes. While the term "CURR lift" may not be universally familiar, the principles of measuring retention lift are well established and critical for sustainable growth.
Core Principles for A/B Testing Retention
Successful A/B testing for retention begins with a clear understanding of the experiment's purpose and careful design:
Define a Clear Hypothesis and Goals: Experimentation is a discovery tool that helps teams avoid bias and gather quantitative data. A hypothesis should outline how a product change will alter user behavior or outcomes, informed by existing data or insights. For instance, a problem like "too many people open the app and don’t request a ride" can lead to the hypothesis that "passengers don’t request rides because they don’t see enough cars on the map." It's crucial to define what questions need answering and the simplest way to answer them confidently.
Design the Experiment and Measurement Strategy: Clearly identify the success metrics for retention lift as well as tradeoff metrics that might be negatively impacted. This includes defining the experiment's trigger, targeting, eligibility, assignment criteria, and response metrics. For retention, key metrics often include daily active users (DAU), D7 retention, and D14 retention.
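As a rough illustration, DN retention per experiment arm can be computed directly from event logs. The pandas sketch below assumes a simple activity table with hypothetical column names and defines "D7 retained" as being active on day 7 after signup; adjust the definition to whatever your team actually uses.

```python
# Minimal sketch: D7 / D14 retention per experiment arm with pandas.
# Hypothetical columns: user_id, variant, signup_date, activity_date.
import pandas as pd

def retention_by_variant(events: pd.DataFrame, day: int) -> pd.Series:
    """Share of users in each variant active exactly `day` days after signup."""
    events = events.copy()
    events["days_since_signup"] = (
        events["activity_date"] - events["signup_date"]
    ).dt.days

    # One flag per (variant, user): did they show up on day N?
    retained = (
        events.groupby(["variant", "user_id"])["days_since_signup"]
        .apply(lambda d: (d == day).any())
    )
    return retained.groupby(level="variant").mean()

# Toy data for illustration only
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "variant": ["control", "control", "treatment", "treatment", "treatment"],
    "signup_date": pd.to_datetime(["2024-01-01"] * 5),
    "activity_date": pd.to_datetime(
        ["2024-01-01", "2024-01-08", "2024-01-01", "2024-01-15", "2024-01-01"]
    ),
})
print(retention_by_variant(events, day=7))   # D7 retention per arm
print(retention_by_variant(events, day=14))  # D14 retention per arm
```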
Ensure Statistical Significance and Power: Calculate the Minimum Detectable Effect (MDE), the smallest lift the test can reliably distinguish from chance at the chosen significance level and power; this determines the required sample size and, in turn, the experiment's runtime. Low statistical power can cause real effects to be missed. It's important to run tests long enough to achieve statistical significance and to avoid "peeking early" at results. For early-stage companies with lower traffic, reaching statistical significance can be challenging, sometimes making it more practical to launch changes and monitor pre/post data if the expected impact is substantial.
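A back-of-the-envelope way to translate an MDE into sample size (and therefore runtime) is the standard two-proportion power formula. The baseline retention and lift values in this sketch are purely illustrative.

```python
# Minimal sketch: required users per arm to detect a given absolute retention lift.
import math
from scipy.stats import norm

def sample_size_per_arm(p_baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per arm to detect an absolute lift of `mde_abs` over `p_baseline`."""
    p_treatment = p_baseline + mde_abs
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = norm.ppf(power)            # power requirement
    variance = p_baseline * (1 - p_baseline) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)

# e.g. baseline D7 retention of 20%, smallest lift worth detecting = 1 point absolute
n = sample_size_per_arm(0.20, 0.01)
print(f"~{n:,} users per arm")  # runtime ≈ n / eligible new users per arm per day
```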
Implement Guardrail Metrics: For every metric optimized, pair it with another that catches potential adverse consequences, preventing short-term gains at the expense of long-term customer satisfaction or brand value. These are business-critical metrics that should never significantly decline, such as app crashes or app store ratings. For example, a change in subscription button copy at Duolingo increased revenue but decreased new-user D1 retention, and the experiment was killed to prioritize long-term trust. Net Promoter Score (NPS) can also serve as a long-term guardrail.
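One lightweight way to operationalize a guardrail is a one-sided check that the treatment arm's guardrail metric has not dropped significantly. The sketch below uses a simple two-proportion z-test with illustrative counts; real guardrail policies are usually richer than a single p-value.

```python
# Minimal sketch: one-sided guardrail check on a retention metric (e.g. D1 retention).
from scipy.stats import norm

def guardrail_declined(retained_t: int, n_t: int,
                       retained_c: int, n_c: int,
                       alpha: float = 0.05) -> bool:
    """Two-proportion z-test: is treatment retention significantly below control?"""
    p_t, p_c = retained_t / n_t, retained_c / n_c
    p_pool = (retained_t + retained_c) / (n_t + n_c)
    se = (p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c)) ** 0.5
    z = (p_t - p_c) / se
    p_value = norm.cdf(z)        # probability of a drop this large under the null
    return p_value < alpha       # True => guardrail breached

# Illustrative counts: a revenue win that dents new-user D1 retention
if guardrail_declined(retained_t=4_100, n_t=10_000, retained_c=4_300, n_c=10_000):
    print("Guardrail breached: D1 retention down significantly; stop and investigate.")
```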
Strategies for Measuring Retention Lift
Retention is often considered one of the hardest metrics to move, and its effects can take time to manifest.
Focus on Early Proxies and Correlated Behaviors: Since true long-term retention takes time to observe, identify and experiment on early proxies that strongly correlate with future retention, often occurring early in the customer's journey. For example, at Thumbtack, optimizing for "the number of customer requests that had a certain number of quotes" became a key proxy for retention. Similarly, instead of directly measuring long-term retention for a single feature, focus on behaviors and "setup moments" that correlate with it; Facebook's "10 friends in 7 days" was a setup moment correlating with long-term retention.
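A simple starting point for finding such proxies is to rank candidate early behaviors by their correlation with a later retention flag. Correlation alone doesn't establish causation, so candidates still need experimental validation; the column names and values below are hypothetical.

```python
# Minimal sketch: ranking candidate early proxies by correlation with long-term retention.
# One row per user: week-1 behavior counts plus a retention flag observed much later.
import pandas as pd

users = pd.DataFrame({
    "requests_with_enough_quotes": [3, 0, 5, 1, 4, 0],
    "profile_completed":           [1, 0, 1, 0, 1, 1],
    "friends_added_week1":         [8, 1, 12, 2, 10, 0],
    "retained_d60":                [1, 0, 1, 0, 1, 0],
})

# Correlation between each early behavior and later retention.
proxy_strength = (
    users.drop(columns="retained_d60")
         .corrwith(users["retained_d60"])
         .sort_values(ascending=False)
)
# Strongest candidates are worth validating as experiment response metrics.
print(proxy_strength)
```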
Long-Term Measurement and Holdout Experiments: Many experiments that show short-term lift do not translate into long-term incremental lift. To understand long-term effects, implement holdouts, such as keeping a percentage of users from seeing a new change for an extended period, or running a 50/50 split for new users and tracking the original cohort for a year or more. At Shopify, a 5% general holdout and 50/50 splits for new merchants allowed for quick shipping while tracking long-term Gross Merchandise Volume (GMV) and merchant activation.
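A minimal sketch of how a long-term holdout might be assigned deterministically, so the same users stay held out for the entire tracking window (the 5% size and salt are illustrative, not a prescribed setup):

```python
# Minimal sketch: deterministic long-term holdout assignment.
import hashlib

HOLDOUT_PCT = 0.05
SALT = "2024-longterm-holdout"  # rotate the salt to start a fresh holdout cohort

def in_longterm_holdout(user_id: str) -> bool:
    """Stable assignment: the same user always lands in the same bucket."""
    digest = hashlib.sha256(f"{SALT}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # uniform value in [0, 1]
    return bucket < HOLDOUT_PCT

def experience_for(user_id: str) -> str:
    # Holdout users keep the old experience for the whole tracking window;
    # everyone else receives new features as they ship. Long-term metrics
    # (retention, GMV, activation) are then compared across the two groups.
    return "holdout" if in_longterm_holdout(user_id) else "current_product"

print(experience_for("user_12345"))
```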
Beware of the "Pull Forward Effect": Short-term wins can simply pull revenue or success forward in time without generating additional long-term value. An experiment at Udemy that increased day-of conversion by 7% on a success page only increased total session revenue by 2%, because users were less likely to keep browsing afterward.
Practical Examples of Driving and Measuring Retention Lift
Gamification and Engagement: At Duolingo, a leaderboard system adapted from a game led to a 17% increase in overall learning time and tripled the number of highly engaged learners. Badges also proved to be a significant win for retention.
Notifications and Email Systems: These are powerful tools for driving engagement. Pinterest built a system that logged user preferences for email delivery and ranked content by likelihood of engagement, dramatically improving engagement. Duolingo found that optimizing notifications significantly impacted D1 retention. However, aggressive A/B testing of notifications can lead users to opt out, destroying the channel.
Onboarding and Activation Flow Improvements: Improving the initial activation experience directly impacts long-term retention. At Airtable, changing the onboarding flow to encourage document uploads led to higher activation and monetization rates. Pinterest doubled its activation rate through dozens of experiments that simplified the experience and customized initial content recommendations.
Core Product Improvements: Retention is often driven by reducing friction in the core product. At Grubhub, improving restaurant variety, lowering order minimums, and burying guest ordering increased retention.
Behavioral Psychology and Messaging: Duolingo A/B tested emotional cues, such as the number of tears its mascot cried, as well as growth-mindset messaging, which showed exciting results in retention.
Identifying "Setup Moments": For Gojek, users who used the app for transportation during the morning rush were more likely to retain, indicating an "aha moment" tied to urgent, reliable service.
Tools and Frameworks
A robust experimentation framework is essential, whether built internally or using third-party tools. Tools like Optimizely, Amplitude, Eppo, and Statsig are commonly used. For in-app experiments, a system to bucket users and integrate experiment IDs with data analysis tools like Segment and Amplitude is beneficial. The TARS framework (Target Audience, Adoption, Retention, Satisfaction) can help evaluate features by focusing on proxy metrics rather than direct, isolated retention measurements.
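A minimal sketch of what per-experiment bucketing and experiment-tagged event payloads might look like; function names and the payload shape are illustrative, not any vendor's actual API.

```python
# Minimal sketch: per-experiment bucketing plus tagging analytics events with the
# experiment ID and variant so downstream tools can slice results by arm.
import hashlib

def assign_variant(user_id: str, experiment_id: str,
                   variants=("control", "treatment")) -> str:
    """Deterministic, per-experiment assignment so a user stays in one arm."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def build_event(user_id: str, event: str, experiment_id: str) -> dict:
    """Build an event payload carrying experiment context for the analytics pipeline."""
    return {
        "user_id": user_id,
        "event": event,
        "properties": {
            "experiment_id": experiment_id,
            "variant": assign_variant(user_id, experiment_id),
        },
    }
    # In practice this payload would be handed to your analytics SDK.

print(build_event("user_42", "ride_requested", "map_car_density_v2"))
```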
Common Mistakes and Best Practices
Avoid Overbuilding and Over-testing: For discovery, experiments don't need to be perfect or infinitely scalable. Not every change requires an A/B test; if a feature is clearly better or required, it can be more efficient to launch and monitor.
Run Clean Experiments: Limit the difference between test and control to only the variation being tested, and avoid layering multiple experiments on top of each other.
Build a Culture of Experimentation: Make experiment results transparent across the company to foster learning. Ideally, any code change or feature introduction should be part of an experiment.
Focus on Directional Impact: Especially for early-stage companies, focus on clear, directional impact rather than false precision, as over-engineered attribution models tend to break down.
Avoid Data Theater: Be wary of collecting data and building dashboards that don't improve decision-making quality.
In conclusion, measuring retention lift through A/B testing requires a strategic approach that balances rigorous experimental design with an understanding of long-term user behavior. By defining clear hypotheses, focusing on early proxies, implementing guardrail metrics, and embracing a culture of continuous learning, organizations can effectively identify changes that drive sustainable customer usage and retention.