Improve your app by running in-app A/B tests

A/B testing helps you test improvements to your app on a subset of your users so you can use data to choose the best solution for your entire user base.

Why it works

A/B testing takes the guesswork out of determining whether a change to your app’s features or content is beneficial. And, because you can test changes on a subset of your users, it helps you avoid releasing an update to all your users only to find that it has an unexpected impact.

How to do it

  1. Select a suitable A/B testing platform, such as Firebase Remote Config random percentile targeting combined with Google Analytics for Firebase and Google Tag Manager, and integrate it with your app.
  2. Determine the feature or content variants you want to test and how you’ll measure their success.
  3. Set up the features or content to show to users in each test variant and to users outside the test, for example:

    Scenario: New implementation of existing feature

    Example: Using bottom navigation instead of tabs to increase user engagement.

    Group                          What users see
    Users excluded from test       Existing implementation (tabs)
    Variant A                      Existing implementation (tabs)
    Variant B                      New feature implementation (bottom navigation)
    Variant C, D, etc. (optional)  Additional feature implementations (e.g. a navigation drawer)

    Scenario: New feature that creates a new metric

    Example: Listing in-app purchase items by popularity rather than price to generate more revenue.

    Group                          What users see
    Users excluded from test       No new feature (in-app purchases aren't enabled)
    Variant A                      New feature implementation 1 (in-app purchase items listed by popularity)
    Variant B                      New feature implementation 2 (in-app purchase items listed by price)
    Variant C, D, etc. (optional)  Additional feature implementations (e.g. purchase items listed alphabetically)

    Scenario: New feature measured with an existing metric

    Example: Allowing users to mark items to increase user engagement.

    Group                          What users see
    Users excluded from test       No new feature (marking items is disabled)
    Variant A                      No new feature (marking items is disabled)
    Variant B                      New feature implementation (e.g. mark items using a heart symbol)
    Variant C, D, etc. (optional)  Additional feature implementations (e.g. mark items using a star symbol)
  4. Select the size of your test population or the duration of the test, depending on the features of your A/B testing platform, with the goal of reaching a test population of at least 1,000 users.
  5. Run the test.
  6. Review the test results to determine whether they are statistically significant and whether any of the tested variants improved your app's performance.
  7. Roll out the "winning" change to all your users.
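Platforms such as Firebase Remote Config handle variant assignment for you. To illustrate the idea behind random percentile targeting in steps 1 and 3, here is a minimal Python sketch (not any platform's actual algorithm) that hashes a user ID together with a test name into a stable percentile from 0 to 99, then maps percentile ranges to groups. The helper names `percentile` and `assign_group`, the test name, and the 10% test population are illustrative assumptions.

```python
import hashlib


def percentile(user_id: str, test_name: str) -> int:
    """Map a user to a stable percentile in [0, 100) for a given test."""
    # Salting with the test name gives each test an independent split,
    # so the same user can take part in several tests at once.
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def assign_group(user_id: str, test_name: str,
                 variants: list[str], test_percent: int) -> str:
    """Return a variant name, or "excluded" if the user is outside the test.

    Hypothetical helper: users whose percentile is below test_percent are
    split evenly across the variants; everyone else sees the existing app.
    """
    p = percentile(user_id, test_name)
    if p >= test_percent:
        return "excluded"
    # Spread the in-test percentiles evenly over the variants.
    return variants[p * len(variants) // test_percent]


if __name__ == "__main__":
    for uid in ["user-1", "user-2", "user-3"]:
        print(uid, assign_group(uid, "bottom_nav_test", ["A", "B"], 10))
```

Because the percentile depends only on the user ID and the test name, a user sees the same variant every time the app starts, which keeps their behavior attributable to one variant for the whole test.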
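The 1,000-user target in step 4 is a minimum; how many users a test actually needs depends on your baseline metric and on the smallest improvement you want to detect. A common normal-approximation formula for comparing two conversion rates is sketched below in Python using the standard library's `statistics.NormalDist`. The function name and the defaults of 5% significance and 80% power are conventional assumptions, not values from this guide.

```python
import math
from statistics import NormalDist


def users_per_variant(p_baseline: float, p_target: float,
                      alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed in each variant to detect p_baseline -> p_target.

    Uses the normal approximation for a two-sided, two-proportion test.
    """
    if p_baseline == p_target:
        raise ValueError("rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    n = (z_alpha + z_beta) ** 2 * variance / (p_baseline - p_target) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical goal: detect an engagement rate rising from 10% to 12%.
    print(users_per_variant(0.10, 0.12))
```

With these inputs the estimate comes to roughly 3,840 users in each variant, well above 1,000, which is why small expected improvements call for larger or longer tests.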
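For step 6, one common way to check whether a difference in a yes/no metric (such as the share of users who engaged with a feature) is statistically significant is a two-proportion z-test. The Python sketch below assumes users were randomly assigned and that each user either converted or didn't; your A/B testing platform may compute significance differently, and metrics such as revenue per user need a different test. The counts in the example are made up.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, users_a: int,
                          conv_b: int, users_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) comparing the conversion rates of A and B."""
    p_a = conv_a / users_a
    p_b = conv_b / users_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (conv_a + conv_b) / (users_a + users_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: users who engaged, out of users shown each variant.
    z, p = two_proportion_z_test(conv_a=400, users_a=4000, conv_b=480, users_b=4000)
    print(f"z = {z:.2f}, p = {p:.4f}")
    print("significant at 5%" if p < 0.05 else "not significant at 5%")
```

A small p-value only says the difference is unlikely to be chance; whether variant B is the "winning" change still depends on whether the size of the improvement matters to your app.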

Best practices

  • Select a platform that enables testing at scale. As your app and business grow, you’ll want to run more A/B tests more frequently. Make sure your chosen platform can run multiple tests in parallel on the same user population, ideally using a shared population so that a user can be in multiple tests simultaneously.
  • Test as few or as many variations as you need to make the test useful. Consider testing more than two variants if there are several useful alternative features or content options you think could offer improvements.

    Consider using a multivariate approach to define the variants. For example:

                                      Button text (aspect 2)
                                      Buy         Purchase
    Button color (aspect 1)   Blue    Variant A   Variant B
                              Green   Variant C   Variant D
  • Run the test long enough to average out periodic variations. User behavior may vary with hourly, daily, weekly, or similar cycles. Consider this cyclic behavior when setting the duration of your test. Where behavior is known to vary over longer cycles, it may be necessary to use a shorter test period and extrapolate the results.
  • Ensure known variations between user segments don’t affect your test. If you think user behavior varies between segments of your users, run the test within one segment, or make sure you use a representative sample of all users. For example, if revenue per user is known to vary by country, test with users from one country or take a sample of users from all countries.
  • Test across multiple segments. Where you have useful, known user segments—such as country or acquisition channel—consider running the test on different segments to see if the results vary between them. You may then roll out the change to only some segments or provide different changes to different segments.
  • Consider potential business benefits when setting test duration. The duration of the test or the size of the test group determines how long it takes for enough testers to see the variants. Consider whether a shorter test may have business benefits, such as realizing the gains from a winning variant more quickly.
  • Monitor the tests for any unexpected negative outcomes, and be prepared to halt the test. Even though the test may involve only a small percentage of your users, a very poor outcome could affect your ratings and reviews or negatively impact your other users through information shared on social media.
  • If your platform permits, roll out changes incrementally. Even though your testing may indicate a statistical benefit for making a change, there may be unexpected results when all your users receive the change. Rolling the change out incrementally allows you to monitor its effect as more users receive it and halt the rollout process if the change doesn’t provide the expected benefits.
  • Exclude opted-in users from your metrics. If you allow users to opt in to viewing or using a new feature you're testing, remember to exclude those users from the test metrics, because they chose the feature themselves rather than being randomly assigned to a variant.
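The multivariate grid in the best practices above crosses two aspects, button color and button text, into four variants. As a minimal sketch, assuming the same hash-based assignment idea rather than any specific platform's API, each user can be placed in one cell of the grid. The test name `buy_button_test` and the helper `assign_cell` are hypothetical.

```python
import hashlib
from itertools import product

COLORS = ["Blue", "Green"]    # aspect 1: button color
TEXTS = ["Buy", "Purchase"]   # aspect 2: button text
# Same order as the grid: A=Blue/Buy, B=Blue/Purchase, C=Green/Buy, D=Green/Purchase.
CELLS = dict(zip("ABCD", product(COLORS, TEXTS)))


def assign_cell(user_id: str, test_name: str = "buy_button_test") -> tuple[str, str, str]:
    """Return (variant, color, text) for a user, stable across app launches."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).digest()
    variant = "ABCD"[int.from_bytes(digest[:8], "big") % len(CELLS)]
    color, text = CELLS[variant]
    return variant, color, text


if __name__ == "__main__":
    print(assign_cell("user-1"))
```

Because every combination is tested, comparing A and C against B and D estimates the effect of the button text, comparing A and B against C and D estimates the effect of the color, and differences between individual cells can reveal whether the two aspects interact.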
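Platforms that support staged rollouts manage the rollout percentage for you. The Python sketch below illustrates why a stable, hash-based bucket suits an incremental rollout: raising the percentage only adds users, so nobody who already has the change loses it. The function names, feature name, and percentages are illustrative assumptions.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Stable bucket in [0, 100) for a user and feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, feature: str, rollout_percent: int) -> bool:
    """True if the user falls inside the current rollout percentage."""
    return rollout_bucket(user_id, feature) < rollout_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    for percent in (5, 20, 50, 100):
        enabled = sum(is_enabled(u, "bottom_nav", percent) for u in users)
        print(f"{percent:3d}% rollout -> {enabled} of {len(users)} users")
```

Holding the percentage where it is halts further rollout, and lowering it withdraws the change from the users in the highest buckets, so you can react if monitoring shows the change isn't delivering the expected benefit.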
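Users who opt in to a feature select themselves, so they may be more engaged than average and can distort a variant's results. A minimal Python sketch of leaving them out when computing a per-variant conversion rate, assuming a hypothetical per-user record with `variant`, `opted_in`, and `converted` fields exported from your analytics:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UserRecord:
    # Hypothetical per-user summary exported from your analytics.
    variant: str
    opted_in: bool
    converted: bool


def conversion_by_variant(records: list[UserRecord]) -> dict[str, float]:
    """Conversion rate per variant, ignoring users who opted in themselves."""
    users: dict[str, int] = defaultdict(int)
    conversions: dict[str, int] = defaultdict(int)
    for r in records:
        if r.opted_in:
            continue  # self-selected users would bias the comparison
        users[r.variant] += 1
        conversions[r.variant] += int(r.converted)
    return {v: conversions[v] / users[v] for v in users}


if __name__ == "__main__":
    data = [
        UserRecord("A", False, True), UserRecord("A", False, False),
        UserRecord("B", False, False), UserRecord("B", True, True),
        UserRecord("B", False, True),
    ]
    print(conversion_by_variant(data))  # {'A': 0.5, 'B': 0.5}
```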