A/A testing
Created: 14.11.2022
Updated: 31.10.2023
Author: Polina A.
One of the methods for verifying an A/B testing platform's user split is A/A testing. The principle of A/A testing is to create two or more identical variations and run an A/B test with them to check how accurately the system handles campaigns. The success of the test is evaluated by how uniform the campaign results are. With A/A testing, you can ensure that:
The distribution is truly random.
All data is retained.
The "P2BB" metric is reliable.
Conducting A/A Testing
Create a new campaign of the 'Custom Code' type.
Name your campaign (for example, "AA Test 1").
If you have an integration with an analytics platform, make sure the data from your A/A test is passed to it.
Set targeting to all users.
On the Variations page, create a new variation.
In the JS field, add the following code (you might need to add some code to pass variation impressions to your analytical platform):
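A minimal sketch of what this variation code might look like, assuming the campaign is named "AA Test 1"; the logged text is illustrative, and the commented-out gtag.js call is only a hypothetical example of passing the impression to an analytics platform:

// A/A test, Variation 1. The code is intentionally inert and must not change the user experience.
console.log('AA Test 1: Variation 1 served');
// Hypothetical example of passing the variation impression to analytics (here gtag.js);
// replace it with the tracking call your analytics platform uses.
// gtag('event', 'aa_test_impression', { campaign: 'AA Test 1', variation: 'Variation 1' });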
Save the variation.
Create a second variation with custom code and enter the following code in the JS field:
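A matching sketch for the second variation; apart from the label in the logged message, it is identical to the first, so the two variations stay functionally the same:

// A/A test, Variation 2. Identical to Variation 1 except for the logged label.
console.log('AA Test 1: Variation 2 served');
// gtag('event', 'aa_test_impression', { campaign: 'AA Test 1', variation: 'Variation 2' });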
After saving the second variation, set the traffic split to 50%/50%.
Use standard primary metrics (e.g., purchases). Do not change the default setting for variation stickiness ("Sticky for the user (multi session)" by default) or the default attribution window (which starts when the variation is served and ends when the session ends).
Launch the campaign. It will not impact the user experience; the only effect for users assigned to a variation is a console.log message in the browser console.
In the campaign list, find the campaign you have just created and duplicate it. Do this 9 more times so that you have a total of 10 A/A tests. Running several tests limits the impact of false positives and false negatives.
Run the test for at least a week before evaluating the results.
After the test ends and the results are analyzed, archive the campaigns.
Analyzing A/A Testing Results
Since A/A testing variations are identical, there is no need to wait for two weeks. It is recommended to run the test for a week, allow the system to accumulate data, and then evaluate the results based on the following parameters (an error in step 1 can lead to an error in step 3):
1. Variation Distribution: Check the number of users in each of the variations to ensure that the distribution was even.
a. OK: Each variation was shown to between 48% and 52% of users.
b. Not OK: Each variation was shown to less than 48% or more than 52% of users.
2. Data Collection: Compare the number of users with purchases and revenue between variations.
a. OK: The difference is less than 5% in terms of both the number of users with purchases and revenue (excluding outliers).
b. Not OK: The difference is more than 5% in either the number of users with purchases or revenue.
"P2BB" Metric Value: Examine 2 metrics. For e-commerce, these are "Add to Cart" and "Purchase." In each campaign, count how many variations have a value of more than 95% in one of the specified metrics. This is a false winner, which can occur with a probability of about 5%.
a. OK: 0-2 metrics achieved statistically significant results (more than 95%).
b. Not OK: 3 or more metrics achieved statistically significant results.
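The sketch below shows, with purely hypothetical numbers, how the three checks above could be applied to aggregated campaign results; the field names and figures are illustrative assumptions, not platform output:

// Illustrative only: hypothetical aggregated results for one A/A campaign.
const variations = [
  { name: 'Variation 1', users: 5060, buyers: 410, revenue: 20300 },
  { name: 'Variation 2', users: 4940, buyers: 402, revenue: 19750 },
];

// 1. Variation Distribution: each variation should receive 48-52% of users.
const totalUsers = variations.reduce((sum, v) => sum + v.users, 0);
const distributionOk = variations.every((v) => {
  const share = (v.users / totalUsers) * 100;
  return share >= 48 && share <= 52;
});

// 2. Data Collection: users with purchases and revenue should differ by less than 5%.
const diff = (a, b) => (Math.abs(a - b) / Math.max(a, b)) * 100;
const dataOk =
  diff(variations[0].buyers, variations[1].buyers) < 5 &&
  diff(variations[0].revenue, variations[1].revenue) < 5;

// 3. "P2BB": number of metrics above 95% across all A/A campaigns (taken from your reports).
const significantMetrics = 1; // hypothetical count
const p2bbOk = significantMetrics <= 2;

console.log({ distributionOk, dataOk, p2bbOk });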
FAQ
Why 10 tests? A 95% value for the "P2BB" metric is considered reliable. However, there remains a 5% chance of a false winner in A/A testing. If you run a single test and it reaches statistical significance, you cannot tell whether it is a real winner or a false one (there is a 1 in 20 chance of the latter). Checking 3 metrics in 10 tests (30 metrics in total) reduces the chance that such random false winners distort the conclusion.
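A short sketch of the arithmetic behind this answer, using the figures from the paragraph above; with a 5% false-winner chance per metric, the number of false winners expected by chance alone grows with the number of checks:

// A 95% "P2BB" threshold leaves a 5% chance of a false winner per metric.
const falseWinnerRate = 0.05;
const metricsPerCampaign = 3;   // metrics checked in each campaign
const campaigns = 10;           // duplicated A/A tests
const totalChecks = metricsPerCampaign * campaigns; // 30 checks in total

// Expected number of false winners produced by chance alone.
console.log(totalChecks * falseWinnerRate); // 1.5

Seeing a couple of "significant" metrics is therefore still consistent with chance, while a noticeably larger count points to a real problem with the split or the statistics.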
What to do if A/A testing fails?
It depends on the metric with which the issue arises:
Variation Distribution: If the distribution between variations is not within the 48-52% range, check that the system is set up correctly. Initial check: the number of users on the dashboard should match the number of users in your analytics system. If no setup errors are found, contact platform consultants.
"P2BB": If 2 or more tested metrics achieve significant results, extend the test for a second week before analyzing the results. If more than 6 metrics have values of more than 95%, contact platform consultants.
What to do after successful A/A testing?
Simply archive all A/A campaigns and start real A/B tests!