Understanding the impact of your experiments is an essential part of running an end-to-end A/B test.
For every experiment that you run, you can monitor results in real time and track conversion goals against your own product events. Candu’s statistical engine simplifies data analysis for you and shortens the time it takes to run an end-to-end experiment. 🚀
Bayesian A/B Testing Method
Before we dive into Candu's results page, here is an overview of how our statistical engine works, so you can better understand how results are calculated. (If you want to get into the weeds, you can read about the technical side of our statistical engine here.)
Methodology: The two main types of A/B testing methods are Frequentist and Bayesian. Both methods allow you to compare different versions of content to see which one is better. They differ in their approach to calculating and interpreting results. Candu uses the Bayesian method.
How it works: Users are randomly split into groups, one for each version of the content and one for the control group. We do not have any prior knowledge or historical data to use, so we use an uninformative prior (default value selected), which is essentially a starting point of "I don't know how this is likely to perform."
What it tells us: With the Bayesian method of A/B testing, instead of saying, "Yes, this version is better" or "No, this version is worse", results are expressed as a probability distribution. We use probabilities to measure the likelihood that one version is better than another. For example, you can say, "There is an 85% probability that Version A is better than the control group, but we can't be 100% sure."
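To make the idea concrete, here is a minimal Python sketch of how a probability like this could be estimated for a conversion goal. It is an illustration under stated assumptions, not Candu's actual engine: it assumes a uniform Beta(1, 1) prior as the "I don't know" starting point, and the function name and the example numbers are made up for this example.

```python
import random

def chance_to_beat_control(control_conversions, control_users,
                           variant_conversions, variant_users,
                           draws=20000, seed=0):
    """Estimate P(variant rate > control rate) by sampling both posteriors."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wins = 0
    for _ in range(draws):
        # With a uniform Beta(1, 1) prior (an assumption), the posterior for a
        # conversion rate is Beta(1 + conversions, 1 + non-conversions).
        control_rate = rng.betavariate(1 + control_conversions,
                                       1 + control_users - control_conversions)
        variant_rate = rng.betavariate(1 + variant_conversions,
                                       1 + variant_users - variant_conversions)
        if variant_rate > control_rate:
            wins += 1
    return wins / draws

# Hypothetical data: 40 of 400 control users converted, 55 of 400 variant users.
probability = chance_to_beat_control(40, 400, 55, 400)
print(f"Chance that the variant beats the control: {probability:.1%}")
```

The output is a probability rather than a yes/no verdict, which is the kind of statement described above.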
What are the benefits of the Bayesian method over the Frequentist method?
Small sample sizes: Bayesian methods can be more reliable when you have relatively small sample sizes (not many users to test on). In some cases, Frequentist methods struggle to provide confident results with limited data. For in-product experiments and for B2B businesses, having reliable results with small sample sizes is very important.
No peeking problem: With the Bayesian method, results are valid at any point, so you can check them while the experiment is still running. (The peeking problem can occur with the Frequentist approach when you check the intermediate results for statistical significance between the control and test groups and make decisions based on your observations.) Results will also not be skewed by unusually large values. For example, if most upgrades are for $10, but a single upgrade is for $10,000, this will not unduly skew the experiment results.
Probabilities improve over time: As new information comes in, your probabilities will continue to be refined. This means you can keep learning and adapting your decisions throughout your experiment.
Results interpretation: Probability-based results from Bayesian testing are often easier to understand and explain to non-technical stakeholders. Rather than communicate results with p-values and statistical significance, you can say, "There's a high chance that the new design will attract more users."
Informative results: Because Bayesian A/B testing provides probabilistic statements instead of just a simple 'yes' or 'no', you get a better understanding of how likely it is that one version is better than the other. Saying, "There's a 90% chance that Version A is better than Version B," gives you a more nuanced and informative answer.
The Candu Results Table
Once you have launched your experiment, you can click on the 'Analytics' tab of your experiment and start tracking results in real time.
Below is a rundown of our results table and how to interpret the metrics!
Rows: Each goal you have set for your experiment appears as its own row so you can measure the conversion rate and impact for each.
Columns: Each version/variant and/or control group will appear in a new column so you can compare results across all options.
Users: This is the total number of your users who have viewed each version.
Conversion value
❓ What percentage of users did the conversion action that you wanted them to do? How much revenue was generated by how many users? How many clicks occurred by how many users?
You can track different conversion metrics - Counts, Conversion Events, or Revenue generation - so the conversion value may be calculated slightly differently depending on how you set your goals.
For each, you should be able to see the conversion value as a % and the raw numbers used to calculate the %.
Count: the total number of events / total users who did the action
Conversion: the total users who converted / total users who viewed the content
Revenue: the total revenue generated/total users who contributed
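The three ratios above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as described in this article; the function names and sample numbers are hypothetical and are not part of Candu's product or API.

```python
def count_value(total_events, users_who_acted):
    """Count goal: total events divided by the users who did the action."""
    return total_events / users_who_acted if users_who_acted else 0.0

def conversion_value(users_converted, users_viewed):
    """Conversion goal: users who converted divided by users who viewed."""
    return users_converted / users_viewed if users_viewed else 0.0

def revenue_value(total_revenue, users_contributed):
    """Revenue goal: total revenue divided by the users who contributed."""
    return total_revenue / users_contributed if users_contributed else 0.0

# Made-up example numbers for one version of the content.
print(f"Count: {count_value(30, 12):.2f} events per acting user")
print(f"Conversion: {conversion_value(25, 200):.1%} of viewers converted")
print(f"Revenue: ${revenue_value(500.0, 20):.2f} per contributing user")
```

Each helper returns 0.0 when the denominator is zero, so a version with no users yet does not raise an error.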
Chance to beat control
ℹ️ This is the Bayesian counterpart to what you might call statistical significance.
❓ If we roll this version out, how likely is it that the variant will perform better than the control/version X?
Chance to beat control tells you the probability that the variation is better. Anything above 95% is highlighted green, indicating a very clear winner. Anything below 5% is highlighted in red, indicating a clear loser. Anything in between is grayed out, indicating the results are inconclusive. If that's the case, there's either no measurable difference, or you haven't gathered enough data yet.
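The highlighting rule described above can be written as a small Python function. This is a sketch of the thresholds stated in this article; the function name and the returned labels are hypothetical.

```python
def label_chance_to_beat_control(probability):
    """Map a chance to beat control (0.0 to 1.0) to a highlight label."""
    if probability > 0.95:
        return "green: clear winner"
    if probability < 0.05:
        return "red: clear loser"
    return "gray: inconclusive"

for value in (0.97, 0.60, 0.03):
    print(f"{value:.0%} -> {label_chance_to_beat_control(value)}")
```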
Risk of choosing X
❓ If we roll out the variant and it performs worse, how much worse can we expect it to be?
Even if the chance to beat the control or winning version is high, there is still a probability that it will be worse. The risk value estimates how much you stand to lose in that case, so you can decide whether the risk is acceptable given the chance to beat the control/version.
Anything below 0.25% is highlighted green, indicating the risk is very low and it is safe to call the experiment.
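One common way to quantify this kind of risk is the expected loss: the average amount by which the chosen version falls short, counting zero whenever it is actually better. The Python sketch below shows that calculation for a conversion goal. It is an assumption-laden illustration, not Candu's implementation: it assumes a uniform Beta(1, 1) prior, measures the loss in absolute conversion-rate terms (this article does not say whether the 0.25% threshold is absolute or relative), and uses a made-up function name and data.

```python
import random

def risk_of_choosing_variant(control_conversions, control_users,
                             variant_conversions, variant_users,
                             draws=20000, seed=0):
    """Expected shortfall in conversion rate if the variant is rolled out."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total_loss = 0.0
    for _ in range(draws):
        # Posterior samples under an assumed uniform Beta(1, 1) prior.
        control_rate = rng.betavariate(1 + control_conversions,
                                       1 + control_users - control_conversions)
        variant_rate = rng.betavariate(1 + variant_conversions,
                                       1 + variant_users - variant_conversions)
        # Loss only counts when the variant is worse than the control.
        total_loss += max(control_rate - variant_rate, 0.0)
    return total_loss / draws

# Hypothetical data: 40 of 400 control users converted, 55 of 400 variant users.
risk_variant = risk_of_choosing_variant(40, 400, 55, 400)
# Swapping the arguments gives the risk of keeping the control instead.
risk_control = risk_of_choosing_variant(55, 400, 40, 400)
print(f"Risk of choosing the variant: {risk_variant:.2%}")
print(f"Risk of choosing the control: {risk_control:.2%}")
```

In this made-up example the variant carries a much smaller risk than the control, which matches the intuition that a version with a high chance to beat control is usually also the safer choice.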
Percentage change
This field helps you understand the impact of the changes, as well as the uncertainty in our results. The uncertainty reflects that our sample data might not perfectly reflect what's happening with all users.
In this field, there are two values: the percentage change compared to the control variant and the confidence interval.
The wider the confidence interval, the more uncertainty we have about the true value for the whole user base. The narrower the interval, the more confident we are in our results.
For example, let's say the percentage change in Version A is 25%, and the confidence interval is +/-5%. This means we're pretty sure (with a certain level of confidence) that the true percentage change in the whole user base falls somewhere between 20% and 30%.
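As a rough illustration of where a percentage change and its interval could come from, the Python sketch below samples plausible conversion rates for each version and reads the interval off the sorted results. It is a sketch under assumptions, not a description of Candu's engine: the uniform Beta(1, 1) prior, the 95% level, the function name, and the data are all chosen for this example.

```python
import random

def percentage_change_interval(control_conversions, control_users,
                               variant_conversions, variant_users,
                               level=0.95, draws=20000, seed=0):
    """Return (lower, median, upper) of the relative change versus control."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    changes = []
    for _ in range(draws):
        # Posterior samples under an assumed uniform Beta(1, 1) prior.
        control_rate = rng.betavariate(1 + control_conversions,
                                       1 + control_users - control_conversions)
        variant_rate = rng.betavariate(1 + variant_conversions,
                                       1 + variant_users - variant_conversions)
        changes.append((variant_rate - control_rate) / control_rate)
    changes.sort()
    # Cut off an equal share of samples on each side of the interval.
    lower = changes[int((1 - level) / 2 * draws)]
    upper = changes[int((1 + level) / 2 * draws) - 1]
    median = changes[draws // 2]
    return lower, median, upper

# Hypothetical data: 40 of 400 control users converted, 55 of 400 variant users.
lower, median, upper = percentage_change_interval(40, 400, 55, 400)
print(f"Percentage change: {median:+.0%} (interval {lower:+.0%} to {upper:+.0%})")
```

With only a few hundred users per version the interval in this example is wide. Collecting more data narrows it, which is the behavior described above.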
To support the results table, you will also have the ability to view graphs over the length of your experiment that highlight the following:
Conversion rates for each version
Total content views and unique content views per version
Any interactions that have occurred in your content