Understanding the impact of your experiments is essential to running an end-to-end A/B test.

For every experiment you run, you can monitor real-time results and track conversion goals against your product events. Candu’s statistical engine simplifies data analysis and helps shorten the time it takes to run an end-to-end experiment. 🚀

Bayesian A/B Testing Method

Before we dive into Candu's results page, here is an overview of how our statistical engine works so you can better understand how results are calculated. (If you want to get into the weeds, you can read about the technical side of our statistical engine here.)

Methodology: Frequentist and Bayesian are the two main types of A/B testing methods. Both methods allow you to compare different versions of content to see which one is better. They differ in their approach to calculating and interpreting results. Candu uses the Bayesian method.

How it works: Users are randomly split into groups, one for each content version and one for the control group. We have no prior knowledge or historical data to draw on, so we use an uninformative prior (a default value) as the starting point: essentially, "I don't know how this will perform."

What it tells us: With the Bayesian method of A/B testing, instead of saying, "Yes, this version is better" or "No, this version is worse," results are expressed as a probability distribution. We use probabilities to measure the likelihood that one version is better. For example, you can say, "There is an 85% probability that Version A is better than the control group, but we can't be 100% sure."
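To make the idea concrete, here is a minimal sketch of how a probability like "85% likely to be better" can be estimated. It is not Candu's actual engine: the flat Beta(1, 1) prior, the function name `chance_to_beat`, and the sample numbers are all assumptions chosen for illustration.

```python
import random


def chance_to_beat(control, variant, draws=20000, seed=0):
    """Estimate P(variant rate > control rate) by sampling both posteriors.

    control and variant are (conversions, users) tuples. A flat Beta(1, 1)
    prior is assumed here as the "I don't know" starting point.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for a conversion rate: Beta(1 + conversions, 1 + non-conversions).
        c = rng.betavariate(1 + control[0], 1 + control[1] - control[0])
        v = rng.betavariate(1 + variant[0], 1 + variant[1] - variant[0])
        wins += v > c
    return wins / draws


# Hypothetical data: 20 of 200 users converted on control, 30 of 200 on the variant.
p = chance_to_beat(control=(20, 200), variant=(30, 200))
print(f"Chance that the variant beats control: {p:.0%}")
```

The output is a probability rather than a yes/no verdict, which is the kind of statement described above.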

What are the benefits of the Bayesian method over the Frequentist method?

Small sample sizes: Bayesian methods can be more reliable when you have relatively small sample sizes (not many users to test on). In some cases, Frequentist methods struggle to provide confident results with limited data, while Bayesian methods can still give you usable insights. For in-product experiments and B2B businesses, having reliable results with small sample sizes is crucial.

No peeking problem: With the Bayesian method, results are valid at any point, so you can check them as often as you like. (The peeking problem can occur with the Frequentist approach when you check intermediate results for statistical significance between the control and test groups and make decisions based on what you observe.) Results will also not be skewed by unusually large values. For example, if most upgrades are for $10 but a single upgrade is for $10,000, this will not unduly skew the experiment results.

Probabilities improve over time: As new information comes in, your probabilities will continue to be refined. This means you can keep learning and adapting your decisions throughout your experiment.

Results interpretation: Probability-based results from Bayesian testing are often easier to understand and explain to non-technical stakeholders. Rather than communicate results with p-values and statistical significance, you can say, "There's a high chance that the new design will attract more users."

Informative results: Because Bayesian A/B testing provides probabilistic statements instead of a simple 'yes' or 'no,' you get a better sense of how likely it is that one version is better than the other. Saying, "There's a 90% chance that Version A is better than Version B," gives you a more nuanced and informative answer.
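The "probabilities improve over time" point can be illustrated with a small sketch. It assumes a flat Beta(1, 1) prior and uses made-up cumulative totals; the helper `posterior_mean_sd` is hypothetical and not part of Candu.

```python
def posterior_mean_sd(conversions, users):
    """Mean and standard deviation of the Beta posterior under a flat Beta(1, 1) prior."""
    a = 1 + conversions
    b = 1 + users - conversions
    mean = a / (a + b)
    sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
    return mean, sd


# Hypothetical cumulative (conversions, users) totals at three check-ins.
checkins = [(3, 30), (11, 100), (31, 300)]
spreads = []
for conversions, users in checkins:
    mean, sd = posterior_mean_sd(conversions, users)
    spreads.append(sd)
    print(f"{users:>4} users: estimated rate {mean:.1%}, uncertainty +/- {sd:.1%}")
```

The uncertainty shrinks at each check-in as more users are observed, which is what "refined" means in practice.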

The Candu Results Table

Once you have launched your experiment, you can click on the 'Analytics' tab to start tracking results in real time.

Below is a rundown of our results table and instructions on interpreting the metrics!

Rows: Each of the goals you have set for your experiment will appear as a new row, allowing you to measure the conversion rate and impact for each.
Columns: Each version/variant and/or control group will appear in a new column, allowing you to compare results across all options.
Users: This is the total number of your users who have viewed each version.

Conversion Value

❓ What percentage of users completed the conversion action? How much revenue was generated, and by how many users? How many clicks occurred, and by how many users?

You can track different conversion metrics—Counts, Conversion Events, or Revenue generation—so the conversion value may be calculated slightly differently depending on how you set your goals.

For each, you should be able to see the conversion value as a % and the raw numbers used to calculate the %.

Count: the total number of events / the total users who did the action
Conversion: the total users who converted / the total users who viewed the content
Revenue: the total revenue generated / the total users who contributed
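The three ratios above can be worked through with a short sketch. All numbers and variable names here are hypothetical, and the way Candu formats the result on screen is not shown.

```python
def ratio(numerator, denominator):
    """Divide safely, returning 0.0 when nobody is in the denominator group."""
    return numerator / denominator if denominator else 0.0


# Hypothetical raw numbers for one version of the content.
users_who_viewed = 200       # users who saw this version
users_who_converted = 30     # users who did the goal action at least once
event_count = 90             # total times the goal event fired
revenue_total = 1500.0       # total revenue attributed to the goal

count_value = ratio(event_count, users_who_converted)           # events per acting user
conversion_value = ratio(users_who_converted, users_who_viewed)  # share of viewers who converted
revenue_value = ratio(revenue_total, users_who_converted)        # revenue per contributing user

print(f"Count: {count_value:.1f} events per user")
print(f"Conversion: {conversion_value:.1%}")
print(f"Revenue: {revenue_value:.2f} per user")
```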

Chance to beat control

ℹ️ This is the Bayesian counterpart to what you might call statistical significance.

❓ If we roll this version out, how likely is the variant to perform better than the control/version X?

Chance to beat control tells you the probability that the variation is better. Anything above 95% is highlighted green, indicating an obvious winner. Anything below 5% is highlighted in red, indicating a clear loser. Anything in between is grayed out, indicating the results are inconclusive. If that's the case, there's either no measurable difference, or you haven't gathered enough data yet.
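The highlighting rules can be written down as a small sketch. The function name `highlight` is hypothetical, and treating exactly 95% or exactly 5% as inconclusive is an assumption based on the words "above" and "below".

```python
def highlight(chance_to_beat_control):
    """Map a probability (0.0 to 1.0) to the color described for the results table."""
    if chance_to_beat_control > 0.95:
        return "green"  # clear winner
    if chance_to_beat_control < 0.05:
        return "red"    # clear loser
    return "gray"       # inconclusive: no measurable difference or not enough data yet


for chance in (0.97, 0.85, 0.03):
    print(f"{chance:.0%} -> {highlight(chance)}")
```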

Risk of choosing X

❓ If we roll out the variant and it performs worse, how much worse can we expect it to be?

Even if the chance to beat the control or winning version is high, there is still a probability that the variant will be worse. The risk value lets you decide whether that remaining chance of a loss is acceptable.

Anything below 0.25% is highlighted green, indicating that the risk is very low and that the experiment can be called.
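One common way to put a number on this kind of risk is the expected relative loss: across many plausible outcomes, how far the variant falls short of the control, counting zero whenever the variant is ahead. The sketch below uses that definition as an assumption; it is not confirmed to be Candu's formula, and the function name, the flat Beta(1, 1) prior, and the data are hypothetical.

```python
import random


def risk_of_choosing(variant, control, draws=20000, seed=0):
    """Expected relative shortfall of the variant versus control (assumed definition).

    variant and control are (conversions, users) tuples; a flat Beta(1, 1)
    prior is assumed for both conversion rates.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        c = rng.betavariate(1 + control[0], 1 + control[1] - control[0])
        v = rng.betavariate(1 + variant[0], 1 + variant[1] - variant[0])
        # Only draws where the variant is worse contribute to the loss.
        total += max(c - v, 0.0) / c if c > 0 else 0.0
    return total / draws


# Hypothetical data: 30 of 200 converted on the variant, 20 of 200 on control.
risk = risk_of_choosing(variant=(30, 200), control=(20, 200))
low_risk = risk < 0.0025  # the 0.25% threshold from the text above
print(f"Risk of choosing the variant: {risk:.2%} (low risk: {low_risk})")
```

A variant can have a high chance to beat control and still carry a risk above the threshold, which is why the two metrics are read together.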

Percentage change

Percentage change shows the impact of your changes, together with the uncertainty in the results. That uncertainty reflects the fact that sample data might not perfectly match what's happening with all users.

This field has two values: the percentage change compared to the control variant and the confidence interval.

The wider the confidence interval, the more uncertainty we have about the actual value across the whole user base. The narrower the interval, the more confident we are in our results.

For example, let's say the percentage change for Version A is 25% and the confidence interval is +/-5%. This means we're pretty sure (with a certain level of confidence) that the actual percentage change across the whole user base falls between 20% and 30%.
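The arithmetic behind this field can be sketched as follows. The conversion rates are hypothetical, and the sketch takes the +/- margin as given, since how Candu derives the interval is not described here.

```python
def percentage_change(control_rate, variant_rate):
    """Relative change of the variant versus control, in percent."""
    return (variant_rate - control_rate) / control_rate * 100


def interval_bounds(change, margin):
    """Lower and upper bounds for a change reported as 'change +/- margin'."""
    return change - margin, change + margin


# Hypothetical rates: control converts at 10%, Version A at 12.5%.
change = percentage_change(control_rate=0.10, variant_rate=0.125)
low, high = interval_bounds(change, margin=5)
print(f"Percentage change: {change:.0f}% (interval {low:.0f}% to {high:.0f}%)")
```

Note that the interval applies to the percentage change itself: a narrow interval around 25% says the lift is well pinned down, not that the conversion rate is 25%.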

Graphs

To support the results table, you can also view graphs covering the length of your experiment that highlight the following:

Conversion rates for each version
Total content views and unique content views per version
Any interactions that have occurred in your content
