Candu Actions is currently available to select customers. If you're interested in wizards for your team, contact us.
Building a wizard is the first step. Once it's live, the work shifts from "does it work in my head?" to "does it work for real users?" — and the only reliable way to answer that is to look at what's actually happening.
There are two ways to find issues with a wizard:
Monitoring — watching what real users actually do, after the wizard is live
Testing — running structured scenarios you've designed yourself, before and after launch
You need both. Monitoring shows you problems you didn't anticipate. Testing catches problems you can anticipate before users hit them.
This article walks through how to do both, how to diagnose what's going wrong, and where to make changes.
Monitoring: the Activity view
Once your wizard is live, go to Actions → Activity. Every wizard run is recorded here.
Each row shows you one conversation: who triggered the wizard, what they asked for, how many actions ran, and the run's status. Click into any row to see the full conversation — every message, every action call, every payload.
This is the single most important tool for improving a wizard.
Read the statuses first
Every run lands in one of three buckets:
Completed — The wizard reached its goal. Useful for confirming what good runs look like.
Abandoned — The user left before completing. Usually friction: too many questions, unclear steps, slow progression.
Failed — Something broke. Usually action errors, missing context, or completion criteria that couldn't be evaluated.
If you're not sure where to start: look at failed runs first, then abandoned, then a few completed ones for comparison.
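If you want to work through a backlog of runs in that order outside the UI, here is a minimal sketch. The WizardRun shape is an assumption for illustration, not Candu's export format:

```ts
// The three run statuses from the Activity view, plus the suggested
// review order: failed first, then abandoned, then completed.
// WizardRun is a made-up shape for this sketch, not a Candu format.
type RunStatus = "completed" | "abandoned" | "failed";

interface WizardRun {
  id: string;
  status: RunStatus;
}

const triageOrder: Record<RunStatus, number> = {
  failed: 0,
  abandoned: 1,
  completed: 2,
};

function sortForReview(runs: WizardRun[]): WizardRun[] {
  return [...runs].sort(
    (a, b) => triageOrder[a.status] - triageOrder[b.status]
  );
}
```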
Diagnose a single run
Click into a conversation. You'll see the full back-and-forth, every step the wizard moved through, every action the AI called with the payload it sent and the response it got back.
For most problems, this view tells you exactly what went wrong. A few patterns to look for:
The wizard called the wrong action. Check the booster — usually the issue is that the instructions weren't specific enough about which action to use, or the available actions for the step included one that shouldn't have been there.
An action returned an empty or unexpected result. Look at the response payload. Is the data shape what the wizard expected? An empty response isn't always a bug — sometimes it's the right answer, and the wizard just doesn't know what to do with it. (There's a short sketch of this pattern at the end of this section.)
The wizard kept asking the same question. Usually the completion criteria isn't evaluating to true. Either the AI is missing information it needs, or the criteria is written in a way the AI can't reliably check.
The wizard moved on too early. The opposite — completion criteria too loose. Tighten it.
The user gave up mid-step. Read the conversation. The friction is usually concrete: a confusing question, a long pause while an action runs, a request for information the user can't easily provide.
Before changing anything, identify which part of the system caused the issue. Most fixes are small and targeted.
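To make the second pattern concrete, here's what an empty-but-valid response looks like. The action name and response shape below are made up for the example:

```ts
// Hypothetical payload from a lookup action. Neither the action name nor
// this response shape comes from Candu; it illustrates what to check when
// reading a run's logs.
interface FindClientResponse {
  clients: { id: string; name: string; email?: string }[];
}

// Valid and empty: an answer, not a bug.
const response: FindClientResponse = { clients: [] };

if (response.clients.length === 0) {
  // The step's booster should define what happens here, for example:
  // "If no client matches, ask the user to confirm the spelling."
  console.log("No matching client; the booster decides what happens next.");
}
```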
Testing: build a QA sheet
Monitoring shows you what users actually did. Testing lets you find problems before users hit them — and tells you whether your fixes worked.
The most reliable way to test is with a structured QA sheet. List the scenarios your wizard should handle, run each one, and record what happened.
A QA sheet has seven columns. You can copy our template to start, or build your own with this structure:
Prompt / Scenario — what you'll say to the wizard
Category — what kind of test it is
Expected Result — what the wizard should do
Actual Result — what it actually did
Pass / Fail
Link to Log — a link to the Activity log for that run
Notes — anything to fix or revisit
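If you'd rather keep the sheet as data than as a spreadsheet, one way to model a row, with the seven columns above as fields. The shape is illustrative, not a Candu format:

```ts
// One row of the QA sheet. Field names mirror the seven columns above;
// this shape is an illustration, not a Candu export format.
type Category =
  | "happy-path"
  | "disambiguation"
  | "validation"
  | "error-handling"
  | "multi-step"
  | "duplicate-conflict";

interface QARow {
  prompt: string;     // what you'll say to the wizard
  category: Category;
  expected: string;   // what the wizard should do
  actual?: string;    // filled in after the run
  pass?: boolean;
  logUrl?: string;    // link to the Activity log for that run
  notes?: string;     // anything to fix or revisit
}
```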
The categories give the sheet its structure. A complete QA sheet covers six types of scenarios, in roughly this order:
1. Happy path
The wizard's core use cases, with clean inputs. "Create an invoice for $500 to client Acme Corp." These confirm the wizard works when nothing goes wrong.
2. Disambiguation
Inputs that could mean more than one thing. "Create an invoice for John." Tests whether the wizard asks a clarifying question instead of guessing when the request is ambiguous.
3. Validation and edge cases
Inputs that are technically valid but unusual. "Create an invoice for -$500," or an invoice with a due date in the past. Tests whether the wizard catches questionable inputs before acting on them.
4. Error handling
Inputs that reference things that don't exist. "Create an invoice for a client that doesn't exist in the system" or "send an invoice when the client has no email on file." Tests the wizard's fallback behavior when an action fails or returns nothing.
5. Multi-step
Workflows that require the wizard to carry context across actions. "Look up my last invoice to Acme Corp and create a similar one for this month." Tests whether the wizard chains correctly without duplicating or losing state.
6. Duplicate and conflict
Scenarios where the wizard should notice it's about to do something redundant or contradictory. "Create an invoice that's identical to one already sent this week." Tests whether the wizard reasons about existing state, not just the current request.
A first QA sheet doesn't need to be exhaustive — start with two or three scenarios per category. Add more over time as you spot patterns in real Activity data.
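To make "two or three scenarios per category" concrete, here is a starter set using the QARow shape from the sketch above. The prompts echo this article's examples; swap in your wizard's real use cases:

```ts
// Uses the QARow type from the earlier sketch. One scenario per category
// to start; add more as patterns show up in Activity.
const starterSheet: QARow[] = [
  { prompt: "Create an invoice for $500 to client Acme Corp.",
    category: "happy-path",
    expected: "Creates the invoice with no extra questions." },
  { prompt: "Create an invoice for John.",
    category: "disambiguation",
    expected: "Asks which John before doing anything." },
  { prompt: "Create an invoice for -$500.",
    category: "validation",
    expected: "Flags the negative amount instead of acting on it." },
  { prompt: "Create an invoice for Globex.", // a client not in the system
    category: "error-handling",
    expected: "Says the client wasn't found and suggests next steps." },
  { prompt: "Look up my last invoice to Acme Corp and create a similar one for this month.",
    category: "multi-step",
    expected: "Reuses the earlier invoice's details without duplicating it." },
  { prompt: "Create an invoice identical to the one sent to Acme Corp this week.",
    category: "duplicate-conflict",
    expected: "Notices the duplicate and asks before creating it." },
];
```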
Run the sheet
Work through each row in order. For each one, run the wizard, fill in the actual result, mark pass/fail, and note anything to address.
When to run it:
Before first launch — to catch obvious problems
After significant changes — to confirm a fix didn't break something else
On a regular cadence after launch — weekly or biweekly while iterating, then less often once stable
Save your QA sheet. Every test you've already designed is a test you don't have to design again, and the same scenarios let you compare wizard behavior over time.
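Keeping the sheet as data also makes those comparisons mechanical. A small sketch, again using the QARow shape from above, that surfaces regressions, meaning rows that passed last time but fail now:

```ts
// Rows that passed on the previous pass but fail on the current one,
// matched by prompt. (Uses the QARow type from the earlier sketch.)
function regressions(previous: QARow[], current: QARow[]): QARow[] {
  const passedBefore = new Set(
    previous.filter((row) => row.pass === true).map((row) => row.prompt)
  );
  return current.filter(
    (row) => row.pass === false && passedBefore.has(row.prompt)
  );
}
```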
The four places to make changes
Once you've found a problem — either from monitoring or QA — the fix usually lives in one of four places.
Stage booster
Update when the AI is making the wrong call, asking too many questions, missing context, or not handling edge cases. The booster is where most issues actually get fixed — it's where you tell the AI what to do.
Action scope per step
Update when the wizard is calling the wrong action, has too many to choose from, or is missing one it needs. Tighter scoping leads to more predictable behavior.
Completion criteria
Revisit when the wizard moves forward too early, gets stuck, or has a definition of "done" the AI can't reliably evaluate. Criteria should describe a clear outcome, not a click or a feeling.
Step structure
Sometimes the issue is the step itself. Consider restructuring when:
One step is trying to do too much (split it)
Two adjacent steps are doing related work (combine them)
The wizard needs to gather context earlier in the flow
A simpler step structure usually beats more sophisticated instructions.
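To see how the first three levers relate, here is a hypothetical step definition with all of them in one place. Candu's actual configuration lives in the wizard builder, and these field and action names are illustrative, not the product's schema:

```ts
// A made-up step definition showing the three levers side by side.
// Field names and the find_client action are assumptions for the sketch.
const identifyClientStep = {
  name: "Identify the client",
  // Stage booster: tell the AI exactly what to do, including edge cases.
  booster:
    "Ask for the client's name. Use find_client to look them up. " +
    "If more than one client matches, list them and ask the user to pick. " +
    "If none match, ask the user to confirm the spelling before retrying.",
  // Action scope: only the actions this step should ever call.
  actions: ["find_client"],
  // Completion criteria: a clear outcome the AI can check.
  completionCriteria:
    "Exactly one client has been identified and confirmed by the user.",
};
```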
Make one change at a time
When you spot a problem, change one thing and re-test. If you update the booster, the step structure, and the actions all at once, you won't know which change actually helped — and you'll be guessing on the next round.
Signs your wizard needs work
Watch the Activity view for these patterns:
Abandoned runs clustering around the same step — that step has friction
Failed runs tied to the same action — that action is unreliable or returning unexpected data
High turn counts — the wizard is asking too much or in too many small pieces
Users re-entering information — context isn't being carried forward between steps
The wizard getting stuck in one step repeatedly — completion criteria isn't evaluating reliably
When you see a pattern, look at three or four runs in the cluster before making a change. The fix is usually obvious once you see the same problem play out a few times.
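If you pull run data out of the Activity view, these patterns are easy to count. The RunRecord fields below are assumptions for the sketch, not Candu's export format:

```ts
// Hypothetical exported run records; the Activity view shows this per run,
// and the field names here are assumptions for the sketch.
interface RunRecord {
  status: "completed" | "abandoned" | "failed";
  lastStep: string;       // the step the run ended on
  failedAction?: string;  // the action that errored, if any
  turns: number;          // messages exchanged in the conversation
}

// Count occurrences of a key across runs.
function countBy(
  runs: RunRecord[],
  key: (run: RunRecord) => string | undefined
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const run of runs) {
    const k = key(run);
    if (k !== undefined) counts.set(k, (counts.get(k) ?? 0) + 1);
  }
  return counts;
}

// Abandoned runs clustering around the same step:
//   countBy(runs.filter(r => r.status === "abandoned"), r => r.lastStep)
// Failed runs tied to the same action:
//   countBy(runs.filter(r => r.status === "failed"), r => r.failedAction)
```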
A note on empty results
A common edge case worth calling out: in some workflows, an action returning "nothing found" is the answer, not an error.
If your wizard checks segment membership and gets back "this user isn't in the segment," that's a successful run — but only if you've told the AI in the booster what to do with that result. Otherwise the AI gets stuck or calls more actions trying to find data that isn't there.
Whenever an action might legitimately return nothing, the booster should tell the AI what to do in that case.
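For example, a booster instruction for a segment-membership step might read like this. The check_segment action name is made up for the sketch:

```ts
// Illustrative booster wording for a step where "nothing found" is the
// answer. check_segment is a made-up action name for this example.
const segmentStepBooster =
  "Use check_segment to check whether the user is in the segment. " +
  "An empty result means the user is NOT in the segment: report that " +
  "and finish the step. Do not call other actions to look for segment " +
  "data that isn't there.";
```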
Next
Improving a wizard becomes part of the normal workflow once it's live:
Run your QA sheet before launch and after meaningful changes
Check Activity weekly to spot patterns in real runs
When you find a problem, diagnose with the conversation detail
Make one focused change, re-test, and watch the next batch of runs
