Automated batch testing

Automated batch testing helps you validate multiple business scenarios before launching an assistant or after changing its configuration. You can organize high-frequency questions, exception flows, edge cases, and historical failed conversations into test cases, then run and review them through test sets.

Compared with one-off text or voice testing, automated batch testing covers a broader range of scenarios. After you update prompts, knowledge bases, workflows, or tool settings, it helps confirm that the assistant can still answer, follow up, transfer to a human, and call tools as expected.

Before You Start

  • You have created and saved an assistant.
  • You have completed the related prompt, knowledge base, workflow, or tool configuration.
  • You have prepared the business scenarios or historical failed cases that need to be verified.
  • You have defined the judging criteria for each test case.
  • If you need to test tool calling, you have configured real tools or prepared Mock settings.

Workflow

The main workflow is: create a test set → create test cases → run tests → review test results.

Create a Test Set

Path: Assistant → Assistant configuration page → Debug → Test management.

After entering Test management, select the test type based on your testing goal, then click New test set in the upper-right corner. If there is no test set yet, you can also click New test set from the empty state.

Enter test management

New test set entry

On the new test set page, fill in the test set information:

  • Test set name: required. We recommend naming it after the testing goal, such as "Appointment flow test".
  • Test set description: optional. You can describe the business scope, applicable version, or notes for this test set.

After filling in the information, click Create in the lower-right corner.

Fill in test set information

After the test set is created, it appears in Test management on the page for the test type you selected (unit test or regression test).

Test set list

Create Test Cases

Click a target test set from the test set list to enter the test case management page. You can click New in the upper-right corner to create a test case. If there are no test cases in the test set yet, you can also click New from the empty state.

Create test case entry

On the new test case page, edit the case name, conversation content, and judging criteria.

New test case page
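
Before entering cases on this page, it can help to draft them offline and confirm that none is missing a name, conversation content, or judging criteria. The following Python sketch is one way to do that. The `PlannedCase` class and the `missing_parts` helper are hypothetical planning aids, not part of the platform.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedCase:
    """Offline draft of a test case (hypothetical structure)."""
    name: str
    turns: list = field(default_factory=list)     # conversation text, in order
    criteria: list = field(default_factory=list)  # pass conditions in plain language


def missing_parts(case):
    """Return the names of the parts that are still empty."""
    missing = []
    if not case.name.strip():
        missing.append("name")
    if not case.turns:
        missing.append("conversation content")
    if not case.criteria:
        missing.append("judging criteria")
    return missing


draft = PlannedCase(
    name="Appointment exception handling - Case 1",
    turns=[
        "User: I need to cancel my appointment.",
        "Assistant: I can help with that.",
    ],
)
print(missing_parts(draft))  # ['judging criteria']
```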

Case Name

The case name can be customized. We recommend naming it after the testing goal or scenario, such as "Appointment exception handling - Case 1".

Conversation Content

Conversation content supports three input methods:

  • Manual input.
  • Direct JSON import.
  • Import from call logs.

Manual input

Click User or Assistant in the lower-left corner of the conversation content area to add turns in sequence. After a turn is added, it appears in the conversation content box. You can adjust the order with the arrow buttons below each chat box.

After creating a chat box, click it to enter the conversation text.

On the assistant side, you can click the Tool icon below the chat box to add model tools to be called, such as "query time" or "query business status". This helps test the assistant's tool-calling capability.

On the user side, you can click the Knowledge base icon below the chat box to specify the retrieval result for that turn. The result is taken from the knowledge bases bound to the current assistant.

Manually edit conversation content

Direct JSON import

Click JSON in the upper-right corner of the conversation content box to enter JSON editing mode. You can paste JSON from a historical session to quickly create the conversation content for a test case.

Import conversation content with JSON
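
This guide does not specify the JSON schema that the editor accepts, so the following Python sketch is illustrative only. It assumes a list of turns with hypothetical `role` and `content` fields, and shows one way to assemble and sanity-check a conversation before pasting it. Check the JSON from one of your own historical sessions for the actual field names.

```python
import json

# Hypothetical turn format: the field names "role" and "content" are
# assumptions, not the platform's documented schema.
ALLOWED_ROLES = {"user", "assistant"}


def build_conversation(turns):
    """Validate (role, text) pairs and return them as a list of dicts."""
    conversation = []
    for index, (role, text) in enumerate(turns):
        if role not in ALLOWED_ROLES:
            raise ValueError(f"turn {index}: unknown role {role!r}")
        if not text.strip():
            raise ValueError(f"turn {index}: empty text")
        conversation.append({"role": role, "content": text})
    return conversation


turns = [
    ("user", "I'd like to book an appointment for tomorrow."),
    ("assistant", "Sure. What time works best for you?"),
    ("user", "10 a.m., please."),
]

conversation_json = json.dumps(build_conversation(turns), indent=2, ensure_ascii=False)
print(conversation_json)
```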

Import from call logs

In Call Logs, select a specific call and click the test-tube icon next to the chat bubble. The call log can then be imported into a test set as a test case.

Import from call logs

Judging Criteria

Judging criteria define the pass conditions for each test case. You can use the template shortcuts above the editor to quickly insert common judging criteria.

Set judging criteria

Manage Test Cases

After entering a test set, use the action buttons on the right side of each test case to disable/enable, copy, or delete it. Batch disable/enable, copy, and delete are not supported yet.

Manage test cases

Edit Test Cases

Click a test case in a test set to enter the edit test case page. The editing page provides the same overall capabilities as the new test case page.

Edit test case

AI-Generated Variants

On the edit test case page, click AI-generated variants in the lower-right corner to generate similar test cases based on the current conversation content. This is useful for batch-testing different expressions of the same scenario and checking whether the assistant remains stable.

AI-generated variants

Run Tests

There are two ways to enter the test execution page:

  • Path 1: Left sidebar → AI automated testing, then click Go to test on the right side of the target row in the test list. This opens the test management page for that assistant.
  • Path 2: Left sidebar → Assistant → Debug → Test management.

In Test management, select test sets under unit tests or regression tests, then click Run in the upper-right corner. Configure the model and repeat count, then start the test. The running status and results are shown in the list.

Run tests

Review Test Results

After the test is complete, go to Test management → Test results.

Test results entry

On the test results page, click a test set result to enter the test case result list and view multi-run results for each case.

View test case results
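
Because each case can run several times, it helps to separate cases that fail on every run from cases that fail only sometimes. The sketch below assumes you have transcribed the results by hand into (case name, passed) pairs. The record format and the `summarize_runs` helper are hypothetical, not an export format provided by the platform.

```python
from collections import defaultdict


def summarize_runs(run_records):
    """Group (case_name, passed) records and compute a pass rate per case."""
    outcomes = defaultdict(list)
    for case_name, passed in run_records:
        outcomes[case_name].append(bool(passed))

    summary = {}
    for case_name, results in outcomes.items():
        pass_rate = sum(results) / len(results)
        if pass_rate == 1.0:
            status = "stable pass"
        elif pass_rate == 0.0:
            status = "consistent fail"
        else:
            status = "unstable"
        summary[case_name] = {
            "runs": len(results),
            "pass_rate": pass_rate,
            "status": status,
        }
    return summary


# Hypothetical results for a repeat count of 3.
records = [
    ("Appointment flow - Case 1", True),
    ("Appointment flow - Case 1", True),
    ("Appointment flow - Case 1", True),
    ("Appointment exception handling - Case 1", True),
    ("Appointment exception handling - Case 1", False),
    ("Appointment exception handling - Case 1", True),
]

for name, info in summarize_runs(records).items():
    print(f"{name}: {info['status']} ({info['pass_rate']:.0%} of {info['runs']} runs)")
```

A case marked unstable is worth rerunning or reviewing before you attribute the failure to a specific prompt, knowledge base, workflow, or tool setting.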

Advanced: Mock Configuration

If you have not configured real tools for the assistant, you can use Mock configuration to create virtual tools and define the content they return. This lets you simulate how tool calls would behave in a real environment.

Path: Test management → Mock configuration.

Click New Mock, then enter the tool name and virtual tool-calling JSON to create a Mock configuration.

Mock configuration
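
This guide does not specify the format of the virtual tool-calling JSON, so treat the following Python sketch as a placeholder. The tool name `query_business_status` and the `arguments` and `result` fields are assumptions chosen for illustration. The sketch only shows how to draft the payload and confirm that it is valid JSON before pasting it.

```python
import json

# Hypothetical Mock payload: every field name here is an assumption,
# not the platform's documented format.
mock_tool_name = "query_business_status"
mock_payload = {
    "arguments": {"store_id": "S001"},
    "result": {"status": "open", "closes_at": "18:00"},
}

# Round-trip through json to confirm the payload serializes cleanly.
mock_json = json.dumps(mock_payload, indent=2)
assert json.loads(mock_json) == mock_payload

print(f"Tool name: {mock_tool_name}")
print(mock_json)
```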

Passing Criteria

  • Test sets and test cases are created successfully.
  • Test tasks can run normally and generate test results.
  • You can view multi-run results for each test case.
  • High-frequency questions, exception flows, and edge cases are handled as expected.
  • Failed cases can be traced back to prompts, knowledge bases, workflows, tool configuration, or judging criteria.

FAQ

When should I use automated batch testing?

Use it before launching an assistant, after prompt changes, after knowledge base updates, after workflow adjustments, or after tool configuration changes.

If one-off debugging passed, why do I still need automated batch testing?

One-off debugging only validates a small number of questions. Automated batch testing can validate multiple scenarios at once, making it better for finding regressions and edge cases.

Can I test tool calling without configuring real tools?

Yes. You can use Mock configuration to simulate tools and returned content, then validate how the assistant behaves in tool-calling scenarios.

Can AI-generated variants be used directly?

We recommend reviewing them manually before use to ensure the conversation content, business boundaries, and judging criteria match your testing goal.