Source: TesterHome Community
Online system anomalies are a persistent and formidable challenge. Traditional testing methods often fail to catch these issues efficiently or cost-effectively, allowing them to reach production.
This article presents a proven solution: a fully automated, general-purpose unit test generation system. Built on a white-box approach, the system has been deployed in more than 100 modules at Baidu, where it has generated millions of lines of test code and detected over a thousand potential regression issues.
We will walk you through the solution’s architecture, covering four core components: code analysis, test data generation, test code generation, and failure analysis.
Online stability is critical for both user experience and business revenue. It also directly reflects on the effectiveness of the Quality Assurance (QA) team.
To prevent issues, QA teams typically employ a mix of stress testing, functional testing, unit testing, and static code analysis.
Despite these comprehensive measures, some issues inevitably slip through. This raises the question: why does this gap persist?
We conducted a comparative analysis of existing anomaly detection methods, focusing on two key pain points: high cost and low recall (see Table 1).
Table 1: Comparative Disadvantages of Common Anomaly Detection Methods
| Method | Key Disadvantages |
| --- | --- |
| Stress Testing | High resource consumption; reactive detection. |
| Functional Testing | High development cost; difficult to create exceptional scenarios; reactive. |
| Unit Testing | High development cost; heavily reliant on developer expertise and effort. |
| Static Code Analysis | Reactive (rules created post-incident); low precision/recall; unsustainable ecosystem. |
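Overall, current methods are either too costly or reactive, detecting issues only after they have occurred. While Static Code Analysis is widely adopted, it suffers from reactive rule creation (rules are written only after an incident), low precision and recall, and an unsustainable ecosystem.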
Unit testing, however, offers substantial advantages:
This led us to a critical hypothesis: Can we maximize the benefits of unit testing while eliminating its dependency on manual coding and developer experience?
Our answer was to build an intelligent Unit Test generation system that acts as a proactive detection layer.
Figure 2.1: Stability Testing Funnel (A conceptual diagram showing a multi-layered approach, with Intelligent UT as a key proactive step after static analysis).
In early 2019, we evaluated existing C/C++ unit test generation tools like C++test and Wings. However, neither could meet our requirements for fully automated test generation for complex data types in intricate business scenarios, nor were they easily extensible. We therefore decided to build our own solution.
We first deconstructed the manual process a developer follows to write a unit test:
Our core strategy was to replicate these steps using white-box static code analysis, thereby achieving full automation.
Figure 3.1: The Manual Unit Test Writing Process (An illustration of the typical developer workflow that our system automates).
The overall architecture of our solution is depicted in Figure 4.1. We will now detail the implementation of its four core capabilities.
Figure 4.1: Technical Architecture Diagram (A high-level diagram showing the data flow from source code through analysis, data generation, and code generation, culminating in compilation and execution).
The goal of this stage is to use static code scanning to abstract complex function code into structured feature data, akin to a compiler’s symbol table. This data allows the system to programmatically understand the code.
We identified the essential information to be extracted from C/C++ code:
Table 4.1 (details omitted) summarizes the full set of features, including function names, class/namespace, parameter names/types, return type, and modifiers.
The extracted features are stored in an XML format called Code Struct Data (CSD). This ensures easy access for other system modules.
Figure 4.2: CSD Example (A snippet of the XML-like structure, showing fields like function, param, and their respective attributes).
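To make the idea of structured feature data concrete, here is a minimal sketch of how a downstream module might read a CSD record. The article does not give the actual schema, so the element and attribute names below (`csd`, `function`, `param`, `return_type`, `modifiers`) and the parameters of `explore_filter` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical CSD record; the real schema is not given in the article.
CSD_XML = """
<csd>
  <function name="explore_filter" namespace="demo" return_type="int" modifiers="static">
    <param name="query" type="const char*"/>
    <param name="limit" type="int"/>
  </function>
</csd>
"""

def load_functions(csd_text):
    """Return one dict per <function> element, with its parameters in order."""
    root = ET.fromstring(csd_text)
    functions = []
    for fn in root.iter("function"):
        functions.append({
            "name": fn.get("name"),
            "namespace": fn.get("namespace"),
            "return_type": fn.get("return_type"),
            "modifiers": fn.get("modifiers", "").split(),
            "params": [(p.get("name"), p.get("type")) for p in fn.findall("param")],
        })
    return functions

functions = load_functions(CSD_XML)
for fn in functions:
    # Rebuild a readable signature from the extracted features.
    signature = ", ".join(f"{ptype} {pname}" for pname, ptype in fn["params"])
    print(f'{fn["return_type"]} {fn["namespace"]}::{fn["name"]}({signature})')
```

Keeping parameter order and types in the record is what lets later stages pick candidate values per parameter and emit a correctly ordered call.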
We required a lightweight, efficient, and open-source static analysis tool. We chose cppcheck and extended it with custom development to collect function call chain information and other global data.
Figure 4.3: Code Analysis Flow using cppcheck (A diagram illustrating how cppcheck parses the source code and generates the CSD).
This is the core fuzzing engine of our system. We use both generation-based and mutation-based fuzzing methods.
Our approach extends the generation-based method by using white-box information (paths, branches, and variable propagation) to guide the fuzzing process. This aims for better coverage and fewer invalid test cases.
Figure 4.5: Test Case Generation Architecture (A diagram showing the process: CSD + Source Code -> Path Selection -> Parameter Selection -> Candidate Data Sources -> Generation & Filtering -> Final Test Case Set).
This module performs three tasks to guide data generation:
Figure 4.6: Program Example for Path Analysis (Illustrates how multiple branches can be covered by a single merged test case).
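One common way to let branch information guide data generation is to collect the constants that each parameter is compared against and try values around them. The sketch below illustrates that idea only. It uses a regular expression rather than a real parser, and the function `classify` and the helper `branch_candidates` are hypothetical; the article does not describe the system's actual path analysis at this level of detail.

```python
import re

# Hypothetical function under test; not taken from the article.
SOURCE = """
int classify(int score, int retries) {
    if (score >= 60) { return 1; }
    if (retries == 3) { return 2; }
    if (score < 0) { return -1; }
    return 0;
}
"""

# Matches `<identifier> <comparison> <integer literal>` inside a condition.
COMPARISON = re.compile(r"\b([A-Za-z_]\w*)\s*(==|!=|<=|>=|<|>)\s*(-?\d+)")

def branch_candidates(source, params):
    """Map each parameter to boundary values around the constants it is compared with."""
    candidates = {name: set() for name in params}
    for name, _op, literal in COMPARISON.findall(source):
        if name in candidates:
            value = int(literal)
            # One value on each side of the boundary, plus the boundary itself.
            candidates[name].update({value - 1, value, value + 1})
    return {name: sorted(values) for name, values in candidates.items()}

candidates = branch_candidates(SOURCE, ["score", "retries"])
print(candidates)
```

Values chosen this way exercise both sides of each comparison, which is the kind of coverage gain that white-box guidance aims for over purely random generation.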
Candidate values for parameters come from:
Combining candidate values for multiple parameters can lead to an explosion in test cases. We tackle this in two stages:
Figure 4.9: 2-Wise Pairwise Testing Example

| Test Case | X | Y | Z |
| --- | --- | --- | --- |
| 1 | x1 | y1 | z1 |
| 2 | x1 | y2 | z2 |
| 3 | x2 | y1 | z2 |
| 4 | x2 | y2 | z1 |
| 5 | x3 | y1 | z2 |
| 6 | x3 | y2 | z1 |

Caption: The 6 test cases generated by 2-Wise testing from 3 parameters (X: 3 values, Y: 2, Z: 2), compared to 12 for a full combinatorial approach.
This method has eliminated over 90% of redundant test cases in our deployments. The final set is stored in JSON format for flexibility.
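The 2-wise reduction from Figure 4.9 can be sketched with a simple greedy selection. This is an illustrative approach, not necessarily the algorithm used in the system: it enumerates the full product first, so it only suits small parameter domains, and greedy selection is not guaranteed to find the smallest possible set. The function name `pairwise_cases` is hypothetical.

```python
from itertools import combinations, product

def pairwise_cases(domains):
    """Greedy 2-wise selection: repeatedly pick the full combination that
    covers the most value pairs not yet covered."""
    names = list(domains)
    all_cases = [dict(zip(names, values)) for values in product(*domains.values())]

    def pairs_of(case):
        # Every pair of (parameter, value) assignments present in this case.
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

    uncovered = set().union(*(pairs_of(c) for c in all_cases))
    selected = []
    while uncovered:
        best = max(all_cases, key=lambda c: len(pairs_of(c) & uncovered))
        selected.append(best)
        uncovered -= pairs_of(best)
    return selected, len(all_cases)

domains = {"X": ["x1", "x2", "x3"], "Y": ["y1", "y2"], "Z": ["z1", "z2"]}
selected, full_count = pairwise_cases(domains)
print(f"{len(selected)} pairwise cases instead of {full_count}")
for case in selected:
    print(case)
```

For these domains the sketch selects 6 of the 12 full combinations. The individual rows may differ from those in Figure 4.9, since several distinct 6-case sets cover every pair.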
Figure 4.10: Test Case Set JSON Demo (A JSON snippet showing a function name and a map of parameter values for one test case).
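As a rough illustration of such a test case set, the sketch below serializes and reloads a small one. The article only states that a test case carries a function name and a map of parameter values, so the key names (`function`, `cases`, `id`, `params`) and the parameters of `explore_filter` are assumptions.

```python
import json

# Hypothetical layout: only "function name plus a map of parameter values"
# comes from the article; the key names are made up for illustration.
test_case_set = {
    "function": "explore_filter",
    "cases": [
        {"id": 1, "params": {"query": None, "limit": 0}},
        {"id": 2, "params": {"query": "", "limit": -1}},
    ],
}

encoded = json.dumps(test_case_set, indent=2, sort_keys=True)
decoded = json.loads(encoded)  # what a later stage, such as code generation, would read
print(encoded)
```

A language-neutral format like this keeps data generation decoupled from code generation, so either side can be replaced independently.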
Future work will incorporate meta-heuristic algorithms like genetic algorithms to handle interdependent parameters, further enhancing the system’s intelligence.
For robustness and reliability, we chose a generation method based on syntax rules and templates. This guarantees that the generated code is syntactically correct and compilable, unlike deep-learning approaches, which are not yet reliable for industrial use.
Figure 4.11: Code Generation Architecture (A diagram showing the flow from the Test Case Set and CSD through a Code Generator that uses a Template Engine to produce the final Unit Test Code).
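For C/C++, we generate test code using the “death test” functionality of the Google Test (GTest) framework. A death test runs a statement in a child process and checks how that process terminates, so an unexpected crash in the function under test is reported as a test failure rather than bringing down the whole test run.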
Figure 4.12: Complete Generation Example (Walks through the entire process from the source code of explore_filter to the final generated GTest death test).
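A template-based generator of this kind can be sketched in a few lines. The sketch below fills a fixed GTest template from a test case set. `EXPECT_EXIT` and `::testing::ExitedWithCode` are real GTest APIs, but the way they are combined here, the template text, the header name, and the parameters of `explore_filter` are all assumptions rather than the system's actual templates.

```python
from string import Template

# Template for one generated GTest case. The call runs in a child process;
# if it crashes instead of reaching exit(0), the test fails.
CASE_TEMPLATE = Template(
    "TEST(${suite}DeathTest, Case${case_id}) {\n"
    "    EXPECT_EXIT({ ${function}(${args}); exit(0); },\n"
    "                ::testing::ExitedWithCode(0), \"\");\n"
    "}\n"
)

def cpp_literal(value):
    """Render a Python value as a C++ literal (only the types used below)."""
    if value is None:
        return "nullptr"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return str(value)

def generate_test_file(function, param_order, cases, header):
    """Emit a complete C++ test file as text, one TEST per test case."""
    suite = "".join(part.capitalize() for part in function.split("_"))
    parts = [f'#include <cstdlib>\n#include <gtest/gtest.h>\n#include "{header}"\n']
    for case in cases:
        args = ", ".join(cpp_literal(case["params"][p]) for p in param_order)
        parts.append(CASE_TEMPLATE.substitute(
            suite=suite, case_id=case["id"], function=function, args=args))
    return "\n".join(parts)

code = generate_test_file(
    function="explore_filter",
    param_order=["query", "limit"],          # hypothetical parameters
    cases=[{"id": 1, "params": {"query": None, "limit": 0}},
           {"id": 2, "params": {"query": "", "limit": -1}}],
    header="explore_filter.h",               # hypothetical header name
)
print(code)
```

Because the output is assembled from a fixed template and typed literals, every generated file has the same shape, which is what makes syntactic correctness easy to guarantee.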
Analyzing test failures is a major challenge. Key issues we addressed are:
Our solution implements a processing pipeline that performs:
Figure 4.16: Stack Trace Analysis Process (A flow diagram illustrating the steps from test execution failure to the generation of a structured, actionable failure report).
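One plausible step in such a pipeline is grouping failures that share the same crash location, so that many failing test cases collapse into a few reports. The sketch below assumes gdb-style backtraces and builds a signature from the top business-code frames. The trace format, the `NOISE` list, and the function names are assumptions, not details from the article.

```python
import re
from collections import defaultdict

# Matches gdb-style frames such as "#2  0x0000000000401a2c in parse_query (...)".
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([\w:~]+)\s*\(")
# Frames that belong to the runtime rather than to business code (assumed list).
NOISE = {"raise", "abort", "__assert_fail", "__libc_start_main", "main"}

def signature(trace, depth=2):
    """Join the top `depth` non-noise function names into a stable key."""
    frames = []
    for line in trace.splitlines():
        match = FRAME.match(line.strip())
        if match and match.group(1) not in NOISE:
            frames.append(match.group(1))
    return " <- ".join(frames[:depth])

def group_failures(failures):
    """Group (case_id, trace) pairs by stack signature."""
    groups = defaultdict(list)
    for case_id, trace in failures:
        groups[signature(trace)].append(case_id)
    return dict(groups)

TRACE_A = """#0  0x00007f3a1c2b4387 in raise () from /lib64/libc.so.6
#1  0x00007f3a1c2b5a78 in abort () from /lib64/libc.so.6
#2  0x0000000000401a2c in parse_query (query=0x0) at filter.cpp:17
#3  0x0000000000401b90 in explore_filter (query=0x0, limit=0) at filter.cpp:42
#4  0x0000000000401c55 in main () at test.cpp:9"""
TRACE_B = TRACE_A.replace("limit=0", "limit=-1").replace("401a2c", "401a30")
TRACE_C = """#0  explore_filter (query=0x4006f4 "", limit=-1) at filter.cpp:58
#1  0x0000000000401c55 in main () at test.cpp:9"""

groups = group_failures([(1, TRACE_A), (2, TRACE_B), (3, TRACE_C)])
print(groups)
```

Addresses and argument values are ignored on purpose, since they vary between runs and inputs while the function names identify the defect.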
We designed two deployment modes to fit development workflows:
A risk-assessment step further optimizes the process, filtering out low-risk changes like logging modifications.
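A risk filter of this kind could be approximated by checking whether every changed line in a diff is a logging call, a comment, or blank. The sketch below is a deliberately simple illustration: the logging patterns are hypothetical, multi-line statements are not handled, and the article does not describe the real risk-assessment rules.

```python
import re

# Hypothetical logging call patterns; a real project would list its own macros.
LOG_CALL = re.compile(r"^\s*(LOG|DLOG|VLOG)\s*\(|^\s*(printf|fprintf)\s*\(")
COMMENT_OR_BLANK = re.compile(r"^\s*(//.*)?$")

def changed_lines(diff_text):
    """Yield added/removed lines from a unified diff, without the +/- marker.
    Simplification: any line starting with '+++' or '---' is treated as a file header."""
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            yield line[1:]

def is_low_risk(diff_text):
    """True when the diff changes something and every change is logging, comment, or blank."""
    lines = list(changed_lines(diff_text))
    return bool(lines) and all(
        LOG_CALL.search(line) or COMMENT_OR_BLANK.match(line) for line in lines)

LOGGING_DIFF = """--- a/filter.cpp
+++ b/filter.cpp
@@ -40,3 +40,3 @@
 int explore_filter(const char* query, int limit) {
-    LOG(INFO) << "enter";
+    LOG(INFO) << "enter explore_filter";
     return parse_query(query);
"""
LOGIC_DIFF = """--- a/filter.cpp
+++ b/filter.cpp
@@ -44,3 +44,3 @@
-    if (limit > 0) {
+    if (limit >= 0) {
         run(limit);
"""

print(is_low_risk(LOGGING_DIFF), is_low_risk(LOGIC_DIFF))
```

Skipping test generation for changes classified as low risk saves compute, at the cost of trusting the classifier, so a conservative rule set is the safer default.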
Figure 4.17: Deployment Architecture (A comprehensive diagram showing how the core capabilities are integrated with Baidu’s internal platforms to support the entire testing lifecycle).
Figure 4.18: Task Execution Result Example (A screenshot of a CI/CD task report showing a detected crash with details like failure type, stack trace, and the triggering test case).
Engineering Outcomes:
Business Outcomes: