I currently lead test management at a B2B company, where most team members are mid-level QA engineers who have not yet built systematic testing methodologies. Early in my tenure, during performance test plan reviews, I noticed a critical gap: my team could execute performance tests proficiently—they wrote JMeter and Python scripts with ease—but communication and execution broke down when it came to core performance testing concepts. This ambiguity leads to wasted engineering effort, misaligned test results, and eroded customer trust.
Common pain points we faced include:
Confusion between concurrent users, online users, QPS, and JMeter threads: If a contract requires “supporting 50 concurrent users”, is setting JMeter threads to 50 a valid simulation?
Misunderstanding “concurrency” for asynchronous tasks: Does testing only the async trigger endpoint (with no errors) mean the entire async flow meets performance requirements?
These issues are not unique to my team. Many QA professionals struggle with these concepts due to their inherent complexity, and teams often lack unified terminology. This guide is designed to clarify these error-prone concepts—it does not teach basic performance testing or tool usage from scratch. Instead, it serves as a go-to reference for teams to communicate using the same conceptual framework, ensuring performance tests are credible, consistent, and actionable.
For most B2B and B2C projects, performance testing focuses on three core goals: Fixed-Load Testing, Capacity Stress Testing, and Endurance Testing. Choosing the right objective is critical to aligning test results with business and customer requirements.
Fixed-load testing involves testing under stable, predefined load to quantitatively measure response time and throughput. It is used to verify whether the system meets contractual or SLA performance targets under expected business pressure—making it ideal for validating compliance with customer requirements.
Fixed-load testing is divided into two types:
Pulse fixed-load testing: Short duration (a few seconds, up to 5 minutes), used to simulate sudden traffic spikes.
Non-pulse fixed-load testing: Medium duration (10–60 minutes), used to simulate steady, sustained traffic.
Unlike fixed-load testing (which validates behavior under known load), the goal of capacity stress testing is to find the system’s maximum pressure limit. This helps identify bottlenecks and guide optimization efforts.
How to execute capacity stress testing:
Start with low load and gradually increase it in stages.
Stop when throughput stops scaling linearly or degrades sharply (this inflection point is the system’s bottleneck).
Run each load stage for 10–20 minutes to ensure stable metrics.
Endurance testing simulates sustained real-world load over an extended period to detect slow, chronic issues that short-term tests miss. These issues include memory leaks, abnormal resource recycling, connection pool exhaustion, and thread pool degradation.
Best practices for endurance testing:
Recommended duration: 6–12 hours (long enough to expose chronic issues).
Continuously monitor key metrics: CPU usage, memory usage, GC frequency, and connection/thread pool stability.
How to choose test objectives by project type:
Migration/refactoring projects: Use fixed-load testing to compare performance before and after changes.
Projects with clear performance goals: Use fixed-load or capacity stress testing. Fixed-load testing is faster, while capacity stress testing is better for identifying bottlenecks.
Projects without clear goals: Use capacity stress testing and report only factual data (no pass/fail judgment).
All projects: Include endurance testing, prioritized below fixed-load and stress testing according to project schedule and risk.
A common mistake in performance testing is applying the same testing patterns to synchronous and asynchronous tasks. Understanding the difference is critical to creating realistic test scenarios and accurate results.
Synchronous interfaces require the client to send a request, wait for processing to complete, and receive a full response before proceeding. The user receives a response immediately, indicating the task is complete.
Examples of synchronous interfaces:
Chat APIs and customer support interfaces
Recommendation and search APIs
Form and questionnaire submission interfaces
Asynchronous tasks work differently: the system immediately accepts the task, returns a task ID to confirm receipt, and processes the work in the background. The client must verify task completion via polling, callbacks, logs, or database queries.
Examples of asynchronous tasks:
Offline data computing and batch processing
Offline labeling and machine learning inference tasks
Bulk export and report generation tasks
Many QA engineers apply synchronous testing logic to asynchronous flows, resulting in unrealistic load and incorrect test conclusions. Here’s a real-world example:
Requirement: Support 10 concurrent asynchronous tasks, with an average completion time of ≤ 10 minutes.
Mistake: Hitting the async trigger endpoint at 10 requests/sec for 5 minutes → total tasks = 3000 (far exceeding the system’s design capacity) → false “test failure”.
Correct Approach: Trigger 10 tasks at once, wait for them to complete, then trigger another 10—ensuring no more than 10 tasks are processed concurrently at any time.
Concurrency control: Async testing uses one-time batch triggering (not continuous requests) to maintain a fixed number of background tasks.
Metric collection: Tool-reported response time and success rate only reflect “task submission” status—not the actual task processing time or success rate. Real metrics must be collected from UI logs, databases, or task status APIs.
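The batch-trigger-and-wait pattern above can be sketched as a small Python harness. This is a minimal sketch: `submit_task` and `is_finished` are hypothetical stubs standing in for your real trigger endpoint and status check (polling, DB query, or status API), not functions from any real system.

```python
import time

# Hypothetical client calls -- replace with your real trigger/status APIs.
def submit_task(payload):
    """Submit one async task; returns a task ID (stubbed here)."""
    return f"task-{payload}"

def is_finished(task_id):
    """Check task status via polling, DB, or logs (stubbed: always done)."""
    return True

def run_batch_round(payloads, poll_interval=1.0, timeout=600):
    """Trigger one batch of tasks, then wait for ALL of them to finish
    before returning, so no more than len(payloads) tasks are ever
    processed concurrently."""
    start = time.time()
    pending = {submit_task(p) for p in payloads}
    deadline = start + timeout
    while pending and time.time() < deadline:
        pending = {t for t in pending if not is_finished(t)}
        if pending:
            time.sleep(poll_interval)
    elapsed = time.time() - start
    return elapsed, pending  # non-empty `pending` means the round timed out

# Two rounds of 10 tasks each: concurrency never exceeds 10.
for round_no in range(2):
    elapsed, unfinished = run_batch_round(range(10))
    print(f"round {round_no}: {elapsed:.1f}s, unfinished={len(unfinished)}")
```

Note that the per-task completion time still has to be read from logs or the task table, not from this harness's wall-clock output.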
A valid performance test plan requires modeling four key dimensions to ensure test results reflect real-world conditions: the traffic model, the business model, the data volume model, and the cache model.
One of the most common sources of confusion in performance testing is mixing up traffic-related terms. Customers often use “online users” or “concurrent users” in requirements, while testers report “threads” or “QPS” in results—leading to misalignment. Below are clear definitions to resolve this:
Online users: Logged-in users who are active on the platform but not necessarily interacting. Does not directly equal system pressure (e.g., a user logged in but idle creates no load).
Concurrent users: Online users who are actively interacting with the system during a specific period. This is the most relevant metric for B2B performance requirements.
QPS / RPS: Server-side requests received per second (HTTP or RPC). Directly measures system load and is ideal for ToC systems with high real-time traffic.
TPS: Transactions completed per second (a transaction = a logical business flow, e.g., “query balance → transfer → verify balance”). Common in complex B2B workflows.
JMeter threads: Virtual users in testing tools, which use a blocking I/O (BIO) model: each thread waits for a response before sending the next request. Threads ≠ QPS.
Example: 3 online users, 2 of whom are concurrent (the third is logged in but idle):
User A: 2 requests/sec → completes 1 transaction per second.
User B: 1 request/sec → completes half a transaction per second.
Result: Concurrent users = 2, QPS = 3, TPS = 1.5.
How to choose the right metric:
ToC systems: Use QPS/RPS (high traffic, real-time requirements, micro-service architecture).
ToB systems: Use concurrent users or TPS (business-focused, easier for customers to understand).
Customer alignment: If a customer uses “online users” in requirements, clarify the difference and translate it to concurrent users or TPS to ensure test goals are realistic.
The business model defines the ratio of different business flows under the same traffic. Even the same QPS can create drastically different system pressure depending on the business mix—making this a critical component of accurate performance testing.
Example: A chatbot API with a target QPS of 100. Different query types hit different algorithms, leading to varying resource consumption:
Balanced query mix: 137X CPU usage per second.
60% single-intent, 30% multi-intent, 10% casual chat: 250X CPU usage per second.
How to determine business ratios:
Existing systems: Analyze 7 days of real production logs to identify traffic ratios by interface, intent, and request parameters. Use Linux shell scripts for quick analysis (no complex coding required).
New systems: Agree on an initial ratio with product managers, customers, or stakeholders. Use this as the baseline for testing if no objections are raised.
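The log analysis can often be a one-liner. The sketch below assumes a hypothetical access-log format of `METHOD /path status` (one request per line) and writes a small sample file for illustration; adjust the `awk` field numbers to match your real log layout.

```shell
# Hypothetical sample log -- in practice, point awk at ~7 days of real logs.
cat > /tmp/sample_access.log <<'EOF'
POST /chat/single_intent 200
POST /chat/single_intent 200
POST /chat/multi_intent 200
POST /chat/casual 200
EOF

# Count requests per endpoint and each endpoint's share of total traffic.
awk '{ total++; count[$2]++ }
     END { for (p in count)
             printf "%s %d %.0f%%\n", p, count[p], 100*count[p]/total }' \
    /tmp/sample_access.log | sort
```

The resulting percentages become the business-model ratios for the test plan.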
For most database-reliant systems, performance is highly sensitive to data size. Performance testing must use data volumes that reflect real-world production conditions to ensure accurate results.
New projects: Agree on the expected production data scale with customers and use this as the testing baseline.
Migration projects: Use production-equivalent data in the new environment (no extra data construction needed).
Limited environments: If simulating production data is too costly, focus on performance regression between versions (same data size for before/after comparisons).
Over-reliance on cached data during testing can underestimate real server load. To avoid this, design test data to reflect real-world cache hit rates.
Prepare enough unique test data (≥ QPS × test duration) to ensure natural cache behavior.
Clear the cache or use new test data between test runs to avoid skewed results.
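Sizing and generating that unique data is a quick script. This is a minimal sketch under assumed numbers (100 QPS, a 10-minute run, a 20% safety margin); the file path and key format are illustrative, chosen to feed JMeter's CSV Data Set Config.

```python
import csv

def unique_rows_needed(target_qps, duration_s, safety=1.2):
    """Unique records needed so no key repeats during the run
    (QPS x duration, plus a small safety margin)."""
    return int(target_qps * duration_s * safety)

n = unique_rows_needed(target_qps=100, duration_s=600)
print(n)  # 72000 unique keys for a 10-minute run at 100 QPS

# Write the keys to a CSV for JMeter's CSV Data Set Config.
with open("/tmp/test_keys.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for i in range(n):
        writer.writerow([f"user_{i:07d}"])
```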
Performance exit criteria are the benchmarks that determine whether a system meets requirements. These criteria must be clear, measurable, and aligned with customer expectations—critical for avoiding post-test disputes.
System throughput (QPS/TPS) must match the target defined in the traffic model. Any deviation indicates a performance gap that needs addressing.
Response time criteria:
Existing systems: No regression compared to peak production response times.
Customer-specified requirements: Follow the customer’s defined thresholds.
No explicit requirements: Default to 95th percentile response time ≤ 1s for synchronous interfaces; report metrics only (no hard thresholds) for asynchronous tasks.
How response time is measured:
Synchronous HTTP/RPC interfaces: Response time = time from request sent to full response received.
Streaming interfaces (e.g., Agent-style APIs): Response time = time from request sent to first byte received.
Asynchronous tasks: Response time = task end time − task start time (queue time optional, based on requirements).
Error rate criteria:
Follow customer or stakeholder requirements.
Default threshold: Error rate ≤ 1% (for most B2B systems).
Resource usage criteria:
Follow customer or stakeholder requirements.
Default thresholds: CPU ≤ 80%, Memory ≤ 80% (to avoid resource exhaustion).
Endurance-specific stability criteria:
No continuous memory growth (indicates potential memory leaks).
After GC, memory usage returns to near pre-test levels (minimal residual growth).
Connection/thread pool sizes remain stable (no exhaustion).
Disk space growth is within expected limits (avoid log overflow).
A dangerous illusion in performance testing: 0 error rate and normal average response time, but the system cannot sustain the target QPS over time. This happens because the system’s “water tank” (its queue or buffer) temporarily absorbs the excess load.
Example: System throughput = 5 QPS, queue capacity = 50, test input = 10 QPS for 3 seconds. The queue does not fill in 3 seconds, so metrics look good. But if the test runs for 10 seconds, the queue overflows, errors spike, and the system bottleneck is exposed.
Conclusion: For systems with queuing or buffering mechanisms, short tests are misleading. Always run sufficiently long sustained pressure tests to validate real-world performance.
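The water-tank effect is easy to demonstrate numerically. The sketch below is a simplified per-second simulation of the example's figures (5 QPS throughput, 50-slot queue, 10 QPS input); it is a toy model, not a real load generator.

```python
def simulate(input_qps, service_qps, queue_cap, seconds):
    """Per-second simulation of a bounded queue ('water tank'):
    each second, arrivals join the queue, anything over capacity is
    dropped, then the server drains up to service_qps requests.
    Returns the list of drops per second."""
    backlog, dropped = 0, []
    for _ in range(seconds):
        backlog += input_qps
        overflow = max(0, backlog - queue_cap)
        dropped.append(overflow)
        backlog = min(backlog, queue_cap) - service_qps
        backlog = max(0, backlog)
    return dropped

# 10 QPS into a 5 QPS system with a 50-slot queue:
print(simulate(10, 5, 50, 3))   # [0, 0, 0] -- short test, metrics look clean
print(simulate(10, 5, 50, 12))  # drops begin at second 10 as the tank fills
```

The backlog grows by 5 requests per second, so a 3-second test never touches the 50-slot limit, while a 12-second test exposes the overflow, exactly as the example describes.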
JMeter is one of the most popular performance testing tools, but many QA engineers misuse it by equating threads to QPS. Below are proven methods to control traffic and business models in JMeter, aligned with real-world testing best practices.
JMeter threads use a BIO model: each thread waits for a response before sending the next request. The relationship between threads, QPS, and average response time is defined by this formula:
QPS ≈ Threads × (1000 ms / Average Response Time (ms))
To hit a target QPS using a Constant Throughput Timer:
Use 1 thread to run a 10-second test and get the average response time.
Estimate required threads using the formula: Threads ≈ (QPS × Average Response Time) / 1000.
Add a Constant Throughput Timer to cap QPS (it enforces only an upper limit, not a lower one).
Note: Target throughput is in QPM (requests per minute) → multiply target QPS by 60. Select “Calculate Throughput based on: All active threads”.
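The estimation steps above reduce to two lines of arithmetic. A minimal sketch (the 100 QPS target and 250 ms response time are illustrative numbers, not from the text):

```python
import math

def threads_for(target_qps, avg_rt_ms):
    """Threads ~= QPS x RT(ms) / 1000, rounded up, since each BIO
    thread can issue at most 1000/RT requests per second."""
    return math.ceil(target_qps * avg_rt_ms / 1000)

def ctt_target(target_qps):
    """Constant Throughput Timer is configured in samples per MINUTE."""
    return target_qps * 60

print(threads_for(100, 250))  # 25 threads for 100 QPS at 250 ms
print(ctt_target(100))        # 6000 samples/min in the timer
```

In practice it is worth configuring slightly more threads than the estimate, since the timer only throttles downward and response times fluctuate.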
If you have a clear understanding of the system’s performance, you can control QPS by adjusting the thread count directly (no timers needed). This requires familiarity with the thread-to-QPS mapping for your system.
To follow a dynamic QPS curve instead, combine the Throughput Shaping Timer with the Ultimate Thread Group:
Set the target QPS curve in the Throughput Shaping Timer (name the component “timer”).
In the Ultimate Thread Group, use the dynamic feedback function: ${__tstFeedback(timer,1,500,30)} to adjust threads automatically and follow the QPS curve.
Two effective methods to control business flow ratios, depending on your test needs:
Use a Random Variable to generate a uniform random number (1–100, e.g., variable name “prob”).
Add three If Controllers with conditions to achieve the desired ratio (e.g., 60%/30%/10%):
60%: ${__jexl3(${prob}>=1 && ${prob}<=60,)}
30%: ${__jexl3(${prob}>=61 && ${prob}<=90,)}
10%: ${__jexl3(${prob}>=91 && ${prob}<=100,)}
For more precise control, create a CSV file with 100 rows (matching the desired ratio: e.g., 60 rows for scenario 1, 30 for scenario 2, 10 for scenario 3). Use CSV Data Set Config to loop through the file, naturally achieving the desired business ratio.
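The If Controller approach maps directly onto ordinary code, which is handy when driving load from a Python script instead of JMeter. A minimal sketch using the same 1-60 / 61-90 / 91-100 ranges (the scenario names mirror the chatbot example earlier):

```python
import random

# Same ranges as the three JMeter If Controllers: 60% / 30% / 10%.
def pick_scenario(prob):
    """Map a uniform draw in 1..100 to a business scenario."""
    if 1 <= prob <= 60:
        return "single_intent"
    if 61 <= prob <= 90:
        return "multi_intent"
    return "casual_chat"

# Each virtual user draws once per iteration, like the Random Variable.
draws = [pick_scenario(random.randint(1, 100)) for _ in range(10_000)]
print({s: draws.count(s) for s in set(draws)})  # roughly 6000/3000/1000
```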
Even the best test plan can fail if execution is not carefully managed. Follow these best practices to ensure accurate results and avoid costly mistakes:
Notify developers: Sync with the development team before testing to facilitate issue troubleshooting.
Monitor both sides: Track metrics for both the system under test (SUT) and the pressure generator. If the generator’s resources are saturated, the bottleneck may be in the testing tool—not the system.
Verify the environment: Double-check IPs, domains, and test identifiers to avoid accidentally testing production systems (a critical mistake that can cause downtime).
Avoid peak conflicts: Do not run performance tests during integration testing, regression testing, or other high-load activities to prevent interference.
Performance testing success depends on clarity—clear concepts, clear objectives, and clear alignment with business and customer requirements. This guide is designed to resolve the most common ambiguities that plague QA teams, especially in B2B environments. By following these practices, you can ensure your performance tests are credible, actionable, and aligned with real-world needs.
For QA managers and engineers looking to refine their performance testing process, this guide serves as a reference to standardize terminology, avoid common pitfalls, and deliver results that customers trust. Remember: performance testing is not just about running scripts—it’s about validating that the system can deliver value under real-world conditions.