Source: TesterHome Community
The spread of 5G and pervasive Internet of Everything connectivity has fueled explosive growth in cloud-native applications and cloud services, along with massive data accumulation.
The global pandemic that began in 2020 accelerated cloud migration for enterprises in every vertical industry and produced a large number of highly customized digital products.
The future industrial ecosystem will follow a concentrated development pattern: a small number of super-large platform products will dominate the market, while countless medium and small vertical products will serve segmented user groups. Most emerging products will adopt a thin-client, heavy-server technical architecture.
This architectural trend sharply increases demand for server-side performance testing. However, many small and medium teams undervalue user experience, so testing cycles get compressed and performance verification is under-resourced.
Even if performance testing is not your daily core responsibility, mastering its basic logic is a mandatory skill for all QA engineers.
Performance testing is a cross-functional system engineering task. Different team roles evaluate “performance” from completely different dimensions. Below is a clear breakdown by stakeholder group:
Users only care about two intuitive indicators: operation response speed and whether system crashes interrupt usage.
A well-known example is Didi’s large-scale service outage on Valentine’s Day, which was directly triggered by insufficient capacity under a traffic spike.
Executives focus on three business outcomes tied to performance: total revenue, infrastructure cost efficiency (user volume supported per unit cost), and user satisfaction and retention.
Ops teams prioritize server hardware resource utilization, long-run system stability under continuous load, and automatic fault recovery.
Engineers pay attention to code execution efficiency, SQL query latency, thread lock contention, memory leaks and bottlenecks in internal service calls.
Performance testers quantify system throughput, latency and error rate, uncover hidden bottlenecks, verify SLA compliance and deliver actionable optimization recommendations to R&D teams.
For all consumer-facing (C-end) products, system performance directly affects user churn and long-term growth. Even market leaders such as Alibaba and JD cannot afford slow page responses; noticeable latency degradation drives users away in large numbers.
Most teams acknowledge latency affects revenue, yet few conduct quantitative operational analysis to calculate specific financial losses.
The 2016 Global Retail Digital Performance Benchmark Report released authoritative correlation data between latency and conversion rates:
Walmart’s published data showed that cutting page latency by just 0.1 seconds lifted overall revenue by 1%, a substantial gain for a large retail platform.
The performance link of medium-to-large e-commerce platforms covers every layer from client terminal to database storage, forming a closed-loop system. All links require performance verification:
Enterprise-level full-link performance testing is a cross-department large-scale engineering project that requires sufficient manpower and test cycle support.
All test design work must start with clear test objectives; every subsequent step serves these defined targets.
All formal performance tests must cover throughput, latency, hardware utilization, request success rate and long-run stability. Below are the measurement criteria for each key metric:
Use high-percentile latency (P95/P99) as the official evaluation standard instead of average latency, because an average hides the slow tail that users actually notice.
Common industry baselines: read interfaces ≤ 200 ms, write interfaces ≤ 500 ms. If internal SLAs are unavailable, benchmark against competing products.
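The difference between average and high-percentile latency can be illustrated with a minimal sketch. The sample latencies below are hypothetical, and the nearest-rank method is only one of several common percentile definitions; a monitoring tool may use a different one.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow outliers.
latencies_ms = [20] * 98 + [2000] * 2

avg_ms = mean(latencies_ms)
p95_ms = percentile(latencies_ms, 95)
p99_ms = percentile(latencies_ms, 99)

print(f"avg={avg_ms:.1f}ms p95={p95_ms}ms p99={p99_ms}ms")
```

In this made-up data set the average sits well under the 200 ms read baseline while P99 lands on the 2-second outliers, which is why the tail percentile is the safer acceptance criterion.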
Expressed via QPS (Queries Per Second) or TPS (Transactions Per Second): the peak concurrent traffic the system can process while meeting latency SLA requirements.
High QPS and low latency are meaningless if most requests fail. Under target load pressure, the success rate must remain close to 100%.
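A pass/fail check that treats success rate as a hard gate alongside latency and throughput might look like the sketch below. The `RunResult` fields, the 99.9% default threshold and the sample numbers are all assumptions chosen for illustration, not values from any real SLA.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    total_requests: int
    failed_requests: int
    p99_latency_ms: float
    qps: float


def success_rate(result):
    """Fraction of requests that completed without error."""
    if result.total_requests == 0:
        return 0.0
    return 1 - result.failed_requests / result.total_requests


def meets_sla(result, max_p99_ms, min_qps, min_success_rate=0.999):
    """A run passes only if latency, throughput and success rate all pass."""
    return (
        result.p99_latency_ms <= max_p99_ms
        and result.qps >= min_qps
        and success_rate(result) >= min_success_rate
    )


# Hypothetical run: fast and high-throughput, but 5% of requests failed.
run = RunResult(total_requests=1_000_000, failed_requests=50_000,
                p99_latency_ms=120.0, qps=60_000.0)
passed = meets_sla(run, max_p99_ms=200, min_qps=50_000)
print(f"success rate={success_rate(run):.3f}, passed={passed}")
```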
Each server cluster has a critical load threshold. Once traffic exceeds this inflection point:
Run the system at target peak throughput continuously, around the clock (7×24 hours). Monitor CPU, memory, disk I/O and network bandwidth to confirm that resource consumption curves stay flat and stable. This steady-state capacity is your safe performance ceiling in production.
Gradually increase concurrent load to find the maximum traffic volume that maintains 100% request success rate for at least 10 minutes (latency SLA limits are temporarily ignored in this test).
Alternate stable peak load and extreme burst load cyclically for up to 48 hours:
Repeat cycles continuously, observe resource utilization and latency fluctuation to verify stability under irregular sudden traffic spikes.
Latency anomalies may occur even under minimal traffic. For example, if TCP_NODELAY is not set, Nagle's algorithm may hold back small writes and add unnecessary request delay; very small network packets also fail to use the available bandwidth and limit throughput. Design edge test scenarios according to real online traffic characteristics.
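For reference, this is roughly how a client or test harness written in Python would enable TCP_NODELAY on its own socket. It is a minimal sketch using only the standard `socket` module; the helper name is made up, and how the option is set on the server side depends on the server software in use.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY on)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
# Read the option back to confirm the kernel accepted it.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
print("TCP_NODELAY enabled:", nodelay_enabled)
```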
As the number of concurrent users grows, server pressure rises and TPS follows a characteristic curve. Below is the standard classification of performance tests, with a definition and applicable scenarios for each type:
Definition: Simulate real production traffic volume and business processes to verify the system meets formal performance SLAs.
Core goal: Confirm the system reaches agreed service capacity
Prerequisite: Clear standardized business processes and quantifiable performance targets
Application scenario: Formal acceptance testing against performance requirements
Definition: Gradually raise concurrent load until latency exceeds SLA limits or hardware resources reach saturation.
Core goal: Explore the maximum sustainable processing capacity of the system
Application scenario: Performance tuning verification, pre-launch capacity assessment
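The step-wise ramp described in this definition can be sketched as a simple search loop. Everything here is illustrative: `measure_p99_ms` stands for whatever runs one load step against the real system and returns its P99, and `fake_p99_ms` is a made-up latency model used only so the example runs on its own.

```python
def find_max_sustainable_load(measure_p99_ms, sla_ms, start=100, step=100, limit=10_000):
    """Raise concurrency step by step; return the last level whose P99 met the SLA.

    Returns None if even the starting level violates the SLA.
    """
    best = None
    users = start
    while users <= limit:
        if measure_p99_ms(users) > sla_ms:
            break  # SLA exceeded: the previous level is the sustainable maximum
        best = users
        users += step
    return best


def fake_p99_ms(users):
    # Hypothetical system: flat 50 ms up to 800 users, then +1 ms per extra user.
    return 50 + max(0, users - 800) * 1.0


capacity = find_max_sustainable_load(fake_p99_ms, sla_ms=200)
print("max sustainable concurrent users:", capacity)
```

A real ramp would also hold each step long enough for the system to reach steady state and would stop on resource saturation as well as on latency.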
Definition: Run tests when core hardware resources (CPU, memory) hit saturation.
Core goal: Observe system stability under extreme resource pressure
Application scenario: Expose latent bugs and extreme risk points
Definition: Simulate massive users simultaneously accessing identical interfaces, modules or database data rows.
Core goal: Discover concurrency-specific hidden defects
Common defects captured: memory leaks, thread deadlocks, database row lock contention
Application scenario: Mid-stage development concurrency risk inspection
Definition: Iteratively adjust hardware and software configuration parameters, measure performance changes and screen optimal resource allocation schemes.
Core goal: Quantify performance gains from parameter adjustments and prioritize high-impact optimization items
Prerequisite: Completed baseline test data for comparative analysis
Application scenario: Infrastructure capacity planning, service parameter fine-tuning
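Comparing each configuration against the baseline comes down to a relative-gain calculation. The sketch below assumes throughput (QPS) is the metric being compared; the configuration names and numbers are placeholders, not measurements.

```python
def rank_by_gain(baseline_qps, candidates):
    """Return (name, percent gain over baseline) pairs, largest gain first."""
    if baseline_qps <= 0:
        raise ValueError("baseline_qps must be positive")
    gains = {
        name: (qps - baseline_qps) * 100 / baseline_qps
        for name, qps in candidates.items()
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results: one baseline run and three candidate configurations.
baseline_qps = 10_000
candidates = {"config_a": 12_500, "config_b": 11_000, "config_c": 9_800}

ranking = rank_by_gain(baseline_qps, candidates)
for name, gain in ranking:
    print(f"{name}: {gain:+.1f}%")
```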
Definition: Run continuous tests under 70%–90% production standard load for multiple consecutive days.
Core goal: Verify long-running service stability
Standard test cycle: 2–3 consecutive days
Key risk signals: Gradually rising latency, continuously fluctuating resource consumption
Application scenario: Pre-launch long-run stability verification
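One simple way to spot the "gradually rising latency" signal is to fit a straight line to periodic latency samples and look at its slope. The sketch below uses an ordinary least-squares slope; the hourly samples and the 1 ms/hour alert threshold are assumptions for illustration.

```python
def trend_slope(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    variance = sum((i - mean_x) ** 2 for i in range(n))
    return covariance / variance


# Hypothetical hourly P99 samples in milliseconds.
stable_p99 = [100, 101, 99, 100, 100]
drifting_p99 = [100, 105, 110, 115, 120]

DRIFT_THRESHOLD_MS_PER_HOUR = 1.0  # assumed alert threshold
stable_drifts = trend_slope(stable_p99) > DRIFT_THRESHOLD_MS_PER_HOUR
drifting_drifts = trend_slope(drifting_p99) > DRIFT_THRESHOLD_MS_PER_HOUR
print(stable_drifts, drifting_drifts)
```

The same function can be applied to memory or connection-count samples to catch slow resource leaks during a multi-day run.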
Definition: Simulate partial service offline faults and measure actual user impact.
Core goal: Confirm available service capacity under partial failure
Deliverables: Document supportable concurrent user volume during faults, standardized emergency response playbooks
Application scenario: Systems with strict zero-downtime SLA requirements
Targeted verification for storage, data transmission and report statistics modules processing massive data records.
Test type divisions are not rigid and isolated. A single multi-day reliability test can integrate endurance, stress and concurrency testing logic. Design test suites around core business objectives instead of rigid classification rules to improve testing efficiency.
The complete standardized process is divided into five sequential phases, each with clear deliverables:
Performance testers work with product managers and R&D engineers to review project documents, analyze the system architecture, and translate vague business demands into measurable, quantitative metrics.
Covers scenario modeling, script development, test environment deployment, test data construction and pre-test environment optimization:
Two parallel core tasks run throughout the test cycle:
If measured metrics fail to meet SLA standards, troubleshoot root causes, implement optimization schemes and re-run verification tests.
Performance anomalies rarely appear independently; surface latency spikes or throughput drops are usually symptoms of upstream link bottlenecks. Full-stack multi-layer monitoring (application, database, OS, network) is required for accurate root cause analysis. All tuning work requires balancing trade-offs across every system layer.
Compile a formal performance test report covering test objectives, final measured metrics, environment configuration, test data rules, discovered defects and optimization solutions. Summarize the core takeaways in the internal knowledge base as a reference for subsequent testing projects.
This chapter provides side-by-side benchmark data for common open-source load testing tools to help engineers choose the right tool for each testing scenario.
Controlled unified environment to eliminate hardware interference in comparison results:
wrk is a lightweight high-performance HTTP benchmark tool optimized for multi-core servers. It relies on high-performance native IO mechanisms (epoll / kqueue) and asynchronous event-driven architecture, generating massive concurrent load with minimal working threads.
Technical background: wrk reuses Redis’s ae asynchronous event loop framework, which originates from the Tcl jim interpreter.
Only supports single-machine execution by default; distributed pressure testing requires secondary customized development with high R&D costs.
Positioning: Not a full-function replacement for JMeter/LoadRunner; best for backend engineers’ quick ad-hoc interface performance verification.
wrk [OPTIONS] URL
Numeric parameters support unit suffixes (1k, 1M, 1G); time parameters support s/m/h units (2s, 2m, 2h).
wrk -c400 -t32 -d30s --latency http://10.60.82.91/
Running 30s test @ http://10.60.82.91/
32 threads and 400 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 10.31ms 40.13ms 690.32ms 98.33%
Req/Sec 2.14k 482.15 6.36k 77.39%
Latency Distribution
50% 5.11ms
75% 7.00ms
90% 9.65ms
99% 212.68ms
2022092 requests in 30.10s, 1.62GB read
Socket errors: connect 0, read 0, write 0, timeout 311
Requests/sec: 67183.02
Transfer/sec: 55.03MB
Key output information breakdown: total test duration, thread & connection quantity, average latency fluctuation distribution, P50/P75/P90/P99 percentile latency, total request volume, total transmission data, error statistics, final QPS and bandwidth throughput.
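When wrk results need to be compared across runs, the summary can be extracted with a few regular expressions. This is a rough sketch written against the sample output shown above, not an official wrk output schema: it assumes the `--latency` distribution is printed in milliseconds (wrk may also print `us` or `s`) and that the field layout matches the sample.

```python
import re

SAMPLE_OUTPUT = """\
  Latency Distribution
     50%    5.11ms
     75%    7.00ms
     90%    9.65ms
     99%  212.68ms
  2022092 requests in 30.10s, 1.62GB read
  Socket errors: connect 0, read 0, write 0, timeout 311
Requests/sec:  67183.02
Transfer/sec:     55.03MB
"""


def parse_wrk_summary(text):
    """Pull percentile latencies, request count, socket errors and QPS out of wrk output."""
    percentiles = {
        int(pct): float(value)
        for pct, value in re.findall(r"^[ \t]*(\d+)%[ \t]+([\d.]+)ms[ \t]*$", text, re.M)
    }
    total = int(re.search(r"(\d+) requests in", text).group(1))
    errors = re.search(
        r"Socket errors: connect (\d+), read (\d+), write (\d+), timeout (\d+)", text
    )
    qps = float(re.search(r"Requests/sec:\s+([\d.]+)", text).group(1))
    return {
        "percentiles_ms": percentiles,
        "total_requests": total,
        # wrk omits the "Socket errors" line when there were none.
        "socket_errors": sum(int(n) for n in errors.groups()) if errors else 0,
        "requests_per_sec": qps,
    }


summary = parse_wrk_summary(SAMPLE_OUTPUT)
print(summary)
```

For the sample run this makes the tail visible at a glance: P90 is under 10 ms, yet P99 exceeds 200 ms and 311 requests timed out.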
JMeter is a Java-based, multi-threaded, open-source load testing tool and one of the most widely used in enterprise settings. Its virtual user (VU) model maps one OS thread to one simulated user.
Critical limitation: a single CPU core can run only one thread at a time. High concurrency therefore triggers frequent thread context switching and heavy resource overhead on the load generator. An excessive VU count turns the load generator itself into the bottleneck and distorts test data.
Locust is a Python-based distributed load testing framework favored by modern R&D teams. Unlike JMeter’s OS thread model, Locust uses gevent coroutines built on libev/libuv event loops to simulate thousands of concurrent users with low resource consumption.
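The coroutine idea can be demonstrated without Locust or gevent installed. The sketch below uses the standard library's `asyncio` as a stand-in, so it is not Locust's actual implementation; it only shows that many simulated users waiting on I/O can share a single OS thread. The user count and think time are arbitrary.

```python
import asyncio
import threading
import time


async def simulated_user(think_time_s, thread_ids):
    # Record which OS thread this simulated user runs on.
    thread_ids.add(threading.get_ident())
    # Stands in for waiting on a network response; yields to other coroutines.
    await asyncio.sleep(think_time_s)


async def run_users(count, think_time_s):
    thread_ids = set()
    started = time.perf_counter()
    await asyncio.gather(
        *(simulated_user(think_time_s, thread_ids) for _ in range(count))
    )
    return time.perf_counter() - started, len(thread_ids)


elapsed_s, threads_used = asyncio.run(run_users(count=1000, think_time_s=0.1))
print(f"1000 users finished in {elapsed_s:.2f}s on {threads_used} thread(s)")
```

Because the waits overlap, the total time stays close to a single think time rather than 1000 of them. The flip side is that all users share one core, so CPU-heavy work in the test script stalls every simulated user at once.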
When the Locust load generator hits CPU saturation, measured latency data will deviate severely from real values. Example: Under identical traffic pressure, saturated Locust may report P90 latency of 340ms, while wrk captures the true latency of only 59.41ms.
Standard HttpUser Script
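from locust import HttpUser, task


class QuickstartUser(HttpUser):
    @task(1)
    def fetch_detail(self):
        # Send a GET request to the system under test.
        self.client.get("http://10.60.82.91/")

    def on_start(self):
        # Runs once per simulated user before its tasks start; no setup needed here.
        pass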
High-throughput FastHttpUser Script
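from locust import task
from locust.contrib.fasthttp import FastHttpUser


class QuickstartUser(FastHttpUser):
    @task(1)
    def fetch_detail(self):
        # Same request as above, sent through the faster geventhttpclient-based client.
        self.client.get("http://10.60.82.91/")

    def on_start(self):
        # Runs once per simulated user before its tasks start; no setup needed here.
        pass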
Single Machine Headless Test
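locust -f load_test.py --host=http://10.60.82.91 --headless -u 10 -r 10 -t 1m
Parameter explanation:
-f: path to the locustfile to run
--host: base URL of the system under test
--headless: run without the web UI (named --no-web before Locust 1.0)
-u: peak number of concurrent simulated users (named -c before Locust 1.0)
-r: spawn rate, the number of users started per second
-t: total run time (for example 1m)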
Distributed Cluster Deployment
# Start master scheduling node
nohup locust -f locust_files/fast_http_user.py --master &
# Start worker pressure generation node
nohup locust -f locust_files/fast_http_user.py --worker --master-host=10.60.82.90 &
8-core load generator, 100 concurrent VUs: QPS = 38,500
Nginx total CPU utilization: 397.3% | JMeter load generator CPU consumption: 681% total