At the heart of TCP networks lies the congestion control algorithm, which essentially focuses on effectively detecting network congestion and maximizing data transmission efficiency using available bandwidth. The diverse nature of the internet has resulted in a multitude of congestion control algorithms. Yet, as research continues to grow, it becomes increasingly evident that there isn't a one-size-fits-all solution that can be applied to every network situation.
Congestion control is a distributed algorithm designed to allocate network resources fairly among multiple competing data flows. Congestion is characterized by the likelihood of packet loss and latency experienced by users due to an overload of network resources, which consequently reduces the network's spatial and temporal multiplexing efficiency.
The complexity of modern networks was not anticipated by the original designers. During the TCP/IP protocol data transmission process, statistics reveal that TCP data flows makeup 80% of network traffic, causing congestion to be a frequent issue for TCP data. The primary cause of congestion is the network's inability to provide sufficient resources to meet user demands, including buffer space, link bandwidth capacity, and processing capabilities of intermediate nodes. The internet's design mechanism, which allows anyone to share network resources at any time, lacks "admission control." As a result, when network resources are inadequate, it cannot limit the number of users and can only continue to serve them by reducing the quality of service – this is known as the "best-effort" service.
TCP congestion backoff comprises four fundamental algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery.
The main idea of this algorithm is to cautiously gauge the network's congestion level. Instead of sending a large amount of data right away, the algorithm gradually increases the size of the congestion window exponentially.
Initially, cwnd = 10
After 1 RTT, cwnd = 20 * 1 = 20
After 2 RTTs, cwnd = 20 * 2 = 40
After 3 RTTs, cwnd = 20 * 4 = 80
If the bandwidth is W, then it will take RTT * log2W time to fully utilize the available bandwidth.
The slow start approach mentioned earlier requires control to prevent inevitable network congestion. As cwnd grows rapidly, it quickly consumes network resources, but it cannot grow indefinitely; there must be a limit. TCP employs a slow start threshold (ssthresh) variable, and when cwnd exceeds this value, the slow start phase ends, transitioning to the congestion avoidance phase. In most TCP implementations, the ssthresh value is 65535 (also calculated in 16 bits). Congestion avoidance aims to switch from exponential growth to additive linear growth, preventing network congestion caused by overly rapid growth and gradually adjusting to the network's optimal value. The specific usage of ssthresh is as follows:
- When cwnd < ssthresh, use the slow start algorithm.
- When cwnd > ssthresh, switch to the congestion avoidance algorithm.
- When cwnd = ssthresh, either the slow start or congestion avoidance algorithm can be used.
Note: If the current cwnd reaches the slow start threshold, a segment is tentatively sent. If the server does not respond, TCP assumes that the network capacity has decreased, and the slow start threshold must be reduced. Simultaneously, to prevent further deterioration, the window may be reduced to 1.
The RTO timeout retransmission can result in low TCP transmission efficiency. Modern Linux TCP implementations utilize the ACK or SACK mechanism for cumulative confirmation of data packets, predicting packet loss, and enabling fast retransmission. The fast retransmission algorithm states that the sender should immediately resend the unacknowledged packet upon receiving three consecutive duplicate acknowledgments, rather than waiting for the retransmission timer to expire. The current Linux TCP fast retransmission algorithm includes not only the classic three ACKs but also the ER, FACK, and the latest time-sequence-based RACK packet loss fast retransmission algorithms, which have undergone numerous improvements.
Fast recovery is not an isolated process; rather, it is a follow-up action to fast retransmission. It is commonly believed that after the client receives 3 ACKs, it triggers fast retransmission. However, what happens if there are more duplicate ACKs? This is where fast recovery comes into play.
a. When the sender receives three consecutive duplicate acknowledgements, it performs the "multiplicative decrease" algorithm, reducing the ssthresh threshold by half (i.e., cwnd = ssthresh / 2). However, it does not initiate the slow start algorithm afterward;
b. Given that at this point, three ACKs can be continuously received, suggesting that there is no network congestion, the additive principle is applied. The number of segments of bytes is added based on the number of ACKs, or cwnd can be set to ssthresh, directly entering the congestion avoidance algorithm. It is worth noting that the TCP fast recovery implemented in Linux has been enhanced by Google, introducing a PRR fast recovery algorithm, primarily for faster fast recovery and better adaptation to modern networks.
Effectively detecting network congestion and maximizing network bandwidth utilization for data transmission are at the core of congestion control algorithms in modern networks. However, with the increasing complexity of today's networks, static congestion control algorithms are no longer suitable for these ever-changing environments and cannot be applied universally across all network scenarios. Here are two examples:
Bufferbloat issue: In the last mile's core router, a large number of data packets are queued and cached, resulting in an unusually long RTT for network transmission. The standard TCP protocol's congestion control algorithm fills the buffer at the bottleneck of the link, causing congestion signals to take a long time to provide feedback to the sender, which in turn prevents the sender from discovering packet loss promptly.
In wireless and mobile networks, a significant amount of out-of-order and link packet loss can cause the network congestion control algorithm to mistakenly believe there is network congestion. This results in conservative behavior by the sender, preventing effective utilization of the transmission bandwidth.
However, the challenges faced by TCP network congestion control extend far beyond these two examples. The following are some of the dilemmas encountered by current TCP congestion control.
Heterogeneity refers to the fact that TCP networks consist of various subnetworks, each containing different sub-topologies. This leads to different networks having distinct link characteristics. From a bandwidth standpoint, link quality can range from very slow, such as in mobile networks, to high-speed links, like those in core networks, with orders of magnitude ranging from kbps to G/s. From a latency standpoint, scenarios can vary from local network communication to satellite communication, with orders of magnitude ranging from ms to s. As networks rapidly evolve, this results in a growing complexity and diversity of realistic network scenarios (combinations of network bandwidth and end-to-end latency).
However, in networks, both bandwidth and latency are not constant. Changes in bandwidth and latency can be caused by varying levels of data flow competition, dynamic routing, and other factors.
Modern congestion control must effectively address this diversity. The congestion control principles introduced by Van Jacobson are primarily based on static scenarios and designed for a BDP (bandwidth-delay-product) scenario involving tens of packets. As networks have evolved, this congestion control algorithm has become increasingly difficult to adapt to the current network's BDP and changes. In many scenarios, the congestion control algorithm cannot provide ideal responses, resulting in low resource utilization.
The Additive Increase Multiplicative Decrease (AIMD) based congestion avoidance algorithm is too conservative for current networks and cannot be applied to the diverse scenarios present today. In some specific scenarios, congestion control parameters can be adjusted for adaptation. However, a fundamental challenge remains: whether a congestion control mechanism can be defined that performs well in a range of scenarios and maximally adapts to various congestion states on the network.
Evaluating the stability of congestion control algorithms in real network environments with varying packet sizes and network latency RTTs can be quite challenging. As a result, it is often more practical to assess these algorithms in a simulated environment. If an algorithm fails to maintain stability in a simple simulated environment, likely, it will also be unstable in real-world networks. However, it is important to note that simulated environments might not account for all the complexities of real networks, and an algorithm may still exhibit instability when faced with more intricate network conditions.
There are two key factors contributing to TCP stability:
1. The Additive Increase, Multiplicative Decrease (AIMD) congestion avoidance mechanism during periods of congestion. This factor stems from a basic vector analysis of two control endpoints with different time delays sharing a resource. The AIMD congestion avoidance algorithm has been in use since the 1980s and continues to be employed today.
2. The principle of packet conservation. The TCP sender's congestion control operation is driven or "controlled" by the receipt of ACKs (acknowledgments). When TCP transmission is in a stable state (with an appropriate cwnd value), receiving an ACK indicates that the sent data packet has been successfully received, allowing the sending operation to continue, similar to a Markov chain. Following this logic, the TCP congestion behavior in a stable state aims to maintain a constant number of data packets on the network transmission path. This is based on the "pipe model" and the principle of "packet conservation," which means that the number of data packets transmitted in the network simultaneously remains constant, and new data packets can only enter the network when old data packets have left. If the sender receives an ACK, it is assumed that a data packet has exited the network, and the congestion window is increased by 1. Adhering to the principle of packet conservation can help minimize network congestion. Essentially, the goal of congestion control is to identify and rectify instances where this principle is not being followed. However, many so-called optimizations are overly aggressive and do not adhere to the packet conservation principle, leading to increased network congestion.
With numerous congestion control algorithms available today, ensuring fairness among these algorithms across all networks is a critical issue that must be addressed before promoting their use. For instance, when two identical data streams vie for a bandwidth-limited channel, they should be able to share the resource fairly without any disparities. If a congestion control algorithm fails to guarantee the fairness of data streams, it could lead to aggressive network traffic and create a situation where gains in one area come at the expense of another. Thus, it is essential to evaluate the fairness of congestion control algorithms to maintain a balanced and efficient network environment.
Incorporating congestion control support into intermediate network devices would allow senders and receivers to adjust their congestion control algorithms based on explicit network feedback. This would enable connection endpoints to better understand the network characteristics of their current link and make more accurate congestion control decisions based on actual network congestion.
However, this issue primarily concerns hardware manufacturers. There are numerous challenges associated with requiring network support for congestion control, such as information collection, network state storage, feedback signal return frequency, performance, and robustness. If hardware support is necessary, promoting this feature becomes more complicated, as many home routers may not prioritize the inclusion of congestion control support. Therefore, it is crucial to weigh the benefits and challenges of requiring network support for congestion control to ensure its effective implementation and widespread adoption.
Popular congestion control algorithms like Reno and Cubic rely on packet loss detection for managing congestion. While router queue overflow can indeed result in packet loss, there are other potential causes, such as Corruption Loss, which is becoming increasingly common in wireless networks. In these situations, traditional packet loss-based congestion control mechanisms may not be the best fit. Consequently, optimizing the TCP protocol stack for wireless and satellite communications has become a focal point. To address Corruption Loss in designing congestion control algorithms, two key questions must be considered:
1. How can corruption be detected?
2. What should be the appropriate response?
Essentially, this involves determining how to identify Corruption and how to react to it. In TCP Westwood, the congestion control window is adjusted based on three DupACKs or RTO timeout, setting it to the measured bandwidth. This bandwidth measurement is performed using ACK counting frequency. This approach can provide effective throughput in the presence of noisy signals, rather than blindly reducing the window by half, as standard TCP does.
The TCP congestion control algorithm based on packet loss is based on the counting of packet numbers, regardless of packet size. The most famous performance formula for TCP's congestion avoidance algorithm is the square root formula, which is simplified here to represent the throughput of TCP. The throughput B of TCP is directly proportional to RTT and inversely proportional to the square root of the probability of packet loss:
B ~ C*S/(RTT*sqrt(p))
where
C is a constant
S is the TCP segment size (in bytes)
RTT is the end-to-end round-trip time of the TCP
connection (in seconds)
p is the packet drop probability
However, this formula is derived theoretically, and in practice, the actual network is much more complex. Moreover, many times RTT and packet loss are not completely accurately measurable.
When a new TCP connection is established, it's not aware of the available bandwidth and round-trip time (RTT) of the link. To manage the flow of data transmission, TCP connections use a process called slow start, which results in a longer time to find the optimal bandwidth-delay product (BDP). Furthermore, the exponential growth during slow start may lead to a significant number of packet losses when the congestion control window is large (known as slow start packet overshoot).
Additionally, slow start doesn't ensure fair bandwidth allocation for multiple data streams. For instance, a long-lived data stream tends to be more aggressive and consume more network resources compared to a slow-starting data stream. To address this issue, techniques based on bandwidth estimation can be employed, and congestion control algorithms using delay or rate stepping can provide a more equitable distribution of data stream resources.
Various elastic data applications require different levels of network traffic, such as short-term elastic services like HTTP and instant messaging, or long-term services for transferring large files. An effective congestion control scheme should be able to differentiate between these varying traffic demands and allocate appropriate bandwidth quotas to data streams based on their priority levels. To manage the relationship between traffic class priority and congestion control, one strategy is to utilize congestion signals, such as loss or explicit notifications. Another approach involves using Active Queue Management (AQM) mechanisms, where higher-priority traffic is granted higher scheduling priority and experiences reduced congestion levels in a separate queue.
The network is an autonomous system, and in the current Internet architecture, congestion control depends on individual interests. Nowadays, more and more Internet traffic comes from applications designed not to use congestion control schemes, such as using UDP, which can lead to Internet congestion collapse. For example, current Internet-based video-on-demand services require the transmission of larger data rates, and not using congestion control is becoming more common. The demand for bandwidth leads to networks not using congestion control, but at the same time, it may also be exploited for traffic attacks.
Estimating Round Trip Time (RTT) is a crucial parameter for TCP connections. Various TCP timer mechanisms, including RTO timeout retransmission, rely on the RTT value for adjustments. There are two primary techniques for measuring RTT:
1. Active testing: This method involves sending a special probe packet into the network and measuring the time it takes to receive feedback, such as using ICMP.
2. Passive testing: This approach utilizes the client's ACK and applies filters to smooth the RTT for estimation. The current Linux kernel employs passive testing.
Some challenges associated with RTT measurement include:
- Determining the optimal frequency for RTT measurement
- Designing effective RTT filters
- Establishing default RTT values for failed feedback signals
- Clock granularity: RTT estimation depends on the granularity of the clock protocol stack.
TCP network paths are diverse, and efficiently utilizing multiple path links to maximize bandwidth usage is advantageous.
Open Research Issues in Internet Congestion Control
WeTest Quality Open Platform is the official one-stop testing service platform for game developers. We are a dedicated team of experts with more than ten years of experience in quality management. We are committed to the highest quality standards of game development and product quality and tested over 1,000 games.
WeTest integrates cutting-edge tools such as automated testing, compatibility testing, functionality testing, remote device, performance testing, and security testing, covering all testing stages of games throughout their entire life cycle.
Give it a try for free today. Start Trial!