Jan. 27, 2011, 1:40 a.m.
posted by mp
The implementation of QoS ultimately relies on a collection of traffic-management mechanisms. The evolution of packet switched networks introduced the need for the development of suitable traffic-management mechanisms. These mechanisms help network nodes avoid and manage congestion. Frame Relay and, especially, ATM networks make use of some of these or similar techniques. This section summarizes the key concepts behind traffic management within the context of IP and MPLS networks. These mechanisms represent the building blocks that network nodes use to implement QoS using both the DiffServ and IntServ architectures.
Network nodes generally perform traffic classification before applying traffic-management mechanisms. The aggregate traffic that traverses a node typically combines traffic with different QoS requirements. In those cases, network nodes need to classify the traffic to provide the expected level of differentiation.
Traffic classification can be in the form of an MF or BA classifier in the context of DiffServ. Classification can also be in the form of filter specification in IntServ. Traffic classification typically involves stateless inspection of packet headers. In some cases, nodes perform packet payload inspection or even stateful inspection of packets. However, the complexity and processing impact of those two types of classification limits its applicability to particular devices in the network.
Packet marking involves assigning a new value to a QoS-related field in the header of a packet. This marking typically associates the packet with a class or a drop precedence. As discussed earlier, the DiffServ architecture relies on packet marking to indicate the PHB for each packet. Several Layer 2 technologies also make use of packet marking for QoS purposes, as follows:
Figure summarizes the most common fields used for packet marking.
The term drop precedence is synonymous with drop priority
Traffic policing is a commonly used scheme for rate control. In different situations, a network node might need to control the amount of a particular traffic stream. A policer measures the traffic and compares the measurement with a predefined traffic profile. The comparison result determines the action the policer takes on the packet. The three main actions are transmitting, marking, or dropping the packet. The marking action indicates implicitly that the node will transmit the packet after marking it.
Policing is essential to the traffic-conditioning function in DiffServ. It is not exclusive to this architecture. Many technologies make use of traffic policing (for example, ATM and Frame Relay). In general, traffic policing is a popular mechanism at boundaries between administrative domains.
Policers generally use a token bucket as the traffic profile description. A token bucket has two parameters: token rate and bucket size. The token rate specifies the rate at which new tokens arrive. The bucket size defines the highest number of tokens that the bucket can accumulate. The algorithm is simple: The policer constantly adds tokens to the bucket at the token rate. The policer checks whether the bucket holds B tokens when a packet of size B arrives. A positive or negative result triggers two different actions. If the result is positive (the bucket holds B tokens or more), the policer removes B tokens and executes a predetermined action. Otherwise, the policer executes an alternative action. Figure shows three different traffic streams with the same average rate (10 Mbps) but with different token bucket definitions.
Three Traffic Streams with the Same Average Rate but Different Token Bucket Descriptions
RFC 2697 defines a single-rate policer for DiffServ. This policer uses two token buckets, C and E. Tokens arrive at bucket C at a committed information rate (CIR). Tokens over flow into bucket E when bucket C is full. The policer can operate in two modes: color blind and color aware. In color-blind mode, the policer does not consider the current packet marking. In color-aware mode, the existing packet marking influences the operation of the policer. The policer configuration identifies the traffic colors (that is, markings) as conform or exceed.
Figure shows the token buckets and the two modes of operation. This policer can trigger three different actions: conform, exceed, and violate. Each of those actions can be transmitting, marking, or dropping the packet.
Single-Rate Policer Defined in RFC 2697
RFC 2698 proposes a similar dual-rate policer for DiffServ. This policer also uses two token buckets, C and P. Tokens arrive at bucket P at a peak information rate (PIR). Similarly, tokens arrive at bucket C at a CIR. Tokens do not over flow between buckets in this case. The policer can also operate in a color-blind or color-aware mode. These two modes have the same characteristics described earlier.
Figure illustrates the token bucket structure and the policer operation in its two modes. Similar to the single-rate policer described earlier, this policer can trigger three actions: conform, exceed, and violate. Again, each of those actions can be transmitting, marking, or dropping the packet.
Dual-Rate Policer Defined in RFC 2698
Shaping is another commonly used mechanism for rate control. Similar to a policer, a shaper measures traffic and compares the measurement with a profile. In this case, the comparison result determines whether the shaper should delay the packet or permit further processing. Therefore, shaping requires the buffering or queuing of packets that exceed the profile. Shaping allows a node to absorb bufirsts in a traffic stream and smooth these bufirsts by serving the stream according to the profile. Shaping might result in packet loss if the traffic stream drastically exceeds the profile or might not smooth the traffic if the stream never exceeds the profile. Shaping is also essential to traffic-conditioning in DiffServ. Figure shows the smoothing effect of a (10-Mbps) shaper on a sample traffic stream.
Effect of Shaping on a Sample Traffic Stream
Shaping can also make use of a token bucket as a traffic profile. The token bucket algorithm remains the same: Tokens go into the bucket at the token rate. The bucket size defines the maximum number of tokens that the bucket can hold. The shaper checks whether B tokens are available in the bucket when a packet of size B arrives. If the current token count is greater than or equal to B tokens, the shaper serves the packet and removes B tokens from the bucket. If less than B tokens are in the bucket, the shaper queues the packet. If the packet encounters a queue when arriving at the shaper, it waits its turn at the end of the queue. The smaller the bucket size, the smoother the shaper output is. Figure shows a packet shaper using a token bucket.
Example of a Packet Shaper Using a Token Bucket
Buffer allocation and traffic scheduling are two popular mechanisms for congestion management. Suppose, for instance, that a node switches traffic to an interface at a rate that exceeds the capacity of that interface at that specific time. When congestion occurs, the interface manages the excess traffic by buffering or queuing the traffic. A scheduler decides how to serve the queue (that is, it decides which packet to transmit next). A node may create multiple queues at a given congestion point. Each queue can receive a different allocation of buffers and bandwidth. This resource allocation, along with the scheduling discipline between queues, provides different latency, jitter, and loss characteristics for the traffic in the different queues.
Congestion may happen at different points within a network node. Each congestion point represents a potential source of delay, jitter, and loss for traffic streams. The most common points subject to congestion are output interfaces. However, nodes do experience congestion at other points. For example, nodes with a distributed architecture might face congestion at the interface between a line card and the switch fabric. Other service modules (for example, encryptors and compressors) can also experience congestion. The exact list of points subject to a relevant level of congestion depends on the device architecture and the traffic patterns across the device.
A simple approach to congestion management uses a single queue with a first-in, first-out(FIFO) scheduler. With a FIFO queue, an interface transmits packets in the order they arrive. New packets are appended to the end of the queue. The queue grows as long as packets arrive faster than the interface can send them. At some point, the queue might exceed the maximum buffering capacity (in which case, the interface needs to discard a packet). A common policy is to drop the arriving new packets in favor of those packets already waiting in the queue. This dropping policy receives the name of tail drop. In this case, the maximum queue size limits the maximum queuing delay that a packet will experience on this interface. Tail drop is one of several mechanisms that could be used. The next section discusses other methods of managing queue size.
Weighted fair queuing (WFQ) is a popular scheduler for fair bandwidth sharing. A simple scheduler that served nonempty queues in a round-robin order can provide bandwidth sharing. However, bandwidth distribution would not be fair because those queues with larger packets would eventually get a larger share. A bit-by-bit round-robin scheduler would be fair to packets of all sizes, but nodes transmit entire packets at a time and not individual bits. WFQ approximates the ideal behavior by computing the departure time of the packet as if a bit-by-bit round-robin scheduler were in use. The scheduler selects packets for transmission in the order of their departure times. A weight associated with queues can scale the departure time and ultimately scales the bandwidth share for the queue.
Figure illustrates the operation of WFQ with three sample queues. The packet number identifies the order in which each packet arrived. The scheduler orders packets by their departure time and serves them in that order. In this case, the first queue has a weight two times the weight of the other two queues. The weight has the effect of reducing the departure time in half for packets in that queue. This weight provides that queue a guarantee of half the bandwidth in the worst case. The scheduler still guarantees a quarter of the bandwidth to each of the other two queues. Queues may get more than their fair share of the band width if other queues are empty. The bottom part of the figure shows the transmission order that WFQ creates in comparison with a FIFO scheduler.
WFQ Scheduler with Three Sample Queues
Deficit round robin (DRR) is an alternative scheduler that can provide bandwidth sharing. Similarly to WFQ, DRR provides fair bandwidth sharing for variable-length packets and a weighting mechanism to influence bandwidth distribution. However, this scheduler results in a simpler hardware implementation that facilitates its use at higher speeds. The scheduler maintains a deficit (byte) counter for all nonempty queues. During each round, the scheduler selects the next nonempty queue, adds a quantum (byte) value to the queue deficit counter, and serves as many packets as the result permits. The deficit counter receives any remainder after serving the queue. The scheduler can scale the bandwidth share of a queue by scaling the quantum value for that queue.
Figure shows the operation of DRR with three sample queues. As in the previous figure, the packet number identifies the order in which each packet arrived. The first queue uses a quantum of 600, which is two times the quantum of 300 used for the other two queues. That configuration guarantees that the first queue gets half of the bandwidth in the worst case. Each of the other two queues gets a quarter of the bandwidth in the worst case. As with WFQ, queues may get more than their fair share of the bandwidth if other queues are empty. The scheduler uses the packet size and the deficit counter to determine the next packet to transmit in each round. The bottom part of the figure shows the transmission order that DRR generates in comparison with a FIFO scheduler.
DRR Scheduler with Three Sample Queues
Schedulers operate independently from the classification process that selects a queue for arriving packets. The granularity of that classification process and its corresponding number of queues can vary greatly. Classifying and queuing microflows separately provides a high level of isolation; as the number of microflows increases, however, the processing complexity of the scheduler increases. A microflow hashing function with a fixed number of buckets reduces this complexity with a lower level of isolation between individual microflows. A scheme based on DiffServ BAs reduces the processing complexity even further while providing coarse traffic isolation.
Actual scheduler implementations regularly include the ability to serve some traffic with strict priority. Schedulers such as WFQ or DRR can provide bandwidth guarantees and latency bounds for each queue. However, they cannot guarantee low latency in general terms. A hybrid scheduler can compensate for this limitation with the addition of at least one priority queue. The scheduler serves this queue with strict priority until the queue becomes empty. Only at that point does the scheduler serve other queues. The priority queue does not preempt the transmission of a nonpriority packet, but the presence of a single packet in the priority queue forces the scheduler to serve that queue next.
Active Queue Management
Network nodes need to manage queues to control their length. Active queue management consists in dropping or marking packets before a queue becomes full and is independent of the type of scheduler serving the queue. Dropping and in some cases marking play an important role in signaling congestion to TCP sources, which consume most of the bandwidth in current networks. How a node indicates congestion to traffic sources significantly impacts the delay and throughput that TCP sessions experience. How UDP sessions react to drops is application specific because the protocol does not provide a built-in congestion-control. The simplest form of queue management drops new packets that arrive to a full queue. This approach, called tail dropping, uses a fixed maximum queue size. Tail dropping provides limited packet differentiation and can have suboptimal effects on TCP congestion control.
Random early detection (RED) is a popular mechanism for active queue management. RED improves fairness among TCP sessions and can prevent TCP global synchronization. When using RED, nodes drop arriving packets with a probability (p) that is a function of the average queue depth. Figure shows the drop probability function. Nodes drop packets with a probability that grows linearly from zero to a maximum within a range of average queue depth values. This range is defined by a minimum threshold (minTh) and a maximum threshold (maxTh). The packet drop probability remains zero below the minimum threshold. When the average queue depth exceeds the maximum threshold, the drop probability becomes one, and the node drops all new arriving packets. RFC 2309 contains recommendations for active queue management.
Drop Probability Function for RED
RED relies on the computation of the average queue depth. The use of the average depth rather than the instantaneous depth allows the node to accommodate traffic bursts but react to persistent congestion. RED uses a moving average as a simple approximation of the actual average depth. This moving average is computed with the following equation:
a i = (1w) a i1wq i
Figure illustrates the impact that different averaging weights depth have on the average queue depth estimation. The average is smoother with lower weight values. The higher the weight, the closer the average queue depth tracks the instantaneous size. A weight value of one produces an estimated average queue depth that equals the instantaneous queue.
RED Average Queue Depth Estimation with Different Weights
A network node can use multiple RED profiles to perform active queue management using different probability functions. This particular use of RED receives the name of weighted RED (WRED). The word weight refers to the drop profile selection mechanism. The network node uses a packet marking (for example, IP DSCP or MPLS EXP) for the drop profile selection. WRED can use that drop precedence of a packet to apply different WRED profiles. This drop precedence can be the result of traffic policing marking some packets differently based on whether they exceeded a traffic profile. Figures 1-24 shows three sample drop profiles that a node could use for queue management. The definition of each WRED profile is independent from the others. For example, they may overlap or have different slopes. However, network nodes should use a more aggressive profile for packets with a higher drop precedence.
Sample WRED Design with Three Drop Profiles
Active queue management can use packet marking as an alternative to packet dropping. A node can use packet marking for congestion notification. Although packet dropping is an acceptable congestion-notification mechanism for long-lived high-throughput TCP sessions, short-lived interactive TCP sessions react better to explicit congestion notification(ECN). RFC 3168 defines ECN for IP. MPLS has no standard congestion-notification specification. Definition of MPLS ECN would probably require using a bit in the MPLS EXP field. This approach would reduce the number of PHBs that an E-LSP could transport. Figure shows the location of the ECN field in the IP header.
ECN Field in IP Header as Defined in RFC 3168
Link Fragmentation and Interleaving
A network node may require performing link fragmentation and interleaving (LFI) to reduce latency for priority traffic. Schedulers that use a priority queue do not preempt the current transmission of nonpriority packets. A priority packet may arrive right after the transmission of a nonpriority packet starts. In that case the priority packet must wait for the complete transmission of the nonpriority packet before the scheduler serves the priority packet. Depending on packet sizes and transmission rates, priority packets might experience an unacceptable amount of latency. Fragmentation of large nonpriority packets and interleaving of fragments with priority packets bounds the latency for priority traffic. Figure shows the latency that different packet sizes create at various rates.
LFI uses Layer 2 fragmentation mechanisms that the network node applies to influence packet scheduling. Layer 2 fragmentation is a more generic and efficient approach than other alternatives (for example, IP fragmentation). The most common fragmentation mechanism is Multilink PPP (MLP). It can be used on PPP, Frame Relay, and ATM links. Frame Relay links can also use native fragmentation mechanisms (FRF.11 and FRF.12).
Figure demonstrates how link fragmentation integrates with the packet scheduler managing congestion. Notice that the scheduler continues to serve individual packets and fragmentation and interleaving happens after the scheduling of nonpriority packets. This approach prevents interleaving of nonpriority traffic and simplifies the reassembly of packets at the receiving side of the link.
Congestion Management with LFI
Header compression provides higher bandwidth efficiency and reduces transmission latency. Real-time traffic generally relies on the Real-Time Transport Protocol (RTP). These traffic streams also receive an UDP and IP encapsulation. Depending on the type of real-time traffic, header information can be larger than the actual real-time payload. To reduce the overhead, a node can use RTP header compression (cRTP) to compact IP, UDP, and RTP headers. The remote end of the link reconstructs the headers before continuing packet forwarding. The reduction in packet size saves link bandwidth and reduces the transmission latency due to the node having to transmit fewer bits to forward the RTP packet. RFC 2508 defines cRTP. The IETF is working in the definition of cRTP operation over MPLS.
Figure illustrates the effect of cRTP on a sample RTP packet. In this case, the original RTP packet consists of 40 bytes of header information and 20 bits of RTP payload. The packet overhead accounts for two thirds of the total packet size. For this packet, cRTP converts the header information into 3 bytes. After compression, the packet overhead accounts for slightly higher than one third of the packet. The smaller packet size can also save a few milliseconds of transmission latency in low-speed links. The size of the compressed headers can actually range between 2 and 4 bytes depending on the contents of the original headers.
Example of the Effect of RTP Header Compression on the Overall Packet Size