Explicit Path Control in Commodity Data Centers: Design and Applications
1. What is XPath?
XPath is a simple and practical design for explicit path control in commodity data centers. XPath explicitly identifies an end-to-end path with a path (or tunnel) ID and leverages a two-step compression algorithm to pre-install all the desired paths into IP tables of commodity switches.
2. Why Explicit Routing in Data Center Networks?
Data center networks (DCNs) are often designed with multiple paths between any two nodes. Equal Cost Multi-Path routing (ECMP) is the state-of-the-art for multi-path routing and load-balancing in DCNs. In ECMP, a switch locally decides the next hop from multiple equal cost paths by calculating a hash value, typically from the source and destination IP addresses and transport port numbers. Applications therefore cannot explicitly control the routing path in DCNs. However, many emerging DCN applications such as provisioned IOPS, fine-grained flow scheduling, bandwidth guarantee, etc., all require explicit routing path control over the underlying topologies.
3. XPath Overview
As shown in fiugre 1, to leverage XPath for explicit path control, we can employ a logically centralized controller, called XPath manager, to control the network. The XPath manager has three main modules: routing table computation, path ID resolution, and failure handling. Servers have client modules for path ID resolution and failure handling. Note that XPath manager doesn't have to be implemented in a centralized way.
Routing table computation: This module is the heart of XPath. We design a two-step compression algorithm: paths to path sets aggregation (in order to reduce unique path IDs) and ID assignment for prefix aggregation (in order to reduce IP prefix based routing entries).
Path ID resolution: In XPath, path IDs (in the format of 32-bit IP, or called routing IPs3) are used for routing to a destination, whereas the server has its own IP for applications. This entails path ID resolution which translates application IPs to path IDs.
Failure handling: Upon a link failure, the detecting devices will inform the XPath manager. Then the XPath manager will in turn identify the affected paths and update the IP-to-ID table (i.e., disable the affected paths) to ensure that it will not return a failed path to a source that performs path ID resolution.
4. Compression Algorithm
One cannot enumerate all possible paths in a DCN as the number can be extremely large. However, we observe that DCNs do not intend to us all possible paths but a set of desired paths that are sufficient to exploit the topology redundancy. Even though, the challenge is that the sheer number of desired paths in large DCNs is still large, e.g., a Fattree (k = 64) has over 232 paths among ToRs (Top-of-Rack switches), exceeding the size of IP table with 144K entries, by many magnitudes. To tackle the above challenge, we introduce a two-step compression to encode all the desired paths into IP tables with limited routing entries.
Step I: paths to path sets aggregation. Based on the relations of paths, we mainly use two basic approaches for paths to path sets aggregation: convergent paths first approach (CPF) and disjoint paths first approach (DPF). In practice, users may have their own preferences to define customized path sets for different purposes. Based on this observation, we category our step I compression into two types: 1) general aggregation without knowing the DCN topology and user preferences; 2) leveraging the DCN topology and user preference to guide the aggregation.
Step II: ID assignment for prefix aggregation. By assigning the IDs of paths that traverse the same egress port consecutively, these IDs can be expressed using one entry through prefix aggregation. Our goal is to assign (or order) the path IDs in such a way that they can be best aggregated using prefixes among all switches. To reduce the runtime of our code, we introduce equivalence reduction to improve the time efficiency of our prefix aggregation algorithm.
Remark: The algorithm mentioned above is designed for arbitrary DCN topologies. However, specially for tree-based DCN topologies, we find that path compression can be done much more efficiently. We are currently working on another algorithm which aims to leverage the characteristics of tree-based topologies to better compress the desired paths.
We have implemented XPath on both Windows and Linux platforms, and deployed it on a 54-server Fattree testbed with commodity switches for experiments. In what follows, we describes the implementation on Linux. The implementation on Windows is similar.
Path ID resolution: To enable path ID resolution, we implemented a XPath Kernel Module on the end server, and a module on the XPath manager. The end server XPath module queries the XPath manager to obtain the updated path IDs for a destination. The XPath manager returns the path IDs by indexing the IP-to-ID mapping table. From the path IDs in the query response, the source selects one for the current flow, and caches all (with a timeout) for subsequent communications. As shown in Figure 2, in our implementation, the XPath Kernel Module on end servers consists of two parts: a Netfilter Hook and a Path Selector. The XPath Kernel Module is between the TCP/IP stack and the Linux Traffic Control. We use the Netfilter Hook to parse the incoming/outgoing packets, and to intercept the packets that XPath is interested in. The Path Selector is responsible for path selection and packet header modification (IP translation for Linux platform, and IP-in-IP encapsulation or decapsulation for Windows platform).
Failure handling: when a link fails, the devices on the failed link will notify the XPath manager. In our implementation, the communication channel for such notification is out-of-band. The path IDs for a destination server are distributed using a query-response based model. After the XPath manager obtains the updated link status, it may remove the affected paths or add the recovered paths, and respond to any later query with the updated paths. A key benefit of XPath is that it does not require route re-convergence and is loop-free during failure handling. This is because XPath pre-installs the backup paths and there is no need to do table re-computation unless all backup paths are down.
6. XPath Applications
We use two utility cases to show that applications can benefit from the explicit path control of XPath. More cases can be found in our paper.
- XPath for provisioned IOPS: In cloud services, there is an increasing need for provisioned IOPS. For example, Amazon EBS enforces provisioned IOPS for instances to ensure that disk resources can be accessed with high and consistent I/O performance whenever you need them. To enforce such provisioned IOPS, it should first provide necessary bandwidth for the instances. In this experiment, we show XPath can be easily leveraged to use the explicit path with necessary bandwidth. As shown in Fig. 3(a), we use background UDP flows to stature the ToR-Agg links and leave the remaining bandwidth on 3 paths (P1; P2 and P3) between X-Y as 300Mpbs, 100Mbps, and 100Mbps respectively. Suppose there is a request for provisioned IOPS that requires 500Mbps necessary bandwidth (The provisioned IOPS is about 15000 and the chunk size is 4KB.). We now leverage XPath and ECMP to write 15GB data (about 4 million chunks) through 30 flows from X to Y, and measure the achieved IOPS respectively. From Fig. 3(c), it can be seen that using ECMP we cannot provide the necessary bandwidth between X-Y for the provisioned IOPS although the physical capacity is there. In contrast, XPath can be easily leveraged to provide the required bandwidth due to its explicit path control. Thus, the actual achieved IOPS under ECMP is only 4547, while with XPath, we explicitly control how to use the three paths and accurately provide 500Mbps necessary bandwidth, achieving 15274 IOPS.
- XPath for network updating: In production data centers, DCN update occurs frequently. It can be triggered by the operators, applications and various networking failures. zUpdate is an application that aims to perform congestion-free network-wide traffic migration during DCN updates with zero loss and zero human effort. In order to achieve its goal, zUpdate requires explicit routing path control over the underlying DCNs. In this experiment, we show how XPath assists zUpdate to accomplish DCN update and use a switch firmware upgrade example to show how traffic migration is conducted with XPath. In Fig. 4(a), initially we assume 4 flows (f1; f2; f3 and f4) on three paths (P1; P2 and P3). Then we move f1 away from switch A1 to do a firmware upgrade for switch A1. However, neither P2 nor P3 has enough spare bandwidth to accommodate f1 at this point of time. Therefore we need to move f3 from P2 to P3 in advance. Finally, after the completion of firmware upgrade, we move all the flows back to original paths. We leverage XPath to implement the whole movement. In Fig. 4(b), we depict the link utilization dynamics.
- Explicit Path Control in Commodity Data Centers: Design and Applications. Shuihai Hu, Kai Chen, Haitao Wu, Wei Bai, Chang Lan, Hao Wang, Hongze Zhao, Chuanxiong Guo In Proceedings of USENIX NSDI 2015
8. Source code
- Compression algorithm: https://github.com/shuihaihu/XPath. Currently, the code is not well commented and documented. A better version is coming soon.
- Implementation on Linux platform: https://github.com/baiwei0427/XPath.