Next: 4.2 Reliable Transmission Protocol Up: 4. Congestive Loss on Previous: 4. Congestive Loss on Contents

4.1 Reliability

The development of lightweight messaging systems is to support low-latency communication on high-performance computer. These systems aim at delivering the best communication performance to the application layers. However, subtle issues in design and implementation of these communication libraries could impact on their usability and the ultimate performance available to the application level. One important aspect that most of these lightweight messaging systems have trade-off for performance is the reliability issue, which is a critical issue that affects the final performance delivered to the real applications.

Most clusters use standard system software to manage message passing. Particularly, TCP/IP protocol suite is the most commonly used communication protocol found in system software that delivers reliability to the application layer. However, TCP was initially designed to run on unreliable, wide-area network, and therefore, it is not optimized for the high-performance domain. Many performance studies on TCP/IP implementations [24,54,55] have reported on the high software overheads associated to those supporting functions of this protocol stack, e.g. data movement and buffer management overheads. They showed that these software overheads are the major hindrance to the achievable performance, especially when most of the data traffics are small size messages. As a result, all low-latency communication libraries have opted to bypass this layer and provide their own lightweight communication schemes. However, this brings up another question on how these new communication systems are going to handle the reliability issue.

When talking about high-performance cluster computing, people commonly have the following assumptions on the cluster interconnect. First, the interconnection networks are composed of commodity local area network (LAN) or system area network (SAN), which are characterized as low propagation delay, high bandwidth networks. Second, they assume that the underlying network is almost reliable, such that the underlying hardware has extremely low transmission error rate. Based on these assumptions, different communication packages provide difficult levels of reliability. For some low-latency communication packages, they just offer an unreliable programming interface, and require higher level software to think over on the reliability issue. On the other hand, some packages assume that the hardware is reliable, all data loss issues are the result of receiver buffer overruns. Therefore, an effective measure to prevent buffer overrun is by using of flow control mechanisms, e.g. sliding window and/or credit-based flow control.

While choosing to provide an unreliable or a lightweight reliable communication interface to the higher level applications is a decision issue that strives for the best practical compromise; however, one of the major achievements of the TCP protocol is its congestion control mechanism. Previous studies on high-speed LANs [55,79] have shown that the high overheads in traditional communication software limit the ability to generate load; therefore, the observed contention problem was not critical. With the availability of low-latency communication mechanisms, applications can now generate higher load to the network, that could result in congestion build up in some part of the network, and of the worst scenario, this would induce congestion loss problem. As the TCP/IP protocol suit is not included in these lightweight messaging systems, the system designers have to consider about supporting the congestion control mechanism on these communication systems.

Some authors of these low-latency communication packages argued that their underlying networks support link-level flow control, transient buildup of congestion can be spotted and back-pressure could be applied link-by-link all the way back to the sender; therefore, avoid data loss problem in the network. This may be true on some type of interconnects, e.g. Myrinet, but is a dangerous assumption for Ethernet type network. Since there could have data loss problem in the network, having end-to-end flow control alone could not resolve the problem, an error recovery mechanism should be incorporated in these low-latency communication schemes so as to provide a reliable communication interface to higher-level applications.

Next: 4.2 Reliable Transmission Protocol Up: 4. Congestive Loss on Previous: 4. Congestive Loss on Contents