

7. Conclusions and Directions for Future Work

In this chapter, we summarize and conclude the thesis. To achieve our objective of using commodity clusters for supercomputing, this dissertation proposes the use of a realistic communication model for performance understanding as well as for algorithm design and analysis. We have concentrated on identifying the essential properties of the communication architecture that have a significant impact on performance, and on devising benchmark methodologies to quantify their performance characteristics. By organizing these architectural features into a performance parameter set, we have created a framework with which programmers can carry out performance understanding, performance calibration, and performance prediction. We have applied this modeling framework to examine the performance characteristics of two implementations of the Directed Point lightweight messaging system. In particular, we have demonstrated the effectiveness of the model as an evaluation tool for delineating the strengths and weaknesses of these communication systems, as well as its use as an emulation tool for assessing various design tradeoffs.

Our communication model is distinguished from other performance and abstract models by its support for congestion studies and pipelined communication, since it is derived from a resource-centric viewpoint. Pipelined communication is the key to achieving high-bandwidth data transfers in modern networks, as it maximizes the number of concurrent data movements and therefore the use of the available network resources. However, without careful coordination and scheduling, aggressive pipelining can lead to contention and hence to performance loss. This was shown in our studies of the congestive loss problem in Chapter 4. Using the network buffering information provided by our model, we artificially controlled the degree of congestion in the cluster network. Through both modeling studies and experimental evaluations, we examined how different buffering architectures interact with our Go-Back-N reliable protocol and affect the development of congestion when the communication system is subjected to heavy load. These performance studies yield valuable insights that guide the design of efficient reliable transmission protocols on top of lightweight messaging systems.
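To make the interaction between window size, buffering, and retransmission concrete, the sender side of a Go-Back-N protocol can be sketched as follows. This is only a minimal sketch in C: the helper routines (transmit, wait_for_ack), the window size, and the timeout value are assumptions made for the example and are not taken from the Directed Point implementation.

    /* Minimal sketch of a Go-Back-N sender (illustrative assumptions only). */
    #define WINDOW_SIZE 8                 /* max unacknowledged packets       */
    #define TIMEOUT_MS  10                /* retransmission timeout (example) */

    typedef struct { int seq; char payload[1500]; } packet_t;

    /* Hypothetical helpers assumed to be provided by the messaging layer.   */
    void transmit(packet_t *p);
    int  wait_for_ack(int *acked_seq, int timeout_ms);   /* 0 on timeout     */

    void gbn_send(packet_t *pkts, int npkts)
    {
        int base = 0;                     /* oldest unacknowledged packet     */
        int next = 0;                     /* next packet to transmit          */

        while (base < npkts) {
            /* Keep the pipeline full, up to the window limit. */
            while (next < npkts && next - base < WINDOW_SIZE)
                transmit(&pkts[next++]);

            int acked;
            if (wait_for_ack(&acked, TIMEOUT_MS))
                base = acked + 1;         /* cumulative ACK slides the window */
            else
                next = base;              /* timeout: resend the whole window */
        }
    }

Under heavy congestion, a single dropped packet forces the sender to retransmit the entire outstanding window, which is why the amount and organization of switch buffering interact so strongly with this class of protocol.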

Since congestion is ultimately caused by contention for resources, the resource information described by our model can be used to devise high-level communication schedules that optimize pipelining efficiency while guarding against congestive loss. In this study, we used the Complete Exchange operation to validate this conjecture. We devised the Synchronous Shuffle Exchange algorithm and demonstrated, by means of analytical and experimental studies, that it is optimal on any non-blocking network. To maximize pipelining efficiency, we adopt a contention-free schedule at the packet level, which eliminates unnecessary startup and synchronization overheads.
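The principle of a node-contention-free packet-level schedule can be sketched as follows. The rotation order shown here is only one possible choice, and the send_packet primitive is assumed; the sketch illustrates the scheduling idea rather than the exact ordering used by the Synchronous Shuffle Exchange algorithm.

    /* Sketch of a contention-free packet-level schedule for complete exchange
     * among p nodes (illustrative only).  In every step each node sends to a
     * distinct destination, so no receiver is targeted by two senders at once. */
    void send_packet(int dst, int pkt_index);    /* assumed messaging primitive */

    void complete_exchange(int my_rank, int p, int packets_per_msg)
    {
        for (int k = 0; k < packets_per_msg; k++) {
            for (int r = 1; r < p; r++) {            /* rotate over partners   */
                int dst = (my_rank + r) % p;         /* unique per step        */
                send_packet(dst, k);                 /* k-th packet for dst    */
            }
        }
    }

Because every node injects one packet and drains one packet per step, the links stay busy without node-level contention, provided all nodes remain in step with one another.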

Even with a contention-free communication scheme, other factors, such as variations in process scheduling and competition with high-priority system activities, can introduce non-deterministic delays into well-scheduled communication events. We have shown that the buffering architecture of the switch plays a crucial role under such circumstances. Because the Synchronous Shuffle Exchange algorithm achieves high performance by fully utilizing the network links, any deviation from the communication schedule induces contention, which must be absorbed by the switch's buffers. However, network buffers are a scarce resource. We have further demonstrated that input-buffered switches and hierarchical networks are particularly vulnerable to these non-deterministic delays. To address this problem, this dissertation proposes using our model parameter ($ B_{L}$) to derive a congestion control scheme that limits the traffic load and ensures a fair sharing of network resources, thereby avoiding buffer overflow. To improve the effectiveness of the congestion control scheme on hierarchical networks, we have incorporated information on the network topology to devise a contention-aware permutation. This permutation scheme generates a communication schedule that is both node and switch contention-free and that distributes the network load more evenly across the hierarchy. This relieves congestion build-up at the uplink ports and improves the synchrony of the traffic information exchange between cluster nodes.
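As an illustration of how the buffer parameter $ B_{L}$ can bound the traffic load, consider the following window-based sketch. The way the per-destination window is obtained here (dividing $ B_{L}$ evenly among the potential senders) and the helper routines are assumptions made for the example; the thesis derives the actual control scheme from the model.

    /* Sketch of a congestion control window derived from the buffer
     * parameter B_L (illustrative assumptions only).                       */
    #define B_L      64        /* packets the congested path can buffer     */
    #define SENDERS   8        /* nodes that may target one receiver at once */

    typedef struct { int seq; char payload[1500]; } packet_t;

    /* Hypothetical helpers assumed to be provided by the messaging layer.  */
    void transmit_to(int dst, packet_t *p);
    int  collect_acks(int dst);             /* newly acknowledged packets   */

    void send_message(int dst, packet_t *pkts, int n)
    {
        const int window = B_L / SENDERS;   /* fair share of the buffers    */
        int sent = 0, inflight = 0;

        while (sent < n) {
            while (sent < n && inflight < window) {
                transmit_to(dst, &pkts[sent++]);
                inflight++;
            }
            inflight -= collect_acks(dst);  /* free slots as ACKs arrive    */
        }
    }

Limiting each sender to its fair share of the path's buffering keeps the aggregate backlog below $ B_{L}$ even when all senders burst at the same receiver simultaneously.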


