Next: 6.5 Summary Up: 6. Complete Exchange on Previous: 6.3 Experimental Analyses Contents

6.4 Related Work

The uses of models in designing algorithms, in particular communication algorithms, are not uncommon in the parallel community [53,76,73,47,88]. These algorithms are mostly shown to be efficient or optimal with respect to the based models by using paper and pen or simulations. For example, Karp et al. [53] had designed an optimal k-item broadcast algorithm based on the LogP model [30]. This algorithm is presented as a communication schedule where the root sends each data item only once, but alternates among recipients in order to retain the logarithmic depth of a tree broadcast. The principle behind this communication schedule matches with our synchronous shuffle exchange algorithm, as both schedules try to arrange the communications such that a processor would only at most send out one data item and receive one data item in one communication step. However, as demonstrated in our experimental studies, these types of communication schedules may be too idealistic, which demand to have logical synchronism in order to achieve the optimal bounds; therefore, it is hard to achieve the claimed performance on a real platform unless more control of the communications is enforced.

Similar to our contention-aware permutation, Nupairoj et al. [73] has reported that to ensure the efficiency of their optimal multicast algorithm, which is built on top of a parameterized model [76], it is necessary to consider some architecture-dependent characteristics of a system. This is because in real network, network contention is likely to occur if concurrent message transmissions are not scheduled properly. Their approach makes use of the topological information from the mesh or multistage networks, to order those communications in the multicast tree to avoid network contention. This supports our belief that optimal performance achievable on parallel machines can be first designed by using architecture independent model, and then perform performance tuning of an implementation on the based of some architecture dependent characteristics.

Donaldson et al. [34] also reported that the BSPlib, a communication library for BSP [105] programming, is using information related to the global state to improve the collective use of the communication system. Due to the synchronous nature of the BSP machines, communications are constrained to take place only in certain stages. This constraint, although limits its flexibility, can transform to enhancement on communications as each processor can infer the global state of the communication in which it is involved. For example, a BSP process knows which other processes are about to communicate, and how much they plan to send, and then can estimate the global network loading; this in turn, can be used to determine the ideal schedule and transmission rate. Their objective is similar to our approach, but we are working on an asynchronous model, which makes use of the buffering information and communication pattern to devise the corresponding information.

We make use of the taxonomy introduced by Yang et al. [120] as a reference for comparing different congestion control schemes. Broadly speaking, our global window congestion control mechanism is an anticipatory, closed loop control scheme with implicit feedback of global information. By anticipatory control, this means that the control is a congestion avoidance scheme, which tends to drive the network toward the optimal operating point but without falling into the danger of congestion. With closed loop control, this stands for the use of implicit feedback information to the sources as the control decisions in regulating the traffics. In our case, the control information is derived from the total number of unacknowledged packets for all communicating partners. Under the same terminology, the slow start scheme used in the TCP [51] could be considered as a reactive, closed loop control with implicit feedback, which based on the round-trip delay of acknowledgements [120] on individual end-to-end flow. Besides using the acknowledgements as the feedback control, other means of feedback can be used. For example, the Warp control scheme [77] makes use of a time-stamp based measure, called Warp, to monitor network utilization, and to control the injection rates for achieving optimal utilization. However, like the slow start scheme, it is also a reactive scheme that based on end-to-end feedback information.

Being an anticipatory scheme, our proactive approach is similar to the congestion control schemes used in some of the ATM networks. These schemes usually operate through input rate regulation [89], which relies on the underlying network devices to support and manage the resource allocation and scheduling. For example, the use of explicit feedback signals that generated or set by the congested devices to alert other network devices or communicating nodes on the growing congestion. With these control schemes, during the call-setup procedure, the information sources are assigned with some predefined rate, and are forced to limit their average input rate below this value. This ensures that no source will exceed for an extensive period of time the rate provided by the network and avoids congestion.

Instead of using the buffer resources as a guide to monitor the congestion problem, other approaches have been proposed to explicitly manage the allocation of scarce resources to different purposes. For example, Roy et al. [87] propose the uses of Quality of Service (QoS) mechanisms at the application level to manage contention, so as to improve the performance of MPI applications. They argue that if appropriate mechanisms can be provided for expressing application requirements, for arbitrating between different requirements, for enforcing allocations, and for providing feedback to applications concerning achieved performance, then applications can adapt their behavior according to resource availability. However, their approach require that the underlying network supports the QoS mechanisms.

Next: 6.5 Summary Up: 6. Complete Exchange on Previous: 6.3 Experimental Analyses Contents