

1.1 Commodity Supercomputing

Speed is the main reason we use computers, both serial and parallel. Building parallel computers to solve large-scale, complex problems has the same objective: we want to speed up the computations. Since parallel computing is becoming more and more accessible through advances in processor and network technologies, the use of parallelism is no longer the privilege of a few scientists with access to powerful supercomputers. Now we have piles of PCs sitting in the server room, interconnected by a high-speed LAN, and these setups become our in-house parallel computers. However, is that the answer to our quest for more processing power? No, it is not. Building an efficient, high-performance cluster is not as simple as plugging in and setting up all the components. To construct a high-performance cluster and to orchestrate its processing power, we need to adopt a more scientific approach.

1.1.0.0.1 Performance Understanding

Performance understanding is the process of explaining the performance behavior of a parallel application on a parallel machine. The development of efficient parallel applications depends upon a realistic prediction of application behavior, which requires an in-depth understanding of both the application and the architecture characteristics [3,29,52]. For example, if the communication overhead of a parallel application dominates its execution time, the programmer may try to improve the communication schedule by rearranging the communication events to reduce that overhead. On the other hand, the system designer may try to improve the communication system to achieve the same performance goal. To decide how to tune performance, both the programmer and the system designer need detailed knowledge of the operation costs and must identify the potential bottleneck(s).

Therefore, the more we understand a parallel system and its performance characteristics, the more information we have to improve the efficiency of applications and to guide enhancements to the system. This dissertation describes an attempt to improve or accelerate the performance understanding of commodity clusters by focusing on the communication system. The network is the most critical path of a parallel system, as its performance directly influences the capabilities of the system for high-performance computing. In turn, the communication efficiency of the network relies on the efficiency of the communication software that drives the interconnect. Therefore, understanding the performance capabilities of the cluster communication system provides the conceptual framework we need to understand the performance of the cluster system. Here are the main issues related to performance understanding:

1.1.0.0.2 Performance Issues on Communication

The traditional approach to assessing the performance of communication systems is to measure the round-trip latency and the point-to-point bandwidth [20,55]. However, these simple figures are not sufficient, since these metrics report only a single point-to-point measurement under a lightly loaded network. In fact, network performance differs among applications and depends on:

  1. Communication patterns - Besides the one-to-one pattern, there are the one-to-many, many-to-one and many-to-many patterns, which are commonly known as collective operations. Collective operations play an important role in the development of parallel applications. Different patterns have different performance characteristics and requirements, and demand specific communication algorithms that can efficiently utilize the underlying resources to achieve optimal results.
  2. Message sizes - Different factors dominate the communication performance for small, medium and large messages. For example, with small messages, software overhead and network latency dominate the communication cost, while for large messages, network bandwidth is the more important factor in determining communication performance. Medium messages cannot amortize the software overhead as completely as long messages can; therefore, they require better utilization of communication pipelines to hide the software cost.
  3. Communication schedules - The communication schedule characterizes how we express the communication pattern in terms of communication primitives, such as send and receive operations. By designing communication schedules, we are able to structure our communications to match the resource constraints and yield highly efficient implementations.
  4. Degree of contention - Mild contention may result in a slight decrease in performance due to higher queueing delays in the router. However, under prolonged contention, the network becomes heavily congested, and this results in data loss.
All these communication issues contribute to the overall communication performance.
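
The message-size effect in item 2 can be illustrated with a simple linear cost model. This is a minimal sketch, not the communication model developed in this dissertation: it assumes that a one-way transfer costs a fixed software overhead plus network latency, plus the message size divided by the bandwidth. The function names and the parameter values (10 microseconds of overhead, 5 microseconds of latency, 12.5 MB/s of bandwidth) are assumptions chosen only for illustration.

```python
def transfer_time(msg_bytes, overhead_s, latency_s, bandwidth_Bps):
    """One-way time under an assumed linear model: fixed cost + size / bandwidth."""
    return overhead_s + latency_s + msg_bytes / bandwidth_Bps


def fixed_cost_fraction(msg_bytes, overhead_s, latency_s, bandwidth_Bps):
    """Share of the transfer time spent on size-independent costs."""
    total = transfer_time(msg_bytes, overhead_s, latency_s, bandwidth_Bps)
    return (overhead_s + latency_s) / total


def crossover_size(overhead_s, latency_s, bandwidth_Bps):
    """Message size at which the fixed cost equals the bandwidth term."""
    return (overhead_s + latency_s) * bandwidth_Bps


if __name__ == "__main__":
    # Hypothetical parameters, not measurements from any real cluster.
    OVERHEAD = 10e-6      # software overhead in seconds
    LATENCY = 5e-6        # network latency in seconds
    BANDWIDTH = 12.5e6    # bytes per second

    for size in (16, 256, 4096, 1_000_000):
        share = fixed_cost_fraction(size, OVERHEAD, LATENCY, BANDWIDTH)
        print(f"{size:>9} bytes: fixed costs are {share:.1%} of the transfer time")
    print("crossover size (bytes):", crossover_size(OVERHEAD, LATENCY, BANDWIDTH))
```

Under these assumed parameters, the fixed costs account for most of the time for messages well below the crossover size and become negligible well above it, which is why different optimizations matter in each regime.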

The performance problem related to the communication software has been an active research issue for the past decade. Currently, lightweight messaging systems [109,59,58,21,114,82,101] offer the best communication performance, as they create a fast communication path that bypasses the traditional in-kernel messaging protocol stack (e.g. TCP/IP), which is a serious obstacle to exploiting the high performance of modern networks [54,55,9]. Although most of these lightweight messaging systems succeed in delivering the raw network performance to higher-level applications, their implementations, functionality and interfaces differ, even when they are built on the same network technologies. Some issues are not well addressed by these lightweight messaging systems, such as system behavior under heavy load and general-purpose flow control. In particular, they offer little support for guiding the development of high-level communication primitives atop their lightweight messaging layer.

To better understand the network performance, we apply the systematic approach to performance understanding described above to explore how the underlying communication system handles these communication issues. In this dissertation, we make use of a communication model, derived from a resource-centric view of how data move across the abstract machine, to aid our quantitative and qualitative studies of the communication performance. This model is proposed as a "mid-level" tool, whose task is to map all high-level communication primitives properly onto the low-level architectural abstractions. Through a set of performance metrics, programmers and system designers can understand the power, the strengths and the weaknesses of their cluster communication system. In addition, selecting appropriate metrics with which to analyze the target application could bring out new insights as well as efficient quantitative predictions of the communication performance.

1.1.0.0.3 Limitations of LANs

Having a low-latency communication system that drives the cluster interconnect at high speed is a prerequisite for, but not a guarantee of, high performance. For example, most existing commercial switches adopt a packet-drop policy on buffer overrun, which is a known performance problem under heavy congestion. Studies on LAN performance [55,79] have shown that the high overheads in traditional communication protocols limit the ability to generate network load, so the observed contention problem was not critical. With low-latency communication mechanisms, applications can now generate a higher load on the network, which could result in packets being dropped due to congestion even under a well-balanced communication schedule.
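
The effect of a packet-drop policy on buffer overrun can be sketched with a toy simulation of a single switch output port. This is an illustrative model under stated assumptions, not a description of any particular switch: time advances in discrete ticks, the port forwards a fixed number of packets per tick, and packets that arrive when the buffer is full are discarded. The function name and all parameter values are hypothetical.

```python
def simulate_drop_tail(arrivals_per_tick, service_per_tick, buffer_slots):
    """Simulate one output port that drops packets on buffer overrun.

    arrivals_per_tick: iterable with the number of packets arriving in each tick
    service_per_tick:  packets the port can forward per tick
    buffer_slots:      maximum number of packets the buffer can hold

    Returns (delivered, dropped, backlog), where backlog is the number of
    packets still queued after the last tick.
    """
    queued = 0
    delivered = 0
    dropped = 0
    for arriving in arrivals_per_tick:
        # Admit packets only while free buffer space remains; drop the rest.
        accepted = min(arriving, buffer_slots - queued)
        dropped += arriving - accepted
        queued += accepted
        # Forward as many queued packets as the port rate allows.
        sent = min(queued, service_per_tick)
        queued -= sent
        delivered += sent
    return delivered, dropped, queued


if __name__ == "__main__":
    # Offered load equal to the port rate: no packet is lost.
    print(simulate_drop_tail([1] * 10, service_per_tick=1, buffer_slots=4))
    # Offered load three times the port rate: the buffer fills and packets drop.
    print(simulate_drop_tail([3] * 10, service_per_tick=1, buffer_slots=4))
```

In this toy model, a sender limited by protocol overhead resembles the first case, while a low-overhead sender that can sustain a rate above the port capacity resembles the second, where drops begin as soon as the buffer fills.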

The behavior of networks in response to congestion becomes an important issue in understanding the communication performance. Depending on the switch architecture, the type of network technology and the host communication software, networks react to congestion in different ways. For instance,

With the use of commodity components as the cluster interconnect, the performance of the network depends on the type of network technology (e.g. Fast Ethernet, Gigabit Ethernet, ATM, etc.), as well as on the internal architecture of the particular router or switch. This poses a challenge to system designers and programmers: how to utilize the communication network effectively for high-performance communication.

