Morning Tutorials (8:30am - 12:00pm)
Tutorial 1 - Designing Next Generation
Clusters with Infiniband: Opportunities and Challenges,
D. Panda (Ohio State University).
Tutorial 2 - Using MPI-2: Advanced Features
of the Message Passing Interface, W. Gropp, E. Lusk,
R. Ross and R. Thakur (Argonne National Lab).
Afternoon Tutorials (1:30pm - 5:00pm)
Tutorial 3 - The Gridbus Toolkit for Grid
and Utility Computing, R. Buyya (University of Melbourne).
Tutorial 4 - Building and Managing Clusters
with NPACI Rocks, G. Bruno, M. Katz, P. Papadopoulos, and
F. Sacerdoti (NPACI Rocks group at San Diego Supercomputer
Center), L. Liew and N. Ninaba (Singapore Computer Systems).
Short Descriptions
"Designing
Next Generation Clusters with Infiniband: Opportunities
and Challenges"
The emerging InfiniBand Architecture (IBA) standard is generating
a lot of excitement toward building next generation high
performance computing systems in a radically different manner.
This is leading to the following common questions among many
scientists, engineers, managers, developers, and users associated
with Cluster Computing:
1) What is InfiniBand Architecture?
2) How is it different from other on-going developments
and standardization efforts such as Virtual Interface Architecture
(VIA), PCI-X, Gigabit Ethernet, Rapid I/O, Hyper-transport,
3GIO/PCI-Express, TCP off-load engines, etc.?
3) How does it perform compared to other proprietary cluster
interconnects (Myrinet and Quadrics)?
4) What unique features and benefits does IBA bring to designing
next generation cluster computing systems?
5) How can the novel features of InfiniBand be exploited to build
clusters for high performance computing as well as cluster-based
servers and datacenters?
This tutorial is designed to provide answers to the above
questions. We will start with the background behind the origin
of the IBA standard. Then we will make the attendees familiar
with the novel features of IBA (uniform treatment of interprocessor
communication and I/O; provision for multiple transport services
and mechanisms to support QoS and protection in the network;
hardware support for remote DMA, atomic, and multicast operations;
support for virtual lanes and service levels; and support
for low latency communication with Virtual Interface). We
will compare and contrast the IBA standard with other on-going
developments/standards. We will show how the IBA standard
facilitates the next generation computing systems to be designed
not only to deliver high performance but also RAS (Reliability,
Availability, and Serviceability). Open research challenges
in designing communication and I/O subsystems of next generation
clusters with IBA will be outlined. Challenges in designing
clusters for standard high-performance computing (with different
programming models such as MPI, DSM, Get/Put) as well as
cluster-based servers/file systems and data centers will
be outlined. Performance numbers obtained on clusters with
latest InfiniBand products and their comparisons with other
proprietary interconnects (Myrinet and Quadrics) will be
presented. The tutorial will conclude with an overview of
on-going IBA related research projects, IBA products, and
the market time frame for the IBA products.
Speaker Bio:
Dhabaleswar K. Panda is a Professor of Computer
Science at the Ohio State University. He obtained his Ph.D.
in computer engineering from the University of Southern California.
His research interests include parallel computer architecture,
high performance computing, user-level communication protocols,
interprocessor communication and synchronization, network-based
computing, and Quality of Service. He has published over
110 papers in major journals and international conferences
related to these research areas. Dr. Panda and his research
group members have been doing extensive research on InfiniBand.
His research group is currently collaborating with Sandia
National Laboratory, IBM T.J. Watson, and leading InfiniBand
companies (Mellanox and InfiniSwitch) on designing various
subsystems of next generation High Performance Computing
systems with InfiniBand. The MVAPICH (MPI over VAPI for IBA)
package developed by his research group (http://nowlab.cis.ohio-state.edu/projects/mpi-iba/)
is being used by many organizations world-wide to extract
the potential of IBA-based clusters for HPC applications.
Dr. Panda has served on Program Committees
and Organizing Committees of several parallel processing
and high performance
computing conferences and on editorial boards for several
parallel processing journals. He was General Co-Chair for
the 2001 International Conference on Parallel Processing;
Program Co-Chair of the 1999 International Conference on
Parallel Processing, 1997 and 1998 Workshops on Communication
and Architectural Support for Network-Based Parallel Computing
(CANPC); Program Co-Chair of the Int'l Workshop on Communication
Architecture for Clusters (CAC '01 and CAC '02); an Associate
Editor of the IEEE Transactions on Parallel and Distributed
Computing; Co-Guest-Editor for two special issue volumes
of Journal of Parallel and Distributed Computing on "Workstation
Clusters and Network-based Computing"; an IEEE Distinguished
Visitor Speaker and an IEEE Chapters Tutorials Program Speaker.
Currently, he is serving as a Program Co-Chair of the International
Workshop on Communication Architecture for Clusters (CAC
'03). Dr. Panda is a recipient of the NSF Faculty Early CAREER
Development Award, the Lumley Research Award (1997 and 2001)
at the Ohio State University, and an Ameritech Faculty Fellow
Award. Dr. Panda is listed as a distinguished scientist in "Who's Who
in America" and in "American Men & Women of
Science".
"Using
MPI-2: Advanced Features of the Message Passing Interface"
This tutorial is about how to use MPI-2, the collection
of advanced features that were added to MPI (Message-Passing
Interface) by the second MPI Forum. These features include
parallel I/O, one-sided communication, dynamic-process management,
language interoperability, and some miscellaneous features.
Implementations of MPI-2 (or significant subsets thereof)
are now available both from vendors and from open-source
projects. For example, the one-sided communication functions
of MPI-2 are being used successfully in applications running
on the Earth Simulator. In other words, MPI-2 can now really
be used in practice.
This tutorial explains how to use MPI-2, and in particular how
to use it in a way that results in high performance. We present
each feature of MPI-2 in the form of a series of examples
(in C, Fortran, and C++), starting with simple programs and
moving on to more complex ones. We also discuss how to combine
MPI with OpenMP. We assume that attendees are familiar with
the basic message-passing concepts of MPI-1.
The tutorial will feature a hands-on session in which attendees
will be able to run MPI-2 programs on their own laptops with
the latest version of MPICH2, which we will distribute on
CDs.
Speaker Bios:
William Gropp is a senior computer scientist
and associate division director in the Mathematics and Computer
Science Division at Argonne National Laboratory. His research
interests are in adaptive methods for PDEs, software for
scientific computing, and parallel computing. He was a member
of the MPI Forum from the beginning and was one of the chapter
authors in the MPI-2 standardization process. He is one of
the designers of MPICH and is a co-author of the books Using
MPI, Using MPI-2, and MPI - The Complete Reference: Volume
2, the MPI-2 Extensions.
Ewing Lusk is a senior computer scientist
in the Mathematics and Computer Science Division at Argonne
National Laboratory. His research interests are in portable
parallel-programming libraries, performance visualization,
and automated theorem proving. He was a member of the MPI
Forum from the beginning and played a leading role in the
MPI-2 standardization process. He is one of the designers
of MPICH and is a co-author of the books Using MPI, Using
MPI-2, and MPI - The Complete Reference: Volume 2, the MPI-2
Extensions.
Rob Ross is an assistant computer scientist
in the Mathematics and Computer Science Division at Argonne
National Laboratory. He received a Ph.D. in Computer Engineering
from Clemson University in 2000. His research interests are
in the area of high-performance computing, particularly cluster
computing and high-performance I/O. He is the primary author
of PVFS, a high-performance parallel file system for clusters,
and is currently involved in the implementation of the next-generation
MPICH.
Rajeev Thakur is a computer scientist in the
Mathematics and Computer Science Division at Argonne National
Laboratory. He received a Ph.D. in Computer Engineering from
Syracuse University in 1995. His research interests are in
the area of high-performance computing in general and high-performance
networking and I/O in particular. He was a member of the
MPI Forum and participated actively in the definition of
the I/O part of the MPI-2 standard. He is the author of ROMIO,
a high-performance, portable implementation of MPI I/O. He
is also a co-author of the book Using MPI-2 together with
Bill Gropp and Rusty Lusk.
"The Gridbus
Toolkit for Grid and Utility Computing"
Computational Grids enable the sharing, selection, and aggregation
of geographically distributed resources (such as computers,
databases, and scientific instruments) for solving large-scale
problems in science, engineering, and commerce. However,
application development, resource management, scheduling,
and supporting end-to-end quality of service (QoS) in these
environments are complex undertakings. This is due to the
geographic distribution of resources that are owned by different
organizations having different usage policies and cost models,
and varying loads and availability patterns. To address these
challenges, we have developed a distributed computational economy
framework for resource allocation and regulation of supply and demand
for resources. We applied this framework in the design and
development of scheduling systems that manage distributed
resources in a single administrative domain (cluster computing)
and also in multiple administrative domains (grid computing).
The Gridbus Project is engaged in the design and development
of cluster and grid middleware technologies for service-oriented
computing. They include visual Grid application development
tools for rapid creation of distributed applications, a competitive
economy-based Grid scheduler, a cooperative economy-based cluster
scheduler, a Web-services-based Grid market directory (GMD),
Grid accounting services, and the widely used GridSim toolkit.
These tools have been used to Grid-enable applications
such as molecular docking and neuroscience and to deploy
them for distributed processing on global Grids.
This tutorial covers four topics. First, we briefly review
emerging trends in network-based high performance computing
and identify application development and resource management
challenges. Then, we introduce our framework on Grid Architecture
for Computational Economies (GRACE) that leverages existing
technologies such as Globus and provides new services that
are essential for constructing industrial-strength Grids.
Third, we discuss Gridbus technologies and their use in Grid-enabling
applications, including the use of our economic Grid infrastructure
in scheduling parametric computations containing hundreds
of jobs for execution on the World Wide Grid (WWG) testbed.
Particular emphasis will be placed on the Grid economy: how to
design and develop Grid technologies and applications capable
of dynamically leasing services of distributed resources
at runtime depending on their availability, capability, performance,
cost, and users' quality of service requirements. Finally,
we present the use of these tools in the composition and distributed
execution of data-intensive applications (e.g., molecular
docking, brain activity analysis, and high-energy physics)
on the Grid to demonstrate the capabilities of the Gridbus system.
Speaker Bio:
Rajkumar Buyya is the founder and program
leader of the Grid Computing and Distributed Systems (GRIDS)
Laboratory in the Dept. of Computer Science and Software
Engineering at the University of Melbourne, Australia. He
is one of the creators of system software for PARAM Supercomputers
developed by the Centre for Development of Advanced Computing
(C-DAC), India. He has authored three books: Microprocessor
x86 Programming (BPB Press, New Delhi, 1995), Mastering C++
(Tata McGraw Hill Press, New Delhi, 1997), and Design of PARAS
Microkernel. The books on emerging topics that he has edited
include High Performance Cluster Computing (Prentice Hall, USA,
1999) and High Performance Mass Storage
and Parallel I/O (IEEE and Wiley Press, USA, 2001). He has
published over 70 research articles in international conferences
and journals. For further information, please visit: http://www.buyya.com/
"Building
and Managing Clusters with NPACI Rocks"
NPACI Rocks is an open source clustering
distribution for heterogeneous x86 and IA64 HPC Linux clusters.
The first version of Rocks was released in November of 2000, and
the project has
averaged 3-4 releases each year since. NPACI Rocks is unique
in that it is a complete “cluster on a CD” software distribution:
everything from the base Red Hat OS to de facto standard
job schedulers and monitoring systems is included. Rocks
was designed with the goal of making clusters easy and accessible
to domain application scientists. To achieve this goal, Rocks
automates the most common and time-consuming tasks of cluster
administration and deployment. NPACI Rocks software and documentation
are available at www.rocksclusters.org.
This tutorial assumes minimal experience with cluster administration
and use. Experience with common clustering toolkits is useful
but not required. We will cover the basic design and philosophy
of Rocks and the procedures to customize Rocks for unique
sites and non-HPC clustering roles. A hands-on lab will guide
participants through building their own cluster. Toward the
end of the lab, the individual clusters will be bound together
as a Grid where each participant will launch Grid-wide jobs.
Speaker Bios:
Greg Bruno is a Programmer
Analyst IV for the Cluster Development Group at the San
Diego Supercomputer
Center (SDSC). Mr. Bruno received his MS from UCSD in Computer
Science and he is currently enrolled in the PhD program.
For 10 years, Mr. Bruno worked with Teradata Systems developing
cluster management software for the systems that supported
the world’s largest databases. He has spent the
last 3 years helping to architect, design, and implement the
Rocks Cluster Distribution, a freely available software stack
that enables domain-specific scientists to build and manage
their own clusters.
Mason Katz is Group Leader for Cluster Development
at the San Diego Supercomputer Center (SDSC). Mr. Katz received
his BS in Systems Engineering from the University of Arizona.
He worked for 5 years as an embedded software engineer on
networks of lightning detection sensors. He has
spent the last 6 years working at both the University of
Arizona and UCSD/SDSC on projects ranging from network security
protocols and operating systems (x-kernel, Scout) to commodity
clustering (HPVM, NPACI Rocks).
Philip M. Papadopoulos is the Program Director
for Grid and Cluster Computing at the San Diego Supercomputer
Center (SDSC). Dr. Papadopoulos received his BA in Applied
Mathematics from UCSD, MS in Mechanical Engineering from
UC Berkeley, and PhD from UC Santa Barbara in Electrical
and Computer Engineering. He worked for over 5 years at Oak
Ridge National Laboratory as part of the PVM development
team before moving to the Computer Science Department at UCSD.
In 1999, Dr. Papadopoulos joined SDSC as a group leader for
cluster development and started the NPACI Rocks Clustering
project. He has authored more than 25 peer-reviewed papers
on cluster and distributed computing, has served for several years
on the organizing committees of the SC conference series,
and has given numerous invited talks on cluster and distributed
computing. Dr. Papadopoulos will be the technical program
chair of the IEEE Clusters 2004 conference. Dr. Papadopoulos
is deeply involved in several grid and cluster computing
research projects including OptIPuter, GEON, The National
Biomedical Computational Resource (NBCR), and the Biomedical
Informatics Research Network (BIRN).
Federico Sacerdoti is currently a member of
the Cluster Development group, San Diego Supercomputer Center
(SDSC), at the University of California. Mr. Sacerdoti received
his BS in Computer Engineering from Washington University
in St. Louis, and MS in Computer Science from the University of
California, San Diego. Mr. Sacerdoti has been involved with
the Rocks cluster distribution effort at SDSC for over three
years, and has contributed to many HPC projects including
the Ganglia monitoring system and the KeLP parallel message-passing
libraries. His thesis work was in the area of dynamic cache
optimizations for parallel applications.
Laurence Liew graduated from the National University of
Singapore (NUS) with First Class Honors in Mechanical Engineering. He also holds
a Masters in Knowledge Engineering from the Institute of Systems Science in NUS.
Since 1998, Laurence has been actively involved with Linux and Beowulf
supercomputing. He was involved with the design and deployment of Beowulf
clusters in Singapore’s foremost R & D and academic institutions, and also the
deployment of enterprise Linux solutions in major government and commercial
organizations. Laurence joined Singapore Computer Systems in 2001 and presently
heads the SCS Linux Competency Centre (LCC) in Singapore, Malaysia and Thailand.
Under Laurence, the LCC has partnered with world-class leaders in Linux,
including Red Hat (as a Red Hat Certified Training and Education Centre), Oracle,
Sendmail, Computer Associates, Myricom, VMware and others.
Najib Ninaba graduated from Singapore Polytechnic with a
diploma in Computer Information Systems. Najib has been a Linux user, programmer
and administrator since 1996. He was introduced to Linux Beowulf supercomputing
in 2001 and has since been hooked on NPACI Rocks. Najib joined Singapore
Computer Systems in 2001 and currently leads the NPACI Rocks development in
Singapore. A co-developer of NPACI Rocks, Najib integrated the Parallel Virtual
Filesystem (PVFS) and Sun Grid Engine (SGE) into Rocks in 2002 and currently
maintains the PVFS, SGE, Myrinet and other packages for NPACI Rocks.