Introduction to Cluster Computing

this page compiled by John Akred for the Underdog Group Project
DS420 - Fall 2000, Professor Clark Elliott

What is a cluster?

A cluster generally refers to a network of workstations used for distributed processing. Clusters are typically defined by the number and type of their nodes (workstations), their network topology, and their control structure. There is a huge variety of implementations in the cluster computing world. These range from 4-node clusters running commodity off-the-shelf software on commodity hardware assembled by hobbyists, to 128-node clusters made up of high-performance dual-CPU symmetric multiprocessing (SMP) machines running proprietary operating systems on cutting-edge networks designed for specific kinds of processing tasks.

A useful scheme for classifying clusters might distinguish them by A) locality and B) heterogeneity. The most powerful, highly developed clusters tend to be those built from identical nodes on fast, sometimes specialized, networks. At the other extreme, collaborative processing efforts such as SETI@HOME could be thought of as a cluster in the most general sense. "Clusters" of this type are widely dispersed and extremely heterogeneous. SETI@HOME analyzes data collected in the search for extraterrestrial intelligence by letting users contribute their computing resources via the internet. The project shows that it is possible to use a dynamic collection of disparate computer platforms on the internet to solve a single problem. "Nodes" in this project could be anything from a modest home PC to an enthusiast's dream machine, or even academic resources, all communicating with a central server over internet connections of varying speed.

A Network of Workstations (NOW) is a type of cluster in which the member computers are not used solely by the cluster. A NOW is typically used to take advantage of spare processor cycles on workstations connected to a network; the workstations themselves are used primarily for purposes other than cluster computing. This could be an important area of further development in cluster computing, as large organizations with correspondingly large networks of workstations attempt to utilize CPU cycles that currently go to waste. This type of arrangement is already common at companies such as digital special effects houses: overnight, rendering is spread across the workstations that artists use during the day for illustration and editing.
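The overnight rendering arrangement can be sketched as a static, round-robin division of frames among idle workstations. This is a minimal illustration under stated assumptions: the function name `assign_frames`, the frame numbers, and the host names are all hypothetical, and a production setup would also have to cope with machines of differing speed and with workstations reclaimed by their users.

```python
from typing import Dict, List


def assign_frames(frames: List[int], workstations: List[str]) -> Dict[str, List[int]]:
    """Hypothetical helper: deal render frames out to workstations round-robin.

    Frame i goes to workstation i mod N, so every machine gets roughly
    the same number of frames regardless of how fast it is.
    """
    if not workstations:
        raise ValueError("need at least one workstation")
    plan: Dict[str, List[int]] = {host: [] for host in workstations}
    for i, frame in enumerate(frames):
        host = workstations[i % len(workstations)]  # cycle through the hosts
        plan[host].append(frame)
    return plan


if __name__ == "__main__":
    # Illustrative data only: ten frames spread over three artists' machines.
    plan = assign_frames(list(range(1, 11)), ["artist-01", "artist-02", "artist-03"])
    for host, assigned in plan.items():
        print(host, assigned)
```

With this sample data, artist-01 receives frames 1, 4, 7 and 10, while the other two machines receive three frames each. A fixed division like this is the simplest possible plan, but a slow workstation finishes last and holds up the whole job, which is why dynamic task handout is often preferred on mixed hardware.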

One of the most popular types of cluster is the Beowulf. The original Beowulf cluster was developed by Thomas Sterling and Don Becker at the Center of Excellence in Space Data and Information Sciences (CESDIS), a division of the Universities Space Research Association at the Goddard Space Flight Center in Greenbelt, Maryland. There is now a Beowulf open-source project to continue the development of the software that makes these clusters tick. Beowulfs are characterized, among other things, by the use of mostly commodity hardware and free software. They are also defined by the use of at least one server node that controls any number of client nodes. The Beowulf project members themselves distinguish Beowulf clusters by whether they are made up entirely of commodity products, or of mostly commodity products with some specialized hardware and/or software.
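A server node controlling client nodes is often organized as a master/worker pattern: the server hands out tasks and collects results, while each client repeatedly takes a task, computes, and reports back. The sketch below is a minimal, single-machine illustration under explicit assumptions: Python threads stand in for client nodes, in-memory queues stand in for the network, and the name `run_master` is hypothetical. A real Beowulf would pass these messages between separate machines, for example with PVM or MPI.

```python
import queue
import threading
from typing import Callable, Dict, List


def run_master(tasks: List[int], work: Callable[[int], int], n_clients: int) -> Dict[int, int]:
    """Hypothetical server node: distribute tasks to clients and gather results.

    Threads simulate client nodes; queues simulate the network links.
    """
    if n_clients < 1:
        raise ValueError("need at least one client node")
    task_q: "queue.Queue[object]" = queue.Queue()
    result_q: "queue.Queue[tuple]" = queue.Queue()
    stop = object()  # sentinel message telling a client node to shut down

    def client() -> None:
        # Each client pulls work until it receives the stop message.
        while True:
            task = task_q.get()
            if task is stop:
                break
            result_q.put((task, work(task)))

    clients = [threading.Thread(target=client) for _ in range(n_clients)]
    for c in clients:
        c.start()
    for t in tasks:  # server sends out every task
        task_q.put(t)
    for _ in clients:  # then one stop message per client
        task_q.put(stop)
    for c in clients:
        c.join()

    # All clients have finished, so every result is already queued.
    results: Dict[int, int] = {}
    while not result_q.empty():
        task, value = result_q.get()
        results[task] = value
    return results


if __name__ == "__main__":
    squares = run_master(list(range(8)), lambda x: x * x, n_clients=3)
    print(sorted(squares.items()))
```

Because clients pull work from a shared queue instead of receiving a fixed share, a faster node simply ends up taking more tasks. Note that in CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so this sketch shows the control structure, not a performance gain.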

Why a cluster?

There has long been a debate about what kind of fighter jet should be developed to serve the U.S. Air Force's future needs. The debate centers on whether it is best to have a relatively small number of extremely high-performance planes, or a larger number of smaller, cheaper, less capable planes. There are particular resource constraints on this problem - pilots require enormous amounts of training, and we would like to keep them alive. There are also particular environmental factors - the specificity of a fighter's role, and the likelihood that jets will be used in unanticipated ways. Probably due to these factors, the high-performance plane has won the day of late, as evidenced by the new crop of jets that has come online in the last 10 years.

There is a similar argument to be made in the world of computation. Is it best to attack computational problems with an extremely powerful supercomputer, or with a larger number of modest computers? Up until the mid-1990s, supercomputers had definitely ruled the roost. Now we are seeing a shift towards the other approach. Clusters are becoming more popular, and mainframe manufacturers are going out of business or re-inventing their companies. Why is this happening? What has changed in the last 10 years to precipitate this shift? In other words, why would you want a cluster and not a supercomputer?

Price/Performance

The most obvious benefit of clusters, and the most compelling reason for the growth in their use, is that they have significantly reduced the cost of processing power. One indication of this phenomenon is the Gordon Bell Award for Price/Performance Achievement in Supercomputing, which in several recent years has been awarded to Beowulf-type clusters. One of the most recent entries, the Avalon cluster at Los Alamos National Laboratory, "demonstrates price/performance an order of magnitude superior to commercial machines of equivalent performance."

This reduction in the cost of entry to high-performance computing (HPC) has been due to the commodification of both hardware and software, particularly over the last 10 years. The prices of all computer components have dropped dramatically in that time. The components critical to the development of low-cost clusters are:

  1. Processors - commodity processors are now capable of computational power previously reserved for supercomputers; witness Apple Computer's recent ad campaign touting the G4 Macintosh as a supercomputer.
  2. Memory - the memory used by these processors has dropped in cost along with the processors.
  3. Networking components - the most recent group of products to experience commodification and dramatic cost decreases is networking hardware. High-speed networks can now be assembled from these products for a fraction of the cost necessary only a few years ago.
  4. Motherboards, busses, and other sub-systems - all of these have become commodity products, allowing the assembly of affordable computers from off the shelf components.

Several clusters assembled for under $50,000 are now on the TOP500 list of supercomputers.

Demand for Processing Power

In addition to, or perhaps driving, the supply-side developments above, demand for computational power has been increasing. The rapid growth of the internet in particular, and of the information technology industry in general, has created ever-increasing demand for processing power. It appears to be much easier to scale up commodity products to meet this demand than to mass-produce the supercomputers that would otherwise be needed for the task. The rate at which a supercomputer becomes obsolete is horrifying, given its price. If you feel bad when your $2,000 PC is behind the times after a year, imagine spending several million dollars on a supercomputer and experiencing the same thing.

This demand for processing power has increased while the efficiency with which large organizations use their processing power has not. Today, the workstations of employees at large corporations, government bureaucracies, and academic institutions sit idle at night while expensive mainframes perform the processing-intensive tasks of the organization. Using a network of workstations as a cluster could vastly increase the utilization of processing resources these organizations already own, with an investment only in software.

Other Factors

There are more subtle but perhaps ultimately more important advantages of cluster computing using commodity hardware and free software - code re-use and forward compatibility. Using a supercomputer for commercial applications results in vendor lock-in. The specialized components used in supercomputers, and their attendant specialized architectures, programming languages, compilers, etc., make users dependent on the vendor for everything. Combined with the dizzying pace of advancement a vendor needs to stay competitive in the field, this keeps the tools used by developers on these systems perpetually in a "young" state.

The use of free, open source software on a standard platform can break this dependence. It allows the processor type and speed, network topology, and software to be upgraded without a change in the programming model. "With the maturity and robustness of Linux, GNU software and the 'standardization' of message passing via PVM and MPI, programmers now have a guarantee that the programs they write will run on future Beowulf clusters - regardless of who makes the processors or the networks." - [Merkey98]

 

For more information:

Part II. How do they work, what do you need to set one up? - Beumhyeung Rhee

Part III. Java implementations of cluster computing, how has java been used in clusters? - Yoonjung Ha and Kavan Mehta

Part IV. The ProActive PDC library for Parallel, Distributed, and Concurrent computing and metacomputing in Java - an example of a library designed for writing applications to run on a cluster - Justin Harmon

Part V. An example of a cluster running a simple application demonstrating its use - Hiren Desai

Links:

Beowulf Project:
beowulf.org
beowulf underground

Articles:
Series in Linux Journal
New Income Source for Geeks

Actual Clusters:
Avalon
Brahma
Grendel
Hermes
Loki
Naegling
Pondermatic
Schrimp
Top Cat