Turbolinux Cluster LoadBalancer 10: User Guide
This chapter covers the basic concepts you need in order to understand how Turbolinux Cluster LoadBalancer 10 works. Understanding these concepts will help you make the most of the product and will clarify your options when configuring a cluster.
We will look at the following topics:
What is a cluster?
Components that make up a cluster
The various types of clusters
How a cluster works
How to manage a cluster
Methods of sharing data between systems
A cluster is a group of individual computer systems that can be made to appear as one computer system. Although that definition sounds simple, several similar technologies exist, and the differences between them can be quite subtle.
Computer clustering has been around in various forms since the 1980s, originating on the Digital VAX platform. The VMS operating system and VAX hardware combined to provide clustered services. These VAX clusters could share hardware resources, such as disk space, and could provide computing resources to multiple users.
In this section, we'll take an in-depth look at what it means to be a cluster. Then we'll give an overview of some of the related parallel processing technologies in order to draw some distinctions.
Clustering is just one form of parallel computing. One of the key points that distinguishes clustering from other related technologies is the ability to view the cluster as either a single entity or a collection of stand-alone systems. For example, a cluster of web servers can appear as one large web server, but at the same time, individual systems within the cluster can be accessed as individual systems, if desired.
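The following minimal Python sketch illustrates the two views of the web-server example. It assumes a plain round-robin policy for choosing a node. The class `WebCluster`, its methods `handle` and `handle_on`, and the node names `web1`, `web2`, and `web3` are hypothetical. They do not describe the interface or scheduling algorithm of Turbolinux Cluster LoadBalancer 10.

```python
from itertools import cycle


class WebCluster:
    """Hypothetical front end: one entry point for clients, nodes still reachable."""

    def __init__(self, nodes):
        # nodes maps a node name to a handler function standing in for a real server.
        self.nodes = dict(nodes)
        self._rotation = cycle(sorted(self.nodes))  # assumed round-robin order

    def handle(self, request):
        """Single-entity view: the cluster chooses the node, the client does not."""
        name = next(self._rotation)
        return self.nodes[name](request)

    def handle_on(self, name, request):
        """Individual-system view: address one node directly (e.g. for maintenance)."""
        return self.nodes[name](request)


def make_server(name):
    # Each stand-in server just reports which node answered the request.
    return lambda request: f"{name} served {request}"


web = WebCluster({n: make_server(n) for n in ("web1", "web2", "web3")})
for path in ("/", "/about", "/news", "/"):
    print(web.handle(path))              # web1, web2, web3, then web1 again
print(web.handle_on("web2", "/status"))  # always web2
```

Clients that call `handle` never learn which node served them. An administrator can still reach any single node through `handle_on` without disturbing the rotation.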
Because each system in the cluster is a separate computer, each has its own hardware, operating system, and software. Clusters can be homogeneous, with all the systems running the same software on similar hardware, or heterogeneous, with systems within the cluster running different operating systems on various hardware.
Clustering falls within a continuum of parallel processing techniques. The primary distinctions are based on the level at which resources are shared or duplicated. At one end of the spectrum, a single system has multiple processors on one motherboard that share everything else. At the other end, distributed processing employs multiple computers, but the system is generally not viewed as a single entity.
Some parallel processing methods are (from tightest binding to loosest):
SMP
NUMA
MPP
Clustering
Distributed processing
We'll cover each of these in this section, except for clustering, which we have already covered.
Multi-processor systems today are generally of the symmetric type. This means that no one processor is any more important than the others, and all resources are equally available to all the processors. This architecture is called symmetric multi-processing, or SMP: a single computer has multiple CPUs but a single shared memory space and shared I/O facilities.
The idea behind SMP is to transparently break down a computing problem into concurrent processes and allow these to execute on separate processors within the same machine. The emphasis here is on transparency: the same program can run time-sliced on a single-processor machine, and the development tools need not even be aware of the underlying parallelism.
On an SMP machine, the operating system itself is responsible for dividing up the individual processes making up an application among the available CPUs. SMP machines are best used with operating systems and programs that use threading or light-weight processes. Windows NT is heavily thread-based, and Linux processes are fairly light-weight, so both scale fairly well on SMP hardware.
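The following Python sketch illustrates the transparency point. A problem is split into independent chunks and handed to a pool of workers, and the calling code does not change with the number of workers. The functions `count_primes` and `parallel_count` and the chunk count are illustrative assumptions.

In CPython, threads do not execute Python bytecode in parallel because of the global interpreter lock. For CPU-bound work like this, a `ProcessPoolExecutor` would normally be used instead. Threads are used here only to keep the sketch simple and portable, so it shows the programming model, not a speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def count_primes(lo, hi):
    """Count primes in [lo, hi) by trial division: one independent unit of work."""
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def parallel_count(limit, workers, chunks=8):
    """Split [0, limit) into chunks and hand them to a pool of workers.

    The calling code is the same whether workers is 1 or 4; the pool and the
    operating system scheduler decide where each chunk actually runs.
    """
    step = max(1, -(-limit // chunks))  # ceiling division, so the chunks cover the range
    bounds = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: count_primes(*b), bounds))


print(parallel_count(10_000, workers=1))  # 1229
print(parallel_count(10_000, workers=4))  # 1229, same program, more workers
```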
SMP systems with two or four processors are fairly simple to build. Anything beyond that becomes rather difficult, because the processors all need to be able to access all the I/O and memory resources. Beyond four processors, these shared resources start to become a bottleneck, and adding more CPUs provides diminishing returns.
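A toy contention model, assumed here purely for illustration, shows how diminishing returns arise. Suppose a fraction *s* of each task must pass through a shared resource, such as the memory bus or I/O, that only one CPU can use at a time. The rest of the work divides evenly across *n* CPUs, so the speedup is 1 / (s + (1 - s) / n). This has the same form as Amdahl's law. The value s = 0.1 is hypothetical, not a measurement of any real SMP system.

```python
def smp_speedup(n_cpus, shared_fraction):
    """Toy model: shared_fraction of the work is serialized on a shared resource,
    and the remainder divides evenly among n_cpus processors."""
    serial_time = shared_fraction
    parallel_time = (1.0 - shared_fraction) / n_cpus
    return 1.0 / (serial_time + parallel_time)


previous = 0.0
for n in range(1, 9):
    s = smp_speedup(n, shared_fraction=0.1)  # 0.1 is a hypothetical value
    print(f"{n} CPUs: speedup {s:.2f}, gain from the last CPU added {s - previous:.2f}")
    previous = s
```

Each added CPU still helps, but each contributes less than the one before. The shared resource increasingly dominates the total time.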
SMP computers use a memory sharing scheme in which each processor has the same level of access to all the physical memory in the computer. Such a scheme is known as uniform memory access, or UMA. NUMA (non-uniform memory access) is a more complex technique that allows several processors in a multi-processor computer to share local memory more efficiently than in simple SMP. Each CPU has fast, direct access to its own local memory area and slower access to the other memory areas in the system.
The basic idea of NUMA is to give certain processors an advantage in accessing a given range of physical memory. You can think of a NUMA machine as a sort of intermediate step between simple SMP machines and massively parallel systems. Any part of memory can be accessed on a NUMA system; some memory addresses simply take longer to reach than others. Even so, accessing non-local memory is still much faster than accessing disk or network I/O.
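The cost of non-uniform access can be estimated as a weighted average. The expected access time is the local fraction times the local latency, plus the remaining fraction times the remote latency. The 100 ns and 250 ns figures below are hypothetical and are chosen only to show the shape of the trade-off.

```python
def average_access_ns(local_fraction, local_ns, remote_ns):
    """Expected memory access time when local_fraction of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, for illustration only.
LOCAL_NS, REMOTE_NS = 100.0, 250.0
for f in (1.0, 0.9, 0.5):
    print(f"{f:.0%} local -> {average_access_ns(f, LOCAL_NS, REMOTE_NS):.0f} ns")
```

The closer the local fraction is to 100%, the more the machine behaves like one with uniformly fast memory.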
The system bus on a NUMA machine is quite complicated. It is often implemented as a mesh, with many connections to the bus. Coherency is also a major issue. You may see the term ccNUMA, which indicates that the system maintains cache coherency. When a CPU accesses memory, the caches of all the other processors must be checked to make sure that none of them has modified the data being retrieved.
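The following Python sketch shows the coherency check in toy form. It uses a simplified write-invalidate snooping scheme with MSI-style states (Modified, Shared), swapped in for illustration. Real ccNUMA machines often use directory-based protocols instead, and the class `SnoopingBus` is hypothetical.

```python
class SnoopingBus:
    """Toy write-invalidate coherence between per-CPU caches (MSI-style states)."""

    def __init__(self, n_cpus, memory):
        self.memory = memory                            # address -> value in main memory
        self.caches = [dict() for _ in range(n_cpus)]   # address -> [state, value]

    def _snoop(self, requester, addr, invalidate):
        """Check every other CPU's cache for a copy of addr."""
        for cpu, cache in enumerate(self.caches):
            if cpu == requester or addr not in cache:
                continue
            state, value = cache[addr]
            if state == "M":
                self.memory[addr] = value               # write the modified data back first
            if invalidate:
                del cache[addr]                         # a writer needs the only copy
            else:
                cache[addr] = ["S", self.memory[addr]]  # a reader leaves it Shared

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                           # miss: the other caches are checked
            self._snoop(cpu, addr, invalidate=False)
            cache[addr] = ["S", self.memory[addr]]
        return cache[addr][1]

    def write(self, cpu, addr, value):
        cache = self.caches[cpu]
        if addr not in cache or cache[addr][0] != "M":
            self._snoop(cpu, addr, invalidate=True)
        cache[addr] = ["M", value]                      # main memory is updated on write-back


bus = SnoopingBus(n_cpus=2, memory={0x10: 1})
print(bus.read(0, 0x10))  # 1: CPU 0 now holds a Shared copy
bus.write(1, 0x10, 42)    # CPU 1 modifies the data; CPU 0's copy is invalidated
print(bus.read(0, 0x10))  # 42: the snoop finds CPU 1's Modified copy and writes it back
```

Without the snoop in `read`, CPU 0 would have fetched the stale value 1 from main memory.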
NUMA systems try to address the main issue with parallel computing: inter-processor communication. In clusters and massively parallel systems, the overhead of communicating between processors is quite high, because the communication must travel across a network of some sort. NUMA uses a high-speed memory bus to communicate via the shared memory. While accessing non-local memory is not as fast as a local memory access, it is much faster than communicating over the network.
NUMA machines scale very well to large numbers of processors, so they can sometimes rival massively parallel systems in calculation throughput. The downside is that designing these machines involves extremely complex algorithms based on nanosecond-scale timing and arbitration schemes, so they tend to be rather expensive. However, they have a great advantage: from the perspective of the application software, all the complex memory arbitration among processors is invisible. Massively parallel systems are blindingly fast but almost require the machine to be configured for each problem to take advantage of that speed. NUMA trades off some efficiency for simpler development tools and transparent access to resources.
Massively parallel processing (MPP) is the heavyweight of the parallel computing world. In the MPP model, each node consists of a separate processor with its own dedicated resources. The idea of an MPP system is to break a computing problem down into parts that can be computed more or less independently of each other. Likewise, the architecture of the system has units that are fairly independent. Massively parallel systems are usually used for high-end, compute-intensive operations. For example, at the time this guide was written, the record holder as the world's fastest computer was an MPP system used to run mathematical models simulating a nuclear blast.
MPP is very closely related to clustering, but each node in an MPP system does not usually have full I/O capabilities. Thus each node in an MPP system may not be a viable stand-alone computer. An MPP system is usually larger than a typical cluster, but projects such as Beowulf are definitely blurring the distinctions.
One of the problems with MPP is that programs must be written specifically for parallel systems. (This is also a problem with some types of clusters, including Beowulf.) Two APIs are commonly used: PVM (Parallel Virtual Machine) and MPI (Message Passing Interface). These APIs concentrate on breaking down a problem into chunks that can be computed in parallel. Thus, if the problem to be solved cannot be broken down in this way, an MPP system will not be of much help.
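The following plain-Python sketch shows the chunking idea and its limit. A sum of squares is scattered across simulated nodes and the partial results are reduced. A recurrence, by contrast, cannot be split because each step needs the previous result. In MPI this pattern would typically map onto scatter and reduce operations, but no PVM or MPI calls are used here. The nodes are simulated by a loop, and the function names are hypothetical.

```python
def split(data, n_nodes):
    """Scatter step: give each node a contiguous slice it can process on its own."""
    size = max(1, -(-len(data) // n_nodes))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def node_work(chunk):
    """Runs on one node, touching only that node's slice (no shared memory)."""
    return sum(x * x for x in chunk)


def mpp_sum_of_squares(data, n_nodes):
    # On a real MPP system each call would run on a different node; here it is a loop.
    partials = [node_work(chunk) for chunk in split(data, n_nodes)]
    return sum(partials)  # gather/reduce step combines the partial results


def iterate(x0, steps):
    """A recurrence: each step needs the previous result, so it cannot be chunked."""
    x = x0
    for _ in range(steps):
        x = (x * x + 1) % 1_000_003
    return x


print(mpp_sum_of_squares(list(range(1, 101)), n_nodes=4))  # 338350
print(iterate(0, 3))  # 5, but only by running the three steps in order
```

Adding nodes speeds up `mpp_sum_of_squares` in principle, but it cannot help `iterate`. This is the kind of problem for which an MPP system offers little benefit.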
Distributed processing is probably the least well-defined of all the terms we have covered here. It basically means that different parts of the work are performed in different places. The most common example of distributed processing is the client/server architecture. The server has a specific job to perform, while the client performs another portion of the task, generally that of displaying the information to the user.
A distributed system is more loosely coupled than a cluster. In fact, it is usually difficult to see any coupling at all. There generally isn't any single entity that would be managed as a whole. With distributed processing, nodes retain their individual identity, while cluster nodes are usually anonymous. In a distributed processing system, you would say, "give me data X from server Y." In a cluster, you would say, "give me data X from the cluster."
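The difference in addressing can be sketched in Python. The distributed client must name the server, while the cluster client asks for the data and lets any available node answer. Both classes are hypothetical illustrations, in-memory dictionaries stand in for servers, and no real lookup protocol is implied.

```python
class DistributedClient:
    """Distributed processing: the caller must know which server holds the data."""

    def __init__(self, servers):
        self.servers = servers  # server name -> {key: value}

    def get(self, key, server):
        # "Give me data X from server Y."
        return self.servers[server][key]


class ClusterClient:
    """Cluster: nodes are anonymous, and any node holding a copy may answer."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # each node is {key: value}, or None if it is down

    def get(self, key):
        # "Give me data X from the cluster."
        for node in self.nodes:
            if node is not None and key in node:
                return node[key]
        raise KeyError(key)


catalog = {"price:42": 19.99}
distributed = DistributedClient({"db1": catalog, "db2": {}})
print(distributed.get("price:42", server="db1"))  # caller had to name db1

replicated = ClusterClient([None, dict(catalog), dict(catalog)])  # first node is down
print(replicated.get("price:42"))                 # caller never names a node
```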