| Turbolinux Cluster LoadBalancer 10: User Guide |
|---|
| Chapter 2. Clustering Concepts |
For two or more systems to provide the same access to the same data, they must have some way to share that data. This is much harder than it appears at first glance. If the data changes frequently, there must be some way to keep all the systems synchronized. In this section we'll look at some software and hardware solutions that can be used to share data.
The easiest shared storage mechanisms are implemented in software. Hardware solutions are more powerful and robust, but in many instances a simple software method will be enough to share data.
The most basic way of sharing data is by copying the data in question to each server. Of course, this will only work if the data is changed infrequently, and always by someone with administrative access to all the servers in the cluster.
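As a rough illustration of this copy-to-every-server approach, the following Python sketch pushes one master content directory to several replica directories and then checks that each replica matches the master. The directory names and the `tree_digest` and `replicate` helpers are hypothetical, and local directories stand in for the servers in the cluster. On a real cluster the copy step would go over the network, and this sketch is not how the Turbolinux synchronization tools are implemented.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, List


def tree_digest(root: Path) -> str:
    """Return a SHA-256 digest over relative file names and file contents."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def replicate(master: Path, replicas: List[Path]) -> Dict[str, bool]:
    """Copy master to every replica and report whether each one now matches."""
    expected = tree_digest(master)
    results = {}
    for replica in replicas:
        # Remove the old copy first so that files deleted on the master
        # do not linger on the replica.
        if replica.exists():
            shutil.rmtree(replica)
        shutil.copytree(master, replica)
        results[replica.name] = tree_digest(replica) == expected
    return results


if __name__ == "__main__":
    import tempfile

    # Hypothetical layout: one master directory and two "server" directories.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        master = base / "master"
        master.mkdir()
        (master / "index.html").write_text("<h1>hello</h1>")
        print(replicate(master, [base / "node1", base / "node2"]))
```

Replacing each replica wholesale keeps the sketch short. It also shows why this method only suits data that changes infrequently: every update means another full copy to every server.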
Turbolinux Cluster LoadBalancer 10 comes with two synchronization tools. One is used to synchronize the configuration of the servers. The other is used to synchronize content. These tools can be run directly or accessed through the turboclusteradmin program. They will be covered in detail in Chapter 7. If you can use the synchronization tools to maintain data consistency, you will probably find them to be the easiest solution. They provide you with data redundancy without the need for any complex administration.
There are other replication methods available for data. One of the more common replication systems coming into use is the Lightweight Directory Access Protocol (LDAP). With LDAP, you can keep a database that is replicated across several systems. This provides a database system with redundancy and reliability, and is relatively easy to set up. LDAP is not a general-purpose database, and does not implement SQL. It is intended as a directory of network information and is object-based. However, you may find that it can be adapted to fit your needs.
If your data changes too frequently for manual synchronization, you should consider using a distributed file system. Your options here include NFS, AFS, DFS, Coda, InterMezzo, and GFS.
UNIX and Linux systems typically use NFS to share data over the network. NFS is a well-known system and is easy to configure as a server or as a client. However, NFS has many problems. Its security is weak, and it has no provisions for replicating the data to multiple systems. Thus, if you use NFS, you will most likely still have a single point of failure, and eliminating single points of failure may be one of the reasons you wanted to create a cluster in the first place. Several newer distributed file systems have been developed to overcome the shortcomings of NFS, but none of them has yet become significant enough to replace it.
One alternative that has much in common with NFS while replacing its broken authentication mechanism is the Andrew File System (AFS). AFS is an outgrowth of the Andrew Project at Carnegie Mellon University in Pittsburgh. AFS is licensed commercial software. The most important aspect of AFS is its secure authentication mechanism, based on the Kerberos protocol. AFS has a number of other performance, usage, and administration enhancements that make it preferable to NFS, even in secured areas.
Closely related to AFS is Transarc's Distributed File System (DFS). Both are available commercially from Transarc. DFS is an enterprise-level shared storage solution with sophisticated replication and load balancing capabilities. A key design goal in DFS is transparency across domains and networks within an enterprise, allowing for easy centralized administration.
The Coda file system is an Open Source distributed file system that now comes with the Linux kernel. Coda is an attempt to create a system much like AFS, with some more modern features as well. It attempts to fix some of the availability problems by providing disconnected operation, server-side replication, continued operation during partial network failures, and scalability and bandwidth adaptation features.
InterMezzo is another Open Source distributed file system. One of the advantages of InterMezzo is that it sits in a layer above the native file system, allowing you to use any native file system to store the data. It is more aware of modern computing environments and equipment capabilities than Coda. Like Coda, it stresses high availability, large-scale replication, and disconnected operation. More information is available at http://www.inter-mezzo.org/.
One of the best distributed file system solutions is the Global File System (GFS). This solution requires hardware support in addition to the file system software. The hard drives must be directly attached to all the systems participating in the file system (i.e. all the nodes in the cluster). This can be done using either double-ended SCSI or fibre-channel.
Most high-end shared storage systems are hardware based. The two primary technologies used are Storage Area Networks (SAN) and Network Attached Storage (NAS). Solutions can also be implemented using fibre-channel and double-ended SCSI chains.
A Storage Area Network (SAN) is a highly fault-tolerant, distributed network in its own right, dedicated to providing highly reliable data serving operations. Conceptually, a SAN is a layer that sits between application servers and the physical storage devices, which themselves may be NAS devices, database servers, traditional file servers, or near-line and archival storage devices. The software associated with the SAN makes all this back-end storage transparently available and provides centralized administration for it.
The main distinguishing feature of a SAN is that it runs as an entirely separate network, usually employing a proprietary or storage-based networking technology. Most SANs these days are moving towards the use of fibre-channel. It should be clear that implementing a SAN is a non-trivial undertaking. Administering a SAN will likely require dedicated support personnel. Therefore SANs will most likely only be found in large enterprise environments.
A NAS device is basically an old-fashioned file server turned into a closed system. Every last clock cycle in a NAS device is dedicated to pumping data back and forth between disk and network. This can be very useful in freeing up application servers (such as mail servers, web servers, or database servers) from the overhead associated with file operations.
Another way to think of a NAS device is as a hard drive with an Ethernet card and some file serving software thrown on. The advantage of a NAS box over a file server is that the NAS device is self-contained and needs less administration. Another key aspect is that a NAS box should be platform independent. As an all-purpose storage device, a NAS box should be able to transparently serve Windows and UNIX clients alike.
Fail-over clustering would not be practical without some way for the redundant servers to access remote storage devices without taking a large performance hit, as would occur if these devices were simply living on the local network. Two common solutions to this problem are double-ended SCSI and fibre-channel.
Double-ended SCSI, also known as differential SCSI, exploits a redundancy in the design of SCSI to allow longer SCSI cables and thus make high-speed outboard storage devices practical. On a single-ended SCSI cable, every other signal line is actually grounded. Double-ended SCSI uses these redundant ground lines to carry the same signal as the adjacent signal line, with the voltage inverted. The receiver reads the difference between the two lines, so the net effect is a signal with twice the strength, in which noise picked up equally by both lines cancels out. This allows a much longer cable, up to 25 meters (about 82 feet) for high-voltage differential SCSI, without signal loss. Double-ended SCSI suffices when the computers using the external device are more or less adjacent.
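The benefit of sending an inverted copy of each signal can be shown with a little arithmetic. The Python sketch below is a simplified numerical model, not a description of the actual SCSI electrical specification: the voltage levels, the noise values, and the function names are all made up for illustration. The same interference is added to both wires of a pair, and the receiver takes the difference between them.

```python
def transmit_differential(bits, high=1.0):
    """Encode each bit as a (signal, inverted signal) voltage pair."""
    pairs = []
    for bit in bits:
        level = high if bit else -high
        pairs.append((level, -level))
    return pairs


def add_common_noise(pairs, noise):
    """Add the same interference to both wires, as a long cable run might."""
    return [(pos + n, neg + n) for (pos, neg), n in zip(pairs, noise)]


def receive_differential(pairs):
    """Recover bits from the difference between the two wires."""
    return [1 if (pos - neg) > 0 else 0 for pos, neg in pairs]


def receive_single_ended(pairs):
    """Recover bits from the signal wire alone, measured against 0 V."""
    return [1 if pos > 0 else 0 for pos, _ in pairs]


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0]
    noise = [-1.5, 1.5, 0.25, -2.0, 3.0]  # illustrative values only
    noisy = add_common_noise(transmit_differential(bits), noise)
    print("sent:        ", bits)
    print("differential:", receive_differential(noisy))
    print("single-ended:", receive_single_ended(noisy))
```

In this model the difference between the two wires is always twice the signal level, whatever noise was added, so the differential receiver recovers every bit. The single-ended receiver, which compares one wire against ground, misreads any bit where the noise is larger than the signal.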
Fibre-channel interfaces typically use fiber optic cables to carry the encoded SCSI signals via laser light, in much the same way that high-speed network interfaces do. These offer a very long range for a local interconnect (up to about 6 miles, or 10 km) at high bandwidth and are a key technology in implementing SANs. Of course, they are quite expensive in comparison to strictly local interfaces.