Understanding RAID

In 1987, Patterson, Gibson, and Katz at the University of California, Berkeley, published a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)" [1]. This paper described various types of disk arrays, referred to by the acronym RAID. The basic idea of RAID was to combine multiple small, inexpensive disk drives into an array which yields performance exceeding that of a Single Large Expensive Drive (SLED). This array of drives appears to the computer as a single logical storage unit or drive.

The Mean Time Between Failure (MTBF) of the array is approximately equal to the MTBF of an individual drive divided by the number of drives in the array. Because of this, the MTBF of an array of drives would be too low for many applications. However, disk arrays can be made fault-tolerant by redundantly storing information in various ways. Five types of array architectures, RAID-1 through RAID-5, were defined by the Berkeley paper, each providing disk fault-tolerance and each offering different trade-offs in features and performance. In addition to these five redundant array architectures, it has become popular to refer to a non-redundant array of disk drives as a RAID-0 array.

Fundamental to RAID is "striping", a method of concatenating multiple drives into one logical storage unit. Striping partitions each drive's storage space into stripes, which may be as small as one sector (512 bytes) or as large as several megabytes. These stripes are then interleaved round-robin, so that the combined space is composed alternately of stripes from each drive. In effect, the storage space of the drives is shuffled like a deck of cards. The type of operating environment determines whether large or small stripes should be used. Most multiuser operating systems today, like UNIX and Novell NetWare, support overlapped disk I/O operations across multiple drives.
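The MTBF rule stated above lends itself to simple arithmetic. The following Python sketch illustrates it; the drive MTBF figure is purely hypothetical:

```python
def array_mtbf(drive_mtbf_hours: float, num_drives: int) -> float:
    """Approximate MTBF of a non-redundant drive array: any single
    drive failure fails the array, so the array's failure rate scales
    with the number of drives."""
    return drive_mtbf_hours / num_drives

# Hypothetical figures: drives rated at 40,000 hours, eight in the array.
print(array_mtbf(40_000, 8))  # -> 5000.0
```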
However, in order to maximize throughput for the disk subsystem, the I/O load must be balanced across all the drives so that each drive can be kept as busy as possible. In a multiple-drive system without striping, the disk I/O load is never perfectly balanced: some drives hold data files which are frequently accessed, while other drives are only rarely accessed. By striping the drives in the array with stripes large enough that each record falls entirely within one stripe, the records are distributed evenly across all drives and the I/O load is balanced. All drives in the array are thus kept busy during heavy-load situations, each working on a different I/O operation, which maximizes the number of simultaneous I/O operations the array can perform.

In single-user systems which access large records, small stripes (typically one 512-byte sector in length) can be used so that each record spans all the drives in the array, with each drive storing part of the data from the record. This causes long record accesses to be performed faster, since the data transfer occurs in parallel on multiple drives. Unfortunately, small stripes rule out multiple overlapped I/O operations, since each I/O will typically involve all drives. However, operating systems like DOS do not allow overlapped disk I/O and thus are not negatively impacted. Medical imaging and data acquisition are typical of the long-record, single-user environments which can achieve a performance enhancement with small-stripe arrays.

One drawback to using small stripes is that synchronized-spindle drives are required in order to keep performance from being degraded when short records are accessed. Without synchronized spindles, each drive in the array will be at a different random rotational position.
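The round-robin interleaving described above amounts to a simple address mapping. The following Python fragment (function and parameter names are illustrative, not taken from any particular implementation) maps a logical block number onto a drive and a physical block on that drive:

```python
def locate(block: int, num_drives: int, blocks_per_stripe: int):
    """Map a logical block number to (drive, physical block) in a
    round-robin striped array."""
    stripe = block // blocks_per_stripe   # which stripe holds the block
    offset = block % blocks_per_stripe    # position within that stripe
    drive = stripe % num_drives           # stripes rotate across the drives
    physical = (stripe // num_drives) * blocks_per_stripe + offset
    return drive, physical

# Four drives, two-block stripes: consecutive stripes land on
# consecutive drives, wrapping back to drive 0.
for block in range(10):
    print(block, locate(block, num_drives=4, blocks_per_stripe=2))
```

With a large blocks_per_stripe, a short record falls entirely on one drive (the multiuser case); with a stripe of one sector, consecutive sectors land on different drives and a long record transfers in parallel (the single-user case).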
Since an I/O cannot be completed until every drive has accessed its part of the record, the drive which takes the longest determines when the I/O completes. The more drives in the array, the more the average access time of the array approaches the worst-case single-drive access time. Synchronized spindles ensure that every drive in the array reaches its data at the same time, so the access time of the array equals the average access time of a single drive rather than approaching the worst case.

The five fault-tolerant RAID types, plus RAID-0, are described in the following paragraphs.

RAID-0 is typically defined as a non-redundant group of striped disk drives without parity. RAID-0 arrays are usually configured with large stripes, but may be sector-striped with synchronized-spindle drives for single-user environments which access long sequential records. If one drive in a RAID-0 array crashes, the entire array crashes. However, RAID-0 arrays deliver the best performance and data storage efficiency of any array type.

RAID-1, better known as "disk mirroring", is simply a pair of disk drives which store duplicate data but appear to the computer as a single drive. Striping is not used within a mirrored pair, although multiple RAID-1 arrays may be striped together to appear as a single larger array consisting of pairs of mirrored drives. Writes must go to both drives in a mirrored pair so that the information on the drives is kept identical, but each drive can perform read operations independently. Mirroring thus doubles the read performance of an individual drive while leaving write performance unchanged. RAID-1 has been popularized at the system level by Tandem Computers, in software by Novell Corporation, and in a hardware implementation on the disk controller by DPT. RAID-1 delivers the best performance of any redundant array type in multiuser environments.
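The rotational-latency argument for synchronized spindles, made earlier, can be quantified. If each unsynchronized drive sits at an independent, uniformly random rotational position, the wait for the slowest of n drives averages n/(n+1) of a full rotation, approaching a full (worst-case) rotation as n grows, versus half a rotation for a single drive. A minimal Python sketch of this expectation (the rotation time is a hypothetical figure):

```python
def expected_rotational_delay(num_drives: int, rotation_ms: float) -> float:
    """Expected rotational delay when an I/O must wait for every drive:
    the delay is the maximum of num_drives independent delays, each
    uniform on [0, rotation_ms); the expected maximum of n such delays
    is rotation_ms * n / (n + 1)."""
    n = num_drives
    return rotation_ms * n / (n + 1)

# One drive waits half a rotation on average; eight unsynchronized
# drives wait nearly a full rotation.
print(expected_rotational_delay(1, 10.0))  # -> 5.0
print(expected_rotational_delay(8, 10.0))  # -> ~8.9
```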
RAID-2 arrays sector-stripe data across groups of drives, with some drives dedicated to storing ECC information. Since most disk drives today embed ECC information within each sector, RAID-2 offers no significant advantage over the RAID-3 architecture.

RAID-3, like RAID-2, sector-stripes data across groups of drives, but one drive in the group is dedicated to storing parity information. RAID-3 relies on the ECC embedded in each sector for error detection. In the case of a hard drive failure, data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the remaining drives. Records typically span all drives, thereby optimizing the disk transfer rate. Since each I/O accesses all drives in the array, RAID-3 arrays cannot overlap I/O and thus deliver their best performance in single-user, single-tasking environments with long records. Synchronized-spindle drives are required for RAID-3 arrays in order to avoid performance degradation with short records.

RAID-4 is identical to RAID-3 except that large stripes are used, so that records can be read from any individual drive in the array (except the parity drive), allowing read operations to be overlapped. However, since all write operations must update the parity drive, writes cannot be overlapped. This architecture offers no significant advantage over RAID-5.

RAID-5, sometimes called a rotating parity array, avoids the write bottleneck caused by the single dedicated parity drive of RAID-4. Like RAID-4, large stripes are used so that multiple I/O operations can be overlapped. Unlike RAID-4, however, each drive takes a turn storing the parity information for a different series of stripes. Since there is no dedicated parity drive, all drives contain data and read operations can be overlapped on every drive in the array. Write operations typically access a single data drive, plus the parity drive for that record.
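The parity rotation in RAID-5 can be sketched with a trivial placement function. This is only one possible rotation scheme, shown for illustration (real controllers use several different layouts), and the names here are hypothetical:

```python
def parity_drive(stripe: int, num_drives: int) -> int:
    """One simple RAID-5 rotation: the parity for stripe i lives on
    drive i mod n, so parity duty cycles through all the drives."""
    return stripe % num_drives

def data_drives(stripe: int, num_drives: int) -> list:
    """The remaining drives in the stripe hold data."""
    p = parity_drive(stripe, num_drives)
    return [d for d in range(num_drives) if d != p]

# Over five stripes on five drives, each drive stores parity exactly
# once, so no single drive becomes a write bottleneck.
for s in range(5):
    print(s, parity_drive(s, 5), data_drives(s, 5))
```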
Since, unlike RAID-4, different records store their parity on different drives, write operations can also be overlapped. RAID-5 offers improved storage efficiency over RAID-1 because only parity information is stored, rather than a complete redundant copy of all data: any number of drives can be combined into a RAID-5 array, with the effective storage capacity of only one drive sacrificed to store the parity information. RAID-5 arrays therefore provide greater storage efficiency than RAID-1 arrays, but at the cost of a corresponding loss in performance.

When data is written to a RAID-4 or RAID-5 array, the parity information must be updated. There are two ways to accomplish this.

The first way is straightforward but very slow. Because the parity is the XOR of the data on every drive in the array, whenever one drive's data changes, the data on every other data drive in the array must be read and XORed to produce the new parity. This requires accessing every drive in the array for each write operation.

The second method, which is usually more efficient, is to determine which data bits were changed by the write operation and then change only the corresponding parity bits. This is accomplished by first reading the old data which is about to be overwritten and XORing it with the new data to be written. The result is a bit mask with a one in the position of every bit which has changed. This mask is then XORed with the old parity information read from the parity drive, flipping the corresponding bits in the parity, and the updated parity is written back to the parity drive. Although this may seem more convoluted, it requires only two reads, two writes, and two XOR operations, rather than a read or write and an XOR for every drive in the array.
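The read-modify-write sequence just described can be expressed directly in XOR arithmetic. A minimal Python sketch, operating on equal-length byte strings standing in for stripe contents:

```python
def rmw_parity_update(old_data: bytes, new_data: bytes,
                      old_parity: bytes) -> bytes:
    """Small-write parity update: XOR the old and new data to get a
    mask of changed bits, then XOR that mask into the old parity.
    Costs two reads (old data, old parity), two XORs, and two writes
    (new data, new parity), regardless of how wide the array is."""
    mask = bytes(o ^ n for o, n in zip(old_data, new_data))
    return bytes(m ^ p for m, p in zip(mask, old_parity))

# Example: another data drive holds 0x33, so the old parity of data
# 0x0f is 0x0f ^ 0x33 = 0x3c.  Overwriting 0x0f with 0xf0 must yield
# the parity 0xf0 ^ 0x33 = 0xc3 -- without touching the other drive.
print(rmw_parity_update(b'\x0f', b'\xf0', b'\x3c'))  # -> b'\xc3'
```

The result matches what the slow method would compute by re-reading and XORing every data drive, but only the two drives involved in the write are accessed.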
The cost of storing parity, rather than redundant data, is the extra time taken during write operations to regenerate the parity information. This additional time degrades the write performance of RAID-5 arrays relative to RAID-1 arrays by a factor of between 3:5 and 1:3 (i.e., RAID-5 writes are between 3/5 and 1/3 the speed of RAID-1 write operations). Because of this, RAID-5 arrays are not recommended for applications in which write performance is important. (The exception to this is applications which never write data.)

In summary:

RAID-0 is the fastest and most storage-efficient array type but offers no fault-tolerance.

RAID-1 is the array of choice for performance-critical, fault-tolerant environments. In addition, RAID-1 is the only choice for fault-tolerance if no more than two drives are desired.

RAID-2 is seldom used today, since ECC is embedded in almost all modern disk drives.

RAID-3 can be used in single-user environments which access long sequential records to speed up data transfer. However, RAID-3 does not allow multiple I/O operations to be overlapped and requires synchronized-spindle drives in order to avoid performance degradation with short records.

RAID-4 offers no advantages over RAID-5 and does not support multiple simultaneous write operations.

RAID-5 is the best choice in multiuser environments which are either not write-performance sensitive or which perform few or no write operations. However, at least three, and more typically five, drives are required for RAID-5 arrays.

References:

[1] D. A. Patterson, G. Gibson, and R. H. Katz, "A Case for Redundant Arrays of Inexpensive Disks (RAID)", Report No. UCB/CSD 87/391, University of California, Berkeley, CA, 1987.

For reprints, ask for Technology Focus Paper "Understanding RAID", Document Number MM-0096-001-A, from DPT Channel Marketing, (407) 830-5522.