RAID-1 versus RAID-5:
        Picking the right array for your application


In 1987, researchers at the University of California, Berkeley,
published a paper defining various types of Redundant Arrays of
Inexpensive Disks, more commonly referred to as RAID. The basic idea
of RAID was to combine multiple small, inexpensive disk drives into
a group which yields performance exceeding that of one large, more
expensive drive. This array of small drives is made to appear to
the computer as a single virtual drive. In addition, the array can
be made fault-tolerant by redundantly storing information in
various ways.

Five types of array architectures, RAID-1 through RAID-5, were
defined, each providing fault-tolerance in case of a drive failure
and each offering different trade-offs in features and performance.
Of the five types, only RAID-1, RAID-3 and RAID-5 are commonly
used. (RAID-2 and RAID-4 do not offer any significant advantages
over these other types.) RAID-3 is designed for single-user
environments, such as imaging or data acquisition, which access
extremely large sequential records. This leaves RAID-1 and RAID-5
as the types applicable for multiuser applications such as Unix
platforms or fileservers.

RAID-1, better known as "disk mirroring", is simply a pair of disk
drives which store duplicate data but appear to the computer as
a single drive. All writes must go to both drives in the mirrored
pair so that the information on the drives is kept identical. Both
drives, however, can perform simultaneous read operations.

RAID-1 arrays can be combined into larger groups by "striping" the
arrays together to appear as a single large drive. Each RAID-1
array's logical storage space is partitioned into "stripes" which
may be as small as one sector (512 bytes) or as large as several
megabytes. Logical stripes are then interleaved round-robin, so
that the resulting combined space is composed alternately of
stripes from each array. In effect, the stripes from each array are
shuffled like a deck of cards so that the I/O load will be balanced
across all the drives. One large array created from the combination
of smaller arrays is sometimes called a "dual-level" array.
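
The round-robin interleaving described above can be sketched in a
few lines of Python. This is only an illustration, not the mapping
used by any particular controller: the function name "locate" and
the parameters "stripe_sectors" and "num_arrays" are assumptions
introduced here, and the stripe size is taken to be a whole number
of sectors.

```python
def locate(logical_sector, stripe_sectors, num_arrays):
    """Map a sector of the combined drive to (array index, sector in that array).

    Hypothetical helper: logical stripes are dealt out to the RAID-1
    arrays in round-robin order, as described in the text.
    """
    stripe = logical_sector // stripe_sectors   # which logical stripe
    offset = logical_sector % stripe_sectors    # position inside that stripe
    array = stripe % num_arrays                 # round-robin choice of array
    local_stripe = stripe // num_arrays         # stripe number within that array
    return array, local_stripe * stripe_sectors + offset


if __name__ == "__main__":
    # Three mirrored pairs striped together, 8 sectors per stripe.
    for sector in (0, 8, 16, 24, 29):
        print(sector, locate(sector, 8, 3))
```

Consecutive stripes land on consecutive arrays, so a run of requests
to neighboring stripes is spread across all of the mirrored pairs.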

RAID-1 arrays are probably the most commonly used arrays today. For
many years Novell has offered disk mirroring as an option on their
server software. Since 1988, DPT has offered hardware RAID-1 as a
higher-performance mirroring option on its ESDI and SCSI
controllers.

RAID-5 arrays, although not as frequently used as RAID-1, have
received much attention in the trade press. RAID-5 arrays, like
RAID-1, store redundant information, enabling them to survive a
disk failure and continue to operate. However, RAID-5 offers
improved storage efficiency over RAID-1. This is accomplished by
storing "parity" information, rather than a complete redundant copy
of all data. This parity information is generated by calculating
the XOR (exclusive or) of the data stored on every drive in the
array. The result is that any number of drives, three or more, can
be combined into a RAID-5 array, with the effective storage capacity
of only one drive sacrificed to store the parity information. A
five-drive RAID-5 array, for example, keeps four drives' worth of
usable space, whereas mirroring always gives up half the capacity.
However, this comes at the cost of a corresponding loss in
performance.
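
As a rough illustration of why XOR parity is enough to survive a
single drive failure, the Python sketch below computes the parity of
three small data blocks and then rebuilds one "lost" block from the
survivors. The helper name "xor_blocks" and the sample byte values
are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per data drive.
data = [b"\x0f\xf0\xaa", b"\x33\x55\x00", b"\xff\x01\x10"]
parity = xor_blocks(data)

# Pretend drive 1 has failed: XOR the surviving data with the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

if __name__ == "__main__":
    print("parity :", parity.hex())
    print("rebuilt:", rebuilt.hex(), "matches lost block:", rebuilt == data[1])
```

Because XOR is its own inverse, the same routine serves for both
generating parity and regenerating the contents of a failed drive.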

When data is written to a RAID-5 array, the parity information must
be updated. There are two ways to accomplish this. The first way is
straightforward but very slow. The parity information is the XOR of
the data on every drive in the array. Therefore, whenever one
drive's data is changed, the other drives in the array which hold
data are read and XORed to create the new parity. This requires
accessing every drive in the array for each write operation.

The second method of updating parity, which is usually more
efficient, is to find out which data bits were changed by the write
operation and then change the corresponding parity bits. This is
accomplished by first reading the old data to be overwritten. This
data is then XORed with the new data which is to be written. The
result is a bit mask which has a one in the position of every bit
which has changed. This bit mask is then XORed with the old parity
information which is read from the parity drive. This results in
the corresponding bits being changed in the parity information. The
new updated parity is then written back to the parity drive.
Although this may seem more convoluted, it results in only two
reads, two writes and two XOR operations, rather than a read or
write and XOR for every drive in the array.
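
The two update methods can be compared in a short Python sketch.
It is a minimal model under stated assumptions: blocks are plain
byte strings, and the names "update_parity" and "full_parity" are
hypothetical, not taken from any controller firmware. The check at
the end confirms that the two-XOR shortcut yields the same parity as
recomputing it from every data block.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def update_parity(old_data, new_data, old_parity):
    """Second method: two reads (old data, old parity) and two XORs."""
    change_mask = xor_bytes(old_data, new_data)   # ones where bits changed
    return xor_bytes(old_parity, change_mask)     # flip those parity bits


def full_parity(blocks):
    """First method: XOR together the data from every data drive."""
    parity = bytes(len(blocks[0]))                # start from all zeros
    for block in blocks:
        parity = xor_bytes(parity, block)
    return parity


# Hypothetical stripe of four data blocks.
stripe = [b"\x12\x34", b"\xab\xcd", b"\x00\xff", b"\x5a\xa5"]
old_parity = full_parity(stripe)

new_block = b"\xde\xad"
new_parity = update_parity(stripe[1], new_block, old_parity)
stripe[1] = new_block                             # the data write itself

assert new_parity == full_parity(stripe)
```

The shortcut touches only the data drive and the parity drive, no
matter how many drives the array contains.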

The cost of storing parity, rather than redundant data, is the
extra time taken during write operations to regenerate the parity
information. This additional time reduces RAID-5 write performance
to between 3/5 and 1/3 that of a RAID-1 array with the same number
of drives. Because of this, RAID-5
arrays are not recommended for applications in which performance is
important. (The exception to this is applications which never write
data.)

In summary, RAID-1 is the array of choice for performance-critical,
fault-tolerant environments. In addition, RAID-1 is the only choice
if no more than two drives are desired. RAID-5 is the best choice
in environments which are either not performance sensitive or which
perform no write operations. However, at least three, and more
typically five, drives are required for RAID-5 arrays.


Calculating RAID performance:

To calculate the read or write performance of an array, the number
of simultaneous I/O operations which can be performed on the array
is divided by the time taken to perform an I/O operation. The
result is the number of I/O operations per second which can be
performed by the array.

The average time to access data on a disk drive is "s + p/2" where
"s" is the average seek time (the time it takes to move the heads
to the correct cylinder), and "p" is the rotational period of the
drive (the time it takes for the media to rotate once). "p" is
divided by two since, on average, the disk will make one half
rotation before finding the correct data.
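
For a concrete feel for "s + p/2", the Python sketch below derives
the rotational period from a spindle speed and adds half of it to
the seek time. The drive figures (16 ms average seek, 3600 RPM) are
hypothetical values chosen for illustration, and the function names
are assumptions of this example.

```python
def rotational_period_ms(rpm):
    """Time for one full rotation, in milliseconds ("p" in the text)."""
    return 60_000.0 / rpm


def average_access_ms(seek_ms, rpm):
    """Average access time "s + p/2", in milliseconds."""
    return seek_ms + rotational_period_ms(rpm) / 2


if __name__ == "__main__":
    # Hypothetical drive: 16 ms average seek, 3600 RPM spindle.
    print(round(rotational_period_ms(3600), 2), "ms per rotation")
    print(round(average_access_ms(16.0, 3600), 2), "ms average access")
```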

The average time to do a read operation is equal to the access time
plus the time to transfer the data record from the media once the
head has been positioned. In most transaction processing
environments, the record size is small compared to the size of a
track, and so the record transfer time is much smaller than "p" and
can be eliminated from the expression. The time to perform a read
operation is thus:

                            s + p/2

Read operations on both RAID-1 and RAID-5 arrays require only a
single access to one drive in the array. The number of simultaneous
read operations which can be performed is thus "n" in an array with
"n" drives. The expression for the number of read I/Os per second
which can be performed on RAID-1 or RAID-5 arrays is thus:

                         n / (s + p/2)

Write operations on RAID-1 and RAID-5 arrays differ in the amount
of time required. RAID-1 writes simply require "s + p/2" time on
both redundant drives. However, RAID-5 writes require an extra
rotation on both the data and parity drives so that the old data
and parity can be read and used in the XOR calculation before being
rewritten on the next rotation. The time to do a RAID-5 write 
operation is thus "s + p/2 + p" on both the data and parity drives.
Both RAID-1 and RAID-5 arrays require two drives to be accessed
during writes, and thus can perform only "n/2" simultaneous write
operations in an array with "n" drives. The number of write I/Os
which can be performed per second on a RAID-1 array is thus:

                         (n/2) / (s + p/2)


The number of write I/Os per second on a RAID-5 array is:

                       (n/2) / (s + p/2 + p)


In summary, the read and write I/O bandwidths for RAID-1 and RAID-5
arrays are:

R[1]   =  n / (s + p/2)
R[5]   =  n / (s + p/2)
W[1]   =  (n/2) / (s + p/2)
W[5]   =  (n/2) / (s + p/2 + p)
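
The four expressions can be evaluated directly. The Python sketch
below is a straightforward transcription of them; the function name
"raid_iops" and the sample figures (four drives, 16 ms seek, 3600
RPM) are assumptions made for illustration, with "s" and "p" given
in seconds so that the results are I/Os per second.

```python
def raid_iops(n, s, p):
    """Evaluate R[1], R[5], W[1] and W[5] for n drives.

    n: number of drives in the array
    s: average seek time in seconds
    p: rotational period in seconds
    """
    access = s + p / 2                     # one access to one drive
    return {
        "R1": n / access,                  # every drive can read at once
        "R5": n / access,
        "W1": (n / 2) / access,            # each write occupies two drives
        "W5": (n / 2) / (access + p),      # plus one extra rotation
    }


if __name__ == "__main__":
    # Hypothetical array: 4 drives, 16 ms seek, 3600 RPM (p = 1/60 s).
    for name, value in raid_iops(4, 0.016, 1 / 60).items():
        print(name, round(value, 1))
```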

It can be seen that the read bandwidths for RAID-1 and RAID-5 are
identical, and the write bandwidth for RAID-1 is one half the read
bandwidth. (Another way of viewing this is that when drives are
mirrored, the read bandwidth doubles and the write bandwidth is
unaffected.)

The comparison of RAID-1 and RAID-5 write bandwidths is a bit more
complex. Fortunately, most disk drives have rotational periods that
are roughly equal to their average seek time, so the expressions
above can be simplified by setting "s = p". The write equations may
then be written as:

W[1]  =  (n/2) / (3p/2)
W[5]  =  (n/2) / (5p/2)

The ratio of RAID-1 to RAID-5 write bandwidths then becomes:

W[1]/W[5]  =  5/3

i.e., RAID-1 write bandwidth is 5/3, or roughly 1.7 times, that of
RAID-5. In cached I/O subsystems which have the benefit of
elevator-sorted writes, the average seek time during writes will be
much less than the rotational period, and thus "s" can be dropped
from the original expressions. In this case:

W[1]  =  (n/2) / (p/2)
W[5]  =  (n/2) / (3p/2)

The ratio of RAID-1 to RAID-5 write bandwidth then becomes:

W[1]/W[5]  =  3

i.e., RAID-1 write bandwidth is triple that of RAID-5.
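
Both ratios are special cases of one expression. Dividing W[1] by
W[5] cancels the "n/2" terms and leaves (s + 3p/2) / (s + p/2). The
Python sketch below checks the two cases discussed above using exact
fractions; the function name "write_ratio" is an assumption of this
example.

```python
from fractions import Fraction


def write_ratio(s, p):
    """W[1]/W[5] = (s + 3p/2) / (s + p/2); the n/2 terms cancel."""
    return (s + Fraction(3, 2) * p) / (s + Fraction(1, 2) * p)


# Seek time equal to the rotational period (s = p).
assert write_ratio(1, 1) == Fraction(5, 3)

# Seek time negligible, as with elevator-sorted cached writes (s = 0).
assert write_ratio(0, 1) == 3

if __name__ == "__main__":
    # Ratios for seek times between zero and twice the rotational period.
    for s in (Fraction(0), Fraction(1, 2), Fraction(1), Fraction(2)):
        print("s =", s, "p  ->  W[1]/W[5] =", write_ratio(s, 1))
```

The ratio falls toward 1 as seek time grows relative to the
rotational period, which is why the penalty is largest when seeks
are short.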


         For reprints, ask for Technology Focus Paper:
                    "RAID-1 versus RAID-5:
         Picking the right array for your application"
                Document Number MM-0094-001-A
                   from DPT Channel Marketing
                         (407) 830-5522

