What is RAID?
RAID (redundant array of independent disks) is a data storage virtualization technology that combines multiple physical disk drive components into a
single logical unit for the purposes of data redundancy, performance improvement, or both.
Data is distributed across the drives in one of several ways, referred to as RAID levels, depending on the required level of redundancy and performance.
The different schemes, or data distribution layouts, are named by the word RAID followed by a number, for example RAID 0 or RAID 1. Each scheme, or
RAID level, provides a different balance among the key goals: reliability, availability, performance, and capacity. RAID levels greater than RAID 0
provide protection against unrecoverable sector read errors, as well as against failures of whole physical drives.
There are several RAID levels, each optimized for a specific situation. They are not standardized by an industry group or standards committee, which is
why vendors sometimes come up with their own numbers and implementations. This article covers the following RAID levels:
RAID 0 – striping
RAID 1 – mirroring
RAID 5 – striping with parity
RAID 6 – striping with double parity
RAID 10 – combining mirroring and striping
RAID 0
RAID 0 consists of striping, without mirroring or parity. The capacity of a RAID 0 volume is the sum of the capacities of the disks in the set, the same
as with a spanned volume. There is no added redundancy for handling disk failures, just as with a spanned volume. Thus, failure of one disk causes the
loss of the entire RAID 0 volume, with reduced possibilities of data recovery when compared with a broken spanned volume. Striping distributes the
contents of files roughly equally among all disks in the set, which makes concurrent read or write operations across the disks almost inevitable
and results in performance improvements. The concurrent operations make the throughput of most read and write operations equal to the throughput of
one disk multiplied by the number of disks. Increased throughput is the key benefit of RAID 0 over a spanned volume, at the cost of increased
vulnerability to drive failures.
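As a rough illustration of striping, the sketch below (a hypothetical helper, not any particular controller's layout) shows how a logical block number maps to a member disk and an offset on that disk: consecutive blocks are dealt out round-robin across the set, which is what lets several disks service one sequential transfer at the same time.

def raid0_locate(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Return (disk_index, block_within_disk) for a logical block in a RAID 0 set."""
    disk = logical_block % num_disks      # round-robin across the members
    offset = logical_block // num_disks   # same stripe row on every disk
    return disk, offset

# With 4 disks, logical blocks 0..7 land on disks 0,1,2,3,0,1,2,3, so a
# sequential read of those blocks can keep all four spindles busy at once.
for lb in range(8):
    print(lb, raid0_locate(lb, 4))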
RAID 1
RAID 1 consists of data mirroring, without parity or striping. Data is written identically to two or more drives, thereby producing a "mirrored set" of drives.
Thus, any read request can be serviced by any drive in the set. If a request is broadcast to every drive in the set, it can be serviced by the drive
that accesses the data first (depending on its seek time and rotational latency), improving performance. Sustained read throughput, if the controller
or software is optimized for it, approaches the sum of throughputs of every drive in the set, just as for RAID 0. Actual read throughput of most RAID
1 implementations is slower than the fastest drive. Write throughput is always slower because every drive must be updated, and the slowest drive
limits the write performance. The array continues to operate as long as at least one drive is functioning.
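A minimal sketch of these mirroring semantics, with the in-memory "drives" and the two-way mirror being assumptions purely for illustration: every write goes to all mirrors (so the slowest mirror bounds write throughput), while a read can be satisfied by any surviving mirror.

class Raid1Mirror:
    """Toy RAID 1 set: each 'drive' is just a dict from block number to data."""

    def __init__(self, num_mirrors: int = 2):
        self.mirrors = [dict() for _ in range(num_mirrors)]

    def write(self, block: int, data: bytes) -> None:
        # Every mirror must be updated, so the slowest drive limits writes.
        for m in self.mirrors:
            m[block] = data

    def read(self, block: int) -> bytes:
        # Any mirror holding the block can answer; a real controller would
        # pick the least-busy or fastest drive rather than the first one.
        for m in self.mirrors:
            if block in m:
                return m[block]
        raise KeyError(block)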
RAID 2
RAID 2 consists of bit-level striping with dedicated Hamming-code parity. All disk spindle rotation is synchronized and data is striped such that each
sequential bit is on a different drive. Hamming-code parity is calculated across corresponding bits and stored on at least one parity drive. This level
is of historical significance only; although it was used on some early machines (for example, the Thinking Machines CM-2), as of 2014 it is not used
by any commercially available system.
RAID 3
RAID 3 consists of byte-level striping with dedicated parity. All disk spindle rotation is synchronized and data is striped such that each sequential byte
is on a different drive. Parity is calculated across corresponding bytes and stored on a dedicated parity drive. Although implementations exist,
RAID 3 is not commonly used in practice.
RAID 4
RAID 4 consists of block-level striping with dedicated parity. This level was previously used by NetApp, but has now been largely replaced by a
proprietary implementation of RAID 4 with two parity disks, called RAID-DP. The main advantage of RAID 4 over RAID 2 and 3 is I/O parallelism: in RAID 2
and 3, a single read/write operation must access the whole group of data drives, while in RAID 4 a single read/write operation does not have
to span all data drives. As a result, more I/O operations can be executed in parallel, improving the performance of small transfers.
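The "small write" that block-level parity makes cheap can be sketched as follows: because XOR parity satisfies new_parity = old_parity XOR old_data XOR new_data, rewriting one data block needs only the old data block and the old parity block, and the other data drives are never touched. This is an illustrative sketch, not NetApp's implementation.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def small_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Recompute the parity block after rewriting a single data block,
    without reading the rest of the stripe."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)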
RAID 5
RAID 5 consists of block-level striping with distributed parity. Unlike RAID 4, parity information is distributed among the drives, requiring all drives
but one to be present to operate. Upon failure of a single drive, subsequent reads can be calculated from the distributed parity such that no data is
lost. RAID 5 requires at least three disks.[12] RAID 5 implementations are susceptible to system failures because of the trend toward longer array rebuild
times and the increased chance of a drive failing during the rebuild. Rebuilding an array
requires reading all data from all disks, opening a chance for a second drive failure and the loss of the entire array. In August 2012, Dell posted an
advisory against the use of RAID 5 in any configuration on Dell EqualLogic arrays and RAID 50 with "Class 2 7200 RPM drives of 1 TB and higher
capacity" for business-critical data.
RAID 6
RAID 6 consists of block-level striping with double distributed parity. Double parity provides fault tolerance up to two failed drives. This makes larger
RAID groups more practical, especially for high-availability systems, as large-capacity drives take longer to restore. RAID 6 requires a minimum of
four disks. As with RAID 5, a single drive failure results in reduced performance of the entire array until the failed drive has been replaced. By
combining a RAID 6 array with drives from multiple sources and manufacturers, it is possible to mitigate most of the problems associated with RAID 5. The
larger the drive capacities and the larger the array size, the more important it becomes to choose RAID 6 instead of RAID 5. RAID 10 also minimizes
these problems.
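One common way to build the two independent parities is the "P + Q" construction; the sketch below illustrates it and is not any particular vendor's implementation. P is the plain XOR of the data blocks, and Q is a Reed-Solomon-style sum of the data blocks weighted by powers of a GF(2^8) generator. Together they let any two simultaneous losses in a stripe be solved for; the example recovers a data block even though the P drive has failed as well, which plain RAID 5 parity could not do.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11D."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
    return p

def gf_pow(a: int, n: int) -> int:
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8): a**254."""
    return gf_pow(a, 254)

def pq_parity(data):
    """Return (P, Q) for one byte position across the data blocks."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d                          # ordinary XOR parity, as in RAID 5
        q ^= gf_mul(gf_pow(2, i), d)    # data weighted by powers of the generator 2
    return p, q

data = [0x11, 0x22, 0x33, 0x44]
p, q = pq_parity(data)

# Suppose the drives holding data[2] and P fail at the same time. P is gone,
# but data[2] can still be solved for from Q and the surviving data bytes.
x = 2
partial = 0
for i, d in enumerate(data):
    if i != x:
        partial ^= gf_mul(gf_pow(2, i), d)
recovered = gf_mul(q ^ partial, gf_inv(gf_pow(2, x)))
assert recovered == data[x]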