RAID EE Technology
 
As hard disk capacity increases, the time required to rebuild RAID data has also increased dramatically, and this has become one of the most troublesome issues in enterprise storage management today. In the days when hard disk capacities were only 10GB to 100GB, a RAID rebuild was a job that could be completed in ten minutes or so, and it was not a problem requiring special concern. However, as disk capacity has grown to hundreds of GB and even TB, RAID rebuild times have increased to hours or even days, and this has become a major problem in storage management.
Why RAID Rebuild Is Time-Consuming
 
As drive capacity grows, RAID rebuild time grows linearly; with HDDs larger than 4TB, the rebuild time required by traditional RAID architectures rises to tens of hours.
 
There are several factors that affect the RAID rebuild time:
 
  • HDD Capacity: The capacity of the HDDs that make up the disk group; the larger the HDD capacity, the longer the rebuild takes.
  • Quantity of Disk Drives: The quantity of disk drives in a disk group affects the amount of time it takes for the system to read data from the remaining healthy disk drives and write it to the hot spare disk drives. The more disk drives, the longer the rebuild time.
  • Rebuild Job Priority: During a RAID rebuild, the system still has to serve I/O access from the front-end host. The higher the priority assigned to the RAID rebuild job, the faster the rebuild, but the lower the I/O performance available to the front-end host.
  • Fast Rebuild: With the fast rebuild function enabled, only the capacity actually used by volumes needs to be rebuilt; unused disk group space is skipped. If only part of the space in a disk group is used by volumes, the rebuild time is shortened.
  • RAID Level: RAID 1 and RAID 10, which use direct block-to-block replication, rebuild faster than RAID 5 and RAID 6, which require parity calculations.

Given that every disk drive can fail, the more disk drives a disk group contains, the higher the cumulative probability of a failure, so there is an upper limit on the quantity of disk drives in a disk group. Compared with the other factors, the growth in disk drive capacity has become the primary factor affecting rebuild speed. Such long rebuild times are clearly not acceptable to any user. To solve these problems of traditional RAID, we implemented RAID EE technology.
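To see why capacity has become the dominant factor, consider a rough back-of-the-envelope estimate: a traditional rebuild must rewrite the entire failed drive onto a single spare, so the time is roughly the drive capacity divided by the sustainable rebuild throughput. The following Python sketch illustrates this scaling; the 100 MB/s throughput is an assumed, illustrative figure, not a measured value.

```python
# Rough estimate of a traditional RAID rebuild: every block of the failed
# drive must be rewritten onto a single spare drive, so the rebuild time is
# bounded by how fast one drive can be written end to end.

def estimate_rebuild_hours(drive_capacity_gb, rebuild_throughput_mbps):
    """Approximate hours needed to rewrite one whole drive at a fixed rate."""
    capacity_mb = drive_capacity_gb * 1024
    return capacity_mb / rebuild_throughput_mbps / 3600

# Rebuild time scales linearly with drive capacity at a fixed rebuild rate.
# 100 MB/s is an illustrative assumption; real rates drop further when the
# controller must also serve front-end host I/O during the rebuild.
for capacity_gb in (100, 1000, 4000, 10000):
    hours = estimate_rebuild_hours(capacity_gb, rebuild_throughput_mbps=100)
    print(f"{capacity_gb:>6} GB drive -> ~{hours:.1f} hours")
```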
 
Theory of RAID EE
 
RAID EE adds additional spare disks to a disk group; we call them RAID EE spares to distinguish them from the original global, local, and dedicated spares. Spare areas are reserved in each stripe of the disk group and are distributed across the disk group by rotating them among the disks. When a disk fails in the disk group, the missing data is rebuilt into the reserved spare areas. Since all disks in the set are destinations for the rebuilt data, the bottleneck of a traditional RAID rebuild is removed and rebuild performance improves dramatically. When new disks are added, the data in the spare areas is copied back to the newly joined disks.
 
Four new RAID levels are provided for RAID EE; they are:
 
  • RAID 5EE (E stands for Enhanced) requires a minimum of 4 disk drives with one RAID EE spare disk, which can tolerate the failure of 2 disk drives. Adding more RAID EE spares allows more disk drive failures to be tolerated.
  • RAID 6EE requires a minimum of 5 disk drives.
  • RAID 50EE requires a minimum of 7 drives.
  • RAID 60EE requires a minimum of 9 drives.

 
Example of RAID 5EE with 1 RAID EE spare
 
Now let's take an example to describe how it works. The following example is a RAID 5EE with 5 disks: 4 disks serve as RAID disks, and one additional disk serves as the RAID EE spare. After initialization, the data block distribution is as follows. P stands for parity, and S stands for the RAID EE spare, which is empty at this point.
 
[Figure: RAID 5EE (5 disks) data block distribution after initialization]
Assume that disk 2 fails. The RAID 5EE is now in degraded mode.
 
The spare areas are rebuilt with the data from the failed disk drive. This action is called EE Rebuild. After the rebuild, the data is distributed like RAID 5, and the disk group can tolerate another disk drive failure. As you can imagine, the more RAID EE spare disks there are, the faster the rebuild.
 
[Figures: RAID 5EE data block distribution under degraded mode and after EE Rebuild]
When a new disk drive joins the RAID EE disk group, the data rebuilt in the spare areas is copied back to the new disk. This action is called Copyback. After the copyback, the disk group returns to the normal RAID 5EE state.
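This behavior can be modeled as a simple block map. The following Python sketch walks through the 5-disk example: it builds a rotated RAID 5EE layout, performs the EE Rebuild into the distributed spare slots when disk 2 fails, and copies the data back when a replacement disk joins. The rotation pattern and the helper names are illustrative assumptions, not the controller's actual algorithm.

```python
# Minimal model of the 5-disk RAID 5EE example. Each stripe holds 3 data
# blocks (D), 1 parity block (P), and 1 empty spare block (S); the parity and
# spare positions rotate from stripe to stripe.

NUM_DISKS = 5
NUM_STRIPES = 5

def build_layout():
    """Return layout[stripe][disk] filled with 'D', 'P', or 'S'."""
    layout = []
    for stripe in range(NUM_STRIPES):
        row = ["D"] * NUM_DISKS
        row[(NUM_DISKS - 1 - stripe) % NUM_DISKS] = "S"  # rotating spare slot
        row[(NUM_DISKS - 2 - stripe) % NUM_DISKS] = "P"  # rotating parity slot
        layout.append(row)
    return layout

def ee_rebuild(layout, failed_disk):
    """EE Rebuild: regenerate each lost block into that stripe's spare slot."""
    moved = {}  # stripe index -> spare slot that received the rebuilt block
    for i, row in enumerate(layout):
        lost = row[failed_disk]
        row[failed_disk] = "-"         # the failed disk is gone
        if lost != "S":                # a lost spare slot held no data
            spare = row.index("S")     # spare slots are spread across all
            row[spare] = lost          # surviving disks, so rebuild writes are
            moved[i] = spare           # distributed instead of funneled to one spare
    return moved

def copyback(layout, moved, new_disk):
    """Copyback: move rebuilt blocks from the spare slots to the replacement disk."""
    for i, row in enumerate(layout):
        if i in moved:
            row[new_disk] = row[moved[i]]  # data returns to its original column
            row[moved[i]] = "S"            # the spare slot is empty again
        else:
            row[new_disk] = "S"            # this stripe's spare slot was on the failed disk

layout = build_layout()
moved = ee_rebuild(layout, failed_disk=2)  # disk 2 fails: degraded, then EE Rebuild
copyback(layout, moved, new_disk=2)        # replacement disk joins: back to normal
for row in layout:
    print(" ".join(row))
```

Because every surviving disk owns some of the spare slots, the rebuild writes in ee_rebuild() are spread across all members rather than funneled into one spare disk, which is why RAID EE removes the traditional rebuild bottleneck.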
 
Example of RAID 60EE with 2 RAID EE spares
 
Take another example: a RAID 60EE with 10 disks. 8 disks serve as RAID disks, and 2 disks serve as RAID EE spares. After initialization, the data block distribution is as follows. The rebuild and copyback of RAID 60EE are similar to the above and will not be repeated here.
 
[Figure: RAID 60EE (10 disks) data block distribution after initialization]
Test Results
Test Case 1: RAID 5 vs. RAID 5EE
 
This test compares the rebuild time and copyback time of RAID 5 and RAID 5EE. We assume that more RAID EE spare disks will result in a shorter rebuild time. First, we create a RAID 5 pool. After initialization, plug out and then plug in one disk drive, and count the rebuild time under different I/O access patterns. Next, create RAID 5EE pools with 1 / 2 / 4 / 8 RAID EE spare disks in sequence. After initialization, plug out one disk drive; the RAID EE starts rebuilding. Count the rebuild time under different I/O access patterns. Then plug in one disk drive and set it as a dedicated spare; it starts copying back. Last, count the copyback time.
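For reference, the improvement figures quoted in the summary below are the relative reduction of the RAID EE rebuild time against the traditional RAID baseline measured under the same access pattern. A minimal sketch of that calculation follows; the numbers in it are placeholders for illustration only, not measured results.

```python
# Rebuild-time improvement: compare the RAID EE rebuild time with the
# traditional RAID rebuild time measured under the same I/O access pattern.

def improvement_percent(baseline_seconds, raidee_seconds):
    """Percentage reduction of rebuild time relative to the traditional RAID baseline."""
    return (baseline_seconds - raidee_seconds) / baseline_seconds * 100

# Placeholder values for illustration only (not measured results):
# cutting a 10,000-second rebuild to 6,000 seconds is a 40% improvement.
print(f"{improvement_percent(10_000, 6_000):.0f}% shorter rebuild time")
```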
[Figures: RAID 5 vs. RAID 5EE rebuild time and copyback time test results]
Summary
 
  • RAID EE can improve rebuild time by up to 48%.
  • The more RAID EE spare disks are used, the shorter the rebuild time.
  • The rebuild-time improvement is more pronounced under workloads with read accesses.
 
Test Equipment & Configurations
 
Server
  • Model: ASUS RS700 X7/PS4 (CPU: Intel Xeon E5-2600 v2 / RAM: 8GB)
    • iSCSI HBA: Intel 82574L Gigabit Network Connection
    • OS: Windows Server 2012 R2
I/O Pattern
  • Tool: IOmeter V1.1.0
  • Workers: 1
  • Outstanding (Queue Depth): 128
  • Access Specifications:
    • Backup Pattern (Sequential Read / Write, 256KB (MB/s))
    • Database Access Pattern (as defined by Intel/StorageReview.com, 8KB, 67% Read, 100% Random)
    • File Server Access Pattern (as defined by Intel)
    • Idle
Storage
  • Model: XCubeSAN XS5224D
    • Memory: 16GB (2 x 8GB in bank 1 & 3) per controller
    • Firmware 1.3.0
    • HDD: 24 x Seagate Constellation ES, ST500NM0001, 500GB, SAS 6Gb/s
  • HDD Pool:
    • RAID 5 Pool with 16 x NL-SAS HDDs in Controller 1
    • RAID 5EE Pool with 17 (16+1 x RAID EE spare) x NL-SAS HDDs in Controller 1
    • RAID 5EE Pool with 18 (16+2 x RAID EE spares) x NL-SAS HDDs in Controller 1
    • RAID 5EE Pool with 20 (16+4 x RAID EE spares) x NL-SAS HDDs in Controller 1
    • RAID 5EE Pool with 24 (16+8 x RAID EE spares) x NL-SAS HDDs in Controller 1
  • HDD Volume: 100GB in Pool
Test Case 2: RAID 60 vs. RAID 60EE
 
This test compares the rebuild time and copyback time of RAID 60 and RAID 60EE. As before, we assume that more RAID EE spare disks will result in a shorter rebuild time and that RAID 60EE will be more efficient. First, we create a RAID 60 pool. After initialization, plug out and then plug in one disk drive, and count the rebuild time under different I/O access patterns. Next, create RAID 60EE pools with 1 / 2 / 4 / 8 RAID EE spare disks in sequence. After initialization, plug out one disk drive; the RAID EE starts rebuilding. Count the rebuild time under different I/O access patterns. Then plug in one disk drive and set it as a dedicated spare; it starts copying back. Last, count the copyback time.
[Figures: RAID 60 vs. RAID 60EE rebuild time and copyback time test results]
Summary
 
  • RAID EE can improve rebuild time by up to 58%.
  • The more RAID EE spare disks are used, the shorter the rebuild time.
  • The rebuild-time improvement is more pronounced under workloads with read accesses.
 
Test Equipment & Configurations
 
Server
  • Model: ASUS RS700 X7/PS4 (CPU: Intel Xeon E5-2600 v2 / RAM: 8GB)
    • iSCSI HBA: Intel 82574L Gigabit Network Connection
    • OS: Windows Server 2012 R2
I/O Pattern
  • Tool: IOmeter V1.1.0
  • Workers: 1
  • Outstanding (Queue Depth): 128
  • Access Specifications:
    • Backup Pattern (Sequential Read / Write, 256KB (MB/s))
    • Database Access Pattern (as defined by Intel/StorageReview.com, 8KB, 67% Read, 100% Random)
    • File Server Access Pattern (as defined by Intel)
    • Idle
Storage
  • Model: XCubeSAN XS5224D
    • Memory: 16GB (2 x 8GB in bank 1 & 3) per controller
    • Firmware 1.3.0
    • HDD: 24 x Seagate Constellation ES, ST500NM0001, 500GB, SAS 6Gb/s
  • HDD Pool:
    • RAID 60 Pool with 16 x NL-SAS HDDs in Controller 1
    • RAID 60EE Pool with 17 (16+1 x RAID EE spare) x NL-SAS HDDs in Controller 1
    • RAID 60EE Pool with 18 (16+2 x RAID EE spares) x NL-SAS HDDs in Controller 1
    • RAID 60EE Pool with 20 (16+4 x RAID EE spares) x NL-SAS HDDs in Controller 1
    • RAID 60EE Pool with 24 (16+8 x RAID EE spares) x NL-SAS HDDs in Controller 1
  • HDD Volume: 100GB in Pool
Conclusion
 
As drive capacity grows, RAID rebuild time grows linearly. The more disk drives a disk group contains, the higher the cumulative probability of failure, and the greater the impact of disk drive capacity on rebuild speed. Using RAID EE technology greatly reduces these risks.