So how many drives can I lose in a RAID and still be up? Read the section at the bottom (the red heading that reads OTHER RAIDS). If you're curious about RAID10, start up here.
RAID10
In a RAID10 you can lose anywhere from 1 drive (worst case) up to half of your drives (best case) and still have your volume up. Meaning: if you lose the wrong 2 drives, your volume will crash. Plugging the drives back in can bring the volume back, but data corruption is possible (and probable). If you have 60 drives in your RAID10 volume, you can lose up to 30 drives without the volume failing, as long as the right drives fail. But if the wrong 2 drives fail, your whole 60-drive volume comes crashing down.
Here is how it works with RAID10:
Imagine 6 drives of size 3TB in a RAID10. The drives have the names: D1, D2, D3, D4, D5, D6.
When you make a RAID10 out of those drives, they will automatically be put into as many 2-drive mirrors as possible. In this case they will form 3 mirrors (or vdevs, if you're from the ZFS world).
Pretend setup:
1st mirror or vdev1: D1 + D2 = 3TB
2nd mirror or vdev2: D3 + D4 = 3TB
3rd mirror or vdev3: D5 + D6 = 3TB
-------------------------------------
Total Volume = vdev1 + vdev2 + vdev3 = 9TB
Because the drives inside each vdev are RAID1ed together, each vdev gives you only 3TB and not 6TB. The vdevs are then combined via RAID0 to form 9TB. One way to look at it: pretend each vdev turns into a single 3TB drive. You then have 3 vdevs which are RAID0ed, i.e. 3 drives which are RAID0ed. In reality each of those "drives" is 2 drives which are mirrored.
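If it helps to see the capacity math as code, here is a tiny Python sketch of it (the drive names and the 3TB size are just the pretend values from above):

# RAID10 capacity math for the pretend 6 x 3TB setup above.
DRIVE_TB = 3
vdevs = [["D1", "D2"],   # vdev1 / 1st mirror
         ["D3", "D4"],   # vdev2 / 2nd mirror
         ["D5", "D6"]]   # vdev3 / 3rd mirror

# RAID1 (inside): each mirror's usable size is one drive's worth,
# no matter how many copies are in it.
vdev_capacity = [DRIVE_TB for _ in vdevs]

# RAID0 (outside): the outer stripe adds the vdev capacities together.
total_tb = sum(vdev_capacity)

print(vdev_capacity)  # [3, 3, 3]
print(total_tb)       # 9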
SIDENOTE: I call vdevs mirrors, or subsets, or pairs. Strictly speaking, mirrors/subsets/pairs of drives is the correct terminology, but to be very specific, "vdev" only means that ("mirrors/subsets/pairs") in the ZFS world. For other RAIDs like MDADM we don't say "vdev" (I mean we could say it, nothing would stop us, but technically it would be incorrect). So what are vdevs and zpools and ZFS in general? Check out this ZFS intro, which was the first thing I luckily read on ZFS and got me a good head start. Link: ZFS Intro. Just in case that link is dead, I downloaded the latest copy as of 2015-04-30 (as you can see it is dated back to August 2014, so most likely there won't be more updates); here it is: http://www.infotinks.com/wp-content/uploads/2015/04/FreeNAS-Guide-9.2.1.pptx.zip
It looks like: RAID1 on the inside, RAID0 on the outside and you have RAID10
So which drives can we lose?
Rule of thumb:
* You can lose as many drives as you want, as long as each subset/pair/mirror/vdev has 1 drive remaining.
* Once you lose a whole subset/pair/mirror/vdev, you lose the volume. So we have to make sure we don't lose one.

Analysis:
* You can lose any 1 drive and still be up.
* You can lose D1, D3, D5 and still have the volume up.
* You can lose D2, D3, D6 and still have the volume up.
* You cannot lose D1 and D2; you will lose the volume. Putting the drives back in is the only cure, but there might be data corruption.
* You cannot lose D3 and D4; you will lose the volume. Putting the drives back in is the only cure, but there might be data corruption.
* You cannot lose D5 and D6; you will lose the volume. Putting the drives back in is the only cure, but there might be data corruption.
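If you prefer code to rules of thumb, here is a minimal Python sketch of that exact check (the vdev layout and drive names are just the pretend ones from above):

def volume_survives(vdevs, failed):
    # RAID10 rule of thumb: the volume is up as long as every
    # mirror/vdev still has at least 1 surviving drive in it.
    return all(any(d not in failed for d in vdev) for vdev in vdevs)

vdevs = [["D1", "D2"], ["D3", "D4"], ["D5", "D6"]]

print(volume_survives(vdevs, {"D1", "D3", "D5"}))  # True  - each mirror still has a drive
print(volume_survives(vdevs, {"D2", "D3", "D6"}))  # True  - each mirror still has a drive
print(volume_survives(vdevs, {"D1", "D2"}))        # False - vdev1 is completely gone

The one-liner inside volume_survives() is literally the rule of thumb: every vdev must have at least one drive that is not in the failed set.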
Now pretend you have this rare asymmetrical RAID10. We added D7 and D8 to vdev2, the 2nd mirror:
1st mirror or vdev1: D1 + D2 = 3TB
2nd mirror or vdev2: D3 + D4 + D7 + D8 = 3TB
3rd mirror or vdev3: D5 + D6 = 3TB
-------------------------------------
Total Volume = vdev1 + vdev2 + vdev3 = 9TB
Note: vdev2 is still only 3TB because it's RAID1. D3 can be thought of as the data drive, while D4, D7 and D8 are just holding copies/mirrors.
What drives can you lose in this asymmetrical RAID?
Rule of thumb:
* You can lose as many drives as you want, as long as each subset/pair/mirror/vdev has 1 drive remaining.
* Once you lose a whole subset/pair/mirror/vdev, you lose the volume. So we have to make sure we don't lose one.

Analysis:
* You can lose any 1 drive and still be up.
* You can lose D1, D3, D5 and still have the volume up.
* You can lose D2, D3, D6 and still have the volume up.
* You can lose D3, D4, D7 and still have the volume up, as you still have D8 in that vdev to keep that vdev (and therefore the volume) up.
* You cannot lose D1 and D2; you will lose the volume. Putting the drives back in is the only cure, but there might be data corruption.
* You cannot lose D3 and D4 and D7 and D8; you will lose the volume. Putting the drives back in is the only cure, but there might be data corruption.
* You cannot lose D5 and D6; you will lose the volume. Putting the drives back in is the only cure, but there might be data corruption.
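The same volume_survives() sketch works for the asymmetrical layout; only the vdev membership changes (repeating the helper here so the snippet stands on its own):

def volume_survives(vdevs, failed):
    # Volume is up while every mirror/vdev has at least 1 surviving drive.
    return all(any(d not in failed for d in vdev) for vdev in vdevs)

asym_vdevs = [["D1", "D2"], ["D3", "D4", "D7", "D8"], ["D5", "D6"]]

print(volume_survives(asym_vdevs, {"D3", "D4", "D7"}))        # True  - D8 keeps vdev2 up
print(volume_survives(asym_vdevs, {"D3", "D4", "D7", "D8"}))  # False - all of vdev2 is gone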
How to check what drives are in a mirror?
So when you lose 2 drives, as long as they don't meet any of the fail conditions above (i.e. as long as every vdev is still up), your volume will be up. The trick is knowing which drives make up each mirror.
With ZFS you can do this: “zpool status -v”
# zpool status -v
  pool: VOLUME1
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        FlexiBen                   ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t50014EA25EFC3F32d0  ONLINE       0     0     0
            c0t50014EA209A6ED00d0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c0t50014EA25EFCEF77d0  ONLINE       0     0     0
            c0t50014EA2B453BAB9d0  ONLINE       0     0     0
          mirror-2                 ONLINE       0     0     0
            c0t50014EA2099C025Ad0  ONLINE       0     0     0
            c0t50014EA209A68617d0  ONLINE       0     0     0
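If you want to pull that grouping out programmatically, here is a rough Python sketch that shells out to "zpool status -v" and groups devices by their mirror vdev. Fair warning: it just leans on the indentation of the output as shown above, which can vary between versions, so treat it as a toy, not a tool:

# Toy parser: group member drives by mirror vdev from `zpool status -v`.
# Assumes the indentation-based layout shown above (mirror-N lines followed
# by their more-deeply-indented member devices).
import subprocess

out = subprocess.run(["zpool", "status", "-v"],
                     capture_output=True, text=True).stdout

mirrors = {}          # mirror name -> list of member drives
current = None        # the mirror whose block we are currently inside
mirror_indent = 0

for line in out.splitlines():
    stripped = line.strip()
    if not stripped:
        continue
    name = stripped.split()[0]
    indent = len(line) - len(line.lstrip())
    if name.startswith("mirror-"):
        current, mirror_indent = name, indent
        mirrors[name] = []
    elif current is not None and indent > mirror_indent:
        mirrors[current].append(name)   # a member device of the current mirror
    else:
        current = None                  # we left the mirror's block

for mirror, drives in mirrors.items():
    print(mirror, "->", drives)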
With MDADM raid you can do this: "mdadm -D --scan -vv", or you can run it for a single raid: "mdadm -D /dev/md2"
# mdadm -D --scan -vv

I don't have a copy of the output right now. I can tell you that newer versions of "mdadm" will clearly state which drives are in which mirror/subset/"vdev" (note: vdev is really ZFS terminology, but it applies in a way - potato potahtoe). If you have an older version of "mdadm", you will not see which drives are in which mirror; you will need to use the "dd" method instead, see the link below.
You can also use dd to find out what drives are in a mirror:
http://www.infotinks.com/mdadm-raid10-pairs/
Other RAIDs:
* In RAID0: Can't lose any drives. If you lose any drive, the volume will crash; putting the drive back in will fix it, but most likely you will have some permanent filesystem corruption (data loss). You can fight back against that by having regular backups of your data to another volume/location.
* In RAID1: Can lose all but one of the drives. If you have 2 drives, you can lose 1. If you have 3 drives, you can lose 2.
* In RAID5 or RAID2/3/4 or RAIDz1: Can lose only 1 drive.
* In RAID6 or RAIDz2: Can lose 1 or 2 drives.
* In RAIDz3 (I've seen this called RAID7 somewhere, which makes sense, but I don't think that's its final name): Can lose 1, 2 or 3 drives.
* In RAID50: Can lose 1 drive in each vdev/subset; each subset follows the rules of RAID5. If you lose a whole subset because you broke the RAID5 rules, then the volume is lost and data corruption is imminent.
* In RAID60: Can lose 1 or 2 drives in each vdev/subset; each subset follows the rules of RAID6. If you lose a single subset because you broke the RAID6 rules, then the volume is lost and data corruption is imminent.
* In RAIDz30 (or RAID70 or whatever you want to call it): Can lose 1, 2 or 3 drives in each vdev/subset; each subset follows the rules of RAIDz3/RAID7. If you lose a single subset because you broke the RAIDz3/RAID7 rules, then the volume is lost and data corruption is imminent.
* In RAID10: RAID10 makes 2-drive mirrors as many times as it can with the number of drives it has. So you can only lose 1 drive in each mirror; if you lose 2 drives in one mirror, you lose that vdev/subset, the volume is lost, and data corruption is imminent. At minimum you can lose 1 drive; at maximum you can lose half of your drives (they just all have to be from different mirrors/subsets/vdevs). If you lose half of your drives AND each mirror has only 1 dead drive, THEN your volume is still up.
NOTE: just like RAID10 can be made of several mirrors (or vdevs or subsets) which are all RAID0ed together (hence the 0 on the end, representing the outer RAID0; the inside of each vdev/mirror/subset is RAID1ed), RAID50 and RAID60 can be made of several vdevs or subsets as well (where the inside is all RAID5ed or RAID6ed). Each vdev/subset/mirror is a member of that outer RAID0. Since a RAID0 can't lose a single member without the volume crashing, a RAID10, RAID50 or RAID60 can't lose a single vdev or subset or else the volume fails (I'll just say vdev from now on). A vdev is lost when its RAID1 is lost (both drives fail in the inner mirror), or its RAID5 is lost (more than 1 drive fails in the inner RAID5), or its RAID6 is lost (more than 2 drives fail in the inner RAID6). The same idea applies to RAIDz30 or RAID70 when and if that exists/comes out. With ZFS the number of vdevs can be represented by a multiplier at the end; for example RAID50x3 means you have 3 vdevs that are RAID0ed together, and each vdev has members (usually drives or partitions) which are RAID5ed together.
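All of those nested-RAID rules boil down to one check, which you can sketch in a few lines of Python (the per-level tolerances come straight from the list above; the RAID60 layout and drive names are made up for the example):

def vdev_tolerance(level, vdev):
    # How many drives a single vdev can lose and stay up, per the rules above.
    if level == "raid1":
        return len(vdev) - 1          # a mirror survives until its last copy dies
    return {"raid5": 1, "raid6": 2, "raidz3": 3}[level]

def nested_volume_survives(level, vdevs, failed):
    # The outer RAID0 is up only while every vdev stays within its tolerance.
    return all(sum(d in failed for d in vdev) <= vdev_tolerance(level, vdev)
               for vdev in vdevs)

# e.g. a made-up RAID60: two 6-drive RAID6 vdevs striped together
raid60 = [["A1", "A2", "A3", "A4", "A5", "A6"],
          ["B1", "B2", "B3", "B4", "B5", "B6"]]
print(nested_volume_survives("raid6", raid60, {"A1", "A2", "B5", "B6"}))  # True
print(nested_volume_survives("raid6", raid60, {"A1", "A2", "A3"}))        # False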
By “lose a drive”, I mean: if a drive is dead or missing from the chassis, it's considered lost. If a drive is not yet fully synced up to the RAID, it's also considered lost. Lost drives should be replaced immediately (unless they are syncing up, which means they are in the process of being replaced; in that case, wait for them to sync up).
How many drives should I replace at a time in my RAID?
Even though RAIDs can survive multiple failed/missing drives at a time, replace 1 drive at a time. Even if your RAID supports replacing more than 1 drive at once, it's wise to replace one at a time: swapping several drives at the same time makes it easier to pull out a wrong (healthy) drive and crash the RAID. If you are certain you pulled out and replaced only the bad drives, then you're fine; either way, the system will only sync up 1 drive at a time. That's why it's best to replace 1 drive at a time (to avoid human error).