This week there was a power blackout in my town, which seems to have corrupted my RAID5 array. When I first booted my server, the array seemed normal. I could browse my folders on the machine itself, but when I tried to access them through Samba from a Windows PC, not all files in the directories showed up. After restarting Samba, I finally tried rebooting the system, which is when the real trouble started.
After the reboot the array didn't get mounted at all, and cat /proc/mdstat showed me:
Personalities :
md0 : inactive sde[0](S) sdc[3](S) sdb[2](S) sdd[1](S)
3907049984 blocks
mdadm --assemble --scan told me: mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
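(For reference, comparing the event counters in the member superblocks shows which drives have fallen out of sync - that mismatch is why assembly refuses. Assuming the superblocks are still readable, something like this prints them:)
mdadm --examine /dev/sd[bcde] | grep -i events
Members whose counter lags behind the rest are the ones mdadm leaves out.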
After some reading I tried mdadm --assemble --force /dev/md0, and got:
[root@htpc ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde[0] sdb[3] sdc[2]
2930287488 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
unused devices: <none>
This made me think that /dev/sdd had been kicked out of the array (like hardware RAID controllers do when a disk stops responding), so I tried to re-add it with mdadm --add /dev/md0 /dev/sdd, which gave me a message stating that the disk had been re-added to the array. It seemed like the array was rebuilding, given this /proc/mdstat output:
md0 : active raid5 sdd[4] sde[0] sdb[3] sdc[2]
2930287488 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
[>....................] recovery = 0.0% (710784/976762496) finish=274.6min speed=59232K/sec
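(To follow the recovery without re-running cat by hand, something like this works, assuming watch from procps is installed:)
watch -n 30 cat /proc/mdstat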
However, when I checked the array this morning, I got this:
[root@htpc ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4](S) sde[0] sdb[5](F) sdc[2]
2930287488 blocks level 5, 64k chunk, algorithm 2 [4/2] [U_U_]
Can anyone tell me if my data is totally screwed, or is there any way to bring this array back to life?
The machine is running Arch x86_64, which I update weekly. The hard disks are WD Caviar Green WD10EACS drives.
Your last output even shows two inactive drives.
I do not know what the uppercase letters in brackets mean (I'd like to know too; I happen to have a three-drive mdadm RAID 5 setup in my server as well, also with WD10EACS drives). I assume, since this is a RAID 5 setup, there are three drives in the 'core' with one spare? Theoretically you should be able to rebuild it - if, of the two active drives, one has the actual data and the other one has the checksums.
Could you post the output of
mdadm --detail /dev/md0
(S) is spare, (F) is faulty.
[root@htpc ~]# mdadm --assemble --scan --force
mdadm: forcing event count in /dev/sdb(3) from 202 upto 208
mdadm: clearing FAULTY flag for device 3 in /dev/md0 for /dev/sdb
mdadm: /dev/md0 has been started with 3 drives (out of 4) and 1 spare.
[root@htpc ~]# mdadm --detail /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Thu Dec 18 00:34:55 2008
Raid Level : raid5
Array Size : 2930287488 (2794.54 GiB 3000.61 GB)
Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri Apr 3 09:52:01 2009
State : clean, degraded, recovering
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 0% complete
UUID : 725ee0e7:c5d5778a:950b469e:87adda2c
Events : 0.208
Number Major Minor RaidDevice State
0 8 64 0 active sync /dev/sde
4 8 48 1 spare rebuilding /dev/sdd
2 8 32 2 active sync /dev/sdc
3 8 16 3 active sync /dev/sdb
Hmm. That output shows the array as rebuilding, with three drives active and none failing?
Does /proc/mdstat show similar output?
The array is pretty much borked, I guess. It tries to rebuild onto sdd every time, but after a few percent it fails on sdb and stops. I guess there's some bad data on sdb, but most of it is probably still OK. So I really wonder if there's a way to unset the faulty flag on sdb and force the array to read from it, even if there are errors. I guess I could save most of my data that way.
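One idea (untested - /dev/sdf below is only a placeholder for a spare disk at least as large as sdb) would be to copy whatever is still readable off sdb with GNU ddrescue and then force-assemble from the copy, so read errors on the original can't abort another rebuild:
mdadm --stop /dev/md0
ddrescue /dev/sdb /dev/sdf sdb.map    # unreadable areas get recorded in sdb.map
mdadm --assemble --force /dev/md0 /dev/sde /dev/sdc /dev/sdf
Since the clone carries sdb's superblock, mdadm should pick it up as the old member, as long as the original sdb isn't listed on the same command line.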
Maybe I'm not quite understanding your setup correctly, but it seems to me you're still in a recoverable state. sdb looks to be failed; w/ 2/3 of the original non-spare disks available, however, you should be able to rebuild. That is to say, if sd{b,c,e} were your original RAID, w/ sdd as a spare, then when sdb failed you should have been able to remove it from the array and add sdd, and your array should rebuild using sd{c,e}, ending up online w/ sd{c,d,e}. If sdb is genuinely borked, you don't want it in the array when you rebuild.
% mdadm /dev/md0 -r /dev/sdb
% mdadm /dev/md0 -a /dev/sdd
% cat /proc/mdstat
...should show rebuild in progress
Of course, at this point, you could hit horrible things like http://blogs.zdnet.com/storage/?p=162, in which case you're knackered.
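Before writing sdb off completely it might also be worth looking at its SMART data (assuming smartmontools is installed); growing reallocated or pending sector counts would point at genuine media errors rather than a one-off hiccup from the blackout:
% smartctl -a /dev/sdb | grep -Ei 'reallocated|pending|uncorrectable'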
-nogoma