This week there was a power blackout in my town, which seems to have corrupted my RAID5 array. When I first booted my server, the array seemed normal. I could browse my folders on the machine itself, but when I tried to access them through Samba from a Windows PC, not all files in the directories showed up. After restarting Samba, I finally tried rebooting the system, which is when the real trouble started.
After the reboot the array didn't get mounted at all, and cat /proc/mdstat showed me:
Personalities :
md0 : inactive sde[0](S) sdc[3](S) sdb[2](S) sdd[1](S)
3907049984 blocks
mdadm --assemble --scan told me: mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
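(For reference, comparing the event counters in the member superblocks shows which drives have fallen out of sync - that mismatch is why assembly refuses. Assuming the superblocks are still readable, something like this prints them:)
mdadm --examine /dev/sd[bcde] | grep -i events
Members whose counter lags behind the rest are the ones mdadm leaves out.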
After some reading I tried mdadm --assemble --force /dev/md0, and got:
[root@htpc ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde[0] sdb[3] sdc[2]
2930287488 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
unused devices: <none>
This made me think that /dev/sdd had been kicked out of the array (like hardware RAID controllers do when a disk stops responding), so I tried to re-add it with mdadm --add /dev/md0 /dev/sdd, which gave me a message stating that the disk had been re-added to the array. It seemed like the array was rebuilding, given this /proc/mdstat output:
md0 : active raid5 sdd[4] sde[0] sdb[3] sdc[2]
2930287488 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
[>....................] recovery = 0.0% (710784/976762496) finish=274.6min speed=59232K/sec
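(To follow the recovery without re-running cat by hand, something like this works, assuming watch from procps is installed:)
watch -n 30 cat /proc/mdstat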
However, when I checked the array this morning, I got this:
[root@htpc ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4](S) sde[0] sdb[5](F) sdc[2]
2930287488 blocks level 5, 64k chunk, algorithm 2 [4/2] [U_U_]
Can anyone tell me if my data is totally screwed, or is there any way to bring this array back to life?
The machine is running Arch x86_64, which I update weekly. The hard disks are WD Caviar Green WD10EACS drives.
Your last output even shows two inactive drives.
I do not know what the uppercase letters in brackets mean (I'd like to know too; I happen to have a three-drive mdadm RAID 5 setup in my server as well, also with WD10EACS drives). I assume, since this is a RAID 5 setup, there are three drives in the 'core' with one spare? Theoretically you should be able to rebuild it - if, of the two active drives, one has the actual data and the other one has the checksums.
Could you post the output of
mdadm --detail /dev/md0
(S) is spare, (F) is faulty.
[root@htpc ~]# mdadm --assemble --scan --force
mdadm: forcing event count in /dev/sdb(3) from 202 upto 208
mdadm: clearing FAULTY flag for device 3 in /dev/md0 for /dev/sdb
mdadm: /dev/md0 has been started with 3 drives (out of 4) and 1 spare.
[root@htpc ~]# mdadm --detail /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Thu Dec 18 00:34:55 2008
Raid Level : raid5
Array Size : 2930287488 (2794.54 GiB 3000.61 GB)
Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri Apr 3 09:52:01 2009
State : clean, degraded, recovering
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 0% complete
UUID : 725ee0e7:c5d5778a:950b469e:87adda2c
Events : 0.208
Number Major Minor RaidDevice State
0 8 64 0 active sync /dev/sde
4 8 48 1 spare rebuilding /dev/sdd
2 8 32 2 active sync /dev/sdc
3 8 16 3 active sync /dev/sdb
Hmm. That output shows the array as rebuilding, with three drives active and none failing?
Does /proc/mdstat show similar output?
The array is pretty much borked, I guess. It tries to rebuild onto sdd every time, but after a few percent it fails on sdb and stops. I guess there's some bad data on sdb, but most of it is probably still OK. So I really wonder if there's a way to unset the faulty flag on sdb and force the array to read from it, even if there are errors. I guess I could save most of my data that way.
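One idea (untested - /dev/sdf below is only a placeholder for a spare disk at least as large as sdb) would be to copy whatever is still readable off sdb with GNU ddrescue and then force-assemble from the copy, so read errors on the original can't abort another rebuild:
mdadm --stop /dev/md0
ddrescue /dev/sdb /dev/sdf sdb.map    # unreadable areas get recorded in sdb.map
mdadm --assemble --force /dev/md0 /dev/sde /dev/sdc /dev/sdf
Since the clone carries sdb's superblock, mdadm should pick it up as the old member, as long as the original sdb isn't listed on the same command line.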
Maybe I'm not quite understanding your setup correctly, but it seems to me you're still in a recoverable state. sdb looks to be failed; w/ 2/3 of the original non-spare disks available, however, you should be able to rebuild. That is to say, if sd{b,c,e} were your original RAID, w/ sdd as a spare, then when sdb failed you should have been able to remove it from the array and add sdd, and your array should rebuild using sd{c,e}, ending up online w/ sd{c,d,e}. If sdb is genuinely borked, you don't want it in the array when you rebuild.
% mdadm /dev/md0 -r /dev/sdb
% mdadm /dev/md0 -a /dev/sdd
% cat /proc/mdstat
...should show rebuild in progress
Of course, at this point, you could hit horrible things like http://blogs.zdnet.com/storage/?p=162, in which case you're knackered.
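Before writing sdb off completely it might also be worth looking at its SMART data (assuming smartmontools is installed); growing reallocated or pending sector counts would point at genuine media errors rather than a one-off hiccup from the blackout:
% smartctl -a /dev/sdb | grep -Ei 'reallocated|pending|uncorrectable'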
-nogoma