Hi,
About two weeks ago I had a system crash. The only thing I could do was a system reset, and since I was in a hurry and everything came back up fine, I didn't investigate further.
Yesterday the same failure happened again, but this time the array did not come back up.
Here is the error log (the problem seems to be the controller):
May 14 12:44:38 marie irq 18: nobody cared (try booting with the "irqpoll" option)
May 14 12:44:38 marie [<c0143f3a>] __report_bad_irq+0x2a/0xa0
May 14 12:44:38 marie [<c01436fd>] handle_IRQ_event+0x3d/0x70
May 14 12:44:38 marie [<c0144057>] note_interrupt+0x87/0xf0
May 14 12:44:38 marie [<c014382d>] __do_IRQ+0xfd/0x110
May 14 12:44:38 marie [<c01059e9>] do_IRQ+0x19/0x30
May 14 12:44:38 marie [<c0103c1e>] common_interrupt+0x1a/0x20
May 14 12:44:38 marie [<c0100de0>] default_idle+0x0/0x60
May 14 12:44:38 marie [<c0100e0c>] default_idle+0x2c/0x60
May 14 12:44:38 marie [<c0100ebf>] cpu_idle+0x5f/0x80
May 14 12:44:38 marie [<c044ea55>] start_kernel+0x195/0x1f0
May 14 12:44:38 marie [<c044e3c0>] unknown_bootoption+0x0/0x1f0
May 14 12:44:38 marie handlers:
May 14 12:44:38 marie [<e0c945b0>] (pdc_interrupt+0x0/0x1c0 [sata_promise])
May 14 12:44:38 marie Disabling IRQ #18
May 14 12:45:07 marie ata2: command timeout
May 14 12:45:07 marie ata2: status=0x50 { DriveReady SeekComplete }
May 14 12:45:08 marie ata4: command timeout
May 14 12:45:08 marie ata4: status=0x50 { DriveReady SeekComplete }
May 14 12:45:08 marie sdg: Current: sense key=0x0
May 14 12:45:08 marie ASC=0x0 ASCQ=0x0
May 14 12:45:37 marie ata2: command timeout
May 14 12:45:37 marie ata2: status=0x50 { DriveReady SeekComplete }
May 14 12:45:37 marie sde: Current: sense key=0x0
May 14 12:45:37 marie ASC=0x0 ASCQ=0x0
May 14 12:46:07 marie ata1: command timeout
May 14 12:46:07 marie ata1: status=0x50 { DriveReady SeekComplete }
May 14 12:46:07 marie ata4: command timeout
May 14 12:46:07 marie ata4: status=0x50 { DriveReady SeekComplete }
May 14 12:46:07 marie ata3: command timeout
May 14 12:46:07 marie ata3: status=0x50 { DriveReady SeekComplete }
May 14 12:46:07 marie ata2: command timeout
May 14 12:46:07 marie ata2: status=0x50 { DriveReady SeekComplete }
May 14 12:46:07 marie sde: Current: sense key=0x0
May 14 12:46:07 marie ASC=0x0 ASCQ=0x0
May 14 12:46:37 marie ata1: command timeout
May 14 12:46:37 marie ata1: status=0x50 { DriveReady SeekComplete }
May 14 12:46:37 marie sdd: Current: sense key=0x0
May 14 12:46:37 marie ASC=0x0 ASCQ=0x0
May 14 12:46:37 marie ata3: command timeout
May 14 12:46:37 marie ata3: status=0x50 { DriveReady SeekComplete }
May 14 12:46:37 marie sdf: Current: sense key=0x0
May 14 12:46:37 marie ASC=0x0 ASCQ=0x0
May 14 12:46:37 marie ata4: command timeout
May 14 12:46:37 marie ata4: status=0x50 { DriveReady SeekComplete }
May 14 12:46:37 marie sdg: Current: sense key=0x0
May 14 12:46:37 marie ASC=0x0 ASCQ=0x0
May 14 12:47:07 marie ata1: command timeout
May 14 12:47:07 marie ata1: status=0x50 { DriveReady SeekComplete }
May 14 12:47:07 marie sdd: Current: sense key=0x0
May 14 12:47:07 marie ASC=0x0 ASCQ=0x0
May 14 12:47:37 marie ata1: command timeout
...
(a hell of a lot more command timeouts on ata1-4)
...
and a few of these:
May 14 13:03:07 marie ata4: command timeout
May 14 13:03:07 marie ata4: status=0x50 { DriveReady SeekComplete }
May 14 13:03:07 marie sdg: Current: sense key=0x0
May 14 13:03:07 marie ASC=0x0 ASCQ=0x0
May 14 13:03:07 marie ata1: command timeout
May 14 13:03:07 marie ata1: status=0x50 { DriveReady SeekComplete }
May 14 13:03:07 marie sdd: Current: sense key=0x0
May 14 13:03:07 marie ASC=0x0 ASCQ=0x0
May 14 13:03:07 marie ata2: command timeout
May 14 13:03:07 marie ATA: abnormal status 0xFF on port 0xE0C9E29C
May 14 13:03:07 marie ata2: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 14 13:03:07 marie ata2: status=0xff { Busy }
May 14 13:03:07 marie sd 5:0:0:0: SCSI error: return code = 0x8000002
May 14 13:03:07 marie sde: Current: sense key=0xb
May 14 13:03:07 marie ASC=0x47 ASCQ=0x0
May 14 13:03:07 marie end_request: I/O error, dev sde, sector 213838143
May 14 13:03:07 marie ata3: command timeout
May 14 13:03:07 marie ata3: status=0x50 { DriveReady SeekComplete }
May 14 13:03:07 marie sdf: Current: sense key=0x0
May 14 13:03:07 marie ASC=0x0 ASCQ=0x0
May 14 13:03:37 marie ata4: command timeout
May 14 13:03:37 marie ata4: status=0x50 { DriveReady SeekComplete }
May 14 13:03:37 marie sdg: Current: sense key=0x0
May 14 13:03:37 marie ASC=0x0 ASCQ=0x0
...
May 14 13:12:37 marie ASC=0x0 ASCQ=0x0
May 14 13:12:37 marie ata2: command timeout
May 14 13:12:37 marie ATA: abnormal status 0xFF on port 0xE0C9E29C
May 14 13:12:37 marie ata2: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 14 13:12:37 marie ata2: status=0xff { Busy }
May 14 13:12:37 marie sd 5:0:0:0: SCSI error: return code = 0x8000002
May 14 13:12:37 marie sde: Current: sense key=0xb
May 14 13:12:37 marie ASC=0x47 ASCQ=0x0
May 14 13:12:37 marie end_request: I/O error, dev sde, sector 126745431
May 14 13:12:37 marie RAID5 conf printout:
May 14 13:12:37 marie --- rd:4 wd:3 fd:1
May 14 13:12:37 marie disk 0, o:1, dev:sdd1
May 14 13:12:37 marie disk 1, o:0, dev:sde1
May 14 13:12:37 marie disk 2, o:1, dev:sdf1
May 14 13:12:37 marie disk 3, o:1, dev:sdg1
May 14 13:12:37 marie RAID5 conf printout:
May 14 13:12:37 marie --- rd:4 wd:3 fd:1
May 14 13:12:37 marie disk 0, o:1, dev:sdd1
May 14 13:12:37 marie disk 2, o:1, dev:sdf1
May 14 13:12:37 marie disk 3, o:1, dev:sdg1
May 14 13:13:07 marie ata1: command timeout
May 14 13:13:07 marie ata1: status=0x50 { DriveReady SeekComplete }
May 14 13:13:07 marie sdd: Current: sense key=0x0
May 14 13:13:07 marie ASC=0x0 ASCQ=0x0
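A quick way to see the pattern in a log like this is to tally it. The sketch below is written for the 2.6-era libata/md message formats quoted above, and the function name summarize_log is hypothetical. It pulls out which IRQ the kernel disabled and which handler owned it, counts command timeouts per ata port, and lists the actual I/O errors. On this excerpt it shows IRQ 18 being disabled with only sata_promise's pdc_interrupt registered. After that, all four ports time out with status 0x50 (DriveReady and SeekComplete set, no error bit), while only sde ever returns status 0xff and an I/O error. That fits a controller or interrupt problem better than four disks failing at once, although a log tally cannot prove it.

```python
import re
from collections import Counter

def summarize_log(log):
    """Summarize a 2.6-era syslog excerpt with libata/md messages.

    Returns (disabled IRQ -> handler names, command timeouts per ata port,
    list of (device, sector) I/O errors).
    """
    disabled = {}
    timeouts = Counter()
    io_errors = []
    irq = None  # IRQ whose "nobody cared" report we are currently inside
    for line in log.splitlines():
        m = re.search(r"irq (\d+): nobody cared", line)
        if m:
            irq = int(m.group(1))
            disabled[irq] = []
            continue
        # Handler lines look like "[<addr>] (pdc_interrupt+0x0/0x1c0 [sata_promise])"
        h = re.search(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\)", line)
        if irq is not None and h:
            disabled[irq].append(f"{h.group(1)} [{h.group(2)}]")
            continue
        if re.search(r"Disabling IRQ #\d+", line):
            irq = None  # end of the IRQ report
            continue
        t = re.search(r"\b(ata\d+): command timeout", line)
        if t:
            timeouts[t.group(1)] += 1
        e = re.search(r"end_request: I/O error, dev (\w+), sector (\d+)", line)
        if e:
            io_errors.append((e.group(1), int(e.group(2))))
    return disabled, timeouts, io_errors
```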
I checked /proc/mdstat and saw that one disk was missing and the array was inactive, so I re-added the "failed" hard drive (it really was okay; I tested it in another machine). It was added fine, but I still couldn't get the array up. I also tried to start the array without the "failed" drive and couldn't, but RAID5 should be able to run with one disk missing, right?
Here is some output from mdstat and mdadm. Notice that the "failed" drive seems to think that everything is fine?!
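In principle, yes. In each stripe, a 4-disk RAID5 stores one parity chunk that is the XOR of the three data chunks, so any single missing chunk equals the XOR of the three that survive. The toy sketch below shows only that arithmetic on one made-up stripe. It ignores the 128K chunk size and the left-symmetric rotation of parity across disks that the real array uses.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One made-up stripe on a 4-disk RAID5: three data chunks plus their parity.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] is gone and rebuild it from the other three.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```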
[root@marie hvidgaard]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
md0 : inactive sdd1[0] sdg1[3] sdf1[2]
732587520 blocks
unused devices: <none>
******************************
[root@marie hvidgaard]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Sun Mar 12 17:01:28 2006
Raid Level : raid5
Device Size : 244195840 (232.88 GiB 250.06 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sun May 14 13:32:07 2006
State : active, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
UUID : f4b7594e:8642ee21:e60ee0cb:156e4ac9
Events : 0.791947
Number Major Minor RaidDevice State
0 8 49 0 active sync /dev/sdd1
4218 0 0 0 removed
2 8 81 2 active sync /dev/sdf1
3 8 97 3 active sync /dev/sdg1
******************************
[root@marie hvidgaard]# mdadm -E /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : f4b7594e:8642ee21:e60ee0cb:156e4ac9
Creation Time : Sun Mar 12 17:01:28 2006
Raid Level : raid5
Device Size : 244195840 (232.88 GiB 250.06 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Sun May 14 13:32:07 2006
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 2
Spare Devices : 0
Checksum : b6ba56c0 - correct
Events : 0.791947
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 0 8 49 0 active sync /dev/sdd1
0 0 8 49 0 active sync /dev/sdd1
1 1 0 0 1 faulty removed
2 2 8 81 2 active sync /dev/sdf1
3 3 8 97 3 active sync /dev/sdg1
******************************
[root@marie hvidgaard]# mdadm -E /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 00.90.00
UUID : f4b7594e:8642ee21:e60ee0cb:156e4ac9
Creation Time : Sun Mar 12 17:01:28 2006
Raid Level : raid5
Device Size : 244195840 (232.88 GiB 250.06 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Sun May 14 12:45:37 2006
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : b6ba4bdb - correct
Events : 0.791942
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 1 8 65 1 active sync /dev/sde1
0 0 8 49 0 active sync /dev/sdd1
1 1 8 65 1 active sync /dev/sde1
2 2 8 81 2 active sync /dev/sdf1
3 3 8 97 3 active sync /dev/sdg1
******************************
[root@marie hvidgaard]# mdadm -E /dev/sdf1
/dev/sdf1:
Magic : a92b4efc
Version : 00.90.00
UUID : f4b7594e:8642ee21:e60ee0cb:156e4ac9
Creation Time : Sun Mar 12 17:01:28 2006
Raid Level : raid5
Device Size : 244195840 (232.88 GiB 250.06 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Sun May 14 13:32:07 2006
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 2
Spare Devices : 0
Checksum : b6ba56e4 - correct
Events : 0.791947
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 2 8 81 2 active sync /dev/sdf1
0 0 8 49 0 active sync /dev/sdd1
1 1 0 0 1 faulty removed
2 2 8 81 2 active sync /dev/sdf1
3 3 8 97 3 active sync /dev/sdg1
******************************
[root@marie hvidgaard]# mdadm -E /dev/sdg1
/dev/sdg1:
Magic : a92b4efc
Version : 00.90.00
UUID : f4b7594e:8642ee21:e60ee0cb:156e4ac9
Creation Time : Sun Mar 12 17:01:28 2006
Raid Level : raid5
Device Size : 244195840 (232.88 GiB 250.06 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Sun May 14 13:32:07 2006
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 2
Spare Devices : 0
Checksum : b6ba56f6 - correct
Events : 0.791947
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 3 8 97 3 active sync /dev/sdg1
0 0 8 49 0 active sync /dev/sdd1
1 1 0 0 1 faulty removed
2 2 8 81 2 active sync /dev/sdf1
3 3 8 97 3 active sync /dev/sdg1
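The reason sde1 "thinks everything is fine" is that its superblock was last written at 12:45:37, before md failed it, so it still records four active devices. The other three kept being updated until 13:32:07, and their Events counter moved on (0.791947 versus 0.791942). At assembly time md treats a member with a lower event count as out of date, and that is what the "kicking non-fresh sde1" message at startup reports. The sketch below compares the counters from saved 'mdadm -E' reports. The helper names are hypothetical. It assumes the 0.90-style "hi.lo" Events format shown above and compares it as an integer pair rather than as a decimal number.

```python
import re

def field(report, key):
    """Return the value of a 'key : value' line from one mdadm -E report."""
    m = re.search(rf"^\s*{re.escape(key)} : (.+)$", report, re.MULTILINE)
    return m.group(1).strip() if m else None

def events_value(events):
    # 0.90 superblocks print the counter as "hi.lo"; compare as an (hi, lo)
    # pair so that, e.g., 0.10 sorts after 0.9.
    hi, lo = events.split(".")
    return (int(hi), int(lo))

def stale_members(reports):
    """Members whose Events counter is behind the freshest one."""
    counts = {dev: events_value(field(text, "Events")) for dev, text in reports.items()}
    newest = max(counts.values())
    return sorted(dev for dev, c in counts.items() if c < newest)
```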
I have also tried replacing the controller and connecting the drives in the same order, but no luck.
Any suggestions or solutions?
I found this post:
http://groups.google.com/group/mlist.li … 534e5bcacb
Perhaps mdadm -C -l5 -n4 /dev/md0 /dev/sdd1 missing /dev/sdf1 /dev/sdg1 would work, but the data on the array is pretty important (and the array went down one day before the scheduled backup).
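The -C/--create approach is plausible because it only rewrites the superblocks. If the level, chunk size, layout and, above all, the device order match the original array, the data stripes line up again. Putting the literal word missing in slot 1 creates the array degraded, so no resync touches the data. The sketch below builds that argument list from the slot numbers in the 'mdadm -E' output instead of typing it by hand. create_args is a hypothetical helper that only returns a list and runs nothing. The sketch assumes the defaults of the mdadm of that time: 0.90 metadata at the end of each partition and the left-symmetric layout for RAID5. Newer mdadm releases default to a different metadata version, so the same command there would need the metadata version stated explicitly.

```python
def create_args(md_dev, level, chunk_kib, slots, raid_devices):
    """Build an 'mdadm --create' argv from a {slot: device} mapping.

    Slots with no surviving device become the literal word "missing", so every
    remaining member keeps its original RaidDevice position.
    """
    devices = [slots.get(i, "missing") for i in range(raid_devices)]
    return [
        "mdadm", "--create", md_dev,
        f"--chunk={chunk_kib}",
        f"--level={level}",
        f"--raid-devices={raid_devices}",
        *devices,
    ]

# Slots taken from the RaidDevice column of the mdadm -E output.
args = create_args("/dev/md0", 5, 128, {0: "/dev/sdd1", 2: "/dev/sdf1", 3: "/dev/sdg1"}, 4)
print(" ".join(args))
# mdadm --create /dev/md0 --chunk=128 --level=5 --raid-devices=4 /dev/sdd1 missing /dev/sdf1 /dev/sdg1
```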
Oops, I forgot to add the errors from startup:
May 14 21:28:26 marie Freeing unused kernel memory: 252k freed
May 14 21:28:26 marie md: md0 stopped.
May 14 21:28:26 marie md: bind<sde1>
May 14 21:28:26 marie md: bind<sdf1>
May 14 21:28:26 marie md: bind<sdg1>
May 14 21:28:26 marie md: bind<sdd1>
May 14 21:28:26 marie md: kicking non-fresh sde1 from array!
May 14 21:28:26 marie md: unbind<sde1>
May 14 21:28:26 marie md: export_rdev(sde1)
May 14 21:28:26 marie md: md0: raid array is not clean -- starting background reconstruction
May 14 21:28:26 marie raid5: device sdd1 operational as raid disk 0
May 14 21:28:26 marie raid5: device sdg1 operational as raid disk 3
May 14 21:28:26 marie raid5: device sdf1 operational as raid disk 2
May 14 21:28:26 marie raid5: cannot start dirty degraded array for md0
May 14 21:28:26 marie RAID5 conf printout:
May 14 21:28:26 marie --- rd:4 wd:3 fd:1
May 14 21:28:26 marie disk 0, o:1, dev:sdd1
May 14 21:28:26 marie disk 2, o:1, dev:sdf1
May 14 21:28:26 marie disk 3, o:1, dev:sdg1
May 14 21:28:26 marie raid5: failed to run raid set md0
May 14 21:28:26 marie md: pers->run() failed ...
May 14 21:28:26 marie Adding 506008k swap on /dev/sda1. Priority:-1 extents:1 across:506008k
May 14 21:28:26 marie EXT3 FS on sda2, internal journal
May 14 21:28:26 marie XFS: SB read failed
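The decisive line is "cannot start dirty degraded array". The superblocks say State : active because the array was not stopped cleanly (hence "raid array is not clean"), and one member has been kicked. After an unclean stop some stripes may have been mid-write, so their parity cannot be trusted. With a member missing there is nothing to check that parity against, so the raid5 personality refuses to start rather than risk rebuilding wrong data. The XFS error follows because md0 never came up. The sketch below is a simplified model of that check, not kernel code. The start_dirty_degraded flag is named after the md module option of the same name, which the kernel's md documentation describes for overriding this.

```python
from dataclasses import dataclass

@dataclass
class Raid5State:
    raid_disks: int  # "Raid Devices" from the superblock
    working: int     # members that assembled as operational
    clean: bool      # False when the superblock State is "active" after a crash

def can_start(s, start_dirty_degraded=False):
    """Simplified model of when md's raid5 personality agrees to run."""
    missing = s.raid_disks - s.working
    if missing > 1:
        return False  # RAID5 survives at most one missing member
    if missing == 1 and not s.clean and not start_dirty_degraded:
        return False  # "cannot start dirty degraded array"
    return True

# The situation in the startup log: 4 slots, 3 operational, not clean.
print(can_start(Raid5State(raid_disks=4, working=3, clean=False)))  # False
```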
Solved
mdadm --create /dev/md0 --chunk=128 --level=5 --raid-devices=4 /dev/sdd1 missing /dev/sdf1 /dev/sdg1
did the trick. I used exactly the same command as when I first created the array (except, of course, with 'missing' in place of the dropped drive), then added the last drive back to the array, and I'm back in business 8)