
#1 2006-05-15 21:59:24

hvidgaard
Member
Registered: 2006-05-15
Posts: 10

[Solved :D]raid5 array mismatch after unknown failure

Hi...

Recently (two weeks ago) I had a system crash. The only thing I could do was a hard reset. Since I was in a hurry and everything came back up fine, I did nothing more.

Yesterday the same failure happened again, but this time the array did not come back up.

Here is the error log (it seems to be the controller):

May 14 12:44:38 marie irq 18: nobody cared (try booting with the "irqpoll" option)
May 14 12:44:38 marie [<c0143f3a>] __report_bad_irq+0x2a/0xa0
May 14 12:44:38 marie [<c01436fd>] handle_IRQ_event+0x3d/0x70
May 14 12:44:38 marie [<c0144057>] note_interrupt+0x87/0xf0
May 14 12:44:38 marie [<c014382d>] __do_IRQ+0xfd/0x110
May 14 12:44:38 marie [<c01059e9>] do_IRQ+0x19/0x30
May 14 12:44:38 marie [<c0103c1e>] common_interrupt+0x1a/0x20
May 14 12:44:38 marie [<c0100de0>] default_idle+0x0/0x60
May 14 12:44:38 marie [<c0100e0c>] default_idle+0x2c/0x60
May 14 12:44:38 marie [<c0100ebf>] cpu_idle+0x5f/0x80
May 14 12:44:38 marie [<c044ea55>] start_kernel+0x195/0x1f0
May 14 12:44:38 marie [<c044e3c0>] unknown_bootoption+0x0/0x1f0
May 14 12:44:38 marie handlers:
May 14 12:44:38 marie [<e0c945b0>] (pdc_interrupt+0x0/0x1c0 [sata_promise])
May 14 12:44:38 marie Disabling IRQ #18
May 14 12:45:07 marie ata2: command timeout
May 14 12:45:07 marie ata2: status=0x50 { DriveReady SeekComplete }
May 14 12:45:08 marie ata4: command timeout
May 14 12:45:08 marie ata4: status=0x50 { DriveReady SeekComplete }
May 14 12:45:08 marie sdg: Current: sense key=0x0
May 14 12:45:08 marie ASC=0x0 ASCQ=0x0
May 14 12:45:37 marie ata2: command timeout
May 14 12:45:37 marie ata2: status=0x50 { DriveReady SeekComplete }
May 14 12:45:37 marie sde: Current: sense key=0x0
May 14 12:45:37 marie ASC=0x0 ASCQ=0x0
May 14 12:46:07 marie ata1: command timeout
May 14 12:46:07 marie ata1: status=0x50 { DriveReady SeekComplete }
May 14 12:46:07 marie ata4: command timeout
May 14 12:46:07 marie ata4: status=0x50 { DriveReady SeekComplete }
May 14 12:46:07 marie ata3: command timeout
May 14 12:46:07 marie ata3: status=0x50 { DriveReady SeekComplete }
May 14 12:46:07 marie ata2: command timeout
May 14 12:46:07 marie ata2: status=0x50 { DriveReady SeekComplete }
May 14 12:46:07 marie sde: Current: sense key=0x0
May 14 12:46:07 marie ASC=0x0 ASCQ=0x0
May 14 12:46:37 marie ata1: command timeout
May 14 12:46:37 marie ata1: status=0x50 { DriveReady SeekComplete }
May 14 12:46:37 marie sdd: Current: sense key=0x0
May 14 12:46:37 marie ASC=0x0 ASCQ=0x0
May 14 12:46:37 marie ata3: command timeout
May 14 12:46:37 marie ata3: status=0x50 { DriveReady SeekComplete }
May 14 12:46:37 marie sdf: Current: sense key=0x0
May 14 12:46:37 marie ASC=0x0 ASCQ=0x0
May 14 12:46:37 marie ata4: command timeout
May 14 12:46:37 marie ata4: status=0x50 { DriveReady SeekComplete }
May 14 12:46:37 marie sdg: Current: sense key=0x0
May 14 12:46:37 marie ASC=0x0 ASCQ=0x0
May 14 12:47:07 marie ata1: command timeout
May 14 12:47:07 marie ata1: status=0x50 { DriveReady SeekComplete }
May 14 12:47:07 marie sdd: Current: sense key=0x0
May 14 12:47:07 marie ASC=0x0 ASCQ=0x0
May 14 12:47:37 marie ata1: command timeout
.
.
(loads and loads of command timeouts on ata1-4)
.
.
and a few of these:
May 14 13:03:07 marie ata4: command timeout
May 14 13:03:07 marie ata4: status=0x50 { DriveReady SeekComplete }
May 14 13:03:07 marie sdg: Current: sense key=0x0
May 14 13:03:07 marie ASC=0x0 ASCQ=0x0
May 14 13:03:07 marie ata1: command timeout
May 14 13:03:07 marie ata1: status=0x50 { DriveReady SeekComplete }
May 14 13:03:07 marie sdd: Current: sense key=0x0
May 14 13:03:07 marie ASC=0x0 ASCQ=0x0
May 14 13:03:07 marie ata2: command timeout
May 14 13:03:07 marie ATA: abnormal status 0xFF on port 0xE0C9E29C
May 14 13:03:07 marie ata2: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 14 13:03:07 marie ata2: status=0xff { Busy }
May 14 13:03:07 marie sd 5:0:0:0: SCSI error: return code = 0x8000002
May 14 13:03:07 marie sde: Current: sense key=0xb
May 14 13:03:07 marie ASC=0x47 ASCQ=0x0
May 14 13:03:07 marie end_request: I/O error, dev sde, sector 213838143
May 14 13:03:07 marie ata3: command timeout
May 14 13:03:07 marie ata3: status=0x50 { DriveReady SeekComplete }
May 14 13:03:07 marie sdf: Current: sense key=0x0
May 14 13:03:07 marie ASC=0x0 ASCQ=0x0
May 14 13:03:37 marie ata4: command timeout
May 14 13:03:37 marie ata4: status=0x50 { DriveReady SeekComplete }
May 14 13:03:37 marie sdg: Current: sense key=0x0
May 14 13:03:37 marie ASC=0x0 ASCQ=0x0
...
May 14 13:12:37 marie ASC=0x0 ASCQ=0x0
May 14 13:12:37 marie ata2: command timeout
May 14 13:12:37 marie ATA: abnormal status 0xFF on port 0xE0C9E29C
May 14 13:12:37 marie ata2: translated ATA stat/err 0xff/00 to SCSI SK/ASC/ASCQ 0xb/47/00
May 14 13:12:37 marie ata2: status=0xff { Busy }
May 14 13:12:37 marie sd 5:0:0:0: SCSI error: return code = 0x8000002
May 14 13:12:37 marie sde: Current: sense key=0xb
May 14 13:12:37 marie ASC=0x47 ASCQ=0x0
May 14 13:12:37 marie end_request: I/O error, dev sde, sector 126745431
May 14 13:12:37 marie RAID5 conf printout:
May 14 13:12:37 marie --- rd:4 wd:3 fd:1
May 14 13:12:37 marie disk 0, o:1, dev:sdd1
May 14 13:12:37 marie disk 1, o:0, dev:sde1
May 14 13:12:37 marie disk 2, o:1, dev:sdf1
May 14 13:12:37 marie disk 3, o:1, dev:sdg1
May 14 13:12:37 marie RAID5 conf printout:
May 14 13:12:37 marie --- rd:4 wd:3 fd:1
May 14 13:12:37 marie disk 0, o:1, dev:sdd1
May 14 13:12:37 marie disk 2, o:1, dev:sdf1
May 14 13:12:37 marie disk 3, o:1, dev:sdg1
May 14 13:13:07 marie ata1: command timeout
May 14 13:13:07 marie ata1: status=0x50 { DriveReady SeekComplete }
May 14 13:13:07 marie sdd: Current: sense key=0x0
May 14 13:13:07 marie ASC=0x0 ASCQ=0x0
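The timeouts begin right after the kernel disables IRQ 18, whose only registered handler is pdc_interrupt from sata_promise, and they hit all four ports (ata1-ata4, which map to sdd-sdg in the log). A status of 0x50 (DriveReady SeekComplete) means the drives report themselves idle, which suggests lost interrupts rather than failing disks. A fault spread across every port of one controller points at the controller or its IRQ, not at a single drive. A minimal sketch for tallying timeouts per port, assuming syslog lines in the format shown above (the function name is hypothetical):

```shell
# Hypothetical helper: count "command timeout" messages per libata port
# in a syslog excerpt read from stdin.
# Assumes lines like: "May 14 12:45:07 marie ata2: command timeout"
count_ata_timeouts() {
  awk '
    / command timeout$/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^ata[0-9]+:$/) {
          port = substr($i, 1, length($i) - 1)   # strip the trailing colon
          n[port]++
        }
      }
    }
    END { for (p in n) print p, n[p] }
  ' | sort
}
```

Usage, with whatever file your distribution's syslog writes to (the path here is only an example): count_ata_timeouts < /var/log/messages. Roughly equal counts on every port of the same card are the pattern to look for.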

I checked /proc/mdstat and saw that one disk was missing and the array was inactive, so I re-added the "failed" hard drive (it really was okay; I tested it in another machine). It was added fine, but I still couldn't get the array up. I also tried to start the array without the "failed" drive and couldn't, but it should be able to run degraded, right?
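In principle, yes: RAID5 spends one member's worth of space on parity, so a 4-member set still holds all its data with any three members present, but has no redundancy left while degraded. The arithmetic with the figures mdadm reports for this array (4 Raid Devices, Device Size 244195840 KiB), as a small hypothetical helper:

```shell
# Hypothetical helper: summarise a RAID5 set from its member count, the
# number of members actually present, and the per-member size in KiB
# (the "Device Size" figure printed by mdadm).
raid5_summary() {
  raid_devices=$1 present=$2 dev_kib=$3
  usable=$(( (raid_devices - 1) * dev_kib ))   # one member's worth goes to parity
  missing=$(( raid_devices - present ))
  if [ "$missing" -eq 0 ]; then
    echo "optimal: $usable KiB usable"
  elif [ "$missing" -eq 1 ]; then
    echo "degraded: $usable KiB usable, no redundancy left"
  else
    echo "failed: $missing members missing, too many to rebuild from parity"
  fi
}
```

For this array, raid5_summary 4 3 244195840 prints "degraded: 732587520 KiB usable, no redundancy left". Whether the kernel will actually start a degraded set also depends on whether it was shut down cleanly.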

Here is some output from /proc/mdstat and mdadm. Notice that the "failed" drive (sde1) seems to think that everything is fine?!

[root@marie hvidgaard]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
md0 : inactive sdd1[0] sdg1[3] sdf1[2]
      732587520 blocks

unused devices: <none>

******************************

[root@marie hvidgaard]# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sun Mar 12 17:01:28 2006
     Raid Level : raid5
    Device Size : 244195840 (232.88 GiB 250.06 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun May 14 13:32:07 2006
          State : active, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : f4b7594e:8642ee21:e60ee0cb:156e4ac9
         Events : 0.791947

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
    4218       0        0        0      removed
       2       8       81        2      active sync   /dev/sdf1
       3       8       97        3      active sync   /dev/sdg1

******************************

[root@marie hvidgaard]# mdadm -E /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4b7594e:8642ee21:e60ee0cb:156e4ac9
  Creation Time : Sun Mar 12 17:01:28 2006
     Raid Level : raid5
    Device Size : 244195840 (232.88 GiB 250.06 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun May 14 13:32:07 2006
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
       Checksum : b6ba56c0 - correct
         Events : 0.791947

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     0       8       49        0      active sync   /dev/sdd1

   0     0       8       49        0      active sync   /dev/sdd1
   1     1       0        0        1      faulty removed
   2     2       8       81        2      active sync   /dev/sdf1
   3     3       8       97        3      active sync   /dev/sdg1

******************************

[root@marie hvidgaard]# mdadm -E /dev/sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4b7594e:8642ee21:e60ee0cb:156e4ac9
  Creation Time : Sun Mar 12 17:01:28 2006
     Raid Level : raid5
    Device Size : 244195840 (232.88 GiB 250.06 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun May 14 12:45:37 2006
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : b6ba4bdb - correct
         Events : 0.791942

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     1       8       65        1      active sync   /dev/sde1

   0     0       8       49        0      active sync   /dev/sdd1
   1     1       8       65        1      active sync   /dev/sde1
   2     2       8       81        2      active sync   /dev/sdf1
   3     3       8       97        3      active sync   /dev/sdg1

******************************

[root@marie hvidgaard]# mdadm -E /dev/sdf1
/dev/sdf1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4b7594e:8642ee21:e60ee0cb:156e4ac9
  Creation Time : Sun Mar 12 17:01:28 2006
     Raid Level : raid5
    Device Size : 244195840 (232.88 GiB 250.06 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun May 14 13:32:07 2006
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
       Checksum : b6ba56e4 - correct
         Events : 0.791947

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     2       8       81        2      active sync   /dev/sdf1

   0     0       8       49        0      active sync   /dev/sdd1
   1     1       0        0        1      faulty removed
   2     2       8       81        2      active sync   /dev/sdf1
   3     3       8       97        3      active sync   /dev/sdg1

******************************

[root@marie hvidgaard]# mdadm -E /dev/sdg1
/dev/sdg1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f4b7594e:8642ee21:e60ee0cb:156e4ac9
  Creation Time : Sun Mar 12 17:01:28 2006
     Raid Level : raid5
    Device Size : 244195840 (232.88 GiB 250.06 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun May 14 13:32:07 2006
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
       Checksum : b6ba56f6 - correct
         Events : 0.791947

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     3       8       97        3      active sync   /dev/sdg1

   0     0       8       49        0      active sync   /dev/sdd1
   1     1       0        0        1      faulty removed
   2     2       8       81        2      active sync   /dev/sdf1
   3     3       8       97        3      active sync   /dev/sdg1
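The reason sde1 "thinks everything is fine" is in its own superblock: its Update Time (12:45:37) is when ata2, the port sde sits on, started timing out, and its Events counter (0.791942) is behind the other three (0.791947). Once the disk stopped responding, md could no longer record the failure on sde1, so that superblock still describes the last state it saw. md compares these event counters when assembling and drops the member that lags as "non-fresh". A minimal sketch that finds the lagging member from mdadm -E output, assuming the 0.90 "<hi>.<lo>" Events format shown above (the helper name is hypothetical):

```shell
# Hypothetical helper: read concatenated `mdadm -E` output on stdin and print
# every member whose event counter is behind the newest one.
# Assumes 0.90-superblock output where Events is printed as "<hi>.<lo>"
# (e.g. 0.791947); a plain integer is accepted too.
stale_members() {
  awk '
    /^\/dev\/[^ ]*:$/ { dev = substr($1, 1, length($1) - 1) }   # "/dev/sdd1:" header
    /^ *Events : / && dev != "" {
      raw[dev] = $3
      n = split($3, p, ".")
      ev[dev] = (n == 2) ? p[1] * 4294967296 + p[2] : p[1] + 0
      if (!seen || ev[dev] > max) { max = ev[dev]; seen = 1 }
    }
    END { for (d in ev) if (ev[d] < max) print d, raw[d] }
  ' | sort
}
```

Usage: for d in /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1; do mdadm -E "$d"; done | stale_members. With the four superblocks above it should report only /dev/sde1 with 0.791942.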

I have also tried replacing the controller and connecting the drives in the same order, but no luck.

Any suggestions or solutions?

I have found this post
http://groups.google.com/group/mlist.li … 534e5bcacb

Perhaps

mdadm -C -l5 -n4 /dev/md0 /dev/sdd1 missing /dev/sdf1 /dev/sdg1

will work, but the data on the array is pretty important (and, sadly, the array went down one day before the scheduled backup).
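Re-creating over existing members only preserves the data if the geometry matches the original exactly: level, chunk size (128K), layout (left-symmetric), superblock format, and the order of the devices. The order is the RaidDevice column of the "this" line in each surviving superblock, and the dropped member's slot is filled with the word missing so the new array starts degraded and no resync runs. A hypothetical helper that derives the device list from mdadm -E output:

```shell
# Hypothetical helper: read concatenated `mdadm -E` output for the surviving
# members on stdin and print them in RaidDevice order, with "missing" for
# any slot no superblock claims. $1 is the Raid Devices count.
create_order() {
  awk -v n="$1" '
    /^\/dev\/[^ ]*:$/ { dev = substr($1, 1, length($1) - 1) }  # "/dev/sdd1:" header
    $1 == "this" && dev != "" { slot[$5] = dev }               # 5th column = RaidDevice
    END {
      for (i = 0; i < n + 0; i++)
        printf "%s%s", (i ? " " : ""), ((i in slot) ? slot[i] : "missing")
      print ""
    }
  '
}
```

Usage: for d in /dev/sdd1 /dev/sdf1 /dev/sdg1; do mdadm -E "$d"; done | create_order 4. With the superblocks above this gives the same order as the command proposed here, which is a useful cross-check before running anything destructive.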


#2 2006-05-15 22:02:50

hvidgaard

Re: [Solved :D]raid5 array mismatch after unknown failure

Oops, I forgot to add the errors from boot:

May 14 21:28:26 marie Freeing unused kernel memory: 252k freed
May 14 21:28:26 marie md: md0 stopped.
May 14 21:28:26 marie md: bind<sde1>
May 14 21:28:26 marie md: bind<sdf1>
May 14 21:28:26 marie md: bind<sdg1>
May 14 21:28:26 marie md: bind<sdd1>
May 14 21:28:26 marie md: kicking non-fresh sde1 from array!
May 14 21:28:26 marie md: unbind<sde1>
May 14 21:28:26 marie md: export_rdev(sde1)
May 14 21:28:26 marie md: md0: raid array is not clean -- starting background reconstruction
May 14 21:28:26 marie raid5: device sdd1 operational as raid disk 0
May 14 21:28:26 marie raid5: device sdg1 operational as raid disk 3
May 14 21:28:26 marie raid5: device sdf1 operational as raid disk 2
May 14 21:28:26 marie raid5: cannot start dirty degraded array for md0
May 14 21:28:26 marie RAID5 conf printout:
May 14 21:28:26 marie --- rd:4 wd:3 fd:1
May 14 21:28:26 marie disk 0, o:1, dev:sdd1
May 14 21:28:26 marie disk 2, o:1, dev:sdf1
May 14 21:28:26 marie disk 3, o:1, dev:sdg1
May 14 21:28:26 marie raid5: failed to run raid set md0
May 14 21:28:26 marie md: pers->run() failed ...
May 14 21:28:26 marie Adding 506008k swap on /dev/sda1.  Priority:-1 extents:1 across:506008k
May 14 21:28:26 marie EXT3 FS on sda2, internal journal
May 14 21:28:26 marie XFS: SB read failed
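Two lines in this log explain the refusal: "raid array is not clean" (the superblocks say State : active, so the array was not stopped cleanly and some stripes may have been mid-write when the controller hung) and the kicked member, which leaves it degraded. In a dirty array the parity of an interrupted stripe may not match its data; with a member missing, that member's blocks would be reconstructed from that parity, so the raid5 driver refuses to start rather than risk silently returning garbage. A simplified model of that decision, not the kernel's actual code (the helper name is hypothetical):

```shell
# Hypothetical helper, a simplified model of the raid5 start-up decision.
# Arguments: the superblock State word ("clean" or "active"),
# Raid Devices, and the number of usable members.
raid5_autostart() {
  state=$1 raid_devices=$2 working=$3
  if [ "$working" -lt $(( raid_devices - 1 )) ]; then
    echo "refuse: more than one member missing"
  elif [ "$working" -lt "$raid_devices" ] && [ "$state" != "clean" ]; then
    # Dirty + degraded: parity of interrupted stripes may be stale, and the
    # missing member could only be rebuilt from that parity.
    echo "refuse: dirty and degraded"
  elif [ "$state" != "clean" ]; then
    # All members present: start, then recompute parity to make it consistent.
    echo "start, then resync parity"
  else
    echo "start"
  fi
}
```

For the state in this thread, raid5_autostart active 4 3 gives "refuse: dirty and degraded", matching "cannot start dirty degraded array for md0".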


#3 2006-05-16 18:50:26

hvidgaard

Re: [Solved :D]raid5 array mismatch after unknown failure

Solved :D

mdadm --create /dev/md0 --chunk=128 --level=5 --raid-devices=4 /dev/sdd1 missing /dev/sdf1 /dev/sdg1

did the trick. I used exactly the same command (same level, chunk size and device order) as when I first created the array, except of course for 'missing', then added the last drive back to the array, and I'm back in business 8)
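The recovery in this post can be written down as a short checklist. The sketch below only prints the commands and never runs them. Two assumptions go beyond what the post states. First, the filesystem is taken to be XFS (inferred from the "XFS: SB read failed" boot message), so xfs_repair -n, its no-modify mode, serves as a read-only check before the old member is re-added; that matters because --add starts a rebuild that overwrites sde1, the only untouched copy of that slot if the order turns out to be wrong. Second, --metadata=0.90 and --layout=left-symmetric are spelled out: the original command relied on the defaults of the mdadm of that time, which evidently matched the array, but newer mdadm releases default to the 1.2 superblock (which places data at a different offset) and a larger chunk size. The helper name is hypothetical.

```shell
# Hypothetical helper: print (never run) the recovery steps from this thread.
# $1 = md device, $2 = chunk size in KiB, $3 = member to re-add afterwards,
# remaining arguments = members in RaidDevice order, "missing" for the gap.
recovery_plan() {
  md=$1 chunk=$2 readd=$3
  shift 3
  # 1. Rewrite the superblocks with the original geometry. A 0.90 superblock
  #    sits at the end of each member, and a degraded create starts no resync.
  echo "mdadm --create $md --metadata=0.90 --level=5 --layout=left-symmetric --chunk=$chunk --raid-devices=$# $*"
  # 2. Read-only filesystem check (XFS assumed from the boot log).
  echo "xfs_repair -n $md"
  # 3. Only if step 2 looks sane: re-add the stale member, which rebuilds onto it.
  echo "mdadm $md --add $readd"
  echo "cat /proc/mdstat"
}
```

Usage: recovery_plan /dev/md0 128 /dev/sde1 /dev/sdd1 missing /dev/sdf1 /dev/sdg1. Run each printed step by hand and inspect its result before moving on to the next.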



Powered by FluxBB