
#1 2010-10-20 08:50:48

Fackamato
Member
Registered: 2006-03-31
Posts: 579

[SOLVED] mdadm / RAID trouble

Hi all,

In relation to this post: Migrating data to a "new" setup, questions

I think I have a problem with my new RAID5 array. :( Everything seemed to work fine: I was transferring data to it, which went OK. Then I shrank the old filesystem on /home (which held everything) to its minimum size, ~2.3TB, and pvmove:d a partition off it so I could add it to the RAID array. After unmounting the array to do an fsck, something happened (though nothing was reported); it's like the RAID array just disappeared.
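For context, the shrink/move went roughly like this (from memory; the VG/LV and device names below are approximate, not necessarily the exact ones I used):

umount /home
e2fsck -f /dev/vg0/home                  # fsck before shrinking
resize2fs /dev/vg0/home 2300G            # shrink the filesystem first
lvreduce -L 2350G /dev/vg0/home          # then the LV, leaving some headroom
pvmove /dev/sdX1                         # migrate all extents off the partition
vgreduce vg0 /dev/sdX1                   # drop the now-empty PV from the VG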

Here's the dmesg: http://dpaste.org/1oUi/ (snippet below):

md0: detected capacity change from 4000795590656 to 0
md: md0 stopped.
md: unbind<sdb>
md: export_rdev(sdb)
md: unbind<sda>
md: export_rdev(sda)
md: bind<sda1>
md: bind<sdb1>
md: bind<sdc1>
md/raid:md0: device sdb1 operational as raid disk 1
md/raid:md0: device sda1 operational as raid disk 0
md/raid:md0: allocated 3175kB
md/raid:md0: raid level 5 active with 2 out of 3 devices, algorithm 2
RAID conf printout:
--- level:5 rd:3 wd:2
disk 0, o:1, dev:sda1
disk 1, o:1, dev:sdb1
md0: detected capacity change from 0 to 4000793231360
RAID conf printout:
--- level:5 rd:3 wd:2
disk 0, o:1, dev:sda1
disk 1, o:1, dev:sdb1
disk 2, o:1, dev:sdc1
md: recovery of RAID array md0
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
md: using 128k window, over a total of 1953512320 blocks.
md0: detected capacity change from 0 to 4000793231360
md0: unknown partition table
EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null)
EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: (null)
EXT4-fs (dm-2): re-mounted. Opts: (null)
ata2.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x6 frozen
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/00:00:00:cf:70/01:00:e7:00:00/40 tag 0 ncq 131072 in
         res 40/00:04:00:37:4b/00:00:de:00:00/40 Emask 0x4 (timeout)

etc etc...

Here is messages.log: http://dpaste.org/OO7I/
I also have the SMART (smartctl --all) info for all (sda ... sdg) drives: http://dpaste.org/40Wd/

I'm rebooting the server now to see if that helps...

:(


edit: Ugh, can't ssh in! BALLS. I'll have to check the status when I get home..

In the meantime, do you guys think the array is borked?

Last edited by Fackamato (2010-11-01 12:10:55)


#2 2010-10-20 15:54:57

lilsirecho
Veteran
Registered: 2003-10-24
Posts: 5,000

Re: [SOLVED] mdadm / RAID trouble

Query the mdadm.conf file and/or /proc/mdstat for some info on your RAID.
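For example (assuming the array is md0):

cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --examine /dev/sda1    # repeat for each member partition
cat /etc/mdadm.conf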


Prediction...This year will be a very odd year!
Hard work does not kill people but why risk it: Charlie Mccarthy
A man is not complete until he is married..then..he is finished.
When ALL is lost, what can be found? Even bytes get lonely for a little bit!     X-ray confirms Iam spineless!


#3 2010-10-20 16:44:53

Fackamato
Member
Registered: 2006-03-31
Posts: 579

Re: [SOLVED] mdadm / RAID trouble

The server doesn't boot for some reason. I'm using the Archboot 2010 r7 USB stick now, trying to salvage things... (6TB of precious data)

[Arch Linux: /]# mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
mdadm: /dev/md0 assembled from 1 drive and 1 spare - not enough to start the array.
[Arch Linux: /]# cat /proc/mdstat 
Personalities : [raid0] [raid6] [raid5] [raid4] 
md0 : inactive sda1[0](S) sdc1[3](S) sdb1[1](S)
      5860537608 blocks super 1.2
       
unused devices: <none>

I don't understand. It's a RAID5 array!!

[Arch Linux: /]# fdisk -l /dev/sd?

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
81 heads, 63 sectors/track, 765633 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x9196636e

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048  3907029167  1953513560   fd  Linux raid autodetect

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
81 heads, 63 sectors/track, 765633 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x73ad1b41

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048  3907029167  1953513560   fd  Linux raid autodetect

Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
63 heads, 63 sectors/track, 984386 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x8bf2433e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048  3907029167  1953513560   fd  Linux raid autodetect

Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x933782af

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048      206847      102400   83  Linux
/dev/sdd2          206848    46344191    23068672   8e  Linux LVM

Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sde doesn't contain a valid partition table

Disk /dev/sdf: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x9a8ba58b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1              63  2930272064  1465136001   8e  Linux LVM

Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1               1  1953525167   976762583+  8e  Linux LVM

Disk /dev/sdh: 8086 MB, 8086618112 bytes
64 heads, 32 sectors/track, 7712 cylinders, total 15794176 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x6b8b4567

dmesg:

md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
md: md0 stopped.
md: unbind<sda1>
md: export_rdev(sda1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdb1>
md: export_rdev(sdb1)
md: md0 stopped.
md: bind<sdb1>
md: bind<sdc1>
md: bind<sda1>
md: md0 stopped.
md: unbind<sda1>
md: export_rdev(sda1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdb1>
md: export_rdev(sdb1)
md: md0 stopped.
md: bind<sdb1>
md: bind<sdc1>
md: bind<sda1>
md: kicking non-fresh sdb1 from array!
md: unbind<sdb1>
md: export_rdev(sdb1)
md/raid:md0: device sda1 operational as raid disk 0
md/raid:md0: allocated 3175kB
md/raid:md0: not enough operational devices (2/3 failed)
RAID conf printout:
 --- level:5 rd:3 wd:1
 disk 0, o:1, dev:sda1
md/raid:md0: failed to run raid set.
md: pers->run() failed ...
md: md127 stopped.
md: bind<sdb1>
md: md127 stopped.
md: unbind<sdb1>
md: export_rdev(sdb1)
md: md127 stopped.
md: bind<sdb>
md: md127 stopped.
md: unbind<sdb>
md: export_rdev(sdb)
md: md0 stopped.
md: unbind<sda1>
md: export_rdev(sda1)
md: unbind<sdc1>
md: export_rdev(sdc1)

Last edited by Fackamato (2010-10-20 16:52:18)


#4 2010-10-20 17:16:38

Fackamato
Member
Registered: 2006-03-31
Posts: 579

Re: [SOLVED] mdadm / RAID trouble

Edit: I hope I'll look back at this someday and think "I was so dumb!", if I can find the mistake I made.

I tried assembling md0 with the --force option, which gave me this:

[root@ion tmp]# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
[root@ion tmp]# mdadm --force --assemble --verbose /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
mdadm: --assemble would set mdadm mode to "assemble", but it is already set to "manage".
[root@ion tmp]# mdadm --assemble --force --verbose /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot -1.
mdadm: forcing event count in /dev/sdb1(1) from 23514 upto 23521
mdadm: clearing FAULTY flag for device 1 in /dev/md0 for /dev/sdb1
mdadm: added /dev/sdb1 to /dev/md0 as 1
mdadm: no uptodate device for slot 2 of /dev/md0
mdadm: added /dev/sdc1 to /dev/md0 as -1
mdadm: added /dev/sda1 to /dev/md0 as 0
mdadm: /dev/md0 has been started with 2 drives (out of 3) and 1 spare.

[root@ion tmp]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0]
md0 : active raid5 sda1[0] sdc1[3] sdb1[1]
      3907024640 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  0.4% (9697388/1953512320) finish=409.0min speed=79192K/sec
     
unused devices: <none>

Here is mdadm -E (examine) on the 3 partitions:

[root@ion ~]# mdadm -E /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e6595c64:b3ae90b3:f01133ac:3f402d20
           Name : ion:0  (local to host ion)
  Creation Time : Tue Oct 19 08:58:41 2010
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 7814049280 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 3907024640 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : af0cc588:9b3de06f:692718a7:980ca2b5

    Update Time : Wed Oct 20 09:31:31 2010
       Checksum : 8881e667 - correct
         Events : 23521

         Layout : left-symmetric
     Chunk Size : 128K

   Device Role : Active device 0
   Array State : A.. ('A' == active, '.' == missing)

[root@ion ~]# mdadm -E /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e6595c64:b3ae90b3:f01133ac:3f402d20
           Name : ion:0  (local to host ion)
  Creation Time : Tue Oct 19 08:58:41 2010
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 7814049280 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 3907024640 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 5f79327e:8a83b021:325b2627:01d0a19b

    Update Time : Wed Oct 20 03:38:00 2010
       Checksum : 95d33d54 - correct
         Events : 23514

         Layout : left-symmetric
     Chunk Size : 128K

   Device Role : Active device 1
   Array State : AAA ('A' == active, '.' == missing)

[root@ion ~]# mdadm -E /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e6595c64:b3ae90b3:f01133ac:3f402d20
           Name : ion:0  (local to host ion)
  Creation Time : Tue Oct 19 08:58:41 2010
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
     Array Size : 7814049280 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 3907024640 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 634f3893:7af5fdd3:7ff344c7:8e3c4cff

    Update Time : Wed Oct 20 09:31:31 2010
       Checksum : 60e9dd0a - correct
         Events : 23521

         Layout : left-symmetric
     Chunk Size : 128K

   Device Role : spare
   Array State : A.. ('A' == active, '.' == missing)
[root@ion tmp]# mdadm --assemble --verbose /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot -1.
mdadm: added /dev/sdb1 to /dev/md0 as 1
mdadm: no uptodate device for slot 2 of /dev/md0
mdadm: added /dev/sdc1 to /dev/md0 as -1
mdadm: added /dev/sda1 to /dev/md0 as 0
mdadm: /dev/md0 assembled from 1 drive and 1 spare - not enough to start the array.
[root@ion tmp]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] [raid0] 
md0 : inactive sda1[0](S) sdc1[3](S) sdb1[1](S)
      5860537608 blocks super 1.2
       
unused devices: <none>

Last edited by Fackamato (2010-10-20 17:26:28)


#5 2010-10-20 17:42:54

lilsirecho
Veteran
Registered: 2003-10-24
Posts: 5,000

Re: [SOLVED] mdadm / RAID trouble

Perhaps mkinitcpio needs to be run..........................
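Something along these lines, assuming the stock kernel26 preset and that the mdadm hook is listed in HOOKS in /etc/mkinitcpio.conf:

grep HOOKS /etc/mkinitcpio.conf   # check that the mdadm hook is there
mkinitcpio -p kernel26            # preset name may differ on your install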




#6 2010-10-20 18:57:56

Fackamato
Member
Registered: 2006-03-31
Posts: 579

Re: [SOLVED] mdadm / RAID trouble

Thanks,

I sorted out the boot problem; I needed to reinstall GRUB for some reason.
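Roughly what I ran from the rescue environment (assuming GRUB legacy and that the boot drive is the first disk):

grub-install /dev/sda
# or, from the grub shell: root (hd0,0) followed by setup (hd0)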

The RAID is now recovering:

[root@ion ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0]
md0 : active raid5 sda1[0] sdc1[3] sdb1[1]
      3907024640 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [UU_]
      [====>................]  recovery = 22.2% (433971632/1953512320) finish=332.1min speed=76240K/sec

unused devices: <none>

edit:

I'm quite worried about this:

sata_mv: Highpoint RocketRAID BIOS CORRUPTS DATA on all attached drives, regardless of if/how they are configured. BEWARE!

They aren't configured at all in the RocketRAID BIOS. All RAID partitions start at sector 2048 (to keep them aligned) and use the rest of the space. I'm a bit worried here!

Last edited by Fackamato (2010-10-20 18:59:55)


#7 2010-10-21 06:43:28

Fackamato
Member
Registered: 2006-03-31
Posts: 579

Re: [SOLVED] mdadm / RAID trouble

This is weird. The array was recovering during the night, and today I get this:

ata2: EH complete
ata2.00: NCQ disabled due to excessive errors
ata2.00: exception Emask 0x0 SAct 0x7fe SErr 0x0 action 0x6 frozen
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/00:08:00:d0:70/01:00:e7:00:00/40 tag 1 ncq 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/38:10:c8:cf:70/00:00:e7:00:00/40 tag 2 ncq 28672 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/c8:18:00:cf:70/00:00:e7:00:00/40 tag 3 ncq 102400 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/00:20:00:ce:70/01:00:e7:00:00/40 tag 4 ncq 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/e8:28:18:cd:70/00:00:e7:00:00/40 tag 5 ncq 118784 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/10:30:08:cd:70/00:00:e7:00:00/40 tag 6 ncq 8192 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/00:38:00:d2:70/01:00:e7:00:00/40 tag 7 ncq 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/08:40:00:cd:70/00:00:e7:00:00/40 tag 8 ncq 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/00:48:00:cc:70/01:00:e7:00:00/40 tag 9 ncq 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2.00: failed command: READ FPDMA QUEUED
ata2.00: cmd 60/00:50:00:d1:70/01:00:e7:00:00/40 tag 10 ncq 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: configured for UDMA/133
ata2.00: device reported invalid CHS sector 0
ata2.00: device reported invalid CHS sector 0
ata2.00: device reported invalid CHS sector 0
ata2.00: device reported invalid CHS sector 0
ata2.00: device reported invalid CHS sector 0
ata2.00: device reported invalid CHS sector 0
ata2.00: device reported invalid CHS sector 0
ata2.00: device reported invalid CHS sector 0
ata2.00: device reported invalid CHS sector 0
ata2.00: device reported invalid CHS sector 0
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: failed command: READ DMA EXT
ata2.00: cmd 25/00:00:00:cc:70/00:01:e7:00:00/e0 tag 0 dma 131072 in
         res 40/00:0c:a8:6c:e9/00:00:e1:00:00/40 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: failed command: READ DMA EXT
ata2.00: cmd 25/00:00:00:cc:70/00:01:e7:00:00/e0 tag 0 dma 131072 in
         res 40/00:0c:a8:6c:e9/00:00:e1:00:00/40 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: configured for UDMA/133
sd 1:0:0:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08
sd 1:0:0:0: [sdb] Sense Key : 0xb [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
        00 00 00 a7
sd 1:0:0:0: [sdb] ASC=0x0 ASCQ=0x0
sd 1:0:0:0: [sdb] CDB: cdb[0]=0x28: 28 00 e7 70 cc 00 00 01 00 00
end_request: I/O error, dev sdb, sector 3882929152
md/raid:md0: read error not correctable (sector 3882927104 on sdb1).
md/raid:md0: Disk failure on sdb1, disabling device.
<1>md/raid:md0: Operation continuing on 1 devices.
md/raid:md0: read error not correctable (sector 3882927112 on sdb1).
md/raid:md0: read error not correctable (sector 3882927120 on sdb1).
md/raid:md0: read error not correctable (sector 3882927128 on sdb1).
md/raid:md0: read error not correctable (sector 3882927136 on sdb1).
md/raid:md0: read error not correctable (sector 3882927144 on sdb1).
md/raid:md0: read error not correctable (sector 3882927152 on sdb1).
md/raid:md0: read error not correctable (sector 3882927160 on sdb1).
md/raid:md0: read error not correctable (sector 3882927168 on sdb1).
md/raid:md0: read error not correctable (sector 3882927176 on sdb1).
ata2: EH complete
md: md0: recovery done.


etc... Hm. Is sdb borked? How can I check it?
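One way to check it, I suppose, assuming smartmontools is installed:

smartctl -a /dev/sdb        # check Reallocated_Sector_Ct / Current_Pending_Sector
smartctl -t long /dev/sdb   # start a long self-test, read the result later with -a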

edit: Trying to recover.

[root@ion ~]# cat /proc/mdstat  && mdadm --detail --verbose /dev/md0
Personalities : [raid6] [raid5] [raid4] [raid0]
md0 : active raid5 sda1[0] sdc1[3] sdb1[1]
      3907024640 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [UU_]
      [=>...................]  recovery =  6.9% (134951552/1953512320) finish=429.1min speed=70626K/sec

unused devices: <none>
/dev/md0:
        Version : 1.2
  Creation Time : Tue Oct 19 08:58:41 2010
     Raid Level : raid5
     Array Size : 3907024640 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 1953512320 (1863.01 GiB 2000.40 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Thu Oct 21 09:05:13 2010
          State : clean, degraded, recovering
Active Devices : 2
Working Devices : 3
Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 128K

Rebuild Status : 6% complete

           Name : ion:0  (local to host ion)
           UUID : e6595c64:b3ae90b3:f01133ac:3f402d20
         Events : 23550

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       3       8       33        2      spare rebuilding   /dev/sdc1

I tried to add the fourth drive (sdd1) but couldn't:

mdadm --add /dev/md0 /dev/hdd1
mdadm: add new device failed for /dev/hdd1 as 4: Invalid argument

edit: That's because the RAID was active, duh. I stopped md0 and could then add the drive. I couldn't _grow_ it though, as it's recovering...
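For reference, the add-then-grow sequence I was aiming for (the grow step is hypothetical here, since it has to wait until the rebuild finishes):

mdadm --add /dev/md0 /dev/sdd1
mdadm --grow /dev/md0 --raid-devices=4   # only once the array is clean again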

[root@ion ~]# mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Tue Oct 19 08:58:41 2010
     Raid Level : raid5
     Array Size : 3907024640 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 1953512320 (1863.01 GiB 2000.40 GB)
   Raid Devices : 3
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Thu Oct 21 12:41:26 2010
          State : clean, degraded, recovering
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2

         Layout : left-symmetric
     Chunk Size : 128K

 Rebuild Status : 52% complete

           Name : ion:0  (local to host ion)
           UUID : e6595c64:b3ae90b3:f01133ac:3f402d20
         Events : 23568

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       3       8       33        2      spare rebuilding   /dev/sdc1

       4       8       49        -      spare   /dev/sdd1
[root@ion ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0]
md0 : active raid5 sdd1[4](S) sda1[0] sdc1[3] sdb1[1]
      3907024640 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [UU_]
      [==========>..........]  recovery = 52.0% (1016981760/1953512320) finish=232.6min speed=67084K/sec

unused devices: <none>

Last edited by Fackamato (2010-10-21 10:42:18)


#8 2010-10-22 07:09:18

Fackamato
Member
Registered: 2006-03-31
Posts: 579

Re: [SOLVED] mdadm / RAID trouble

Okay.

It looks like my sdb1 is broken. From different recovery attempts I get

md/raid:md0: read error not correctable (sector 3882927384 on sdb1)

etc..

When I run badblocks around these blocks I get:

[root@ion ~]# badblocks  -b 512 -o badblocks-sdb.txt -v -n /dev/sdb 3882927432 3882927360
Checking for bad blocks in non-destructive read-write mode
From block 3882927360 to 3882927432
Testing with random pattern: Pass completed, 0 bad blocks found.

So badblocks can't find any problems with those blocks. What gives?
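Maybe the next step is to cross-check by reading the exact sectors the kernel complained about, something like:

dd if=/dev/sdb1 of=/dev/null bs=512 skip=3882927384 count=64 iflag=direct   # the sector md reported as uncorrectable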

