#1 2019-03-08 05:02:04

slackcub
Member
Registered: 2009-03-14
Posts: 144

mdadm raid 5 reshape "stuck" and drive errors

I added a new drive to my mdadm RAID 5 array last night, and since everything started off fine, I went to bed.  I wasn't able to look at the status before I left for work this morning, and I got home tonight to find that the reshape has stopped advancing.
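
For reference, I believe the commands I ran were roughly the following (typed from memory; the new drive's device name below is only a guess):

# add the new disk, then grow the array from 3 to 4 data devices
# (/dev/sdf1 is a placeholder for whatever the new drive actually is)
mdadm /dev/md0 --add /dev/sdf1
mdadm --grow /dev/md0 --raid-devices=4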

I'm going to try to type out what I'm seeing.  Unfortunately the machine doesn't have network access, so I can't just copy/paste.

In /proc/mdstat, the progress line hasn't updated; it's stuck at 14.0% (274171900/1951428608) with an absurdly high finish estimate and a speed of only 17K/sec.

mdadm --detail /dev/md0 says the status is 'clean, reshaping'
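
In case it helps, this is roughly how I've been checking (the refresh interval is just whatever I happened to pick):

# the reshape position in mdstat doesn't move between refreshes
watch -n 60 cat /proc/mdstat
# reports 'State : clean, reshaping'
mdadm --detail /dev/md0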

In the journal I don't see anything current; however, I do see a lot of errors from overnight, which eventually stopped appearing.  I'm assuming the reshape stopped advancing around the time the errors stopped showing up.  The errors are actually related to a disk that was already in the array, not the new one.
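
I pulled the errors below out of the journal with something along these lines (the exact filter is approximate):

# kernel messages from the current boot, limited to the suspect disk
journalctl -k -b | grep -i -E 'ata6|sde'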

The errors look like the following.  I typed them out as best I could, so please excuse any typos.

ata6.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
ata6.00: irq_stat 0x40000008
ata6.00: failed command: READ FPDMA QUEUED
ata6.00: cmd 60/08:98:40:72:60/00:00:01:00:00/40 tag 19 ncq dma 4096 in
         res 41/40:00:40:72:60/00:00:01:00:00/40 Emask 0x409 (media error) <F>
ata6.00: status: { DRDY ERR }
ata6.00: error: { UNC }
ata6.00: configured for UDMA/133
sd 5:0:0:0: [sde] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sde] tag#19 Sense Key : Medium Error [current]
sd 5:0:0:0: [sde] tag#19 Add. Sense: Unrecovered read error - auto reallocate failed
sd 5:0:0:0: [sde] tag#19 CDB: Read(10) 28 00 01 60 72 40 00 00 08 00
print_req_error: I/O error, dev sde, sector 23097920
ata6: EH complete
md/raid:md0: read error corrected (8 sectors at 23095872 on sde1)

The "read error corrected" lines didn't show up after every occurrence of the other log entries (which repeat with different sector numbers), but when they did show up it was typically in bunches, multiple lines one after another.  All the other lines recur regularly, though the values in the cmd and res lines do change.
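
If it would help, I can count how often each message shows up with something like this (the message strings are taken from the excerpt above):

# compare how many times each message appears in the kernel log
journalctl -k -b | grep -c 'failed command: READ FPDMA QUEUED'
journalctl -k -b | grep -c 'read error corrected'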

I still have a spare attached to the array: 3 devices reshaping to 4, with 1 additional device marked as a spare.

I do regularly check the status of my array, and it has always shown clean, so I'm surprised to see this now. 
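
I haven't dug into the drive's own SMART data yet; I'm assuming something like this is the way to check sde for pending or reallocated sectors:

# look for signs of bad sectors on the suspect disk (needs smartmontools)
smartctl -a /dev/sde | grep -i -E 'reallocated|pending|uncorrect'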

It doesn't seem like all is lost, as I still have access to the data on the array.

What can I do to safely remove sde from the array and still get through the reshape?  Is that even possible?
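
If it matters, what I had in mind was something along these lines, but I don't want to run it in the middle of a reshape without knowing it's safe (sde1 is the member name shown in the log above):

# mark the failing disk as faulty, then pull it so the spare can take over
mdadm /dev/md0 --fail /dev/sde1
mdadm /dev/md0 --remove /dev/sde1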

Why didn't mdadm --detail or /proc/mdstat report any issues with the array?

Last edited by slackcub (2019-03-08 05:02:43)
