
#1 2022-05-04 03:58:43

thoss
Member
Registered: 2015-02-16
Posts: 27

Recover mdadm RAID 5 from disk failure and possible clobbering

I have a 4-disk RAID 5 on a server, built from 1 TB hot-swappable drives, with LUKS on LVM on top.

Today, after rebooting the server following a few days of downtime, I noticed one of the drives was failing: its light on the chassis was blinking, the server wasn't booting, and the console showed read errors that appeared to correspond to the drive with the error LED.

I pulled the drive and rebooted the machine, hoping to get the degraded array up, and ordered a replacement. While booting, a systemd fsck job was running for a long time, but I had to go to work so I left and came back.
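For reference, the degraded assembly I was hoping to get is roughly the following (device names are examples, not my actual layout):

```shell
# Assemble the array with the failed member absent; --run tells mdadm
# to start it degraded with only 3 of the 4 members present.
mdadm --assemble --run /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1

# Confirm it came up degraded
cat /proc/mdstat
mdadm --detail /dev/md0
```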

The job was still running 8 hours later, so I powered the server off and back on. Now a second drive seems to be giving read errors. Weird.

I decided to boot a live ISO and reinsert the first failed drive to see if it would work, since I couldn't fsck the root partition after unlocking; unlocking itself works, though.
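Concretely, what I was trying from the live ISO was something like this (the VG/LV and mapper names here are placeholders, not my real ones):

```shell
# Unlock the LUKS container sitting on the LV -- this step succeeds
cryptsetup open /dev/vg0/root cryptroot

# fsck the unlocked filesystem -- this is the step that fails
fsck -f /dev/mapper/cryptroot
```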

However, when I tried to add the old drive back to the array, mdadm started "rebuilding" onto it -- oh no. I stopped the array immediately.
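(The stop itself was just the usual command, something like:

```shell
# Halt the array before the rebuild writes anything further
mdadm --stop /dev/md0
```
)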

I then tried to re-assemble the array by creating a new 4-device RAID 5, naming the first failed drive (which now sort of appears healthy) and the two good drives. It is recognized as an array now, but I can't run cryptsetup against the resulting device.
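The re-create attempt looked roughly like the sketch below (device names, order, and options are illustrative, not exactly what I typed -- and I gather that getting the device order or chunk size wrong is exactly the kind of thing that would stop cryptsetup from finding the LUKS header):

```shell
# DANGEROUS: --create overwrites the RAID superblocks. --assume-clean
# skips the initial sync so parity isn't recomputed over the data;
# "missing" stands in for the absent fourth member.
mdadm --create /dev/md0 --level=5 --raid-devices=4 --assume-clean \
      /dev/sda1 /dev/sdb1 /dev/sdc1 missing
```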

Is there any way to recover from this? Can I attempt to copy the superblock or other metadata from the 4th drive and write it to the other three drives?
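In case it helps diagnose, I understand the per-member metadata can be dumped like this (example device names again), so the Event count, Array UUID, and Device Role fields could be compared across the four drives:

```shell
# Dump the RAID superblock of each member for comparison
mdadm --examine /dev/sda1
mdadm --examine /dev/sdb1
mdadm --examine /dev/sdc1
mdadm --examine /dev/sdd1
```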

