Hi all,
I recently set up msmtp and wanted to verify that I would be emailed on a RAID failure. So I manually flagged a device as failed as follows:
mdadm -f /dev/md127 /dev/sda1
It worked as expected. The drive was flagged as failed, and an email was delivered to the configured address informing me of the failure. Next, I attempted to add the failed drive back into the array as follows:
mdadm --manage /dev/md127 -a /dev/sda1
This didn't work. I got the following message:
mdadm: Cannot open /dev/sda1: Device or resource busy
I thought the easiest way to remove whatever lock might be present on the device would be to reboot. So I did, but I was greeted with the same message. When I then attempted to stop the RAID, after unmounting all volumes, the following happened:
mdadm --stop /dev/md127
mdadm: Cannot get exclusive access to /dev/md127:Perhaps a running process, mounted filesystem or active volume group?
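The message lists the usual suspects. A minimal sketch for checking them, assuming the Linux sysfs layout where /sys/block/<md>/holders lists devices stacked on the array (LVM/device-mapper volumes, for example) and that /proc/mounts and /proc/swaps name the array as /dev/md127 or a partition of it. The function name md_users and the SYSFS/PROC overrides are hypothetical, added only so the sketch can be tried against a fake tree:

```shell
# Sketch: list things that may keep "mdadm --stop" from getting exclusive access.
# SYSFS and PROC are hypothetical overrides so this can run against a fake tree.
md_users() {
    md=$1                         # kernel name of the array, e.g. md127
    sys=${SYSFS:-/sys}
    proc=${PROC:-/proc}
    # Devices stacked on top of the array (LVM/device-mapper, etc.)
    for h in "$sys/block/$md/holders"/*; do
        [ -e "$h" ] && echo "holder: ${h##*/}"
    done
    # Mounts and swap using the array or one of its partitions (md127p1, ...)
    grep "^/dev/$md[[:space:]p]" "$proc/mounts" 2>/dev/null | sed 's/^/mount: /'
    grep "^/dev/$md[[:space:]p]" "$proc/swaps" 2>/dev/null | sed 's/^/swap: /'
    return 0
}
```

Run as md_users md127. Mounts made through another name, such as a /dev/md/<name> symlink, would be missed by the grep; lsblk /dev/md127 gives a similar view of what is stacked on the array.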
At this stage, I noticed in /proc/mdstat that the array was actually recovering. I waited for the recovery to finish, and the device I had manually failed had been added back into the RAID.
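During a rebuild, /proc/mdstat shows a progress line such as "recovery = 22.1%" under the array. A minimal sketch that extracts it, assuming that layout; recovery_progress is a hypothetical helper name. As an alternative to polling, mdadm's --wait (-W) option blocks until any resync, recovery or reshape on the given array has finished.

```shell
# Sketch: print the "recovery = NN.N%" (or "resync = NN.N%") text for one array.
# Usage: recovery_progress md127 [mdstat-file]
recovery_progress() {
    array=$1
    mdstat=${2:-/proc/mdstat}
    awk -v a="$array" '
        $1 == a { inside = 1; next }          # start of this array block
        inside && /^md/ { inside = 0 }        # next array starts
        inside && /^[ \t]*$/ { inside = 0 }   # blank line ends the block
        inside && match($0, /(recovery|resync) = *[0-9.]+%/) {
            print substr($0, RSTART, RLENGTH)
        }
    ' "$mdstat"
}
```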
Now, just for my own understanding: what actually happened here? Did the reboot trigger a reassembly, which then started the recovery process?
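As a side note on testing the alerts: mdadm's monitor mode has a --test option that generates a TestMessage alert for every array it finds, so the mail path can be checked without failing a member. A sketch, assuming MAILADDR is set in /etc/mdadm.conf and that msmtp provides the sendmail-compatible command mdadm mails through; send_test_alert and the MDADM override are hypothetical:

```shell
# Sketch: send one TestMessage alert per array listed by --scan, then exit.
# MDADM is a hypothetical override (e.g. MDADM="echo mdadm" for a dry run).
send_test_alert() {
    ${MDADM:-mdadm} --monitor --scan --oneshot --test
}
```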
--fail only marks a drive as failed; it does not remove it from the array. For that you have to follow it up with --remove (or --stop the array altogether).
Without --remove, mdadm --add says "Device or resource busy" because the failed device is still part of the md device, and it is still listed in /proc/mdstat with the (F) failed mark.
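Spotting such a member can be scripted. A minimal sketch, assuming the usual /proc/mdstat layout in which the array's line lists members as name[slot] and appends (F) to failed ones; failed_members is a hypothetical helper name:

```shell
# Sketch: print members of an md array that /proc/mdstat marks with (F).
# Usage: failed_members md127 [mdstat-file]
failed_members() {
    array=$1
    mdstat=${2:-/proc/mdstat}
    # The array line looks like: "md127 : active raid10 sdd1[3] sda1[0](F) ..."
    # Keep words ending in "(F)" and strip the "[slot](F)" suffix.
    awk -v a="$array" '$1 == a {
        for (i = 2; i <= NF; i++)
            if ($i ~ /\(F\)$/) { sub(/\[[0-9]+]\(F\)$/, "", $i); print $i }
    }' "$mdstat"
}
```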
The error message in this case could be better...
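So the complete test cycle, including the --remove step, is fail, remove, add. A sketch, assuming the member is healthy to begin with; readd_member and the MDADM override (set MDADM="echo mdadm" to print the commands instead of running them) are hypothetical additions for illustration:

```shell
# Sketch: fail a member, remove it from the array, then add it back.
# Each step runs only if the previous one succeeded.
readd_member() {
    array=$1                      # e.g. /dev/md127
    dev=$2                        # e.g. /dev/sda1
    ${MDADM:-mdadm} --manage "$array" --fail "$dev" &&
    ${MDADM:-mdadm} --manage "$array" --remove "$dev" &&
    ${MDADM:-mdadm} --manage "$array" --add "$dev"
}
```

After the --add, md rebuilds onto the device, which shows up as recovery in /proc/mdstat.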
Do you have the syslog from when the reassembly/rebuild was triggered? Just stopping and reassembling the array should not trigger it.
All I got in the logs is:
Dec 23 11:21:32 server kernel: md/raid10:md127: Disk failure on sda1, disabling device.
md/raid10:md127: Operation continuing on 3 devices.
Dec 23 11:26:46 server kernel: md/raid10:md127: active with 3 out of 4 devices
Dec 23 11:34:12 server kernel: md/raid10:md127: active with 4 out of 4 devices
and I'm back to:
cat /proc/mdstat
Personalities : [raid10]
md127 : active raid10 sdd1[3] sda1[0] sdb1[1] sdc1[2]
23437502464 blocks super 1.2 512K chunks 2 far-copies [4/4] [UUUU]
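The [4/4] [UUUU] part is the health summary: configured versus in-sync members, then one character per slot, U for an active member and _ for a missing or failed one. A sketch that turns it into a one-word status, assuming the summary ends the line after the array name (as it does for redundant levels such as raid1/5/6/10); array_health is a hypothetical name:

```shell
# Sketch: report "healthy" or "degraded" plus the [n/m] [UU..] summary.
# Usage: array_health md127 [mdstat-file]
array_health() {
    array=$1
    mdstat=${2:-/proc/mdstat}
    awk -v a="$array" '
        $1 == a {
            found = 1
            getline                                  # status line follows
            if ($NF !~ /^\[[U_]+]$/) { print "unknown"; exit }   # e.g. raid0
            state = ($NF ~ /_/) ? "degraded" : "healthy"
            print state, $(NF-1), $NF
            exit
        }
        END { if (!found) print "not found" }
    ' "$mdstat"
}
```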
Last edited by justdanyul (2023-12-23 21:08:30)