
#1 2023-12-23 11:49:00

justdanyul
Member
Registered: 2011-09-29
Posts: 130

mdadm and adding a manually failed drive

Hi all,

I recently set up msmtp and wanted to verify that I would be emailed on a RAID failure, so I manually flagged a device as failed:


mdadm -f /dev/md127 /dev/sda1

It worked as expected: the drive was flagged as failed, and an email informing me of the failure was delivered to the configured address. Next, I tried to add the failed drive back into the array:

mdadm --manage /dev/md127 -a /dev/sda1

This didn't work. I got the following message:

mdadm: Cannot open /dev/sda1: Device or resource busy

I thought the easiest way to release whatever lock might be held on the device would be to reboot, so I did. But I was greeted with the same message. When I then tried to stop the RAID, after unmounting all volumes, the following happened:

mdadm --stop /dev/md127
mdadm: Cannot get exclusive access to /dev/md127:Perhaps a running process, mounted filesystem or active volume group?
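When mdadm refuses exclusive access like this, one place to look for what still sits on top of the array is its holders directory in sysfs, plus the mount table. A rough sketch, assuming bash, the usual /sys layout, and that the array is mounted under its /dev/mdN name (a /dev/md/NAME alias would not be matched); md_holders is a hypothetical helper name, and the sysfs root and mounts file are parameters only so it can be tried against fake data:

```shell
#!/usr/bin/env bash
# Sketch: list what may still be holding an md array open.
# md_holders is a hypothetical helper, not part of mdadm.
md_holders() {
    local dev=${1#/dev/}              # accept "md127" or "/dev/md127"
    local sysfs=${2:-/sys}
    local mounts=${3:-/proc/mounts}
    local dir="$sysfs/block/$dev/holders"

    [ -d "$dir" ] || { echo "no such block device: $dev" >&2; return 1; }

    # Stacked block devices (e.g. LVM/device-mapper) appear as entries here.
    local h
    for h in "$dir"/*; do
        [ -e "$h" ] || [ -L "$h" ] || continue   # skip the unexpanded glob
        echo "holder: ${h##*/}"
    done

    # Filesystems mounted directly from the array (first field of mounts).
    awk -v d="/dev/$dev" '$1 == d { print "mounted: " $2 }' "$mounts"
}
```

For example, `md_holders /dev/md127` prints one line per holder and per mount point; empty output means neither a stacked device nor a direct mount was found by this check.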

At this point I noticed in /proc/mdstat that the array was actually recovering. I waited for the recovery to finish, and the device I had manually failed had been added back into the RAID.
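/proc/mdstat is also where such a recovery can be followed. A rough sketch that extracts the progress figure for one array, assuming bash plus awk and the usual mdstat layout (a header line starting with the array name, stanzas separated by blank lines); md_sync_progress is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: print the resync/recovery progress of one array from /proc/mdstat.
# md_sync_progress is a hypothetical helper; the mdstat path is a parameter
# so it can be pointed at a saved copy for testing.
md_sync_progress() {
    local dev=${1#/dev/}              # accept "md127" or "/dev/md127"
    local mdstat=${2:-/proc/mdstat}
    awk -v dev="$dev" '
        /^md[^ ]* : / { inside = ($1 == dev); next }   # header line of an array
        NF == 0       { inside = 0; next }              # blank line ends a stanza
        inside && match($0, /(recovery|resync|reshape|check) *= *[0-9.]+%/) {
            s = substr($0, RSTART, RLENGTH)
            sub(/ *= */, " ", s)                        # "recovery = 12.5%" -> "recovery 12.5%"
            print s
        }
    ' "$mdstat"
}
```

Calling `md_sync_progress md127` prints something like `recovery 12.5%` while a rebuild runs and nothing once the array is idle.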

Now, just for my own understanding: what actually happened here? Did the reboot trigger a reassembly, which then started the recovery process?


#2 2023-12-23 14:00:17

frostschutz
Member
Registered: 2013-11-15
Posts: 1,421

Re: mdadm and adding a manually failed drive

--fail only marks a drive as failed; it does not remove it from the array. For that you have to follow it up with --remove (or --stop the array altogether).

Without --remove, mdadm --add says "Device or resource busy" because the failed device is still part of the md device, and is still listed in /proc/mdstat with the (F) failed mark.

The error message in this case could be better...
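Put together, the fail/remove/add sequence described above looks roughly like this. A minimal sketch, assuming bash and root privileges; /dev/md127 and /dev/sda1 are the names from this thread, while readd_member and the DRY_RUN switch are hypothetical helpers added for illustration:

```shell
#!/usr/bin/env bash
# Sketch: take a member out of an md array and add it back.
# readd_member and DRY_RUN are hypothetical; only the mdadm options are real.

run() {
    # With DRY_RUN set, print the command instead of executing it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

readd_member() {
    local array=$1 member=$2
    # 1. Mark the member as failed; it stays in the array with an (F) flag.
    run mdadm --manage "$array" --fail "$member" || return 1
    # 2. Detach it from the array; skipping this step is what makes
    #    --add fail with "Device or resource busy".
    run mdadm --manage "$array" --remove "$member" || return 1
    # 3. Add it back; md then rebuilds onto it (shown as recovery in /proc/mdstat).
    run mdadm --manage "$array" --add "$member"
}
```

`DRY_RUN=1 readd_member /dev/md127 /dev/sda1` only prints the three commands; without DRY_RUN they are executed.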

Do you have a syslog from when the reassembly/rebuild process was triggered? Just stopping and reassembling the array should not trigger it.


#3 2023-12-23 21:07:49

justdanyul
Member
Registered: 2011-09-29
Posts: 130

Re: mdadm and adding a manually failed drive

All I got in the logs is:

Dec 23 11:21:32 server kernel: md/raid10:md127: Disk failure on sda1, disabling device.
                               md/raid10:md127: Operation continuing on 3 devices.
Dec 23 11:26:46 server kernel: md/raid10:md127: active with 3 out of 4 devices
Dec 23 11:34:12 server kernel: md/raid10:md127: active with 4 out of 4 devices

and I'm back to:

cat /proc/mdstat
Personalities : [raid10]
md127 : active raid10 sdd1[3] sda1[0] sdb1[1] sdc1[2]
      23437502464 blocks super 1.2 512K chunks 2 far-copies [4/4] [UUUU]
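In this output, [4/4] [UUUU] means all four members are present and up; a missing member shows as `_` in the bracketed status, and a failed one carries (F) in the member list. A rough health check built on that, assuming bash plus awk and the mdstat layout shown here; md_is_healthy is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: exit status 0 if the named array is listed and fully up, 1 otherwise.
# md_is_healthy is a hypothetical helper; the mdstat path is a parameter
# so it can be run against a saved copy.
md_is_healthy() {
    local dev=${1#/dev/}              # accept "md127" or "/dev/md127"
    local mdstat=${2:-/proc/mdstat}
    awk -v dev="$dev" '
        /^md[^ ]* : / {                         # header line: name, level, members
            inside = ($1 == dev)
            if (inside) { found = 1; if ($0 ~ /\(F\)/) bad = 1 }
            next
        }
        NF == 0 { inside = 0; next }            # blank line ends a stanza
        inside && match($0, /\[[U_]+\]/) {      # status such as [UUUU] or [_UUU]
            if (substr($0, RSTART, RLENGTH) ~ /_/) bad = 1
        }
        END { exit (found && !bad) ? 0 : 1 }
    ' "$mdstat"
}
```

Something like `md_is_healthy md127 || echo "md127 degraded"` could then be used in a script or cron job.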

Last edited by justdanyul (2023-12-23 21:08:30)
