#1 2024-03-03 09:26:30

Akusari
Member
Registered: 2019-02-26
Posts: 18

New mdadm package 4.3-1: Alarm spins around

Hello guys,

after more than two years a new version of mdadm has landed, and my own alarm script has trouble dealing with it. :-/
Unfortunately it looks like the new version goes in circles during the boot phase, while the devices are being tested for whether they are active or not.
Of course my first step was to reassemble the arrays and regenerate the mdadm.conf file, followed by a mkinitcpio run.
At the moment I use a dirty hack with a file check in my script, which leads to other problems if a "real" error happens.
Does anyone have a solution for the new version testing devices on boot?
(I tested a package downgrade, of course, and everything was fine again.)

My script output:

 journalctl -b -u mdmonitor.service
Mär 03 09:54:26 daten-box systemd[1]: Started MD array monitor.
Mär 03 09:54:26 daten-box mdadm[719]: mdadm: DeviceDisappeared event detected on md device /dev/md/md127
Mär 03 09:54:26 daten-box mdadm[727]: Assume boot in progress. Waiting...
Mär 03 09:54:26 daten-box mdadm[719]: mdadm: DeviceDisappeared event detected on md device /dev/md/md1
Mär 03 09:54:26 daten-box mdadm[746]: Assume boot in progress. Waiting...
Mär 03 09:54:26 daten-box mdadm[719]: mdadm: NewArray event detected on md device /dev/md127
Mär 03 09:54:26 daten-box mdadm[755]: Assume boot in progress. Waiting...
Mär 03 09:54:26 daten-box mdadm[719]: mdadm: NewArray event detected on md device /dev/md1
Mär 03 09:54:26 daten-box md_monitor_alarm.sh[769]: md_monitor_alarm.sh called with NewArray /dev/md1 arguments and no /var fs is present
Mär 03 09:54:26 daten-box mdadm[765]: Assume boot in progress Waiting...

mdadm.conf:

...
MAILADDR <hidden>
PROGRAM /usr/local/bin/md_monitor_alarm.sh
# old arch package 4.2.2 #ARRAY /dev/md/daten-box:127 metadata=1.2 UUID=da1092e8:99e02141:26ae0fc4:e82b972a
# old arch package 4.2.2 #ARRAY /dev/md1 metadata=1.2 UUID=b615704c:787643d8:2dcd3a41:52cca4c6
# Reassemble with my rescue system and 'mkinitcpio -P' done
ARRAY /dev/md1 metadata=1.2 name=debian-rescue:pool UUID=b615704c:787643d8:2dcd3a41:52cca4c6
ARRAY /dev/md127 metadata=1.2 name=debian-rescue:ssd UUID=da1092e8:99e02141:26ae0fc4:e82b972a

My script:  (I'm a lazy bitch, so no comments about the script style please)

#!/usr/bin/bash
#
# MD raid alarm script
# Copyright Akusari 2023 (<hidden>)
#

# Called by mdadm
# $1 = event
# $2 = device

#
# Start conditions
#

if [ "$1" == "" ]; then
        echo "Missing first argument!"
        exit 255
fi

if [ "$2" == "" ]; then
        echo "Missing second argument!"
        exit 255
fi

#
# Logging
#

call_date="$(/usr/bin/date +%Y-%m-%d-%T)"
message="$(basename "$0") called with $* arguments"
log_file="/var/log/monitor_md_alarm.log"

if [ ! -f $log_file ]; then
  echo "$message and no /var fs is present" | /usr/bin/systemd-cat -p warning -t $(basename $0)
  echo "Assume boot in progress! Waiting..."
  exit 254
fi

echo "${call_date}: $message" >> $log_file
echo "$message" | /usr/bin/systemd-cat -p warning -t $(basename $0)

#
# conditions
#

if [ "$1" == "TestMessage" ]; then
        echo "${call_date}: Abort because Test-Mode detected" >> $log_file
        echo $1 | mail -Ssendwait -s "$(hostname) $(basename $0) device $2" root@<hidden>
        exit 0
fi

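# mdadm passes a device path such as /dev/md1; sysfs expects the kernel name (md1)
if [ "$(cat "/sys/block/$(basename "$(readlink -f "$2")")/md/sync_action" 2>/dev/null)" == "check" ]; then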
        echo "Abort raid alarm because there is a raid check on $2 running" >> $log_file
        echo $1 | mail -Ssendwait -s "$(hostname) $(basename $0) device $2" root@<hidden>
        exit 0
fi

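# Note: this prefix match also catches RebuildStarted and RebuildFinished,
# so the two dedicated branches further down are never reached.
if [[ $1 == "Rebuild"* ]]; then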
        echo "Abort raid alarm because there is a raid rebuilding on $2 running" >> $log_file
        echo $1 | mail -Ssendwait -s "$(hostname) $(basename $0) device $2" root@<hidden>
        exit 0
fi


if [ "$1" == "NewArray" ]; then
        echo "New Array $2 detected - There is nothing to do for us" >> $log_file
        echo $1 | mail -Ssendwait -s "$(hostname) $(basename $0) device $2" root@<hidden>
        exit 0
fi

if [ "$1" == "RebuildStarted" ]; then
        echo "Rebuild Array $2 detected - There is nothing to do for us" >> $log_file
        echo $1 | mail -Ssendwait -s "$(hostname) $(basename $0) device $2" root@<hidden>
        exit 0
fi

if [ "$1" == "RebuildFinished" ]; then
        echo "Abort raid alarm because Raid rebuild $2 finished" >> $log_file
        echo $1 | mail -Ssendwait -s "$(hostname) $(basename $0) device $2" root@<hidden>
        exit 0
fi

#
# Run endless
#

while true; do
        if [ ! -f /.md_silent ]; then
           echo -e '\a' > /dev/console
        fi
        sleep 30

        if [ -f /.md_exit ]; then
           rm /.md_exit
           break
        fi
done

exit 0

Regards
Akusari

Last edited by Akusari (2024-03-03 09:29:47)

Offline

#2 2024-03-03 16:34:11

WorMzy
Forum Moderator
From: Scotland
Registered: 2010-06-16
Posts: 11,892

Re: New mdadm package 4.3-1: Alarm spins around

Seems like a bug, but maybe raise it on the mdadm mailing list. If you're not overly reliant on the device name being /dev/md#, then you could just change it in your mdadm.conf to match what mdadm 'wants' to call it. i.e. /dev/md/md127 and /dev/md/md1.

FWIW I can reproduce locally if I change

ARRAY /dev/md/ssdraid metadata=1.2 name=sakura:ssdraid UUID=b7a499c0:c424e415:011ce8fd:934931ab

to

ARRAY /dev/md127 metadata=1.2 name=sakura:ssdraid UUID=b7a499c0:c424e415:011ce8fd:934931ab

mdadm creates /dev/md/md127 first, then removes that and creates /dev/md127, triggering the DeviceDisappeared and NewArray events.
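
If you want to try the renaming without editing every line by hand, here is a minimal sketch. It assumes the config only uses plain /dev/mdN names on its ARRAY lines; the function name rewrite_array_names is made up for illustration. It reads a config on stdin and prints the rewritten version on stdout, leaving commented-out lines and already-symbolic names alone.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite "ARRAY /dev/mdN ..." to "ARRAY /dev/md/mdN ...".
# Only lines that start with ARRAY are touched; commented lines and names
# that already live under /dev/md/ pass through unchanged.
rewrite_array_names() {
    sed -E 's|^(ARRAY[[:space:]]+)/dev/md([0-9]+)([[:space:]])|\1/dev/md/md\2\3|'
}

# Example (assumes the Arch default path /etc/mdadm.conf):
#   rewrite_array_names < /etc/mdadm.conf > /tmp/mdadm.conf.new
#   diff /etc/mdadm.conf /tmp/mdadm.conf.new
```

Review the diff before copying the result into place, and rerun 'mkinitcpio -P' afterwards so the initramfs picks up the new names.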


Sakura:-
Mobo: MSI MAG X570S TORPEDO MAX // Processor: AMD Ryzen 9 5950X @4.9GHz // GFX: AMD Radeon RX 5700 XT // RAM: 32GB (4x 8GB) Corsair DDR4 (@ 3000MHz) // Storage: 1x 3TB HDD, 6x 1TB SSD, 2x 120GB SSD, 1x 275GB M2 SSD

Making lemonade from lemons since 2015.

Offline

#3 2024-03-03 21:39:01

Akusari
Member
Registered: 2019-02-26
Posts: 18

Re: New mdadm package 4.3-1: Alarm spins around

WorMzy wrote:

Seems like a bug, but maybe raise it on the mdadm mailing list. If you're not overly reliant on the device name being /dev/md#, then you could just change it in your mdadm.conf to match what mdadm 'wants' to call it. i.e. /dev/md/md127 and /dev/md/md1.

Yes, I also think it's a bug and not a feature. Thanks anyway :-)

I guess your (good) suggestion could still cause trouble, because 99% of all wikis and documentation around mdadm use the old /dev/mdX naming, and this might be a problem for a lot of users.
It should be fixed anyway.

Regards
Akusari

Last edited by Akusari (2024-03-04 20:25:29)

Offline

#4 2024-03-04 10:03:57

frostschutz
Member
Registered: 2013-11-15
Posts: 1,419

Re: New mdadm package 4.3-1: Alarm spins around

You can set MONITORDELAY in mdadm.conf but I'm not sure if it would help in this case.

You could just ignore calls with unusual array names in your script. Or ask the linux-raid mailing list for advice.
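
A rough sketch of what such a filter could look like, assuming the arrays are configured as plain /dev/mdN in mdadm.conf and the transient names show up as /dev/md/mdN, as in the log from the first post. The function name is_expected_device is made up for illustration and is not part of mdadm.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only for device names of the form /dev/mdN
# (N = one or more digits). Anything else, e.g. /dev/md/md127, is rejected.
is_expected_device() {
    local suffix="${1#/dev/md}"
    # If nothing was stripped, the name does not start with /dev/md at all.
    [ "$suffix" != "$1" ] || return 1
    case "$suffix" in
        ''|*[!0-9]*) return 1 ;;  # empty, or contains a non-digit such as "/"
        *)           return 0 ;;
    esac
}

# Possible use near the top of the alarm script ($2 is the device from mdadm):
#   is_expected_device "$2" || exit 0
```

Keep in mind that this also silences genuine events reported under an unexpected name, so it is a workaround and not a fix.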

Offline

#5 2024-03-04 20:37:50

Akusari
Member
Registered: 2019-02-26
Posts: 18

Re: New mdadm package 4.3-1: Alarm spins around

frostschutz wrote:

You can set MONITORDELAY in mdadm.conf but I'm not sure if it would help in this case.

It's a problem in mdadm itself (an upstream problem), so that doesn't help.

frostschutz wrote:

You could just ignore calls with unusual array names in your script.

Yeah, but I have moved to the symbolic /dev/md/X naming scheme, and that works for me.
BTW: A package downgrade is no longer an option since the Arch team announced a mkinitcpio upgrade: https://archlinux.org/news/mkinitcpio-h … microcode/

But thanks anyway for your help :-)

Regards
Akusari

Last edited by Akusari (2024-03-04 21:19:01)

Offline
