I have a 4-disk RAID 6 mdraid array that I've been running for a couple of years (2-3) in my NAS.
In the last few months, I've had repeated instances where, when I boot the NAS or bring it up from sleep (I suspend it when not in use), one of the disks fails to join the array. SMART self tests say the disks are still healthy, and if I re-add the disk with:
sudo mdadm --manage /dev/md127 --re-add /dev/sdb1
... then it recovers fine. But today, it happened twice with the same disk: I booted it up, noticed a disk was missing, re-added it, rebooted after a kernel update, and the same disk was missing again.
Any idea what I need to check? Is this a sign the disks are dying? Or is it just a weird new mdraid bug? Can I adjust the timeout for how long it waits for disks to spin up before calling them failed?
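(For what it's worth, when I say a disk is "missing" I'm just going by the array status, roughly like this:
cat /proc/mdstat
sudo mdadm --detail /dev/md127
... which shows the array active but with only 3 of 4 devices.)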
Here's my dmesg from the current boot: http://0x0.st/Hyce.txt
Offline
SMART self tests say the disks are still healthy
What exactly does it say?
Ideally also please post your complete system journal for the boot:
sudo journalctl -b | curl -F 'file=@-' 0x0.st
sdd shows up 500ms after a, b & c but is not the troublesome device, and there are no further errors or even oddities (about sdb) in the dmesg.
There's no indication that the drive somehow failed to spin up and the raid is built 35s into the boot.
One more thing you could do is try to swap it w/ a "good" device (assuming it's always sdb) to see whether the problem moves (to eg. sdc) or the new disk on the same bus also exhibits the problem.
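If you do swap them, note the serial of the suspect disk first so you can tell whether the problem follows the physical drive or the port - e.g. something like:
sudo smartctl -i /dev/sdb | grep -i serial
ls -l /dev/disk/by-id/ | grep sdb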
Offline
What exactly does it say?
I'm just looking at http://0x0.st/Hya6.txt
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
The first time I noticed it it was /dev/sdd, but this time it was /dev/sdb twice in a row.
Offline
The SMART data looks fine - the drive order isn't necessarily deterministic, so you'll have to check the disk UUID to see whether it's always the same drive.
Since the raid is created late in the boot, what happens in userspace in the meantime, and is there anything interesting in the journal?
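To map the sdX names to actual drives, something like this should do (column names off the top of my head):
lsblk -o NAME,MODEL,SERIAL,UUID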
Offline
It might also be a good idea to run a `smartctl -t long` on the drive. I have multiple Reds and it will take a while. Do not suspend/reboot the system during testing.
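Roughly, per drive:
sudo smartctl -t long /dev/sdb
and once it's done, read the result with:
sudo smartctl -l selftest /dev/sdb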
Offline
Hmm. I did notice a few cases where the journalctl logs showed:
Sep 23 12:33:50 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdd1 from array!
Sep 25 19:07:41 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdd1 from array!
Sep 26 19:33:58 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdd1 from array!
Oct 03 18:22:25 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdd1 from array!
Oct 04 18:28:23 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdd1 from array!
Oct 06 18:38:03 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdd1 from array!
Oct 07 15:30:38 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdd1 from array!
Oct 12 18:47:04 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdd1 from array!
Oct 13 20:50:41 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdd1 from array!
Oct 13 22:13:00 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdd1 from array!
Oct 13 22:13:24 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdd1 from array!
Oct 27 11:31:54 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdb1 from array!
but I suspect that's just subsequent boots, where it found a disk that had been kicked out before and noticed it was out of sync on boot.
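Next time it happens I'll try to compare the event counters and update times of the members before re-adding, with something along the lines of:
sudo mdadm --examine /dev/sd[abcd]1 | grep -E '^/dev|Update Time|Events'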
I also noticed a bunch of warnings that /tmp wasn't empty when mounting the tmpfs on top of it, so I looked inside and found this odd thing:
$ ls -lah rootbind/tmp/
total 16K
drwxrwxrwt 1 root root 272 Oct 28 13:44 ./
drwxr-xr-x 1 root root 134 Sep 24 12:01 ../
brw------- 1 root root 9, 127 Aug 18 19:47 .tmp.md.293:9:127
brw------- 1 root root 9, 127 Sep 17 15:53 .tmp.md.303:9:127
brw------- 1 root root 9, 127 Aug 9 21:05 .tmp.md.305:9:127
brw------- 1 root root 9, 127 Oct 28 13:44 .tmp.md.306:9:127
brw------- 1 root root 9, 127 Aug 11 16:50 .tmp.md.307:9:127
brw------- 1 root root 9, 127 Sep 30 08:38 .tmp.md.308:9:127
brw------- 1 root root 9, 127 Oct 18 18:35 .tmp.md.309:9:127
brw------- 1 root root 9, 127 Sep 10 10:06 .tmp.md.310:9:127
For some reason it's creating block devices under /tmp before /tmp is mounted, which is suspicious, but might be a red herring.
I think the disk names do line up, because every message in dmesg I can find shows that sda1 matches raid disk 0, sdb1 is raid disk 1, etc. However, I have written down the disk identifiers from fdisk in a file so I can compare the next time this happens.
I also notice a lot of these, even for boots where the raid array is healthy:
$ sudo journalctl -b 0 | grep md127
Oct 28 13:44:56 nas-bot.uraxa.lan kernel: md/raid:md127: device sdb1 operational as raid disk 1
Oct 28 13:44:56 nas-bot.uraxa.lan kernel: md/raid:md127: device sda1 operational as raid disk 0
Oct 28 13:44:56 nas-bot.uraxa.lan kernel: md/raid:md127: device sdd1 operational as raid disk 3
Oct 28 13:44:56 nas-bot.uraxa.lan kernel: md/raid:md127: device sdc1 operational as raid disk 2
Oct 28 13:44:56 nas-bot.uraxa.lan kernel: md/raid:md127: raid level 6 active with 4 out of 4 devices, algorithm 2
Oct 28 13:44:56 nas-bot.uraxa.lan systemd[1]: Started Timer to wait for more drives before activating degraded array md127..
Oct 28 13:44:56 nas-bot.uraxa.lan kernel: md127: detected capacity change from 0 to 23441149952
Oct 28 13:44:56 nas-bot.uraxa.lan systemd[1]: mdadm-last-resort@md127.timer: Deactivated successfully.
Oct 28 13:44:56 nas-bot.uraxa.lan systemd[1]: Stopped Timer to wait for more drives before activating degraded array md127..
This is saying it found all 4 disks, started them all up and added them to the raid device, but it's also starting a timer to wait for more drives for a degraded array. But the array is not degraded.
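(I haven't looked into what that last-resort timer actually does; I assume I can inspect it with:
systemctl cat mdadm-last-resort@md127.timer mdadm-last-resort@md127.service
but I'm not sure it's related.)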
Last observation: I see this warning every boot; could it mean something?
Oct 28 13:44:55 nas-bot.uraxa.lan kernel: block device autoloading is deprecated and will be removed.
I've started a long test on all four disks, and will update with findings when I get them.
Thanks for the help so far!
Last edited by scott_fakename (2023-10-28 23:19:32)
Offline
I'm just looking at http://0x0.st/Hya6.txt
Am I missing something? Looks like those tests were over 6,300 hours ago.
$ sudo smartctl -a /dev/sda
9 Power_On_Hours 0x0032 087 087 000 Old_age Always - 9817
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 3491 -
------------------------------
$ sudo smartctl -a /dev/sdb
9 Power_On_Hours 0x0032 087 087 000 Old_age Always - 9961
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 3636 -
------------------------------
$ sudo smartctl -a /dev/sdc
9 Power_On_Hours 0x0032 087 087 000 Old_age Always - 9640
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 3315 -
------------------------------
$ sudo smartctl -a /dev/sdd
9 Power_On_Hours 0x0032 087 087 000 Old_age Always - 9941
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 3637 -
Cheers,
Edit -- see below
Last edited by dakota (2023-10-29 19:20:32)
"Before Enlightenment chop wood, carry water. After Enlightenment chop wood, carry water." -- Zen proverb
Online
mdadm.conf? mdadm --examine? (Preferably --examine before --assemble, but assembly happens automatically - unless you spin up an initramfs with mdadm but without the md raid kernel modules.)
mdadm creates /dev/.tmp.md or /tmp/.tmp.md internally; that by itself should not be a cause for concern - lots of programs use temporary files.
If there were a problem with the drives you would see some noise in the logs, but the logs you posted so far are suspiciously silent. So I suspect it's a configuration issue of some kind, or a bug. It would be nice to see the output of mdadm --assemble directly (if you disable the automatic assembly, perhaps by including a 0-byte /etc/udev/rules.d/64-md-raid-assembly.rules - but systemd also has some services like mdadm-last-resort and others).
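Untested sketch - an empty file of the same name in /etc/udev/rules.d/ overrides the packaged rule; after a reboot you can then assemble by hand and watch what it says:
sudo touch /etc/udev/rules.d/64-md-raid-assembly.rules
sudo mdadm --assemble --verbose /dev/md127 /dev/sd[abcd]1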
Last edited by frostschutz (2023-10-29 07:10:41)
Offline
Sep 23 12:33:50 nas-bot.uraxa.lan kernel: md: kicking non-fresh sdd1 from array!
[...]
Oct 28 13:44:56 nas-bot.uraxa.lan systemd[1]: Started Timer to wait for more drives before activating degraded array md127..
Maybe you are looking at the wrong side of the boot process - those messages imply that something during the shutdown of the raid failed or was incomplete. You'll need to "catch" a system log of a shutdown preceding each case.
Last edited by -thc (2023-10-29 10:10:24)
Offline
I'd highly recommend not grepping around for random tokens in the journal but looking at the startup of the raid in full context.
Offline
Am I missing something? Looks like those tests were over 6,300 hours ago.
You're right, good catch - those were very out of date. I reran them, and the fresh SMART self-tests come back OK: http://0x0.st/HyjL.txt
I'd highly recommend not grepping around for random tokens in the journal but looking at the startup of the raid in full context.
I wasn't - I was looking at it all in context but didn't see anything else specifically mentioning the disks, and only posted the journalctl lines that looked the most suspicious. Here's a complete journalctl log from a bad boot: http://0x0.st/HyjI.txt, here's the journal from the boot before the most recent 3-of-4-disk boot: http://0x0.st/Hyeo.txt, and here's one from a clean, warm reboot: http://0x0.st/Hyjl.txt
Maybe you are looking at the wrong side of the boot process - those messages imply that something during the shutdown of the raid failed or was incomplete. You'll need to "catch" a system log of a shutdown preceding each case.
Interesting. I wonder if what's happening is that the disks spin down due to idleness, and then when I power the machine off, it finishes shutting down before the disk spins back up to be properly closed. The next time it boots, it finds the disk out of sync and kicks it. Something to think about.
mdadm.conf? mdadm --examine? (Preferably --examine before --assemble, but assembly happens automatically - unless you spin up an initramfs with mdadm but without the md raid kernel modules.)
My mdadm.conf contains mostly just a bunch of commented-out lines; the only non-empty, non-comment line is:
DEVICE partitions
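In case it helps, I could also pin the array explicitly by appending the ARRAY line that
sudo mdadm --detail --scan
prints to /etc/mdadm.conf, but I haven't done that yet.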
Last edited by scott_fakename (2023-10-29 17:32:46)
Offline
Interesting. I wonder if what's happening is that the disks spin down due to idleness, and then when I power the machine off, it finishes shutting down before the disk spins back up to be properly closed. The next time it boots, it finds the disk out of sync and kicks it. Something to think about.
Interesting theory. Can you disable power saving?
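E.g. something like this per drive (whether it survives a power cycle depends on the drive):
sudo hdparm -B 255 -S 0 /dev/sdb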
Offline
https://man.archlinux.org/man/core/hdparm/hdparm.8.en#C
But
Oct 28 13:43:39 nas-bot.uraxa.lan kernel: XFS (dm-1): Unmounting Filesystem b15d9f35-85cb-44c1-ad81-bd0a6e3b4b62
Oct 28 13:44:12 nas-bot.uraxa.lan kernel: md127: detected capacity change from 23441149952 to 0
Oct 28 13:44:12 nas-bot.uraxa.lan kernel: md: md127 stopped.
Oct 28 13:44:21 nas-bot.uraxa.lan systemd-logind[399]: The system will reboot now!
And there's really nothing in the journal suggesting anything is wrong w/ any drive; besides the self-tests, there are also no signs of trouble (reallocated/pending sectors) in the SMART data.
You could also try to explicitly umount/stop the raid well before the shutdown.
Likewise, don't automount it, but only mount it explicitly well after the boot.
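Roughly something like this before powering off (the mount point is a placeholder, and whatever sits on top of the raid - the dm-1 device in your log - would have to be deactivated in between):
sudo umount /mnt/raid
sudo mdadm --stop /dev/md127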
Also, stupid question:
What if you boldly mask mdadm-last-resort@md127.timer and mdadm-last-resort@md127.service?
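ie. something like:
sudo systemctl mask mdadm-last-resort@md127.timer mdadm-last-resort@md127.service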
Edit: likely unrelated, but you're running dhcpcd and systemd-networkd concurrently; pick one, disable the other.
Last edited by seth (2023-10-30 07:53:12)
Offline