For about a week, my system has occasionally failed to start because the boot partition cannot be mounted.
After 1-3 reboots it works again. It has happened in 5 of 22 boots since March 24.
I ran an update ('pacman -Syu') right before the error first occurred (2023-03-24T09:13:18+0100); it included a kernel update: 6.2.7.arch1-1 -> 6.2.8.arch1-1
Maybe someone can tell me more about the error.
Here is the error from journald:
-- Boot d775bd7f3c714476b042e0a20788b925 --
Mär 24 09:43:02 loges-desktop kernel: Linux version 6.2.8-arch1-1 (linux@archlinux) (gcc (GCC) 12.2.1 20230201, GNU ld (GNU Binutils) 2.40) #1 SMP PREEMPT_DYNAMIC Wed, 22 Mar 2023 22:52:35 +0000
Mär 24 09:43:02 loges-desktop kernel: Command line: initrd=\initramfs-linux.img root=LABEL=p_arch rw resume=LABEL=p_swap
...
Mär 24 09:43:02 loges-desktop kernel: nvme nvme1: pci function 0000:01:00.0
Mär 24 09:43:02 loges-desktop kernel: nvme nvme0: pci function 0000:08:00.0
Mär 24 09:43:02 loges-desktop kernel: nvme nvme1: missing or invalid SUBNQN field.
Mär 24 09:43:02 loges-desktop kernel: nvme nvme0: missing or invalid SUBNQN field.
Mär 24 09:43:02 loges-desktop kernel: nvme nvme1: Shutdown timeout set to 8 seconds
Mär 24 09:43:02 loges-desktop kernel: nvme nvme0: Shutdown timeout set to 8 seconds
Mär 24 09:43:02 loges-desktop kernel: nvme nvme1: 16/0/0 default/read/poll queues
Mär 24 09:43:02 loges-desktop kernel: nvme nvme0: 16/0/0 default/read/poll queues
Mär 24 09:43:02 loges-desktop kernel: nvme1n1: p1 p2
Mär 24 09:43:02 loges-desktop kernel: nvme0n1: p1 p2 p3
Mär 24 09:43:02 loges-desktop kernel: EXT4-fs (nvme0n1p2): mounted filesystem 883d6f47-0b5c-453a-9642-53e9bc8e29da with ordered data mode. Quota mode: none.
Mär 24 09:43:02 loges-desktop kernel: usb 3-4: new high-speed USB device number 2 using xhci_hcd
...
Mär 24 09:43:02 loges-desktop kernel: EXT4-fs (nvme0n1p2): re-mounted 883d6f47-0b5c-453a-9642-53e9bc8e29da. Quota mode: none.
...
Mär 24 09:43:03 loges-desktop systemd[1]: Mounting /boot...
Mär 24 09:43:03 loges-desktop kernel: igb: Intel(R) Gigabit Ethernet Network Driver
Mär 24 09:43:03 loges-desktop kernel: igb: Copyright (c) 2007-2014 Intel Corporation.
Mär 24 09:43:03 loges-desktop kernel: cryptd: max_cpu_qlen set to 1000
Mär 24 09:43:03 loges-desktop systemd[1]: Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
Mär 24 09:43:03 loges-desktop kernel: asus-ec-sensors asus-ec-sensors: board has 8 EC sensors that span 10 registers
Mär 24 09:43:03 loges-desktop kernel: FAT-fs (nvme1n1p1): bogus number of reserved sectors
Mär 24 09:43:03 loges-desktop kernel: FAT-fs (nvme1n1p1): Can't find a valid FAT filesystem
Mär 24 09:43:03 loges-desktop systemd[1]: boot.mount: Mount process exited, code=exited, status=32/n/a
Mär 24 09:43:03 loges-desktop mount[469]: mount: /boot: wrong filesystem type, invalid options, the superblock of /dev/nvme1n1p1 is damaged, missing codepage, or another error.
Mär 24 09:43:03 loges-desktop mount[469]: dmesg(1) may provide more information after a failed
Mär 24 09:43:03 loges-desktop mount[469]: mount system call.
Mär 24 09:43:03 loges-desktop systemd[1]: boot.mount: Failed with result 'exit-code'.
...
Mär 24 09:43:03 loges-desktop systemd[1]: Failed to mount /boot.
Mär 24 09:43:03 loges-desktop systemd[1]: Dependency failed for Local File Systems.
Mär 24 09:43:03 loges-desktop systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
/etc/fstab
# UUID=883d6f47-0b5c-453a-9642-53e9bc8e29da LABEL=p_arch
/dev/nvme1n1p2 / ext4 rw,defaults,noatime,discard 0 1
# UUID=1121-2A22 LABEL=EFIBOOT
/dev/nvme1n1p1 /boot vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro 0 2
# UUID=75b5dd50-7189-4687-a324-ba54e1985dbd LABEL=p_swap
/dev/nvme1n1p3 none swap defaults,noatime,discard 0 0
Last edited by Norkos (2023-04-28 06:11:21)
fsck it. Are you shutting down "abnormally" when reproducing the issue (power button, sudden shutdown)? How is the space situation on /boot, do you have sufficient free space? FWIW, FAT is generally not the most advanced or fault-resilient filesystem, and problems like this are one of the main reasons I personally don't use FAT for my /boot. I keep the kernels on an ext4 partition and use a bootloader that can read kernels from there.
fsck it.
It shows "differences between boot sector and its backup" and that the "dirty bit is set", but I assume that's because the partition is mounted right now.
I will try again the next time the issue appears.
Are you shutting down "abnormally" when reproducing the issue (power button, sudden shutdown)?
- I always shut down via the GNOME UI
- The first time it happened was after a reboot
- The system was installed in 2019 (swapped the SSD for an NVMe drive in 2020), with no changes in the recent past
- I can't reproduce it intentionally yet
How is the space situation on /boot, do you have sufficient free space?
Only 20% in use
/dev/nvme1n1p1 511M 100M 412M 20% /boot
After a week without the problem, it has now happened three times in a row.
But this time I was able to fsck it:
# fsck /dev/nvme1n1p1
fsck von util-linux 2.38.1
fsck.fat 4.2 (2021-01-31)
Currently, only 1 or 2 FATs are supported, not 0.
Zero FATs effectively mean that there's no usable FS at all on the drive.
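For context, both the kernel message ("bogus number of reserved sectors") and the fsck.fat message refer to fields in the first sector of the partition. A minimal sketch for looking at them yourself, assuming a POSIX shell with od available. The function name fat_bpb_summary is made up for illustration; the offsets are the standard FAT boot-sector fields (reserved sectors at bytes 14-15, number of FATs at byte 16). It only reads from the device or image:

```shell
# Print the reserved-sector count and the number of FATs found in the
# first sector of a FAT partition or image file (read-only).
# Boot sector layout: bytes 14-15 = reserved sectors (little endian),
# byte 16 = number of FATs.
fat_bpb_summary() {
    dev=$1
    lo=$(od -An -tu1 -j14 -N1 "$dev" | tr -d ' \n')
    hi=$(od -An -tu1 -j15 -N1 "$dev" | tr -d ' \n')
    fats=$(od -An -tu1 -j16 -N1 "$dev" | tr -d ' \n')
    echo "reserved_sectors=$((hi * 256 + lo)) fats=$fats"
}
```

Run as root, for example fat_bpb_summary /dev/nvme1n1p1. A healthy FAT filesystem normally reports 1 or 2 FATs; if it reports 0, whatever sits behind that device path is probably not a FAT filesystem at all.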
nvme nvme0: missing or invalid SUBNQN field.
https://unix.stackexchange.com/question … bnqn-field
https://forums.gentoo.org/viewtopic-t-1 … art-0.html
https://wiki.archlinux.org/title/Solid_ … leshooting
Your fstab has /dev/nvme1n1p1 , p2 , p3 (three partitions on nvme1)
Your dmesg has nvme1n1: p1 p2 (only two partitions on nvme1)
But there is nvme0n1: p1 p2 p3 (three partitions on nvme0)
Furthermore, your fstab lists UUID 883d6f47-0b5c-453a-9642-53e9bc8e29da (in a commented-out line) for /dev/nvme1n1p2, but dmesg says EXT4-fs (nvme0n1p2): mounted filesystem 883d6f47-0b5c-453a-9642-53e9bc8e29da
You've hardcoded your device names in fstab. That might be fine in some setups, but just like sda, sdb, ..., NVMe drives are perfectly capable of randomly changing their order/numbering. NVMe device names are assigned first come, first served: if any of your NVMe drives is detected a little late for any reason, the order changes.
On my system, with a single onboard NVMe slot and an NVMe add-on card, this happens sometimes. So using /dev/nvme in fstab does not work for me.
Stick to UUID or LABEL (for both kernel/initrd parameters, and fstab) and with any luck you won't have these issues anymore.
When working with device names in a root shell directly, always verify what is what first (with lsblk, blkid, etc.).
Without UUID/LABEL you'd have to use LVM or a similar solution that provides stable device names for you (but that really just means that LVM is using UUIDs in your place).
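A quick way to spot such entries, as a rough sketch only: the function name unstable_fstab_entries and the list of device-name prefixes are my own assumptions, not anything standard. It prints the source and mount point of every fstab entry that uses a kernel-assigned device name:

```shell
# List fstab entries whose first field is a kernel-assigned device name
# (/dev/nvme*, /dev/sd*, /dev/vd*, /dev/mmcblk*) instead of UUID=, LABEL=,
# PARTUUID= or a /dev/disk/by-* path. Reads /etc/fstab unless a file is given.
unstable_fstab_entries() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }    # skip comments and blank lines
        $1 ~ /^\/dev\/(nvme|sd|vd|mmcblk)/ { print $1, $2 }
    ' "${1:-/etc/fstab}"
}
```

Anything it prints is a candidate for replacement; lsblk -f or blkid will show the UUID and LABEL of each partition so you can look up the right values.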
Last edited by frostschutz (2023-04-15 14:05:34)
@Seth
If it's the firmware of the drive, that could explain why it happens after years of using exactly this setup,
assuming there was a firmware update. I never did one proactively, but maybe it was done in the background.
I'm using an Asus Prime X470-PRO mainboard with two NVMe slots, both running Samsung 970 EVO Plus drives.
@frostschutz
I've changed my fstab now; let's see if the error happens again.
fstab before:
# UUID=883d6f47-0b5c-453a-9642-53e9bc8e29da LABEL=p_arch
/dev/nvme1n1p2 / ext4 rw,defaults,noatime,discard 0 1
# UUID=1121-2A22 LABEL=EFIBOOT
/dev/nvme1n1p1 /boot vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro 0 2
# UUID=75b5dd50-7189-4687-a324-ba54e1985dbd LABEL=p_swap
/dev/nvme1n1p3 none swap defaults,noatime,discard 0 0
fstab now:
# UUID=883d6f47-0b5c-453a-9642-53e9bc8e29da LABEL=p_arch
UUID=883d6f47-0b5c-453a-9642-53e9bc8e29da / ext4 rw,defaults,noatime,discard 0 1
# UUID=1121-2A22 LABEL=EFIBOOT
UUID=1121-2A22 /boot vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro 0 2
# UUID=75b5dd50-7189-4687-a324-ba54e1985dbd LABEL=p_swap
UUID=75b5dd50-7189-4687-a324-ba54e1985dbd none swap defaults,noatime,discard 0 0
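I did the change by hand, but since each entry in my fstab already had a "# UUID=... LABEL=..." comment directly above it, it could also be scripted. A rough sketch, assuming exactly that comment layout; the function name fstab_devnames_to_uuid is made up, and it only prints the result instead of touching the file:

```shell
# Rewrite /dev/... sources in an fstab to UUID=..., taking the UUID from a
# "# UUID=... LABEL=..." comment on the line directly above the entry.
# Prints the converted fstab to stdout; the input file is not modified.
fstab_devnames_to_uuid() {
    awk '
        # Remember the UUID=... token of the comment line and pass it through.
        /^# UUID=/ { uuid = $2; print; next }
        # Swap the device path for the remembered UUID on the next entry.
        $1 ~ /^\/dev\// && uuid != "" { sub(/^[^ \t]+/, uuid) }
        # Any other line clears the remembered UUID so it is never misapplied.
        { uuid = ""; print }
    ' "${1:-/etc/fstab}"
}
```

Compare the output with the current file (for example with diff) before replacing anything, and double-check the UUIDs against blkid. findmnt --verify can then be used to sanity-check the resulting fstab.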
Thanks guys.
After changing the fstab to UUIDs the error no longer occurred - so I marked the thread as solved.