I'm using zfs-linux, so I'm often lagging a bit behind on updates. To keep most of my packages up to date, I often update the system while ignoring the linux and zfs-linux packages.
Today they got in sync again, so I could update to linux 6.4.8. The update seemingly went fine, but after a reboot I was stuck at an emergency prompt where the system complained it could not mount /boot. After a bit of digging, I found this was because the system was actually running linux 6.4.2 while only 6.4.8 modules were available under /lib/modules/.
My /boot partition was mounted during the update, and the initramfs-linux.img and vmlinuz-linux files were updated. The checksum of vmlinuz-linux was identical to the vmlinuz file under /lib/modules/..6.4.8../.
To recover, I installed linux 6.4.2 and the corresponding zfs-linux from the pacman cache, removed everything from the /boot folder, mounted the actual boot partition (which was possible again since the required modules were available), and ran pacman -U with linux 6.4.2 and the corresponding zfs-linux once more (to make sure the contents of /boot were also back in sync with the reinstalled 6.4.2). After that, I could boot into 6.4.2 again.
I have no idea why I can seemingly install linux 6.4.8 successfully, yet on boot it runs 6.4.2. I suspect something is misconfigured and the system boots a kernel from a file that the update did not properly replace. Can anybody point me in the right direction?
Last edited by Ibex (2023-08-08 16:40:20)
My /boot partition was mounted during the update
./.
removed everything from the /boot folder before mounting the actual boot partition
…
Did you try to update the kernel again?
./.
Not sure what you mean here.
…
Did you try to update the kernel again?
Yes, and I had the same issue.
To summarize what I did:
0. Happily running linux 6.4.2
1. Verified /boot was mounted, did a system upgrade (quite a lot of packages, including linux 6.4.8 and corresponding zfs-linux).
2. Verified /boot/vmlinuz-linux and /lib/modules/..6.4.8../vmlinuz were the same files
3. Reboot
4. Stuck in emergency prompt
5. Noticed /boot was not mounted and could not be mounted because no modules were available. Running uname -a reported kernel 6.4.2 active
6. Installed linux 6.4.2. This generated new files under /boot, which was now just a folder, since /boot wasn't mounted
7. Removed everything from this /boot folder, then mounted /boot (which worked again, since I now had the right modules for the running kernel)
8. Installed linux 6.4.2 again, to make sure the files under /boot (now mounted), matched the 6.4.2 ones
9. I could successfully boot
If instead of 9 I installed 6.4.8 again, I was stuck at the emergency prompt again on boot.
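For the archive, steps 6 to 8 above roughly correspond to the following commands. This is a sketch: the package file names and versions shown are assumptions; check /var/cache/pacman/pkg/ for the ones actually present.

```shell
# From the emergency shell: reinstall the running 6.4.2 kernel and the
# matching zfs-linux from the local package cache (file names below are
# illustrative; list /var/cache/pacman/pkg/ for the real ones)
pacman -U /var/cache/pacman/pkg/linux-6.4.2*.pkg.tar.zst \
          /var/cache/pacman/pkg/zfs-linux-*6.4.2*.pkg.tar.zst

# While unmounted, /boot is just a directory on the root filesystem;
# clear the files pacman wrote there, then mount the real boot partition
rm -rf /boot/*
mount /boot

# Reinstall once more so vmlinuz-linux and initramfs-linux.img land on
# the now-mounted boot partition
pacman -U /var/cache/pacman/pkg/linux-6.4.2*.pkg.tar.zst \
          /var/cache/pacman/pkg/zfs-linux-*6.4.2*.pkg.tar.zst
```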
I mean that there should be no files in the mountpoint for the boot partition if the boot partition was properly mounted.
Please post your complete system journal:
sudo journalctl -b | curl -F 'file=@-' 0x0.st
Please post your complete system journal:
sudo journalctl -b | curl -F 'file=@-' 0x0.st
At what moment? Before the initial installation of 6.4.8, right after the update, or at the recovery prompt?
"Now" - I want to see where you're booting from, what you're mounting and how, and whether there are IO errors.
Maybe run "sudo touch /boot/foo" or so beforehand to trigger potential issues.
-link to journal logs-
Already a huge thanks for helping me.
Last edited by Ibex (2023-08-08 16:40:55)
https://wiki.archlinux.org/title/SLiM (read the warning!) seems to murk around in /boot
df -h
lsblk -f
and
Aug 07 22:00:00 aeolus zrepl[1965]: 2023-08-07T22:00:00+02:00 [WARN][snapshot_boot][hook][pgTD$LAED$NYQM$NYQM]: hook output command="/etc/zrepl/hooks/boot.sh" stderr="1023+1 records in" snap="zrepl_20230807_200000_000" fs="zroot/boot"
looks like there's some automated snapshot organization for the boot directory.
If the boot partition isn't simply full, make sure the /boot partition is mounted, update the kernel, check "file /boot/vmlinuz-linux" for what kernel is actually presently there, re-check that /boot is still mounted.
Then take a look at older journals to see whether some shutdown script maybe restores a snapshot (of the /boot path).
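The check described above could look like this (a sketch; `file` prints the version string embedded in the kernel image, which is how you see what's actually on the partition):

```shell
mountpoint /boot            # should report "/boot is a mountpoint"
pacman -Syu                 # update the kernel with /boot verifiably mounted
file /boot/vmlinuz-linux    # prints the embedded kernel version string
mountpoint /boot            # re-check that /boot is still mounted afterwards
df -h /boot                 # and that the partition isn't simply full
```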
I didn't know SLiM was so outdated. I've been using it for a long, long time now; time for something else. I'll take a look at this after work.
The outputs:
❯ df -h
Filesystem Size Used Avail Use% Mounted on
zroot/ROOT/default 733G 47G 687G 7% /
devtmpfs 4.0M 0 4.0M 0% /dev
tmpfs 24G 53M 24G 1% /dev/shm
tmpfs 9.4G 9.9M 9.4G 1% /run
tmpfs 24G 18M 24G 1% /tmp
/dev/zd0 1021M 97M 925M 10% /boot
zroot/data/home 813G 127G 687G 16% /home
zroot/data/home/root 687G 424M 687G 1% /root
tmpfs 4.7G 60K 4.7G 1% /run/user/1000
❯ lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
zd0 vfat FAT32 DD91-7454 924.4M 9% /boot
nvme0n1
├─nvme0n1p1 vfat FAT32 DD91-7454
└─nvme0n1p2 crypto_LUKS 2 2e6d9f54-b7c1-42e4-b71a-9bc9fcdd2e34
└─luks_zfs zfs_member 5000 zroot 7391103467436721126
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
zroot 236G 686G 96K none
zroot/ROOT 59.0G 686G 96K none
zroot/ROOT/default 59.0G 686G 46.6G /
zroot/boot 21.7G 686G 1.01G -
zroot/data 155G 686G 96K none
zroot/data/home 155G 686G 126G /home
zroot/data/home/root 436M 686G 423M /root
The zrepl snapshot hook is running the following command before taking the snapshot:
dd if=/dev/nvme0n1p1 of=/dev/zvol/zroot/boot bs=1M conv=notrunc
I'll give it a try this evening (it's morning over here) after work and look at the things mentioned. My only question is how I can see which kernel /boot/vmlinuz-linux actually is. I compared its md5sum with "/lib/modules/..6.4.8../vmlinuz" and they were identical.
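That md5sum match is expected, by the way: dd produces a byte-for-byte copy, so the zvol carries exactly the same data as the source partition, including any embedded identifiers such as the filesystem UUID. A minimal illustration with plain files instead of block devices:

```shell
# dd clones its input byte for byte (here a file stands in for a block
# device), so embedded identifiers like a filesystem UUID are cloned too
printf 'fake vfat image, UUID=DD91-7454' > original.img
dd if=original.img of=clone.img bs=1M conv=notrunc 2>/dev/null
md5sum original.img clone.img   # both checksums come out identical
```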
Whoa, I think I'm getting what's going on.
Whoa, I think I'm getting what's going on: the device /dev/zd0 is mounted at /boot, but that's not the device that's supposed to be mounted. /dev/zd0 is the zvol I use to snapshot the block device that should be mounted as /boot, which is nvme0n1p1. So the system probably boots from nvme0n1p1 and loads the kernel that is on there; then ZFS starts and mounts /dev/zd0 on /boot instead of the correct device, and the system runs as such. But if I then update the kernel, it writes to the zvol instead of the actual device, which might be why the old kernel still boots.
It seems my /etc/fstab is incorrectly pointing at the zvol. I wonder what I did in the past to achieve this.
Let's investigate further this evening.
The issue is solved. Here's a breakdown of what was going on, for the archive.
My /boot partition was mounted via /etc/fstab, most likely by some reference to /dev/nvme0n1p1. I'm using dm-crypt for encryption and ZFS for backups (by taking snapshots and replicating them). The boot partition, however, is vfat. To be able to snapshot this boot partition at the same moment as the rest of my disk, I use scripts to dd the block device into a ZFS volume and then include it in my snapshots/replication.
This dd'ing gives the boot partition an identical copy on that ZFS volume, including its UUID. At some point, for a reason I don't remember, I changed my /etc/fstab to use a UUID instead of the ID or classic device name. From that moment on, my system apparently mounted the ZFS volume instead of my boot partition. This worked fine until the moment I did a kernel update: the system boots from the actual boot partition and loads the kernel that is there, starts up, and at some point needs modules to mount the boot partition into the running system. There it failed, because the installed kernel and modules were a different version.
I updated /etc/fstab to use the ID, mounted the correct device and updated the kernel. I rebooted and everything worked as expected.
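Concretely, the fix amounts to referring to the partition by a path that can only match the real device. The id string in the sketch below is illustrative; `ls /dev/disk/by-id/` shows the real ones on any given system.

```shell
# List stable identifiers for the real partition; the dd'ed zvol shows
# up under a different name, so by-id stays unambiguous even though the
# cloned filesystem carries the same UUID
ls -l /dev/disk/by-id/ | grep -- '-part1'

# /etc/fstab entry by id instead of the duplicated UUID (illustrative):
# /dev/disk/by-id/nvme-SomeVendor_SSD_serial-part1  /boot  vfat  defaults  0  2
```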
Thanks again @seth for your time and help; it was the lsblk -f that eventually cleared the mist in my head. Next up is migrating away from SLiM, but that's not for this topic.