
#1 2023-08-07 19:51:05

Ibex
Member
Registered: 2006-03-02
Posts: 135

[SOLVED] System boots previous kernel after kernel upgrade

I'm using zfs-linux, so I'm often lagging a bit behind on updates. To keep most of my packages up-to-date, I often update the system while ignoring linux and zfs-linux packages.

Today they got in sync again, so I could update to linux 6.4.8. The update seemingly went fine, but after a reboot I was stuck at an emergency prompt where the system complained it could not mount /boot. After a bit of digging, I found the cause: the system was actually running linux 6.4.2, while only the 6.4.8 modules were available under /lib/modules/.
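A mismatch like this can be checked directly by comparing the running kernel release with the module directories that exist on disk. A minimal sketch, assuming bash; the function name modules_present is made up for illustration and the module root defaults to /lib/modules:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a module directory exists for a kernel release.
# $1 = kernel release (e.g. the output of `uname -r`)
# $2 = module root, defaults to /lib/modules
modules_present() {
    local release="$1" root="${2:-/lib/modules}"
    if [ -d "$root/$release" ]; then
        echo "modules for $release found in $root"
    else
        echo "no modules for $release in $root"
        return 1
    fi
}

# Usage on a live system:
#   modules_present "$(uname -r)"
```

If this reports no modules for the running release, the kernel that booted is not the one the installed packages belong to.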

My /boot partition was mounted during the update, and the initramfs-linux.img and vmlinuz-linux files were updated. The checksum of vmlinuz-linux was identical to the vmlinuz file under /lib/modules/..6.4.8../.

To recover, I installed linux 6.4.2 and the corresponding zfs-linux from the pacman cache, then removed everything from the /boot folder before mounting the actual boot partition (which was possible again now that the required modules were available). I then ran pacman -U with linux 6.4.2 and the corresponding zfs-linux once more, to make sure the contents of /boot were also back in sync with the reinstalled 6.4.2. After that, I could boot into 6.4.2 again.

I have no idea why I seem to be able to install linux 6.4.8 successfully, yet on boot the system runs 6.4.2. I suspect something is misconfigured and the system boots a kernel from a file that the update did not touch. Can anybody point me in the right direction?

Last edited by Ibex (2023-08-08 16:40:20)


#2 2023-08-07 20:46:18

seth
Member
Registered: 2012-09-03
Posts: 54,562

Re: [SOLVED] System boots previous kernel after kernel upgrade

My /boot partition was mounted during the update

./.

removed everything from the /boot folder before mounting the actual boot partition

Did you try to update the kernel again?


#3 2023-08-08 04:56:54

Ibex
Member
Registered: 2006-03-02
Posts: 135

Re: [SOLVED] System boots previous kernel after kernel upgrade

seth wrote:

./.

Not sure what you mean here.

seth wrote:

Did you try to update the kernel again?

Yes, and I had the same issue.

To summarize what I did:
0. Happily running linux 6.4.2
1. Verified /boot was mounted, did a system upgrade (quite a lot of packages, including linux 6.4.8 and corresponding zfs-linux).
2. Verified /boot/vmlinuz-linux and /lib/modules/..6.4.8../vmlinuz were the same files
3. Reboot
4. Stuck in emergency prompt
5. Noticed /boot was not mounted and could not be mounted because no matching modules were available. Running uname -a reported kernel 6.4.2 as active
6. Installed linux 6.4.2. This generated new files under /boot, which was now just a folder, since /boot wasn't mounted
7. Removed everything from this /boot folder, and then mounted /boot (which worked again, since I now had the right modules for the running kernel)
8. Installed linux 6.4.2 again, to make sure the files under /boot (now mounted), matched the 6.4.2 ones
9. I could successfully boot

If instead of 9 I installed 6.4.8 again, I was stuck at the emergency prompt again on boot.


#4 2023-08-08 05:43:08

seth
Member
Registered: 2012-09-03
Posts: 54,562

Re: [SOLVED] System boots previous kernel after kernel upgrade

I mean that there should be no files in the mountpoint for the boot partition if the boot partition was properly mounted.

Please post your complete system journal:

sudo journalctl -b | curl -F 'file=@-' 0x0.st


#5 2023-08-08 06:35:54

Ibex
Member
Registered: 2006-03-02
Posts: 135

Re: [SOLVED] System boots previous kernel after kernel upgrade

seth wrote:

Please post your complete system journal:

sudo journalctl -b | curl -F 'file=@-' 0x0.st

At what moment? Before the initial installation of 6.4.8, right after the update, or at the recovery prompt?


#6 2023-08-08 06:48:11

seth
Member
Registered: 2012-09-03
Posts: 54,562

Re: [SOLVED] System boots previous kernel after kernel upgrade

"Now" - I want to see where you're booting from, what you're mounting, how and whether there're IO errors.
Maybe run "sudo touch /boot/foo" or similar beforehand to trigger potential issues.


#7 2023-08-08 07:04:15

Ibex
Member
Registered: 2006-03-02
Posts: 135

Re: [SOLVED] System boots previous kernel after kernel upgrade

-link to journal logs-

A huge thanks already for helping me.

Last edited by Ibex (2023-08-08 16:40:55)


#8 2023-08-08 07:40:14

seth
Member
Registered: 2012-09-03
Posts: 54,562

Re: [SOLVED] System boots previous kernel after kernel upgrade

https://wiki.archlinux.org/title/SLiM (read the warning!) seems to muck around in /boot

df -h
lsblk -f

and

Aug 07 22:00:00 aeolus zrepl[1965]: 2023-08-07T22:00:00+02:00 [WARN][snapshot_boot][hook][pgTD$LAED$NYQM$NYQM]: hook output command="/etc/zrepl/hooks/boot.sh" stderr="1023+1 records in" snap="zrepl_20230807_200000_000" fs="zroot/boot"

looks like there's some automated snapshot organization for the boot directory.

If the boot partition isn't simply full: make sure the /boot partition is mounted, update the kernel, run "file /boot/vmlinuz-linux" to check which kernel is actually there, then re-check that /boot is still mounted.
Then look through older journals to see whether any shutdown scripts restore snapshots (of the /boot path).
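The "file" check can be turned into a direct comparison with the running kernel. A rough sketch, assuming the image is an x86 bzImage for which file prints a "version <release>" field; the function name is hypothetical and not part of any package:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the release string out of `file` output for a kernel image.
# Reads the `file` output on stdin and prints the word following "version".
# Prints nothing if no "version" field is present.
kernel_version_from_file_output() {
    sed -n 's/.*version \([^ ]*\).*/\1/p'
}

# Usage on a live system (the two lines should match after a clean upgrade and reboot):
#   file /boot/vmlinuz-linux | kernel_version_from_file_output
#   uname -r
```

Running the first command both before and after re-checking the mount shows whether the image under /boot is the one the upgrade wrote.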


#9 2023-08-08 07:56:22

Ibex
Member
Registered: 2006-03-02
Posts: 135

Re: [SOLVED] System boots previous kernel after kernel upgrade

I didn't know SLiM was so outdated. I've been using it for a long, long time, so it's time for something else. I'll take a look at this after work.

The outputs:

  ~ ························································································· at  09:50:28
❯ df -h
Filesystem            Size  Used Avail Use% Mounted on
zroot/ROOT/default    733G   47G  687G   7% /
devtmpfs              4.0M     0  4.0M   0% /dev
tmpfs                  24G   53M   24G   1% /dev/shm
tmpfs                 9.4G  9.9M  9.4G   1% /run
tmpfs                  24G   18M   24G   1% /tmp
/dev/zd0             1021M   97M  925M  10% /boot
zroot/data/home       813G  127G  687G  16% /home
zroot/data/home/root  687G  424M  687G   1% /root
tmpfs                 4.7G   60K  4.7G   1% /run/user/1000
  ~ ························································································· at  09:50:30
❯ lsblk -f
NAME         FSTYPE      FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
zd0          vfat        FAT32       DD91-7454                             924.4M     9% /boot
nvme0n1                                                                                  
├─nvme0n1p1  vfat        FAT32       DD91-7454                                           
└─nvme0n1p2  crypto_LUKS 2           2e6d9f54-b7c1-42e4-b71a-9bc9fcdd2e34                
  └─luks_zfs zfs_member  5000  zroot 7391103467436721126   
# zfs list
NAME                   USED  AVAIL     REFER  MOUNTPOINT
zroot                  236G   686G       96K  none
zroot/ROOT            59.0G   686G       96K  none
zroot/ROOT/default    59.0G   686G     46.6G  /
zroot/boot            21.7G   686G     1.01G  -
zroot/data             155G   686G       96K  none
zroot/data/home        155G   686G      126G  /home
zroot/data/home/root   436M   686G      423M  /root

The zrepl snapshot hook is running the following command before taking the snapshot:

dd if=/dev/nvme0n1p1 of=/dev/zvol/zroot/boot bs=1M conv=notrunc

I'll try this evening after work (it's morning over here) and look at the things you mentioned. My only question is how I can see which kernel /boot/vmlinuz-linux actually is. I compared its md5sum with that of "/lib/modules/..6.4.8../vmlinuz" and they were identical.


#10 2023-08-08 09:40:55

Ibex
Member
Registered: 2006-03-02
Posts: 135

Re: [SOLVED] System boots previous kernel after kernel upgrade

Whoa, I think I'm getting what's going on.

The device /dev/zd0 is mounted on /boot, but that's not the device that's supposed to be mounted there. /dev/zd0 is the zvol I use to snapshot the block device that should be mounted as /boot, which is nvme0n1p1. So the system probably boots from nvme0n1p1 and loads the kernel stored there, then starts ZFS, after which /dev/zd0 is mounted on /boot instead of the correct device, and the system runs like that. When I then update the kernel, it is written to the zvol instead of the actual device, which would explain why the old kernel keeps booting.
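For anyone wanting to check for the same situation, here is a small sketch that classifies the device backing /boot. It assumes zvols show up as /dev/zdN (or via the /dev/zvol/ symlinks), as they do in the output above; the function name is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a block device path as a ZFS zvol or not.
# zvols appear as /dev/zdN, with /dev/zvol/<pool>/<name> symlinks pointing at them.
is_zvol_source() {
    case "$1" in
        /dev/zd[0-9]*|/dev/zvol/*) echo "zvol" ;;
        *) echo "not a zvol" ;;
    esac
}

# Usage on a live system:
#   is_zvol_source "$(findmnt -no SOURCE /boot)"
```

On this system the check would print "zvol" for /dev/zd0, where /dev/nvme0n1p1 was expected.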

It seems my /etc/fstab incorrectly points to the zvol. I wonder what I did in the past to end up with this, hmm.

Let's investigate further this evening.


#11 2023-08-08 16:40:05

Ibex
Member
Registered: 2006-03-02
Posts: 135

Re: [SOLVED] System boots previous kernel after kernel upgrade

The issue is solved. Here's a breakdown of what was going on, for the archive.

My /boot partition was originally mounted via /etc/fstab, most likely with some reference to /dev/nvme0n1p1. I'm using dm-crypt for encryption and ZFS for backups (by taking snapshots and replicating them). The boot partition, however, is a vfat partition. To be able to snapshot the boot partition at the same moment as the rest of my disk, I use scripts that dd the block device into a ZFS volume, which is then included in my snapshots/replication.

This dd'ing leaves an identical copy of the boot partition on that ZFS volume, including its UUID. At some point, for a reason I don't remember, I changed my /etc/fstab to use a UUID instead of the ID or classic device name. From that moment on, it seems my system mounted the ZFS volume on /boot instead of my boot partition. This worked fine until I did a kernel update: the new kernel image was written to the ZFS volume, while the new modules replaced the old ones under /lib/modules. The system still booted from the actual boot partition and loaded the old kernel from there. At some point during startup it needed modules to mount the boot partition, and that failed because the running kernel and the installed modules were different versions.
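Duplicate filesystem UUIDs like this can be spotted from lsblk output. A minimal sketch, assuming awk is available; the function name duplicate_uuids is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list filesystem UUIDs that appear on more than one device.
# Expects "NAME UUID" pairs on stdin, e.g. from `lsblk -rno NAME,UUID`.
# Lines without a UUID (whole disks, for example) are skipped.
duplicate_uuids() {
    awk 'NF >= 2 { names[$2] = names[$2] " " $1; count[$2]++ }
         END { for (u in count) if (count[u] > 1) print u ":" names[u] }' | sort
}

# Usage on a live system:
#   lsblk -rno NAME,UUID | duplicate_uuids
```

With the devices from the lsblk -f output earlier in this thread, this would list DD91-7454 with both zd0 and nvme0n1p1, which is exactly the ambiguity a UUID= entry in /etc/fstab cannot resolve.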

I updated /etc/fstab to use the ID, mounted the correct device and updated the kernel. I rebooted and everything worked as expected.
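To double-check which identifier the /boot entry uses after such a change, something like the following sketch could help. It only reads the fstab content it is given; the function name is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the device field of the /boot entry in an fstab.
# Reads fstab content on stdin and skips comment lines.
boot_fstab_spec() {
    awk '$1 !~ /^#/ && $2 == "/boot" { print $1 }'
}

# Usage on a live system:
#   boot_fstab_spec < /etc/fstab
```

A result starting with UUID= is ambiguous as long as a dd copy of the partition exists, whereas a /dev/disk/by-id/ path refers to the physical device.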

Thanks again @seth for your time and help; it was the lsblk -f output that eventually cleared the mist in my head. Next up is migrating away from SLiM, but that's not for this topic :)

