Good afternoon,
I am currently running the 6.10.10-arch1-1 kernel, and I noticed that whenever I hibernate my system, it fails to resume: while still in the initramfs, it reports "non-boot CPUs are not disabled". The system then boots into a fresh session instead of the state saved in the resume image. Previously, my laptop also had problems with suspend: it would wake up immediately after being suspended. Switching to the FE Ethernet LINUX driver r8101 for kernels up to 6.1 and blacklisting the built-in r8169 kernel module resolved that, and my laptop now suspends properly.
I am using Arch Linux in UEFI mode, with systemd-boot bootloader. I have Secure Boot enabled in deployed mode with self-signed keys using sbctl, as described in this ArchWiki article section on sbctl. I have also encrypted my disk with TPM2, bound it to PCR7 using Clevis and set it up to be unlocked automatically using an mkinitcpio hook, as described in this ArchWiki section on Clevis.
Here is what my partition setup looks like (output of lsblk):
NAME                   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
nvme0n1                259:0    0 238.5G  0 disk
├─nvme0n1p1            259:1    0     1G  0 part  /efi
└─nvme0n1p2            259:2    0 237.5G  0 part
  └─cryptlvm           254:0    0 237.5G  0 crypt
    ├─SystemGroup-swap 254:1    0    16G  0 lvm   [SWAP]
    └─SystemGroup-root 254:2    0 221.2G  0 lvm   /
Here is /etc/mkinitcpio.conf, with all the relevant hooks:
MODULES=()
BINARIES=()
FILES=()
HOOKS=(base udev autodetect microcode modconf kms keyboard keymap consolefont numlock block clevis encrypt lvm2 filesystems resume fsck)
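For anyone comparing their own setup: with a busybox-based initramfs, the resume hook has to run after the hooks that make the swap device visible (here udev, encrypt and lvm2). The following is only a minimal sketch of such a check. The function names hook_index and resume_after_storage_hooks are made up for illustration, and the script assumes the config file is plain shell syntax that is safe to source.

```bash
#!/usr/bin/env bash
# hook_index NAME HOOK...: print the zero-based position of NAME, or fail.
hook_index() {
    local name=$1 i=0 hook
    shift
    for hook in "$@"; do
        if [[ $hook == "$name" ]]; then
            printf '%s\n' "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# resume_after_storage_hooks [CONF]: succeed if "resume" is listed after
# udev, encrypt and lvm2 (hooks that are absent are simply skipped).
resume_after_storage_hooks() {
    local conf=${1:-/etc/mkinitcpio.conf} resume_pos dep dep_pos
    # Declared local so that sourcing the config does not leak globals.
    local MODULES BINARIES FILES HOOKS=()
    source "$conf" || return 2
    resume_pos=$(hook_index resume "${HOOKS[@]}") || {
        echo "resume hook missing"
        return 1
    }
    for dep in udev encrypt lvm2; do
        dep_pos=$(hook_index "$dep" "${HOOKS[@]}") || continue
        if (( dep_pos > resume_pos )); then
            echo "resume is listed before $dep"
            return 1
        fi
    done
    echo "resume ordering looks fine"
}
```

Calling resume_after_storage_hooks with no argument reads /etc/mkinitcpio.conf. The HOOKS line above passes this check, so hook ordering is unlikely to be the cause here.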
Here is the preset used for generating the UKI:
ALL_kver="/boot/vmlinuz-linux"
PRESETS=('default' 'fallback')
default_uki="/efi/EFI/Linux/arch-linux.efi"
default_options="--splash /usr/share/systemd/bootctl/splash-arch.bmp"
fallback_uki="/efi/EFI/Linux/arch-linux-fallback.efi"
fallback_options="-S autodetect"
Here are the parameters I pass to the kernel:
nvme_core.default_ps_max_latency_us=0 cryptdevice=UUID=7941e0ce-4eef-421c-80a0-2b8a3d7391ad:cryptlvm root=/dev/SystemGroup/root rw resume=/dev/SystemGroup/swap
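Since the command line is embedded in the UKI, it can be worth confirming that the running kernel actually received the resume= parameter. A minimal sketch, assuming the parameters are plain space-separated key=value words without quoting; cmdline_param is a hypothetical helper name, not an existing tool.

```bash
#!/usr/bin/env bash
# cmdline_param KEY [CMDLINE]: print the value of the first KEY=... word.
# Without a second argument, the running kernel's /proc/cmdline is used.
cmdline_param() {
    local key=$1 cmdline word
    local -a words
    if (( $# >= 2 )); then
        cmdline=$2
    else
        cmdline=$(< /proc/cmdline)
    fi
    # Split on whitespace without glob expansion.
    read -r -a words <<< "$cmdline"
    for word in "${words[@]}"; do
        if [[ $word == "$key="* ]]; then
            # Strip everything up to and including the first "=".
            printf '%s\n' "${word#*=}"
            return 0
        fi
    done
    return 1
}
```

Running cmdline_param resume on the booted system should print /dev/SystemGroup/swap if the parameter made it into the image. As far as I know, /sys/power/resume then shows the same device as a major:minor pair, which would be 254:1 according to the lsblk output above.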
I use systemd-ukify for generating UKIs; however, no PCR11 measurement is done.
Running the following commands for testing hibernation:
echo core | sudo tee /sys/power/pm_test
echo platform | sudo tee /sys/power/disk
echo disk | sudo tee /sys/power/state
yields the following output in dmesg:
[ 3570.989459] PM: hibernation: hibernation entry
[ 3571.002178] Filesystems sync: 0.007 seconds
[ 3571.002343] Freezing user space processes
[ 3571.004801] Freezing user space processes completed (elapsed 0.002 seconds)
[ 3571.004814] OOM killer disabled.
[ 3571.005024] PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff]
[ 3571.005027] PM: hibernation: Marking nosave pages: [mem 0x0009e000-0x0009efff]
[ 3571.005029] PM: hibernation: Marking nosave pages: [mem 0x000a0000-0x000fffff]
[ 3571.005031] PM: hibernation: Marking nosave pages: [mem 0x40000000-0x403fffff]
[ 3571.005047] PM: hibernation: Marking nosave pages: [mem 0x9046a000-0x9046afff]
[ 3571.005049] PM: hibernation: Marking nosave pages: [mem 0x90476000-0x90476fff]
[ 3571.005050] PM: hibernation: Marking nosave pages: [mem 0x973a2000-0x973a3fff]
[ 3571.005051] PM: hibernation: Marking nosave pages: [mem 0x9a9d1000-0x9aa59fff]
[ 3571.005055] PM: hibernation: Marking nosave pages: [mem 0x9ee26000-0x9fffdfff]
[ 3571.005115] PM: hibernation: Marking nosave pages: [mem 0x9ffff000-0xa7ffffff]
[ 3571.005242] PM: hibernation: Marking nosave pages: [mem 0xa8200000-0xffffffff]
[ 3571.006861] PM: hibernation: Basic memory bitmaps created
[ 3571.006967] PM: hibernation: Preallocating image memory
[ 3571.572361] PM: hibernation: Allocated 911188 pages for snapshot
[ 3571.572365] PM: hibernation: Allocated 3644752 kbytes in 0.56 seconds (6508.48 MB/s)
[ 3571.572368] Freezing remaining freezable tasks
[ 3571.573773] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 3571.593708] printk: Suspending console(s) (use no_console_suspend to debug)
[ 3571.594143] wlan0: deauthenticating from 48:ee:0c:99:0a:16 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 3571.606639] pcieport 0000:00:1d.0: AER: Multiple Correctable error message received from 0000:01:00.0
[ 3571.606651] r8101 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
[ 3571.606653] r8101 0000:01:00.0: device [10ec:8136] error status/mask=00000001/00006000
[ 3571.606655] r8101 0000:01:00.0: [ 0] RxErr (First)
[ 3572.545174] ACPI: EC: interrupt blocked
[ 3572.548071] ACPI: PM: Preparing to enter system sleep state S4
[ 3572.552634] ACPI: EC: event blocked
[ 3572.552635] ACPI: EC: EC stopped
[ 3572.552636] ACPI: PM: Saving platform NVS memory
[ 3572.552991] Disabling non-boot CPUs ...
[ 3572.554394] smpboot: CPU 1 is now offline
[ 3572.556248] smpboot: CPU 2 is now offline
[ 3572.558136] smpboot: CPU 3 is now offline
[ 3572.560075] smpboot: CPU 4 is now offline
[ 3572.561728] smpboot: CPU 5 is now offline
[ 3572.563530] smpboot: CPU 6 is now offline
[ 3572.565284] smpboot: CPU 7 is now offline
[ 3572.567639] PM: hibernation: debug: Waiting for 5 seconds.
[ 3577.568830] Enabling non-boot CPUs ...
[ 3577.568966] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 3577.569652] CPU1 is up
[ 3577.569774] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 3577.570496] CPU2 is up
[ 3577.570631] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 3577.571326] CPU3 is up
[ 3577.571426] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 3577.572163] CPU4 is up
[ 3577.572266] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 3577.572926] CPU5 is up
[ 3577.573078] smpboot: Booting Node 0 Processor 6 APIC 0x5
[ 3577.573746] CPU6 is up
[ 3577.573843] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 3577.574526] CPU7 is up
[ 3577.577561] ACPI: EC: EC started
[ 3577.577717] ACPI: PM: Waking up from system sleep state S4
[ 3577.623370] ACPI: EC: interrupt unblocked
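Note that pm_test stays at the chosen level until it is written back to none, so later hibernation attempts would still be test runs. A minimal sketch that wraps the three writes above and always resets the test level afterwards; run_pm_test is a hypothetical name, and the second argument only exists so the function can be tried against a scratch directory instead of the real /sys/power.

```bash
#!/usr/bin/env bash
# run_pm_test [LEVEL] [POWER_DIR]: run one hibernation test cycle at the
# given pm_test level, then restore normal behaviour. Needs root when
# POWER_DIR is the real /sys/power.
run_pm_test() {
    local level=${1:-core} power_dir=${2:-/sys/power} status=0
    echo "$level" > "$power_dir/pm_test" || return 1
    # Same two writes as above: select the platform method, then hibernate.
    echo platform > "$power_dir/disk" &&
        echo disk > "$power_dir/state" || status=$?
    # Always switch test mode off again, even if the cycle failed.
    echo none > "$power_dir/pm_test"
    return "$status"
}
```

In the log above the core level completes cleanly: all seven non-boot CPUs go offline and come back. That suggests, without proving it, that the failure is specific to the resume path in the initramfs rather than to creating the image.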
/proc/acpi/wakeup lists these devices:
Device S-state Status Sysfs node
RP01 S4 *disabled
PXSX S4 *disabled
RP02 S4 *disabled
PXSX S4 *disabled
RP03 S4 *disabled
PXSX S4 *disabled
RP04 S4 *disabled
PXSX S4 *disabled
RP05 S4 *disabled
PXSX S4 *disabled
RP06 S4 *disabled
PXSX S4 *disabled
RP07 S4 *disabled
PXSX S4 *disabled
RP08 S4 *disabled
PXSX S4 *disabled
RP09 S4 *enabled pci:0000:00:1d.0
PXSX S4 *enabled pci:0000:01:00.0
RP11 S4 *disabled
PXSX S4 *disabled
RP12 S4 *disabled
PXSX S4 *disabled
RP13 S4 *enabled pci:0000:00:1d.4
PXSX S4 *disabled pci:0000:02:00.0
RP14 S4 *disabled
PXSX S4 *disabled
RP15 S4 *disabled
PXSX S4 *disabled
RP16 S4 *disabled
PXSX S4 *disabled
RP17 S4 *disabled
PXSX S4 *disabled
RP18 S4 *disabled
PXSX S4 *disabled
RP19 S4 *disabled
PXSX S4 *disabled
RP20 S4 *disabled
PXSX S4 *disabled
RP21 S4 *disabled
PXSX S4 *disabled
RP22 S4 *disabled
PXSX S4 *disabled
RP23 S4 *disabled
PXSX S4 *disabled
RP24 S4 *disabled
PXSX S4 *disabled
GLAN S4 *disabled
XHC S0 *enabled pci:0000:00:14.0
XDCI S4 *disabled
HDAS S4 *disabled pci:0000:00:1f.3
AWAC S4 *disabled
LID0 S3 *enabled platform:PNP0C0D:00
PBTN S3 *enabled platform:PNP0C0C:00
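The list is long, so it helps to filter it down to the devices that can actually wake the machine. A minimal sketch, assuming the four-column layout shown above (device, S-state, status, optional sysfs node); enabled_wakeup_devices is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# enabled_wakeup_devices [FILE]: print "DEVICE [SYSFS_NODE]" for every
# entry whose status column is "*enabled". The header line is skipped.
enabled_wakeup_devices() {
    awk 'NR > 1 && $3 == "*enabled" {
             print ($4 == "" ? $1 : $1 " " $4)
         }' "${1:-/proc/acpi/wakeup}"
}
```

On the table above this would leave RP09, its PXSX child at pci:0000:01:00.0, RP13, XHC, LID0 and PBTN.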
Here is my /etc/systemd/sleep.conf:
[Sleep]
AllowSuspend=yes
AllowHibernation=yes
AllowSuspendThenHibernate=yes
AllowHybridSleep=yes
SuspendState=mem standby freeze
HibernateMode=platform shutdown
MemorySleepMode=
HibernateDelaySec=
SuspendEstimationSec=60min
So far I've tried:
- Switching to the LTS kernel, as suggested in this section of ArchWiki. Returns the same error when resuming from disk.
- Enabling fastboot in BIOS settings, as suggested in this discussion in Fedora forum. Returns the same error when resuming from disk.
- Setting HibernateMode=shutdown in /etc/systemd/sleep.conf. Returns the same error when resuming from disk, and results in more ext4 errors.
- Passing "nvme_core.default_ps_max_latency_us=0" to kernel parameters, as mentioned in this section of ArchWiki. Returns the same error when resuming from disk.
- Switching to a systemd-based initramfs and enrolling the TPM keys with systemd-cryptenroll, as mentioned in this ArchWiki section on encrypting the disk. One thing to mention is that I did not rely on systemd's GPT automounting and instead used the same kernel parameters adapted to systemd-cryptsetup-generator. Nevertheless, it resulted in the same errors.
- Blacklisting the intel_hid kernel module in /etc/modprobe.d/blacklist.conf, as mentioned in this forum post regarding the same issue. Returns the same error when resuming from disk.
One thing that did fix hibernation was installing the Linux 5.15 LTS kernel from the official site: I compiled it with a traditional .config + modprobed-db from this kernel and installed it as explained in this section of ArchWiki on compiling your own kernel. However, I then ran into issues with iwlwifi being compressed and the kernel being unable to decompress it, which means such a solution is probably unsustainable in the long run. In any case, hibernation seems to work properly on 5.15, while on 6.10 it does not.
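On the iwlwifi problem, one guess (an assumption, the post does not say which file failed to decompress) is that the firmware files are zstd-compressed while the self-built 5.15 kernel lacks a matching decompressor. As far as I know, CONFIG_FW_LOADER_COMPRESS_ZSTD only appeared in later kernels. A minimal sketch for listing the relevant options in a kernel config; fw_compress_options is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# fw_compress_options CONFIG: print every CONFIG_FW_LOADER_COMPRESS*
# line (set or "is not set") from a plain or gzip-compressed config.
fw_compress_options() {
    local config=$1
    if [[ $config == *.gz ]]; then
        zcat "$config"
    else
        cat "$config"
    fi | grep -E '^(# )?CONFIG_FW_LOADER_COMPRESS(_[A-Z]+)?( is not set|=)'
}
```

It can be pointed at the .config in the build tree, or at /proc/config.gz on kernels that expose it.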
Any help would be appreciated!
Last edited by retractant0916 (2024-09-27 19:42:30)
Do you also get these issues without the out-of-tree kernel modules?
The issue persists with the out-of-tree module blacklisted, as well as after running
echo RP09 | sudo tee /proc/acpi/wakeup
to disable wakeup for RP09, the root port that the Ethernet controller sits behind.
Last edited by retractant0916 (2024-09-27 13:03:37)
So since the issue is reproducible we could do a bisection to find the culprit for the regression behaviour, but the range is kinda broad ... Could you maybe try to install a few older kernels from https://archive.archlinux.org/packages/l/linux/ to narrow it down a bit more?
Gave 5.19.13 a try, hibernation works.
Tried it on 6.0.1 and received the "non-boot CPUs are not disabled" error.
So the breaking change most likely landed somewhere between 5.19 and 6.0.
This could be a kernel regression, which should be bisected and reported to the upstream kernel developers (in this case the stable team)
Are you comfortable doing the bisection on your own, or do you need some help?
If you want, we could also provide you with pre-built kernel images to test.
Good info to get you started is:
- https://docs.kernel.org/admin-guide/rep … sions.html
- https://wiki.archlinux.org/title/Kernel … egressions
Since these are kinda old versions, we might also have to check whether the issue is still there with the latest mainline kernel.
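To get a feel for the effort: a bisection halves the candidate range on every step, so the number of kernels to build and test grows only with the base-2 logarithm of the commit count. A minimal sketch of that arithmetic; bisect_steps is a hypothetical helper name, and the real number of steps git reports can differ slightly because history is not linear.

```bash
#!/usr/bin/env bash
# bisect_steps COMMITS: print ceil(log2(COMMITS)), a rough upper bound
# on the number of build-and-test rounds needed for that many commits.
bisect_steps() {
    local commits=$1 steps=0 span=1
    while (( span < commits )); do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    printf '%s\n' "$steps"
}
```

In a clone of the mainline tree, the input could come from git rev-list --count v5.19..v6.0, and the bisection itself would start with git bisect start, git bisect good v5.19 and git bisect bad v6.0. This assumes the regression is also present between those mainline tags and not only in the stable releases tested above.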
Tested hibernation on 6.11 (mainline), resuming from disk seems to be working.
However, I am still not certain whether it will hold up. Should I perform the bisection anyway?
Hmmm, not sure if it's worth the effort ... If it's some kind of a trivial change it could be backported to the stable kernels, but yeah if the next version fixes it you could also just use that ..
Very well, marking this thread as solved then.