I installed Arch a couple of days ago and wanted to set up hibernation for my laptop, but ran into the following issue: I start sway (which automatically starts waybar in my case), hibernate (systemctl hibernate), and power the laptop on again. At first everything seems to work, but after a couple of seconds waybar freezes (htop shows it stuck in D state). I am also unable to switch TTYs, which normally works. After running sway exit, sway freezes completely and I can only continue by holding down the power button to force a shutdown.
If I don't try to exit sway and just continue using my system, more processes sometimes get permanently stuck in D state over time, and sometimes sway freezes on its own. If I try to shut down, it always gets stuck at something (usually "Stopping IPTS"), even if I didn't see any frozen processes in htop right before.
System Information:
Hardware: Surface Book 2, 8GB RAM, no discrete GPU
Kernel:
uname -a
Linux isar 6.10.5-arch1-1-surface #1 SMP PREEMPT_DYNAMIC Sat, 17 Aug 2024 01:43:51 +0000 x86_64 GNU/Linux
Kernel parameters:
BOOT_IMAGE=/vmlinuz-linux-surface root=UUID=8d116a8f-6d5d-4039-964b-72ef459c5a63 rw loglevel=3 quiet resume=UUID=8d116a8f-6d5d-4039-964b-72ef459c5a63 resume_offset=53168128 hibernate=nocompress
Hibernation setup: I have a 4GB swap partition and a 12GB swap file, and followed the ArchWiki hibernation guide to have the hibernation image saved in the swap file. I also increased /sys/power/image_size:
cat /sys/power/image_size
12348030976
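For reference, a minimal sketch of how the resume_offset value for a swap file is commonly derived on ext4-like filesystems: it is the physical start of the file's first extent as reported by filefrag -v. The function name first_extent_offset is made up for illustration, and the approach is an assumption that does not carry over to Btrfs, where the offset has to be obtained differently.

```shell
# Sketch: read `filefrag -v` output on stdin and print the physical start
# of the first extent (the row whose first field is "0:").
# The function name is hypothetical, not part of any tool.
first_extent_offset() {
    # Extent rows look like:
    #    0:        0..       0:      38912..     38912:      1:
    # Field 4 is the physical start followed by "..", which is stripped here.
    awk '$1 == "0:" { print substr($4, 1, length($4) - 2); exit }'
}
```

Run as root, something like filefrag -v /swapfile | first_extent_offset should print the number to compare against the resume_offset= kernel parameter.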
Other relevant Information:
If I hibernate after only logging into a virtual console, everything works perfectly, even starting sway and waybar afterwards. Suspend (s2idle) also works perfectly, even from within sway.
What I have tried so far:
Using stock kernel instead of linux-surface: Changed nothing
Increasing swap size (originally I set the hibernation image to be saved on my 4GB swap partition, now on the 12GB swap file /swapfile): Changed nothing
Kernel parameter hibernate=nocompress: Changed nothing
Killing waybar before hibernating: Everything is the same (except for waybar freezing of course)
Different wm: I tried running niri instead of sway: I haven't noticed niri freeze yet, although after exiting I usually have the same issues, i.e. can't switch TTY, can't reboot. However, in rare cases hibernating from niri doesn't completely break the system, just something like wifi.
Logs: After this issue appears, I will usually see an error message starting like this (full messages contained in the journal below):
Sep 03 22:37:22 isar kernel: kernel BUG at mm/slub.c:553!
Sep 03 22:37:22 isar kernel: Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
Sep 03 22:37:22 isar kernel: CPU: 0 PID: 314 Comm: kworker/0:3 Tainted: G WC 6.10.5-arch1-1-surface #1 8a497d4a5d3f440b4d05c408e07176d4a6653d79
so of course I tried searching for it on the internet. Unfortunately, all that I could find was about CVE-2024-41087, which from my understanding has already been fixed (and my kernel is up to date). I also tried looking at the source code but couldn't understand a lot.
Logs
Journal from the boot I described in the beginning: https://pastebin.com/iRxUiHrY
Stack of waybar when it is frozen:
cat /proc/664/stack
[<0>] ssam_request_do_sync_with_buffer+0x11c/0x150 [surface_aggregator]
[<0>] san_rqst.isra.0+0x17d/0x200 [surface_acpi_notify]
[<0>] san_opreg_handler+0x68/0x140 [surface_acpi_notify]
[<0>] acpi_ev_address_space_dispatch+0x302/0x4c0
[<0>] acpi_ex_access_region+0x28a/0x510
[<0>] acpi_ex_write_serial_bus+0x101/0x2c0
[<0>] acpi_ex_write_data_to_field+0x31b/0x3b0
[<0>] acpi_ex_store_object_to_node+0x1b3/0x3a0
[<0>] acpi_ex_store+0x223/0x4b0
[<0>] acpi_ex_opcode_1A_1T_1R+0x10a/0x680
[<0>] acpi_ds_exec_end_op+0x1f6/0x870
[<0>] acpi_ps_parse_loop+0x137/0xa30
[<0>] acpi_ps_parse_aml+0xbd/0x5e0
[<0>] acpi_ps_execute_method+0x171/0x3e0
[<0>] acpi_ns_evaluate+0x191/0x5c0
[<0>] acpi_evaluate_object+0x1cf/0x450
[<0>] acpi_evaluate_integer+0x6f/0x130
[<0>] acpi_thermal_get_temperature+0x43/0xa0
[<0>] thermal_get_temp+0x23/0x60
[<0>] __thermal_zone_get_temp+0x1a/0x60
[<0>] thermal_zone_get_temp+0x4c/0x90
[<0>] temp_show+0x35/0x70
[<0>] dev_attr_show+0x19/0x40
[<0>] sysfs_kf_seq_show+0xa8/0xf0
[<0>] seq_read_iter+0x11f/0x460
[<0>] vfs_read+0x296/0x370
[<0>] ksys_read+0x6d/0xf0
[<0>] do_syscall_64+0x82/0x190
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Since it mentions something about temperature: My waybar shows the temperature and updates it every 5 seconds, so that is probably where it gets stuck.
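To collect the same kind of stack for every stuck process at once, something along these lines could be used. This is only a sketch: both function names are hypothetical, and it assumes the column layout of ps -eo pid,stat,comm (header line, then PID, state, command name).

```shell
# Sketch: filter `ps -eo pid,stat,comm` output (read on stdin) down to
# processes in uninterruptible sleep, i.e. a state starting with "D".
# Prints "PID COMMAND" per match. Function names are hypothetical.
filter_d_state() {
    awk 'NR > 1 && $2 ~ /^D/ {
        pid = $1
        $1 = ""; $2 = ""        # drop PID and STAT, keep the command name
        sub(/^ +/, "")          # trim the leading blanks left behind
        print pid, $0
    }'
}

# Sketch: for each "PID COMMAND" line on stdin, print the kernel stack.
# Reading /proc/<pid>/stack requires root.
dump_stacks() {
    while read -r pid comm; do
        echo "== $pid ($comm)"
        cat "/proc/$pid/stack" 2>/dev/null
    done
}
```

As root: ps -eo pid,stat,comm | filter_d_state | dump_stacks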
Let me know if there are any other relevant logs or other information I can provide.
I tried really hard but I could not find anyone having a similar issue to mine, so I would be very grateful if somebody could maybe point me in the right direction to understanding what is going on with my system.
Last edited by hejas (2024-09-04 20:23:05)
Mod note: moving to AUR Issues.
I added the kernel parameter i915.enable_guc=0 (and checked that it is really applied), but am still facing the same issues. I also tried using the stock kernel, but with no success.
22:36:18 - boot starts
22:36:21 - RIP: 0010:mwifiex_cmd_802_11_scan_ext+0x83/0x90 [mwifiex]
22:36:29 - Reached target Sleep.
22:37:19 - Operation 'hibernate' finished.
22:37:22 - RIP: 0010:__slab_free+0x152/0x2f0
22:38:34 - Process 871 (alacritty) of user 1000 dumped core.
22:38:56 - Power key pressed long.
I don't think the oopses are related and the kernel does not freeze/panic (the long power key event is processed by systemd)
Does suspend to ram (S3/S2idle) work w/o issues?
Because of the delay, https://wiki.archlinux.org/title/Solid_ … leshooting
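A timeline like the one above can be pulled out of the journal by filtering the kernel messages for the usual crash markers. The following is a rough sketch, not a complete classifier: the function name is hypothetical and the pattern list only covers the messages quoted in this thread.

```shell
# Sketch: keep only journal lines (read on stdin) that look like kernel
# oops/BUG markers. The pattern list is an assumption based on the
# messages seen in this thread, not an exhaustive one.
kernel_oops_lines() {
    grep -E 'kernel BUG at|Oops:|RIP: |detected field-spanning write'
}
```

For the previous boot: journalctl -k -b -1 --no-pager | kernel_oops_lines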
Yes, s2idle works without issues. S3 sleep is not available on my system. And just to highlight this again: Hibernate seems to work as long as I am not running a window manager when hibernating (i.e. only a virtual console).
What you're seeing is fundamentally a kernel memory corruption, and there's very suspiciously
Sep 03 22:36:21 isar kernel: memcpy: detected field-spanning write (size 116) of single field "ext_scan->tlv_buffer" at drivers/net/wireless/marvell/mwifiex/scan.c:2239 (size 1)
but that also happens before you even login.
Still, try to blacklist mwifiex (your wifi…)
The thing directly responsible is i915_request_retire … let's see whether the simplydumb device is good for something after all: try to blacklist i915, you should™ end up running on the simpledrm device and that should™ even allow you to start sway (or any other WM) and then hibernate the system from there…
https://wiki.archlinux.org/title/Kernel … and_line_2
Thanks for helping me out: I tried both of your suggestions and blacklisting mwifiex and mwifiex_pcie solved all of my problems! Should I just disable them before every hibernate (I already tested this and so far it seems to work) or is there a better solution to solve the root cause?
Last edited by hejas (2024-09-04 19:49:30)
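For anyone wanting to repeat the blacklisting test, a minimal sketch of writing the modprobe.d entry follows. The file name mwifiex-blacklist.conf and the function name are assumptions made for illustration. Note that this keeps the wifi modules from loading automatically at all, so it is a test rather than a fix.

```shell
# Sketch: write a modprobe.d blacklist for the two wifi modules.
# The default target path and the function name are assumptions;
# pass another path as $1 to try it out without touching /etc.
write_mwifiex_blacklist() {
    target=${1:-/etc/modprobe.d/mwifiex-blacklist.conf}
    # One "blacklist <module>" line per module.
    printf 'blacklist %s\n' mwifiex mwifiex_pcie > "$target"
}
```

Run write_mwifiex_blacklist as root and reboot. Deleting the file and rebooting undoes it.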
The most basic solution/test would be to rfkill the wifi around the hibernation; if that doesn't work, rfkill and unload the module before hibernation.
The root cause is probably a bug in that module.
https://www.google.com/search?q=%22mwif … ab_free%22 has an impressive amount of hits, dating back to 2017
modinfo mwifiex
has some options, most interestingly perhaps disable_auto_ds and disconnect_on_suspend
Ideally
systool -vm mwifiex
shows their current values.
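If systool is not installed, the values can also be read straight from sysfs. This is a small sketch with a hypothetical function name. It assumes the module is loaded, and parameters the module does not export to sysfs will simply not show up.

```shell
# Sketch: print name=value for each readable parameter of a loaded module.
# $1 is the module name; $2 optionally overrides the directory to read
# (defaults to /sys/module/<name>/parameters). The name is hypothetical.
show_module_params() {
    dir=${2:-/sys/module/$1/parameters}
    for f in "$dir"/*; do
        [ -r "$f" ] || continue     # skip unreadable or missing entries
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

Usage: show_module_params mwifiex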
Unfortunately
rfkill block wlan
doesn't seem to be enough. Unloading the modules worked though, e.g. using the following script (leaving it here for future reference):
/usr/lib/systemd/system-sleep/hibernate.sh
--------------------------------------------------------
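#!/bin/sh
# systemd-sleep hook: $1 is "pre" or "post", $2 is the sleep action.
case "$1/$2" in
    pre/hibernate)
        echo "Going into hibernation. Stopping mwifiex, mwifiex_pcie"
        # Unload the PCIe front-end first, then the core module it depends on.
        modprobe -r mwifiex_pcie
        modprobe -r mwifiex
        ;;
    post/hibernate)
        echo "Waking up from hibernation. Starting mwifiex, mwifiex_pcie"
        modprobe -i mwifiex
        modprobe -i mwifiex_pcie
        ;;
esac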
Marking as solved; Thanks again!