[solved] transient freeze after resume from sleep

dr1fter · 2024-12-28 09:12:52

Some time after my last posting, where my issue could be resolved (thanks!), I am again seeing a similar issue (freeze after resume from sleep). It started a couple of weeks ago, which I think correlates with the upgrade to linux-6.12. I checked my kmod config: the option described there for the nvidia module (options nvidia NVreg_PreserveVideoMemoryAllocations=1) is still present.
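
Beyond the config file, the value the loaded module actually uses can be read back. A minimal sketch, assuming the proprietary driver lists its parameters as "Name: value" lines in /proc/driver/nvidia/params; the helper name nvreg_value is made up for illustration:

```sh
#!/bin/sh
# nvreg_value NAME [FILE]: print the value the loaded nvidia module reports
# for a registry key. FILE defaults to /proc/driver/nvidia/params (assumed
# location); lines are expected to look like "PreserveVideoMemoryAllocations: 1".
nvreg_value() {
    name=$1
    file=${2:-/proc/driver/nvidia/params}
    awk -F': *' -v key="$name" '
        $1 == key { print $2; found = 1 }
        END { exit found ? 0 : 1 }' "$file"
}

# On a live system:
#   nvreg_value PreserveVideoMemoryAllocations
```

The function exits non-zero if the key is not listed at all, which would hint that the option never reached the module.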

What I observe: After resuming from sleep (systemctl suspend), my monitors are quickly powered on, showing a black screen + mouse-cursor. However, the mouse-cursor will only move after e.g. pressing a mouse-button. It takes about ~10..20s, however, until my actual desktop is displayed (instead of just black). This happens consistently, also after reboots, and regardless of sleep-length (my machine typically sleeps overnight, but I also checked behaviour for shorter intervals).

After wake-up, every graphical application (typically some xfce-terminals, google-chrome, and slack) works as expected. However, sometimes (maybe three times out of four), chromium-based applications (chrome + slack (electron)) will freeze for another ~10..20s upon first interaction, which will sometimes prevent setting focus to other applications (terminal), until end of freeze. After that, everything works exactly as expected.

Inspecting "the usual" logfiles, I found some (repeating) occurrences:

# /var/log/error.log
123203 2024/12/28 08:53:02 [info] 1722#1722: epoll_wait() failed (4: Interrupted system call)

# /var/log/Xorg.0.log
1268 [142113.308] (WW) NVIDIA: Wait for channel idle timed out.
1269 [142118.353] (WW) NVIDIA: Wait for channel idle timed out.

# journalctl -r
Dec 28 08:53:02 arch kernel: spd5118 5-0050: PM: failed to resume async: error -6
Dec 28 08:53:02 arch kernel: spd5118 5-0050: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6

Edit: I suppose I should provide some details about my (relevant) installed packages / desktop environment:

- cinnamon 6.4.3-1
- lightdm 1:1.32.0-6
- nvidia 565.77-9

Last edited by dr1fter (2025-06-11 05:30:12)

seth · 2025-01-04 21:31:34

spd5118 is a RAM temperature sensor. Does the freeze still happen after adding "module_blacklist=spd5118" to the kernel parameters (https://wiki.archlinux.org/title/Kernel_parameters)?

However

the mouse-cursor will only move after e.g. pressing a mouse-button. It takes about ~10..20s, however, until my actual desktop is displayed (instead of just black)… chromium-based applications (chrome + slack (electron)) will freeze for another ~10..20s upon first interaction

sounds more like you're losing the VRAM.

x-ref, https://bbs.archlinux.org/viewtopic.php … 7#p2215527

dr1fter · 2025-01-06 09:53:52

I read through https://bbs.archlinux.org/viewtopic.php?id=290126&p=7 and found a mention of tmpfs + required storage size. On my machine, I did not configure a partition for tmpfs:

$ findmnt -T /tmp
TARGET SOURCE FSTYPE OPTIONS
/tmp   tmpfs  tmpfs  rw,nosuid,nodev,nr_inodes=1048576,inode64

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
dev              32G     0   32G   0% /dev
run              32G  5.6M   32G   1% /run
efivarfs        192K  129K   59K  69% /sys/firmware/efi/efivars
/dev/nvme2n1p2  507G  151G  351G  31% /
tmpfs            32G  1.8G   30G   6% /dev/shm
tmpfs            32G  173M   32G   1% /tmp
tmpfs           1.0M     0  1.0M   0% /run/credentials/systemd-journald.service
/dev/nvme2n1p3  125G   74G   51G  60% /mnt/shared_profile
/dev/nvme2n1p1  511M  304M  208M  60% /boot
/dev/sda1       1.5T  638G  884G  42% /mnt/data
/dev/sdb2       916G  698G  177G  80% /mnt/games
tmpfs           6.3G  168K  6.3G   1% /run/user/1000

My graphics adapter has 12 GiB of memory (if my usual applications are running, about 1 GiB seems to be used). Might it be an issue that my tmpfs is backed by main memory (at least that's what I understand is the case)?

will try blacklisting spd5118

Edit: I just saw that I already tried disabling it (I placed a link to https://bbs.archlinux.org/viewtopic.php … 5#p2200765 into `/etc/modprobe.d/blacklist.conf`). So I suppose it will not help to blacklist spd5118 (again).

Last edited by dr1fter (2025-01-06 09:59:08)

seth · 2025-01-09 13:15:04

If it's blacklisted, why is it loaded?
Add "module_blacklist=spd5118" to the https://wiki.archlinux.org/title/Kernel_parameters to kill it.
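
Whether the parameter took effect can be verified after the reboot. A small sketch (the function name has_kernel_param is hypothetical) that looks for an exact parameter in a kernel command line, which on a live system comes from /proc/cmdline:

```sh
#!/bin/sh
# has_kernel_param PARAM [CMDLINE]: succeed if PARAM appears as a whole word
# in the kernel command line. CMDLINE defaults to the contents of /proc/cmdline.
# Exact match only: a comma-separated module_blacklist=a,b list would need a
# looser test.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# On a live system, after rebooting with the new parameter:
#   has_kernel_param module_blacklist=spd5118 && echo "parameter active"
#   lsmod | grep -w spd5118 || echo "spd5118 not loaded"
```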

The other link shows an ongoing problem w/ the nvidia drivers: VRAM preservation doesn't seem to have worked at all recently.
Using RAM is otherwise fine, it's refreshed during S2/S3.

dr1fter · 2025-01-09 16:01:43

If it's blacklisted, why is it loaded?

I think that's a misunderstanding. I blacklisted it already some weeks back when you suggested blacklisting it in the linked thread https://bbs.archlinux.org/viewtopic.php … 5#p2200765. Since blacklisting did not help (I did validate that the module was not loaded back then), I removed the blacklist entry again.

As for the other thread: so if I understand it correctly, the best I can do is wait for an upstream patch for either linux and/or nvidia-kmod to arrive (considering I do not want to downgrade / pin packages)?

seth · 2025-01-09 16:48:10

so if I understand it correctly, the best I can do is wait for an upstream patch

You can test https://aur.archlinux.org/packages/nvidia-535xx-dkms together with the LTS kernel (the driver doesn't seem to build for 6.12.x kernels)

dr1fter · 2025-01-09 19:58:17

yes, I might test that. but: let's assume the issue will not occur w/ lts-linux, then I have the choice of either staying w/ lts-linux (which I actually do not want), or wait for upstream-fix (actually, the ~10..20s-wait after resume is annoying, but bearable for me..). wdyt?

seth · 2025-01-09 20:20:50

Those are your options - at least contrasting lts+565 and lts+535 can tell you whether this is mostly an nvidia problem or you're maybe waiting for the wrong shoe to drop.

dr1fter · 2025-01-09 21:20:38

understood, thanks :-)
-> will post updates after trying it out, which may help others

timeon · 2025-03-25 05:16:14

Hi, I have the same problem and did not get any closer to solving it. Switching to LTS only made things even worse (permanent freeze after resuming from suspend). I would like to know if and how you solved it. :-)

Way back in September last year, after a pacman -Syu there were also some issues with nvidia GPU after resume from suspend and I found the following modifications that fixed them until these new issues appeared end of 2024:

sudo systemctl enable nvidia-suspend.service
sudo systemctl enable nvidia-hibernate.service
sudo systemctl enable nvidia-resume.service

added the following line to /lib/modprobe.d/systemd.conf:

options nvidia NVreg_PreserveVideoMemoryAllocations=1
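
To double-check that all three units really are enabled, something like the following could be used. It is only a sketch: the function name not_enabled is made up, and it assumes the "UNIT STATE ..." column layout that systemctl list-unit-files prints with --no-legend.

```sh
#!/bin/sh
# not_enabled: read "UNIT STATE ..." lines on stdin and print each of the
# three NVIDIA power-management units that is missing or not "enabled".
not_enabled() {
    awk '
        { state[$1] = $2 }
        END {
            n = split("nvidia-suspend nvidia-hibernate nvidia-resume", want, " ")
            for (i = 1; i <= n; i++) {
                unit = want[i] ".service"
                if (state[unit] != "enabled") print unit
            }
        }'
}

# On a live system (no output means all three are enabled):
#   systemctl list-unit-files --no-legend 'nvidia-*.service' | not_enabled
```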

Last edited by timeon (2025-03-25 05:27:28)

dr1fter · 2025-03-25 06:54:39

Unfortunately, while I did try some of the suggested tweaks (except for kernel-downgrade) I did not find an actual solution (hence I did not post an update, as I promised in my last posting to this thread).

I did notice the delay seems to have changed over time (I assume this might be related to kernel and nvidia-kmod upgrades). Over the last couple of weeks, the time it takes for my machine to go from a black screen to displaying my actual application GUIs has been about 10..15s (did not measure precisely). The subsequent initial freeze (mostly google-chrome) is also down to ~5..10s. Both are still somewhat annoying, but not bad enough to motivate me to spend additional debugging effort (I have basically no hands-on experience w/ debugging this kind of issue :-().

That being said, I might consider reporting this issue to NVIDIA support.

timeon · 2025-03-26 19:09:16

Thanks, dr1fter, for your reply

FYI:
installed nvidia-535xx-dkms 535.230.02-1 from AUR, kernel is linux-6.13.6-arch1-1
permanent freeze of screen, no response to key press or mouse input when going to suspend

Last edited by timeon (2025-03-26 19:09:51)

dr1fter · 2025-03-26 19:26:06

@timeon: does this "heal" after a couple of tens of seconds, or do you have to reboot? On my machine, I consistently observe that everything works again after some waiting time (with one exception yesterday, which was likely related to an nvidia-kmod update that had been installed in the meantime).

I am (by now) on nvidia-570.133.07-2 (from regular repository) and linux-6.13.8.arch1-1. None of the past updates had significant impact on freezing-issues, though.

timeon · 2025-03-28 17:28:53

With the nvidia-535xx the freeze happened while suspending and it was permanent for the GUI (screen, keyboard, mouse) though the system was still running, e. g. writing to the journal.

Today I checked my pacman.log again, the freezes started after upgrading from nvidia-560 to nvidia-565, with this upgrade there was also a package called egl-x11 added to the system.
We are both using cinnamon where xorg is the default and after switching to experimental wayland, there was no freeze after resuming from suspend though there was no password query at all!
Maybe it is not the nvidia driver but a combination of packages which do not work well together.

Another finding: If I do not unlock the computer in time after the freeze, the user account gets locked as if a wrong password had been entered. I do not know how this is involved; maybe it is a follow-up error of the system freeze.

dr1fter · 2025-03-28 18:33:36

interesting find. I might try switching to wayland (although, truth be told, I am very reluctant in that regard). Not sure if this is useful information, but I configured automatic logon on lightdm (so no explicit locking/unlocking on my machine - it is a desktop machine located in my house, so I have no need for locking (also, I am lazy))

seth · 2025-03-29 07:11:11

with this upgrade there was also a package called egl-x11 added to the system

This is merely a split off from nvidia-utils

though there was no password query at all!

Disable the cinnabun screenlocker, try again.

timeon · 2025-05-25 06:11:14

seth wrote:

though there was no password query at all!
Disable the cinnabun screenlocker, try again.

How can disabling a screenlocker solve the screen not to lock?

Anyway, wayland and cinnamon together seem so experimental to me.
E. g. fundamental things like choosing a keyboard layout seem impossible: https://github.com/linuxmint/wayland/issues/14
At least, this issue shall be fixed soon: https://github.com/linuxmint/cinnamon/pull/12758
Until then, wayland and cinnamon are just unusable for a lot of people. Personally I went back to cinnamon with xorg.

Today I downgraded to nvidia-560 which solved the issue for me. To do so, I executed the following commands:

pacman -ddR egl-x11
pacman -U /var/cache/pacman/pkg/nvidia-560.35.03-14-x86_64.pkg.tar.zst /var/cache/pacman/pkg/nvidia-utils-560.35.03-16-x86_64.pkg.tar.zst
pacman -U /var/cache/pacman/pkg/linux-6.11.3.arch1-1-x86_64.pkg.tar.zst

So why did I downgrade the linux kernel as well (which already was linux-6.14)?
Because the nvidia-560 modules were built for that older linux kernel version; I have no explanation for that.
To find this out, I even had to run the pacman command that downgraded nvidia with the --debug option.
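
A likely explanation is that the non-DKMS nvidia package ships modules prebuilt for one specific kernel release, so the two have to match. A quicker way to see which release a package targets than --debug might be to look at where its modules are installed. This is a sketch under the assumption that the modules live below /usr/lib/modules/<release>/; both function names are hypothetical:

```sh
#!/bin/sh
# module_kernel PATH: print the kernel release a module path belongs to,
# assuming the layout /usr/lib/modules/<release>/...
module_kernel() {
    printf '%s\n' "$1" | sed -n 's|^/usr/lib/modules/\([^/]*\)/.*|\1|p'
}

# built_for_kernel PATH [RELEASE]: succeed if the module was packaged for
# RELEASE (default: the running kernel as reported by uname -r).
built_for_kernel() {
    [ "$(module_kernel "$1")" = "${2-$(uname -r)}" ]
}

# On a live system:
#   path=$(pacman -Qlq nvidia | grep 'nvidia\.ko' | head -n 1)
#   built_for_kernel "$path" || echo "module targets $(module_kernel "$path")"
```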

Last edited by timeon (2025-05-25 06:13:42)

seth · 2025-05-25 07:58:56

How can disabling a screenlocker solve the screen not to lock?

The idea was that since wayland doesn't activate any screenlocker and doesn't cause any freezes, to disable the screenlocker on X11 and see whether that likewise prevents such freeze.

https://wiki.archlinux.org/title/Dynami … le_Support - though you might have a hard time building the 560xx drivers w/ gcc15 and the 6.14 kernel.
https://aur.archlinux.org/packages/nvidia-535xx-dkms

timeon · 2025-05-27 01:11:08

seth wrote:

How can disabling a screenlocker solve the screen not to lock?
The idea was that since wayland doesn't activate any screenlocker and doesn't cause any freezes, to disable the screenlocker on X11 and see whether that likewise prevents such freeze.

The transient freeze could still be experienced without a screen locker.

https://wiki.archlinux.org/title/Dynami … le_Support - though you might have a hard time building the 560xx drivers w/ gcc15 and the 6.14 kernel.
https://aur.archlinux.org/packages/nvidia-535xx-dkms

I used pacman for the kernel module building.
So the 560xx drivers are not made for the 6.14 kernel; that is OK with me.
I already installed the nvidia-535xx-dkms without success in March 2025 and mentioned it in this thread.

I thought the system had been running smoothly since the downgrade to nvidia-560 and linux-6.11, but today I noticed multiple stack trace entries in the journal every time after resuming from suspend, though I do not experience any problems.
I went back in the journal and confirmed that this also happened in January 2025, when nvidia-560 was still the latest version.
The stack trace always starts with this line:

WARNING: CPU: 0 PID: 9671 at include/linux/rwsem.h:80 follow_pte+0x1de/0x200

seth · 2025-05-27 06:58:22

pacman doesn't build any kernel modules; you've essentially downgraded the kernel and nvidia driver to their January state.
I suppose the dkms build failed back then and you were left w/ nvidia-535xx-utils and no usable kernel module.
If you want to retry, make sure to check "dkms status" (the driver needs to be "installed", not "added") and post the dkms build log.
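
The "installed" vs. "added" distinction can be checked mechanically. A rough sketch, assuming dkms status prints lines such as "nvidia/<version>, <kernel>, <arch>: installed" (older dkms releases use a comma instead of the slash); the function name dkms_installed is made up:

```sh
#!/bin/sh
# dkms_installed MODULE KERNEL: read `dkms status` output on stdin and succeed
# only if MODULE is reported as "installed" for kernel release KERNEL.
dkms_installed() {
    awk -v mod="$1" -v kver="$2" '
        {
            # The line must start with "MODULE/" or "MODULE,".
            head = substr($0, 1, length(mod) + 1)
            if ((head == mod "/" || head == mod ",") &&
                index($0, ", " kver ",") > 0 && $0 ~ /: installed/)
                ok = 1
        }
        END { exit ok ? 0 : 1 }'
}

# On a live system:
#   dkms status | dkms_installed nvidia "$(uname -r)" || echo "no usable module"
# The build log is usually found at /var/lib/dkms/nvidia/<version>/build/make.log
```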

If you want a comment on the stack trace, you'll have to post everything. All I can tell is that it's a semaphore warning, the cited function became relevant ~1yr ago.
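
One way to collect complete traces for posting, sketched under the assumption that each kernel warning is delimited by a "[ cut here ]" line and an "end trace" line in the log (kernel_traces is a made-up helper name):

```sh
#!/bin/sh
# kernel_traces: read kernel log lines on stdin and print only the blocks
# that run from a "[ cut here ]" marker to the next "end trace" marker.
kernel_traces() {
    sed -n '/\[ cut here \]/,/end trace/p'
}

# On a live system, for the current boot:
#   journalctl -k -b --no-pager | kernel_traces
```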

Did you test disabling the screenlocker on X11?

dr1fter · 2025-06-11 05:33:48

I would like to share that the issue I reported seems to have been solved by either the recent linux upgrade (-> 6.15.1-arch1-2) or the nvidia-kmod upgrade (-> 575.57.08-4), or at least I assume so. I made no config changes. Since I upgraded said packages yesterday, the issue has been gone (as I just confirmed by doing a couple of sleep -> resume cycles today).
