
#1 2024-10-06 17:38:37

dr1fter
Member
From: Germany
Registered: 2022-01-21
Posts: 37

[solved] - Resume from suspend fails since upgrade to linux-6.11

Since upgrading to linux-6.11 (same behaviour with both 6.11.1-arch1-1 and 6.11.2-arch1-1), resuming from suspend (`systemctl suspend -i`) fails: my machine seems to "wake up" (e.g. keyboard and mouse are powered on, the keyboard reacts to numlock, etc.), but my displays do not receive a signal. Switching to a text console ("rescue-shell", CTRL-ALT-F{1..6}) works (and the screens turn on when I do so). Switching back to the graphical session with CTRL-ALT-F7 also works, but cinnamon (my DE) greets me with a popup reporting an error and suggesting a restart. After restarting cinnamon, everything seems to work fine. The behaviour persists across reboots (rebooting itself works fine, without the issues mentioned).

The following outputs seem to be somewhat related (however, I could not make much sense of them):

# dmesg
[24860.036997] cinnamon[1263]: segfault at 4 ip 000072c22a4a9097 sp 00007ffc1bd4ad60 error 4 in libnvidia-glcore.so.560.35.03[6a9097,72c22a200000+c00000] likely on CPU 11 (core 20, socket 0)
[24860.037005] Code: 0f 1f 00 48 89 ef 5d e9 a7 cd 18 00 0f 1f 80 00 00 00 00 48 8b 05 69 6f 96 01 55 64 48 8b 28 83 ff 0f 0f 87 0f b3 bb ff 89 f8 <8b> 4e 04 8b 36 48 8d 90 64 4c 00 00 48 c1 e0 04 48 c1 e2 04 89 4c
[24885.468565] Bluetooth: hci2: command 0x0c24 tx timeout
[24885.468590] Bluetooth: hci2: Opcode 0x0c24 failed: -110
[24887.601846] Bluetooth: hci2: Opcode 0x0c24 failed: -110
[24887.601887] Bluetooth: hci2: command 0x0c24 tx timeout
[24919.362512] docker0: port 1(veth08ed4f0) entered blocking state
[24919.362516] docker0: port 1(veth08ed4f0) entered disabled state

# the bluetooth-errors were also presented to me in rescue-shell

# /var/log/Xorg.0.log
[ 24845.103] (EE) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
[ 24845.104] (EE) NVIDIA(0):     recover...
[ 24845.130] (II) NVIDIA(0): Error recovery was successful.
[ 24850.184] (WW) NVIDIA: Wait for channel idle timed out.

# /var/log/error.log
2024/10/06 19:14:19 [info] 1555#1555: epoll_wait() failed (4: Interrupted system call)

# $ pacman -Q linux cinnamon lightdm
linux 6.11.2.arch1-1
cinnamon 6.2.9-1
lightdm 1:1.32.0-6

I have not made any configuration (or hardware) changes in the past couple of weeks, and the most prominent package upgrade seems to have been the kernel upgrade mentioned above (6.10 -> 6.11). Any help or additional debugging hints are much appreciated.

Last edited by dr1fter (2024-10-08 06:31:34)

Offline

#2 2024-10-06 20:01:45

seth
Member
Registered: 2012-09-03
Posts: 58,508

Re: [solved] - Resume from suspend fails since upgrade to linux-6.11

the following outputs seem to be somewhat related

From your description, the system wakes up fine but cinnabun crashes.

Please post your complete system journal for the boot:

sudo journalctl -b | curl -F 'file=@-' 0x0.st

and your entire Xorg log

BT looks like https://bbs.archlinux.org/viewtopic.php?id=299972, which might be related to a freeze/stall, and if muffin has inherited mutter's insane RT policy, it committed suicide in response to that…

Online

#3 2024-10-07 05:50:25

dr1fter
Member
From: Germany
Registered: 2022-01-21
Posts: 37

Re: [solved] - Resume from suspend fails since upgrade to linux-6.11

system-journal: http://0x0.st/XEJS.txt
/var/log/Xorg.0.log:  http://0x0.st/XEJ1.0.log

Thanks for the pointer to the BT-related thread (I will check whether I can find some help there).

Offline

#4 2024-10-07 07:57:42

seth
Member
Registered: 2012-09-03
Posts: 58,508

Re: [solved] - Resume from suspend fails since upgrade to linux-6.11

Oct 06 19:14:19 arch kernel: spd5118 5-0051: Failed to write b = 0: -6
Oct 06 19:14:19 arch kernel: spd5118 5-0051: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
Oct 06 19:14:19 arch kernel: spd5118 5-0051: PM: failed to resume async: error -6
Oct 06 19:14:19 arch kernel: spd5118 5-0050: Failed to write b = 0: -6
Oct 06 19:14:19 arch kernel: spd5118 5-0050: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
Oct 06 19:14:19 arch kernel: spd5118 5-0050: PM: failed to resume async: error -6
Oct 06 19:14:19 arch kernel: spd5118 5-0052: Failed to write b = 0: -6
Oct 06 19:14:19 arch kernel: spd5118 5-0052: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
Oct 06 19:14:19 arch kernel: spd5118 5-0052: PM: failed to resume async: error -6
Oct 06 19:14:19 arch kernel: spd5118 5-0053: Failed to write b = 0: -6
Oct 06 19:14:19 arch kernel: spd5118 5-0053: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
Oct 06 19:14:19 arch kernel: spd5118 5-0053: PM: failed to resume async: error -6
…
Oct 06 19:14:19 arch kernel: NVRM: GPU at PCI:0000:01:00: GPU-1065b8ad-1b06-2566-f21f-9a65186c488d
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 11 Error
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 18 Error
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405840=0xa0040800
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405848=0x80000000
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ChID 0004, Class 0000c797, Offset 00000000, Data 00000000
…
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 18 Error
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405840=0x82040000
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405848=0x80000000
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ChID 0006, Class 0000c797, Offset 00000000, Data 00000000
…
Oct 06 19:14:26 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 11 Error
Oct 06 19:14:26 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 18 Error
Oct 06 19:14:26 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405840=0xa2040800
Oct 06 19:14:26 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405848=0x80000000
Oct 06 19:14:26 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ChID 0009, Class 0000c797, Offset 00000000, Data 00000000
…
Oct 06 19:14:34 arch kernel: cinnamon[1263]: segfault at 4 ip 000072c22a4a9097 sp 00007ffc1bd4ad60 error 4 in libnvidia-glcore.so.560.35.03[6a9097,72c22a200000+c00000] likely on CPU 11 (core 20, socket 0)
Oct 06 19:14:34 arch kernel: Code: 0f 1f 00 48 89 ef 5d e9 a7 cd 18 00 0f 1f 80 00 00 00 00 48 8b 05 69 6f 96 01 55 64 48 8b 28 83 ff 0f 0f 87 0f b3 bb ff 89 f8 <8b> 4e 04 8b 36 48 8d 90 64 4c 00 00 48 c1 e0 04 48 c1 e2 04 89 4c
Oct 06 19:14:34 arch systemd-coredump[104311]: Process 1263 (cinnamon) of user 1000 terminated abnormally with signal 11/SEGV, processing...
Oct 06 19:14:34 arch systemd[1]: Created slice Slice /system/systemd-coredump.
Oct 06 19:14:34 arch systemd[1]: Started Process Core Dump (PID 104311/UID 0).
Oct 06 19:14:35 arch systemd-coredump[104312]: Process 1263 (cinnamon) of user 1000 dumped core.
                                               
                                               Stack trace of thread 1263:
                                               #0  0x000072c22a4a9097 n/a (libnvidia-glcore.so.560.35.03 + 0x6a9097)
                                               #1  0x000072c22a71c438 n/a (libnvidia-glcore.so.560.35.03 + 0x91c438)
                                               #2  0x000072c22a60d579 n/a (libnvidia-glcore.so.560.35.03 + 0x80d579)
                                               #3  0x000072c22a5fca58 n/a (libnvidia-glcore.so.560.35.03 + 0x7fca58)
                                               #4  0x000072c22a6193e6 n/a (libnvidia-glcore.so.560.35.03 + 0x8193e6)

https://wiki.archlinux.org/title/NVIDIA … er_suspend
But https://docs.kernel.org/hwmon/spd5118.html is related to DDR5 RAM, so there might be an additional problem w/ memory integrity.

Does this only happen if you sleep for longer (eg. 4h in the presented case) or also after a 30 second nap?
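
To pull the same kinds of lines out of a long journal, a small filter can help. This is only a sketch: the function name `filter_resume_errors` is made up for illustration, and the patterns simply mirror the excerpt above (spd5118 resume failures, NVRM Xid messages, segfaults).

```shell
# Hypothetical helper: keep only the resume-related kernel messages
# discussed above. Reads journal text on stdin, so it works on a live
# journal as well as on a saved log file.
filter_resume_errors() {
    grep -E 'NVRM: Xid|spd5118|segfault at'
}

# Example calls (not run automatically):
#   journalctl -b -k --no-pager | filter_resume_errors
#   filter_resume_errors < saved-journal.txt
```

Reading from stdin keeps the helper independent of where the log comes from; `journalctl -b -k` limits the input to kernel messages of the current boot.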

Online

#5 2024-10-07 08:41:06

dr1fter
Member
From: Germany
Registered: 2022-01-21
Posts: 37

Re: [solved] - Resume from suspend fails since upgrade to linux-6.11

Does this only happen if you sleep for longer (eg. 4h in the presented case) or also after a 30 second nap?

I just checked: the same behaviour also shows when resuming after ~30s. This time, however, none of the running applications from my session survived the restart of cinnamon, and I had to restart it manually (CTRL-F2 + `r`), as it did not offer a restart via the popup dialogue mentioned above.
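
A short suspend cycle like this can be made repeatable by arming an RTC wake alarm before suspending. The following is a minimal sketch, not what was actually run in this thread: the function name `short_suspend_test` and the `DRY_RUN` switch are assumptions for illustration. It uses `rtcwake -m no` so that the suspend itself still goes through `systemctl suspend -i`, as in the first post.

```shell
# Hypothetical helper: arm an RTC alarm, then suspend via systemd.
# With DRY_RUN set to any non-empty value, the commands are only printed.
short_suspend_test() {
    local seconds="${1:-30}"
    local run=${DRY_RUN:+echo}
    # -m no: only set the wake-up time, do not suspend from rtcwake itself
    $run sudo rtcwake -m no -s "$seconds" && $run systemctl suspend -i
}

# Example calls (not run automatically):
#   DRY_RUN=1 short_suspend_test 30    # print the commands only
#   short_suspend_test 30              # suspend, wake after ~30s
# Afterwards, check the kernel log for the last minutes:
#   journalctl -b -k --since "2 min ago"
```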

Offline

#6 2024-10-07 15:43:54

seth
Member
Registered: 2012-09-03
Posts: 58,508

Re: [solved] - Resume from suspend fails since upgrade to linux-6.11

Same spd5118 errors? Do you get them w/ the LTS kernel?
What if you flat-out blacklist the module?

Online

#7 2024-10-07 17:20:45

dr1fter
Member
From: Germany
Registered: 2022-01-21
Posts: 37

Re: [solved] - Resume from suspend fails since upgrade to linux-6.11

Same spd5118 errors? Do you get them w/ the LTS kernel?

Will have to test. I might downgrade back to 6.10 instead of switching to LTS for testing (I did not have any issues before I upgraded to 6.11).

What if you flat-out blacklist the module?

just to double-check: you suggest I should blacklist kmod named `spd5118`?

Offline

#8 2024-10-07 19:26:42

seth
Member
Registered: 2012-09-03
Posts: 58,508

Re: [solved] - Resume from suspend fails since upgrade to linux-6.11

just to double-check: you suggest I should blacklist kmod named `spd5118`?

Yes - from what I can tell it's just a temperature sensor.

Online

#9 2024-10-07 20:56:19

dr1fter
Member
From: Germany
Registered: 2022-01-21
Posts: 37

Re: [solved] - Resume from suspend fails since upgrade to linux-6.11

blacklisting spd5118 did not seem to make any difference. I tried both `rmmod` (w/o reboot) and then blacklisting + reboot (I verified in both cases using lsmod that spd5118 was not loaded).
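
For readers who want to repeat this test, the two approaches could look roughly like the sketch below. The helper names (`write_blacklist`, `module_loaded`) and the file name `blacklist-spd5118.conf` are assumptions for illustration; the thread does not show the exact commands or file that were used.

```shell
# Approach 1 (current boot only), run as root:
#   rmmod spd5118

# Approach 2 (persistent): write a blacklist entry, then reboot.
# Hypothetical helper; the target directory defaults to /etc/modprobe.d.
write_blacklist() {
    local module="$1" dir="${2:-/etc/modprobe.d}"
    printf 'blacklist %s\n' "$module" > "$dir/blacklist-$module.conf"
}

# Hypothetical check: succeeds if the module appears in lsmod-style
# output on stdin (first column, header line skipped).
module_loaded() {
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Example calls (not run automatically, from a root shell):
#   write_blacklist spd5118
#   lsmod | module_loaded spd5118 && echo "still loaded" || echo "not loaded"
```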

Offline

#10 2024-10-07 21:06:48

seth
Member
Registered: 2012-09-03
Posts: 58,508

Re: [solved] - Resume from suspend fails since upgrade to linux-6.11

The error you're receiving is hallmark of VRAM decay, https://bbs.archlinux.org/viewtopic.php?id=294612
While that doesn't fit the 30s thing or explain the 6.11 condition, make sure to enable https://wiki.archlinux.org/title/NVIDIA … er_suspend

Online

#11 2024-10-08 06:30:41

dr1fter
Member
From: Germany
Registered: 2022-01-21
Posts: 37

Re: [solved] - Resume from suspend fails since upgrade to linux-6.11

Affirmative! Configuring nvidia-module as described here did in fact resolve the issue. It is however not necessary to blacklist spd5118 kmod.

Thank you so much for your quick help (again) :-)
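
The thread does not quote the exact settings that were applied. As a hedged sketch, preserving video memory across suspend with the proprietary driver usually means setting the `NVreg_PreserveVideoMemoryAllocations` module parameter and enabling the NVIDIA suspend/resume services. The helper name `write_nvidia_preserve_conf` and the file name `nvidia-power-management.conf` are assumptions for illustration.

```shell
# Hypothetical helper: write the module option that asks the nvidia
# driver to preserve video memory allocations across suspend.
write_nvidia_preserve_conf() {
    local dir="${1:-/etc/modprobe.d}"
    printf 'options nvidia NVreg_PreserveVideoMemoryAllocations=1\n' \
        > "$dir/nvidia-power-management.conf"
}

# Example steps (not run automatically, from a root shell):
#   write_nvidia_preserve_conf
#   systemctl enable nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service
# Reboot (regenerate the initramfs first if it includes the nvidia module),
# then check that the parameter is active:
#   grep PreserveVideoMemoryAllocations /proc/driver/nvidia/params
```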

Offline
