Since upgrading to linux-6.11 (same behaviour with both 6.11.1-arch1-1 and 6.11.2-arch1-1), when resuming from suspend (`systemctl suspend -i`), my machine seems to "wake up" (e.g. keyboard and mouse are powered on, the keyboard reacts to numlock, etc.), but my displays do not receive a signal. Entering a rescue-shell (CTRL-ALT-F{1..6}) works (and the screens are turned on upon doing so). Switching back to the graphical session using CTRL-ALT-F7 also works; however, cinnamon (my DE) greets me with a popup telling me about an error and suggesting a restart. After restarting cinnamon, everything seems to work fine. This behaviour persists across reboots (the reboot itself works fine, without the issues mentioned).
The following outputs seem to be somewhat related (however, I could not quite make sense of them):
# dmesg
[24860.036997] cinnamon[1263]: segfault at 4 ip 000072c22a4a9097 sp 00007ffc1bd4ad60 error 4 in libnvidia-glcore.so.560.35.03[6a9097,72c22a200000+c00000] likely on CPU 11 (core 20, socket 0)
[24860.037005] Code: 0f 1f 00 48 89 ef 5d e9 a7 cd 18 00 0f 1f 80 00 00 00 00 48 8b 05 69 6f 96 01 55 64 48 8b 28 83 ff 0f 0f 87 0f b3 bb ff 89 f8 <8b> 4e 04 8b 36 48 8d 90 64 4c 00 00 48 c1 e0 04 48 c1 e2 04 89 4c
[24885.468565] Bluetooth: hci2: command 0x0c24 tx timeout
[24885.468590] Bluetooth: hci2: Opcode 0x0c24 failed: -110
[24887.601846] Bluetooth: hci2: Opcode 0x0c24 failed: -110
[24887.601887] Bluetooth: hci2: command 0x0c24 tx timeout
[24919.362512] docker0: port 1(veth08ed4f0) entered blocking state
[24919.362516] docker0: port 1(veth08ed4f0) entered disabled state
# the bluetooth-errors were also presented to me in rescue-shell
# /var/log/Xorg.0.log
[ 24845.103] (EE) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
[ 24845.104] (EE) NVIDIA(0): recover...
[ 24845.130] (II) NVIDIA(0): Error recovery was successful.
[ 24850.184] (WW) NVIDIA: Wait for channel idle timed out.
# /var/log/error.log
2024/10/06 19:14:19 [info] 1555#1555: epoll_wait() failed (4: Interrupted system call)
# $ pacman -Q linux cinnamon lightdm
linux 6.11.2.arch1-1
cinnamon 6.2.9-1
lightdm 1:1.32.0-6
I did not make any configuration (or hardware) changes in the past couple of weeks, and the most prominent package upgrade seems to have been the mentioned upgrade of linux (6.10 -> 6.11). Any help / additional debugging hints are very much appreciated.
Last edited by dr1fter (2024-10-08 06:31:34)
Offline
the following outputs seem to be somewhat related
From your description, the system wakes up fine but cinnamon crashes.
Please post your complete system journal for the boot:
sudo journalctl -b | curl -F 'file=@-' 0x0.st
and your entire Xorg log
BT looks like https://bbs.archlinux.org/viewtopic.php?id=299972, which might be related to a freeze/stall, and if muffin has inherited mutter's insane RT policy, it committed suicide in response to that…
Online
system-journal: http://0x0.st/XEJS.txt
/var/log/Xorg.0.log: http://0x0.st/XEJ1.0.log
Thanks for pointing me to the BT-related thread (I will check whether I can find some help there).
Offline
Oct 06 19:14:19 arch kernel: spd5118 5-0051: Failed to write b = 0: -6
Oct 06 19:14:19 arch kernel: spd5118 5-0051: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
Oct 06 19:14:19 arch kernel: spd5118 5-0051: PM: failed to resume async: error -6
Oct 06 19:14:19 arch kernel: spd5118 5-0050: Failed to write b = 0: -6
Oct 06 19:14:19 arch kernel: spd5118 5-0050: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
Oct 06 19:14:19 arch kernel: spd5118 5-0050: PM: failed to resume async: error -6
Oct 06 19:14:19 arch kernel: spd5118 5-0052: Failed to write b = 0: -6
Oct 06 19:14:19 arch kernel: spd5118 5-0052: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
Oct 06 19:14:19 arch kernel: spd5118 5-0052: PM: failed to resume async: error -6
Oct 06 19:14:19 arch kernel: spd5118 5-0053: Failed to write b = 0: -6
Oct 06 19:14:19 arch kernel: spd5118 5-0053: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
Oct 06 19:14:19 arch kernel: spd5118 5-0053: PM: failed to resume async: error -6
…
Oct 06 19:14:19 arch kernel: NVRM: GPU at PCI:0000:01:00: GPU-1065b8ad-1b06-2566-f21f-9a65186c488d
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 11 Error
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 18 Error
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405840=0xa0040800
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405848=0x80000000
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ChID 0004, Class 0000c797, Offset 00000000, Data 00000000
…
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 18 Error
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405840=0x82040000
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405848=0x80000000
Oct 06 19:14:19 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ChID 0006, Class 0000c797, Offset 00000000, Data 00000000
…
Oct 06 19:14:26 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 11 Error
Oct 06 19:14:26 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 18 Error
Oct 06 19:14:26 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405840=0xa2040800
Oct 06 19:14:26 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405848=0x80000000
Oct 06 19:14:26 arch kernel: NVRM: Xid (PCI:0000:01:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ChID 0009, Class 0000c797, Offset 00000000, Data 00000000
…
Oct 06 19:14:34 arch kernel: cinnamon[1263]: segfault at 4 ip 000072c22a4a9097 sp 00007ffc1bd4ad60 error 4 in libnvidia-glcore.so.560.35.03[6a9097,72c22a200000+c00000] likely on CPU 11 (core 20, socket 0)
Oct 06 19:14:34 arch kernel: Code: 0f 1f 00 48 89 ef 5d e9 a7 cd 18 00 0f 1f 80 00 00 00 00 48 8b 05 69 6f 96 01 55 64 48 8b 28 83 ff 0f 0f 87 0f b3 bb ff 89 f8 <8b> 4e 04 8b 36 48 8d 90 64 4c 00 00 48 c1 e0 04 48 c1 e2 04 89 4c
Oct 06 19:14:34 arch systemd-coredump[104311]: Process 1263 (cinnamon) of user 1000 terminated abnormally with signal 11/SEGV, processing...
Oct 06 19:14:34 arch systemd[1]: Created slice Slice /system/systemd-coredump.
Oct 06 19:14:34 arch systemd[1]: Started Process Core Dump (PID 104311/UID 0).
Oct 06 19:14:35 arch systemd-coredump[104312]: Process 1263 (cinnamon) of user 1000 dumped core.
Stack trace of thread 1263:
#0 0x000072c22a4a9097 n/a (libnvidia-glcore.so.560.35.03 + 0x6a9097)
#1 0x000072c22a71c438 n/a (libnvidia-glcore.so.560.35.03 + 0x91c438)
#2 0x000072c22a60d579 n/a (libnvidia-glcore.so.560.35.03 + 0x80d579)
#3 0x000072c22a5fca58 n/a (libnvidia-glcore.so.560.35.03 + 0x7fca58)
#4 0x000072c22a6193e6 n/a (libnvidia-glcore.so.560.35.03 + 0x8193e6)
https://wiki.archlinux.org/title/NVIDIA … er_suspend
But https://docs.kernel.org/hwmon/spd5118.html is related to DDR5 RAM, so there might be an additional problem w/ memory integrity.
Does this only happen if you sleep for longer (eg. 4h in the presented case) or also after a 30 second nap?
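The relevant lines above were picked out of the full journal by hand. A minimal sketch of how the same kind of excerpt could be pulled from the kernel messages of the current boot, assuming the three patterns seen in this thread are the ones of interest (`filter_resume_errors` is a hypothetical helper name, not an existing tool):

```shell
# Hypothetical helper: keep only kernel log lines matching the patterns
# that showed up in this thread (NVIDIA Xid errors, spd5118, segfaults).
filter_resume_errors() {
    grep -E 'NVRM: Xid|spd5118|segfault'
}

# Usage on a live system (kernel messages of the current boot only):
#   journalctl -b -k --no-pager | filter_resume_errors
```

Comparing the timestamps of the filtered lines with the time of the resume shows whether the errors appear right at wake-up or only later.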
Online
Does this only happen if you sleep for longer (eg. 4h in the presented case) or also after a 30 second nap?
I just checked: the same behaviour also shows when resuming after ~30s. This time, however, none of the running applications from my session survived the restart of cinnamon, and I had to restart it manually (ALT-F2 + `r`), as it did not offer a restart via the mentioned popup dialogue.
Offline
Same spd5118 errors? Do you get them w/ the LTS kernel?
What if you flat-out blacklist the module?
Online
Same spd5118 errors? Do you get them w/ the LTS kernel?
Will have to test. I might downgrade back to 6.10 instead of going to LTS for testing (I did not have any issues before I upgraded to 6.11).
What if you flat-out blacklist the module?
just to double-check: you suggest I should blacklist kmod named `spd5118`?
Offline
just to double-check: you suggest I should blacklist kmod named `spd5118`?
Yes - from what I can tell it's just a temperature sensor.
Online
blacklisting spd5118 did not seem to make any difference. I tried both `rmmod` (w/o reboot) and then blacklisting + reboot (I verified in both cases using lsmod that spd5118 was not loaded).
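For reference, the two variants tried here can be sketched roughly as follows. The helper names `write_blacklist` and `is_loaded` and the file name `blacklist-spd5118.conf` are assumptions made for illustration; any `.conf` file under /etc/modprobe.d should work the same way.

```shell
# Hypothetical helper: write a modprobe blacklist entry for one module.
# $1 = module name, $2 = target directory (normally /etc/modprobe.d, needs root)
write_blacklist() {
    printf 'blacklist %s\n' "$1" > "$2/blacklist-$1.conf"
}

# Hypothetical helper: succeed if module $1 appears in lsmod-style input on stdin.
# The first line of lsmod output is a header, so it is skipped.
is_loaded() {
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# On the real system (as root):
#   rmmod spd5118                              # unload without rebooting
#   write_blacklist spd5118 /etc/modprobe.d    # keep it from auto-loading after a reboot
#   lsmod | is_loaded spd5118 || echo "spd5118 is not loaded"
```

Note that a `blacklist` line only stops automatic loading by alias; the module can still be loaded explicitly with `modprobe`.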
Offline
The error you're receiving is a hallmark of VRAM decay, https://bbs.archlinux.org/viewtopic.php?id=294612
While that doesn't fit the 30s thing or explain the 6.11 condition, make sure to enable https://wiki.archlinux.org/title/NVIDIA … er_suspend
Online
Affirmative! Configuring nvidia-module as described here did in fact resolve the issue. It is however not necessary to blacklist spd5118 kmod.
Thank you so much for your quick help (again) :-)
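For anyone finding this thread later: a minimal sketch of what the configuration amounts to, assuming the linked wiki section is the one about preserving video memory across suspend. The file name `nvidia-power-management.conf` and the helper `write_nvidia_pm_conf` are arbitrary choices for illustration; the module option and the three systemd units are the ones shipped with the proprietary driver.

```shell
# Hypothetical helper: write the module option that asks the NVIDIA driver
# to save and restore video memory allocations across suspend/resume.
# $1 = target directory (normally /etc/modprobe.d, needs root)
write_nvidia_pm_conf() {
    printf 'options nvidia NVreg_PreserveVideoMemoryAllocations=1\n' \
        > "$1/nvidia-power-management.conf"
}

# On the real system (as root), followed by a reboot:
#   write_nvidia_pm_conf /etc/modprobe.d
#   systemctl enable nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service
#
# Afterwards, check that the driver picked up the option:
#   grep PreserveVideoMemoryAllocations /proc/driver/nvidia/params
```

If the nvidia module is loaded from the initramfs, the image probably has to be regenerated before the option takes effect.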
Offline