I have this issue where the Xorg server crashes / core dumps (not always but sometimes) when the laptop wakes from suspend while connected to my dock station and external screen.
This seems to be related to the AMD GPU driver, but I'm not sure whether it's a misconfiguration on my end or a bug that should be reported.
[ 73578.643] (EE) Backtrace:
[ 73578.644] (EE) 0: /usr/lib/Xorg (dri3_send_open_reply+0xdd) [0x55f93b8719dd]
[ 73578.644] (EE) 1: /usr/lib/libc.so.6 (__sigaction+0x50) [0x7f3a60eadf50]
[ 73578.644] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 73578.644] (EE) 2: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (?+0x0) [0x7f3a601bd1f2]
[ 73578.644] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 73578.645] (EE) 3: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (?+0x0) [0x7f3a601c1c9d]
[ 73578.645] (EE) unw_get_proc_name failed: no unwind info found [-10]
[ 73578.645] (EE) 4: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (?+0x0) [0x7f3a601b2872]
[ 73578.645] (EE) 5: /usr/lib/Xorg (dixSaveScreens+0x1fc) [0x55f93b7c6a2c]
[ 73578.645] (EE) 6: /usr/lib/Xorg (mieqProcessInputEvents+0x179) [0x55f93b764df9]
[ 73578.645] (EE) 7: /usr/lib/Xorg (ProcessInputEvents+0x1f) [0x55f93b880a9f]
[ 73578.645] (EE) 8: /usr/lib/Xorg (SProcXkbDispatch+0x26c8) [0x55f93b754da4]
[ 73578.645] (EE) 9: /usr/lib/libc.so.6 (__libc_init_first+0x90) [0x7f3a60e98790]
[ 73578.645] (EE) 10: /usr/lib/libc.so.6 (__libc_start_main+0x8a) [0x7f3a60e9884a]
[ 73578.645] (EE) 11: /usr/lib/Xorg (_start+0x25) [0x55f93b7552b5]
[ 73578.645] (EE)
[ 73578.645] (EE) Segmentation fault at address 0x0
[ 73578.645] (EE)
Fatal server error:
[ 73578.645] (EE) Caught signal 11 (Segmentation fault). Server aborting
I've dug around for a while and I've seen some slightly similar issues but nothing recent that I could find.
Any ideas on where to start?
Xorg full log: https://pastebin.com/Y1cKE5Du
dmesg full log: https://pastebin.com/mwXZNKVs
journalctl log: https://pastebin.com/XZPkz7yw
Specs:
Laptop: ThinkPad Z16 - AMD 6850H processor
GPU: Integrated AMD 680M
Docking station: ThinkPad Thunderbolt 4 Workstation Dock
Versions:
Linux 6.2.9-arch1-1
xf86-video-amdgpu-23.0.0-1
mesa-23.0.1-1
Last edited by mido (2023-04-03 05:53:27)
Offline
related to the AMD GPU driver
Do you get the same behavior (but maybe a better backtrace) w/ the modesetting driver (ie. when removing xf86-video-amdgpu and all config files referencing it)?
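A rough way to look for leftover config files that still reference the driver is sketched below. Treat it as a sketch only: find_amdgpu_refs is a made-up helper name, and the paths are just the usual Xorg config locations, which may not match every setup.

```shell
#!/bin/sh
# List Xorg config files that still mention amdgpu.
# find_amdgpu_refs is a hypothetical helper name; the paths passed at the
# bottom are the usual Xorg config locations, assumed rather than confirmed.
find_amdgpu_refs() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            # -r: recurse into directories, -l: print only matching file names
            grep -rl 'amdgpu' "$path" || true
        fi
    done
    return 0
}

find_amdgpu_refs /etc/X11/xorg.conf /etc/X11/xorg.conf.d /usr/share/X11/xorg.conf.d
```

After removing the package and restarting X, the backtrace in the Xorg log should no longer point into amdgpu_drv.so if the modesetting driver is in use.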
Offline
I haven't tried yet; reproducing the issue is tricky because it's not consistent.
It has something to do with the laptop being suspended for a while while connected to the docking station, then being removed from the dock while still suspended, and then being woken up.
It doesn't always crash; I'd say it happens roughly 1 in 10 times.
Would I be able to use Xorg with no performance difference if I uninstall the driver, given that the laptop only has an integrated GPU?
Offline
*No* *difference* is pretty specific, but it should not vary a lot (up or down, depending on the context)
Are you running a plasma session?
The kscreen daemon might trigger this.
Offline
Ah I see.
I'm running only i3 with sddm.
If it turns out it's the driver that causes this, what would be the approach to fixing this?
Offline
Ifff it is xf86-video-amdgpu: not using it.
The trend is towards the modesetting driver on top of the kernel module anyway (b/c wayland doesn't rely on those drivers at all)
Offline
I'll try to reproduce and update.
And excuse my ignorance, but would stuff like hardware video acceleration still work without the xf86-video-amdgpu driver? Or are they unrelated?
Judging from the package dependencies, xf86-video-amdgpu depends on mesa, for example, so I'm guessing it would still work?
Offline
They are not related, it will still work.
Offline
After uninstalling xf86-video-amdgpu the crashes are gone. However, they are replaced by a black screen that freezes the system when I try to disconnect the laptop from the dock (TB4, if it matters).
The lines in the journalctl log that seem interesting:
Apr 05 04:00:28 tpz kernel: ACPI: EC: interrupt blocked
Apr 05 04:00:28 tpz kernel: ACPI: EC: interrupt unblocked
Apr 05 04:00:28 tpz kernel: pcieport 0000:04:00.0: not ready 1023ms after resume; giving up
Apr 05 04:00:28 tpz kernel: pcieport 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Apr 05 04:00:28 tpz kernel: pcieport 0000:05:04.0: Unable to change power state from D3cold to D0, device inaccessible
Apr 05 04:00:28 tpz kernel: pcieport 0000:05:00.0: Unable to change power state from D3cold to D0, device inaccessible
Apr 05 04:00:28 tpz kernel: pcieport 0000:05:03.0: Unable to change power state from D3cold to D0, device inaccessible
Apr 05 04:00:28 tpz kernel: pcieport 0000:05:01.0: Unable to change power state from D3cold to D0, device inaccessible
Apr 05 04:00:28 tpz kernel: pcieport 0000:05:02.0: Unable to change power state from D3cold to D0, device inaccessible
full journalctl log: https://pastebin.com/EiyEeurC
Is this related to uninstalling the amdgpu driver? Is it fixable?
Offline
That's the thunderbolt bridge
freezes the system
You seem to be able to login from TTY2 fine?
What's the output of
xrandr --display :0
in that situation?
Do you somehow rely on the output being moved from some external display to the internal display when disconnecting from the dock or is the internal output always active?
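To pull just the connected outputs out of that xrandr output, something like the following might help. It is only a sketch: connected_outputs is a made-up helper name, and it assumes xrandr's usual "<name> connected ..." line layout.

```shell
#!/bin/sh
# Print the names of outputs that xrandr reports as connected.
# connected_outputs is a hypothetical helper; it reads xrandr text on stdin.
connected_outputs() {
    # Output lines look like "<name> connected ..." or "<name> disconnected ..."
    awk '$2 == "connected" { print $1 }'
}

# From TTY2, query the X server on display :0.
# XAUTHORITY may need to point at the session's cookie file; that path is
# setup-specific and not shown here.
if command -v xrandr >/dev/null 2>&1; then
    xrandr --display :0 --query 2>/dev/null | connected_outputs || true
fi
```

If the external output still shows up as connected after the dock has been unplugged, that would match the stale state described for the freeze.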
Offline
Yes, I was able to switch to TTY2 once (after a bit of banging on all keys and disconnecting the dock) and get the logs. And yes, when I returned to TTY1 the system was still trying to output to the external screen even though it was disconnected. When I opened arandr, the external screen was still active, and the windows that were on the external display had not been moved to the internal one.
Offline
https://aur.archlinux.org/packages/x-on-resize - but the actual xrandr output would probably be helpful.
(arandr is an abstracting wrapper)
I guess deactivating the external output before S3/detach from dock/wake would prevent the problems?
Offline
Sorry it has been a while, but the system had been behaving until today, when it froze upon waking from suspend while connected to the TB4 dock. This time I couldn't even switch to a TTY.
The main issue seems to be the following:
Apr 11 04:02:30 tpz kernel: [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
which eventually leads to
Apr 11 04:04:29 tpz kernel: [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
Apr 11 04:04:29 tpz kernel: [drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!
Apr 11 04:04:29 tpz kernel: Freezing user space processes failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
Apr 11 04:04:29 tpz kernel: task:Xorg state:D stack:0 pid:1527 ppid:1521 flags:0x00404006
Apr 11 04:04:29 tpz kernel: Call Trace:
Apr 11 04:04:29 tpz kernel: <TASK>
Apr 11 04:04:29 tpz kernel: __schedule+0x3c8/0x12e0
Apr 11 04:04:29 tpz kernel: schedule+0x5e/0xd0
Apr 11 04:04:29 tpz kernel: schedule_timeout+0x98/0x160
Apr 11 04:04:29 tpz kernel: ? __pfx_process_timeout+0x10/0x10
Apr 11 04:04:29 tpz kernel: wait_for_completion_timeout+0x83/0x170
Apr 11 04:04:29 tpz kernel: amdgpu_dm_process_dmub_aux_transfer_sync+0x67/0x1e0 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: dm_dp_aux_transfer+0xdc/0x1a0 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: drm_dp_dpcd_access+0xad/0x130 [drm_display_helper 459c9339bb4e515b13be92d93010e7ffa9f20647]
Apr 11 04:04:29 tpz kernel: drm_dp_dpcd_write+0x8d/0xe0 [drm_display_helper 459c9339bb4e515b13be92d93010e7ffa9f20647]
Apr 11 04:04:29 tpz kernel: dm_helpers_dp_write_dpcd+0x2c/0x50 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: core_link_write_dpcd+0x8f/0x100 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: dp_receiver_power_ctrl+0x42/0x60 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: power_down_all_hw_blocks+0x2e/0x180 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: dce110_enable_accelerated_mode+0x2ef/0x390 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: dc_commit_state_no_check+0x6ac/0xcd0 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: dc_commit_state+0x10b/0x130 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: amdgpu_dm_atomic_commit_tail+0x5c3/0x2d20 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: ? dcn31_validate_bandwidth+0x12b/0x2c0 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: ? dc_validate_global_state+0x310/0x3e0 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: ? dma_resv_iter_first_unlocked+0x66/0x70
Apr 11 04:04:29 tpz kernel: ? dma_resv_get_fences+0x61/0x220
Apr 11 04:04:29 tpz kernel: ? wait_for_completion_timeout+0x13e/0x170
Apr 11 04:04:29 tpz kernel: ? wait_for_completion_interruptible+0x139/0x1e0
Apr 11 04:04:29 tpz kernel: commit_tail+0x94/0x130
Apr 11 04:04:29 tpz kernel: drm_atomic_helper_commit+0x116/0x140
Apr 11 04:04:29 tpz kernel: drm_atomic_commit+0x9a/0xd0
Apr 11 04:04:29 tpz kernel: ? __pfx___drm_printfn_info+0x10/0x10
Apr 11 04:04:29 tpz kernel: drm_atomic_helper_set_config+0x74/0xb0
Apr 11 04:04:29 tpz kernel: drm_mode_setcrtc+0x515/0x7e0
Apr 11 04:04:29 tpz kernel: ? __pfx_drm_mode_setcrtc+0x10/0x10
Apr 11 04:04:29 tpz kernel: drm_ioctl_kernel+0xcd/0x170
Apr 11 04:04:29 tpz kernel: drm_ioctl+0x233/0x410
Apr 11 04:04:29 tpz kernel: ? __pfx_drm_mode_setcrtc+0x10/0x10
Apr 11 04:04:29 tpz kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu dea85af9f21526a783f0c672a8d2ac67d227b3c0]
Apr 11 04:04:29 tpz kernel: __x64_sys_ioctl+0x94/0xd0
Apr 11 04:04:29 tpz kernel: do_syscall_64+0x5f/0x90
Apr 11 04:04:29 tpz kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
Apr 11 04:04:29 tpz kernel: RIP: 0033:0x7f272e91353f
Apr 11 04:04:29 tpz kernel: RSP: 002b:00007ffe7385b0e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 11 04:04:29 tpz kernel: RAX: ffffffffffffffda RBX: 000055f4299fe980 RCX: 00007f272e91353f
Apr 11 04:04:29 tpz kernel: RDX: 00007ffe7385b170 RSI: 00000000c06864a2 RDI: 0000000000000013
Apr 11 04:04:29 tpz kernel: RBP: 00007ffe7385b170 R08: 0000000000000000 R09: 000055f429a05490
Apr 11 04:04:29 tpz kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c06864a2
Apr 11 04:04:29 tpz kernel: R13: 0000000000000013 R14: 000055f42981c5d8 R15: 00007ffe7385b220
Apr 11 04:04:29 tpz kernel: </TASK>
Full log: https://pastebin.com/4Jdzw04P
I have found the following issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2068 which seems to be similar to mine.
Is there a way that I can fix the crashes until a root cause is found?
EDIT: Might be related to this as well: https://bbs.archlinux.org/viewtopic.php?id=284076
Last edited by mido (2023-04-11 02:23:07)
Offline
Is there a way that I can fix the crashes until a root cause is found?
I guess deactivating the external output before S3/detach from dock/wake would prevent the problems?
(this could possibly be automated into a workaround)
Offline
I can try that, do you have suggestions for hooks to plug deactivation of the external display into? I mean a hook for pre-S3 and pre detach from dock?
Offline
https://wiki.archlinux.org/title/Power_ … stem-sleep
"pre detach from dock" would require a precognitive AI and despite all the hype, FatGDP still struggles w/ the basics
https://bbs.archlinux.org/viewtopic.php … 3#p2091193
https://bbs.archlinux.org/viewtopic.php … 7#p2082027
https://bbs.archlinux.org/viewtopic.php … 2#p2082782
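A pre-suspend hook along those lines could look roughly like this. It is a sketch under assumptions that are not from this thread: the external output is called DP-1, X runs on display :0, and handle_sleep_event is a made-up function name. Check the real output name with xrandr first.

```shell
#!/bin/sh
# Sketch of a systemd system-sleep hook, e.g. saved as an executable file in
# /usr/lib/systemd/system-sleep/ (the file name is up to you).
# systemd calls it with $1 = "pre" or "post" and $2 = the sleep type.
#
# Assumptions, not taken from the thread: the external output is named DP-1
# and X runs on display :0.
EXTERNAL_OUTPUT="${EXTERNAL_OUTPUT:-DP-1}"
X_DISPLAY="${X_DISPLAY:-:0}"
XRANDR="${XRANDR:-xrandr}"

handle_sleep_event() {
    case "$1" in
        pre)
            # Turn the external output off before the system suspends.
            "$XRANDR" --display "$X_DISPLAY" --output "$EXTERNAL_OUTPUT" --off
            ;;
        post)
            # After resume: --auto enables the output if it is still
            # connected and leaves it off if it has been unplugged.
            "$XRANDR" --display "$X_DISPLAY" --output "$EXTERNAL_OUTPUT" --auto
            ;;
    esac
}

handle_sleep_event "${1:-}"
```

The hook runs as root, so xrandr will most likely also need XAUTHORITY exported to the session's cookie file before it can talk to the X server; where that file lives depends on the display manager.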
Offline
Haha That would be nice
I've been trying the following:
Using linux-zen kernel
Replacing HDMI with DP (I've read in the forums that HDMI can be a bit unreliable)
So far I haven't been having issues but will update again in a couple of days to confirm, as the crashes are triggered by long suspends.
Offline
After switching to linux-zen and DP instead of HDMI, the frequency of the issue has decreased drastically. I no longer get the GPU timeout and reset regularly after suspend.
However, it still happens and seems to be related to GPU usage and (according to the drm/amd issue) voltage changes.
At this point, I'm not sure there's anything I can do, since it seems to be a long-standing issue on AMD's backlog.
I just thought I'd share my latest findings and the latest journalctl log showing the GPU timeout, reset and freeze.
Log file: https://pastebin.com/fVykFnii
Thank you to seth and v1del for your support, I'll post here if I have any updates on the issue.
Offline
amdgpu.dpm=0 amdgpu.aspm=0 amdgpu.runpm=0 amdgpu.bapm=0 amdgpu.sg_display=0
https://wiki.archlinux.org/title/Kernel_parameters
amdgpu.dpm=0 might break the boot and amdgpu.sg_display=0 is moot if the problem also affects the LTS kernel
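To confirm after a reboot that the options actually reached the kernel, a small check like this can be used. It is a sketch only, and list_amdgpu_params is a made-up helper name.

```shell
#!/bin/sh
# Print the amdgpu.* options found in a kernel command line string.
# list_amdgpu_params is a hypothetical helper name.
list_amdgpu_params() {
    # One token per line, then keep only the amdgpu.* ones.
    printf '%s\n' "$1" | tr ' ' '\n' | grep '^amdgpu\.' || true
}

# Show what the running kernel was actually booted with.
if [ -r /proc/cmdline ]; then
    list_amdgpu_params "$(cat /proc/cmdline)"
fi
```

The values the module ended up with can also be read from the files under /sys/module/amdgpu/parameters/.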
Offline
Yup, you called it: amdgpu.dpm=0 breaks the boot.
I've left the other parameters in place just in case.
Another update: TLP somehow makes things worse. Crashes are way more frequent.
Offline
Just cross referencing this here: https://gitlab.freedesktop.org/drm/amd/ … te_1933267
I found a way to reliably reproduce the issue, see the above comment.
Offline