Hi.
For some reason my Arch distro "randomly" logs out of the session and drops back to SDDM when I try to use 3D-heavy applications like Windows games (with Wine) or Blender.
I know my hardware is able to handle it because... welp it's still kinda high-end/middle-end hardware (Radeon RX 5700 XT, Ryzen 5 3600) and I didn't have this problem before.
I watched the global journalctl output to see if anything in there could help, but all I found was:
avril 19 09:22:39 pchost-1 sddm-helper[770]: pam_unix(sddm-greeter:session): session closed for user sddm
avril 19 09:22:39 pchost-1 sddm-helper[770]: [PAM] Closing session
avril 19 09:22:39 pchost-1 sddm-helper[770]: [PAM] Ended.
avril 19 09:22:39 pchost-1 sddm[660]: Auth: sddm-helper exited successfully
avril 19 09:22:39 pchost-1 sddm[660]: Greeter stopped.
avril 19 09:22:39 pchost-1 systemd[1]: session-c1.scope: Deactivated successfully.
avril 19 09:22:39 pchost-1 systemd[1]: session-c1.scope: Consumed 53.799s CPU time.
avril 19 09:22:39 pchost-1 systemd-logind[630]: Session c1 logged out. Waiting for processes to exit.
... the logout debug notification.
I'll provide any information that you need, just ask for it. Thanks in advance.
Last edited by byjove01 (2023-05-13 17:03:08)
Offline
Check dmesg and the general kernel messages from just before you get logged out. You are getting logged out "cleanly"; assuming this has anything to do with load, it's potentially the OOM killer killing your session. Post the full
sudo journalctl -b
after getting kicked: https://wiki.archlinux.org/title/List_o … n_services
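To narrow the kernel messages down before reading the whole journal, something like the following might help. It is only a sketch: `scan_kernel_log` is a made-up helper name and the pattern list is an assumption about what OOM kills and amdgpu faults usually look like, so treat a clean result as "nothing obvious", not as proof.

```shell
# Print only kernel log lines that hint at an OOM kill or an amdgpu fault.
# The pattern list is an assumption for illustration, not an exhaustive set.
scan_kernel_log() {
    grep -i -E 'out of memory|oom-kill|oom_reaper|amdgpu.*(error|timeout|reset)'
}

# Usage (on the affected machine, after getting kicked back to SDDM):
#   sudo journalctl -b -k | scan_kernel_log
```

`journalctl -b -k` limits the output to kernel messages of the current boot, which is roughly what dmesg shows.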
FWIW the symptoms sound like the following ongoing thread: https://bbs.archlinux.org/viewtopic.php?id=282511 - but that one is a hodgepodge of combined repositories and potentially conflicting installations.
Offline
Here are the links to the output logs you asked me to give.
dmesg => http://0x0.st/H8Us.txt
journalctl -b (after getting kicked) => https://0x0.st/H8UH.txt
Offline
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=173815, emitted seq=173817
avril 20 09:34:04 pchost-1 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process blender pid 11053 thread blender:cs0 pid 11096
so yeah blender crashing your amdgpu.
Sadly this can happen in a variety of ways and for a variety of reasons, and there are a bunch of bug reports on this issue, e.g. https://gitlab.freedesktop.org/drm/amd/-/issues/1974. That one suggests putting the GPU into high performance mode, since some guesses point toward insufficient power/voltage. So try some of the feature masks mentioned there, and maybe explicitly enable high performance mode as described in https://wiki.archlinux.org/title/AMDGPU … cy_problem, for example.
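For a quick test of the high performance mode, a minimal sketch along these lines should do. Assumptions: the GPU shows up as card0 (check /sys/class/drm/ for the real name) and `set_dpm_level` is just an illustrative helper name; the sysfs file itself is the amdgpu `power_dpm_force_performance_level` knob.

```shell
# Write a performance level to the amdgpu sysfs knob.
# Assumption: the GPU is card0; list /sys/class/drm/ to find the real name.
set_dpm_level() {
    level="$1"
    dev="${2:-/sys/class/drm/card0/device}"
    case "$level" in
        auto|low|high) ;;
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
    printf '%s\n' "$level" > "$dev/power_dpm_force_performance_level"
}

# Usage (in a root shell; the setting is lost on reboot):
#   set_dpm_level high    # force the highest clocks while testing
#   set_dpm_level auto    # back to automatic power management
```

Since nothing is persisted, this is safe to try once, run Blender, and see whether the reset still happens.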
OT: Your gitea is crashing hundreds of times, might want to fix the perms on that config file or simply disable and stop the gitea service since it isn't coming up anyway
Last edited by V1del (2023-04-20 08:40:58)
Offline
Knowing I didn't have these problems before, I feel like putting my GPU in high performance mode would be a bit... mmh, avoiding the heart of the problem and admitting defeat... welp, I mean it sounds awful to be forced to change my GPU config just to avoid crashes I didn't have before. Maybe I'm misunderstanding the idea, but won't that lower my GPU's power?
I'll try reinstalling my video drivers.
I noticed about Gitea, thanks for pointing it out anyways, I'll solve that.
Offline
No, setting it to high performance mode will make it run at a higher clock frequency in order to prevent stalls from faulty power management attempts. It should/would draw more power but be generally more performant.
When was "working before"? Specific kernel version? You can also test e.g. the LTS kernel.
Offline
OP is missing the GCVM_L2_PROTECTION_FAULT_STATUS that got relevant for the other thread and some more.
when I try to use 3D-powered applications like Windows games (with Wine) or Blender
would/should™ move to the performance mode anyway and regardless of the desired behavior, testing the impact of the HP mode on the situation will help to understand the nature of it.
Pink elephant in the room is that apparently the X11 server crashes for the (successful) GPU reset (induced by blender)
=> Please post your Xorg log, https://wiki.archlinux.org/title/Xorg#General and in doubt get rid of xf86-video-amdgpu
Offline
Here is the Xorg log, sorry for the delay.
https://pastebin.com/4SHv5hzT
Got rid of xf86-video-amdgpu and it didn't solve anything.
Offline
Please post a "clean" journal (w/o the gitea noise) covering the latest crash and, if it's still recorded in that, also the Xorg.0.log.old
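One way to get such a "clean" journal is to filter the noisy lines out before saving it. This is only a sketch: `drop_noise` is a hypothetical helper, and it simply drops every line mentioning the given word, so it would also hide unrelated lines that happen to contain it.

```shell
# Drop journal lines that mention a noisy unit before sharing the log.
# "gitea" is the default only because that is the noise in this thread.
drop_noise() {
    grep -v -i -- "${1:-gitea}"
}

# Usage (journal.txt is just an example file name):
#   sudo journalctl -b | drop_noise > journal.txt
```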
Offline
journalctl -b => https://pastes.io/2xypx7i4mx
Xorg.0.log => https://pastebin.com/c3EY1x3D
Xorg.0.log.OLD => https://pastebin.com/5aSjmUVC
Here we are.
Last edited by byjove01 (2023-04-24 07:59:17)
Offline
https://pastebin.com/Tqc2fkpn is 404, Xorg.0.log.old just stops (w/ the last message that also trails the current log)
Offline
Edited the journalctl link. Also, I can't do anything about the Xorg.0.log.old file being cut off, because I just did Ctrl+A and copy-pasted its content into the pastebin.
Offline
Ftr
sudo journalctl -b | curl -F 'file=@-' 0x0.st
But we are where we were:
avril 24 09:44:18 pchost-1 kernel: amdgpu 0000:28:00.0: amdgpu: GPU reset(2) succeeded!
avril 24 09:44:18 pchost-1 plasmashell[1052]: amdgpu: amdgpu_cs_query_fence_status failed.
avril 24 09:44:18 pchost-1 plasmashell[2354]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
avril 24 09:44:18 pchost-1 plasmashell[2354]: amdgpu: The process will be terminated.
avril 24 09:44:18 pchost-1 plasmashell[2354]: Freeing memory after the leak detector has run. This can happen when using static variables in C++ that are defined outside of functions. To fix this error, use the 'construct on first use' idiom.
avril 24 09:44:18 pchost-1 plasmashell[2354]: Freeing memory after the leak detector has run. This can happen when using static variables in C++ that are defined outside of functions. To fix this error, use the 'construct on first use' idiom.
avril 24 09:44:18 pchost-1 plasmashell[2354]: blender: ../libepoxy/src/dispatch_common.c:872: epoxy_get_proc_address: Assertion `0 && "Couldn't find current GLX or EGL context.\n"' failed.
avril 24 09:44:18 pchost-1 plasmashell[2354]: Freeing memory after the leak detector has run. This can happen when using static variables in C++ that are defined outside of functions. To fix this error, use the 'construct on first use' idiom.
avril 24 09:44:18 pchost-1 dolphin[1681]: The X11 connection broke (error 1). Did the X11 server die?
avril 24 09:44:18 pchost-1 konsole[1902]: The X11 connection broke (error 1). Did the X11 server die?
avril 24 09:44:18 pchost-1 xscreensaver[1206]: X connection to :0 broken (explicit kill or server shutdown).
The GPU resets (under the pressure of blender) and then the session crashes because apparently Xorg did, but there's no record of that in the xorg log - so we're kinda stuck.
Attach gdb to the Xorg process, trigger the crash and hopefully we'll get better information this way.
nb. that to attach gdb you'll have to
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
and you want to do that from a different VT because you'll have to "continue" to, well, continue the processing of the display server.
You can likewise
gdb --pid $(pidof Xorg) 2>&1 | tee /tmp/xorg.gdb
to log the session.
Offline
I did that, but gdb printed "30 return SYSCANCEL" or something like that; I don't remember most of the line. It dropped me back to the gdb prompt and I felt forced to restart the computer.
Isn't there a hacky way to get several VTs on the same display? Because it's actually frustrating to be switching the display each time I want to type a part of the line.
Offline
I felt forced to restart the computer.
Because it's actually frustrating to be switching the display each time I want to type a part of the line.
I'm not sure what you're talking about, but you can ssh into the system and run gdb in the ssh shell on the other system.
Offline
I'm the one not getting it sorry. I don't have another system to make a SSH process from, so I can't do what you're inviting me to.
I was talking about a potential hacky way to get several VTs on the same display, so I could follow your instructions without wrecking the Xorg process.
Offline
You're gonna crash Xorg anyway.
* Start an X11 session
* deactivate the ptrace scope limitation
* head over to a different VT (ctrl+alt+f2,f3,…)
* login
* gdb --pid $(pidof Xorg) 2>&1 | tee /tmp/xorg.gdb
* type "continue"
* head back to the Xorg session
* make it crash
* move back to the VT w/ gdb
* type "bt", then "detach" and "quit"
* cat /tmp/xorg.gdb | curl -F 'file=@-' 0x0.st
If you have issues memorizing the commands for the console login, somebody invented pencil and paper some centuries ago
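The gdb part of the steps above can also be run without typing anything at the gdb prompt. The following is a rough sketch under assumptions, not a tested recipe for this crash: `xorg_backtrace` and the `GDB` override are made-up names, it assumes a single Xorg process, and it assumes SIGUSR1 and SIGPIPE are not the crash itself, so gdb is told to pass them through instead of pausing Xorg on them. Whether that changes anything here is a guess.

```shell
# Attach gdb to Xorg non-interactively and log a backtrace when it stops.
# Assumptions: a single Xorg process; SIGUSR1/SIGPIPE are not the crash itself.
# GDB can be overridden (hypothetical knob, e.g. GDB=echo for a dry run).
xorg_backtrace() {
    pid="${1:-$(pidof Xorg)}"
    log="${2:-/tmp/xorg.gdb}"
    "${GDB:-gdb}" --batch --pid "$pid" \
        -ex 'handle SIGUSR1 nostop noprint pass' \
        -ex 'handle SIGPIPE nostop noprint pass' \
        -ex 'continue' \
        -ex 'bt' \
        -ex 'detach' 2>&1 | tee "$log"
}

# Usage (from another VT, after lifting the ptrace scope limitation;
# in a root shell if Xorg is owned by root):
#   xorg_backtrace
```

In batch mode gdb runs the commands in order, so "bt" only runs once "continue" returns, i.e. when Xorg stops on a signal that is not passed through.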
Offline
I did it successfully up to the "head back to the Xorg session" part; the screen just froze when I pressed Ctrl+Alt+F1 and I was forced to restart my computer because there were no signs of Xorg working correctly.
Offline
Did you "continue" in gdb?
Could you also not head back to the VT w/ gdb?
If the Xorg process is owned by the root user, you'll have to run gdb as root as well.
You can maybe try to kinda remote-control this
sleep 5; gdb --pid $(pidof Xorg) -ex continue 2>&1 | tee /tmp/Xorg.gdb
then return to the X11 terminal before gdb starts, hope that it works, crash X11, return to gdb and "bt" etc. there.
Also try to suspend the compositor (SHIFT+Alt+F12) before any of this in case that is what's stopping the output and rendering black.
Offline
1) Yes.
2) No, I'd like to but it just freezes each time I try to switch VT.
3) I already did that.
Nope... It just does not work. My screen freezes the moment GDB starts (even if I go back to the X11 session) and I don't get the time to reproduce the Xorg crash myself.
Last edited by byjove01 (2023-04-26 08:01:27)
Offline
I don't have another system to make a SSH process from
Any chance to change anything about that anytime soon?
Offline
I don't have another system to make a SSH process from
Any chance to change anything about that anytime soon?
Probably not. I can try but I'm not sure I'll succeed. Isn't there a way to get the reason behind the screen freeze problem I'm describing?
Offline
Isn't there a way to get the reason behind the screen freeze problem I'm describing?
Not w/o understanding why the screen even "freezes".
You could try to see whether this is a plasma problem by running blender on an openbox session.
Offline
Isn't there a way to get the reason behind the screen freeze problem I'm describing?
Not w/o understanding why the screen even "freezes".
You could try to see whether this is a plasma problem by running blender on an openbox session.
Same issue. Openbox crashes and I go back to SDDM. Probably not a Plasma thing, but really an Xorg one.
Also, I was talking about the screen freeze I get after trying to put GDB to analyze the Xorg process, sorry, I should've been more precise.
Last edited by byjove01 (2023-04-27 07:47:09)
Offline
Also, I was talking about the screen freeze I get after trying to put GDB to analyze the Xorg process
Me likewise.
This is somewhat of a chicken-and-egg problem - we'd need to know why the output freezes when trying to debug Xorg but can't, because we can't debug Xorg w/o freezing the output.
Remote access would be somewhat helpful. There are ssh clients for android, if that helps you at all.
Offline