See this image for how the crash looks: https://imgur.com/4XRtdx3.png (In reality it is not static; the lines flicker.)
I have to force reboot, although sometimes it reboots by itself. This has been happening for 2-3 months intermittently, before that everything worked perfectly.
System information: https://imgur.com/WPDJIYR.png
pacman -Q: https://pastebin.com/4EtuuTKk
journalctl -b -1: https://pastebin.com/RkCa2wdu (The last boot that just crashed)
I opened an SSH session from another computer and ran tail -F on all the log files. Neither the Xorg logs nor /sys/class/drm/card1/error showed anything just before the crash.
journalctl -f over SSH gave:
kernel: perf: interrupt took too long (4054 > 4045), lowering kernel.perf_event_max_sample_rate to 49200
I also ran `watch sensors` from an SSH session, and the temperatures were all well below the high and critical values in the lead-up to the crash.
The SSH connection also got dropped as soon as it crashed, which means that networking also goes down.
I have tried with modesetting and with xf86-video-intel.
I have tried with and without Vulkan.
All face the same issue.
I use libva-intel-driver as I have a Haswell chip.
Please let me know how I can fix this, thanks!
Note:
Very very rarely, this also happens *randomly*, without me needing to play any video.
Once, when I was working in vscode.
Once, when I opened a PNG image in mpv.
Once, when I was restoring from a timeshift backup.
Last edited by porridgewithraisins (2023-10-07 08:17:26)
Please post your Xorg log (https://wiki.archlinux.org/title/Xorg#General) and, if in doubt, get rid of xf86-video-intel.
Hi, I run rootless Xorg and have set it up so that the Xorg logs go to journald with the startx: prefix.
So filter by startx: in the journalctl output I sent.
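A rough sketch of that filter, assuming the Xorg lines carry the startx identifier as they do in the journal I posted. `xorg_lines` is only an illustrative helper name, and the file it reads is a saved copy of the journal text.

```shell
# Keep only lines logged under the "startx" identifier from a saved journal dump.
# Assumption: the X server output is tagged "startx[PID]:" as in this thread.
xorg_lines() {
    grep -E ' startx\[[0-9]+\]: ' "$1"
}

# On the affected machine itself, the equivalent query for the previous boot is:
#   journalctl -b -1 -t startx
```

Usage would be something like `xorg_lines journal.txt`, where journal.txt is the downloaded paste.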
Oct 03 22:48:35 aio brave-browser.desktop[8945]: MESA-INTEL: warning: Haswell Vulkan support is incomplete
Oct 03 23:51:18 aio brave-browser.desktop[14222]: MESA-INTEL: warning: Haswell Vulkan support is incomplete
…
Oct 04 02:12:56 aio startx[21582]: MESA-INTEL: warning: Haswell Vulkan support is incomplete
# the journal ends here
"Browser" is always brave? Do FF or chromium trigger the same?
Do you also get this using lavapipe?
https://wiki.archlinux.org/title/Vulkan … :_lavapipe
And/or when disabling https://wiki.archlinux.org/title/Xfwm#Composite_manager ?
I will try with firefox.
Then, separately, I will disable the XFCE compositor and try that as well.
Since I haven't been able to find a minimal reproducible example (the full screen video crash is also not after a uniform amount of time), and I have to essentially wait for it to crash, I'll get back to you on this as it happens.
FWIW:
- It doesn't crash when I am doing other stuff on the GPU, e.g. ffmpeg encoding or screen recording (I make sure to use H264_VAAPI whenever I do these things).
- It happened even outside the browser a few times, as I mentioned in the original post.
- About Vulkan, I don't think the browser is using it, as it is disabled on Linux by default and I didn't enable it in about:flags. about:gpu shows that it is disabled as well.
- I have included all the information in about:gpu here if it might be useful to you: https://pastebin.com/caaz5BNN
Last edited by porridgewithraisins (2023-10-04 15:42:58)
@Seth
I tried disabling the compositor first, and it worked! I turned off compositing, then left a video on overnight, and tail -f'd the log from another device.
I scheduled a manual shutdown at 2AM and it all went through smoothly.
Does this mean I can't use compositing at all? Or can I tweak xfwm4 so that it won't crash? Things like xfce4-screenshooter don't work well without a compositor, and I like a bit of transparency.
If you don't know of a tweak, I can ask on the Xfce GitLab.
I'd first see whether this is specific to the xfwm4 compositor or compositing in general by trying the behavior of eg. xcompmgr and/or picom.
Cool, I will try with picom, and get back to you.
So it crashes with xfwm4's compositor, and doesn't with picom.
xfwm4's compositor looks derived from xcompmgr and uses xrender. Afaik picom still defaults to xrender but also supports GLX rendering - did you use a picom config that activated the latter?
picom also lets you use GLX to sync to vblank, which xfwm4 apparently only controls via a command-line switch.
picom also lets you configure unredirection of fullscreen windows; I don't see a way for xfwm4 to control that (though it seems to do so). Given the circumstances of the bug, that's a big question.
Otherwise, if backend and vsync support are the same => https://gitlab.xfce.org/xfce/xfwm4/-/issues
If in doubt, post your picom config.
Yes, I activated the glx backend for picom. I also turned off vsync.
picom.conf: https://pastebin.com/AAtYBWQi
Regarding unredirection, xfwm4 has a "Display fullscreen overlay windows directly" option, enabled by default, in the compositor options in Window Manager Tweaks.
Last edited by porridgewithraisins (2023-10-06 09:09:28)
Try picom w/ the xrender backend and glx vsync to mimic the default behavior of xfwm4
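Roughly like this. It's only a sketch: it assumes a recent picom where --vsync is a plain on/off switch (older releases took a method name instead), and `picom_like_xfwm4` is just an illustrative wrapper name.

```shell
# Start picom with the xrender backend and vsync enabled.
# Set DRY_RUN=1 to print the command instead of running it.
picom_like_xfwm4() {
    set -- picom --backend xrender --vsync
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Turn off xfwm4's own compositor before starting it, so the two don't fight over the screen.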
I tried it with those settings. While enabling vsync causes moving windows to be jittery, there is NO crash.
Should I go and create an issue in the xfwm4 gitlab, or do you have something else for me to try? Also, since it seems as though it's an xfwm4 issue, I will try running it in DEBUG mode and seeing if it logs something.
For learning purposes: How did you know that compositors can cause such issues? With my knowledge, I kept pounding at the GPU driver, Xorg drivers, and various kernel parameters since it looked like a typical "hardware issue", while you got the correct issue on your first try.
Last edited by porridgewithraisins (2023-10-06 17:31:15)
If picom+xrender+vsync doesn't cause this but xfwm4 does, it's most likely a bug in the latter, or one that gets triggered by the latter (only) when unredirecting the window.
Did you btw. try to disable that in xfwm4?
> How did you know that compositors can cause such issues?
Experience - they're a big factor in the visual stack and the first thing to get out of the way when looking into graphical corruptions.
As hinted above, I'm not even sure that it's actually a bug in the compositor (I mean, it could still try to paint an invalidated Picture ID, who knows) but with the specific circumstances it might just expose a bug in the server that eg. picom sidesteps by delaying the unredirection.
Hi, in xfwm4, it seems that setting /general/vblank_mode to "off" from the default value of "auto" - which the only docs I could find say maps to "glx" - fixes the issue.
This might be the case of a single lucky test. I will run a test again tomorrow and find out.
The test in question is just playing a full screen video on loop for 1hr+
However, moving windows is jittery with this off, and I would like to avoid it.
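The loop test could be scripted along these lines. This is only a sketch: `loop_test` and the one-hour default are my own illustrative choices, and the video path is whatever file you want to loop.

```shell
# Play a video fullscreen on loop for a fixed time, then stop.
# $1: path to a video file, $2: duration for timeout(1), default 1h.
loop_test() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    # timeout ends mpv after the given duration; a crash would happen before that.
    timeout "${2:-1h}" mpv --fs --loop-file=inf "$1"
}
```

Running it while following the journal from another machine over SSH keeps the last messages visible if the box goes down.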
> Experience - they're a big factor in the visual stack and the first thing to get out of the way when looking into graphical corruptions.
Thanks!
Last edited by porridgewithraisins (2023-10-06 20:06:16)
I will also try the `xpresent` setting and see if it crashes with that.
----
What is interesting is that this was never a problem until a few months ago; I will try to recall exactly when. It must be some complex issue with how the GPU driver, the X modesetting driver, and xfwm4's compositor interact on my specific Haswell-era GPU, presumably because one of them has changed since then.
Last edited by porridgewithraisins (2023-10-06 20:08:56)
I tried it with xpresent, and no crashes! As such, the issue is solved.
I will open an issue on the xfwm4 GitLab, with logs taken with vblank_mode set to "glx" and to "xpresent".
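For reference, a sketch of switching the mode from a terminal. The channel and property are the ones discussed above; `set_vblank_mode` is a hypothetical helper, and I'm assuming xfwm4 needs a restart before it picks up the change.

```shell
# Set xfwm4's vblank mode through xfconf.
set_vblank_mode() {
    # Accept only the values mentioned in this thread.
    case "$1" in
        auto|off|glx|xpresent) ;;
        *) echo "unknown vblank mode: $1" >&2; return 1 ;;
    esac
    # If the property does not exist yet, xfconf-query also needs: -n -t string
    xfconf-query -c xfwm4 -p /general/vblank_mode -s "$1"
}

# Usage (assumption: restart xfwm4 afterwards so the setting is applied):
#   set_vblank_mode xpresent && xfwm4 --replace &
```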
Last edited by porridgewithraisins (2023-10-07 08:17:09)