Deep in the ~/.local/share/spotify-launcher/install/usr/share/spotify/libcef.so upstream binary and in libxul.so
There's 64GB swap, so it's not a sudden OOM. The proximity might hint at a temperature issue; do you monitor that?
You're using plasma/kwin + picom? Why? Try to kill picom.
Also, out on a limb: there have been multiple reported issues w/ the 545xx nvidia drivers, so you might try the 535xx versions of nvidia-utils and nvidia-dkms from the ALA (Arch Linux Archive).
Also, can you get FF to crash w/ "-safe-mode" (make sure to kill all FF processes before you spawn one w/ that switch), notably with HW acceleration disabled?
For spotify "--disable-gpu" *might* work (ie. being interpreted by electron)
It is the fat plus-sized "RAMly positive" things that crash.
If you simply allocate a lot of RAM, does that process die?
(You can use head and tail for that, https://unix.stackexchange.com/question … ree-memory )
Offline
Deep in the ~/.local/share/spotify-launcher/install/usr/share/spotify/libcef.so upstream binary and in libxul.so
There's 64GB swap, so it's not a sudden OOM. The proximity might hint at a temperature issue; do you monitor that?
Temperatures are usually between 40-50° C while browsing, so that shouldn't be a problem.
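Since a short spike is easy to miss when only glancing at a monitor now and then, here is a minimal sketch for logging the hwmon sensors straight from sysfs. The function name print_temps and its optional root argument are made up for illustration; "sensors" from lm_sensors reports the same values with proper labels.

```shell
# Print every hwmon temperature input below a sysfs root in degrees Celsius.
# The optional root argument (default /sys) only exists to make this testable.
print_temps() {
    root="${1:-/sys}"
    for f in "$root"/class/hwmon/hwmon*/temp*_input; do
        [ -r "$f" ] || continue
        milli=$(cat "$f" 2>/dev/null) || continue
        [ -n "$milli" ] || continue
        # sysfs reports millidegrees Celsius
        echo "$f $((milli / 1000))"
    done
}
```

Running it in a loop while reproducing the crash (for example "while sleep 1; do print_temps; done >> temps.log") would show whether a crash lines up with a temperature peak.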
You're using plasma/kwin + picom? Why? Try to kill picom.
More settings than the plasma compositor, like rounded corners and shadow control. I'll give that a try and report back, though IIRC I first ran Plasma without Picom and had the same issue.
Edit: Killed picom and Firefox crashed not a minute later, so that's not it.
Also, out on a limb: there have been multiple reported issues w/ the 545xx nvidia drivers, so you might try the 535xx versions of nvidia-utils and nvidia-dkms from the ALA (Arch Linux Archive).
Will give that a try as well, thanks.
Edit: Downgraded nvidia-dkms, nvidia-utils and lib32-nvidia-utils to 535xx, rebooted and a tab just crashed again. Sigh... This is probably the most frustrating issue I've ever had with an electronic device.
Also, can you get FF to crash w/ "-safe-mode" (make sure to kill all FF processes before you spawn one w/ that switch), notably with HW acceleration disabled?
For spotify "--disable-gpu" *might* work (ie. being interpreted by electron)
I already ran Firefox in safe mode and disabled HW acceleration in the settings, but it still crashes.
It is the fat plus-sized "RAMly positive" things that crash.
If you simply allocate a lot of RAM, does that process die?
(You can use head and tail for that, https://unix.stackexchange.com/question … ree-memory )
I've allocated about 25GB with head/tail, so that only 2GB were free, and it didn't crash (I let it run for about 1-2 minutes)
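For anyone repeating this test, a rough sketch of the head/tail trick: tail has to buffer its whole input because /dev/zero never produces a newline, so the piped bytes stay resident until the pipe closes. The function name hold_ram and its two arguments are my own, not taken from the linked answer.

```shell
# Hold roughly SIZE bytes in RAM for SECONDS seconds, then print how many
# bytes tail had buffered. sleep keeps the pipe open so tail cannot finish early.
hold_ram() {
    size="$1"
    seconds="$2"
    { head -c "$size" /dev/zero; sleep "$seconds"; } | tail | wc -c
}
```

Something like "hold_ram 25G 120" would approximate the test described above (GNU head accepts size suffixes); pick a size that leaves a couple of GB free.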
Last edited by NoisyFlake (2024-01-05 09:56:23)
Offline
Can you try to reproduce this in an openbox session? Also get rid of xdg-desktop-portal-kde, xdg-desktop-portal and rtkit, and put "export NO_AT_BRIDGE=1" into /etc/profile.d/noa11y.sh
The backtraces are all over the place, but the affected clients don't seem to be, so let's try to move them outside their usual habitat…
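For reference, the profile snippet is a one-liner; a minimal sketch of /etc/profile.d/noa11y.sh, assuming nothing else is meant to go into that file. The variable keeps GTK clients from loading the accessibility (AT-SPI) bridge.

```shell
# /etc/profile.d/noa11y.sh
# Sourced by login shells; stops GTK applications from loading the AT-SPI bridge.
export NO_AT_BRIDGE=1
```

It only takes effect for sessions started after the file exists, so log out and back in before testing.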
Offline
Can you try to reproduce this in an openbox session? Also get rid of xdg-desktop-portal-kde, xdg-desktop-portal and rtkit, and put "export NO_AT_BRIDGE=1" into /etc/profile.d/noa11y.sh
Did all that, both Discord and Firefox crashed within an hour.
I did some more googling for Firefox segfaulting in libxul.so, and in most cases it was a RAM issue. But as I said, I've replaced the complete kit with a brand new, certified one. Is it possible that the RAM bank of the mainboard might simply malfunction? Or the CPU?
Edit: When the whole browser crashed a few minutes ago, there wasn't a segfault, instead this is what I found in journalctl:
Jan 08 18:17:15 arch plasmashell[94317]: ExceptionHandler::GenerateDump cloned child
Jan 08 18:17:15 arch plasmashell[209903]: ExceptionHandler::WaitForContinueSignal waiting for continue signal...
Jan 08 18:17:15 arch plasmashell[94317]: 209903
Jan 08 18:17:15 arch plasmashell[94317]: ExceptionHandler::SendContinueSignalToChild sent continue signal to child
Jan 08 18:17:16 arch plasmashell[162596]: Exiting due to channel error.
Jan 08 18:17:16 arch plasmashell[209275]: Exiting due to channel error.
Jan 08 18:17:16 arch plasmashell[209321]: Exiting due to channel error.
Jan 08 18:17:16 arch plasmashell[209687]: Exiting due to channel error.
Jan 08 18:17:16 arch plasmashell[209108]: Exiting due to channel error.
Jan 08 18:17:16 arch 1password[93161]: ERROR 2024-01-08T17:17:16.056 tokio-runtime-worker(ThreadId(7)) [1P:native-messaging/op-native-core-integration/src/connection_handler.rs:62] message from b5x was None: EndConnection
Jan 08 18:17:16 arch 1password[93161]: ERROR 2024-01-08T17:17:16.056 tokio-runtime-worker(ThreadId(7)) [1P:native-messaging/op-native-core-integration/src/connection_handler.rs:31] Dropping connection with b5x due to error handling incoming message: EndConnection
Jan 08 18:17:16 arch plasmashell[208887]: Exiting due to channel error.
Last edited by NoisyFlake (2024-01-08 17:24:24)
Offline
FF showing up as plasmashell is a weird KDE related thing, likely due to things being started as user services.
The error is from FF and means that a child process (web renderer or so) noticed that the controlling process vanished.
Did you ever try to remove all but one DIMM?
Try to disable zswap and THP
https://wiki.archlinux.org/title/Zswap
https://bbs.archlinux.org/viewtopic.php … 7#p2110357 (nb. that the unit there has a typo, stray "]")
Offline
Did you ever try to remove all but one DIMM?
Yeah, I even did it again a few days ago after I installed the new RAM, same result though.
Try to disable zswap and THP
https://wiki.archlinux.org/title/Zswap
https://bbs.archlinux.org/viewtopic.php … 7#p2110357 (nb. that the unit there has a typo, stray "]")
Applied both and rebooted, Firefox crashed a few minutes later. And then a bit later, for the first time ever, the whole plasmashell crashed, so it doesn't seem to be just Browser/Electron-related anymore.
I'm fairly certain that it isn't a hardware issue though, but rather a software/Arch issue. Out of curiosity, I ran a Debian 12 live with several Firefox and Chrome tabs playing YouTube videos (which seems to trigger crashes more often, usually only takes a few minutes for the crash to happen) for several hours, and everything worked well. I then tried the same with an EndeavourOS live (which is basically just plain Arch with a fancy GUI installer), and FF crashed not 5 minutes into the test.
I have no idea what to do with this information, but maybe someone knows why this doesn't happen on Debian.
Last edited by NoisyFlake (2024-01-10 12:42:57)
Offline
Applied *how*?
sysfs is transient, whatever you change there is gone after a reboot.
Offline
Applied *how*?
sysfs is transient, whatever you change there is gone after a reboot.
I added zswap.enabled=0 and transparent_hugepage=never to my kernel parameters (and also added the disable thp service file and enabled it).
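To confirm the parameters actually took effect after the reboot, the running state can be read back from sysfs. A minimal sketch; the function names are made up, the two sysfs paths are the standard zswap and THP knobs.

```shell
# Extract the active choice from a bracketed sysfs list such as
# "always madvise [never]".
active_choice() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Print the current zswap and THP state (zswap reports Y or N).
check_mitigations() {
    echo "zswap enabled: $(cat /sys/module/zswap/parameters/enabled 2>/dev/null)"
    echo "THP: $(active_choice "$(cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null)")"
}
```

With both settings applied, check_mitigations should report "N" for zswap and "never" for THP.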
I really like Arch and don't want to switch to another Distro, but stability is necessary for a daily driver. Any idea why it doesn't crash on Debian?
Last edited by NoisyFlake (2024-01-10 15:30:52)
Offline
Which kernel on debian?
Offline
Which kernel on debian?
6.1.0-15
I already tried the linux-lts kernel (which was 6.1 until two days ago) on Arch though, and it still crashed.
Last edited by NoisyFlake (2024-01-10 15:57:31)
Offline
I had hoped that it was the kernel config, those two items specifically
Did you use the proprietary nvidia drivers or nouveau on debian?
Offline
Did you use the proprietary nvidia drivers or nouveau on debian?
I used the default that ships with the Debian 12 live image, so nouveau. Do you want me to try the nouveau driver on Arch? Mind that this is not something I could actually use for a long period of time since I do a lot of gaming on Arch as well.
Offline
It would certainly be interesting to know whether it's the relevant factor as that would tell us where to look for mitigations.
Ie. yes, you should absolutely test that.
Offline
I'm having problems switching to Nouveau. I removed nvidia-dkms and nvidia-utils (and the 32-bit version) and created a 20-nouveau.conf in /etc/X11/xorg.conf.d/, but after a reboot, no graphical user interface comes up. Instead, this is what I get from journalctl:
Jan 10 18:46:53 arch sddm[860]: Display server starting...
Jan 10 18:46:53 arch sddm[860]: Writing cookie to "/run/sddm/xauth_caPbwB"
Jan 10 18:46:53 arch sddm[860]: Running: /usr/bin/X -nolisten tcp -background none -seat seat0 vt2 -auth /run/sddm/xauth_caPbwB -noreset -displayfd 16
Jan 10 18:46:53 arch sddm[860]: Failed to read display number from pipe
Jan 10 18:46:53 arch sddm[860]: Display server stopping...
Jan 10 18:46:53 arch sddm[860]: Attempt 3 starting the Display server on vt 2 failed
Jan 10 18:46:53 arch sddm[860]: Could not start Display server on vt 2
Last edited by NoisyFlake (2024-01-10 17:52:52)
Offline
What did you add to the nouveau conf? You likely just want to remove it and run without any explicit config.
Offline
pre-edit you stated that nvidia was still loaded, probably because it's in the initramfs?
=> "mkinitcpio -P"
Also please post your Xorg log, https://wiki.archlinux.org/title/Xorg#General
Offline
What did you add to the nouveau conf? You likely just want to remove it and run without any explicit config.
True, the config wasn't necessary.
pre-edit you stated that nvidia was still loaded, probably because it's in the initramfs?
=> "mkinitcpio -P"
Also please post your Xorg log, https://wiki.archlinux.org/title/Xorg#General
Ah yes, of course, that did the trick, thanks. I'm running nouveau now and will report back soon. The performance is absolutely terrible, but I'll let FF run with a few YouTube tabs again.
Last edited by NoisyFlake (2024-01-10 20:33:40)
Offline
Also please post your Xorg log, https://wiki.archlinux.org/title/Xorg#General
just to make sure you didn't end up w/ vesa or the swrast driver now
Offline
Here you go: https://pastebin.com/ctPvakS7
Although it contains errors about loading nouveau, it later said that it was loaded and "lspci -v" also says "Kernel driver in use: nouveau". But the performance is really terrible. On the Debian live image, everything was working just as well as with the nvidia driver, but here the cursor is lagging really badly all the time.
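In case it helps others check which kernel driver ended up bound to the GPU, here is a small sketch that filters "lspci -k" output. The function name gpu_driver is made up, and the matching on "VGA compatible controller" / "3D controller" is an assumption about how the card is listed.

```shell
# Read "lspci -k" output on stdin and print the kernel driver bound to each
# VGA/3D controller. Device lines start in column 0, detail lines are indented.
gpu_driver() {
    awk '/VGA compatible controller|3D controller/ { gpu = 1; next }
         /^[^[:space:]]/ { gpu = 0 }
         gpu && /Kernel driver in use:/ {
             sub(/.*Kernel driver in use: */, "")
             print
         }'
}
```

Usage would be "lspci -k | gpu_driver", which should print nouveau or nvidia depending on what is loaded.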
Offline
No, that's fine - the xf86-video-nouveau ddx driver isn't required.
but here the cursor is lagging really badly all the time
It's in the log:
[ 24.012] (EE) event6 - Logitech Wireless Mouse PID:407f Mouse: client bug: event processing lagging behind by 42ms, your system is too slow
Either xf86-video-nouveau yields better performance or this is down to the desktop environment/compositor and output resolution(s)
Offline
Either xf86-video-nouveau yields better performance or this is down to the desktop environment/compositor and output resolution(s)
You're right, it was the compositor. After killing picom, everything was pretty smooth.
But unfortunately, the nvidia driver wasn't the issue, Firefox just crashed again.
Offline
Fuck.
Try "maxcpus=1" as kernel parameter (this will limit you to one core and disable SMP)
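To double-check that the parameter really reached the kernel, it can be looked up in /proc/cmdline. A minimal sketch; has_kernel_param is a made-up helper and its optional second argument only exists so a command line can be passed in directly.

```shell
# Succeed if the exact parameter appears on the kernel command line.
# $1 = parameter (e.g. maxcpus=1), $2 = optional command line string,
# defaulting to the contents of /proc/cmdline.
has_kernel_param() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

After rebooting with the parameter, "has_kernel_param maxcpus=1 && nproc" should print 1.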
Offline
Try "maxcpus=1" as kernel parameter (this will limit you to one core and disable SMP)
Nope, unfortunately still crashing.
Just wanted to take the time and say that I really appreciate all the effort y'all are putting into this. Keep the suggestions coming, there *has* to be a solution, otherwise Debian wouldn't run fine.
Edit: It seems that when running Firefox through strace, it takes significantly longer for a crash to happen. Usually when running around 10 YouTube videos at the same time, it takes about 5-10 minutes to crash. But when run through strace, it took almost an hour. Of course this could be a coincidence, but could it indicate a timing problem?
Last edited by NoisyFlake (2024-01-11 11:28:37)
Offline
Someone on Lemmy (probably) figured it out! I had to disable PBO (Precision Boost Overdrive) and Core Performance Boost in the UEFI/BIOS settings.
I didn't have a single segfault in the last two days, and since I usually wouldn't last more than a few hours without a crash, I think it's okay to assume that it is fixed.
Thanks everyone who helped me troubleshoot!
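For anyone who wants to test the boost theory before touching the firmware: some cpufreq drivers expose a global boost switch in sysfs. This is only a sketch under that assumption. The file is not present with every driver, and it is not the same thing as the PBO toggle in the UEFI, so treat it as a quick experiment rather than an equivalent fix. The function name boost_state is made up.

```shell
# Report whether CPU frequency boost is enabled according to a cpufreq "boost" file.
# The optional path argument is for illustration; the default path only exists
# with some cpufreq drivers.
boost_state() {
    f="${1:-/sys/devices/system/cpu/cpufreq/boost}"
    if [ ! -r "$f" ]; then
        echo "unknown"
        return 0
    fi
    case "$(cat "$f")" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}
```

Where the file exists, writing 0 to it as root (for example "echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost") turns boost off until the next reboot.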
Last edited by NoisyFlake (2024-02-07 16:48:24)
Offline
Let's hope that's been it, though with a caveat: it doesn't explain "why this doesn't happen on Debian"…
Offline