#1 2025-08-04 09:45:19

Whoracle
Member
Registered: 2010-11-02
Posts: 202

Random full system freeze

Hello everyone

For quite a while now I've been experiencing full system freezes at random. The only common factor I can find is that it (almost always) happens while I'm in an active Teams meeting in Vivaldi. The whole system just locks up: no more audio, no reaction to keypresses at all, no reaction to CTRL+ALT+DEL; only a poweroff via the power button does anything. I can't switch ttys, I just have the last frame on-screen and that's it.

This doesn't happen predictably: sometimes I get 3 of these in a day, sometimes two in a row within minutes of each other, sometimes it's days between crashes.

I don't think it's load, because I do not experience any lockups when stress testing the system or playing a game or whatever.

journalctl -b -1 right after the crash doesn't show anything:

Aug 04 11:13:20 lynxcore sudo[50920]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --scan
Aug 04 11:13:20 lynxcore sudo[50920]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=959)
Aug 04 11:13:20 lynxcore sudo[50920]: pam_unix(sudo:session): session closed for user root
Aug 04 11:13:20 lynxcore sudo[50924]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --scan --device=nvme
Aug 04 11:13:20 lynxcore sudo[50924]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=959)
Aug 04 11:13:20 lynxcore sudo[50924]: pam_unix(sudo:session): session closed for user root
Aug 04 11:13:20 lynxcore sudo[50927]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --info --health --attributes --tolerance=verype>
Aug 04 11:13:20 lynxcore sudo[50929]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --info --health --attributes --tolerance=verype>
Aug 04 11:13:20 lynxcore sudo[50928]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --info --health --attributes --tolerance=verype>
Aug 04 11:13:20 lynxcore sudo[50930]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --info --health --attributes --tolerance=verype>
Aug 04 11:13:20 lynxcore sudo[50927]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=959)
Aug 04 11:13:20 lynxcore sudo[50929]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=959)
Aug 04 11:13:20 lynxcore sudo[50928]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=959)
Aug 04 11:13:20 lynxcore sudo[50930]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=959)
Aug 04 11:13:20 lynxcore sudo[50928]: pam_unix(sudo:session): session closed for user root
Aug 04 11:13:20 lynxcore sudo[50927]: pam_unix(sudo:session): session closed for user root
Aug 04 11:13:20 lynxcore sudo[50930]: pam_unix(sudo:session): session closed for user root
Aug 04 11:13:20 lynxcore sudo[50929]: pam_unix(sudo:session): session closed for user root
Aug 04 11:13:30 lynxcore sudo[50940]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --scan
Aug 04 11:13:30 lynxcore sudo[50940]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=959)
Aug 04 11:13:30 lynxcore sudo[50940]: pam_unix(sudo:session): session closed for user root
Aug 04 11:13:30 lynxcore sudo[50945]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --scan --device=nvme
Aug 04 11:13:30 lynxcore sudo[50945]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=959)
Aug 04 11:13:30 lynxcore sudo[50945]: pam_unix(sudo:session): session closed for user root
Aug 04 11:13:30 lynxcore sudo[50951]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --info --health --attributes --tolerance=verype>
Aug 04 11:13:30 lynxcore sudo[50950]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --info --health --attributes --tolerance=verype>
Aug 04 11:13:30 lynxcore sudo[50949]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --info --health --attributes --tolerance=verype>
Aug 04 11:13:30 lynxcore sudo[50948]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --info --health --attributes --tolerance=verype>
Aug 04 11:13:30 lynxcore sudo[50951]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=959)
Aug 04 11:13:30 lynxcore sudo[50950]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=959)
Aug 04 11:13:30 lynxcore sudo[50949]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=959)
Aug 04 11:13:30 lynxcore sudo[50948]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=959)
Aug 04 11:13:30 lynxcore sudo[50948]: pam_unix(sudo:session): session closed for user root
Aug 04 11:13:30 lynxcore sudo[50951]: pam_unix(sudo:session): session closed for user root
Aug 04 11:13:30 lynxcore sudo[50949]: pam_unix(sudo:session): session closed for user root
Aug 04 11:13:30 lynxcore sudo[50950]: pam_unix(sudo:session): session closed for user root

Full journal from journalctl -b -1 after the last crash: https://0x0.st/8hsn.txt

I'm logging temps for both my GPU and the rest of my system via telegraf to my homelab, and I don't see a rise in temps over the telegraf interval (10 seconds), so if it's a thermal thing, it happens FAST. See this image of the minute during which the crash occurred (metrics collected by telegraf and sent to a remote server): https://img.lynxcore.org/250804_114222.png
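One way to test the "it happens FAST" theory would be to sample temperatures locally at sub-second intervals and force every sample to disk, so the last reading survives a hard power-off. The following is only a minimal sketch, assuming the sensors show up as temp*_input files (millidegrees Celsius) under /sys/class/hwmon. The log path, the 0.25 s interval and all function names are made up for illustration. The proprietary NVIDIA driver may not expose the GPU through hwmon at all, in which case this would only cover CPU, board and NVMe sensors.

```python
import glob
import os
import time


def read_temps(hwmon_root="/sys/class/hwmon"):
    """Return {sensor_path: degrees_C} for every readable temp*_input file."""
    temps = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                temps[path] = int(f.read().strip()) / 1000.0  # millidegrees -> degrees
        except (OSError, ValueError):
            continue  # sensor vanished or returned garbage; skip it
    return temps


def log_sample(fd, temps, now=None):
    """Append one line to the open file descriptor and fsync it right away."""
    now = time.time() if now is None else now
    fields = " ".join(
        f"{os.path.basename(os.path.dirname(p))}/{os.path.basename(p)}={t:.1f}"
        for p, t in temps.items()
    )
    line = f"{now:.3f} {fields}\n"
    os.write(fd, line.encode())
    os.fsync(fd)  # make sure the sample is on disk before the next one is taken
    return line


def run(log_path="/var/tmp/temps.log", interval=0.25):  # hypothetical path and interval
    """Sample forever; stop with Ctrl+C."""
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        while True:
            log_sample(fd, read_temps())
            time.sleep(interval)
    finally:
        os.close(fd)
```

Calling run() in a spare terminal and checking the tail of the log after the next freeze would show whether any sensor spiked in the last fraction of a second. If the last samples are flat, a thermal cause becomes less likely.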

Specs:
AMD Ryzen 7 7800X3D
NVIDIA RTX 3080 FE
64 GB Corsair Vengeance DDR5 @ 6200 MT/s

Most likely useless NeoFetch output:

OS: Arch Linux x86_64 
Kernel: 6.15.9-arch1-1 
Uptime: 24 mins 
Packages: 1929 (pacman), 19 (flatpak) 
Shell: zsh 5.9 
Resolution: 1200x1920, 3440x1440, 2560x1440 
WM: awesome 
Theme: Adwaita-dark [GTK2/3] 
Icons: Obsidian-Mint [GTK2/3] 
Terminal: urxvt 
Terminal Font: Source Code Pro 
CPU: AMD Ryzen 7 7800X3D (16) @ 5.053GHz 
GPU: NVIDIA GeForce RTX 3080 
GPU: AMD ATI 0d:00.0 Raphael 
Memory: 12218MiB / 63427MiB 

amdgpu is blacklisted because I don't use the on-board GPU.

Things I didn't yet try:
- REISUB/SysRq
- SSHing into the machine

Because I'm in active meetings when this occurs I usually don't have time to try those two. Will try when it happens again in one of the more useless meetings.
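Before relying on REISUB, it may be worth checking that the kernel would honour all six keys. Which SysRq functions are allowed is controlled by the bitmask in /proc/sys/kernel/sysrq. As far as I know, systemd's default sysctl settings set it to 16 (sync only), so most of the sequence would be ignored unless that was changed. A rough sketch using the bit values from the kernel's sysrq documentation; the function and dictionary names are illustrative only.

```python
# Bit values as listed in the kernel's sysrq documentation
# (Documentation/admin-guide/sysrq.rst).
REISUB_BITS = {
    "R (unraw keyboard)": 4,
    "E (SIGTERM all)": 64,
    "I (SIGKILL all)": 64,
    "S (sync)": 16,
    "U (remount read-only)": 32,
    "B (reboot)": 128,
}


def allowed_reisub_keys(mask):
    """Map each REISUB step to True/False for a given kernel.sysrq value."""
    if mask == 1:  # exactly 1 means "enable all functions"
        return {name: True for name in REISUB_BITS}
    return {name: bool(mask & bit) for name, bit in REISUB_BITS.items()}


def read_sysrq_mask(path="/proc/sys/kernel/sysrq"):
    """Read the current kernel.sysrq value."""
    with open(path) as f:
        return int(f.read().strip())
```

Something like print(allowed_reisub_keys(read_sysrq_mask())) lists which steps are currently permitted. A value of 1, or any mask that includes 244, would allow the full sequence.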

Definitely open programs when this occurs:
- 1 Instance of Vivaldi running the teams meeting

Usually, but not always open programs:
- 1 additional Vivaldi instance
- 1 instance of google chrome
- 1 or 2 instances of rxvt-unicode
- sublime-text
- sometimes I'm connected to the company VPN via OpenVPN, but not always

Any pointers welcome.

Last edited by Whoracle (2025-08-04 10:13:58)

#2 2025-08-04 15:21:41

xerxes_
Member
Registered: 2018-04-29
Posts: 1,028

Re: Random full system freeze

Do you use Vivaldi from the Arch repo, from the AUR or from elsewhere? Have you tried running the Teams meeting in other browsers (Chromium, Firefox) from the Arch repo?

Whoracle wrote:

Things I didn't yet try:
- REISUB/SysRq
- SSHing into the machine

Definitely try!

Also see: https://wiki.archlinux.org/title/Ryzen

#3 2025-08-04 15:42:32

Whoracle
Member
Registered: 2010-11-02
Posts: 202

Re: Random full system freeze

Vivaldi from extra. I had the same issue in Chromium and Chrome a few months back before I switched my work profile over to Vivaldi. Never tried Firefox, don't have that installed. Will go over the Ryzen article, but I doubt I'll find the solution in there since the issue is so... specialized? But I'll see.

#4 2025-08-06 14:20:35

Whoracle
Member
Registered: 2010-11-02
Posts: 202

Re: Random full system freeze

Happened again just now. SSH did not work, and neither did REISUB. Looks like my wireless KB didn't even get through to the dongle anymore.

journalctl -b -1: http://0x0.st/8hWo.txt

#5 2025-08-07 20:53:49

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 73,632

Re: Random full system freeze

…
Aug 06 16:13:10 lynxcore sudo[124790]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --scan --device=nvme
Aug 06 16:13:20 lynxcore sudo[124807]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --scan
Aug 06 16:13:20 lynxcore sudo[124811]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --scan --device=nvme
Aug 06 16:13:30 lynxcore sudo[124846]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --scan
Aug 06 16:13:30 lynxcore sudo[124850]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --scan --device=nvme
Aug 06 16:13:40 lynxcore sudo[124870]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --scan
Aug 06 16:13:40 lynxcore sudo[124874]: telegraf : PWD=/ ; USER=root ; COMMAND=/usr/bin/smartctl --scan --device=nvme

wtf is that?
https://aur.archlinux.org/packages/telegraf-bin
https://wiki.archlinux.org/title/S.M.A. … self-tests

Disable that, just to reduce noise and interference.

Besides https://wiki.archlinux.org/title/Ryzen#Troubleshooting you might be facing https://wiki.archlinux.org/title/Solid_ … leshooting

If you can somewhat trigger this, make sure to keep a visible terminal emulator running dmesg -w on screen.
Maybe it allows you to catch some last messages from the kernel before things go south.
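In addition to keeping dmesg -w visible, the same stream could be written to a file with an fsync after every line, since journald (as far as I know) only flushes non-critical messages to disk periodically and the last few may be lost on a hard power-off. This is only a sketch under those assumptions: the log path and function names are hypothetical, and dmesg may need root depending on kernel.dmesg_restrict.

```python
import os
import subprocess


def persist_lines(lines, fd):
    """Write each line to fd and fsync immediately; return the number written."""
    count = 0
    for line in lines:
        os.write(fd, line if isinstance(line, bytes) else line.encode())
        os.fsync(fd)  # force the line to disk before reading the next one
        count += 1
    return count


def follow_dmesg(log_path="/var/tmp/dmesg-follow.log"):  # hypothetical path
    """Mirror `dmesg --follow` into log_path until interrupted."""
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    proc = subprocess.Popen(["dmesg", "--follow"], stdout=subprocess.PIPE)
    try:
        persist_lines(proc.stdout, fd)
    finally:
        proc.terminate()
        os.close(fd)
```

If the kernel dies without printing anything, the file will simply end with ordinary messages. That still narrows things down, as it points towards a hard hang rather than an oops or panic.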

#6 2025-08-08 06:15:31

Whoracle
Member
Registered: 2010-11-02
Posts: 202

Re: Random full system freeze

seth wrote:

wtf is that?

SMART plugin for telegraf. Disabled now.

Possible, but I'd expect to see some of the mentioned events in some log or other, no? (Also, I have none of these vendors' drives - mine are all Crucial. Doesn't mean much, granted).

If you can somewhat trigger this, make sure to keep a visible terminal emulator running dmesg -w on screen.
Maybe it allows you to catch some last messages from the kernel before things go south.

100% not reproducible at will, but I've got enough monitors to keep a dmesg open at all times, so I'll try. I think I remember having one open a while back, when I was getting 3 crashes in a row, and it not displaying anything, but worth making sure and documenting the outcome here.

Gonna plug in an additional, wired keyboard for REISUB, too.

#7 2025-08-08 06:57:38

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 73,632

Re: Random full system freeze

I'd expect to see some of the mentioned events in some log or other, no?

You'd only get the MCE errors on spontaneous reboots, everything else will be lost w/ the power button.
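For the spontaneous-reboot case, a saved journal dump could be scanned for machine-check reports instead of reading it by eye. A minimal sketch, assuming such reports contain one of the strings in the pattern; the marker list is an assumption based on commonly seen kernel messages, not an exhaustive list, and the names are illustrative.

```python
import re

# Assumed markers: kernel machine-check reports commonly contain these strings.
HW_ERROR_RE = re.compile(r"mce:|\[Hardware Error\]|Machine check", re.IGNORECASE)


def hardware_error_lines(journal_text):
    """Return the lines of a saved journal dump that look like MCE reports."""
    return [line for line in journal_text.splitlines() if HW_ERROR_RE.search(line)]
```

Feeding it the text of the boot that follows a crash (for example the output of journalctl -k -b saved to a file) would list any candidates. An empty result after a power-button reset proves nothing, for the reason given above.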

SMART plugin for telegraf. Disabled now.

Just to be clear, the comment was rather on the "interesting" implementation here. I wasn't faulting you for using it (and actually knew where this is kinda coming from).
I was just too tired and baffled for a more measured comment than "wtf" ;)

#8 2025-08-08 07:00:28

Whoracle
Member
Registered: 2010-11-02
Posts: 202

Re: Random full system freeze

No worries. I'm not happy with it either, but it's the only way I have found to get the disk fan speed and temps into influx.

Now we wait another few days for the next crash to happen.
