I recently got a new machine and have set up both Arch and Windows in dual boot. I want to preface everything below by saying I have yet to experience this issue in Windows, so I suspect it is not a hardware issue.
I often experience full system freezes such that I have to hold the power button to restart. For now, that computer is essentially unusable on Arch as I keep having to restart every 30 mins to a few hours.
The screen freezes, but the mouse can move for a minute or so and then it disappears. The keyboard light doesn't turn on when I press Caps Lock, and I'm unable to switch to another tty. Under suspicion that it was just a graphical freeze, I have tried switching to another tty, logging in, and running commands (eg writing to a file, shutting down) blindly but that hasn't worked.
Looking at the journal, the freeze seems to happen at the same time as the line "kernel: sched: RT throttling activated". Here is a GitHub repository with journals and X server logs from multiple instances of this issue: rmeno12/freezing-issue
A lot of other freezing issues seem to be related to being OOM, but I've had a system monitor open during a freeze and the memory usage was not abnormal and no swap was used at all. I am using an Nvidia GPU with proprietary drivers, and there do also seem to be some messages from nvidia in the journal around the freezes, but I'm not sure what to make of them.
I'm not sure what the next steps I should be taking to find the root of this issue are, or if there are known solutions. Would be happy to provide any other information needed to help.
Setup information:
Hardware:
- CPU: AMD Ryzen 7 7700X
- Memory: 32GB DDR5-6000 with 12GB swap
- GPU: Gigabyte GeForce RTX 2080 Windforce OC 8G
Software:
- DE: GNOME on X11
- drivers: currently latest nvidia-lts (530), but has happened on non-lts as well
- kernel: currently latest linux-lts from repos, but has happened on non-lts as well
uname -a:
Linux hoth 6.1.25.1-lts #1 SMP PREEMPT_DYNAMIC Thu, 20 Apr 2023 14:01:39 +0000 x86_64 GNU/Linux
Last edited by rmeno12 (2023-05-06 15:36:36)
Offline
Update:
I've figured out that the whole system doesn't actually freeze, as I can still SSH in. Also, switching to Wayland doesn't solve the problem. Running `top -bin 1` via SSH shows me this:
top - 10:29:42 up 1:54, 3 users, load average: 7.41, 3.56, 1.60
Tasks: 357 total, 3 running, 353 sleeping, 0 stopped, 1 zombie
%Cpu(s): 6.8 us, 6.4 sy, 0.0 ni, 86.5 id, 0.0 wa, 0.4 hi, 0.0 si, 0.0 st
MiB Mem : 31248.7 total, 14733.2 free, 3561.2 used, 12954.3 buff/cache
MiB Swap: 12288.0 total, 12288.0 free, 0.0 used. 27034.0 avail Mem
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1202 rahul     20   0   24.2g 104104  53476 R 106.7   0.3   3:28.20 Xorg
    581 root     -51   0       0      0      0 R 100.0   0.0   4:45.00 irq/115-nvidia
Offline
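The negative value in the PR column suggests that the `irq/115-nvidia` thread runs under a realtime scheduling policy, which would fit the "RT throttling activated" message. A rough way to list realtime tasks over the same SSH session is sketched below. `rt_only` is a name invented for this example, and it assumes the procps `ps` with its `cls` and `rtprio` columns.

```shell
# Hypothetical filter: from `ps` output whose second column is the
# scheduling class, keep the header plus FIFO (FF) and round-robin (RR) rows.
rt_only() {
    awk 'NR == 1 || $2 == "FF" || $2 == "RR"'
}

# Usage on the affected machine:
#   ps -eo pid,cls,rtprio,comm | rt_only
# To see which device sits behind IRQ 115:
#   grep -E '^ *115:' /proc/interrupts
```

Running this once while the desktop is healthy and once during a freeze would show whether the set of realtime tasks changes.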
Hello, try reinstalling the GNOME desktop; maybe some components are missing and you can't enter the desktop. If that doesn't work, try installing the nvidia driver, see the wiki: https://wiki.archlinuxcn.org/wiki/NVIDIA
Offline
Hello, try reinstalling the GNOME desktop; maybe some components are missing and you can't enter the desktop. If that doesn't work, try installing the nvidia driver, see the wiki: https://wiki.archlinuxcn.org/wiki/NVIDIA
Thanks for the suggestion. I am able to enter the desktop fine. It's only after entering the desktop and doing things for a while (anywhere from less than 30 minutes to a few hours) that I have this freezing issue. I have reinstalled both gnome and the nvidia drivers, but that hasn't helped.
Offline
Arch and Windows in dual boot
3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot windows and linux twice for voodoo reasons.
I've figured out that the whole system doesn't actually freeze, as I can still SSH in.
Splendid. Please post your complete system journal for the boot:
sudo journalctl -b | curl -F 'file=@-' 0x0.st
Also, switching to Wayland doesn't solve the problem.
And also post your Xorg log, https://wiki.archlinux.org/title/Xorg#General
Offline
kernel: sched: RT throttling activated
It sounds like some realtime processes are eating your cycles and memory. From your logs there seem to be a lot of issues. This particularly catches my attention:
Apr 21 08:38:57 hoth /usr/lib/gdm-x-session[1197]: (II) XINPUT: Adding extended input device "Soundcore Life Q30 (AVRCP)" (type: KEYBOARD, id 16)
Apr 21 08:38:57 hoth /usr/lib/gdm-x-session[1197]: (II) event26 - Soundcore Life Q30 (AVRCP): is tagged by udev as: Keyboard
Apr 21 08:38:57 hoth /usr/lib/gdm-x-session[1197]: (II) event26 - Soundcore Life Q30 (AVRCP): device is a keyboard
On a brief search on the internet, Soundcore Life Q30 is a headphone. Now this might be the issue since playing sound is realtime processing but being registered in udev as keyboard which is event driven could be the conflict that drives this throttling.
You can try and configure udev to recognize your headphone properly but it seems you might have other issues.
I would suggest once again reinstalling bare minimum gnome without third party add-ons and removing all gnome configuration files in $HOME. Then try the system with bare minimum of device peripherals attached to see if you still get the issue. What soundcard are you using?
If that works then you can continue to configure udev to recognize your headphone properly.
Offline
Fast boot has been disabled, should have mentioned that earlier.
Complete journal and X logs are here: https://github.com/rmeno12/freezing-issue
The most recent ones are under the 3/ folder.
Offline
On a brief search on the internet, Soundcore Life Q30 is a headphone. Now this might be the issue since playing sound is realtime processing but being registered in udev as keyboard which is event driven could be the conflict that drives this throttling.
It certainly is odd that it is being registered as a keyboard, but I have used these headphones with multiple other systems with a similar setup and it registers as a keyboard there as well.
Offline
xorg logs in /3 are from May 2nd, you're probably running on wayland since then?
[ 6.379] (--) PCI:*(1@0:0:0) 10de:1e87:1458:37a7 rev 161, Mem @ 0xfb000000/16777216, 0xfcc0000000/268435456, 0xfcd0000000/33554432, I/O @ 0x0000f000/128, BIOS @ 0x????????/524288
[ 6.379] (--) PCI: (16@0:0:0) 1002:164e:1462:7d78 rev 195, Mem @ 0xfce0000000/268435456, 0xfcf0000000/2097152, 0xfca00000/524288, I/O @ 0x0000d000/256
But it's a hybrid graphics system, w/
May 06 13:35:39 hoth (udev-worker)[431]: Error running install command '/usr/bin/false' for module amdgpu: retcode 1
amdgpu blacklisted.
Can you disable the AMD chip in the firmware (BIOS)?
There're three ssh logins in the journal, when did the system pass out by your perception?
May 06 10:27:36 hoth sshd[59902]: pam_unix(sshd:session): session opened for user rahul(uid=1000) by (uid=0)
May 06 10:28:38 hoth sshd[60029]: pam_unix(sshd:session): session opened for user rahul(uid=1000) by (uid=0)
May 06 10:48:06 hoth sshd[62118]: pam_unix(sshd:session): session opened for user rahul(uid=1000) by (uid=0)
Offline
xorg logs in /3 are from May 2nd, you're probably running on wayland since then?
It is odd that those logs from /var/log are out of date. I switched back to X from Wayland today as there were some graphical issues. I uploaded the log from .local/share/xorg, which is from May 6. I'm not sure why the logging location changed so recently, as the link you provided earlier suggests that it should have been logging to .local/share for quite a while now.
amdgpu blacklisted.
Can you disable the AMD chip in the firmware (BIOS)?
I haven't seen an option to disable the iGPU in the BIOS, but I may have missed something. Will look through it again. I originally blacklisted amdgpu because it caused black screen issues.
There're three ssh logins in the journal, when did the system pass out by your perception?
The first SSH you listed was right after the freeze.
Offline
They're actually in the journal (as that's where GDM puts them) - sorry, forgot to clean up after reading on.
May 06 10:25:26 hoth google-chrome.desktop[59578]: libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
May 06 10:25:44 hoth google-chrome.desktop[30273]: [30314:30314:0506/102544.347855:ERROR:shared_image_manager.cc(217)] SharedImageManager::ProduceSkia: Trying to Produce a Skia representation from a non-existent mailbox.
May 06 10:25:44 hoth google-chrome.desktop[30273]: [30314:30314:0506/102544.348062:ERROR:shared_image_manager.cc(217)] SharedImageManager::ProduceSkia: Trying to Produce a Skia representation from a non-existent mailbox.
May 06 10:26:09 hoth google-chrome.desktop[30273]: [59523:28:0506/102609.584985:ERROR:stun_port.cc(119)] Binding request timed out from 100.115.209.x:42654 (tailscale0)
May 06 10:26:51 hoth kernel: NVRM: GPU at PCI:0000:01:00: GPU-2eeafbaa-8e3c-dd88-f44c-db398a5826de
May 06 10:26:51 hoth kernel: NVRM: Xid (PCI:0000:01:00): 8, pid=30314, name=chrome, Channel 00000010
After trying to play a video in chrome, nvidia throws XID8, https://docs.nvidia.com/deploy/xid-erro … ml#topic_4
Is this a reliable trigger for the problem?
Offline
They're actually in the journal (as that's where GDM puts them) - sorry, forgot to clean up after reading on.
Ah yea, I did see that also in the link you sent.
After trying to play a video in chrome, nvidia throws XID8, https://docs.nvidia.com/deploy/xid-erro … ml#topic_4
Is this a reliable trigger for the problem?
No. I am able to play videos without the freezing issue and the freezing issue occurs when not playing videos. However, it has happened that the freeze occurred when loading a video (from YouTube). In this latest instance, I was opening a Google Sheets page and already had a video open and playing.
Last edited by rmeno12 (2023-05-06 19:36:15)
Offline
The XID isn't in the other logs, but the RT throttling indeed is in each.
So possibly red herring (it's just 14s before the RT throttling kicks in in this instance)
Just a hunch, but try to reproduce this w/o tailscale.
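To compare how close the Xid and RT throttling messages are across the saved journals, a small filter like the one below might help. It is only a sketch: the function name `freeze_markers` is made up here, and it assumes the journals are plain-text dumps in the default `journalctl` short format, like the lines quoted in this thread.

```shell
# Hypothetical helper: keep only the kernel lines of interest from a
# journal dump read on stdin.
freeze_markers() {
    grep -E 'kernel: (sched: RT throttling activated|NVRM: Xid)'
}

# Example with a saved text journal (the filename is illustrative):
#   freeze_markers < journal.txt
# Or against the kernel messages of the previous boot:
#   journalctl -k -b -1 | freeze_markers
```

The timestamps of the surviving lines can then be compared by eye for each freeze.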
Offline
I believe the 0/ logs were recorded before tailscale was installed, but I can try to reproduce again if you think that would give any more information.
EDIT:
Confirmed. According to my pacman logs, tailscale was installed on 2023-04-11 and 0/ was recorded on 2023-04-10.
[2023-04-11T22:06:06-0500] [PACMAN] Running 'pacman -S --config /etc/pacman.conf -- community/tailscale'
Last edited by rmeno12 (2023-05-06 19:49:37)
Offline
They also happen to have
Apr 10 13:51:21 hoth kernel: NVRM: GPU at PCI:0000:01:00: GPU-2eeafbaa-8e3c-dd88-f44c-db398a5826de
Apr 10 13:51:21 hoth kernel: NVRM: Xid (PCI:0000:01:00): 8, pid=3020, name=chrome, Channel 00000008
Apr 10 13:51:32 hoth kernel: sched: RT throttling activated
Apr 10 13:51:41 hoth google-chrome.desktop[2981]: [3162:1:0410/135141.389153:ERROR:command_buffer_proxy_impl.cc(325)] GPU state invalid after WaitForGetOffsetInRange.
Why is there "acpi_enforce_resources=lax"?
You could try the 470xx drivers, https://wiki.archlinux.org/title/NVIDIA#Installation (you'll need dkms and nb. the ibt=off requirement)
Offline
I had set "acpi_enforce_resources=lax" in order to try to get lm_sensors to detect my case fans, but that didn't work. Unrelated to this issue, afaik, and I forgot to remove it.
I can try the 470xx drivers.
(you'll need dkms and nb. the ibt=off requirement)
Sorry, what does "nb." mean in this context? Also, according to the wiki the "ibt=off" requirement seems to apply to newer Intel CPUs. I have an AMD CPU, so would this still apply to me?
Offline
"nota bene"
No, shouldn't apply - I just threw that in from a textblock. Sorry.
Offline
Froze again with the 470xx drivers. Have uploaded it as 4/ to the same repo. Again had a video playing in Chrome.
Offline
Since you've two GPUs, try to rule out nvidia being the cause.
Re-activate the AMD GPU, disable the nvidia one (in doubt blacklist nvidia next to nouveau) and see whether the problem remains.
If so, does windows run on the AMD or the nvidia GPU (or in hybrid mode)?
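For the "disable the nvidia one" step, a modprobe configuration along these lines is one option. This is a sketch, not a tested recipe: the helper name and the target filename are made up, the module list assumes the usual proprietary driver modules, and the existing amdgpu blacklist (wherever it was added) has to be removed separately.

```shell
# Hypothetical helper: print a modprobe.d fragment that keeps the
# nouveau and nvidia modules from being auto-loaded by alias.
nvidia_blacklist_conf() {
    for module in nouveau nvidia nvidia_drm nvidia_modeset nvidia_uvm; do
        printf 'blacklist %s\n' "$module"
    done
}

# Review the output, then install it (the filename is an assumption):
#   nvidia_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nvidia.conf
# If the modules are included in the initramfs, it may need regenerating too.
```

A plain `blacklist` entry may not be enough if something loads the modules explicitly, so it is worth confirming with `lsmod` after a reboot that none of them are present.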
Offline
I will try to use just the AMD GPU, but I will be away for a few days so it will have to wait a little. Will update once I've tried.
If so, does windows run on the AMD or the nvidia GPU (or in hybrid mode)?
Honestly unsure how Windows works internally, but only the Nvidia GPU has anything plugged into it directly and it shows usage in Windows Task Manager.
EDIT:
To clarify, I never explicitly disabled the AMD GPU in Windows.
Last edited by rmeno12 (2023-05-07 13:43:32)
Offline
You could check https://en.wikipedia.org/wiki/GPU-Z
The nvidia GPU has to be active to serve as VGA hop - doesn't mean it's necessarily the rendering device.
Offline