Hello,
my pc started to freeze about a week ago - the monitors freeze thus forcing me to reboot.
I am using the following:
kernel 6.8.2-zen2-1-zen
nvidia-dkms 550.67-1
journal has this: kernel: NVRM: Xid (PCI:0000:01:00): 56, pid='<unknown>', name=<unknown>, CMDre 00000003 0000017c 00000000 00000003 00000000
checked here: https://docs.nvidia.com/deploy/xid-errors/index.html
56: Display Engine error
This is super frustrating - does anyone know how we can resolve this?
Offline
XID 56 is hardware or driver, so either your GPU is underpowered, overheating, mis-seated or moving to a serverfarm upstate.
Or it's a driver issue.
Please don't post errors out of context
sudo journalctl -b | curl -F 'file=@-' 0x0.st
and try the behavior w/ the LTS kernel and possibly an older version of nvidia-dkms (and utils) from the ALA (535xx or 545xx won't build w/ newer kernels unless patched to work around GPL issues)
Online
here's more of the journal if you want to see.
power/temps look good - i can reseat it to make sure but i'm leaning towards a driver issue
Last edited by insidesources (2024-03-31 01:32:07)
Offline
That segment still starts at the XID56 and therefore tells nothing about the conditions leading up to it. Or the general setup.
As random pickup, there's
vmnetBridge[1962]: RTM_NEWLINK: name:wlan0 index:3 flags:0x00001002
Mar 30 02:13:54 q4pt99x vmnet-natd
so any kind of VM passthrough efforts might be a factor here, but that's speculation based on virtually no information.
Online
That was the start of the log when i checked, i didn't have my PC on for very long before that. I can capture another one when it freezes again.
Is there anyone else having this issue?
Offline
sudo journalctl -b
is gonna show you the log for the entire boot and you can go back in time by making it "-b -1", "-b -2" …
Since this is plausibly a hardware problem, asking whether others see the same thing is not a promising strategy.
Besides showing us the journal of the incident, you could try to downgrade the driver. Mind you that older versions (like 535xx, 545xx) will only build against the LTS kernel, but you'll still likely figure out whether it's a HW or SW issue.
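For a quick local look before uploading the full journal, something like the following could help. It's only a sketch: the function name xid_lead_up is made up for illustration, and it assumes GNU grep and that the freeze happened during the previous boot.

```shell
# Print the N lines leading up to (and including) the first NVRM Xid message.
# Reads journal text on stdin; N defaults to 50.
# xid_lead_up is a hypothetical helper name, not an existing tool.
xid_lead_up() {
    n="${1:-50}"
    # -m1 stops at the first match, -B prints the preceding context lines
    grep -m1 -B "$n" 'NVRM: Xid'
}

# Usage:
#   journalctl --list-boots        # find the boot that contains the freeze
#   sudo journalctl -b -1 --no-pager | xid_lead_up 200
```

This doesn't replace the complete journal (which also shows the general setup); it only makes it easier to see what happened right before the first Xid.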
Online
it hasn't crashed in a while, and now it crashed again at 5:51pm. Here's the log. Before you say there's no info: the lines above what i pasted just repeat the same thing about failed to lookup and polkit etc.
Last edited by insidesources (2024-04-07 22:27:43)
Offline
The system journal is supposed to start w/ the DMI messages, kernel, initramfs…
We're still at
nothing about the conditions leading up to it. Or the general setup.
As random pickup, there's
vmnetBridge[1962]: RTM_NEWLINK: name:wlan0 index:3 flags:0x00001002
Mar 30 02:13:54 q4pt99x vmnet-natd
so any kind of VM passthrough efforts might be a factor here, but that's speculation based on virtually no information.
The XID56 errors in that segment start immediately after
Apr 07 17:51:03 q4pt99x vmnetBridge[1963]: RTM_NEWLINK: name:wlan0 index:3 flags:0x00001002
Apr 07 17:51:03 q4pt99x vmnet-natd[2037]: RTM_NEWLINK: name:wlan0 index:3 flags:0x00001002
Apr 07 17:51:03 q4pt99x vmnetBridge[1963]: RTM_NEWLINK: name:wlan0 index:3 flags:0x00001002
Apr 07 17:51:03 q4pt99x NetworkManager[1909]: <info> [1712526663.4848] device (wlan0): set-hw-addr: set MAC address to 92:AF:A9:99:0D:42 (scanning)
Apr 07 17:51:03 q4pt99x vmnet-natd[2037]: RTM_NEWLINK: name:wlan0 index:3 flags:0x00001002
confirming that pattern, but w/ the partial logs it's impossible to say whether that's a fluke and you had frequent vmnet-natd messages in a long running VM before w/o causing any nvidia-related issues or whether you're trying to do sth. special w/ the nvidia GPU.
sudo journalctl -b -1 | curl -F 'file=@-' 0x0.st # for the entire journal of the previous boot
Online
I tried to get the entire boot this time, it is 800mb so sorry for the size.
I crashed at 3:26PM during a work meeting.
I have noticed one thing: most of the crashes happen while my browser is focused, and *most* of the time i'm watching a video. Sometimes i'm not even at my desk.
Last edited by insidesources (2024-04-19 06:11:45)
Offline
800MB??
I suppose most of that is rtkit?
https://www.reddit.com/r/archlinux/comm … trollably/
https://github.com/heftig/rtkit/issues/ … 1321085246
You can treat the journal w/
grep -vE 'rtkit-daemon.*: (Failed to look up client|Warning)'
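Put together, that could look roughly like this. A sketch only: strip_rtkit is a made-up helper name and journal.txt an arbitrary file name; the grep pattern is the one from above.

```shell
# Drop the rtkit-daemon spam from journal text read on stdin.
# strip_rtkit is a hypothetical helper name; the pattern is the one quoted above.
strip_rtkit() {
    grep -vE 'rtkit-daemon.*: (Failed to look up client|Warning)'
}

# Usage:
#   sudo journalctl -b -1 --no-pager | strip_rtkit > journal.txt
#   wc -c journal.txt                       # check the size before uploading
#   curl -F 'file=@journal.txt' 0x0.st
```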
my pc started to freeze about a week ago
What happened at that point? Did you upgrade to the 550xx nvidia drivers?
Online
800MB??
I suppose most of that is rtkit?
https://www.reddit.com/r/archlinux/comm … trollably/
https://github.com/heftig/rtkit/issues/ … 1321085246
You can treat the journal w/
grep -vE 'rtkit-daemon.*: (Failed to look up client|Warning)'
thank you - i do have to lessen the logging on that, it's ridiculous
my pc started to freeze about a week ago
What happened at that point? Did you upgrade to the 550xx nvidia drivers?
I update my pc all the time, so i have to check when the last driver update before that (or around that time) was, and the update before that one.
This issue happened a long time ago(over a year) then eventually went away, and now it's back again.
Offline
Report back w/ the insights from the pacman log and a hopefully journal-sized journal
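A possible way to pull the relevant entries out of the pacman log, as a sketch: driver_history is a made-up helper name, /var/log/pacman.log is pacman's default log location, and the pattern also catches unrelated linux-* packages such as linux-firmware.

```shell
# List install/upgrade/downgrade entries for nvidia* and linux* packages.
# $1: log path, defaults to /var/log/pacman.log (pacman's default location).
# driver_history is a hypothetical helper name.
driver_history() {
    grep -E '\[ALPM\] (installed|upgraded|downgraded) (lib32-)?(nvidia|linux)[^ ]* ' \
        "${1:-/var/log/pacman.log}"
}

# Usage:
#   driver_history | tail -n 20
```

Comparing the dates of those entries against the date the freezes started should show whether a driver or kernel update lines up with it.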
Online
i cleaned it up a bit, hope this helps
Offline
xdg-desktop-portal crashes a lot in glib2, https://bbs.archlinux.org/viewtopic.php … 3#p2164563
vmnet-natd seems a fluke, only shows up 6h before the XID56/32 burst and a minute afterwards.
The nvidia failure is rather isolated w/ > 1h gap before in the journal.
a) does this only happen w/ the zen kernel?
b) does this only happen when the GPU is idle (you're not doing anything and there's also no GPGPU job (ollama) in the background)?
c) do you use https://archlinux.org/packages/extra/x8 … lama-cuda/ ? Is it running while this happens? Temperature issue?
e) try to disable
pcie_aspm=off nvidia.NVreg_DynamicPowerManagement=0x00
https://wiki.archlinux.org/title/Kernel_parameters
https://download.nvidia.com/XFree86/Lin … ement.html
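After adding the two parameters to the kernel command line and rebooting, you can check that they were actually picked up. A minimal sketch: has_kparam is a made-up helper name, and reading /proc/driver/nvidia/params assumes the proprietary nvidia module is loaded and exposes that file.

```shell
# Succeed if the kernel command line contains the exact parameter $1.
# $2: command line string to check, defaults to the contents of /proc/cmdline.
# has_kparam is a hypothetical helper name.
has_kparam() {
    cmdline="${2:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_kparam pcie_aspm=off && echo "pcie_aspm=off is active"
#   has_kparam nvidia.NVreg_DynamicPowerManagement=0x00 && echo "nvidia parameter is active"
#   # Assumption: the loaded nvidia module lists its settings in this file
#   grep DynamicPowerManagement /proc/driver/nvidia/params
```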
Online
a) does this only happen w/ the zen kernel?
only one i've used recently - but the first time this happened i was using the regular arch kernel
b) does this only happen when the GPU is idle (you're not doing anything and there's also no GPGPU job (ollama) in the background)?
most of the time yes - only when i'm web browsing mostly or if i go away from my pc for awhile
c) do you use https://archlinux.org/packages/extra/x8 … lama-cuda/ ? Is it running while this happens? Temperature issue?
I don't use it, and my temps are good - i have a full watercooling loop. 80-110 F depending on what i'm doing
e) try to disable
pcie_aspm=off nvidia.NVreg_DynamicPowerManagement=0x00
https://wiki.archlinux.org/title/Kernel_parameters
https://download.nvidia.com/XFree86/Lin … ement.html
thank you
Offline
another crash tonight, happened in the last 2 hours. i left my browser selected and went AFK - came back to black monitors and the pc wouldn't wake them up. i SSH'd into my pc to check it was still awake and then rebooted
Offline
Apr 16 20:12:24 q4pt99x zerotier-one[2140]: connect: Connection timed out
Apr 16 20:12:30 q4pt99x zerotier-one[2140]: connect: Connection timed out
Apr 16 20:12:34 q4pt99x kernel: NVRM: GPU at PCI:0000:01:00: GPU-c7ae409a-8791-b846-1e8f-b101104d3ed1
Apr 16 20:12:34 q4pt99x kernel: NVRM: Xid (PCI:0000:01:00): 56, pid='<unknown>', name=<unknown>, CMDre 00000003 00000000 00000000 00000001 00000000
Apr 16 20:12:34 q4pt99x kernel: NVRM: Xid (PCI:0000:01:00): 32, pid='<unknown>', name=<unknown>, Channel ID 00000003 intr 00004000
Apr 16 20:12:34 q4pt99x kernel: NVRM: Xid (PCI:0000:01:00): 32, pid='<unknown>', name=<unknown>, Channel ID 00000003 intr 00004000
Apr 16 20:12:34 q4pt99x kernel: NVRM: Xid (PCI:0000:01:00): 45, pid='<unknown>', name=<unknown>, Ch 00000000
…
and all downhill from there.
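To see at a glance which Xid codes show up in a journal and how often, a tally along these lines might be handy. A sketch: xid_tally is a made-up helper name, and it assumes the "NVRM: Xid (PCI:...): <code>," line format shown above.

```shell
# Count Xid codes in journal text read on stdin.
# Prints one "<count> <code>" line per code, sorted by code.
# xid_tally is a hypothetical helper name.
xid_tally() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' \
        | sort -n | uniq -c | awk '{print $1, $2}'
}

# Usage:
#   sudo journalctl -b -1 --no-pager | xid_tally
```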
This is already w/ "pcie_aspm=off nvidia.NVreg_DynamicPowerManagement=0x00"?
(The journal starts late)
https://www.nvidia.com/en-us/geforce/fo … ery-in-ev/ - they first speculate it's maybe an unsupported freesync output and the tail of the thread has a stable system on 528.49 (this is all on windows)
If you want to try to downgrade to eg. the 535xx series from the ALA you'll have to use the LTS kernel as the older drivers are incompatible w/ GPL restrictions in the latest kernels.
(You'd use the nvidia-dkms and 535xx-version matching nvidia-utils, https://wiki.archlinux.org/title/Arch_Linux_Archive )
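As a rough sketch of how the downgrade could go: ala_url is a made-up helper name, the URL layout assumes the ALA's usual packages/<first letter>/<name>/ structure with .pkg.tar.zst files, and VER is a placeholder you have to look up in the archive listing yourself.

```shell
# Build the Arch Linux Archive URL for a package file.
# $1: package name, $2: version-release, $3: architecture (default x86_64)
# ala_url is a hypothetical helper name; the .pkg.tar.zst suffix is an assumption.
ala_url() {
    pkg="$1"; ver="$2"; arch="${3:-x86_64}"
    first=$(printf '%s' "$pkg" | cut -c1)
    printf 'https://archive.archlinux.org/packages/%s/%s/%s-%s-%s.pkg.tar.zst\n' \
        "$first" "$pkg" "$pkg" "$ver" "$arch"
}

# Usage (set VER to a real 535xx version-release from the archive listing first):
#   sudo pacman -S linux-lts linux-lts-headers
#   sudo pacman -U "$(ala_url nvidia-dkms "$VER")" "$(ala_url nvidia-utils "$VER")"
```

nvidia-dkms needs the headers of the kernel it builds against, hence linux-lts-headers. Other installed nvidia packages (eg. lib32-nvidia-utils) would need the same version as well.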
Online
This is already w/ "pcie_aspm=off nvidia.NVreg_DynamicPowerManagement=0x00"?
(The journal starts late)
I just set it right now, so let's see what happens.
https://www.nvidia.com/en-us/geforce/fo … ery-in-ev/ - they first speculate it's maybe an unsupported freesync output and the tail of the thread has a stable system on 528.49 (this is all on windows)
Does this apply if i have g sync monitors?
If you want to try to downgrade to eg. the 535xx series from the ALA you'll have to use the LTS kernel as the older drivers are incompatible w/ GPL restrictions in the latest kernels.
(You'd use the nvidia-dkms and 535xx-version matching nvidia-utils, https://wiki.archlinux.org/title/Arch_Linux_Archive )
Thank you. I'll give this kernel parameter some time and will do that eventually.
Offline
Does this apply if i have g sync monitors?
G-Sync would be certified for nvidia, but of course it could have been falsely certified.
You could therefore also try to trigger this w/ G-Sync disabled (in monitor and driver)
On a formal note, please try to not wrap your own reply into quote tags.
Online
seth wrote: This is already w/ "pcie_aspm=off nvidia.NVreg_DynamicPowerManagement=0x00"?
(The journal starts late)
I just set it right now, so let's see what happens.
As soon as i opened this tab to view your reply, bam crashed. 10:53am - waited a few minutes and manually rebooted my pc at 10:57am
got a 53 then 32 32 - https://pastebin.com/B0xZDXKQ with the kernel paras above ^
Offline
In order to rule out a hardware issue you should test the behavior either w/ https://archlinux.org/packages/extra/x8 … idia-open/ or (possibly better) w/ the LTS kernel and possibly an older version of nvidia-dkms (and utils) from the ALA (535xx or 545xx won't build w/ newer kernels unless patched to work around GPL issues)
Online