#1 2019-05-13 22:03:12

Deliverance
Member
Registered: 2018-02-10
Posts: 4

NVIDIA Xid errors - Bad hardware or software issue?

This issue has been plaguing me for a while now. I've Googled everything I know to Google, and sought help from several sources. I thought I had resolved my problem, but it's coming back now. I'm getting a bunch of Xid errors in my system log when playing basically any game. Specifically, 12, 13, 31, and 69. Whatever game I'm playing will sputter and freeze, eventually to the point where it's unplayable. The freezes correspond to the Xid errors in the log. If I restart my computer, it runs fine for an arbitrary amount of time until the same errors recur and the system is unusable. It's very frustrating.

A few months ago I came across a post suggesting it could be bad RAM. I fired up memtest86 and immediately was flooded with errors. I breathed a sigh of relief that I had solved my problem. I had two 8GB sticks. Swapping them out revealed one running clean in memtest, one with tons of errors. I purchased an additional 32GB of RAM (2x16GB) and installed that. To my dismay, one of these sticks tests bad as well. Bad luck? I don't know. I ruled out bad DIMM slots. The "good" sticks of RAM test fine in the same slots that the "bad" sticks failed in. I shrugged this off and just ran my system with one 8GB stick and one 16GB stick. This tested clean and my problems virtually disappeared for a few months, but now they're back and just as bad, except this time, memtest86 does not return any errors.

Is it possible this is a software issue, or do I have more bad hardware? What troubleshooting can I do to rule this out? The "obvious" next step is suggesting a bad GPU, but I don't want to purchase another GPU just to find out there was nothing wrong with the one I had. Could it be a motherboard or CPU issue? How would I tell? I don't have any known goods to swap with.

Sample of the errors I see:
https://pastebin.com/fTNHmgzB
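
For anyone wanting to tally which Xid codes show up and how often, here is a minimal sketch. It assumes the kernel log lines have the usual shape `NVRM: Xid (PCI:0000:01:00): 31, ...`; the regex and the `count_xids` helper are illustrative, not anything shipped by NVIDIA, and may need adjusting if your driver formats the line differently.

```python
import re
import sys
from collections import Counter

# Assumed line shape: "NVRM: Xid (PCI:0000:01:00): 31, ..."
# Group 1 is the device address, group 2 is the Xid code.
XID_RE = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def count_xids(lines):
    """Return a Counter mapping Xid code (int) to number of occurrences."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Reads kernel log text from stdin and prints one line per Xid code.
    for code, n in sorted(count_xids(sys.stdin).items()):
        print(f"Xid {code}: {n}")
```

Saved as, say, `count_xids.py` (a hypothetical filename), it could be fed with `journalctl -k | python count_xids.py`. Comparing the tally before and after a change (BIOS settings, RAM layout) gives a rough idea of whether the change made any difference.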

My system:
- NVIDIA GTX1050 Ti
- Kernel 5.0.13
- NVIDIA driver 418.74

Things I've tried:
- https://docs.nvidia.com/deploy/xid-errors/index.html
- Various games, both native and wine/proton
- unigine-heaven
- nvidia and nvidia-lts drivers
- linux-lts kernel
- Reseating GPU
- Different PCI slots for GPU
- Reseating RAM
- Changing DIMM configurations
- nomodeset
- nvidia modeset, drm
- Swearing at it

#2 2019-05-14 06:29:23

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: NVIDIA Xid errors - Bad hardware or software issue?

From your past experience I'd say the system is underpowered or overclocked or both.
Try whether going w/ the most conservative BIOS settings (CPU, BUS & RAM clocks) stabilizes it.
Asymmetric RAM setups are usually not a very good idea - also, how long did you run recent memtest cycles? Minutes, hours or days?

#3 2019-05-14 19:39:51

Deliverance
Member
Registered: 2018-02-10
Posts: 4

Re: NVIDIA Xid errors - Bad hardware or software issue?

seth wrote:

From your past experience I'd say the system is underpowered or overclocked or both.
Try whether going w/ the most conservative BIOS settings (CPU, BUS & RAM clocks) stabilizes it.
Asymmetric RAM setups are usually not a very good idea - also, how long did you run recent memtest cycles? Minutes, hours or days?

So apparently my BIOS has a "helpful" feature where it automatically overclocks my system on demand. I've set its scheduler to "power save" so I'll give it a while and see if there's any effect. As for memtest, on the "bad" sticks, errors appeared instantly. On the clean tests I let it run for a little over an hour.

Last edited by Deliverance (2019-05-14 19:40:13)

#4 2019-05-14 20:24:52

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: NVIDIA Xid errors - Bad hardware or software issue?

Meaningful stress tests using memtest are measured in days (though 24h should do), but first check the behavior on the conservative timings.
