#1 2016-02-16 09:44:28

SickSight
Member
Registered: 2016-02-16
Posts: 4

[SOLVED] Nvidia GPU at 0000:01:00.0 has fallen off the bus.

Hello everyone,

I hope someone can help me, or has perhaps already solved the same problem. Currently I can play any game on Steam for only 2-10 minutes. After that, the screen turns off. Sound and everything else seem to keep running. The only thing that helps is powering the computer off manually.

journalctl outputs...

NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Feb 16 00:42:26 arch kernel: NVRM: A GPU crash dump has been created. If possible, please run
                             NVRM: nvidia-bug-report.sh as root to collect this data before
                             NVRM: the NVIDIA kernel module is unloaded.
Feb 16 00:42:26 arch kernel: NVRM: GPU at PCI:0000:01:00: GPU-0fa44107-3322-fa72-9b9d-cedc07c469db
Feb 16 00:42:26 arch kernel: NVRM: Xid (PCI:0000:01:00): 48, An uncorrectable double bit error (DBE) has been detected on GPU in the L2 cache at cache 2, slice 0.
Feb 16 00:42:51 arch kernel: nvidia-modeset: ERROR: GPU:0: Failed to query EVO channel state: 0x0000917c:0:0:0x0000000f
Feb 16 00:42:51 arch kernel: nvidia-modeset: ERROR: GPU:0: Failed to query EVO channel state: 0x0000917c:0:0:0x0000000f
Feb 16 00:42:51 arch kernel: nvidia-modeset: ERROR: GPU:0: Failed to query EVO channel state: 0x0000917c:0:0:0x0000000f
...

Installed nvidia...

pacman -Ss nvidia | grep Installiert
extra/libvdpau 1.1.1-2 [Installiert]
extra/nvidia 361.28-1 [Installiert]
extra/nvidia-libgl 361.28-5 [Installiert]
extra/nvidia-utils 361.28-5 [Installiert]
multilib/lib32-nvidia-libgl 361.28-4 [Installiert]
multilib/lib32-nvidia-utils 361.28-4 [Installiert]

I tried enabling NVIDIA persistence mode, but it did not help.

I would be glad if someone could take a look. Thanks!

--- EDIT:

... and the Steam console output:

http://pastebin.com/UnCQC56E

Last edited by SickSight (2016-02-16 22:16:41)

#2 2016-02-16 11:45:37

WorMzy
Administrator
From: Scotland
Registered: 2010-06-16
Posts: 13,570

Re: [SOLVED] Nvidia GPU at 0000:01:00.0 has fallen off the bus.

Try reseating the graphics card.

Mod note: Moving to Kernel+Hardware.


Sakura:-
Mobo: MSI MAG X570S TORPEDO MAX // Processor: AMD Ryzen 9 5950X @4.9GHz // GFX: AMD Radeon RX 5700 XT // RAM: 32GB (4x 8GB) Corsair DDR4 (@ 3000MHz) // Storage: 1x 3TB HDD, 6x 1TB SSD, 2x 120GB SSD, 1x 275GB M2 SSD

Making lemonade from lemons since 2015.

#3 2016-02-16 11:49:30

mich41
Member
Registered: 2012-06-22
Posts: 796

Re: [SOLVED] Nvidia GPU at 0000:01:00.0 has fallen off the bus.

Feb 16 00:42:26 arch kernel: NVRM: Xid (PCI:0000:01:00): 48, An uncorrectable double bit error (DBE) has been detected on GPU in the L2 cache at cache 2, slice 0.

Overheating? Bad hardware? Do you always get this message?
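
To answer the "do you always get this message?" question, the kernel log can be scanned for NVRM Xid lines and tallied per code. The following is only a minimal sketch, assuming the log lines have the same shape as the one quoted above; `count_xids` and `XID_RE` are hypothetical names made up for this example, not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Matches lines shaped like the one quoted in this thread:
#   ... NVRM: Xid (PCI:0000:01:00): 48, An uncorrectable double bit error ...
# Assumption: other driver versions may format Xid lines differently.
XID_RE = re.compile(r"NVRM: Xid \((?P<pci>[^)]+)\): (?P<code>\d+),")


def count_xids(lines):
    """Tally Xid messages, keyed by (PCI address, Xid code)."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group("pci"), int(match.group("code")))] += 1
    return counts
```

One way to use it would be to save the kernel log with `journalctl -k > kernel.log` and pass `open("kernel.log")` to `count_xids`. If the same Xid code shows up on every crash, that points toward one recurring fault rather than unrelated glitches.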

#4 2016-02-16 12:05:29

SickSight
Member
Registered: 2016-02-16
Posts: 4

Re: [SOLVED] Nvidia GPU at 0000:01:00.0 has fallen off the bus.

I think I've read every topic from the last 5 years ^^ I'll get hold of another Nvidia card later today to rule out a hardware defect.

No overheating. The problems occur only with Steam (video playback, desktop, ... work fine).

#5 2016-02-16 16:48:33

nstgc
Member
Registered: 2014-03-17
Posts: 393

Re: [SOLVED] Nvidia GPU at 0000:01:00.0 has fallen off the bus.

I agree with Mich. It looks like some kind of hardware error, specifically a memory error that the ECC couldn't correct. Chances are the reason it doesn't seem to show up outside of games is that the card isn't used enough to trigger it. If you have 100 times the memory operations per second in games, then it's only natural that it should pop up much more frequently.

#6 2016-02-16 19:32:13

SickSight
Member
Registered: 2016-02-16
Posts: 4

Re: [SOLVED] Nvidia GPU at 0000:01:00.0 has fallen off the bus.

OK... I removed my GTX 660, cleaned the card and put it back: same issue. Before I install my old 9800 GTX+ (and a different driver), I'll test with unigine-heaven (AUR) to check that it isn't down to Steam....... By the way, what tool do you use for benchmarks? Is there a way to test for or reproduce a hardware defect?
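
One way to approach the "can I reproduce it?" question is to log the GPU temperature while a benchmark such as unigine-heaven is running. The sketch below is only an illustration: it assumes the `nvidia-smi` from nvidia-utils supports the `--query-gpu=temperature.gpu` query on your driver, and the 90 degree limit, the 2 second interval and all function names are made-up values for this example.

```python
import subprocess
import time


def parse_temps(output):
    """Parse nvidia-smi CSV output (one temperature per line) into ints."""
    return [int(line) for line in output.splitlines() if line.strip()]


def read_gpu_temps():
    """Ask nvidia-smi for the current temperature of every GPU, in Celsius."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temps(result.stdout)


def too_hot(temps, limit_c):
    """True if any GPU is at or above the limit."""
    return any(t >= limit_c for t in temps)


def watch(limit_c=90, interval_s=2):
    """Print temperatures until one reaches limit_c (assumed threshold)."""
    while True:
        temps = read_gpu_temps()
        print(time.strftime("%H:%M:%S"), temps)
        if too_hot(temps, limit_c):
            print("Limit reached, stop the benchmark and check the cooling.")
            return temps
        time.sleep(interval_s)
```

Calling `watch()` in a second terminal while the benchmark runs gives a timestamped temperature trace that can be lined up with the time of the crash in the journal.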

#7 2016-02-16 22:14:49

SickSight
Member
Registered: 2016-02-16
Posts: 4

Re: [SOLVED] Nvidia GPU at 0000:01:00.0 has fallen off the bus.

Did I say "no overheating"...? At 100 degrees Celsius I stopped benchmarking. :D :D :D

Issue: The connector for the graphics card fan had a loose connection. Bad hardware AND overheating. I fixed it, and now everything runs cleanly at 60 degrees under high load!!!! Sometimes you are just blind. -.-

Best regards
