I have an old NVIDIA GeForce GTX 1060 3 GB graphics card. A few months back, I started seeing this problem: the screen turns off and goes completely blank, as if the computer had shut down. The computer is still running, though, and I can even ssh into it.
But the screen and keyboard/mouse remain unresponsive.
After a hard reset, I ran
journalctl -xb -1
to trace the problem, and these lines appeared:
Jun 17 04:03:41 archlinux kernel: NVRM: GPU at PCI:0000:08:00: GPU-c0b31bf4-5aaf-2c80-0b87-b4218586ab36
Jun 17 04:03:41 archlinux kernel: NVRM: GPU Board Serial Number:
Jun 17 04:03:41 archlinux kernel: NVRM: Xid (PCI:0000:08:00): 79, pid=1187, GPU has fallen off the bus.
Jun 17 04:03:41 archlinux kernel: NVRM: GPU 0000:08:00.0: GPU has fallen off the bus.
Jun 17 04:03:41 archlinux kernel: NVRM: GPU 0000:08:00.0: GPU is on Board .
Jun 17 04:03:41 archlinux kernel: NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
I have been seeing this problem for a while now. I did some googling, and most results point to dying hardware. Still, I would like to know whether there is anything else to try. Buying a replacement graphics card during this global shortage is impossible for me.
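For anyone tracing the same symptom, the Xid lines can be pulled out of the kernel log without reading the whole journal. This is only a minimal sketch: the function name xid_events is made up for illustration, and it assumes the log lines have the same layout as the ones quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log lines on stdin and print
# "<pci-address> <xid-code>" for every NVRM Xid message found.
xid_events() {
    sed -n 's/.*NVRM: Xid (PCI:\([0-9a-fA-F:.]*\)): \([0-9][0-9]*\),.*/\1 \2/p'
}

# Usage on a live system (kernel messages of the previous boot):
#   journalctl -k -b -1 | xid_events
```

On the log above this would print the PCI address and the code 79, which is the "GPU has fallen off the bus" event.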
Last edited by pntruongan (2021-06-22 10:48:50)
Offline
NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
Did you do this?
Offline
NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
Did you do this?
Oh yes. The data nvidia-bug-report.sh collected is quite large, so I put it on the web here:
https://truongan.name.vn/wp-content/upl … report.txt
This was not done immediately after the crash, but a few boots later. Is it crucial that I run nvidia-bug-report.sh right after the crash?
Last edited by pntruongan (2021-06-18 00:39:28)
Offline
Is it crucial that I run nvidia-bug-report.sh right after the crash?
Yes.
Could just be https://bbs.archlinux.org/viewtopic.php?id=265563 - does it correlate w/ the update?
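Since the report has to be collected before the next reboot, and the machine reportedly stays reachable over ssh after the screen goes blank, one option is to leave a small watcher running that fires the script on the first Xid message. A rough sketch, not a tested setup: run_on_first_xid is a hypothetical name, and it assumes nvidia-bug-report.sh is on the PATH and that the watcher runs as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and run the given command
# once, as soon as a line containing "NVRM: Xid" appears.
# Returns 1 if stdin ends without any Xid line.
run_on_first_xid() {
    local line
    while IFS= read -r line; do
        case "$line" in
            *"NVRM: Xid"*)
                # Detach stdin so the command cannot swallow the log stream.
                "$@" </dev/null
                return 0
                ;;
        esac
    done
    return 1
}

# Usage (as root), following the kernel log live:
#   journalctl -kf | run_on_first_xid nvidia-bug-report.sh
```

Logging in over ssh after the crash and running nvidia-bug-report.sh by hand before the hard reset achieves the same thing.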
Offline
Seems unlikely. My problem has been going on for a few months, across various nvidia driver and kernel updates. I just kept ignoring it at first because it was so random. Just this week, the frequency of the GPU falling off the bus has risen to 2-3 times a day, which is unbearable for me.
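To check whether the crash frequency lines up with a particular driver or kernel update, the Xid messages can be counted per day. A small sketch, assuming the default journalctl timestamp format ("Jun 17 04:03:41 ...") and a persistent journal; xid_per_day is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and print "<month> <day> <count>"
# for each day that has at least one NVRM Xid message, in order of appearance.
xid_per_day() {
    awk '/NVRM: Xid/ {
             day = $1 " " $2
             if (!(day in count)) order[++n] = day
             count[day]++
         }
         END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }'
}

# Usage across boots (journalctl -k alone is limited to a single boot):
#   journalctl _TRANSPORT=kernel --since "30 days ago" | xid_per_day
```

The dates can then be compared against the upgrade times recorded in /var/log/pacman.log.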
Offline
I see.
Try to pass "rcutree.rcu_idle_gp_delay=1 pcie_aspm=off" to the kernel.
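After adding the parameters and rebooting, it is worth confirming that the running kernel actually received them. A minimal sketch: has_kernel_param is a hypothetical helper, and where the parameters get added depends on the boot loader, which was not stated in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if parameter $1 appears as a whole word
# in the kernel command line string $2.
has_kernel_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: with GRUB, the parameters go into GRUB_CMDLINE_LINUX_DEFAULT
# in /etc/default/grub, followed by: grub-mkconfig -o /boot/grub/grub.cfg
# Other boot loaders keep the command line elsewhere.

# Report on the currently running kernel.
cmdline=$(cat /proc/cmdline 2>/dev/null || true)
for param in rcutree.rcu_idle_gp_delay=1 pcie_aspm=off; do
    if has_kernel_param "$param" "$cmdline"; then
        echo "$param: active"
    else
        echo "$param: not set"
    fi
done
```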
Offline
It could be a simple hardware problem. In the past, it happened to me with an old 9800.
The PCI-E card had literally fallen off the bus; it was not properly seated in the slot. Check it.
Offline
It could be a simple hardware problem. In the past, it happened to me with an old 9800.
The PCI-E card had literally fallen off the bus; it was not properly seated in the slot. Check it.
I have reseated the card several times before: took it out, cleaned the slot and the card with isopropyl alcohol, waited for the alcohol to dry, and put it back in. I did this several times in the past few months, and the problem still keeps showing up.
Offline
I see.
Try to pass "rcutree.rcu_idle_gp_delay=1 pcie_aspm=off" to the kernel.
I will try this out on Sunday. Since I need this machine to work from home (COVID lockdown going on here), I've switched to a spare, much older GTX 750. The GTX 750 has worked fine for the last two days (which makes my hope for the 1060 even slimmer).
It's a pain to watch the GTX 750 try to cope with a 4K monitor, but I really need the machine for Teams meetings.
Offline
I surrender to the god of bad luck. It must be a hardware issue, then: the GTX 750 has run stable for 2 days in a row, with no crashes.
Offline