
#1 2022-08-05 17:27:25

Brudhu
Member
Registered: 2022-08-04
Posts: 1

GPU has fallen off the bus

Hi!

I bought a System76 Gazelle laptop (with an RTX 3060 6GB GPU) about a month ago. It came with Ubuntu, and the first thing I did was install a larger SSD in the second slot and install Arch Linux on it (I used the archinstall script).

The problem: about twice a day the UI freezes, the fans go to full speed, and I have to long-press the power button to reset the machine. Switching to another virtual console with CTRL + ALT + F2 does not work. It happens regardless of GPU load (I ran some GPU benchmarks from the Arch Wiki (Benchmarking) and it never froze while running them).

I checked the logs and found the following messages:

Aug 04 11:59:41 archlinux kernel: NVRM: A GPU crash dump has been created. If possible, please run
                                  NVRM: nvidia-bug-report.sh as root to collect this data before
                                  NVRM: the NVIDIA kernel module is unloaded.
Aug 04 11:59:41 archlinux kernel: [133B blob data]
Aug 04 11:59:41 archlinux kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Aug 04 11:59:41 archlinux kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Aug 04 11:59:41 archlinux kernel: NVRM: GPU at PCI:0000:01:00: GPU-1e54eaed-9458-efa9-3b9b-d332a9300303
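
For anyone who wants to check their own journal for the same error, here is a minimal sketch for pulling the Xid codes out of kernel log text. The function name xid_codes is made up for illustration, and the journalctl call in the usage comment assumes a persistent journal, so that the previous boot's messages are still available after a hard reset.

```shell
# Print the numeric Xid code from every NVRM Xid line read on stdin.
# (xid_codes is a hypothetical helper name, not part of any NVIDIA tool.)
xid_codes() {
  sed -n 's/.*NVRM: Xid (PCI:[^)]*): \([0-9][0-9]*\),.*/\1/p'
}

# Usage (assumes the previous boot is still in the journal):
#   journalctl -k -b -1 --no-pager | xid_codes | sort | uniq -c
```

Fed the log lines above, this should print only the code 79, which matches the "GPU has fallen off the bus" message.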

When it happens, the log says to run nvidia-bug-report.sh as root to collect debug information, which I did; the reports are attached to this post.

I'm using only the NVIDIA GPU, not the Intel one. I set it up by following the Arch Wiki (NVIDIA_Optimus).

System Info:

Operating System: Arch Linux
KDE Plasma Version: 5.25.4
KDE Frameworks Version: 5.96.0
Qt Version: 5.15.5
Kernel Version: 5.18.16-arch1-1 (64-bit)
Graphics Platform: X11
Processors: 20 × 12th Gen Intel® Core™ i7-12700H
Memory: 62,7 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 3060 Laptop GPU/PCIe/SSE2
Manufacturer: System76
Product Name: Gazelle
System Version: gaze17-3060-b

Kernel Parameters:
(I know some of them may not make sense, but I'm a desperate kernel newbie trying to make my laptop stop crashing)

[luvizotto@archlinux ~]$ cat /proc/cmdline
initrd=\intel-ucode.img initrd=\initramfs-linux.img root=PARTUUID=accf2e61-13f2-4dae-b824-4c5856c99913 rw intel_pstate=no_hwp rootfstype=ext4 ibt=off rcutree.rcu_idle_gp_delay=2 intel_idle.max_cstate=1 pcie_aspm=off nvidia-drm.modeset=1

NVIDIA Info:
Driver Version: 515.65.01

[luvizotto@archlinux ~]$ systemctl list-unit-files | grep nvidia
nvidia-hibernate.service                                                      enabled         disabled
nvidia-persistenced.service                                                   enabled         disabled
nvidia-powerd.service                                                         disabled        disabled
nvidia-resume.service                                                         enabled         disabled
nvidia-suspend.service                                                        disabled        disabled

What I've tried so far:

  • Using linux-lts + nvidia-lts

  • Multiple combinations of kernel parameters (the current config is the best I've found; before that I got a crash every 2-3 hours)

  • Limiting GPU clocks with

    nvidia-smi -lgc 300,1500

Important information:
Everything works under the original Ubuntu that came installed on the laptop, which should mean it's not a hardware problem (?), but I love Arch and really want to get this fixed.

Any help debugging this would be highly appreciated. Any idea what the problem could be? Please let me know if I left out any important information.

Attachments:
journalctl log example 1
journalctl log example 2
nvidia-bug-report.sh example 1
nvidia-bug-report.sh example 2

Last edited by Brudhu (2022-08-05 17:31:22)
