#1 2025-06-29 02:05:00

techmanwalker
Member
Registered: 2025-06-29
Posts: 34

[SOLVED] NVIDIA GPU completely disappears out of nowhere

Hi, this is my first post here.

Either after resuming from a long suspend, or after playing for a while and then closing and trying to reopen the game, the game never shows up. Steam reports that the game is "already running" and gets completely stuck there, and the NVIDIA GPU no longer appears in nvtop, as if it had somehow been "disconnected". That leaves me completely unable to play, although my main desktop keeps working because the iGPU is still alive. Logging out and back in doesn't fix the issue. I have to fully shut down and boot up again to get NVIDIA graphics working (rebooting doesn't work, as it just freezes the laptop and forces me to do a hard shutdown anyway). At this point it's quite frustrating because of how often this happens.

dmesg shows these two entries repeated over and over:

[52383.059448] pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
[52383.059485] nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
[52383.059488] nvidia 0000:01:00.0:   device [10de:25a2] error status/mask=00000040/0000a000
[52383.059490] nvidia 0000:01:00.0:    [ 6] BadTLP  
[152697.045014] nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
[160390.019798] nvidia 0000:01:00.0: can't suspend (nv_pmops_runtime_suspend [nvidia] returned -5)
[160402.389588] NVRM: Error in service of callback
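To see how often those correctable BadTLP errors actually occur, the kernel's per-device AER counters can be read directly. A minimal sketch, assuming a kernel that exposes AER statistics in sysfs (the `aer_dev_correctable` file, one `<name> <count>` pair per line) and the dGPU at 0000:01:00.0 as in the log; the `nonzero_aer` helper is a made-up name for this example.

```shell
#!/bin/sh
# Show which correctable PCIe error counters are non-zero for one device.
# Assumes the kernel exposes AER statistics in sysfs (aer_dev_correctable).

# Filter "<ErrorName> <count>" lines on stdin down to the non-zero counters.
nonzero_aer() {
    awk '$2 > 0 { print $1, $2 }'
}

# Default to the dGPU address seen in the log above.
dev=${1:-0000:01:00.0}
f=/sys/bus/pci/devices/$dev/aer_dev_correctable

if [ -r "$f" ]; then
    nonzero_aer < "$f"
else
    echo "no AER statistics for $dev (device missing or kernel without AER stats)" >&2
fi
```

Run a few times over a session, a BadTLP count that keeps climbing points at link trouble. Correctable errors are recovered by the hardware, so this is a warning sign rather than proof of what took the GPU down.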

journalctl logged this:

nvidia 0000:01:00.0: can't suspend (nv_pmops_runtime_suspend [nvidia] returned -5)
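The failing call is the driver's runtime-suspend hook (the `-5` is `-EIO`), so it can help to see what the kernel's runtime PM thinks the dGPU is doing. A small sketch, assuming the standard sysfs `power/runtime_status` and `power/control` attributes and the 0000:01:00.0 address; the `runtime_pm` function and the `SYSFS` override (there only so it can be exercised against a fake tree) are hypothetical.

```shell
#!/bin/sh
# Print the kernel's runtime power-management view of one PCI device.
# SYSFS defaults to /sys; overriding it is only a hook for testing.
runtime_pm() {
    p=${SYSFS:-/sys}/bus/pci/devices/$1/power
    if [ -d "$p" ]; then
        # runtime_status: active, suspended, suspending, resuming, error or unsupported
        # control: "auto" allows runtime suspend, "on" keeps the device powered up
        echo "$1: status=$(cat "$p/runtime_status") control=$(cat "$p/control")"
    else
        echo "$1: no power directory (device not enumerated)"
    fi
}

# Default to the dGPU address from the log.
runtime_pm "${1:-0000:01:00.0}"
```

A `status=error` reading after such a failure would suggest the PM core has stopped trying to runtime-suspend the device until its state is reset.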

I've already followed what someone suggested on Reddit, but still no luck.

I can't find any reliable way to reproduce this bug.

Any fixes? Thank you in advance.
P.S.: I'm on a laptop with integrated Radeon 680M graphics (iGPU) and an RTX 3050 (dGPU).

Picture: Fresh boot, normal condition
Picture: NVIDIA GPU completely gone. The only way to bring it back is to fully shut down and boot up again (no, restarting does not work in this state, as it just freezes the laptop)

Last edited by techmanwalker (2025-07-11 06:28:03)

#2 2025-06-29 07:30:48

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 69,448

Please replace the oversized images w/ links (the board has a 250²px² limit to keep our mouse wheels cool) and post your complete system journal for a boot after losing the device, e.g.

sudo journalctl -b -1 | curl -F 'file=@-' 0x0.st

for the previous (-1) boot.

The AER isn't good, but not necessarily fatal, and nv_pmops_runtime_suspend probably only indicates the loss that has already happened.
Does the GPU still show up in "lspci -k"?
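lspci can be narrowed to NVIDIA's PCI vendor ID (the 10de in the log) with `lspci -k -d 10de:`, and the same question can also be answered straight from sysfs. A sketch with a made-up `pci_driver` helper and the same kind of hypothetical `SYSFS` override, used only for testing:

```shell
#!/bin/sh
# Report whether a PCI device is still enumerated and which driver is bound.
# SYSFS defaults to /sys; overriding it is only a hook for testing.
pci_driver() {
    node=${SYSFS:-/sys}/bus/pci/devices/$1
    if [ ! -e "$node" ]; then
        echo "$1: not enumerated"
    elif [ -L "$node/driver" ]; then
        # "driver" is a symlink into /sys/bus/pci/drivers/<name>.
        echo "$1: bound to $(basename "$(readlink "$node/driver")")"
    else
        echo "$1: present, no driver bound"
    fi
}

# Default to the dGPU address from the log.
pci_driver "${1:-0000:01:00.0}"
```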

#3 2025-06-29 09:30:34

techmanwalker
Member
Registered: 2025-06-29
Posts: 34

http://0x0.st/8UAc.txt
Checking the end of the file quickly shows a lot of NVIDIA-related kernel errors, game coredumps...

As for lspci -k, I'll run that command the next time I hit the issue, in case the logs I've already provided aren't enough.

#4 2025-06-29 09:57:23

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 69,448

jun 27 20:45:23 malasdecisiones kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
jun 27 20:46:02 malasdecisiones kernel: NVRM: Error in service of callback 

https://wiki.archlinux.org/title/NVIDIA … er_suspend
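For reference, the setup that README section describes has two parts: the module parameter and the systemd units nvidia-suspend.service, nvidia-hibernate.service and nvidia-resume.service, which write to /proc/driver/nvidia/suspend around sleep. A sketch of the parameter as a modprobe.d drop-in; the file name is an arbitrary choice, not something the driver requires.

```conf
# /etc/modprobe.d/nvidia-power-management.conf (file name is arbitrary)
# Save and restore video memory across suspend/hibernate. This relies on the
# nvidia-suspend/-hibernate/-resume units driving /proc/driver/nvidia/suspend.
options nvidia NVreg_PreserveVideoMemoryAllocations=1
```

The units are then enabled with `sudo systemctl enable nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service`. If the nvidia module is loaded from the initramfs, the initramfs probably needs regenerating for a parameter change to apply.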

jun 28 23:30:49 malasdecisiones kernel: pcieport 0000:00:01.1: PME: Spurious native interrupt!
jun 28 23:30:49 malasdecisiones kernel: pcieport 0000:00:01.1: PME: Spurious native interrupt!
jun 28 23:30:49 malasdecisiones kernel: pcieport 0000:00:01.1: pciehp: Slot(0): Link Down
jun 28 23:30:49 malasdecisiones kernel: pcieport 0000:00:01.1: pciehp: Slot(0): Card not present
jun 28 23:30:49 malasdecisiones kernel: NVRM: Attempting to remove device 0000:01:00.0 with non-zero usage count!
jun 28 23:30:49 malasdecisiones kernel: NVRM: GPU at PCI:0000:01:00: GPU-f5bf4a7e-a547-233a-8aec-19e2319d484f
jun 28 23:30:49 malasdecisiones kernel: NVRM: Xid (PCI:0000:01:00): 79, pid=359, name=nv_queue, GPU has fallen off the bus.
jun 28 23:30:49 malasdecisiones kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
jun 28 23:30:49 malasdecisiones kernel: NVRM: GPU0 GSP RPC buffer contains function 78 (DUMP_PROTOBUF_COMPONENT) and data 0x0000000000000000 0x0000000000000000.
jun 28 23:30:49 malasdecisiones kernel: NVRM: GPU0 RPC history (CPU -> GSP):
jun 28 23:30:49 malasdecisiones kernel: NVRM:     entry function                   data0              data1              ts_start           ts_end             duration actively_polling
jun 28 23:30:49 malasdecisiones kernel: NVRM:      0    76   GSP_RM_CONTROL        0x000000002080a7d7 0x0000000000000002 0x000638b00d5c1ebc 0x0000000000000000          y
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -1    76   GSP_RM_CONTROL        0x000000002080a7d7 0x0000000000000002 0x000638b00d0dfeb0 0x000638b00d0dfffb    331us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -2    76   GSP_RM_CONTROL        0x000000002080a7d7 0x0000000000000002 0x000638b00cbfde70 0x000638b00cbfdfb6    326us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -3    76   GSP_RM_CONTROL        0x000000002080a7d7 0x0000000000000002 0x000638b00c71be3f 0x000638b00c71bfaf    368us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -4    76   GSP_RM_CONTROL        0x000000002080a7d7 0x0000000000000002 0x000638b00c239e7c 0x000638b00c239fd9    349us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -5    76   GSP_RM_CONTROL        0x000000002080a7d7 0x0000000000000002 0x000638b00bd57e69 0x000638b00bd57fca    353us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -6    76   GSP_RM_CONTROL        0x000000002080a7d7 0x0000000000000002 0x000638b00b875e9b 0x000638b00b875fe4    329us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -7    76   GSP_RM_CONTROL        0x000000002080a7d7 0x0000000000000002 0x000638b00b393e72 0x000638b00b39408a    536us  
jun 28 23:30:49 malasdecisiones kernel: NVRM: GPU0 RPC event history (CPU <- GSP):
jun 28 23:30:49 malasdecisiones kernel: NVRM:     entry function                   data0              data1              ts_start           ts_end             duration during_incomplete_rpc
jun 28 23:30:49 malasdecisiones kernel: NVRM:      0    4108 UCODE_LIBOS_PRINT     0x0000000000000000 0x0000000000000000 0x000638b00a968855 0x000638b00a968856      1us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -1    4108 UCODE_LIBOS_PRINT     0x0000000000000000 0x0000000000000000 0x000638b00a968714 0x000638b00a96871c      8us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -2    4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000027 0x000638b00a967442 0x000638b00a967446      4us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -3    4098 GSP_RUN_CPU_SEQUENCER 0x000000000000061c 0x0000000000003fe2 0x000638b00a95ec48 0x000638b00a95fedf   4759us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -4    4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000028 0x000638b00a37ec8e 0x000638b00a37ec91      3us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -5    4108 UCODE_LIBOS_PRINT     0x0000000000000000 0x0000000000000000 0x000638b009e7744e 0x000638b009e7744f      1us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -6    4108 UCODE_LIBOS_PRINT     0x0000000000000000 0x0000000000000000 0x000638b009e77308 0x000638b009e77314     12us  
jun 28 23:30:49 malasdecisiones kernel: NVRM:     -7    4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000027 0x000638b009e7607d 0x000638b009e76081      4us  
jun 28 23:30:49 malasdecisiones kernel: CPU: 9 UID: 0 PID: 359 Comm: nv_queue Tainted: P        W  OE       6.15.3-zen1-1-zen #1 PREEMPT(full)  b8f045b7443e3a8abd7ec6d3d7b42085ea2e9c00
jun 28 23:30:49 malasdecisiones kernel: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
jun 28 23:30:49 malasdecisiones kernel: Hardware name: ASUSTeK COMPUTER INC. Zenbook UM6702RC_RM6702RC_BM6702RC UM6702RC_UM6702RC/UM6702RC, BIOS UM6702RC.310 06/17/2022
jun 28 23:30:49 malasdecisiones kernel: Call Trace:
jun 28 23:30:49 malasdecisiones kernel:  <TASK>
jun 28 23:30:49 malasdecisiones kernel:  dump_stack_lvl+0x5d/0x80
jun 28 23:30:49 malasdecisiones kernel:  _nv013768rm+0x378/0x720 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  _nv013678rm+0xe2/0x880 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  _nv053604rm+0x594/0x770 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  _nv053136rm+0xd4/0x1f0 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  _nv052821rm+0xd6/0x1b0 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  _nv055004rm+0x3f5/0x500 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  _nv015696rm+0x469/0x680 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  ? __pfx__main_loop+0x10/0x10 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  _nv052961rm+0x29/0x30 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  ? _nv055007rm+0x60/0x60 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  _nv000719rm+0x60/0xa1 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
jun 28 23:30:49 malasdecisiones kernel:  _nv058185rm+0x3e/0x159 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  ? _nv058032rm+0x110/0x110 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  _nv016124rm+0x2c/0x50 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  _nv058240rm+0x1a/0x40 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  _nv016125rm+0x20/0x50 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  rm_execute_work_item+0x141/0x1f0 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  os_execute_work_item+0x68/0x90 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  _main_loop+0x93/0x150 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
jun 28 23:30:49 malasdecisiones kernel:  ? __pfx__main_loop+0x10/0x10 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel:  kthread+0xfc/0x240
jun 28 23:30:49 malasdecisiones kernel:  ? __pfx_kthread+0x10/0x10
jun 28 23:30:49 malasdecisiones kernel:  ret_from_fork+0x34/0x50
jun 28 23:30:49 malasdecisiones kernel:  ? __pfx_kthread+0x10/0x10
jun 28 23:30:49 malasdecisiones kernel:  ret_from_fork_asm+0x1a/0x30
jun 28 23:30:49 malasdecisiones kernel:  </TASK>
jun 28 23:30:49 malasdecisiones kernel: NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
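The key lines in that dump are the Xid reports: 79 marks the GPU falling off the bus and 154 the recovery action that follows. To spot them in a long journal, a small filter like this can help; the `xid_events` name is hypothetical, and it only recognizes the `NVRM: Xid (PCI:...): N,` line format seen above.

```shell
#!/bin/sh
# Extract NVIDIA Xid events from kernel log text on stdin.
# Only matches the "NVRM: Xid (PCI:<addr>): <code>," format seen above.
xid_events() {
    # Print "<PCI address> <Xid code>" for each matching line.
    sed -nE 's/.*NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): ([0-9]+),.*/\1 \2/p'
}

# Example use on the previous boot's kernel messages:
#   journalctl -k -b -1 | xid_events
```

On the excerpt above it would print `0000:01:00 79` followed by `0000:01:00 154`.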

https://wiki.archlinux.org/title/NVIDIA … P_firmware
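The workaround in that section comes down to a single module parameter. A sketch as a modprobe.d drop-in with an arbitrary file name; this is expected to work only with the proprietary kernel module, since the open kernel modules depend on the GSP firmware.

```conf
# /etc/modprobe.d/nvidia-gsp.conf (file name is arbitrary)
# Run the driver without the GSP firmware (proprietary kernel module only).
options nvidia NVreg_EnableGpuFirmware=0
```

After a reboot (and an initramfs rebuild if the module is loaded early), `nvidia-smi -q | grep -i gsp` should report the GSP firmware version as N/A if the change took effect.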

#5 2025-06-29 10:19:00

techmanwalker
Member
Registered: 2025-06-29
Posts: 34

OK, I just hit the issue again, so I ran lspci -k, and the GPU still shows up there:

01:00.0 3D controller: NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] (rev a1)
        Subsystem: ASUSTeK Computer Inc. Device 107d
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia

NVreg_PreserveVideoMemoryAllocations=1

was already set. I'll disable the GSP firmware as you suggested, reboot the laptop, and report back here if the GPU still disappears after a while. Thank you very much for your help :]

#6 2025-06-29 10:20:21

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 69,448

The parameter is set but the relevant services don't seem to be enabled?
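Both halves can be checked at once. A sketch that reads the parameter back from /proc/driver/nvidia/params (assuming that file's `Name: value` line format) and asks systemd about the three sleep units; the `nv_param` and `check_power_setup` names are made up for the example.

```shell
#!/bin/sh
# Check both halves of the NVIDIA suspend setup: the module parameter and the
# systemd units that drive /proc/driver/nvidia/suspend.

# Print one parameter's value from a params file whose lines look like
# "Name: value" (default: /proc/driver/nvidia/params).
nv_param() {
    awk -F': *' -v name="$1" '$1 == name { print $2 }' "${2:-/proc/driver/nvidia/params}"
}

check_power_setup() {
    v=$(nv_param PreserveVideoMemoryAllocations 2>/dev/null)
    echo "PreserveVideoMemoryAllocations: ${v:-unknown (driver not loaded?)}"
    for unit in nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service; do
        # is-enabled prints the unit's state (enabled, disabled, ...);
        # its non-zero exit for disabled units is tolerated here.
        state=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        echo "$unit: ${state:-unknown}"
    done
}

check_power_setup
```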

#7 2025-06-29 10:31:21

techmanwalker
Member
Registered: 2025-06-29
Posts: 34

I almost missed those; they're enabled now. Should I enable nvidia-powerd as well?

#8 2025-06-29 10:32:53

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 69,448

https://wiki.archlinux.org/title/CPU_fr … dia-powerd, but it's not relevant in this context (from what can be told from the journal, so far ;) )
