Hi, this is my first post here.
Either after resuming from a long suspend, or after playing for a while and then closing and trying to reopen the game, the game never shows up. Steam reports that the game is "already running" and stays stuck there, and the NVIDIA GPU no longer appears in nvtop, as if it had somehow been "disconnected". That leaves me completely unable to play, although my main desktop keeps working because the iGPU is still alive. Logging out and back in doesn't fix the issue; I have to fully shut down and boot up again to get NVIDIA graphics working (rebooting does not work, as it just freezes the laptop and forces me to do a hard shutdown anyway). At this point it's quite frustrating because of how often this happens.
dmesg shows these two entries repeated over and over:
[52383.059448] pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
[52383.059485] nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
[52383.059488] nvidia 0000:01:00.0: device [10de:25a2] error status/mask=00000040/0000a000
[52383.059490] nvidia 0000:01:00.0: [ 6] BadTLP
[152697.045014] nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
[160390.019798] nvidia 0000:01:00.0: can't suspend (nv_pmops_runtime_suspend [nvidia] returned -5)
[160402.389588] NVRM: Error in service of callback
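For anyone reading the "error status/mask=00000040/0000a000" line above: the two hex values are bitmasks over the PCIe AER correctable error status register. Here is a minimal sketch that decodes them. The function name decode_aer_correctable is made up for illustration, and the short labels are assumed to mirror the abbreviations the kernel prints (such as BadTLP).

```sh
#!/bin/sh
# Decode a PCIe AER *correctable* error status (or mask) value into bit names.
# Bit positions follow the Correctable Error Status register layout;
# the labels are assumed to match the kernel's abbreviations.
decode_aer_correctable() {
    value=$((0x$1))
    for entry in 0:RxErr 6:BadTLP 7:BadDLLP 8:Rollover \
                 12:Timeout 13:NonFatalErr 14:CorrIntErr 15:HeaderOF; do
        bit=${entry%%:*}     # text before the colon: bit number
        name=${entry#*:}     # text after the colon: label
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Running decode_aer_correctable 00000040 prints BadTLP, which agrees with the "[ 6] BadTLP" line, while decode_aer_correctable 0000a000 lists the two error types that are masked.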
journalctl logged this:
nvidia 0000:01:00.0: can't suspend (nv_pmops_runtime_suspend [nvidia] returned -5)
I've already followed what this guy told me to do on Reddit but still no luck.
I can't find any reliable way to reproduce this bug.
Any fixes? Thank you in advance.
PS: I'm on a laptop, integrated Radeon 680M Graphics (iGPU) and RTX 3050 (dGPU).
Last edited by techmanwalker (2025-07-11 06:28:03)
Offline
Please replace the oversized images w/ links (the board has a 250²px² limit to keep our mouse wheels cool) and post your complete system journal for a boot after losing the device, eg.
sudo journalctl -b -1 | curl -F 'file=@-' 0x0.st
for the previous (-1) boot
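If you want to skim the same journal locally before uploading it, a small filter can narrow it down to the GPU-related kernel lines. This is only a sketch: the function name gpu_events and the choice of patterns are my assumptions, not something the board or the driver prescribes.

```sh
#!/bin/sh
# Keep only lines that mention the NVIDIA driver, PCIe hotplug or AER.
# Reads a journal on stdin and writes the matching lines to stdout.
gpu_events() {
    grep -E 'NVRM|nvidia|pciehp|AER|PCIe Bus Error'
}
```

A possible use is sudo journalctl -k -b -1 | gpu_events, where -k restricts the output to kernel messages. The complete journal is still what should be posted, since the filter may hide relevant context.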
The AER isn't good but not necessarily fatal and nv_pmops_runtime_suspend probably only indicates the loss that has happened before.
Does the GPU still show up in "lspci -k"?
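Besides lspci -k, the same question can be asked of sysfs directly. The sketch below assumes the dGPU sits at 0000:01:00.0 as in the dmesg output; the function name pci_device_state and the optional second argument (a sysfs root, there only so the function can be tried against a fake directory) are hypothetical.

```sh
#!/bin/sh
# Report whether a PCI device is still enumerated and which driver is bound.
# $1: PCI address, e.g. 0000:01:00.0
# $2: sysfs device directory (optional, override only for testing)
pci_device_state() {
    addr=$1
    root=${2:-/sys/bus/pci/devices}
    dev="$root/$addr"
    if [ ! -e "$dev" ]; then
        echo "missing"
    elif [ -e "$dev/driver" ]; then
        # The driver entry is a symlink whose target is named after the driver.
        echo "bound:$(basename "$(readlink -f "$dev/driver")")"
    else
        echo "present-unbound"
    fi
}
```

Calling pci_device_state 0000:01:00.0 prints "missing", "present-unbound" or "bound:" followed by the driver name. Note that a device which is still listed is not necessarily still responding.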
Offline
http://0x0.st/8UAc.txt
Checking the end of the file quickly shows a lot of NVIDIA-related kernel errors, game coredumps...
As for lspci -k, I will run that command the next time I hit the issue, in case the logs I've already provided aren't enough.
Offline
jun 27 20:45:23 malasdecisiones kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
jun 27 20:46:02 malasdecisiones kernel: NVRM: Error in service of callback
https://wiki.archlinux.org/title/NVIDIA … er_suspend
jun 28 23:30:49 malasdecisiones kernel: pcieport 0000:00:01.1: PME: Spurious native interrupt!
jun 28 23:30:49 malasdecisiones kernel: pcieport 0000:00:01.1: PME: Spurious native interrupt!
jun 28 23:30:49 malasdecisiones kernel: pcieport 0000:00:01.1: pciehp: Slot(0): Link Down
jun 28 23:30:49 malasdecisiones kernel: pcieport 0000:00:01.1: pciehp: Slot(0): Card not present
jun 28 23:30:49 malasdecisiones kernel: NVRM: Attempting to remove device 0000:01:00.0 with non-zero usage count!
jun 28 23:30:49 malasdecisiones kernel: NVRM: GPU at PCI:0000:01:00: GPU-f5bf4a7e-a547-233a-8aec-19e2319d484f
jun 28 23:30:49 malasdecisiones kernel: NVRM: Xid (PCI:0000:01:00): 79, pid=359, name=nv_queue, GPU has fallen off the bus.
jun 28 23:30:49 malasdecisiones kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
jun 28 23:30:49 malasdecisiones kernel: NVRM: GPU0 GSP RPC buffer contains function 78 (DUMP_PROTOBUF_COMPONENT) and data 0x0000000000000000 0x0000000000000000.
jun 28 23:30:49 malasdecisiones kernel: NVRM: GPU0 RPC history (CPU -> GSP):
jun 28 23:30:49 malasdecisiones kernel: NVRM: entry function data0 data1 ts_start ts_end duration actively_polling
jun 28 23:30:49 malasdecisiones kernel: NVRM: 0 76 GSP_RM_CONTROL 0x000000002080a7d7 0x0000000000000002 0x000638b00d5c1ebc 0x0000000000000000 y
jun 28 23:30:49 malasdecisiones kernel: NVRM: -1 76 GSP_RM_CONTROL 0x000000002080a7d7 0x0000000000000002 0x000638b00d0dfeb0 0x000638b00d0dfffb 331us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -2 76 GSP_RM_CONTROL 0x000000002080a7d7 0x0000000000000002 0x000638b00cbfde70 0x000638b00cbfdfb6 326us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -3 76 GSP_RM_CONTROL 0x000000002080a7d7 0x0000000000000002 0x000638b00c71be3f 0x000638b00c71bfaf 368us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -4 76 GSP_RM_CONTROL 0x000000002080a7d7 0x0000000000000002 0x000638b00c239e7c 0x000638b00c239fd9 349us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -5 76 GSP_RM_CONTROL 0x000000002080a7d7 0x0000000000000002 0x000638b00bd57e69 0x000638b00bd57fca 353us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -6 76 GSP_RM_CONTROL 0x000000002080a7d7 0x0000000000000002 0x000638b00b875e9b 0x000638b00b875fe4 329us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -7 76 GSP_RM_CONTROL 0x000000002080a7d7 0x0000000000000002 0x000638b00b393e72 0x000638b00b39408a 536us
jun 28 23:30:49 malasdecisiones kernel: NVRM: GPU0 RPC event history (CPU <- GSP):
jun 28 23:30:49 malasdecisiones kernel: NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc
jun 28 23:30:49 malasdecisiones kernel: NVRM: 0 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x000638b00a968855 0x000638b00a968856 1us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -1 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x000638b00a968714 0x000638b00a96871c 8us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -2 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000027 0x000638b00a967442 0x000638b00a967446 4us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -3 4098 GSP_RUN_CPU_SEQUENCER 0x000000000000061c 0x0000000000003fe2 0x000638b00a95ec48 0x000638b00a95fedf 4759us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -4 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000028 0x000638b00a37ec8e 0x000638b00a37ec91 3us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -5 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x000638b009e7744e 0x000638b009e7744f 1us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -6 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x000638b009e77308 0x000638b009e77314 12us
jun 28 23:30:49 malasdecisiones kernel: NVRM: -7 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000027 0x000638b009e7607d 0x000638b009e76081 4us
jun 28 23:30:49 malasdecisiones kernel: CPU: 9 UID: 0 PID: 359 Comm: nv_queue Tainted: P W OE 6.15.3-zen1-1-zen #1 PREEMPT(full) b8f045b7443e3a8abd7ec6d3d7b42085ea2e9c00
jun 28 23:30:49 malasdecisiones kernel: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
jun 28 23:30:49 malasdecisiones kernel: Hardware name: ASUSTeK COMPUTER INC. Zenbook UM6702RC_RM6702RC_BM6702RC UM6702RC_UM6702RC/UM6702RC, BIOS UM6702RC.310 06/17/2022
jun 28 23:30:49 malasdecisiones kernel: Call Trace:
jun 28 23:30:49 malasdecisiones kernel: <TASK>
jun 28 23:30:49 malasdecisiones kernel: dump_stack_lvl+0x5d/0x80
jun 28 23:30:49 malasdecisiones kernel: _nv013768rm+0x378/0x720 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: _nv013678rm+0xe2/0x880 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: _nv053604rm+0x594/0x770 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: _nv053136rm+0xd4/0x1f0 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: _nv052821rm+0xd6/0x1b0 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: _nv055004rm+0x3f5/0x500 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: _nv015696rm+0x469/0x680 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: ? __pfx__main_loop+0x10/0x10 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: _nv052961rm+0x29/0x30 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: ? _nv055007rm+0x60/0x60 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: _nv000719rm+0x60/0xa1 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: ? srso_alias_return_thunk+0x5/0xfbef5
jun 28 23:30:49 malasdecisiones kernel: _nv058185rm+0x3e/0x159 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: ? _nv058032rm+0x110/0x110 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: _nv016124rm+0x2c/0x50 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: _nv058240rm+0x1a/0x40 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: _nv016125rm+0x20/0x50 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: rm_execute_work_item+0x141/0x1f0 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: os_execute_work_item+0x68/0x90 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: _main_loop+0x93/0x150 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: ? srso_alias_return_thunk+0x5/0xfbef5
jun 28 23:30:49 malasdecisiones kernel: ? __pfx__main_loop+0x10/0x10 [nvidia 045776a7b0db30d85d0abbc7a1886b955e595f4a]
jun 28 23:30:49 malasdecisiones kernel: kthread+0xfc/0x240
jun 28 23:30:49 malasdecisiones kernel: ? __pfx_kthread+0x10/0x10
jun 28 23:30:49 malasdecisiones kernel: ret_from_fork+0x34/0x50
jun 28 23:30:49 malasdecisiones kernel: ? __pfx_kthread+0x10/0x10
jun 28 23:30:49 malasdecisiones kernel: ret_from_fork_asm+0x1a/0x30
jun 28 23:30:49 malasdecisiones kernel: </TASK>
jun 28 23:30:49 malasdecisiones kernel: NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
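The key lines in that excerpt are the two Xid reports (79 and 154). To check whether other boots logged the same codes, something like the following could pull the Xid numbers out of a journal. It is a sketch that assumes the "NVRM: Xid (PCI:...): <number>," layout seen above; xid_codes is a made-up name.

```sh
#!/bin/sh
# Print each distinct Xid code found in NVRM kernel messages on stdin,
# sorted numerically. Lines without an Xid report are ignored.
xid_codes() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' | sort -n -u
}
```

For example: sudo journalctl -k -b -1 | xid_codes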
Offline
Ok, I've hit this issue right now so I issued lspci -k and it shows up there:
01:00.0 3D controller: NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] (rev a1)
Subsystem: ASUSTeK Computer Inc. Device 107d
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
NVreg_PreserveVideoMemoryAllocations=1
was already on. I'll disable the GSP firmware as you said and reboot the laptop, and reply here if the GPU still disappears after a while. Thank you very much for your help :]
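To confirm after the reboot which values the loaded driver actually uses, its parameters can be read back. This sketch assumes the proprietary driver exposes them as "Name: value" lines in /proc/driver/nvidia/params; the function name nvidia_param and the optional file argument are hypothetical. Disabling GSP firmware is commonly done with the NVreg_EnableGpuFirmware=0 module option, though this thread does not spell out the method.

```sh
#!/bin/sh
# Print the value of one NVIDIA driver parameter.
# $1: parameter name without the NVreg_ prefix,
#     e.g. PreserveVideoMemoryAllocations
# $2: params file (optional, override only for testing)
nvidia_param() {
    key=$1
    file=${2:-/proc/driver/nvidia/params}
    # Lines are assumed to look like "Name: value".
    sed -n "s/^$key: //p" "$file"
}
```

For example, nvidia_param PreserveVideoMemoryAllocations and nvidia_param EnableGpuFirmware.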
Offline
The parameter is set but the relevant services don't seem to be enabled?
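A quick way to see which of those services are still off is sketched below. It assumes "the relevant services" are the three units shipped with the NVIDIA driver (nvidia-suspend, nvidia-hibernate and nvidia-resume); the function names are made up, and the checker argument exists only so the loop can be tried without systemd.

```sh
#!/bin/sh
# Units assumed to be the ones the driver README refers to.
nvidia_pm_units="nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service"

# Default checker: succeeds if systemd reports the unit as enabled.
unit_is_enabled() {
    systemctl is-enabled --quiet "$1"
}

# Print every NVIDIA power-management unit that is NOT enabled.
# $1: checker command (optional, defaults to unit_is_enabled)
disabled_nvidia_pm_units() {
    checker=${1:-unit_is_enabled}
    for unit in $nvidia_pm_units; do
        "$checker" "$unit" || printf '%s\n' "$unit"
    done
}
```

Anything printed by disabled_nvidia_pm_units can then be enabled with sudo systemctl enable followed by the unit names.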
Offline
I almost missed those; they're enabled now. Should I enable nvidia-powerd as well?
Offline
https://wiki.archlinux.org/title/CPU_fr … dia-powerd - but it's not relevant in this context (from what can be told from the journal so far)
Offline