I'm using the latest kernel and the latest nvidia-dkms. At random, usually when I disconnect AC power, the nvidia driver just hangs. nvidia-smi gets stuck and can't even be interrupted with Ctrl-C, and I can't launch any games that use the GPU. journalctl shows:
Jul 30 17:29:22 arch kernel: INFO: task nv_queue:690 blocked for more than 491 seconds.
Jul 30 17:29:22 arch kernel: Tainted: P OE 6.15.8-arch1-1 #1
Jul 30 17:29:22 arch kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 17:29:22 arch kernel: task:nv_queue state:D stack:0 pid:690 tgid:690 ppid:2 task_flags:0x208040 flags:0x00004000
Jul 30 17:29:22 arch kernel: Call Trace:
Jul 30 17:29:22 arch kernel: <TASK>
Jul 30 17:29:22 arch kernel: __schedule+0x409/0x1320
Jul 30 17:29:22 arch kernel: ? sysvec_reschedule_ipi+0x28/0xf0
Jul 30 17:29:22 arch kernel: ? os_execute_work_item+0x40/0x90 [nvidia fa5ebea0038b3beddc32112899bd83cb6d0676ed]
Jul 30 17:29:22 arch kernel: schedule+0x27/0xd0
Jul 30 17:29:22 arch kernel: schedule_preempt_disabled+0x15/0x30
Jul 30 17:29:22 arch kernel: rwsem_down_write_slowpath+0x1f4/0x6e0
Jul 30 17:29:22 arch kernel: down_write+0x5a/0x60
Jul 30 17:29:22 arch kernel: os_acquire_rwlock_write+0x2b/0x40 [nvidia fa5ebea0038b3beddc32112899bd83cb6d0676ed]
Jul 30 17:29:22 arch kernel: _nv051520rm+0x10/0x40 [nvidia fa5ebea0038b3beddc32112899bd83cb6d0676ed]
Jul 30 17:29:22 arch kernel: _nv053004rm+0x28c/0x360 [nvidia fa5ebea0038b3beddc32112899bd83cb6d0676ed]
Jul 30 17:29:22 arch kernel: _nv059758rm+0x63/0x230 [nvidia fa5ebea0038b3beddc32112899bd83cb6d0676ed]
Jul 30 17:29:22 arch kernel: ? __pfx__main_loop+0x10/0x10 [nvidia fa5ebea0038b3beddc32112899bd83cb6d0676ed]
Jul 30 17:29:22 arch kernel: rm_execute_work_item+0x66/0x1f0 [nvidia fa5ebea0038b3beddc32112899bd83cb6d0676ed]
Jul 30 17:29:22 arch kernel: os_execute_work_item+0x68/0x90 [nvidia fa5ebea0038b3beddc32112899bd83cb6d0676ed]
Jul 30 17:29:22 arch kernel: _main_loop+0x93/0x150 [nvidia fa5ebea0038b3beddc32112899bd83cb6d0676ed]
Jul 30 17:29:22 arch kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jul 30 17:29:22 arch kernel: ? __pfx__main_loop+0x10/0x10 [nvidia fa5ebea0038b3beddc32112899bd83cb6d0676ed]
Jul 30 17:29:22 arch kernel: kthread+0xfc/0x240
Jul 30 17:29:22 arch kernel: ? __pfx_kthread+0x10/0x10
Jul 30 17:29:22 arch kernel: ret_from_fork+0x34/0x50
Jul 30 17:29:22 arch kernel: ? __pfx_kthread+0x10/0x10
Jul 30 17:29:22 arch kernel: ret_from_fork_asm+0x1a/0x30
Jul 30 17:29:22 arch kernel: </TASK>
Jul 30 17:29:22 arch kernel: Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
I tried switching to the open driver, but now the GPU becomes very slow instead, and only in the situations where the bug would normally occur. You can notice this by running nvidia-smi, for example: it takes a very long time to print its output. So instead of a full GPU hang like with the closed driver, it just slows down, which is better but still an issue.
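The `state:D` in the trace above means the `nv_queue` kernel thread is in uninterruptible sleep. Tasks in that state do not react to signals, which would be consistent with nvidia-smi ignoring Ctrl-C. A minimal sketch for listing everything currently stuck in that state, built on standard `ps` output columns; the helper name `d_state_tasks` is made up for illustration:

```shell
# Keep only tasks whose state starts with D (uninterruptible sleep).
# Expects "PID STAT COMMAND" lines on stdin, so it works on live ps
# output as well as on a saved listing.
d_state_tasks() {
    awk '$2 ~ /^D/'
}

# Usage (the trailing "=" on each column suppresses the header line):
#   ps -eo pid=,stat=,comm= | d_state_tasks
```

If nvidia-smi shows up in that list next to nv_queue while the hang is happening, that would suggest both are waiting on the driver, though it does not say why.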
This doesn't seem to affect the 550x driver.
These are my nvidia kernel module parameters:
# disable gsp & preserve video memory
options nvidia NVreg_EnableGpuFirmware=0 NVreg_PreserveVideoMemoryAllocations=1
# ensure it's initialized
# enabled by default now
# options nvidia_drm modeset=1 fbdev=1
# disable deepcolor
options nvidia_modeset hdmi_deepcolor=0
It seems other people are having this issue too, but it has been overlooked:
https://forums.developer.nvidia.com/t/5 … 330513/141
Last edited by barra (2025-08-01 01:56:29)
Offline
GSP issue?
=> https://bbs.archlinux.org/viewtopic.php?id=306623
Otherwise *maybe* related to https://bbs.archlinux.org/viewtopic.php … 2#p2254252, or do you also run daemons like asusd?
Offline
Oh, sorry for the very late reply, I don't think I had email notifications turned on.
Either way, I still don't know what's causing it. No, I don't have asusd, and I don't have GSP turned on. For now I've disabled nvidia-powerd because I don't really use what it provides (dynamic boost). If I absolutely need it, I'll just start the service manually.
Thanks for your very quick reply btw
Offline
NB: for a couple of versions now GSP has been active by default, so you have to actively disable it.
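One way to check whether the `NVreg_EnableGpuFirmware=0` option from the first post actually reached the loaded module is to read the driver's parameter listing. This is only a sketch: it assumes the driver exposes `/proc/driver/nvidia/params` with one `Key: value` pair per line and the key listed without the `NVreg_` prefix, and `nv_param` is a hypothetical helper name.

```shell
# Print the value recorded for one driver option.
# $1: key as listed in the params file (assumed: no NVreg_ prefix)
# $2: file to read; defaults to the loaded driver's params listing
nv_param() {
    awk -F': *' -v key="$1" '$1 == key { print $2 }' \
        "${2:-/proc/driver/nvidia/params}"
}

# Usage:
#   nv_param EnableGpuFirmware
# With the modprobe options from the first post in effect, this would be
# expected to print 0.
```

If it prints something other than 0, the option file may not be applied at the point where the module is loaded.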
Offline