I have recently started noticing that my laptop doesn't shut down properly. This message is displayed every couple of seconds until I finally hit the power button:
nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state
Here is a dump of "journalctl -k -b -1" that I took of the logs from yesterday: https://0x0.st/X34S.txt
I haven't yet noticed any other issues, but I did see these messages in my kernel logs this morning:
Sep 19 08:54:47 kernel: NVRM: failed to allocate vmap() page descriptor table!
Sep 19 08:54:47 kernel: NVRM: osMapSystemMemory: failed to create system memory kernel mapping!
Sep 19 08:54:47 kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from memdescMap(*ppMemdescRadix3, 0, allocSize, NV_TRUE, NV_PROTECT_WRITEABLE, &pVaKernel, &pPrivKernel) @ kernel_gsp.c:4188
Sep 19 08:54:47 kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspCreateRadix3(pGpu, pKernelGsp, &pKernelGsp->pSRRadix3Descriptor, NULL, NULL, gspfwSRMeta.sizeOfSuspendResumeData) @ kernel_gsp_tu102.c:1052
Sep 19 08:54:47 kernel: NVRM: nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspSavePowerMgmtState_HAL(pGpu, pKernelGsp) @ gpu_suspend.c:114
Sep 19 08:54:47 kernel: nvidia 0000:01:00.0: can't suspend (nv_pmops_runtime_suspend [nvidia] returned -5)
Sep 19 09:04:53 kernel: NVRM: Error in service of callback
I am currently using version 560.35.03 of the NVIDIA Open Kernel Module, but I also saw the same results using the nvidia-dkms package.
The GPU is an NVIDIA Corporation GA106M [GeForce RTX 3060 Mobile / Max-Q] (rev a1).
Kernel command line is
kernel: Command line: initrd=\intel-ucode.img initrd=\initramfs-linux.img root="UUID=efa38477-7366-4343-8c9c-c83d0f1aedd2" rw
Has anyone seen this sort of behaviour before, or does anyone know how I can go about debugging it?
Offline
Here is a dump of "journalctl -k -b -1"
That's just a random selection of messages.
Sep 18 17:05:08 kernel: pcieport 0000:03:00.0: not ready 1023ms after resume; giving up
Sep 18 17:05:08 kernel: pcieport 0000:00:07.0: pciehp: Slot(0): Card not present
Sep 18 17:05:08 kernel: pcieport 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Sep 18 17:05:08 kernel: pcieport 0000:04:04.0: Unable to change power state from D3cold to D0, device inaccessible
Sep 18 17:05:08 kernel: pcieport 0000:04:03.0: Unable to change power state from D3cold to D0, device inaccessible
Sep 18 17:05:08 kernel: pcieport 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Sep 18 17:05:08 kernel: pcieport 0000:04:01.0: Unable to change power state from D3cold to D0, device inaccessible
Sep 18 17:05:08 kernel: pcieport 0000:04:02.0: Unable to change power state from D3cold to D0, device inaccessible
Sep 18 17:05:08 kernel: pcieport 0000:04:03.0: Runtime PM usage count underflow!
Sep 18 17:05:08 kernel: pcieport 0000:04:02.0: Runtime PM usage count underflow!
Sep 18 17:05:08 kernel: pcieport 0000:04:01.0: Runtime PM usage count underflow!
and multiple devices seem to act up? (Though the context might matter)
Please post your complete system journal for the boot:
sudo journalctl -b -1 | curl -F 'file=@-' 0x0.st
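To skim a journal like that for the relevant lines, a filter along these lines may help. It is only a sketch: the function name filter_gpu_pm_lines and the grep pattern are assumptions, picked from the messages quoted in this thread, and the complete journal is still what should be posted.

```shell
#!/usr/bin/env bash
# Keep only lines that mention the NVIDIA driver or PCIe port/hotplug trouble.
# The pattern is an assumption based on the messages quoted in this thread.
filter_gpu_pm_lines() {
    grep -E 'NVRM|nvidia|pcieport|pciehp' || true
}

# Live use: kernel messages of the previous boot (needs systemd's journalctl).
if command -v journalctl >/dev/null 2>&1; then
    journalctl -k -b -1 --no-pager 2>/dev/null | filter_gpu_pm_lines
fi
```

The function reads from standard input, so it works just as well on a saved log file.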
But when in doubt, you're probably facing https://bbs.archlinux.org/viewtopic.php … 7#p2181317
You can just add "nvidia.NVreg_EnableGpuFirmware=0" to the kernel parameters (https://wiki.archlinux.org/title/Kernel_parameters) to test this.
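To confirm after a reboot that the parameter actually reached the kernel, something like the following could be used. The helper name cmdline_has_param is made up for illustration; /proc/driver/nvidia/params only exists while the nvidia module is loaded, and what it lists may vary between driver versions.

```shell
#!/usr/bin/env bash
# Report whether a parameter appears as a whole word on a kernel command line.
cmdline_has_param() {
    local cmdline=$1 param=$2
    case " $cmdline " in
        *" $param "*) echo present ;;
        *)            echo missing ;;
    esac
}

# Check the command line of the running kernel.
cmdline_has_param "$(cat /proc/cmdline 2>/dev/null)" "nvidia.NVreg_EnableGpuFirmware=0"

# What the loaded driver itself reports, if the module is loaded.
if [ -r /proc/driver/nvidia/params ]; then
    grep EnableGpuFirmware /proc/driver/nvidia/params
fi
```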
Online
But when in doubt, you're probably facing https://bbs.archlinux.org/viewtopic.php … 7#p2181317
You can just add "nvidia.NVreg_EnableGpuFirmware=0" to the kernel parameters (https://wiki.archlinux.org/title/Kernel_parameters) to test this.
I thought this had fixed it, as the issue seemed to go away for a bit, but it returned the other day. Here are the logs from that boot: https://0x0.st/Xgjf.txt
Offline
a) the kernel parameter isn't in that journal
b) do you (still) get the same issue w/ the binary driver instead of nvidia-open
Sep 27 08:33:29 4TELLT129 kernel: ------------[ cut here ]------------
Sep 27 08:33:29 4TELLT129 kernel: WARNING: CPU: 13 PID: 1167 at include/linux/rwsem.h:80 follow_pte+0x1de/0x200
Sep 27 08:33:29 4TELLT129 kernel: Modules linked in: rfcomm xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE bridge stp llc nf_conntrack_netlink xfrm_user xfrm_algo ip6table_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype evdi(OE) nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core dimlib cifs_md4 dns_resolver netfs cmac algif_hash algif_skcipher af_alg ip6table_filter ip6_tables iptable_filter overlay bnep tun nvidia_drm(OE) nvidia_modeset(OE) vfat fat snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_soc_acpi_intel_match soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine r8153_ecm cdc_ether usbnet joydev snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common intel_uncore_frequency
Sep 27 08:33:29 4TELLT129 kernel: intel_uncore_frequency_common intel_tcc_cooling iwlmvm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_generic kvm snd_hda_scodec_component crct10dif_pclmul mac80211 crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 libarc4 snd_hda_intel sha256_ssse3 ptp snd_intel_dspcfg sha1_ssse3 snd_intel_sdw_acpi pps_core uvcvideo aesni_intel snd_hda_codec btusb videobuf2_vmalloc uvc crypto_simd btrtl snd_hda_core videobuf2_memops cryptd videobuf2_v4l2 btintel snd_hwdep iwlwifi iTCO_wdt videodev btbcm hid_multitouch mei_pxp snd_pcm r8169 intel_pmc_bxt mei_hdcp ee1004 vboxnetflt(OE) btmtk iTCO_vendor_support rapl vboxnetadp(OE) intel_cstate cfg80211 bluetooth r8152 ucsi_acpi videobuf2_common realtek snd_timer vboxdrv(OE) i2c_i801 spi_nor mdio_devres mei_me mii intel_lpss_pci snd typec_ucsi i2c_smbus intel_lpss mousedev mc intel_uncore psmouse pcspkr typec libphy mtd i2c_mux mei i2c_hid_acpi thunderbolt idma64 soundcore rfkill
Sep 27 08:33:29 4TELLT129 kernel: intel_pmc_core roles i2c_hid nvidia_uvm(OE) intel_vsec intel_hid pmt_telemetry sparse_keymap pmt_class pinctrl_tigerlake acpi_pad mac_hid nvidia(OE) sg crypto_user dm_mod loop nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec hid_generic usbhid i915 serio_raw i2c_algo_bit drm_buddy sdhci_pci atkbd nvme ttm cqhci libps2 intel_gtt sdhci vivaldi_fmap nvme_core mxm_wmi spi_intel_pci drm_display_helper xhci_pci mmc_core crc32c_intel spi_intel xhci_pci_renesas nvme_auth cec i8042 video serio wmi
Sep 27 08:33:29 4TELLT129 kernel: CPU: 13 PID: 1167 Comm: nv_queue Tainted: G OE 6.10.10-arch1-1 #1 e28ee6293423e91d57555c4cc06eb839714254b7
Sep 27 08:33:29 4TELLT129 kernel: Hardware name: Metabox Alpha-S NP50HP/NP5x_NP6x_NP7xHP, BIOS 1.07.04TMB1 07/19/2021
Sep 27 08:33:29 4TELLT129 kernel: RIP: 0010:follow_pte+0x1de/0x200
Sep 27 08:33:29 4TELLT129 kernel: Code: cc cc cc 48 81 e2 00 00 00 c0 48 09 c2 48 f7 d2 48 85 fa 75 20 e8 b2 f5 ff ff 48 8b 35 6b dd 5c 01 48 81 e6 00 00 00 c0 eb 8d <0f> 0b 48 3b 1f 0f 83 50 fe ff ff bd ea ff ff ff eb b6 49 8b 3c 24
Sep 27 08:33:29 4TELLT129 kernel: RSP: 0018:ffffa5af438f3b48 EFLAGS: 00010246
Sep 27 08:33:29 4TELLT129 kernel: RAX: 0000000000000000 RBX: 00007759cd0ab000 RCX: ffffa5af438f3b88
Sep 27 08:33:29 4TELLT129 kernel: RDX: ffffa5af438f3b80 RSI: 00007759cd0ab000 RDI: ffff9a10c2115e30
Sep 27 08:33:29 4TELLT129 kernel: RBP: ffffa5af438f3bc8 R08: ffffa5af438f3d20 R09: 0000000000000000
Sep 27 08:33:29 4TELLT129 kernel: R10: 0000000000000001 R11: 0000000000000003 R12: ffffa5af438f3b88
Sep 27 08:33:29 4TELLT129 kernel: R13: ffffa5af438f3b80 R14: ffff9a108d5ddd80 R15: 0000000000000000
Sep 27 08:33:29 4TELLT129 kernel: FS: 0000000000000000(0000) GS:ffff9a13fb480000(0000) knlGS:0000000000000000
Sep 27 08:33:29 4TELLT129 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 27 08:33:29 4TELLT129 kernel: CR2: 0000241822073180 CR3: 0000000128ba0006 CR4: 0000000000f70ef0
Sep 27 08:33:29 4TELLT129 kernel: PKRU: 55555554
Sep 27 08:33:29 4TELLT129 kernel: Call Trace:
Sep 27 08:33:29 4TELLT129 kernel: <TASK>
Sep 27 08:33:29 4TELLT129 kernel: ? follow_pte+0x1de/0x200
Sep 27 08:33:29 4TELLT129 kernel: ? __warn.cold+0x8e/0xe8
Sep 27 08:33:29 4TELLT129 kernel: ? follow_pte+0x1de/0x200
Sep 27 08:33:29 4TELLT129 kernel: ? report_bug+0xff/0x140
Sep 27 08:33:29 4TELLT129 kernel: ? handle_bug+0x3c/0x80
Sep 27 08:33:29 4TELLT129 kernel: ? exc_invalid_op+0x17/0x70
Sep 27 08:33:29 4TELLT129 kernel: ? asm_exc_invalid_op+0x1a/0x20
Sep 27 08:33:29 4TELLT129 kernel: ? follow_pte+0x1de/0x200
Sep 27 08:33:29 4TELLT129 kernel: follow_phys+0x49/0x110
Sep 27 08:33:29 4TELLT129 kernel: untrack_pfn+0x55/0x120
Sep 27 08:33:29 4TELLT129 kernel: unmap_single_vma+0xa6/0xe0
Sep 27 08:33:29 4TELLT129 kernel: zap_page_range_single+0x122/0x1d0
Sep 27 08:33:29 4TELLT129 kernel: unmap_mapping_range+0x116/0x140
Sep 27 08:33:29 4TELLT129 kernel: ? __pfx__main_loop+0x10/0x10 [nvidia 60797da24455cf8cecb9ac63336161f36efd1db6]
Sep 27 08:33:29 4TELLT129 kernel: nv_revoke_gpu_mappings+0x67/0xb0 [nvidia 60797da24455cf8cecb9ac63336161f36efd1db6]
Sep 27 08:33:29 4TELLT129 kernel: RmHandleIdleSustained+0x3b/0x140 [nvidia 60797da24455cf8cecb9ac63336161f36efd1db6]
Sep 27 08:33:29 4TELLT129 kernel: ? gpumgrGetGpu+0x69/0xa0 [nvidia 60797da24455cf8cecb9ac63336161f36efd1db6]
Sep 27 08:33:29 4TELLT129 kernel: rm_execute_work_item+0xda/0x150 [nvidia 60797da24455cf8cecb9ac63336161f36efd1db6]
Sep 27 08:33:29 4TELLT129 kernel: _main_loop+0x95/0x150 [nvidia 60797da24455cf8cecb9ac63336161f36efd1db6]
Sep 27 08:33:29 4TELLT129 kernel: kthread+0xcf/0x100
Sep 27 08:33:29 4TELLT129 kernel: ? __pfx_kthread+0x10/0x10
Sep 27 08:33:29 4TELLT129 kernel: ret_from_fork+0x31/0x50
Sep 27 08:33:29 4TELLT129 kernel: ? __pfx_kthread+0x10/0x10
Sep 27 08:33:29 4TELLT129 kernel: ret_from_fork_asm+0x1a/0x30
Sep 27 08:33:29 4TELLT129 kernel: </TASK>
Sep 27 08:33:29 4TELLT129 kernel: ---[ end trace 0000000000000000 ]---
or when disabling zswap?
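For reference, the zswap state can be read from sysfs. A minimal sketch; zswap_state is a hypothetical helper name, and the optional path argument exists only so the function can be pointed at a different file:

```shell
#!/usr/bin/env bash
# Print the current zswap state: Y = enabled, N = disabled.
zswap_state() {
    local f=${1:-/sys/module/zswap/parameters/enabled}
    if [ -r "$f" ]; then cat "$f"; else echo "unknown"; fi
}
zswap_state

# To turn it off until the next reboot (as root):
#   echo 0 > /sys/module/zswap/parameters/enabled
# To keep it off across reboots, add zswap.enabled=0 to the kernel parameters.
```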
Looks like https://bbs.archlinux.org/viewtopic.php?id=290126, which is about resuming from sleep, but your GPU is a render device and will likely enter RTD3. You could try to disable that at the cost of increased battery drain: https://wiki.archlinux.org/title/PRIME# … Management
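Whether the GPU is being runtime-suspended can be read from sysfs; the "can't suspend (nv_pmops_runtime_suspend ...)" line earlier in the thread suggests runtime power management rather than system sleep. A rough sketch: pci_runtime_pm is a hypothetical helper, and the default address 0000:01:00.0 is taken from that log line.

```shell
#!/usr/bin/env bash
# Show the runtime power management state of a PCI device via sysfs.
# control: "auto" lets the kernel suspend the device, "on" keeps it awake.
pci_runtime_pm() {
    local dev=${1:-0000:01:00.0} base=${2:-/sys/bus/pci/devices}
    local d="$base/$dev/power"
    [ -d "$d" ] || { echo "$dev: no such device"; return 0; }
    echo "$dev: control=$(cat "$d/control") status=$(cat "$d/runtime_status")"
}
pci_runtime_pm

# To test with RTD3 off, the NVIDIA driver README documents this module
# option (e.g. in /etc/modprobe.d/nvidia.conf), at the cost of battery life:
#   options nvidia NVreg_DynamicPowerManagement=0x00
```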
Online
a) the kernel parameter isn't in that journal
I added it to /etc/modprobe.d/nvidia.conf
options nvidia "NVreg_EnableGpuFirmware=0"
I have rerun mkinitcpio -P since doing that
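A quick way to double-check which options are configured for the nvidia module might be the following. It is a sketch only: nvidia_module_options is a made-up helper name, and the initramfs path in the comment is an assumption that may not match this system.

```shell
#!/usr/bin/env bash
# Print every "options nvidia ..." line found in a modprobe.d-style directory.
nvidia_module_options() {
    local dir=${1:-/etc/modprobe.d}
    grep -hE '^[[:space:]]*options[[:space:]]+nvidia[[:space:]]' "$dir"/*.conf 2>/dev/null || true
}
nvidia_module_options

# If the nvidia modules are part of the initramfs, the file has to be in the
# image as well; on Arch the image contents can be listed with lsinitcpio:
#   lsinitcpio /boot/initramfs-linux.img | grep modprobe.d
```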
b) do you (still) get the same issue w/ the binary driver instead of nvidia-open
I haven't tried yet. I will try disabling zswap with the nvidia-open driver first, then try the binary driver.
or when disabling zswap?
Didn't know this was a thing, will give it a try.
Looks like https://bbs.archlinux.org/viewtopic.php?id=290126, which is about resuming from sleep, but your GPU is a render device and will likely enter RTD3. You could try to disable that at the cost of increased battery drain: https://wiki.archlinux.org/title/PRIME# … Management
I don't allow my laptop to sleep or hibernate: those targets are masked on my system, and the nvidia power management units are disabled. Is this still likely to apply?
$ systemctl list-unit-files | grep nvidia
nvidia-hibernate.service disabled disabled
nvidia-persistenced.service disabled disabled
nvidia-powerd.service disabled disabled
nvidia-resume.service disabled disabled
nvidia-suspend.service disabled disabled
I am also using a thunderbolt dock to drive external monitors (the monitors are connected via thunderbolt -> DisplayPort adapters). Could this be related?
Offline
There seem to be no outputs attached to the nvidia GPU.
Another thing would be to enable https://wiki.archlinux.org/title/NVIDIA … de_setting - use the "nvidia_drm.modeset=1" kernel parameter (modprobe.conf won't do!) to get rid of the simpledrm device and also expose the nvidia attached outputs (iff any) to the drm subsystem.
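After a reboot, whether modesetting is on and which outputs each card exposes can be read from sysfs. A small sketch; list_connectors is a hypothetical helper, and the connector names it prints depend on the hardware.

```shell
#!/usr/bin/env bash
# Y means nvidia_drm was loaded with kernel mode setting enabled.
cat /sys/module/nvidia_drm/parameters/modeset 2>/dev/null || echo "nvidia_drm not loaded"

# List every DRM connector together with its connection status.
list_connectors() {
    local base=${1:-/sys/class/drm} c
    for c in "$base"/card*-*/status; do
        [ -r "$c" ] || continue
        echo "$(basename "$(dirname "$c")"): $(cat "$c")"
    done
}
list_connectors
```

Connectors are named after the card they belong to, so outputs wired to the nvidia GPU would show up under its card number.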
Online
There seem to be no outputs attached to the nvidia GPU.
This would be correct. The thunderbolt port is, I believe, separate to the nvidia GPU. So I suppose it could be going into a low power mode because it isn't being used?
$ lspci
~~snip~~
00:07.0 PCI bridge: Intel Corporation Tiger Lake-H Thunderbolt 4 PCI Express Root Port #0 (rev 05)
~~snip~~
01:00.0 VGA compatible controller: NVIDIA Corporation GA106M [GeForce RTX 3060 Mobile / Max-Q] (rev a1)
~~snip~~
use the "nvidia_drm.modeset=1" kernel parameter
I will give this a go too.
Offline
So I suppose it could be going into a low power mode because it isn't being used?
Yup (I kinda skipped your last line - I've a reflex to click on logs)
I don't expect kms to help you out here, though. It's just generally a good idea.
You might alternatively want to try https://aur.archlinux.org/packages/nvidia-535xx-dkms (nvidia had severe issues w/ the 55yxx drivers; they seem to have been fixed w/ the 560xx ones, but maybe the problem was only shifted somewhere else)
Likewise you can expect issues w/ 6.11 (though a patch for that is pending) and generally may want to test the behavior of the LTS kernel here.
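When comparing kernels it helps to confirm which one is actually running. A rough check, assuming GNU sort with -V is available; version_ge is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# True if version $1 is at least version $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

running=$(uname -r)
if version_ge "$running" 6.11; then
    echo "$running: 6.11 or newer"
else
    echo "$running: older than 6.11"
fi

# Loaded NVIDIA driver version, if the module is loaded.
if [ -r /proc/driver/nvidia/version ]; then
    cat /proc/driver/nvidia/version
fi
```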
Online