I can't find a reliable way to reproduce this, but it has happened 4 times so far:
The first time, apparently without any cause
After trying to play a video using VLC
While browsing a folder with Gwenview
After running beesd to deduplicate my disk
When it happens, launching a program such as the browser only spawns the placeholder window, which never actually loads. If I then open a terminal (which does launch) and run nvtop, htop or any other command, it just hangs: it prints nothing and never returns to the prompt, much like a bare cat waiting for input, and ^C does nothing. If I then try to soft-reboot or shut down, the system stalls, as you can see in the logs. I have to force a power-off and power the machine back on to get it running again.
The most prominent entries I can see in the journal are failed dbus-broker-launch requests, Plasma and program coredumps, and pipewire errors.
This has been happening very often lately, and I would like to know whether it is some kind of hardware problem. I don't think it is, because the same thing has happened to me on other laptops.
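A command that prints nothing and ignores ^C is what a process in uninterruptible sleep (state D, as in the hung-task reports) typically looks like. Here is a minimal sketch, assuming a Linux /proc filesystem, for listing such processes from a second terminal while the system still responds. The function names are made up for illustration, and only the main thread of each process is checked.

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state

def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for every process whose main thread is in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found

if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

A short-lived D state is normal during disk I/O; it only becomes interesting when the same tasks stay in that state for minutes.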
first time coredump log
coredump files
latest journal log
Last edited by techmanwalker (2025-07-03 16:54:23)
Offline
Bump. It happened again while playing a video with VLC, but this time I collected some more logs.
Hope to find a fix for this issue soon. Thanks in advance.
jul 03 18:47:45 malasdecisiones kernel: INFO: task nv_queue:342 blocked for more than 368 seconds.
jul 03 18:47:45 malasdecisiones kernel: Tainted: P OE 6.15.4-zen2-1-zen #1
jul 03 18:47:45 malasdecisiones kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jul 03 18:47:45 malasdecisiones kernel: task:nv_queue state:D stack:0 pid:342 tgid:342 ppid:2 task_flags:0x208040 flags:0x000>
jul 03 18:47:45 malasdecisiones kernel: Call Trace:
jul 03 18:47:45 malasdecisiones kernel: <TASK>
jul 03 18:47:45 malasdecisiones kernel: __schedule+0x451/0x2380
jul 03 18:47:45 malasdecisiones kernel: ? srso_alias_return_thunk+0x5/0xfbef5
jul 03 18:47:45 malasdecisiones kernel: ? timerqueue_add+0x73/0xd0
jul 03 18:47:45 malasdecisiones kernel: schedule_preempt_disabled+0x2e/0xe0
jul 03 18:47:45 malasdecisiones kernel: rwsem_down_write_slowpath+0x1ed/0x6c0
jul 03 18:47:45 malasdecisiones kernel: ? srso_alias_return_thunk+0x5/0xfbef5
jul 03 18:47:45 malasdecisiones kernel: down_write+0x5a/0x60
jul 03 18:47:45 malasdecisiones kernel: os_acquire_rwlock_write+0x2b/0x40 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel: _nv051520rm+0x10/0x40 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel: _nv053004rm+0x28c/0x360 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel: _nv059758rm+0x63/0x230 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel: ? __pfx__main_loop+0x10/0x10 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel: rm_execute_work_item+0x66/0x1f0 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel: os_execute_work_item+0x68/0x90 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel: _main_loop+0x93/0x150 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel: ? srso_alias_return_thunk+0x5/0xfbef5
jul 03 18:47:45 malasdecisiones kernel: ? __pfx__main_loop+0x10/0x10 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel: kthread+0xfc/0x240
jul 03 18:47:45 malasdecisiones kernel: ? __pfx_kthread+0x10/0x10
jul 03 18:47:45 malasdecisiones kernel: ret_from_fork+0x34/0x50
jul 03 18:47:45 malasdecisiones kernel: ? __pfx_kthread+0x10/0x10
jul 03 18:47:45 malasdecisiones kernel: ret_from_fork_asm+0x1a/0x30
jul 03 18:47:45 malasdecisiones kernel: </TASK>
jul 03 18:49:47 malasdecisiones kernel: INFO: task kworker/15:1:227 blocked for more than 491 seconds.
jul 03 18:49:47 malasdecisiones kernel: Tainted: P OE 6.15.4-zen2-1-zen #1
jul 03 18:49:47 malasdecisiones kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jul 03 18:49:47 malasdecisiones kernel: task:kworker/15:1 state:D stack:0 pid:227 tgid:227 ppid:2 task_flags:0x4208060 flags:0x00>
jul 03 18:49:47 malasdecisiones kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
jul 03 18:49:47 malasdecisiones kernel: Call Trace:
jul 03 18:49:47 malasdecisiones kernel: <TASK>
jul 03 18:49:47 malasdecisiones kernel: __schedule+0x451/0x2380
jul 03 18:49:47 malasdecisiones kernel: ? ttwu_queue_wakelist+0xf7/0x110
jul 03 18:49:47 malasdecisiones kernel: schedule_preempt_disabled+0x2e/0xe0
jul 03 18:49:47 malasdecisiones kernel: rwsem_down_write_slowpath+0x1ed/0x6c0
jul 03 18:49:47 malasdecisiones kernel: ? ep_autoremove_wake_function+0x16/0x60
jul 03 18:49:47 malasdecisiones kernel: ? number+0x4ac/0x5c0
jul 03 18:49:47 malasdecisiones kernel: ? srso_alias_return_thunk+0x5/0xfbef5
jul 03 18:49:47 malasdecisiones kernel: down_write+0x5a/0x60
jul 03 18:49:47 malasdecisiones kernel: os_acquire_rwlock_write+0x2b/0x40 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:49:47 malasdecisiones kernel: _nv051520rm+0x10/0x40 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:49:47 malasdecisiones kernel: _nv053004rm+0x28c/0x360 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:49:47 malasdecisiones kernel: _nv000839rm+0x27/0x70 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:49:47 malasdecisiones kernel: rm_acpi_notify+0xf1/0x280 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:49:47 malasdecisiones kernel: ? srso_alias_return_thunk+0x5/0xfbef5
jul 03 18:49:47 malasdecisiones kernel: acpi_ev_notify_dispatch+0x4e/0x70
jul 03 18:49:47 malasdecisiones kernel: acpi_os_execute_deferred+0x1a/0x30
jul 03 18:49:47 malasdecisiones kernel: process_one_work+0x193/0x350
jul 03 18:49:47 malasdecisiones kernel: worker_thread+0x254/0x3a0
jul 03 18:49:47 malasdecisiones kernel: ? __pfx_worker_thread+0x10/0x10
jul 03 18:49:47 malasdecisiones kernel: kthread+0xfc/0x240
jul 03 18:49:47 malasdecisiones kernel: ? __pfx_kthread+0x10/0x10
jul 03 18:49:47 malasdecisiones kernel: ret_from_fork+0x34/0x50
jul 03 18:49:47 malasdecisiones kernel: ? __pfx_kthread+0x10/0x10
jul 03 18:49:47 malasdecisiones kernel: ret_from_fork_asm+0x1a/0x30
jul 03 18:49:47 malasdecisiones kernel: </TASK>
jul 03 19:03:31 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:03:31 malasdecisiones kernel: mce: [Hardware Error]: Machine check events logged
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC15_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7cfe200
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000001ff0a240701
jul 03 19:03:31 malasdecisiones kernel:
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
jul 03 19:03:31 malasdecisiones kernel: mce: [Hardware Error]: Machine check events logged
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC16_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7cfe200
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x000001ff0a240701
jul 03 19:03:31 malasdecisiones kernel:
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7ce23c0
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x000001ff0a240700
jul 03 19:03:31 malasdecisiones kernel:
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC18_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7ce2300
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600350f00, Syndrome: 0x000001ff0a240700
jul 03 19:03:31 malasdecisiones kernel:
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
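For anyone reading along, the MCx_STATUS value in the lines above can be unpacked by hand. The sketch below is only an illustration: the bit positions are my reading of the Linux kernel's MCE status definitions and should be treated as assumptions, and the function name is made up.

```python
# Bit positions follow my reading of the Linux kernel's MCE headers
# (MCI_STATUS_*); treat them as assumptions and check them against your kernel.
STATUS_BITS = {
    "Val": 63,
    "Over": 62,
    "UC": 61,
    "En": 60,
    "MiscV": 59,
    "AddrV": 58,
    "PCC": 57,
    "SyndV": 53,
    "CECC": 46,
    "UECC": 45,
}

def decode_mca_status(status):
    """Split an MCx_STATUS value into named flags and its two error-code fields."""
    flags = {name: bool((status >> bit) & 1) for name, bit in STATUS_BITS.items()}
    ext_code = (status >> 16) & 0x3F  # extended error code, bits 21:16
    err_code = status & 0xFFFF        # base error code, bits 15:0
    return flags, ext_code, err_code

flags, ext_code, err_code = decode_mca_status(0xDC204000000C011B)
print([name for name, is_set in flags.items() if is_set])
print("extended error code:", ext_code)
print("error code:", hex(err_code))
```

For the value in the log this gives Over, MiscV, AddrV, SyndV and CECC set with UC clear, which matches the flags the kernel printed, and an extended error code of 12, the same number shown on the "Ext. Error Code" line.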
Offline
> modeset=1
Nonsense?
> preempt=full amd_iommu=on pcie.aspm.policy=default
Why?
Notably IOMMU and
jul 03 20:09:03 malasdecisiones kernel: mce: [Hardware Error]: Machine check events logged
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC15_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7ce2200
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000001ff0a240700
jul 03 20:09:03 malasdecisiones kernel:
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
jul 03 20:09:03 malasdecisiones kernel: mce: [Hardware Error]: Machine check events logged
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC16_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7ce2200
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x000001ff0a240700
jul 03 20:09:03 malasdecisiones kernel:
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7ce2200
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x000001ff0a240700
jul 03 20:09:03 malasdecisiones kernel:
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC18_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7ce2200
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600350f00, Syndrome: 0x000001ff0a240700
jul 03 20:09:03 malasdecisiones kernel:
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
There are several of those, and only after a couple of them do you start to get hung tasks, but
jul 03 18:41:40 malasdecisiones kernel: mce: [Hardware Error]: Machine check events logged
jul 03 18:43:39 malasdecisiones kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
are pretty much 2 minutes apart, which suggests that this error led to the hung tasks.
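The gap between those two lines can be checked mechanically. This is a small sketch that assumes both lines come from the same day, and it parses only the clock time because the month names in these logs are localized. The helper name is made up.

```python
from datetime import datetime

def journal_time(line):
    """Parse the HH:MM:SS field of a journal line such as 'jul 03 18:41:40 host ...'."""
    # Only the third whitespace-separated field (the clock time) is used.
    return datetime.strptime(line.split()[2], "%H:%M:%S")

mce = "jul 03 18:41:40 malasdecisiones kernel: mce: [Hardware Error]: Machine check events logged"
hung = 'jul 03 18:43:39 malasdecisiones kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.'

gap = journal_time(hung) - journal_time(mce)
print(gap.total_seconds())  # 119.0
```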
Do you get this w/ "iommu=soft"?
Also
jul 03 17:36:04 archlinux kernel: smpboot: CPU0: AMD Ryzen 9 6900HX with Radeon Graphics (family: 0x19, model: 0x44, stepping: 0x1)
https://wiki.archlinux.org/title/Ryzen#Random_reboots
While not exactly the symptoms, I'd take a very close look there…
Though interestingly
jul 03 17:47:03 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 17:52:31 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 17:57:59 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:03:26 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:08:54 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:14:22 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:19:49 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:25:17 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:30:45 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:36:12 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:41:40 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:58:03 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:03:31 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:08:58 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:14:26 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:19:54 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:25:21 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:30:49 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:36:17 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:41:44 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:47:12 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:52:40 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:58:07 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 20:03:35 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 20:09:03 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
With the exception of the hung task flurry, they seem to hit roughly every 5 and a half minutes
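The spacing can be computed from the timestamps listed above. This is a quick sketch, assuming all lines are from the same day; the variable and function names are made up.

```python
from datetime import datetime

# Clock times of the "mce_notify_irq: 2 callbacks suppressed" lines quoted above.
STAMPS = """
17:47:03 17:52:31 17:57:59 18:03:26 18:08:54 18:14:22 18:19:49 18:25:17
18:30:45 18:36:12 18:41:40 18:58:03 19:03:31 19:08:58 19:14:26 19:19:54
19:25:21 19:30:49 19:36:17 19:41:44 19:47:12 19:52:40 19:58:07 20:03:35
20:09:03
""".split()

def intervals(stamps):
    """Return the gaps, in whole seconds, between consecutive clock times."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

gaps = intervals(STAMPS)
typical = sorted(gaps)[len(gaps) // 2]  # median gap in seconds
# Anything much longer than the median is treated as a skipped report.
outliers = [(STAMPS[i], STAMPS[i + 1], g) for i, g in enumerate(gaps) if g > 1.5 * typical]
print(typical, outliers)
```

The typical gap comes out at 328 seconds. The only outlier is the 983-second gap between 18:41:40 and 18:58:03, which is almost exactly three periods, so two of the periodic reports appear to be missing around the hung tasks.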
These lead to the
jul 03 18:43:39 malasdecisiones kernel: INFO: task kworker/15:1:227 blocked for more than 122 seconds.
jul 03 18:43:39 malasdecisiones kernel: Tainted: P OE 6.15.4-zen2-1-zen #1
jul 03 18:43:39 malasdecisiones kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jul 03 18:43:39 malasdecisiones kernel: task:kworker/15:1 state:D stack:0 pid:227 tgid:227 ppid:2 task_flags:0x4208060 flags:0x00004000
jul 03 18:43:39 malasdecisiones kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
jul 03 18:43:39 malasdecisiones kernel: Call Trace:
jul 03 18:43:39 malasdecisiones kernel: <TASK>
jul 03 18:43:39 malasdecisiones kernel: __schedule+0x451/0x2380
jul 03 18:43:39 malasdecisiones kernel: ? ttwu_queue_wakelist+0xf7/0x110
2 minutes later
jul 03 17:36:04 archlinux kernel: nvme0n1: p1 p2 p3 p4 p5 p6 p7
Is there a parallel windows installation?
> rd.luks.options=timeout=0,discard
Be very careful w/ discard, https://wiki.archlinux.org/title/Solid_ … nuous_TRIM
Online
> modeset=1
Nonsense?
I run an Nvidia card (3050 mobile)
> preempt=full amd_iommu=on pcie.aspm.policy=default
Why?
I had those enabled in my old laptop, so I carried those over (mostly because of the encrypted btrfs)
Do you get this w/ "iommu=soft"?
Yes, I got those exact same errors after playing a video in VLC, but I don't think that the fact that it was VLC is precisely related
https://wiki.archlinux.org/title/Ryzen#Random_reboots
While not exactly the symptoms, I'd take a very close look there…
There isn't any voltage control in my UEFI. It's an Asus Zenbook Pro 17 laptop...
Is there a parallel windows installation?
Windows 11 alongside, so yes
Be very careful w/ discard
When I was newer to Linux I got a huge performance hit with my first LUKS-encrypted system. I got something like 15 MB/s write speed, and adding that flag solved the issue. I haven't had any other (performance-related) disk issues since. Also, my current SSD is a Samsung (SAMSUNG MZVL21T0HCLR-00B00)
Offline
I run an Nvidia card (3050 mobile)
This is still nonsense, the frequently advertised kernel parameter is "nvidia_drm.modeset=1" and the feature is enabled by default since 555xx or 565xx - the only remaining function is to block the simpledrm device.
I had those enabled in my old laptop
Do not enforce an IOMMU. Also, the parameter is pcie_aspm.policy, and "default" is the default anyway: https://wiki.archlinux.org/title/Power_ … Management
Windows 11 alongside, so yes
3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot Windows and Linux twice for voodoo reasons.
There isn't any voltage control in my UEFI.
I take it that the ellipsis is meant to say that "asus firmware is known to be unconfigurable shit", but is there nothing "precision boost overdrive" related? Can you adjust any clock rates (are you overclocking the system)?
adding that flag solved the issue
See the linked wiki, consider a periodic trim and make sure your new(?) nvme actually supports this.
Online
"nvidia_drm.modeset=1" and the feature is enabled by default since 555xx or 565xx
Really? I remember reading back then that nvidia_drm.modeset=1 should be replaced with that option, but I think I just misread that.
Do not enforce an IOMMU and it's pcie_aspm.policy and "default" is the "default", anyway
Well well, removing it...
3rd link below. Mandatory.
Disable it
I disabled it a while ago
I take that the ellipsis is meant to say that "asus firmware is known to be unconfigurable shit", but there's nothing "precision boost overdrive" related? Can you adjust any clock rates (are you overclocking the system)
No, not at all. The UI is indeed very Fisher-Price coded with hardly the option to disable Secure Boot, so I'm a bit out of luck there
Last edited by techmanwalker (2025-07-05 11:22:53)
Offline
There's a difference between a module option in modprobe.d and a kernel parameter: on the kernel command line, modprobe still needs to know which module the parameter applies to.
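To illustrate the distinction: in a modprobe.d file the module is named once on the options line, whereas on the kernel command line each parameter carries the module name as a prefix. This is a small sketch only; the helper name is made up, and quoted values containing spaces are not handled.

```python
def modprobe_to_cmdline(options_line):
    """Convert 'options <module> key=value ...' into kernel command-line parameters."""
    words = options_line.split()
    if len(words) < 3 or words[0] != "options":
        raise ValueError("expected: options <module> <param>[=value] ...")
    module = words[1]
    # On the kernel command line every module parameter is written as
    # <module>.<param>=<value>, which is how the target module is identified.
    return [f"{module}.{param}" for param in words[2:]]

print(modprobe_to_cmdline("options nvidia_drm modeset=1"))
# ['nvidia_drm.modeset=1']
```

A bare modeset=1 on the command line has no module prefix, so there is nothing that ties it to nvidia_drm.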
I disabled it a while ago
Have you made sure that MS hasn't re-enabled it in the interim? (That infrequently happens w/ random updates.)
Did you test iommu=soft and do you still get those MCEs?
Online
I have not even booted Windows since disabling it, so yes... and in fact I got the same errors and my system halted again
Offline
On iommu=soft?
Are you legitimately running OOM? (12 is ENOMEM) - you've 16GB RAM and 32GB swap, so this seems kinda unlikely.
Does your fisher-price BIOS allow you to disable internal devices like the webcam - or wifi? Bluetooth, the fingerprint reader etc et pp?
Edit: also
jul 03 17:36:04 archlinux kernel: DMI: ASUSTeK COMPUTER INC. Zenbook UM6702RC_RM6702RC_BM6702RC UM6702RC_UM6702RC/UM6702RC, BIOS UM6702RC.310 06/17/2022
are there maybe bios updates available for the device?
Last edited by seth (2025-07-05 15:04:19)
Online
On iommu=soft?
yes
Does your fisher-price BIOS allow you to disable internal devices like the webcam - or wifi? Bluetooth, the fingerprint reader etc et pp?
are there maybe bios updates available for the device?
BIOS UM6702RC.320 02/21/2023
This is as far as MyASUS got.
EDIT: Linux completely froze again while doing a CPU-intensive task (AV1 encoding); this time it printed a big yellow message at the end of the full journal log.
Last edited by techmanwalker (2025-07-06 02:10:59)
Offline
There are no MCE errors in the last one
jul 05 18:53:15 malasdecisiones kernel: BUG: unable to handle page fault for address: 0000002000000009
jul 05 18:53:15 malasdecisiones kernel: #PF: supervisor read access in kernel mode
jul 05 18:53:15 malasdecisiones kernel: #PF: error_code(0x0000) - not-present page
jul 05 18:53:15 malasdecisiones kernel: PGD 0 P4D 0
jul 05 18:53:15 malasdecisiones kernel: Oops: Oops: 0000 [#1] SMP NOPTI
jul 05 18:53:15 malasdecisiones kernel: CPU: 7 UID: 0 PID: 24033 Comm: kworker/7:0 Tainted: P OE 6.15.4-zen2-1-zen #1 PREEMPT(full) 1435e5a15a997e99695c9b8e649db9c0130dcff7
jul 05 18:53:15 malasdecisiones kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
jul 05 18:53:15 malasdecisiones kernel: Hardware name: ASUSTeK COMPUTER INC. Zenbook UM6702RC_RM6702RC_BM6702RC UM6702RC_UM6702RC/UM6702RC, BIOS UM6702RC.320 02/21/2023
jul 05 18:53:15 malasdecisiones kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
jul 05 18:53:15 malasdecisiones kernel: RIP: 0010:_nv055006rm+0x2c/0x1a0 [nvidia]
jul 05 18:53:15 malasdecisiones kernel: Code: 1f 00 41 57 41 56 41 55 49 89 f5 41 54 41 89 cc 53 48 83 ed 10 48 89 d3 48 85 d2 0f 84 0d 01 00 00 48 85 f6 0f 84 4c 01 00 00 <8b> 42 08 4c 8d 4a 40 a8 02 74 79 85 c9 0f 84 b9 00 00 00 4c 8b 7a
jul 05 18:53:15 malasdecisiones kernel: RSP: 0018:ffffd2a8caa4fca0 EFLAGS: 00010282
jul 05 18:53:15 malasdecisiones kernel: RAX: 0000000000000000 RBX: 0000002000000001 RCX: 0000000000000000
jul 05 18:53:15 malasdecisiones kernel: RDX: 0000002000000001 RSI: ffff8c481ea55ea8 RDI: ffffffffc2d89200
jul 05 18:53:15 malasdecisiones kernel: RBP: ffff8c481ea55d60 R08: 0000000000000000 R09: ffffffffc26c75da
jul 05 18:53:15 malasdecisiones kernel: R10: ffff8c49e90d9840 R11: fffff2af4ba43640 R12: 0000000000000000
jul 05 18:53:15 malasdecisiones kernel: R13: ffff8c481ea55ea8 R14: ffff8c481ea55ea8 R15: ffff8c481ea55e68
jul 05 18:53:15 malasdecisiones kernel: FS: 0000000000000000(0000) GS:ffff8c4b9aaa9000(0000) knlGS:0000000000000000
jul 05 18:53:15 malasdecisiones kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 05 18:53:15 malasdecisiones kernel: CR2: 0000002000000009 CR3: 00000003949f4000 CR4: 0000000000f50ef0
jul 05 18:53:15 malasdecisiones kernel: PKRU: 55555554
jul 05 18:53:15 malasdecisiones kernel: Call Trace:
jul 05 18:53:15 malasdecisiones kernel: <TASK>
jul 05 18:53:15 malasdecisiones kernel: _nv055004rm+0x212/0x500 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel: _nv015696rm+0x424/0x680 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel: _nv052961rm+0x29/0x30 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel: ? _nv055007rm+0x60/0x60 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel: rm_acpi_notify+0x126/0x280 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel: ? srso_alias_return_thunk+0x5/0xfbef5
jul 05 18:53:15 malasdecisiones kernel: acpi_ev_notify_dispatch+0x4e/0x70
jul 05 18:53:15 malasdecisiones kernel: acpi_os_execute_deferred+0x1a/0x30
jul 05 18:53:15 malasdecisiones kernel: process_one_work+0x193/0x350
jul 05 18:53:15 malasdecisiones kernel: worker_thread+0x254/0x3a0
jul 05 18:53:15 malasdecisiones kernel: ? __pfx_worker_thread+0x10/0x10
jul 05 18:53:15 malasdecisiones kernel: kthread+0xfc/0x240
jul 05 18:53:15 malasdecisiones kernel: ? __pfx_kthread+0x10/0x10
jul 05 18:53:15 malasdecisiones kernel: ret_from_fork+0x34/0x50
jul 05 18:53:15 malasdecisiones kernel: ? __pfx_kthread+0x10/0x10
jul 05 18:53:15 malasdecisiones kernel: ret_from_fork_asm+0x1a/0x30
jul 05 18:53:15 malasdecisiones kernel: </TASK>
jul 05 18:53:15 malasdecisiones kernel: Modules linked in: snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg bnep vfat fat snd_acp6x_pdm_dma snd_soc_acp6x_mach snd_soc_dmic snd_sof_amd_acp70 snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_pci_ps snd_soc_acpi_amd_match snd_amd_sdw_acpi soundwire_amd mt7921e soundwire_generic_allocation mt7921_common soundwire_bus mt792x_lib snd_soc_sdca joydev mousedev mt76_connac_lib snd_hda_codec_realtek snd_hda_scodec_cs35l41_spi snd_soc_core mt76 snd_hda_codec_generic intel_rapl_msr amd_atl snd_compress ac97_bus intel_rapl_common snd_hda_scodec_component snd_hda_codec_hdmi snd_ctl_led snd_pcm_dmaengine snd_hda_intel snd_rpl_pci_acp6x snd_intel_dspcfg mac80211 uvcvideo snd_acp_pci snd_intel_sdw_acpi videobuf2_vmalloc snd_amd_acpi_mach uvc snd_hda_codec snd_acp_legacy_common snd_hda_scodec_cs35l41_i2c videobuf2_memops snd_hda_scodec_cs35l41 btusb snd_pci_acp6x
jul 05 18:53:15 malasdecisiones kernel: libarc4 snd_hda_core videobuf2_v4l2 btrtl snd_pci_acp5x snd_hda_cs_dsp_ctls kvm_amd videobuf2_common asus_nb_wmi btintel snd_rn_pci_acp3x snd_hwdep snd_soc_cs_amp_lib sp5100_tco hid_multitouch cfg80211 snd_acp_config asus_wmi btbcm ucsi_acpi snd_pcm snd_soc_cs35l41_lib videodev snd_soc_acpi platform_profile typec_ucsi btmtk cs_dsp i2c_piix4 snd_timer kvm bluetooth irqbypass mc rapl sparse_keymap pcspkr snd typec wmi_bmof thunderbolt k10temp soundcore rfkill snd_pci_acp3x i2c_smbus roles i2c_hid_acpi i2c_hid serial_multi_instantiate mac_hid amd_pmc acpi_tad i2c_dev sg crypto_user loop nfnetlink ip_tables x_tables dm_crypt encrypted_keys trusted asn1_encoder tee nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) dm_mod amdgpu amdxcp polyval_clmulni i2c_algo_bit polyval_generic drm_exec ghash_clmulni_intel sha512_ssse3 gpu_sched sdhci_pci sha256_ssse3 drm_suballoc_helper sdhci_uhs2 sha1_ssse3 drm_panel_backlight_quirks nvme sdhci aesni_intel drm_buddy crypto_simd cqhci drm_ttm_helper nvme_core
jul 05 18:53:15 malasdecisiones kernel: drm_display_helper cryptd ttm mmc_core ccp nvme_keyring cec video nvme_auth wmi serio_raw
jul 05 18:53:15 malasdecisiones kernel: CR2: 0000002000000009
jul 05 18:53:15 malasdecisiones kernel: ---[ end trace 0000000000000000 ]---
jul 05 18:53:15 malasdecisiones kernel: RIP: 0010:_nv055006rm+0x2c/0x1a0 [nvidia]
jul 05 18:53:15 malasdecisiones kernel: Code: 1f 00 41 57 41 56 41 55 49 89 f5 41 54 41 89 cc 53 48 83 ed 10 48 89 d3 48 85 d2 0f 84 0d 01 00 00 48 85 f6 0f 84 4c 01 00 00 <8b> 42 08 4c 8d 4a 40 a8 02 74 79 85 c9 0f 84 b9 00 00 00 4c 8b 7a
jul 05 18:53:15 malasdecisiones kernel: RSP: 0018:ffffd2a8caa4fca0 EFLAGS: 00010282
jul 05 18:53:15 malasdecisiones kernel: RAX: 0000000000000000 RBX: 0000002000000001 RCX: 0000000000000000
jul 05 18:53:15 malasdecisiones kernel: RDX: 0000002000000001 RSI: ffff8c481ea55ea8 RDI: ffffffffc2d89200
jul 05 18:53:15 malasdecisiones kernel: RBP: ffff8c481ea55d60 R08: 0000000000000000 R09: ffffffffc26c75da
jul 05 18:53:15 malasdecisiones kernel: R10: ffff8c49e90d9840 R11: fffff2af4ba43640 R12: 0000000000000000
jul 05 18:53:15 malasdecisiones kernel: R13: ffff8c481ea55ea8 R14: ffff8c481ea55ea8 R15: ffff8c481ea55e68
jul 05 18:53:15 malasdecisiones kernel: FS: 0000000000000000(0000) GS:ffff8c4b9aaa9000(0000) knlGS:0000000000000000
jul 05 18:53:15 malasdecisiones kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 05 18:53:15 malasdecisiones kernel: CR2: 0000002000000009 CR3: 00000003949f4000 CR4: 0000000000f50ef0
jul 05 18:53:15 malasdecisiones kernel: PKRU: 55555554
jul 05 18:53:15 malasdecisiones kernel: note: kworker/7:0[24033] exited with irqs disabled
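The register dump is enough to see what the faulting instruction tried to do. The bytes marked in the Code: line, <8b> 42 08, are (by my reading of the x86-64 encoding) a 32-bit load from [rdx+8], and RDX plus 8 is exactly the CR2 fault address. RDX also does not look like the other pointers in the dump, which all start with ffff. Here is a rough sketch that checks the arithmetic; the decoder handles only this one encoding and is purely illustrative.

```python
def decode_mov_disp8(code):
    """Decode the 3-byte form '8b /r disp8' (mov r32, [reg + disp8]); nothing else."""
    opcode, modrm, disp = code
    mod, rm = modrm >> 6, modrm & 0b111
    # mod == 01 means an 8-bit displacement follows; rm == 100 would mean a SIB byte.
    if opcode != 0x8B or mod != 0b01 or rm == 0b100:
        raise ValueError("not a plain 'mov r32, [reg + disp8]' encoding")
    regs = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
    # disp8 is a signed byte
    return regs[rm], disp - 256 if disp >= 128 else disp

# Values copied from the oops above.
registers = {"rdx": 0x0000002000000001}
cr2 = 0x0000002000000009

base, disp = decode_mov_disp8(bytes.fromhex("8b4208"))
address = registers[base] + disp
print(base, disp, hex(address), address == cr2)
```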
jul 05 18:10:27 archlinux kernel: Memory: 15298888K/15988900K available (22464K kernel code, 2944K rwdata, 16252K rodata, 4784K init, 4744K bss, 667536K reserved, 0K cma-reserved)
jul 05 18:10:29 archlinux kernel: [drm] amdgpu: 512M of VRAM memory ready
jul 05 18:10:29 archlinux kernel: [drm] amdgpu: 7610M of GTT memory ready.
1. dmidecode
2. "amdgpu.gttsize=1024", https://wiki.archlinux.org/title/Kernel_parameters
3. https://wiki.archlinux.org/title/Stress … MemTest86+ (nb. the "at least 10 cycles"; my suggestion is to just run it overnight, 16+ h)
Although (ignoring the previous errors)
jul 05 18:10:31 archlinux kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:10:31 archlinux kernel: [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 0
jul 05 18:10:31 archlinux kernel: nvidia 0000:01:00.0: [drm] No compatible format found
jul 05 18:10:31 archlinux kernel: nvidia 0000:01:00.0: [drm] Cannot find any crtc or sizes
jul 05 18:10:33 malasdecisiones systemd[1]: Starting NVIDIA Persistence Daemon...
jul 05 18:10:33 malasdecisiones nvidia-persistenced[699]: Started (699)
jul 05 18:10:33 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:10:33 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:10:33 malasdecisiones systemd[1]: Started NVIDIA Persistence Daemon.
jul 05 18:10:47 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:11:00 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0: device [10de:25a2] error status/mask=00000040/0000a000
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0: [ 6] BadTLP
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0: device [10de:25a2] error status/mask=00000040/0000a000
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0: [ 6] BadTLP
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0: device [10de:25a2] error status/mask=00000040/0000a000
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0: [ 6] BadTLP
jul 05 18:38:10 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:51:48 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:52:13 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:53:15 malasdecisiones kernel: RIP: 0010:_nv055006rm+0x2c/0x1a0 [nvidia]
jul 05 18:53:15 malasdecisiones kernel: _nv055004rm+0x212/0x500 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel: _nv015696rm+0x424/0x680 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel: _nv052961rm+0x29/0x30 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
the device acts up frequently - what happens if you completely disable the persistenced? And do you get any of this when running KDE on X11?
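As a side note on the PCIe errors above, the "error status/mask" pair can be decoded bit by bit. The bit names below are the AER Correctable Error Status bits as I recall them from the PCIe specification, so treat the table as an assumption; the function name is made up.

```python
# Bit names of the PCIe AER Correctable Error Status register, as I recall them
# from the PCIe specification; verify against the spec before relying on them.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

def decode_aer(status_mask):
    """Decode the 'status/mask' pair printed by the kernel, e.g. '00000040/0000a000'."""
    status_hex, mask_hex = status_mask.split("/")
    status, mask = int(status_hex, 16), int(mask_hex, 16)

    def names(value):
        return [name for bit, name in CORRECTABLE_BITS.items() if (value >> bit) & 1]

    return names(status), names(mask)

reported, masked = decode_aer("00000040/0000a000")
print("reported:", reported)
print("masked:  ", masked)
```

With the values from the log, only bit 6 is reported, which agrees with the "[ 6] BadTLP" line the kernel printed.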
Online
"amdgpu.gttsize=1024"
That seems to be very low, could I set it to something like 4096? (The default is 76xx something)
https://wiki.archlinux.org/title/Stress … MemTest86+
I'll run it overnight and share the results here
the device acts up frequently - what happens if you completely disable the persistenced? And do you get any of this when running KDE on X11?
Yes, I've completely disabled persistenced and now KDE in X11 just shot itself in the foot while idle and locked (not suspended). It froze the whole computer again. I'll go back to Wayland with persistenced enabled just in case.
dmidecode
more logs - KDE in X11 + nvidia-persistenced disabled + iommu=soft
Offline
4 DIMMS, 4x4GB, 6400 MT/s
Apparently neither the persistenced nor the display server trigger this.
This time the module didn't crash, but
jul 06 13:05:49 malasdecisiones kernel: INFO: task kworker/6:1:177 blocked for more than 122 seconds.
jul 06 13:05:49 malasdecisiones kernel: Tainted: P OE 6.15.4-zen2-1-zen #1
jul 06 13:05:49 malasdecisiones kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jul 06 13:05:49 malasdecisiones kernel: task:kworker/6:1 state:D stack:0 pid:177 tgid:177 ppid:2 task_flags:0x4208060 flags:0x00004000
jul 06 13:05:49 malasdecisiones kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
jul 06 13:05:49 malasdecisiones kernel: Call Trace:
jul 06 13:05:49 malasdecisiones kernel: <TASK>
jul 06 13:05:49 malasdecisiones kernel: __schedule+0x451/0x2380
jul 06 13:05:49 malasdecisiones kernel: schedule_preempt_disabled+0x2e/0xe0
jul 06 13:05:49 malasdecisiones kernel: rwsem_down_write_slowpath+0x1ed/0x6c0
jul 06 13:05:49 malasdecisiones kernel: down_write+0x5a/0x60
jul 06 13:05:49 malasdecisiones kernel: os_acquire_rwlock_write+0x2b/0x40 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 06 13:05:49 malasdecisiones kernel: _nv051520rm+0x10/0x40 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 06 13:05:49 malasdecisiones kernel: _nv053004rm+0x28c/0x360 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 06 13:05:49 malasdecisiones kernel: _nv000839rm+0x27/0x70 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 06 13:05:49 malasdecisiones kernel: rm_acpi_notify+0xf1/0x280 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 06 13:05:49 malasdecisiones kernel: ? srso_alias_return_thunk+0x5/0xfbef5
jul 06 13:05:49 malasdecisiones kernel: acpi_ev_notify_dispatch+0x4e/0x70
jul 06 13:05:49 malasdecisiones kernel: acpi_os_execute_deferred+0x1a/0x30
jul 06 13:05:49 malasdecisiones kernel: process_one_work+0x193/0x350
jul 06 13:05:49 malasdecisiones kernel: worker_thread+0x254/0x3a0
jul 06 13:05:49 malasdecisiones kernel: ? __pfx_worker_thread+0x10/0x10
jul 06 13:05:49 malasdecisiones kernel: kthread+0xfc/0x240
jul 06 13:05:49 malasdecisiones kernel: ? __pfx_kthread+0x10/0x10
jul 06 13:05:49 malasdecisiones kernel: ret_from_fork+0x34/0x50
jul 06 13:05:49 malasdecisiones kernel: ? __pfx_kthread+0x10/0x10
jul 06 13:05:49 malasdecisiones kernel: ret_from_fork_asm+0x1a/0x30
jul 06 13:05:49 malasdecisiones kernel: </TASK>
jul 06 13:05:49 malasdecisiones kernel: INFO: task nv_queue:333 blocked for more than 122 seconds.
jul 06 13:05:49 malasdecisiones kernel: Tainted: P OE 6.15.4-zen2-1-zen #1
stall for IO
There's nothing in the journal 2m ahead of this to maybe explain it and it hangs forever
jul 06 13:05:49 malasdecisiones kernel: INFO: task kworker/6:1:177 blocked for more than 122 seconds.
jul 06 13:05:49 malasdecisiones kernel: INFO: task nv_queue:333 blocked for more than 122 seconds.
jul 06 13:05:49 malasdecisiones kernel: INFO: task kworker/7:3:10123 blocked for more than 122 seconds.
jul 06 13:07:52 malasdecisiones kernel: INFO: task kworker/6:1:177 blocked for more than 245 seconds.
jul 06 13:07:52 malasdecisiones kernel: INFO: task nv_queue:333 blocked for more than 245 seconds.
jul 06 13:07:52 malasdecisiones kernel: INFO: task kworker/7:3:10123 blocked for more than 245 seconds.
jul 06 13:09:55 malasdecisiones kernel: INFO: task kworker/6:1:177 blocked for more than 368 seconds.
jul 06 13:09:55 malasdecisiones kernel: INFO: task nv_queue:333 blocked for more than 368 seconds.
jul 06 13:09:55 malasdecisiones kernel: INFO: task kworker/7:3:10123 blocked for more than 368 seconds.
jul 06 13:11:58 malasdecisiones kernel: INFO: task kworker/6:1:177 blocked for more than 491 seconds.
~1h earlier there's
jul 06 12:04:10 malasdecisiones wpa_supplicant[800]: wlp2s0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-60 noise=9999 txrate=292500
jul 06 12:04:10 malasdecisiones steam[2654]: pid 4973 != 4968, skipping destruction (fork without exec?)
jul 06 12:04:10 malasdecisiones kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
jul 06 12:04:10 malasdecisiones kernel: nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
jul 06 12:04:10 malasdecisiones kernel: nvidia 0000:01:00.0: device [10de:25a2] error status/mask=00000040/0000a000
jul 06 12:04:10 malasdecisiones kernel: nvidia 0000:01:00.0: [ 6] BadTLP
jul 06 12:04:10 malasdecisiones kernel: pcieport 0000:00:01.1: PME: Spurious native interrupt!
jul 06 12:04:10 malasdecisiones kernel: pcieport 0000:00:01.1: PME: Spurious native interrupt!
jul 06 12:04:11 malasdecisiones steam[2654]: Game Recording - game stopped [gameid=322170]
but isolated.
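To get a quick overview of which tasks are stuck and for how long, the hung-task warnings can be condensed. This is only a rough sketch: `hung_task_summary` is a made-up helper name, and it assumes the "INFO: task ... blocked for more than N seconds." wording seen in the journal above.

```shell
# Hypothetical helper: reads kernel log lines on stdin and prints each
# blocked task together with the longest wait reported for it.
hung_task_summary() {
    # Keep only hung-task lines and rewrite them as "seconds|task".
    sed -n 's/.*INFO: task \(.*\) blocked for more than \([0-9]*\) seconds\..*/\2|\1/p' |
        awk -F'|' '
            # Remember the largest wait time seen per task.
            { if ($1 + 0 > max[$2]) max[$2] = $1 + 0 }
            END { for (t in max) print t, max[t] }
        ' |
        LC_ALL=C sort
}
```

On a live system it could be fed from the kernel messages of the previous boot, e.g. `journalctl -k -b -1 | hung_task_summary`.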
Online
So what's next? Does this mean that there's no other solution?
Offline
That means the errors are kinda all over the place, at "best" it's somehow a bug in the nvidia kernel module but because of the MCEs you certainly want to check memtest86+ to rule out RAM defects (you might just have to downclock it a bit)
Online
Hi again
Offline
If we take that at face value, can you disable the nvidia GPU in the firmware (uefi)?
Otherwise add "pci_stub.ids=10de:25a2" to the kernel parameters (https://wiki.archlinux.org/title/Kernel_parameters), which will hide the GPU from the rest of the OS.
If the error is coming from there, the system should™ stabilize.
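After rebooting it's worth confirming the parameter actually took effect before drawing conclusions. A minimal sketch, assuming a POSIX shell; `has_kernel_param` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: succeeds if the exact parameter ($1) is present on
# the kernel command line. An optional $2 supplies the command line to
# check; by default the running kernel's /proc/cmdline is used.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    # Pad with spaces so only whole, space-separated words match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example `has_kernel_param "pci_stub.ids=10de:25a2" && echo "stub parameter active"`. In addition, `lspci -nnk -d 10de:25a2` should then report pci-stub as the kernel driver in use for the card.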
Online
Woah, it stabilized. No more mce errors, no more kworker and nv hung tasks nor kernel taints, it doesn't halt, and finally a graceful shutdown. Nothing apart from the obvious performance hit for using a single low-power card and that games run at -2 FPS.
The only error I see now is this:
jul 08 01:29:15 malasdecisiones kernel: NVRM: GPU 0000:01:00.0 is already bound to pci-stub.
jul 08 01:29:15 malasdecisiones kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
jul 08 01:29:15 malasdecisiones kernel: NVRM: This can occur when another driver was loaded and
NVRM: obtained ownership of the NVIDIA device(s).
jul 08 01:29:15 malasdecisiones kernel: NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting
NVRM: driver(s)), then try loading the NVIDIA kernel module
NVRM: again.
jul 08 01:29:15 malasdecisiones kernel: NVRM: No NVIDIA devices probed.
jul 08 01:29:15 malasdecisiones kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 511
jul 08 01:29:16 malasdecisiones kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 511
I couldn't stay for too long with the NVcard disabled so I enabled it back, but at least this will point to the right direction.
Offline
'key so it's the nvidia GPU and/or driver.
Did this already happen w/ the 570xx or 565xx driver versions?
You could try
1. the non-zen, main kernel
2. https://wiki.archlinux.org/title/NVIDIA … P_firmware ("nvidia.NVreg_EnableGpuFirmware=0" if you want to add it as kernel parameter) - though there are no indications of the GSP being involved here (more like a bus error)
3. nvidia-open (do not disable the GSP w/ that, it critically relies on it!)
4. downgrade to nvidia-dkms 570xx
https://archive.archlinux.org/packages/ … kg.tar.zst
https://archive.archlinux.org/packages/ … kg.tar.zst
and you'll need the kernel headers, https://wiki.archlinux.org/title/Dynami … le_Support
Online
1. I'm typing this on my now-hung system with the vanilla kernel.
2. Already disabled since the first time you pointed it out
3. Wouldn't that create a greater performance hit than all the troubleshooting steps I've done until now? My game (pixel gun 3d) now runs at 20 fps on battery
4. It made my GPU completely undetectable, updating back to 575 brought it back to life (I had the headers installed)
Strangely-not-strangely, checking in the logs right now shows no mce nor hung kworker tasks
Offline
1.
2. ah, https://bbs.archlinux.org/viewtopic.php … 3#p2248863
3. nvidia is moving towards nvidia-open for newer generations, this isn't nouveau - however:
4. What does/did
dkms status
say? Do you have the journal for that boot?
jul 10 14:51:27 malasdecisiones kernel: usercopy: Kernel memory exposure attempt detected from process stack (offset 18446744073709551324, size 12906)!
jul 10 14:51:27 malasdecisiones kernel: ------------[ cut here ]------------
jul 10 14:51:27 malasdecisiones kernel: kernel BUG at mm/usercopy.c:102!
jul 10 14:51:27 malasdecisiones kernel: Oops: invalid opcode: 0000 [#1] SMP NOPTI
jul 10 14:51:27 malasdecisiones kernel: CPU: 9 UID: 0 PID: 19203 Comm: kworker/9:3 Tainted: P OE 6.15.5-arch1-1 #1 PREEMPT(full) d4bfd61c9343a8a5cd6331b14f20b7f081e8650a
jul 10 14:51:27 malasdecisiones kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
jul 10 14:51:27 malasdecisiones kernel: Hardware name: ASUSTeK COMPUTER INC. Zenbook UM6702RC_RM6702RC_BM6702RC UM6702RC_UM6702RC/UM6702RC, BIOS UM6702RC.320 02/21/2023
jul 10 14:51:27 malasdecisiones kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
jul 10 14:51:27 malasdecisiones kernel: RIP: 0010:usercopy_abort+0x76/0x78
jul 10 14:51:27 malasdecisiones kernel: Code: c6 24 67 a8 a8 eb 0e 48 c7 c2 60 d6 aa a8 48 c7 c6 36 a1 a7 a8 41 50 48 c7 c7 f8 d7 9c a8 49 89 c0 41 53 41 52 e8 6a e0 fe ff <0f> 0b 48 89 d9 49 89 e8 48 2b 0a 31 f6 44 89 f2 48 c7 c7 83 3b a8
jul 10 14:51:27 malasdecisiones kernel: RSP: 0018:ffffd3084c1d7c18 EFLAGS: 00010246
jul 10 14:51:27 malasdecisiones kernel: RAX: 000000000000006f RBX: ffffd3084c1d7d5c RCX: 0000000000000000
jul 10 14:51:27 malasdecisiones kernel: RDX: 0000000000000000 RSI: ffff89822e65cbc0 RDI: ffff89822e65cbc0
jul 10 14:51:27 malasdecisiones kernel: RBP: 000000000000326a R08: 0000000000000000 R09: 00000000ffffefff
jul 10 14:51:27 malasdecisiones kernel: R10: ffffffffa9460f20 R11: ffffd3084c1d7ab8 R12: ffffd3084c1dafc6
jul 10 14:51:27 malasdecisiones kernel: R13: ffff897f37a2aea8 R14: 0000000000000001 R15: ffffd3084c1d7d5c
jul 10 14:51:27 malasdecisiones kernel: FS: 0000000000000000(0000) GS:ffff89828456f000(0000) knlGS:0000000000000000
jul 10 14:51:27 malasdecisiones kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 10 14:51:27 malasdecisiones kernel: CR2: 000007cc0437d000 CR3: 0000000107b10000 CR4: 0000000000f50ef0
jul 10 14:51:27 malasdecisiones kernel: PKRU: 55555554
jul 10 14:51:27 malasdecisiones kernel: Call Trace:
jul 10 14:51:27 malasdecisiones kernel: <TASK>
jul 10 14:51:27 malasdecisiones kernel: __check_object_size.cold+0x66/0xcb
jul 10 14:51:27 malasdecisiones kernel: os_memcpy_to_user+0x3a/0x80 [nvidia 614753cb993aeab9cef497489458d078f467efc0]
jul 10 14:51:27 malasdecisiones kernel: _nv053022rm+0x78/0xb0 [nvidia 614753cb993aeab9cef497489458d078f467efc0]
jul 10 14:51:27 malasdecisiones kernel: _nv055006rm+0x9c/0x1a0 [nvidia 614753cb993aeab9cef497489458d078f467efc0]
jul 10 14:51:27 malasdecisiones kernel: _nv055004rm+0x212/0x500 [nvidia 614753cb993aeab9cef497489458d078f467efc0]
jul 10 14:51:27 malasdecisiones kernel: _nv015696rm+0x469/0x680 [nvidia 614753cb993aeab9cef497489458d078f467efc0]
jul 10 14:51:27 malasdecisiones kernel: _nv052961rm+0x29/0x30 [nvidia 614753cb993aeab9cef497489458d078f467efc0]
jul 10 14:51:27 malasdecisiones kernel: ? _nv055007rm+0x60/0x60 [nvidia 614753cb993aeab9cef497489458d078f467efc0]
jul 10 14:51:27 malasdecisiones kernel: rm_power_source_change_event+0x158/0x184 [nvidia 614753cb993aeab9cef497489458d078f467efc0]
jul 10 14:51:27 malasdecisiones kernel: ? acpi_ut_release_mutex+0xef/0x1b0
jul 10 14:51:27 malasdecisiones kernel: acpi_ev_notify_dispatch+0x4e/0x70
jul 10 14:51:27 malasdecisiones kernel: acpi_os_execute_deferred+0x1a/0x30
jul 10 14:51:27 malasdecisiones kernel: process_one_work+0x193/0x350
jul 10 14:51:27 malasdecisiones kernel: worker_thread+0x2d7/0x410
jul 10 14:51:27 malasdecisiones kernel: ? __pfx_worker_thread+0x10/0x10
jul 10 14:51:27 malasdecisiones kernel: kthread+0xfc/0x240
jul 10 14:51:27 malasdecisiones kernel: ? __pfx_kthread+0x10/0x10
jul 10 14:51:27 malasdecisiones kernel: ret_from_fork+0x34/0x50
jul 10 14:51:27 malasdecisiones kernel: ? __pfx_kthread+0x10/0x10
jul 10 14:51:27 malasdecisiones kernel: ret_from_fork_asm+0x1a/0x30
jul 10 14:51:27 malasdecisiones kernel: </TASK>
jul 10 14:51:27 malasdecisiones kernel: Modules linked in: snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg bnep vfat fat snd_acp6x_pdm_dma snd_soc_acp6x_mach snd_soc_dmic snd_sof_amd_acp70 snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_pci_ps snd_soc_acpi_amd_match snd_amd_sdw_acpi soundwire_amd soundwire_generic_allocation mt7921e soundwire_bus snd_hda_codec_realtek mt7921_common snd_soc_sdca snd_hda_codec_generic mt792x_lib snd_hda_scodec_component snd_soc_core snd_hda_codec_hdmi mt76_connac_lib intel_rapl_msr amd_atl snd_compress mt76 joydev snd_hda_intel snd_hda_scodec_cs35l41_spi mousedev ac97_bus intel_rapl_common snd_intel_dspcfg snd_pcm_dmaengine uvcvideo snd_rpl_pci_acp6x snd_intel_sdw_acpi snd_acp_pci videobuf2_vmalloc mac80211 snd_amd_acpi_mach uvc snd_hda_codec snd_acp_legacy_common videobuf2_memops kvm_amd videobuf2_v4l2 snd_hda_core snd_hda_scodec_cs35l41_i2c btusb snd_pci_acp6x
jul 10 14:51:27 malasdecisiones kernel: libarc4 snd_ctl_led snd_hda_scodec_cs35l41 videobuf2_common snd_pci_acp5x snd_hwdep snd_hda_cs_dsp_ctls btrtl hid_multitouch kvm btintel sp5100_tco asus_nb_wmi cfg80211 snd_soc_cs_amp_lib snd_rn_pci_acp3x snd_pcm btbcm videodev asus_wmi ucsi_acpi snd_acp_config snd_soc_cs35l41_lib snd_soc_acpi snd_timer cs_dsp irqbypass btmtk i2c_piix4 typec_ucsi platform_profile bluetooth snd mc rapl sparse_keymap pcspkr typec wmi_bmof thunderbolt snd_pci_acp3x k10temp soundcore rfkill i2c_smbus roles i2c_hid_acpi serial_multi_instantiate i2c_hid acpi_tad amd_pmc mac_hid i2c_dev sg crypto_user loop nfnetlink ip_tables x_tables dm_crypt encrypted_keys trusted asn1_encoder tee nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) dm_mod amdgpu amdxcp polyval_clmulni i2c_algo_bit polyval_generic drm_exec ghash_clmulni_intel gpu_sched sha512_ssse3 nvidia(POE) sdhci_pci sha256_ssse3 drm_suballoc_helper sdhci_uhs2 nvme sha1_ssse3 drm_panel_backlight_quirks sdhci aesni_intel drm_buddy drm_ttm_helper nvme_core crypto_simd cqhci
jul 10 14:51:27 malasdecisiones kernel: ttm drm_display_helper cryptd nvme_keyring mmc_core ccp video cec nvme_auth wmi serio_raw
jul 10 14:51:27 malasdecisiones kernel: ---[ end trace 0000000000000000 ]---
jul 10 14:51:27 malasdecisiones kernel: RIP: 0010:usercopy_abort+0x76/0x78
jul 10 14:51:27 malasdecisiones kernel: Code: c6 24 67 a8 a8 eb 0e 48 c7 c2 60 d6 aa a8 48 c7 c6 36 a1 a7 a8 41 50 48 c7 c7 f8 d7 9c a8 49 89 c0 41 53 41 52 e8 6a e0 fe ff <0f> 0b 48 89 d9 49 89 e8 48 2b 0a 31 f6 44 89 f2 48 c7 c7 83 3b a8
jul 10 14:51:27 malasdecisiones kernel: RSP: 0018:ffffd3084c1d7c18 EFLAGS: 00010246
jul 10 14:51:27 malasdecisiones kernel: RAX: 000000000000006f RBX: ffffd3084c1d7d5c RCX: 0000000000000000
jul 10 14:51:27 malasdecisiones kernel: RDX: 0000000000000000 RSI: ffff89822e65cbc0 RDI: ffff89822e65cbc0
jul 10 14:51:27 malasdecisiones kernel: RBP: 000000000000326a R08: 0000000000000000 R09: 00000000ffffefff
jul 10 14:51:27 malasdecisiones kernel: R10: ffffffffa9460f20 R11: ffffd3084c1d7ab8 R12: ffffd3084c1dafc6
jul 10 14:51:27 malasdecisiones kernel: R13: ffff897f37a2aea8 R14: 0000000000000001 R15: ffffd3084c1d7d5c
jul 10 14:51:27 malasdecisiones kernel: FS: 0000000000000000(0000) GS:ffff89828456f000(0000) knlGS:0000000000000000
jul 10 14:51:27 malasdecisiones kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 10 14:51:27 malasdecisiones kernel: CR2: 000007cc0437d000 CR3: 0000000107b10000 CR4: 0000000000f50ef0
jul 10 14:51:27 malasdecisiones kernel: PKRU: 55555554
That's the moment of the freeze?
Did it happen when you pulled or plugged the power cord?
Online
Over the past year I faced this issue 6-10 times. Today, while browsing the internet, kworker processes were using all CPU resources for 1-3 minutes and then everything went back to normal.
kernel: 6.15.5-arch1-1
CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
GFX: Intel UHD Graphics 620
F2FS for root
KDE
vm.swappiness = 10
Last edited by massimo (2025-07-11 07:46:24)
Offline
nvidia is moving towards nvidia-open for newer generations, this isn't nouveau - however:
I'm gonna try it this evening (I was busy, sorry for the late response)
Do you have the journal for that boot?
Unfortunately not anymore, I don't remember how many boots ago that happened.
Did it happen when you pulled or plugged the power cord?
I mostly use my laptop plugged in, so it hangs regardless of that
Also, I'm getting much worse performance on battery (went from 80 to 20 fps), maybe I could blame disabling the GSP firmware for that
Offline
It's unlikely that dropping the GSP had that kind of battery impact - the GPU still needs to perform pretty much the same job.
Do you have a journal for /that/ boot?
You might still want to attempt to downgrade to nvidia-dkms 570xx if nvidia-open doesn't help, I suspect something went wrong w/ the dkms build.
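A way to check whether DKMS really built the module for the running kernel is to look for an "installed" entry in the `dkms status` output. The following is just a sketch: `dkms_built_for` is a hypothetical helper, and it assumes the status lines look like "module/version, kernel, arch: installed", which may differ between dkms versions.

```shell
# Hypothetical helper: reads `dkms status` output on stdin and succeeds if
# module $1 is reported as installed for kernel release $2.
# Assumed line format: "nvidia/<version>, <kernel>, <arch>: installed"
dkms_built_for() {
    awk -v mod="$1" -v kver="$2" -F', ' '
        {
            split($1, mv, "/")      # mv[1] = module name, mv[2] = version
            split($3, st, ": ")     # st[1] = arch, st[2] = build state
        }
        mv[1] == mod && $2 == kver && st[2] ~ /^installed/ { found = 1 }
        END { exit !found }
    '
}
```

Usage would be something like `dkms status | dkms_built_for nvidia "$(uname -r)" || echo "no nvidia module built for this kernel"`.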
Online
It's unlikely, but it absolutely does impact performance a lot. To prove it, here's the MangoHud screenshot showing that the game now reaches ~100 fps on battery with absolutely no slowdowns, this with nvidia-open, GSP and NTSync enabled. So disabling the GSP maybe (most likely, for me) caused issues with the game's fire effects and all that. And it has not hung yet, nor has any kworker hung, surprisingly...
I forgot to tell you that downgrading to 570xx made the gpu completely disappear from the system and only upgrading back fixed the issue
Anyway, I'm going to try to keep this computer alive for more than three days (it usually doesn't last more than one and a half) and if it survives then the issue would be... fixed?
Offline