#1 2025-07-03 16:49:44

techmanwalker
Member
Registered: 2025-06-29
Posts: 25

System stalled. Can't launch programs and Plasma becomes unresponsive.

I can't find a reliable way to reproduce this, but it has happened 4 times so far:

  • The first one happened apparently with no cause

  • After trying to play a video using VLC

  • While browsing a folder with Gwenview

  • After running beesd to deduplicate my disk

When it happens, if I try to launch e.g. the browser, it spawns the placeholder window but never really loads. A terminal still launches, but when I type a command like nvtop, htop or anything else, it just hangs: it prints nothing and never returns to the prompt, like when you run cat without arguments, and ^C doesn't do anything. If I then try to soft-reboot or shut down, the system hangs, as you can see in the logs. I have to force a power-off and power back on to get it running again.

The most prominent entries I can see in the journal are: failed dbus-broker-launch requests, Plasma and program coredumps, and pipewire errors.

This is happening very often lately and I would like to know if it's some kind of hardware problem, which I don't think it is because it has happened to me on other laptops.

first time coredump log
coredump files
latest journal log

Last edited by techmanwalker (2025-07-03 16:54:23)

#2 2025-07-04 03:29:32

techmanwalker
Member
Registered: 2025-06-29
Posts: 25

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

Bump. It happened again while playing a video with VLC, but this time I collected some more logs.
Hope to find a fix for this issue soon :( ty in advance.

Blocked kernel threads
jul 03 18:47:45 malasdecisiones kernel: INFO: task nv_queue:342 blocked for more than 368 seconds.
jul 03 18:47:45 malasdecisiones kernel:       Tainted: P           OE       6.15.4-zen2-1-zen #1
jul 03 18:47:45 malasdecisiones kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jul 03 18:47:45 malasdecisiones kernel: task:nv_queue        state:D stack:0     pid:342   tgid:342   ppid:2      task_flags:0x208040 flags:0x000>
jul 03 18:47:45 malasdecisiones kernel: Call Trace:
jul 03 18:47:45 malasdecisiones kernel:  <TASK>
jul 03 18:47:45 malasdecisiones kernel:  __schedule+0x451/0x2380
jul 03 18:47:45 malasdecisiones kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
jul 03 18:47:45 malasdecisiones kernel:  ? timerqueue_add+0x73/0xd0
jul 03 18:47:45 malasdecisiones kernel:  schedule_preempt_disabled+0x2e/0xe0
jul 03 18:47:45 malasdecisiones kernel:  rwsem_down_write_slowpath+0x1ed/0x6c0
jul 03 18:47:45 malasdecisiones kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
jul 03 18:47:45 malasdecisiones kernel:  down_write+0x5a/0x60
jul 03 18:47:45 malasdecisiones kernel:  os_acquire_rwlock_write+0x2b/0x40 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel:  _nv051520rm+0x10/0x40 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel:  _nv053004rm+0x28c/0x360 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel:  _nv059758rm+0x63/0x230 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel:  ? __pfx__main_loop+0x10/0x10 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel:  rm_execute_work_item+0x66/0x1f0 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel:  os_execute_work_item+0x68/0x90 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel:  _main_loop+0x93/0x150 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
jul 03 18:47:45 malasdecisiones kernel:  ? __pfx__main_loop+0x10/0x10 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:47:45 malasdecisiones kernel:  kthread+0xfc/0x240
jul 03 18:47:45 malasdecisiones kernel:  ? __pfx_kthread+0x10/0x10
jul 03 18:47:45 malasdecisiones kernel:  ret_from_fork+0x34/0x50
jul 03 18:47:45 malasdecisiones kernel:  ? __pfx_kthread+0x10/0x10
jul 03 18:47:45 malasdecisiones kernel:  ret_from_fork_asm+0x1a/0x30
jul 03 18:47:45 malasdecisiones kernel:  </TASK>

jul 03 18:49:47 malasdecisiones kernel: INFO: task kworker/15:1:227 blocked for more than 491 seconds.
jul 03 18:49:47 malasdecisiones kernel:       Tainted: P           OE       6.15.4-zen2-1-zen #1
jul 03 18:49:47 malasdecisiones kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jul 03 18:49:47 malasdecisiones kernel: task:kworker/15:1    state:D stack:0     pid:227   tgid:227   ppid:2      task_flags:0x4208060 flags:0x00>
jul 03 18:49:47 malasdecisiones kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
jul 03 18:49:47 malasdecisiones kernel: Call Trace:
jul 03 18:49:47 malasdecisiones kernel:  <TASK>
jul 03 18:49:47 malasdecisiones kernel:  __schedule+0x451/0x2380
jul 03 18:49:47 malasdecisiones kernel:  ? ttwu_queue_wakelist+0xf7/0x110
jul 03 18:49:47 malasdecisiones kernel:  schedule_preempt_disabled+0x2e/0xe0
jul 03 18:49:47 malasdecisiones kernel:  rwsem_down_write_slowpath+0x1ed/0x6c0
jul 03 18:49:47 malasdecisiones kernel:  ? ep_autoremove_wake_function+0x16/0x60
jul 03 18:49:47 malasdecisiones kernel:  ? number+0x4ac/0x5c0
jul 03 18:49:47 malasdecisiones kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
jul 03 18:49:47 malasdecisiones kernel:  down_write+0x5a/0x60
jul 03 18:49:47 malasdecisiones kernel:  os_acquire_rwlock_write+0x2b/0x40 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:49:47 malasdecisiones kernel:  _nv051520rm+0x10/0x40 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:49:47 malasdecisiones kernel:  _nv053004rm+0x28c/0x360 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:49:47 malasdecisiones kernel:  _nv000839rm+0x27/0x70 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:49:47 malasdecisiones kernel:  rm_acpi_notify+0xf1/0x280 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 03 18:49:47 malasdecisiones kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
jul 03 18:49:47 malasdecisiones kernel:  acpi_ev_notify_dispatch+0x4e/0x70
jul 03 18:49:47 malasdecisiones kernel:  acpi_os_execute_deferred+0x1a/0x30
jul 03 18:49:47 malasdecisiones kernel:  process_one_work+0x193/0x350
jul 03 18:49:47 malasdecisiones kernel:  worker_thread+0x254/0x3a0
jul 03 18:49:47 malasdecisiones kernel:  ? __pfx_worker_thread+0x10/0x10
jul 03 18:49:47 malasdecisiones kernel:  kthread+0xfc/0x240
jul 03 18:49:47 malasdecisiones kernel:  ? __pfx_kthread+0x10/0x10
jul 03 18:49:47 malasdecisiones kernel:  ret_from_fork+0x34/0x50
jul 03 18:49:47 malasdecisiones kernel:  ? __pfx_kthread+0x10/0x10
jul 03 18:49:47 malasdecisiones kernel:  ret_from_fork_asm+0x1a/0x30
jul 03 18:49:47 malasdecisiones kernel:  </TASK>

Hardware errors
jul 03 19:03:31 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:03:31 malasdecisiones kernel: mce: [Hardware Error]: Machine check events logged
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC15_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7cfe200
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000001ff0a240701
jul 03 19:03:31 malasdecisiones kernel: 
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
jul 03 19:03:31 malasdecisiones kernel: mce: [Hardware Error]: Machine check events logged
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC16_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7cfe200
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x000001ff0a240701
jul 03 19:03:31 malasdecisiones kernel: 
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7ce23c0
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x000001ff0a240700
jul 03 19:03:31 malasdecisiones kernel: 
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC18_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7ce2300
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600350f00, Syndrome: 0x000001ff0a240700
jul 03 19:03:31 malasdecisiones kernel: 
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 19:03:31 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
Full log
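
For reference, the MCn_STATUS value in the entries above can be decoded by hand. This is only a minimal Python sketch: the bit positions are an assumption taken from the Linux kernel's MCE definitions for AMD Scalable MCA, and the name `decode_mca_status` is made up for illustration.

```python
# Bit positions are an ASSUMPTION based on the Linux kernel's MCE definitions
# for AMD Scalable MCA; check them against AMD's documentation for this CPU family.
FLAGS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60, "MiscV": 59,
    "AddrV": 58, "PCC": 57, "SyndV": 53, "CECC": 46, "UECC": 45,
}

def decode_mca_status(status: int) -> dict:
    """Split an MCA status value into its set flags and error-code fields."""
    return {
        "flags": [name for name, bit in FLAGS.items() if (status >> bit) & 1],
        "ext_error_code": (status >> 16) & 0x3F,  # bits 21:16
        "error_code": status & 0xFFFF,            # bits 15:0
    }

# Status value copied from the MC15..MC18 entries in the log.
decoded = decode_mca_status(0xDC204000000C011B)
print(decoded)
```

With the value from the log this yields the flags Over, MiscV, AddrV, SyndV and CECC (UC is clear, hence "CE") and an extended error code of 12, so the "Ext. Error Code: 12" in the kernel message appears to be a field read from the status register itself.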

#3 2025-07-04 20:52:35

seth
Member
Registered: 2012-09-03
Posts: 65,450

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

> modeset=1
Nonsense?

> preempt=full amd_iommu=on pcie.aspm.policy=default
Why?
Notably IOMMU and

jul 03 20:09:03 malasdecisiones kernel: mce: [Hardware Error]: Machine check events logged
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC15_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7ce2200
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000001ff0a240700
jul 03 20:09:03 malasdecisiones kernel: 
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
jul 03 20:09:03 malasdecisiones kernel: mce: [Hardware Error]: Machine check events logged
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC16_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7ce2200
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x000001ff0a240700
jul 03 20:09:03 malasdecisiones kernel: 
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7ce2200
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600250f00, Syndrome: 0x000001ff0a240700
jul 03 20:09:03 malasdecisiones kernel: 
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Corrected error, no action required.
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: CPU:0 (19:44:1) MC18_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc204000000c011b
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Error Addr: 0x00000000f7ce2200
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: IPID: 0x0000009600350f00, Syndrome: 0x000001ff0a240700
jul 03 20:09:03 malasdecisiones kernel: 
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 12
jul 03 20:09:03 malasdecisiones kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

There're multiple of those and only after a couple of them you start to get hung tasks, but

jul 03 18:41:40 malasdecisiones kernel: mce: [Hardware Error]: Machine check events logged
jul 03 18:43:39 malasdecisiones kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

is pretty much the 2m gap, suggesting that that error led into the hung tasks.

Do you get this w/ "iommu=soft"?

Also

jul 03 17:36:04 archlinux kernel: smpboot: CPU0: AMD Ryzen 9 6900HX with Radeon Graphics (family: 0x19, model: 0x44, stepping: 0x1)

https://wiki.archlinux.org/title/Ryzen#Random_reboots
While not exactly the symptoms, I'd take a very close look there…

Though interestingly

jul 03 17:47:03 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 17:52:31 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 17:57:59 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:03:26 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:08:54 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:14:22 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:19:49 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:25:17 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:30:45 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:36:12 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:41:40 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 18:58:03 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:03:31 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:08:58 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:14:26 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:19:54 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:25:21 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:30:49 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:36:17 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:41:44 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:47:12 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:52:40 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 19:58:07 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 20:03:35 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed
jul 03 20:09:03 malasdecisiones kernel: mce_notify_irq: 2 callbacks suppressed

With the exception of the hung task flurry, they seem to hit every 5 minutes.
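
The spacing can be checked with a few lines of Python. This is a rough sketch using only a subset of the timestamps quoted above (the ones around the hung task flurry); nothing here comes from the system itself.

```python
from datetime import datetime

# Timestamps of "mce_notify_irq: 2 callbacks suppressed" lines copied from the
# journal excerpt (a subset around the gap at 18:41 - 18:58).
stamps = ["18:30:45", "18:36:12", "18:41:40", "18:58:03", "19:03:31", "19:08:58"]

times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
# Seconds between each pair of consecutive messages.
gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
print(gaps)
```

The regular gaps come out at 327-328 s (about 5.5 minutes). The one long gap is 983 s, almost exactly three such periods, which would be consistent with two periods passing without a logged message during the hung task window.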

These lead to the

jul 03 18:43:39 malasdecisiones kernel: INFO: task kworker/15:1:227 blocked for more than 122 seconds.
jul 03 18:43:39 malasdecisiones kernel:       Tainted: P           OE       6.15.4-zen2-1-zen #1
jul 03 18:43:39 malasdecisiones kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jul 03 18:43:39 malasdecisiones kernel: task:kworker/15:1    state:D stack:0     pid:227   tgid:227   ppid:2      task_flags:0x4208060 flags:0x00004000
jul 03 18:43:39 malasdecisiones kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
jul 03 18:43:39 malasdecisiones kernel: Call Trace:
jul 03 18:43:39 malasdecisiones kernel:  <TASK>
jul 03 18:43:39 malasdecisiones kernel:  __schedule+0x451/0x2380
jul 03 18:43:39 malasdecisiones kernel:  ? ttwu_queue_wakelist+0xf7/0x110

2 minutes later
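
To make the "2 minutes later" concrete, the point at which the task got stuck can be back-computed from the report. A minimal sketch, assuming the "blocked for more than 122 seconds" figure is counted from when the task last ran:

```python
from datetime import datetime, timedelta

report_time = datetime.strptime("18:43:39", "%H:%M:%S")  # hung task report
blocked_for = timedelta(seconds=122)                      # "blocked for more than 122 seconds"
mce_time = datetime.strptime("18:41:40", "%H:%M:%S")      # last MCE before the report

# Latest moment the task can have entered the blocked state.
blocked_since = report_time - blocked_for
offset = (blocked_since - mce_time).total_seconds()
print(blocked_since.time(), offset)
```

Under that assumption the worker was stuck from about 18:41:37, i.e. within roughly 3 seconds of the MCE logged at 18:41:40 (journal timestamps only have one-second resolution).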

jul 03 17:36:04 archlinux kernel:  nvme0n1: p1 p2 p3 p4 p5 p6 p7

Is there a parallel Windows installation?


> rd.luks.options=timeout=0,discard
Be very careful w/ discard, https://wiki.archlinux.org/title/Solid_ … nuous_TRIM

#4 2025-07-05 05:21:47

techmanwalker
Member
Registered: 2025-06-29
Posts: 25

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

seth wrote:

> modeset=1
Nonsense?

I run an Nvidia card (3050 mobile)

> preempt=full amd_iommu=on pcie.aspm.policy=default
Why?

I had those enabled in my old laptop, so I carried those over (mostly because of the encrypted btrfs)

Do you get this w/ "iommu=soft"?

Yes, I got those exact same errors after playing a video in VLC, but I don't think that the fact that it was VLC is precisely related

https://wiki.archlinux.org/title/Ryzen#Random_reboots
While not exactly the symptoms, I'd take a very close look there…

There isn't any voltage control in my UEFI. It's an Asus Zenbook Pro 17 laptop...

Is there a parallel windows installation?

Windows 11 alongside, so yes

Be very careful w/ discard

When I was a bit newer to Linux I got a huge performance hit with my first LUKS encrypted system: I got something like 15 MB/s write speed, and adding that flag solved the issue. I've had no other (performance related) disk issues since. Also, my current SSD is a Samsung (SAMSUNG MZVL21T0HCLR-00B00).

#5 2025-07-05 06:30:52

seth
Member
Registered: 2012-09-03
Posts: 65,450

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

I run an Nvidia card (3050 mobile)

This is still nonsense, the frequently advertised kernel parameter is "nvidia_drm.modeset=1" and the feature is enabled by default since 555xx or 565xx  - the only remaining function is to block the simpledrm device.

I had those enabled in my old laptop

Do not enforce an IOMMU. Also, the parameter is pcie_aspm.policy (not pcie.aspm.policy), and "default" is the default anyway: https://wiki.archlinux.org/title/Power_ … Management

Windows 11 alongside, so yes

3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot Windows and Linux twice for voodoo reasons.

There isn't any voltage control in my UEFI.

I take that the ellipsis is meant to say that "asus firmware is known to be unconfigurable shit", but there's nothing "precision boost overdrive" related? Can you adjust any clock rates (are you overclocking the system)?

adding that flag solved the issue

See the linked wiki, consider a periodic trim and make sure your new(?) nvme actually supports this.

#6 2025-07-05 11:14:39

techmanwalker
Member
Registered: 2025-06-29
Posts: 25

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

seth wrote:

"nvidia_drm.modeset=1" and the feature is enabled by default since 555xx or 565xx

Really? I remember reading back then that nvidia_drm.modeset=1 should be replaced with that option, but I think I just misread that.

Do not enforce an IOMMU and it's pcie_aspm.policy and "default" is the "default", anyway

Well well, removing it...

3rd link below. Mandatory.
Disable it

I disabled it a while ago :D

I take that the ellipsis is meant to say that "asus firmware is known to be unconfigurable shit", but there's nothing "precision boost overdrive" related? Can you adjust any clock rates (are you overclocking the system)

No, not at all. The UI is indeed very Fisher-Price coded with hardly the option to disable Secure Boot, so I'm a bit out of luck there

Last edited by techmanwalker (2025-07-05 11:22:53)

#7 2025-07-05 14:26:34

seth
Member
Registered: 2012-09-03
Posts: 65,450

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

There's a difference between a module option in modprobe.d and a kernel parameter - on the kernel commandline, modprobe still needs to somehow know which module to apply the parameter to.
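
The point about the module prefix can be illustrated with a small Python sketch. It only mimics the "module.parameter=value" naming convention and is not the kernel's or modprobe's actual parser (quoted values and the dash/underscore equivalence are ignored); the function name `split_cmdline` is hypothetical.

```python
def split_cmdline(cmdline):
    """Return (module, parameter, value) per entry; module is None when the
    name carries no "module." prefix. Illustration only, not the real parser."""
    result = []
    for entry in cmdline.split():
        name, _, value = entry.partition("=")
        # Everything before the first dot is taken as the module name.
        module, dot, param = name.partition(".")
        if dot:
            result.append((module, param, value or None))
        else:
            result.append((None, name, value or None))
    return result

params = split_cmdline(
    "modeset=1 nvidia_drm.modeset=1 pcie.aspm.policy=default pcie_aspm.policy=default"
)
for item in params:
    print(item)
```

In this reading a bare "modeset=1" has no module to go to, "pcie.aspm.policy" would be taken as an option "aspm.policy" for something called "pcie", and only "nvidia_drm.modeset" and "pcie_aspm.policy" name a module together with one of its options.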

I disabled it a while ago

Have you made sure that MS hasn't re-enabled it in the interim (because that infrequently happens w/ random updates)?

Did you test iommu=soft and do you still get those MCEs?

#8 2025-07-05 14:28:55

techmanwalker
Member
Registered: 2025-06-29
Posts: 25

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

I have not even booted Windows since disabling it, so yes... and in fact I got the same errors and my system halted again :(

#9 2025-07-05 15:03:10

seth
Member
Registered: 2012-09-03
Posts: 65,450

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

On iommu=soft?
Are you legitimately running OOM? (12 is ENOMEM) - you've 16GB RAM and 32GB swap, so this seems kinda unlikely.

Does your fisher-price BIOS allow you to disable internal devices like the webcam - or wifi? Bluetooth, the fingerprint reader etc et pp?

Edit: also

jul 03 17:36:04 archlinux kernel: DMI: ASUSTeK COMPUTER INC. Zenbook UM6702RC_RM6702RC_BM6702RC UM6702RC_UM6702RC/UM6702RC, BIOS UM6702RC.310 06/17/2022

Are there maybe BIOS updates available for the device?

Last edited by seth (2025-07-05 15:04:19)

#10 2025-07-06 01:21:18

techmanwalker
Member
Registered: 2025-06-29
Posts: 25

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

seth wrote:

On iommu=soft?

yes

Does your fisher-price BIOS allow you to disable internal devices like the webcam - or wifi? Bluetooth, the fingerprint reader etc et pp?

Yesn't

are there maybe bios updates available for the device?

BIOS UM6702RC.320 02/21/2023

This is how far MyASUS got-

EDIT: Linux completely froze again during a CPU-intensive task (AV1 encoding); this time it printed a big yellow message at the end of the full journal log.

Last edited by techmanwalker (2025-07-06 02:10:59)

#11 2025-07-06 06:49:30

seth
Member
Registered: 2012-09-03
Posts: 65,450

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

There're no MCE errors in the last one

jul 05 18:53:15 malasdecisiones kernel: BUG: unable to handle page fault for address: 0000002000000009
jul 05 18:53:15 malasdecisiones kernel: #PF: supervisor read access in kernel mode
jul 05 18:53:15 malasdecisiones kernel: #PF: error_code(0x0000) - not-present page
jul 05 18:53:15 malasdecisiones kernel: PGD 0 P4D 0 
jul 05 18:53:15 malasdecisiones kernel: Oops: Oops: 0000 [#1] SMP NOPTI
jul 05 18:53:15 malasdecisiones kernel: CPU: 7 UID: 0 PID: 24033 Comm: kworker/7:0 Tainted: P           OE       6.15.4-zen2-1-zen #1 PREEMPT(full)  1435e5a15a997e99695c9b8e649db9c0130dcff7
jul 05 18:53:15 malasdecisiones kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
jul 05 18:53:15 malasdecisiones kernel: Hardware name: ASUSTeK COMPUTER INC. Zenbook UM6702RC_RM6702RC_BM6702RC UM6702RC_UM6702RC/UM6702RC, BIOS UM6702RC.320 02/21/2023
jul 05 18:53:15 malasdecisiones kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
jul 05 18:53:15 malasdecisiones kernel: RIP: 0010:_nv055006rm+0x2c/0x1a0 [nvidia]
jul 05 18:53:15 malasdecisiones kernel: Code: 1f 00 41 57 41 56 41 55 49 89 f5 41 54 41 89 cc 53 48 83 ed 10 48 89 d3 48 85 d2 0f 84 0d 01 00 00 48 85 f6 0f 84 4c 01 00 00 <8b> 42 08 4c 8d 4a 40 a8 02 74 79 85 c9 0f 84 b9 00 00 00 4c 8b 7a
jul 05 18:53:15 malasdecisiones kernel: RSP: 0018:ffffd2a8caa4fca0 EFLAGS: 00010282
jul 05 18:53:15 malasdecisiones kernel: RAX: 0000000000000000 RBX: 0000002000000001 RCX: 0000000000000000
jul 05 18:53:15 malasdecisiones kernel: RDX: 0000002000000001 RSI: ffff8c481ea55ea8 RDI: ffffffffc2d89200
jul 05 18:53:15 malasdecisiones kernel: RBP: ffff8c481ea55d60 R08: 0000000000000000 R09: ffffffffc26c75da
jul 05 18:53:15 malasdecisiones kernel: R10: ffff8c49e90d9840 R11: fffff2af4ba43640 R12: 0000000000000000
jul 05 18:53:15 malasdecisiones kernel: R13: ffff8c481ea55ea8 R14: ffff8c481ea55ea8 R15: ffff8c481ea55e68
jul 05 18:53:15 malasdecisiones kernel: FS:  0000000000000000(0000) GS:ffff8c4b9aaa9000(0000) knlGS:0000000000000000
jul 05 18:53:15 malasdecisiones kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 05 18:53:15 malasdecisiones kernel: CR2: 0000002000000009 CR3: 00000003949f4000 CR4: 0000000000f50ef0
jul 05 18:53:15 malasdecisiones kernel: PKRU: 55555554
jul 05 18:53:15 malasdecisiones kernel: Call Trace:
jul 05 18:53:15 malasdecisiones kernel:  <TASK>
jul 05 18:53:15 malasdecisiones kernel:  _nv055004rm+0x212/0x500 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel:  _nv015696rm+0x424/0x680 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel:  _nv052961rm+0x29/0x30 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel:  ? _nv055007rm+0x60/0x60 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel:  rm_acpi_notify+0x126/0x280 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
jul 05 18:53:15 malasdecisiones kernel:  acpi_ev_notify_dispatch+0x4e/0x70
jul 05 18:53:15 malasdecisiones kernel:  acpi_os_execute_deferred+0x1a/0x30
jul 05 18:53:15 malasdecisiones kernel:  process_one_work+0x193/0x350
jul 05 18:53:15 malasdecisiones kernel:  worker_thread+0x254/0x3a0
jul 05 18:53:15 malasdecisiones kernel:  ? __pfx_worker_thread+0x10/0x10
jul 05 18:53:15 malasdecisiones kernel:  kthread+0xfc/0x240
jul 05 18:53:15 malasdecisiones kernel:  ? __pfx_kthread+0x10/0x10
jul 05 18:53:15 malasdecisiones kernel:  ret_from_fork+0x34/0x50
jul 05 18:53:15 malasdecisiones kernel:  ? __pfx_kthread+0x10/0x10
jul 05 18:53:15 malasdecisiones kernel:  ret_from_fork_asm+0x1a/0x30
jul 05 18:53:15 malasdecisiones kernel:  </TASK>
jul 05 18:53:15 malasdecisiones kernel: Modules linked in: snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg bnep vfat fat snd_acp6x_pdm_dma snd_soc_acp6x_mach snd_soc_dmic snd_sof_amd_acp70 snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_pci_ps snd_soc_acpi_amd_match snd_amd_sdw_acpi soundwire_amd mt7921e soundwire_generic_allocation mt7921_common soundwire_bus mt792x_lib snd_soc_sdca joydev mousedev mt76_connac_lib snd_hda_codec_realtek snd_hda_scodec_cs35l41_spi snd_soc_core mt76 snd_hda_codec_generic intel_rapl_msr amd_atl snd_compress ac97_bus intel_rapl_common snd_hda_scodec_component snd_hda_codec_hdmi snd_ctl_led snd_pcm_dmaengine snd_hda_intel snd_rpl_pci_acp6x snd_intel_dspcfg mac80211 uvcvideo snd_acp_pci snd_intel_sdw_acpi videobuf2_vmalloc snd_amd_acpi_mach uvc snd_hda_codec snd_acp_legacy_common snd_hda_scodec_cs35l41_i2c videobuf2_memops snd_hda_scodec_cs35l41 btusb snd_pci_acp6x
jul 05 18:53:15 malasdecisiones kernel:  libarc4 snd_hda_core videobuf2_v4l2 btrtl snd_pci_acp5x snd_hda_cs_dsp_ctls kvm_amd videobuf2_common asus_nb_wmi btintel snd_rn_pci_acp3x snd_hwdep snd_soc_cs_amp_lib sp5100_tco hid_multitouch cfg80211 snd_acp_config asus_wmi btbcm ucsi_acpi snd_pcm snd_soc_cs35l41_lib videodev snd_soc_acpi platform_profile typec_ucsi btmtk cs_dsp i2c_piix4 snd_timer kvm bluetooth irqbypass mc rapl sparse_keymap pcspkr snd typec wmi_bmof thunderbolt k10temp soundcore rfkill snd_pci_acp3x i2c_smbus roles i2c_hid_acpi i2c_hid serial_multi_instantiate mac_hid amd_pmc acpi_tad i2c_dev sg crypto_user loop nfnetlink ip_tables x_tables dm_crypt encrypted_keys trusted asn1_encoder tee nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) dm_mod amdgpu amdxcp polyval_clmulni i2c_algo_bit polyval_generic drm_exec ghash_clmulni_intel sha512_ssse3 gpu_sched sdhci_pci sha256_ssse3 drm_suballoc_helper sdhci_uhs2 sha1_ssse3 drm_panel_backlight_quirks nvme sdhci aesni_intel drm_buddy crypto_simd cqhci drm_ttm_helper nvme_core
jul 05 18:53:15 malasdecisiones kernel:  drm_display_helper cryptd ttm mmc_core ccp nvme_keyring cec video nvme_auth wmi serio_raw
jul 05 18:53:15 malasdecisiones kernel: CR2: 0000002000000009
jul 05 18:53:15 malasdecisiones kernel: ---[ end trace 0000000000000000 ]---
jul 05 18:53:15 malasdecisiones kernel: RIP: 0010:_nv055006rm+0x2c/0x1a0 [nvidia]
jul 05 18:53:15 malasdecisiones kernel: Code: 1f 00 41 57 41 56 41 55 49 89 f5 41 54 41 89 cc 53 48 83 ed 10 48 89 d3 48 85 d2 0f 84 0d 01 00 00 48 85 f6 0f 84 4c 01 00 00 <8b> 42 08 4c 8d 4a 40 a8 02 74 79 85 c9 0f 84 b9 00 00 00 4c 8b 7a
jul 05 18:53:15 malasdecisiones kernel: RSP: 0018:ffffd2a8caa4fca0 EFLAGS: 00010282
jul 05 18:53:15 malasdecisiones kernel: RAX: 0000000000000000 RBX: 0000002000000001 RCX: 0000000000000000
jul 05 18:53:15 malasdecisiones kernel: RDX: 0000002000000001 RSI: ffff8c481ea55ea8 RDI: ffffffffc2d89200
jul 05 18:53:15 malasdecisiones kernel: RBP: ffff8c481ea55d60 R08: 0000000000000000 R09: ffffffffc26c75da
jul 05 18:53:15 malasdecisiones kernel: R10: ffff8c49e90d9840 R11: fffff2af4ba43640 R12: 0000000000000000
jul 05 18:53:15 malasdecisiones kernel: R13: ffff8c481ea55ea8 R14: ffff8c481ea55ea8 R15: ffff8c481ea55e68
jul 05 18:53:15 malasdecisiones kernel: FS:  0000000000000000(0000) GS:ffff8c4b9aaa9000(0000) knlGS:0000000000000000
jul 05 18:53:15 malasdecisiones kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 05 18:53:15 malasdecisiones kernel: CR2: 0000002000000009 CR3: 00000003949f4000 CR4: 0000000000f50ef0
jul 05 18:53:15 malasdecisiones kernel: PKRU: 55555554
jul 05 18:53:15 malasdecisiones kernel: note: kworker/7:0[24033] exited with irqs disabled
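
The faulting address in the Oops above can be cross-checked against the register dump. A small Python sketch, using only values copied from the log (RDX, CR2 and the three bytes starting at the instruction marked <8b> in the "Code:" line); the x86 decoding is done by hand and covers just this one instruction form.

```python
# Values copied from the Oops.
rdx = 0x0000002000000001
cr2 = 0x0000002000000009
insn = bytes([0x8B, 0x42, 0x08])  # bytes at the faulting RIP

opcode, modrm, disp8 = insn
mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7

# 0x8B is "MOV r32, r/m32"; mod=1 means base register plus 8-bit displacement,
# and reg=0 / rm=2 select EAX and RDX (no REX prefix precedes the opcode).
assert (opcode, mod, reg, rm) == (0x8B, 1, 0, 2)  # mov eax, [rdx + disp8]

fault_addr = rdx + disp8
print(hex(fault_addr), fault_addr == cr2)
```

The computed address matches CR2, so the nvidia module read through the pointer held in RDX at offset 8. That value (0x2000000001) does not resemble the kernel addresses in the other registers (e.g. RSI, 0xffff8c48...), which would at least be consistent with a corrupted pointer.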

jul 05 18:10:27 archlinux kernel: Memory: 15298888K/15988900K available (22464K kernel code, 2944K rwdata, 16252K rodata, 4784K init, 4744K bss, 667536K reserved, 0K cma-reserved)
jul 05 18:10:29 archlinux kernel: [drm] amdgpu: 512M of VRAM memory ready
jul 05 18:10:29 archlinux kernel: [drm] amdgpu: 7610M of GTT memory ready.

1. dmidecode
2. "amdgpu.gttsize=1024", https://wiki.archlinux.org/title/Kernel_parameters
3. https://wiki.archlinux.org/title/Stress … MemTest86+ (nb. the "at least 10 cycles"; my suggestion is to just run it "over night", 16+ h)

Although (ignoring the previous errors)

jul 05 18:10:31 archlinux kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:10:31 archlinux kernel: [drm] Initialized nvidia-drm 0.0.0 for 0000:01:00.0 on minor 0
jul 05 18:10:31 archlinux kernel: nvidia 0000:01:00.0: [drm] No compatible format found
jul 05 18:10:31 archlinux kernel: nvidia 0000:01:00.0: [drm] Cannot find any crtc or sizes
jul 05 18:10:33 malasdecisiones systemd[1]: Starting NVIDIA Persistence Daemon...
jul 05 18:10:33 malasdecisiones nvidia-persistenced[699]: Started (699)
jul 05 18:10:33 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:10:33 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:10:33 malasdecisiones systemd[1]: Started NVIDIA Persistence Daemon.
jul 05 18:10:47 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:11:00 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0:   device [10de:25a2] error status/mask=00000040/0000a000
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0:    [ 6] BadTLP                
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0:   device [10de:25a2] error status/mask=00000040/0000a000
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0:    [ 6] BadTLP                
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0:   device [10de:25a2] error status/mask=00000040/0000a000
jul 05 18:38:09 malasdecisiones kernel: nvidia 0000:01:00.0:    [ 6] BadTLP                
jul 05 18:38:10 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:51:48 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:52:13 malasdecisiones kernel: nvidia-modeset: WARNING: GPU:0: Correcting number of heads for current head configuration (0x00)
jul 05 18:53:15 malasdecisiones kernel: RIP: 0010:_nv055006rm+0x2c/0x1a0 [nvidia]
jul 05 18:53:15 malasdecisiones kernel:  _nv055004rm+0x212/0x500 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel:  _nv015696rm+0x424/0x680 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 05 18:53:15 malasdecisiones kernel:  _nv052961rm+0x29/0x30 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]

the device acts up frequently - what happens if you completely disable the persistenced? And do you get any of this when running KDE on X11?
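
To see how often the PCIe errors quoted above occur, the journal can be tallied per device. This is only a minimal sketch: the function name `count_aer_errors` and the regular expression are illustrative assumptions based on the line format shown in this thread, not part of any tool.

```python
import re
from collections import Counter

# Matches kernel lines of the form seen in this thread, e.g.
#   nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
# The pattern is an assumption derived from those lines only.
AER_RE = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): PCIe Bus Error: "
    r"severity=(?P<sev>[^,]+), type=(?P<type>[^,]+)"
)


def count_aer_errors(lines):
    """Count PCIe bus error reports per (device, severity, type)."""
    counts = Counter()
    for line in lines:
        match = AER_RE.search(line)
        if match:
            counts[(match["dev"], match["sev"], match["type"])] += 1
    return counts
```

The lines could come from saved `journalctl -k` output, for example `count_aer_errors(open("journal.txt"))`, where `journal.txt` is a hypothetical file name.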

Offline

#12 2025-07-06 20:41:04

techmanwalker
Member
Registered: 2025-06-29
Posts: 25

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.


seth wrote:

"amdgpu.gttsize=1024"

That seems to be very low, could I set it to something like 4096? (The default is 76xx something)

I'll run it overnight and share the results here

the device acts up frequently - what happens if you completely disable the persistenced? And do you get any of this when running KDE on X11?

Yes, I've completely disabled persistenced, and now KDE on X11 just shot itself in the foot while idle and locked (not suspended). It froze the whole computer again. I'll go back to Wayland with persistenced enabled, just in case.

dmidecode
more logs - KDE in X11 + nvidia-persistenced disabled + iommu=soft

Offline

#13 2025-07-06 21:31:33

seth
Member
Registered: 2012-09-03
Posts: 65,450

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

4 DIMMS, 4x4GB, 6400 MT/s
Apparently neither the persistenced nor the display server trigger this.
This time the module didn't crash, but

jul 06 13:05:49 malasdecisiones kernel: INFO: task kworker/6:1:177 blocked for more than 122 seconds.
jul 06 13:05:49 malasdecisiones kernel:       Tainted: P           OE       6.15.4-zen2-1-zen #1
jul 06 13:05:49 malasdecisiones kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jul 06 13:05:49 malasdecisiones kernel: task:kworker/6:1     state:D stack:0     pid:177   tgid:177   ppid:2      task_flags:0x4208060 flags:0x00004000
jul 06 13:05:49 malasdecisiones kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
jul 06 13:05:49 malasdecisiones kernel: Call Trace:
jul 06 13:05:49 malasdecisiones kernel:  <TASK>
jul 06 13:05:49 malasdecisiones kernel:  __schedule+0x451/0x2380
jul 06 13:05:49 malasdecisiones kernel:  schedule_preempt_disabled+0x2e/0xe0
jul 06 13:05:49 malasdecisiones kernel:  rwsem_down_write_slowpath+0x1ed/0x6c0
jul 06 13:05:49 malasdecisiones kernel:  down_write+0x5a/0x60
jul 06 13:05:49 malasdecisiones kernel:  os_acquire_rwlock_write+0x2b/0x40 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 06 13:05:49 malasdecisiones kernel:  _nv051520rm+0x10/0x40 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 06 13:05:49 malasdecisiones kernel:  _nv053004rm+0x28c/0x360 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 06 13:05:49 malasdecisiones kernel:  _nv000839rm+0x27/0x70 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 06 13:05:49 malasdecisiones kernel:  rm_acpi_notify+0xf1/0x280 [nvidia b93c875d3e6188b4636273d1b03d796377f592b8]
jul 06 13:05:49 malasdecisiones kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
jul 06 13:05:49 malasdecisiones kernel:  acpi_ev_notify_dispatch+0x4e/0x70
jul 06 13:05:49 malasdecisiones kernel:  acpi_os_execute_deferred+0x1a/0x30
jul 06 13:05:49 malasdecisiones kernel:  process_one_work+0x193/0x350
jul 06 13:05:49 malasdecisiones kernel:  worker_thread+0x254/0x3a0
jul 06 13:05:49 malasdecisiones kernel:  ? __pfx_worker_thread+0x10/0x10
jul 06 13:05:49 malasdecisiones kernel:  kthread+0xfc/0x240
jul 06 13:05:49 malasdecisiones kernel:  ? __pfx_kthread+0x10/0x10
jul 06 13:05:49 malasdecisiones kernel:  ret_from_fork+0x34/0x50
jul 06 13:05:49 malasdecisiones kernel:  ? __pfx_kthread+0x10/0x10
jul 06 13:05:49 malasdecisiones kernel:  ret_from_fork_asm+0x1a/0x30
jul 06 13:05:49 malasdecisiones kernel:  </TASK>
jul 06 13:05:49 malasdecisiones kernel: INFO: task nv_queue:333 blocked for more than 122 seconds.
jul 06 13:05:49 malasdecisiones kernel:       Tainted: P           OE       6.15.4-zen2-1-zen #1

stall for IO
There's nothing in the journal in the 2 minutes before this that might explain it, and it hangs forever

jul 06 13:05:49 malasdecisiones kernel: INFO: task kworker/6:1:177 blocked for more than 122 seconds.
jul 06 13:05:49 malasdecisiones kernel: INFO: task nv_queue:333 blocked for more than 122 seconds.
jul 06 13:05:49 malasdecisiones kernel: INFO: task kworker/7:3:10123 blocked for more than 122 seconds.
jul 06 13:07:52 malasdecisiones kernel: INFO: task kworker/6:1:177 blocked for more than 245 seconds.
jul 06 13:07:52 malasdecisiones kernel: INFO: task nv_queue:333 blocked for more than 245 seconds.
jul 06 13:07:52 malasdecisiones kernel: INFO: task kworker/7:3:10123 blocked for more than 245 seconds.
jul 06 13:09:55 malasdecisiones kernel: INFO: task kworker/6:1:177 blocked for more than 368 seconds.
jul 06 13:09:55 malasdecisiones kernel: INFO: task nv_queue:333 blocked for more than 368 seconds.
jul 06 13:09:55 malasdecisiones kernel: INFO: task kworker/7:3:10123 blocked for more than 368 seconds.
jul 06 13:11:58 malasdecisiones kernel: INFO: task kworker/6:1:177 blocked for more than 491 seconds.

~1h earlier there's

jul 06 12:04:10 malasdecisiones wpa_supplicant[800]: wlp2s0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-60 noise=9999 txrate=292500
jul 06 12:04:10 malasdecisiones steam[2654]: pid 4973 != 4968, skipping destruction (fork without exec?)
jul 06 12:04:10 malasdecisiones kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
jul 06 12:04:10 malasdecisiones kernel: nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
jul 06 12:04:10 malasdecisiones kernel: nvidia 0000:01:00.0:   device [10de:25a2] error status/mask=00000040/0000a000
jul 06 12:04:10 malasdecisiones kernel: nvidia 0000:01:00.0:    [ 6] BadTLP                
jul 06 12:04:10 malasdecisiones kernel: pcieport 0000:00:01.1: PME: Spurious native interrupt!
jul 06 12:04:10 malasdecisiones kernel: pcieport 0000:00:01.1: PME: Spurious native interrupt!
jul 06 12:04:11 malasdecisiones steam[2654]: Game Recording - game stopped [gameid=322170]

but isolated.
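
The repeated hung-task reports above can be condensed to the longest reported block per task. A rough sketch, assuming the "INFO: task NAME:PID blocked for more than N seconds." format shown in these logs; `longest_block_per_task` is a made-up helper name.

```python
import re

# Pattern assumed from the hung-task lines quoted in this thread, e.g.
#   INFO: task kworker/6:1:177 blocked for more than 122 seconds.
# The task name may itself contain colons, so the PID is taken as the last
# colon-separated number before " blocked".
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def longest_block_per_task(lines):
    """Map (task name, pid) to the largest 'blocked for more than' value seen."""
    longest = {}
    for line in lines:
        match = HUNG_RE.search(line)
        if not match:
            continue
        key = (match["name"], int(match["pid"]))
        longest[key] = max(longest.get(key, 0), int(match["secs"]))
    return longest
```

A value that keeps growing across reports for the same task suggests the task never recovered, as opposed to a one-off stall.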

Offline

#14 2025-07-06 22:05:02

techmanwalker
Member
Registered: 2025-06-29
Posts: 25

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

So what's next? Does this mean that there's no other solution?

Offline

#15 2025-07-06 22:08:33

seth
Member
Registered: 2012-09-03
Posts: 65,450

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

That means the errors are kinda all over the place, at "best" it's somehow a bug in the nvidia kernel module but because of the MCEs you certainly want to check memtest86+ to rule out RAM defects (you might just have to downclock it a bit)

Offline

#16 Today 00:24:02

techmanwalker
Member
Registered: 2025-06-29
Posts: 25

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

Offline

#17 Today 06:50:47

seth
Member
Registered: 2012-09-03
Posts: 65,450

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

If we take that at face value, can you disable the nvidia GPU in the firmware (uefi)?
Otherwise add "pci_stub.ids=10de:25a2" to the kernel parameters (https://wiki.archlinux.org/title/Kernel_parameters), which will hide the GPU from the rest of the OS.
If the error is coming from there, the system should™ stabilize.
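
After rebooting with that parameter, it may be worth confirming that it took effect. A minimal sketch under two assumptions: the `driver` symlink under `/sys/bus/pci/devices/<address>/` names the bound driver, and the parameter appears in `/proc/cmdline`. The helper names `bound_driver` and `cmdline_value` are invented for illustration.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None if unbound."""
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        return None
    # The symlink points at the driver's directory; its basename is the name.
    return os.path.basename(os.readlink(link))


def cmdline_value(cmdline, name):
    """Return the value of name=value on a kernel command line, or None.

    Dashes and underscores in the parameter name are treated as equivalent,
    so "pci-stub.ids" and "pci_stub.ids" both match.
    """
    wanted = name.replace("-", "_")
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key.replace("-", "_") == wanted:
            return value
    return None
```

For the GPU in this thread, `bound_driver("0000:01:00.0")` would be expected to return "pci-stub" once the parameter is active, and `cmdline_value(open("/proc/cmdline").read(), "pci_stub.ids")` should give back the ID list.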

Offline

#18 Today 08:38:16

techmanwalker
Member
Registered: 2025-06-29
Posts: 25

Re: System stalled. Can't launch programs and Plasma becomes unresponsive.

Whoa, it stabilized. No more MCE errors, no more hung kworker or nv_queue tasks, no kernel taints; it doesn't halt, and it finally shuts down gracefully. Nothing apart from the obvious performance hit of using a single low-power card, and games running at -2 FPS.

The only error I see now is this:

jul 08 01:29:15 malasdecisiones kernel: NVRM: GPU 0000:01:00.0 is already bound to pci-stub.
jul 08 01:29:15 malasdecisiones kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
jul 08 01:29:15 malasdecisiones kernel: NVRM: This can occur when another driver was loaded and 
                                        NVRM: obtained ownership of the NVIDIA device(s).
jul 08 01:29:15 malasdecisiones kernel: NVRM: Try unloading the conflicting kernel module (and/or
                                        NVRM: reconfigure your kernel without the conflicting
                                        NVRM: driver(s)), then try loading the NVIDIA kernel module
                                        NVRM: again.
jul 08 01:29:15 malasdecisiones kernel: NVRM: No NVIDIA devices probed.
jul 08 01:29:15 malasdecisiones kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 511
jul 08 01:29:16 malasdecisiones kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 511

I couldn't stay for too long with the NVIDIA card disabled, so I enabled it again, but at least this points in the right direction.

full unhung log

Offline
