Order of events:
I ran `pacman -Syu` ~15 hours ago, which I know for sure upgraded the linux package and probably also upgraded the nvidia package.
A few hours later Steam won't open because I updated the nvidia drivers, so I reboot.
Boots okay, then the whole computer freezes on me ~1 hour later. I hard reboot.
I get the error message posted below. It prints the whole thing to the screen repeatedly, at intervals varying between 4 and 30 seconds (the locked-up CPU core changes, though). I reboot again.
POST fails. Someone online said they had memory issues with this error, another said it was a GPU issue. I took the GPU (NV 1050 Ti) out, but it was actually one of my 4 memory sticks that went kaput.
I boot Arch now (w/o GPU & w/o the bad RAM stick) and still get the same error message as before (again posted below).
I have a pretty bare TTY-only Fedora install on another disk in the computer, which worked before and still works after this event, so I'm pretty confident my hardware issues have been resolved.
I think the next step in recovery would be to boot from a flash drive and try to uninstall or reinstall some packages/drivers? Possibly I can get in by changing some GRUB config, but I would need instructions in that case. I'm of course open to trying anything. I can't think of what to reinstall besides the linux and nvidia packages, but some part of me doubts that would magically fix everything. If you search online for "watchdog bug soft lockup cpu stuck for", people have all sorts of fixes, from removing failing memory to adjusting the GPU in the PCI slot to changing grub variables to switching linux kernels. Frankly, having so many invasive options that people all swear by is quite unhelpful.
I'm making this post to ask what I should try first, essentially. This doesn't seem to have a universal fix like many other common issues do.
Image of error on boot
https://i.imgur.com/03F3UOA.jpeg
OCR (it did a remarkably bad job on some of the memory addresses in the log):
{ 672.1873231 7 do_user_addr_fault+6xZ1a/0x690 ir - :
[ 672.1873261 7 exc_page_fault+Ox7e/Ox1a8 og a
{ 672.1673281 ia Coch bre alata ai
Saree ese eT Le eee Ce re Ce ee ee ee ee ee ee er ke 7
( 672.187333] RSP: 60Zb:60007ffc443696a8 EFLAGS: 60000246 OPCS css ce ch Re) Zz ‘
{ 672.187334] RAX: frerfrfrerffffida RBX: 600055a89Gd5c1d6 RCK: re Crane lira _ ,
RAC DI TIN ae ce eae aaa MND OM ag TON Tony Se AA ny ry i
MY CR TS PD PMLA Rg Se I MTs eS 7A) E _§
POV ARTES LO SCI cece eae ae eee Me POM aaa eae eg agg pi MED OAM aS LG LAM 3S 4 YA a -
MEE YR GTAS DSR PBGra O cele be hk Be ES lie tb sd) r i
{ 672.187341] </TASK> a --§
( 676.186044] watchdog: BUG: soft lockup - CPUS3 stuck for 626s? [(udeu-worker ):391] —— :
( 676.1860451 CPUS3 Utilization every 4s during lockup: . ‘
[| 676.186046)] o81: 160% system,o 6% softirg.o 6% hardirg,o 6% idle : |
en Ye OE A ae a ea Oa ee ae COC Fe 4%
Oy ee cS SP Cae a a a ae CC a
{ 676.186050] of4: 160% system,o 6% softirg,o @% hardirg,o 6% idle a iw
[ 676.186051] of5: 106% systen,o 6% softirg,o 1% hardirg,o 67% idle _ aaa a
[ 676.186052] Modules linked in: nvidia(POE+) wei_hdcp(+) iTCO_vendor_support mei_wdt(+) spi_intel e1000e iZc_smbus mei_pxp(+) snd_timer intel_cstate platform_profile sparse_keynmap psraouse omit eR eos if A RO a pesp
kr snd mei_me intel_oc_wdt(+) mc pps_core mei soundcore rfkill lpc_ich tpm_infineon mousedev joydev mac_hid iZc_dev crypto_user dm_mod loop nfnetlink zram 842_decompress 842_compress 1z¢hc_cowpet ye | i Beat
b_storage i915 iZc_algo_bit drm_buddy ttra intel_gtt drm_display_helper serio_raw cec video uni ye :
[ 676.186077] CPU: 3 UID: 6 PID: 391 Comm: (udev-worker) Tainted: P D LHR 6.16.Q-arch2-1 #1 PREEMPT(full) 37e47d7aef36aa7Z77ae10492993feB812fdZ0521 aw . as
{ 676.186000] Tainted: [P]=PROPRIETARY_MODULE, ([DI=DIE, (01=-O0T_MODULE, [E]=-UNSIGNED_MODULE, [L]=SOFTLOCKUP
{ 676.186061] Hardware name: Hewlett-Packard HP ProDesk 600 G1 SFF/18E7, BIOS L01 v02.78 02/20/2020
[ 676.186062) RIP: 6010:nat ive_queued_spin_lock_s lowpath+@x2c4/0xzZfe iat. d5- bed
[ 676.186065 Code: 83 c0 63 63 ce O1 46 cl e6 05 48 63 f6 48 65 cO BZ bi 99 48 63 64 £5 60 66 51 98 48 89 10 Bb 42 08 BS cO 75 69 £3 90 Bb 42 OB <BS> cO 74 £7 48 Gb 32 49 OS fo Of O4 bc ff fF ff OF 18 Os Ged Te
[ 676.186006)] RSP: 6018: ffffcfd400e7fadO EFLAGS: 666060246
WY Pel PM) MMC reco i eos sc ccs t Se eee eee BB belt bello
[ 676.186069] RDX: ffff69Zb47af3Z2cO RSI: eeeeeeeeeeoeeeese RDI: ffrrrrfrfrgygbaSdac
[ 676.186690] RBP: 00607fa2c44392Zf2 ROB: EGEeGEEeeCeG1EGeES REY: ffffE9Zbadfdbeeo
Yee csp R MD Sih cscs hts pm cos tics oR PAE eB hoe Ao TAMAS Bs,
Gan Yes PR Ge AMD SR MEER 4c Yel POSER Sip 40 774 we SRM tc oes)
YP RPM APM yer vse ew LC PN Eee eee ACG Te: pee pte acces colt oles
Yap ls ce POP MSOC MS ccs oe
[ 676.1866961] CRZ2: 06007efd9dfS4350 CR3: EGGGEOE1EEFbeee4 CR4: G60000000001706f0
{ 676.186897] Call Trace:
Yaseen c Be NY
{ 676.186699] _raw_spin_lock+0x23/0x30
{ 676.166162] idenpotent_init_module+0x1Z7/0x316
[ 676.186104] _ x64 sys_finit_module+Ox6d/Oxd0
{ 676.186106] 7? syscall_trace_enter+0x8d/6x1f0
ee Ya Pe eT OO Me Set ea era: )
{ 676.1861121 7 xfs_iunlock+Oxca/@x106 (xfs ccd607399f926047646e2cOcdeb26e7377 fcef86b ]
C 676.1862981 7 ufs_read+0x165/0x390
C 676.1863001 7 ufs_read+6x165,/0x390
{[ 676.186303] 7 __rseq_handle_notify_resume+0xa6/0x490
{ 676.1863051 7 switch_fpu_return+0x4e/0xd0
{C 676.1863081 7 do_syscal1_64+0x214/0x970
{ 676.1863111 7 alloc_fd+0x12e/0x190
{ 676.186313] 7 put_unused_fd+0x2a/0x70
YG rels bls ae OMEN U ME ot RPA) CVA odo)
{ 676.186319] 7 _ x64 sys _openat+0x61/0xa0
C 676.186321] 7 do_syscal1_64+0x81/0x970
{C 676.186323] ? do_user_addr_fault+0xZ1a/0x690
{( 676.186326] 7? exc_page_fault+0x7e/0x1a0
{ 676.1863291 entry_SYSCALL_64_after_hwframe+0x76/0x7e
etreeet RIP: 0033 :0x?fa2c3b1B76d
, 33] Code: ff c3 66 Ze Of 1f 84 00 00 00 60 00 90 f3 OF le fa 48 89 £8 48 89 f7 48 89 d6 48 89 ca 4d B89 c2 4d B9 cB 4c Bb 4c 24 OB OF OS <48> 3d 01 £O FF
( 676.186334] RSP: 002b:00007ffc443696aB EFLAGS: 00000246 ORIG_RAX: 0006000000000139 eee elk ce
{ 676.186336] RAX: ffffffffrfefffda RBX: 00005S5a898d16a70 RCX: 00007faZ2c3b1876d
: 676. 1863371 RDX: 0000000000000004 RSI: 00007fa2c4439Zf2 RDI: 900000000000001b
: 676 .1863381 RBP: 00007ffc44369740 ROB: 0900000000000000 REI: 600055a898d972a0
( 676.1863391 R10: 000G000000000000 R11: 0000000000000246 R12: 00007fa2c4439ZfZ
{ 676.186341] R13: 0600800000020000 R14: 000055a898da38fO R15: 000055a898d9c160
{ 676.186343] </TASK>
Mod note: Replaced oversized image with link.
Last edited by schard (2025-08-19 06:55:15)
Offline
Re-installing likely won't help, but the cause is probably the kernel update, so you could try to install and boot linux-lts and nvidia-lts.
The immediate problem seems to be xfs, but also see https://bbs.archlinux.org/viewtopic.php?id=307619
Possible causes from the context
1. filesystem corruption from the hard reboot
2. disk is falling apart, https://wiki.archlinux.org/title/SMART (the other partition isn't necessarily affected - yet)
Edit: or #3
https://bbs.archlinux.org/viewtopic.php?id=307627 - the stack trace in the screenshots looks similar, but without xfs (so xfs is likely a victim of the existing soft lockup)
Last edited by seth (2025-08-16 20:22:30)
Offline
(the other partition isn't necessarily affected - yet)
Fedora is on another drive entirely, actually. I didn't consider that it could have been a drive hardware issue on the Arch SSD. But I checked with a short scan and it reported itself OK. I doubt it's wrong - the drive has just over a year of runtime and is from a relatively reputable brand.
I mounted root and efi, swapon'd, arch-chrooted, and ran pacman -S linux-lts nvidia-lts. I got Arch to boot all the way, although the systemd startup output certainly looked different. I'm 80% sure that even on this successful boot a register dump similar to the earlier one was printed at least once, but the system still booted. When I shut the system down, though, it hung at the very end, after the log says "powering off" or "rebooting". (It hung there for >10 minutes, so I once again force-killed the computer.) After this, I tried to boot it and it failed again with BUG: soft lockup, as if it hadn't just been running perfectly fine. Afterwards I reinstalled all my packages from the chroot, but removed linux-lts and nvidia-lts completely to return to the normal kernel. The first boot after that gave me 'soft lockup'; the second booted fine but hung on shutdown/reboot. Since then I can't get it past 'soft lockup' anymore.
The near-identical behavior on the normal and LTS kernels leads me to believe that the kernel update itself isn't causing my issues. But I'm not completely confident in my testing.
After-shutdown or after-reboot hang with no apparent error printed:
https://i.imgur.com/7VwlKGW.jpeg (full logs posted 2 messages below)
Side question. Would ext4 be more resilient against hard reboots than xfs? (I'm not saying that I believe this issue was caused by the hard reboot. But if it turned out to be fs corruption I might be tempted to switch to something more stable after all this, if there is a difference)
Last edited by superlex (2025-08-17 20:48:25)
Offline
But I checked with a short scan and it reported itself OK. I doubt it's wrong - the drive has just over a year of runtime and is from a relatively reputable brand.
FWIW, passing short tests doesn't mean that much, and disks from even "absolutely reputable brands" fail on occasion.
After-shutdown or after-reboot hang with no apparent error printed:
You mean other than the tail kernel warning/oops waving at the top?
Remove the oversized screenshot, post the journal of that boot,
sudo journalctl -b -1 | curl -F 'file=@-' 0x0.st
for the previous ("-1") one.
Would ext4 be more resilient against hard reboots than xfs?
Both are journaling filesystems, but ext4 is of course more common and probably better tested because of that.
Offline
https://0x0.st/KrCu.txt - Journal from first successful boot when I had just installed lts kernel though I'm not sure the logs reflect that - I could be wrong (hung on shutdown) (this was journalctl -b -7)
https://0x0.st/KrCS.txt - from second successful boot back on normal kernel (hung on shutdown) (this was journalctl -b -5)
https://0x0.st/KrCQ.txt - from one of the unsuccessful boots after 2nd with nothing changed (I'm pretty sure) (this was journalctl -b -1)
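A rough way to check which kernel a given boot actually ran is to pull the "Linux version" line from that boot's kernel log. This is only a sketch: kernel_of_boot is a made-up helper name, and the offsets are the same ones passed to journalctl -b above.

```shell
# Print the "Linux version ..." line the kernel logged for one boot.
# $1 is a boot offset as accepted by `journalctl -b`
# (0 = current boot, -1 = previous boot, and so on).
kernel_of_boot() {
    # -k: kernel messages only, -o cat: message text without timestamps
    journalctl -k -b "$1" -o cat --no-pager | grep -m 1 '^Linux version'
}
```

For example, kernel_of_boot -7 should print a version string containing "lts" if that boot really used linux-lts. journalctl --list-boots shows which offsets exist.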
Offline
Aug 16 23:47:47 opti kernel: #PF: supervisor write access in kernel mode
Aug 16 23:47:47 opti kernel: #PF: error_code(0x0003) - permissions violation
Aug 16 23:47:47 opti kernel: PGD 437829067 P4D 437829067 PUD 43782b067 PMD 10c1a1067 PTE 800000010a828021
Aug 16 23:47:47 opti kernel: Oops: Oops: 0003 [#1] SMP PTI
Aug 16 23:47:47 opti kernel: CPU: 3 UID: 0 PID: 385 Comm: (udev-worker) Tainted: P OE 6.16.1-arch1-1 #1 PREEMPT(full) 83823f140bb4fc8c507f38d1610ad9b642cd4b9a
Aug 16 23:47:47 opti kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Aug 16 23:47:47 opti kernel: Hardware name: Hewlett-Packard HP ProDesk 600 G1 SFF/18E7, BIOS L01 v02.78 02/20/2020
Aug 16 23:47:47 opti kernel: RIP: 0010:intel_oc_wdt_probe.cold+0x1e/0x7f [intel_oc_wdt]
Aug 16 23:47:47 opti kernel: Code: c3 cc cc cc cc b8 f4 ff ff ff eb ec 89 c2 48 8b 7b 08 89 44 24 04 48 c7 c6 98 90 1c c1 81 e2 ff 03 00 00 c6 05 d1 33 97 00 01 <c7> 05 07 30 9d 00 00 81 00 00 83 c2 01 89 53 34 e8 9c 1b 87 c6 48
Aug 16 23:47:47 opti kernel: RSP: 0018:ffffd55840907aa0 EFLAGS: 00010206
Aug 16 23:47:47 opti kernel: RAX: 00000000ffffffff RBX: ffff8f49885ae028 RCX: ffff8f4981c93e40
Aug 16 23:47:47 opti kernel: RDX: 00000000000003ff RSI: ffffffffc11c9098 RDI: ffff8f4980b0dc10
Aug 16 23:47:47 opti kernel: RBP: ffff8f4980b0dc10 R08: ffff8f49885ae028 R09: ffff8f4981c985e0
Aug 16 23:47:47 opti kernel: R10: ffff8f4980b0dc10 R11: 00000000ffffffff R12: ffff8f4980b0dc00
Aug 16 23:47:47 opti kernel: R13: ffffffffc1169068 R14: 00007fd41780e2f2 R15: 0000000000000000
Aug 16 23:47:47 opti kernel: FS: 00007fd4170ed880(0000) GS:ffff8f4efd9db000(0000) knlGS:0000000000000000
Aug 16 23:47:47 opti kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 16 23:47:47 opti kernel: CR2: ffffffffc11c92c0 CR3: 000000010112a001 CR4: 00000000001706f0
Aug 16 23:47:47 opti kernel: Call Trace:
Aug 16 23:47:47 opti kernel: <TASK>
Aug 16 23:47:47 opti kernel: platform_probe+0x46/0xb0
Aug 16 23:47:47 opti kernel: really_probe+0xde/0x340
Aug 16 23:47:47 opti kernel: ? pm_runtime_barrier+0x55/0x90
Aug 16 23:47:47 opti kernel: __driver_probe_device+0x78/0x140
Aug 16 23:47:47 opti kernel: driver_probe_device+0x1f/0xa0
Aug 16 23:47:47 opti kernel: ? __pfx___driver_attach+0x10/0x10
Aug 16 23:47:47 opti kernel: __driver_attach+0xcb/0x1e0
Aug 16 23:47:47 opti kernel: bus_for_each_dev+0x85/0xd0
Aug 16 23:47:47 opti kernel: bus_add_driver+0x10b/0x1f0
Aug 16 23:47:47 opti kernel: ? __pfx_intel_oc_wdt_platform_driver_init+0x10/0x10 [intel_oc_wdt 33cd99bc949a95a28ca6d799752f6cd6ebde3755]
Aug 16 23:47:47 opti kernel: driver_register+0x75/0xe0
Aug 16 23:47:47 opti kernel: do_one_initcall+0x5b/0x300
Aug 16 23:47:47 opti kernel: do_init_module+0x62/0x250
Aug 16 23:47:47 opti kernel: ? init_module_from_file+0x8a/0xe0
Aug 16 23:47:47 opti kernel: init_module_from_file+0x8a/0xe0
Aug 16 23:47:47 opti kernel: idempotent_init_module+0x114/0x310
Aug 16 23:47:47 opti kernel: __x64_sys_finit_module+0x6d/0xd0
Aug 16 23:47:47 opti kernel: ? syscall_trace_enter+0x8d/0x1f0
Aug 16 23:47:47 opti kernel: do_syscall_64+0x81/0x970
Aug 16 23:47:47 opti kernel: ? count_memcg_events+0x14d/0x1a0
Aug 16 23:47:47 opti kernel: ? handle_mm_fault+0x1d7/0x2d0
Aug 16 23:47:47 opti kernel: ? do_user_addr_fault+0x181/0x690
Aug 16 23:47:47 opti kernel: ? exc_page_fault+0x7e/0x1a0
Aug 16 23:47:47 opti kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 16 23:47:47 opti kernel: RIP: 0033:0x7fd416f1876d
Aug 16 23:47:47 opti kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 05 0f 00 f7 d8 64 89 01 48
Aug 16 23:47:47 opti kernel: RSP: 002b:00007fffd16fc278 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Aug 16 23:47:47 opti kernel: RAX: ffffffffffffffda RBX: 0000562e38ca4ba0 RCX: 00007fd416f1876d
Aug 16 23:47:47 opti kernel: RDX: 0000000000000004 RSI: 00007fd41780e2f2 RDI: 0000000000000019
Aug 16 23:47:47 opti kernel: RBP: 00007fffd16fc310 R08: 0000000000000000 R09: 0000000000000000
Aug 16 23:47:47 opti kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007fd41780e2f2
Aug 16 23:47:47 opti kernel: R13: 0000000000020000 R14: 0000562e38ca1560 R15: 0000562e38ca4ba0
Aug 16 23:47:47 opti kernel: </TASK>
Aug 16 23:47:47 opti kernel: Modules linked in: i2c_smbus mei_me pcspkr intel_oc_wdt(+) acpi_cpufreq(-) ptp i2c_mux pps_core soundcore mei lpc_ich tpm_infineon mac_hid i2c_dev crypto_user dm_mod loop nfnetlink zram 842_decompress 842_compress lz4hc_compress lz4_compress ip_tables x_tables xfs uas usb_storage i915 i2c_algo_bit drm_buddy ttm intel_gtt drm_display_helper serio_raw cec video wmi
Aug 16 23:47:47 opti kernel: CR2: ffffffffc11c92c0
Aug 16 23:47:47 opti kernel: ---[ end trace 0000000000000000 ]---
Aug 16 23:47:47 opti kernel: RIP: 0010:intel_oc_wdt_probe.cold+0x1e/0x7f [intel_oc_wdt]
Aug 16 23:47:47 opti kernel: Code: c3 cc cc cc cc b8 f4 ff ff ff eb ec 89 c2 48 8b 7b 08 89 44 24 04 48 c7 c6 98 90 1c c1 81 e2 ff 03 00 00 c6 05 d1 33 97 00 01 <c7> 05 07 30 9d 00 00 81 00 00 83 c2 01 89 53 34 e8 9c 1b 87 c6 48
Aug 16 23:47:47 opti kernel: RSP: 0018:ffffd55840907aa0 EFLAGS: 00010206
Aug 16 23:47:47 opti kernel: RAX: 00000000ffffffff RBX: ffff8f49885ae028 RCX: ffff8f4981c93e40
Aug 16 23:47:47 opti kernel: RDX: 00000000000003ff RSI: ffffffffc11c9098 RDI: ffff8f4980b0dc10
Aug 16 23:47:47 opti kernel: RBP: ffff8f4980b0dc10 R08: ffff8f49885ae028 R09: ffff8f4981c985e0
Aug 16 23:47:47 opti kernel: R10: ffff8f4980b0dc10 R11: 00000000ffffffff R12: ffff8f4980b0dc00
Aug 16 23:47:47 opti kernel: R13: ffffffffc1169068 R14: 00007fd41780e2f2 R15: 0000000000000000
Aug 16 23:47:47 opti kernel: FS: 00007fd4170ed880(0000) GS:ffff8f4efd9db000(0000) knlGS:0000000000000000
Aug 16 23:47:47 opti kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 16 23:47:47 opti kernel: CR2: ffffffffc11c92c0 CR3: 000000010112a001 CR4: 00000000001706f0
Aug 16 23:47:47 opti kernel: note: (udev-worker)[385] exited with irqs disabled
https://bbs.archlinux.org/viewtopic.php … 0#p2256730
I'm hereby starting a petition to make Linus swear and rant at people again.
Offline
That worked. Thank you so much.
Q: Did I fail to switch to linux-lts? Should that have solved it because (I presume) lts doesn't have this commit yet? I think I may need to reevaluate how much I value being on the bleeding edge. Also, if I wanted to stay on linux and not move to linux-lts, where do I put the blacklist so I don't need to type it in grub each time? /etc/modprobe.d/?
Last edited by superlex (2025-08-18 15:57:23)
Offline
None of the posted journals shows the LTS kernel booting; no idea whether that's a failure to boot the LTS kernel or to pick the right journal.
You can add it to GRUB_CMDLINE_LINUX_DEFAULT in your /etc/default/grub (you might have to run grub-mkconfig to pick up the LTS kernel anyway).
The kernel command line will always apply (i.e. it also covers the initramfs) and you might more easily remember that you blacklisted it. The next kernel update should already come w/ a fix for this.
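As a sketch of the above, assuming the module to blacklist is intel_oc_wdt (the one named in the oops) and that the module_blacklist= kernel parameter is used for it: the helper below appends a parameter to GRUB_CMDLINE_LINUX_DEFAULT in a given file and does nothing if it is already there. add_kernel_param is a hypothetical name, not an existing tool.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a grub defaults file.
# Usage: add_kernel_param FILE PARAM
# Assumes the value is double-quoted and PARAM contains no '/' or '&'
# (both are special in the sed expression). Uses GNU sed's -i.
add_kernel_param() {
    file=$1
    param=$2
    # Leave the file alone if the parameter is already on that line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}
```

Run as root, this would be called as add_kernel_param /etc/default/grub module_blacklist=intel_oc_wdt, followed by grub-mkconfig -o /boot/grub/grub.cfg (the usual output path on Arch, adjust if yours differs). Once a fixed kernel is installed, the parameter can be removed again.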
Edit:
Please always remember to mark resolved threads by editing your initial post's subject, so others will know that there's no task left, but maybe a solution to find.
Thanks.
Last edited by seth (2025-08-18 19:47:30)
Offline