
#1 2024-07-08 05:29:52

cheeki-breeki
Member
Registered: 2024-06-06
Posts: 18

Workstation crashes my PC

Good day,

Specs:

OS: Arch Linux x86_64 
Host: X570 Taichi 
Kernel: 6.8.9-arch1-2 
Shell: fish 3.7.1 
WM: bspwm 
Terminal: kitty 
CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz 
GPU: NVIDIA GeForce RTX 3080 
Memory: 3614MiB / 64224MiB 

My system stops executing new tasks when multiple VMs are running.

These are some relevant logs:

Jul 07 00:49:59 arch kernel: WARNING: CPU: 19 PID: 53462 at kernel/rcu/tree_plugin.h:734 rcu_sched_clock_irq+0x88e/0x1010
Jul 07 00:49:59 arch kernel: Modules linked in: snd_seq_dummy snd_hrtimer rfcomm snd_seq vmnet(OE) ppdev parport_pc parport vmw_vsock_vmci_transport vsock vmw_vmci vmmon(OE) cmac algif_hash algif_skcipher af_alg bnep vfat fat intel_rapl_msr intel_rapl_common vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) pkcs8_key_parser nvidia_drm(POE) nvidia_uvm(POE) nvidia_modeset(POE) kvm_amd kvm irqbypass snd_hda_codec_realtek crct10dif_pclmul snd_hda_codec_generic snd_hda_codec_hdmi crc32_pclmul joydev mousedev snd_usb_audio snd_hda_intel polyval_clmulni btusb snd_usbmidi_lib snd_intel_dspcfg polyval_generic btrtl snd_ump snd_intel_sdw_acpi gf128mul btintel iwlwifi snd_rawmidi ghash_clmulni_intel btbcm snd_hda_codec btmtk hid_generic snd_seq_device sha512_ssse3 nvidia(POE) snd_hda_core sha256_ssse3 bluetooth mc snd_hwdep sha1_ssse3 cfg80211 igb aesni_intel snd_pcm usbhid ecdh_generic crypto_simd ptp snd_timer cryptd pps_core sp5100_tco snd i2c_algo_bit dca rfkill soundcore rapl i2c_piix4 video ccp wmi_bmof k10temp pcspkr acpi_cpufreq mac_hid
Jul 07 00:49:59 arch kernel:  crypto_user dm_mod loop nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 nvme mxm_wmi crc32c_intel nvme_core xhci_pci xhci_pci_renesas nvme_auth wmi
Jul 07 00:49:59 arch kernel: CPU: 19 PID: 53462 Comm: vmware-vmx Tainted: P           OE      6.8.9-arch1-2 #1 2add8ee915b565df906f38100dd434d161273f2d
Jul 07 00:49:59 arch kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P3.80 12/01/2020
Jul 07 00:49:59 arch kernel: RIP: 0010:rcu_sched_clock_irq+0x88e/0x1010
Jul 07 00:49:59 arch kernel: Code: 25 3b 2d cb 78 ff ff ff 7f e9 d6 fe ff ff 8b 87 60 04 00 00 85 c0 0f 84 84 f8 ff ff eb b2 c6 87 61 04 00 00 01 e9 76 f8 ff ff <0f> 0b e9 17 f8 ff ff 0f b6 05 cc 11 72 02 84 c0 74 05 e8 bb 3d ff
Jul 07 00:49:59 arch kernel: RSP: 0018:ffffb203005dcde8 EFLAGS: 00010086
Jul 07 00:49:59 arch kernel: RAX: ffff90ff248f91c0 RBX: 0000000000000000 RCX: 0000000014021bc9
Jul 07 00:49:59 arch kernel: RDX: 00000000ffffffa6 RSI: ffff90ff044f4d00 RDI: ffff90ff248f91c0
Jul 07 00:49:59 arch kernel: RBP: ffff910deeee2380 R08: 0000000000000000 R09: 0000000000000000
Jul 07 00:49:59 arch kernel: R10: 0000000000000000 R11: ffffb203005dcff8 R12: ffff910deeee4e00
Jul 07 00:49:59 arch kernel: R13: ffffb20307d5fa58 R14: ffff910deeee4e10 R15: ffff910deeee48c0
Jul 07 00:49:59 arch kernel: FS:  000077738faddc00(0000) GS:ffff910deeec0000(0000) knlGS:0000000000000000
Jul 07 00:49:59 arch kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 07 00:49:59 arch kernel: CR2: 000077738ce00000 CR3: 0000000273822000 CR4: 0000000000f50ef0
Jul 07 00:49:59 arch kernel: PKRU: 55555554
Jul 07 00:49:59 arch kernel: Call Trace:
Jul 07 00:49:59 arch kernel:  <IRQ>
Jul 07 00:49:59 arch kernel:  ? rcu_sched_clock_irq+0x88e/0x1010
Jul 07 00:49:59 arch kernel:  ? __warn+0x81/0x130
Jul 07 00:49:59 arch kernel:  ? rcu_sched_clock_irq+0x88e/0x1010
Jul 07 00:49:59 arch kernel:  ? report_bug+0x16f/0x1a0
Jul 07 00:49:59 arch kernel:  ? handle_bug+0x3c/0x80
Jul 07 00:49:59 arch kernel:  ? exc_invalid_op+0x17/0x70
Jul 07 00:49:59 arch kernel:  ? asm_exc_invalid_op+0x1a/0x20
Jul 07 00:49:59 arch kernel:  ? rcu_sched_clock_irq+0x88e/0x1010
Jul 07 00:49:59 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 07 00:49:59 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 07 00:49:59 arch kernel:  ? folio_wake_bit+0x90/0xc0
Jul 07 00:49:59 arch kernel:  update_process_times+0x74/0xb0
Jul 07 00:49:59 arch kernel:  tick_sched_handle+0x21/0x60
Jul 07 00:49:59 arch kernel:  tick_nohz_highres_handler+0x6f/0x90
Jul 07 00:49:59 arch kernel:  ? __pfx_tick_nohz_highres_handler+0x10/0x10
Jul 07 00:49:59 arch kernel:  __hrtimer_run_queues+0x132/0x2a0
Jul 07 00:49:59 arch kernel:  hrtimer_interrupt+0xf8/0x230
Jul 07 00:49:59 arch kernel:  __sysvec_apic_timer_interrupt+0x4d/0x140
Jul 07 00:49:59 arch kernel:  sysvec_apic_timer_interrupt+0x6d/0x90
Jul 07 00:49:59 arch kernel:  </IRQ>
Jul 07 00:49:59 arch kernel:  <TASK>
Jul 07 00:49:59 arch kernel:  asm_sysvec_apic_timer_interrupt+0x1a/0x20
Jul 07 00:49:59 arch kernel: RIP: 0010:_copy_to_iter+0x86/0x560
Jul 07 00:49:59 arch kernel: Code: 31 f6 48 01 d7 48 89 f9 48 01 d9 40 0f 92 c6 48 85 c9 0f 88 f4 00 00 00 48 85 f6 0f 85 eb 00 00 00 0f 01 cb 48 89 d9 4c 89 e6 <f3> a4 0f 1f 00 0f 01 ca 48 8b 55 08 49 89 dd 48 8b 45 18 49 29 cd
Jul 07 00:49:59 arch kernel: RSP: 0018:ffffb20307d5fb00 EFLAGS: 00040246
Jul 07 00:49:59 arch kernel: RAX: 0000000000268f88 RBX: 0000000000001000 RCX: 0000000000000584
Jul 07 00:49:59 arch kernel: RDX: 00000000002eb000 RSI: ffff9100b16a5a7c RDI: 000077738ceeba8c
Jul 07 00:49:59 arch kernel: RBP: ffffb20307d5fcc0 R08: 0000000000000000 R09: 000000000160b000
Jul 07 00:49:59 arch kernel: R10: 000000000000000a R11: 000000000160b000 R12: ffff9100b16a5000
Jul 07 00:49:59 arch kernel: R13: 0000000000001000 R14: 0000000000001000 R15: 0000000000000000
Jul 07 00:49:59 arch kernel:  copy_page_to_iter+0x8b/0x140
Jul 07 00:49:59 arch kernel:  filemap_read+0x1cb/0x350
Jul 07 00:49:59 arch kernel:  vfs_read+0x24f/0x380
Jul 07 00:49:59 arch kernel:  ksys_read+0x6d/0xf0
Jul 07 00:49:59 arch kernel:  do_syscall_64+0x83/0x170
Jul 07 00:49:59 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 07 00:49:59 arch kernel:  ? do_huge_pmd_anonymous_page+0x2fe/0x730
Jul 07 00:49:59 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 07 00:49:59 arch kernel:  ? __handle_mm_fault+0xcaf/0xe70
Jul 07 00:49:59 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 07 00:49:59 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 07 00:49:59 arch kernel:  ? __count_memcg_events+0x4d/0xc0
Jul 07 00:49:59 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 07 00:49:59 arch kernel:  ? count_memcg_events.constprop.0+0x1a/0x30
Jul 07 00:49:59 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 07 00:49:59 arch kernel:  ? handle_mm_fault+0x1f2/0x350
Jul 07 00:49:59 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 07 00:49:59 arch kernel:  ? do_user_addr_fault+0x204/0x670
Jul 07 00:49:59 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 07 00:49:59 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jul 07 00:49:59 arch kernel:  entry_SYSCALL_64_after_hwframe+0x78/0x80
Jul 07 00:49:59 arch kernel: RIP: 0033:0x77738f71c9ba
Jul 07 00:49:59 arch kernel: Code: 55 48 89 e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 e8 63 f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 2e 44 89 c7 48 89 45 f8 e8 42 64 f8 ff 48 8b
Jul 07 00:49:59 arch kernel: RSP: 002b:00007ffc7fea3870 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Jul 07 00:49:59 arch kernel: RAX: ffffffffffffffda RBX: 0000000000553f88 RCX: 000077738f71c9ba
Jul 07 00:49:59 arch kernel: RDX: 0000000000553f88 RSI: 000077738cc00010 RDI: 0000000000000056
Jul 07 00:49:59 arch kernel: RBP: 00007ffc7fea3890 R08: 0000000000000000 R09: 0000000000000000
Jul 07 00:49:59 arch kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000553f88
Jul 07 00:49:59 arch kernel: R13: 0000000000000027 R14: 000077738cc00010 R15: 0000000000000001
Jul 07 00:49:59 arch kernel:  </TASK>
Jul 07 00:49:59 arch kernel: ---[ end trace 0000000000000000 ]---

Also these:

Jul 07 00:51:17 arch kernel: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-...D 2-...D 15-...D 19-...D 22-...D } 18093 jiffies s: 1533 root: 0x3/.
Jul 07 00:51:17 arch kernel: rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x8006/. l=1:16-31:0x48/.
Jul 07 00:51:17 arch kernel: Sending NMI from CPU 11 to CPUs 1:
Jul 07 00:51:17 arch kernel: NMI backtrace for cpu 1 skipped: idling at acpi_processor_ffh_cstate_enter+0x76/0xe0
Jul 07 00:51:17 arch kernel: Sending NMI from CPU 11 to CPUs 2:
Jul 07 00:51:17 arch kernel: NMI backtrace for cpu 2 skipped: idling at acpi_processor_ffh_cstate_enter+0x76/0xe0
Jul 07 00:51:17 arch kernel: Sending NMI from CPU 11 to CPUs 15:
Jul 07 00:51:17 arch kernel: NMI backtrace for cpu 15 skipped: idling at acpi_processor_ffh_cstate_enter+0x76/0xe0
Jul 07 00:51:17 arch kernel: Sending NMI from CPU 11 to CPUs 19:
Jul 07 00:51:17 arch kernel: NMI backtrace for cpu 19 skipped: idling at acpi_processor_ffh_cstate_enter+0x76/0xe0
Jul 07 00:51:17 arch kernel: Sending NMI from CPU 11 to CPUs 22:
Jul 07 00:51:17 arch kernel: NMI backtrace for cpu 22 skipped: idling at acpi_processor_ffh_cstate_enter+0x76/0xe0
Jul 07 00:52:21 arch systemd[1055]: kitty-53837-0.scope: Couldn't move process 53839 to requested cgroup '/user.slice/user-1000.slice/user@1000.service/kitty-53837-0.scope' (directly or via the system bus): Connection timed out
Jul 07 00:52:21 arch systemd[1055]: kitty-53837-0.scope: Failed to add PIDs to scope's control group: Permission denied

Jul 07 00:52:21 arch systemd[1055]: kitty-53837-0.scope: Failed with result 'resources'.
Jul 07 00:52:21 arch systemd[1055]: Failed to start kitty child process: 53839 launched by: 53837.
Jul 07 00:52:56 arch kernel: INFO: task (sd-gens):53595 blocked for more than 122 seconds.
Jul 07 00:52:56 arch kernel:       Tainted: P        W  OE      6.8.9-arch1-2 #1
Jul 07 00:52:56 arch kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this messag

It's like my CPU stops taking new tasks.

I had a ping to Google running when the issue reproduced. The ping continued, but if I stop it and start it again, it won't work.

Spotify finishes the current song but won't play the next one.

Nothing new works, and everything that is already running continues for a few minutes and eventually stops.


Finally, this post describes a similar issue and recommends upgrading the BIOS. I did that and the issue persists:

This is the reddit post

I also see these logs:

Jul 07 22:31:38 arch kernel: iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-cc-a0-52.ucode failed with error -2
Jul 07 22:31:38 arch kernel: iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-cc-a0-51.ucode failed with error -2
Jul 07 22:31:38 arch kernel: iwlwifi 0000:06:00.0: Direct firmware load for iwlwifi-cc-a0-50.ucode failed with error -2
Jul 07 22:31:38 arch kernel: iwlwifi 0000:06:00.0: no suitable firmware found!
Jul 07 22:31:38 arch kernel: iwlwifi 0000:06:00.0: minimum version required: iwlwifi-cc-a0-50
Jul 07 22:31:38 arch kernel: iwlwifi 0000:06:00.0: maximum version supported: iwlwifi-cc-a0-77
Jul 07 22:31:38 arch kernel: iwlwifi 0000:06:00.0: check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

Any help would be appreciated.

Thanks!

Last edited by cheeki-breeki (2024-07-08 05:32:54)


#2 2024-07-08 06:13:23

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,939

Re: Workstation crashes my PC

Please don't post random lines "you see" out of context and also don't copy/paste out of the pager.

Generically, based only on what I could spot:
https://wiki.archlinux.org/title/Ryzen#Troubleshooting
https://wiki.archlinux.org/title/Solid_ … leshooting
https://bbs.archlinux.org/viewtopic.php … 2#p2181442 (see the entire thread; the gist is to try 535xx-dkms)


#3 2024-07-09 01:43:55

cheeki-breeki
Member
Registered: 2024-06-06
Posts: 18

Re: Workstation crashes my PC

Hi,

I ended up downgrading Workstation to 16.2 and nvidia drivers to 470. I might upgrade nvidia again since the system is a bit laggy.

What do you consider "random lines", so that I don't post irrelevant stuff again if I have issues?

Thanks!


#4 2024-07-09 06:16:21

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,939

Re: Workstation crashes my PC

Random lines means lines that randomly show up in red and might or might not be related to the situation.
What you posted says
1. "something stalled". What or why? Dunno.
2. you probably don't have linux-firmware installed?
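
Point 2 can be checked by hand. A minimal sketch, assuming the default Arch firmware directory /usr/lib/firmware and that the files may be compressed (hence the trailing wildcard); check_iwlwifi_firmware is only an illustrative helper name, not an existing tool:

```shell
# check_iwlwifi_firmware: hypothetical helper, not part of any package.
# $1 (optional): firmware directory, assumed to default to /usr/lib/firmware.
check_iwlwifi_firmware() {
    fwdir="${1:-/usr/lib/firmware}"
    # Is the firmware package installed at all? pacman -Q fails if it is not.
    pacman -Q linux-firmware || return 1
    # Are any iwlwifi-cc-a0-*.ucode files (compressed or not) present?
    ls "$fwdir"/iwlwifi-cc-a0-*.ucode* 2>/dev/null || {
        echo "no iwlwifi-cc-a0 firmware files found in $fwdir" >&2
        return 1
    }
}
```

Running `check_iwlwifi_firmware` with no argument prints the package version and the matching firmware files, or returns non-zero if either is missing.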

The reasonable approach when you don't know exactly what you're looking *for* is to post a lot of irrelevant stuff, i.e. the entire journal for the boot. Because otherwise nobody else has a chance to know what you're looking *at*.
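
One way to capture the entire journal for a boot without going through the pager is sketched below. The helper name save_boot_journal and the default file name journal.txt are assumptions made for illustration; the journalctl options themselves (--boot, --no-pager) are standard:

```shell
# save_boot_journal: hypothetical helper name, not an existing tool.
# $1: boot offset (0 = current boot, -1 = previous boot); $2: output file.
save_boot_journal() {
    offset="${1:-0}"
    outfile="${2:-journal.txt}"
    # --no-pager writes plain text, so nothing from the pager ends up in the copy.
    journalctl --boot="$offset" --no-pager > "$outfile"
}
```

For example, `save_boot_journal -1 crashed-boot.txt` after rebooting from a freeze would save the previous boot, provided the journal is stored persistently so that boot is still available.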

cheeki-breeki wrote:

I might upgrade nvidia again since the system is a bit laggy.

Sanity check: nvidia-smi responds and shows some processes using the GPU?


#5 2024-07-09 06:35:19

cheeki-breeki
Member
Registered: 2024-06-06
Posts: 18

Re: Workstation crashes my PC

Hi,

Thank you for your reply.

I guess grabbing the red logs is common practice, eh? I'm sorry.

Yeah, nvidia-smi shows processes using the GPU.

You were actually right, I somehow didn't install linux-headers when downgrading the kernel.

Anyways, thanks.
