#26 2023-07-14 13:32:31

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

seth wrote:
Hubbleexplorer wrote:

Shouldn't this only be happening on Ryzen 5000, not 4000? Weird. I'll apply the fix anyway to see if it works.

The series limitation (in the wiki) applies only to the undervoltage, pretty much all of them seem to have trouble w/ higher c-states and I'd not ignore the voltage situation either.
You do have https://archlinux.org/packages/core/any/amd-ucode/ ?

Yes, I have amd-ucode installed.

Chinaboy5216 wrote:

Seeing this happening also for some time now (blinking caps lock key from time to time when booting up), I just do hardware reset for now which seems to solve the issue.

As a test I installed EndeavourOS and ran it for a couple of days; the same issue happened (I guess it's either kernel related or Arch-base related).
A couple of days ago I did a fresh install of Arch again and had the issue again this morning. The installation was done with the archinstall script (updated before running the script).

Next time it happens I'll try to get the boot log also.

amd-ucode is installed here, issue happens on the linux kernel, linux-zen kernel, linux-lts kernel

Switching from Hybrid GPU to Nvidia with optimus-qt and/or envycontrol also leads to a frozen system during boot (I don't know, however, if this is related to the blinking caps lock issue; the last message during boot is: ucsi_acpi USBC000:00: error -ETIMEDOUT: PPM init failed)

My system Tuf A15 FA506QM Ryzen 5800H Nvidia RTX 3060 Mobile / MAX - Q (bought this one in China at the time)
OS Arch with KDE desktop
The only modifications made to the system have been swapping the Mediatek wifi card for an Intel AX210 and adding an extra NVMe drive (shouldn't be related to this issue, but I want to mention it anyway)

Interesting. Switching from Hybrid GPU to Nvidia can cause kernel panics, but not for the same reasons, so I don't think it's related. I also have a wifi card that isn't the original one from the PC (it's an Intel card) and an extra SATA SSD.

#27 2023-07-15 01:48:58

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

seth wrote:
Jul 13 12:42:03 hubble kernel: smpboot: CPU0: AMD Ryzen 9 4900H with Radeon Graphics (family: 0x17, model: 0x60, stepping: 0x1)
Jul 13 16:27:27 hubble kernel: RIP: 0010:__switch_to+0x130/0x400
Jul 13 16:27:27 hubble kernel: RIP: 0010:__switch_to+0x130/0x400
Jul 13 16:27:27 hubble kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x223/0x2e0
Jul 13 16:27:27 hubble kernel: RIP: 0010:cpuidle_enter_state+0xcc/0x440
Jul 13 16:27:27 hubble kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x2a5/0x2e0
Jul 13 16:27:27 hubble kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x6e/0x2e0
Jul 13 16:27:27 hubble kernel: RIP: 0000:0x0
Jul 13 16:27:27 hubble kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x6e/0x2e0
Jul 13 16:27:27 hubble kernel: RIP: 0000:0x0
Jul 13 16:27:27 hubble kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x223/0x2e0
Jul 13 16:27:27 hubble kernel: RIP: 0010:cpuidle_enter_state+0xcc/0x440
seth wrote:

In case this is a ryzen system, https://wiki.archlinux.org/title/Ryzen#Troubleshooting (processor.max_cstate=1 and the curve optimizer)

So even with the fix it still crashes without logging anything (it can be checked here: http://ix.io/4ACf). For now I've just bumped the loglevel to 7 to see if something appears; when it crashes again I will post it.
Also, if someone can help me understand why kdump doesn't work, I would be thankful.

#28 2023-07-15 06:10:36

seth
Member
Registered: 2012-09-03
Posts: 52,454

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

Jul 14 22:28:30 hubble kernel: Command line: BOOT_IMAGE=/vmlinuz-linux-kdump root=UUID=449b6366-7261-4817-8a7a-b83c38c3e71d rw loglevel=debug crashkernel=256M modprobe.blacklist=nouveau resume=/dev/nvme0n1p2

The boot doesn't seem to limit the c-states? If you're using a crash-kernel, you want that for both kernels, but you can really just try whether it helps w/ the default setup.

As for kdump not working, did you test a synthetic crash, "echo c | sudo tee /proc/sysrq-trigger"?
If your CPU gets knocked out by the c-state change and needs a reset, the crash kernel will likely not help you - it's for kernel bugs, not HW ones.
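
Before triggering the synthetic crash, it can help to confirm that a crash kernel is loaded at all and that memory was reserved for it. A minimal sketch, assuming the usual sysfs files /sys/kernel/kexec_crash_loaded and /sys/kernel/kexec_crash_size; the function name kdump_status and its optional directory argument are made up for illustration:

```shell
#!/bin/sh
# kdump_status [SYSFS_DIR]
# Reports whether a crash kernel is loaded and how much memory is reserved.
# SYSFS_DIR defaults to /sys/kernel; the argument is hypothetical and only
# exists so the check can be tried against a copy of the files.
kdump_status() {
    sysdir=${1:-/sys/kernel}
    # Missing files are treated as "not loaded" / "nothing reserved".
    loaded=$(cat "$sysdir/kexec_crash_loaded" 2>/dev/null || echo 0)
    size=$(cat "$sysdir/kexec_crash_size" 2>/dev/null || echo 0)
    if [ "$loaded" = 1 ] && [ "$size" -gt 0 ] 2>/dev/null; then
        echo "crash kernel loaded, $size bytes reserved"
    else
        echo "crash kernel NOT ready (loaded=$loaded, reserved=$size)"
        return 1
    fi
}
```

Called without an argument on the running system, a non-zero exit status suggests the panic would have no crash kernel to jump into, so "echo c" would just hang or reboot.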

#29 2023-07-15 14:36:19

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

seth wrote:
Jul 14 22:28:30 hubble kernel: Command line: BOOT_IMAGE=/vmlinuz-linux-kdump root=UUID=449b6366-7261-4817-8a7a-b83c38c3e71d rw loglevel=debug crashkernel=256M modprobe.blacklist=nouveau resume=/dev/nvme0n1p2

The boot doesn't seem to limit the c-states? If you're using a crash-kernel, you want that for both kernels, but you can really just try whether it helps w/ the default setup.

As for kdump not working, did you test a synthetic crash, "echo c | sudo tee /proc/sysrq-trigger"?
If your CPU gets knocked out by the c-state change and needs a reset, the crash kernel will likely not help you - it's for kernel bugs, not HW ones.

It should limit the C-states; the option is in the GRUB config:

Arch_Linux_Hubble ~ $: cat /etc/default/grub 
# GRUB boot loader configuration

GRUB_DEFAULT="0"
GRUB_TIMEOUT="5"
GRUB_DISTRIBUTOR="Arch"
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=7 crashkernel=256M  modprobe.blacklist=nouveau resume=/dev/nvme0n1p2 processor.max_cstate=1"
GRUB_CMDLINE_LINUX=""

Yes, I tested a synthetic crash and it didn't work; for some reason it can't find /proc/vmcore:

Jul 15 15:29:24 hubble sh[724]: open_dump_memory: Can't open the dump memory(/proc/vmcore). No such file or directory
Jul 15 15:29:24 hubble systemd[1]: kdump.service: Control process exited, code=exited, status=1/FAILURE

despite this

Arch_Linux_Hubble ~ $: zgrep -E 'CONFIG_DEBUG_INFO=|CONFIG_CRASH_DUMP=|CONFIG_PROC_VMCORE=' /proc/config.gz

CONFIG_CRASH_DUMP=y
CONFIG_PROC_VMCORE=y
CONFIG_DEBUG_INFO=y

Maybe I have some bad configuration; I'll have to look at the wiki again.

#30 2023-07-15 14:40:38

seth
Member
Registered: 2012-09-03
Posts: 52,454

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

Arch_Linux_Hubble ~ $: cat /etc/default/grub

Editing that file alone does nothing, you'll still have to "grub-mkconfig -o /boot/grub/grub.cfg"
"cat /proc/cmdline" is authoritative wrt. the success here.

#31 2023-07-15 15:28:06

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

seth wrote:

Arch_Linux_Hubble ~ $: cat /etc/default/grub

Editing that file alone does nothing, you'll still have to "grub-mkconfig -o /boot/grub/grub.cfg"
"cat /proc/cmdline" is authoritative wrt. the success here.

Here it is:

Arch_Linux_Hubble ~ $: cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-linux-kdump root=UUID=449b6366-7261-4817-8a7a-b83c38c3e71d rw loglevel=7 crashkernel=256M modprobe.blacklist=nouveau resume=/dev/nvme0n1p2 processor.max_cstate=1
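
The same check can be scripted so it does not rely on eyeballing a long line. A minimal sketch, assuming nothing beyond tr and grep; the function name has_param and the optional file argument are hypothetical, added only for illustration:

```shell
#!/bin/sh
# has_param PARAM [CMDLINE_FILE]
# Succeeds only if PARAM appears as a complete word on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline; the argument is hypothetical and lets
# the check run against a saved copy (e.g. a line pasted from the journal).
has_param() {
    # One word per line, then an exact, fixed-string, whole-line match.
    tr -s '[:space:]' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, "has_param processor.max_cstate=1 && echo set" prints "set" only on a boot where the parameter really reached the kernel, whatever /etc/default/grub says.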

#32 2023-07-15 15:54:22

seth
Member
Registered: 2012-09-03
Posts: 52,454

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

Ok, but  it didn't show up in the last journal you posted.
Did you experience crashes on a boot where this was guaranteed to be set?

#33 2023-07-16 01:41:39

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

seth wrote:

Ok, but  it didn't show up in the last journal you posted.
Did you experience crashes on a boot where this was guaranteed to be set?

Yes, it crashed again without logging anything: http://ix.io/4AHU
The C-state limit is enabled:

Arch_Linux_Hubble ~ $: sudo journalctl -b -1                         
Jul 15 22:39:16 hubble kernel: Linux version 6.4.2-arch1-1-kdump (linux-kdump@archlinux) (gcc (GCC) 13.1.1 20230429, GNU ld (GNU Binutils) 2.40.0) #1 SMP PREEMPT_DYNAMIC Mon, 10 Jul 2023 15:43:25 +0000
Jul 15 22:39:16 hubble kernel: Command line: BOOT_IMAGE=/vmlinuz-linux-kdump root=UUID=449b6366-7261-4817-8a7a-b83c38c3e71d rw loglevel=7 crashkernel=256M modprobe.blacklist=nouveau resume=/dev/nvme0n1p2 processor.max_cstate=1

last lines

Jul 15 23:00:59 hubble org_kde_powerdevil[1177]: org.kde.powerdevil: Unsatisfied policies, the action has been aborted
Jul 15 23:01:44 hubble org_kde_powerdevil[1177]: org.kde.powerdevil: Unsatisfied policies, the action has been aborted
Jul 15 23:02:29 hubble org_kde_powerdevil[1177]: org.kde.powerdevil: Unsatisfied policies, the action has been aborted
Jul 15 23:04:05 hubble wpa_supplicant[746]: wlp3s0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=650000
Jul 15 23:04:29 hubble kded5[1092]: ktp-kded-module: "auto-away" presence change request: "away" ""
Jul 15 23:04:29 hubble kded5[1092]: ktp-kded-module: plugin queue activation: "away" ""
Jul 15 23:05:58 hubble wpa_supplicant[746]: wlp3s0: CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-72 noise=9999 txrate=585100

Not even loglevel=7 gives anything relevant. I will have to try replacing the wifi card to rule out that variable; it will take some days to find where I put the replacement...
I will also move to the latest kernel to see if anything has changed.
For now, any other recommendations would be appreciated.

Last edited by Hubbleexplorer (2023-07-16 01:42:47)

#34 2023-07-16 01:48:35

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

Hubbleexplorer wrote:
seth wrote:

Ok, but  it didn't show up in the last journal you posted.
Did you experience crashes on a boot where this was guaranteed to be set?

Yes, it crashed again without logging anything: http://ix.io/4AHU
The C-state limit is enabled:

Arch_Linux_Hubble ~ $: sudo journalctl -b -1                         
Jul 15 22:39:16 hubble kernel: Linux version 6.4.2-arch1-1-kdump (linux-kdump@archlinux) (gcc (GCC) 13.1.1 20230429, GNU ld (GNU Binutils) 2.40.0) #1 SMP PREEMPT_DYNAMIC Mon, 10 Jul 2023 15:43:25 +0000
Jul 15 22:39:16 hubble kernel: Command line: BOOT_IMAGE=/vmlinuz-linux-kdump root=UUID=449b6366-7261-4817-8a7a-b83c38c3e71d rw loglevel=7 crashkernel=256M modprobe.blacklist=nouveau resume=/dev/nvme0n1p2 processor.max_cstate=1

last lines

Jul 15 23:00:59 hubble org_kde_powerdevil[1177]: org.kde.powerdevil: Unsatisfied policies, the action has been aborted
Jul 15 23:01:44 hubble org_kde_powerdevil[1177]: org.kde.powerdevil: Unsatisfied policies, the action has been aborted
Jul 15 23:02:29 hubble org_kde_powerdevil[1177]: org.kde.powerdevil: Unsatisfied policies, the action has been aborted
Jul 15 23:04:05 hubble wpa_supplicant[746]: wlp3s0: CTRL-EVENT-SIGNAL-CHANGE above=1 signal=-65 noise=9999 txrate=650000
Jul 15 23:04:29 hubble kded5[1092]: ktp-kded-module: "auto-away" presence change request: "away" ""
Jul 15 23:04:29 hubble kded5[1092]: ktp-kded-module: plugin queue activation: "away" ""
Jul 15 23:05:58 hubble wpa_supplicant[746]: wlp3s0: CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-72 noise=9999 txrate=585100

Not even loglevel=7 gives anything relevant. I will have to try replacing the wifi card to rule out that variable; it will take some days to find where I put the replacement...
I will also move to the latest kernel to see if anything has changed.
For now, any other recommendations would be appreciated.

Edit:
This PC is just testing my patience, I guess: minutes after writing this it crashed again in the middle of mkinitcpio, this time with a stack trace:

Jul 16 02:43:29 hubble sudo[13884]:   hubble : TTY=pts/1 ; PWD=/home/hubble ; USER=root ; COMMAND=/usr/bin/mkinitcpio -P
Jul 16 02:43:29 hubble sudo[13884]: pam_unix(sudo:session): session opened for user root(uid=0) by hubble(uid=1000)
Jul 16 02:43:30 hubble kernel: BUG: kernel NULL pointer dereference, address: 0000000000000010
Jul 16 02:43:30 hubble kernel: #PF: supervisor read access in kernel mode
Jul 16 02:43:30 hubble kernel: #PF: error_code(0x0000) - not-present page
Jul 16 02:43:30 hubble kernel: PGD 0 P4D 0 
Jul 16 02:43:30 hubble kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jul 16 02:43:30 hubble kernel: CPU: 8 PID: 14314 Comm: find Kdump: loaded Tainted: P           OE      6.4.2-arch1-1-kdump #1 989cf3c56bded11ab89eb8592e28e5c14d7d53c4
Jul 16 02:43:30 hubble kernel: Hardware name: ASUSTeK COMPUTER INC. ASUS TUF Gaming A15 FA506IU_FA506IU/FA506IU, BIOS FA506IU.319 04/26/2022
Jul 16 02:43:30 hubble kernel: RIP: 0010:rb_next+0x26/0x50
Jul 16 02:43:30 hubble kernel: Code: 90 90 90 90 f3 0f 1e fa 48 8b 0f 48 39 cf 74 33 48 8b 57 08 48 85 d2 74 1d 48 89 d0 48 8b 52 10 48 85 d2 75 f4 e9 ea 12 03 00 <48> 3b 78 08 75 15 48 8b 08 48 89 c7 48 89 c8 48 83 e0 fc 48 83 f9
Jul 16 02:43:30 hubble kernel: RSP: 0018:ffffa18093437dc8 EFLAGS: 00010202
Jul 16 02:43:30 hubble kernel: RAX: 0000000000000008 RBX: 0000000000000000 RCX: 000000000000000a
Jul 16 02:43:30 hubble kernel: RDX: 0000000000000000 RSI: ffff8db4c1d5aa00 RDI: ffff8db4c1d5a018
Jul 16 02:43:30 hubble kernel: RBP: ffffa18093437e60 R08: 00000000000025a8 R09: 0000000000000008
Jul 16 02:43:30 hubble kernel: R10: ffffffffaec0e030 R11: 0000000000000000 R12: ffff8db4c1d5a000
Jul 16 02:43:30 hubble kernel: R13: 0000000000000008 R14: ffffffffafdff323 R15: ffff8db4c01e6960
Jul 16 02:43:30 hubble kernel: FS:  00007f0c1b0b1740(0000) GS:ffff8db7df800000(0000) knlGS:0000000000000000
Jul 16 02:43:30 hubble kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 16 02:43:30 hubble kernel: CR2: 0000000000000010 CR3: 00000003cc74e000 CR4: 0000000000350ee0
Jul 16 02:43:30 hubble kernel: Call Trace:
Jul 16 02:43:30 hubble kernel:  <TASK>
Jul 16 02:43:30 hubble kernel:  ? __die+0x23/0x70
Jul 16 02:43:30 hubble kernel:  ? page_fault_oops+0x171/0x4e0
Jul 16 02:43:30 hubble kernel:  ? exc_page_fault+0x7f/0x180
Jul 16 02:43:30 hubble kernel:  ? asm_exc_page_fault+0x26/0x30
Jul 16 02:43:30 hubble kernel:  ? __pfx_filldir64+0x10/0x10
Jul 16 02:43:30 hubble kernel:  ? rb_next+0x26/0x50
Jul 16 02:43:30 hubble kernel:  kernfs_fop_readdir+0x157/0x280
Jul 16 02:43:30 hubble kernel:  iterate_dir+0x17b/0x1c0
Jul 16 02:43:30 hubble kernel:  __x64_sys_getdents64+0x88/0x130
Jul 16 02:43:30 hubble kernel:  ? __pfx_filldir64+0x10/0x10
Jul 16 02:43:30 hubble kernel:  do_syscall_64+0x60/0x90
Jul 16 02:43:30 hubble kernel:  ? do_syscall_64+0x6c/0x90
Jul 16 02:43:30 hubble kernel:  ? __x64_sys_fcntl+0x94/0xc0
Jul 16 02:43:30 hubble kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Jul 16 02:43:30 hubble kernel:  ? do_syscall_64+0x6c/0x90
Jul 16 02:43:30 hubble kernel:  ? do_syscall_64+0x6c/0x90
Jul 16 02:43:30 hubble kernel:  ? do_syscall_64+0x6c/0x90
Jul 16 02:43:30 hubble kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Jul 16 02:43:30 hubble kernel: RIP: 0033:0x7f0c1b184577
Jul 16 02:43:30 hubble kernel: Code: 3c fb ff 48 83 c4 08 48 89 e8 5b 5d c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 a7 10 00 f7 d8 64 89 02 48
Jul 16 02:43:30 hubble kernel: RSP: 002b:00007ffda19b5628 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
Jul 16 02:43:30 hubble kernel: RAX: ffffffffffffffda RBX: 00005636143ce4d0 RCX: 00007f0c1b184577
Jul 16 02:43:30 hubble kernel: RDX: 0000000000008000 RSI: 00005636143ce500 RDI: 0000000000000006
Jul 16 02:43:30 hubble kernel: RBP: 00005636143ce4d4 R08: 0000000000000040 R09: 0000000000000001
Jul 16 02:43:30 hubble kernel: R10: 0000000000000100 R11: 0000000000000293 R12: 00005636143ce500
Jul 16 02:43:30 hubble kernel: R13: ffffffffffffff88 R14: 0000000000000000 R15: 0000000000000000
Jul 16 02:43:30 hubble kernel:  </TASK>
Jul 16 02:43:30 hubble kernel: Modules linked in: hid_sony ff_memless ccm rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cmac algif_hash algif_skcipher af_alg bnep snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_s>
Jul 16 02:43:30 hubble kernel:  sha512_ssse3 snd_acp_config snd_hwdep mc fat ecdh_generic mousedev joydev crc16 snd_pcm aesni_intel snd_timer crypto_simd ucsi_ccg typec_ucsi cfg80211 cryptd snd_soc_acpi platform_profile snd typec i2c_h>
Jul 16 02:43:30 hubble kernel: CR2: 0000000000000010
Jul 16 02:43:30 hubble kernel: ---[ end trace 0000000000000000 ]---
Jul 16 02:43:30 hubble kernel: RIP: 0010:rb_next+0x26/0x50
Jul 16 02:43:30 hubble kernel: Code: 90 90 90 90 f3 0f 1e fa 48 8b 0f 48 39 cf 74 33 48 8b 57 08 48 85 d2 74 1d 48 89 d0 48 8b 52 10 48 85 d2 75 f4 e9 ea 12 03 00 <48> 3b 78 08 75 15 48 8b 08 48 89 c7 48 89 c8 48 83 e0 fc 48 83 f9
Jul 16 02:43:30 hubble kernel: RSP: 0018:ffffa18093437dc8 EFLAGS: 00010202
Jul 16 02:43:30 hubble kernel: RAX: 0000000000000008 RBX: 0000000000000000 RCX: 000000000000000a
Jul 16 02:43:30 hubble kernel: RDX: 0000000000000000 RSI: ffff8db4c1d5aa00 RDI: ffff8db4c1d5a018
Jul 16 02:43:30 hubble kernel: RBP: ffffa18093437e60 R08: 00000000000025a8 R09: 0000000000000008
Jul 16 02:43:30 hubble kernel: R10: ffffffffaec0e030 R11: 0000000000000000 R12: ffff8db4c1d5a000
Jul 16 02:43:30 hubble kernel: R13: 0000000000000008 R14: ffffffffafdff323 R15: ffff8db4c01e6960
Jul 16 02:43:30 hubble kernel: FS:  00007f0c1b0b1740(0000) GS:ffff8db7df800000(0000) knlGS:0000000000000000
Jul 16 02:43:30 hubble kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 16 02:43:30 hubble kernel: CR2: 0000000000000010 CR3: 00000003cc74e000 CR4: 0000000000350ee0
Jul 16 02:43:30 hubble kernel: note: find[14314] exited with irqs disabled
Jul 16 02:43:30 hubble kernel: ------------[ cut here ]------------
Jul 16 02:43:30 hubble kernel: kernfs_put: pci:0000:01:00.0--pci:0000:01:00.1/sync_state_only: released with incorrect active_ref 0
Jul 16 02:43:30 hubble kernel: WARNING: CPU: 8 PID: 14314 at fs/kernfs/dir.c:549 kernfs_put.part.0+0xfb/0x150
Jul 16 02:43:30 hubble kernel: Modules linked in: hid_sony ff_memless ccm rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cmac algif_hash algif_skcipher af_alg bnep snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_s>
Jul 16 02:43:30 hubble kernel:  sha512_ssse3 snd_acp_config snd_hwdep mc fat ecdh_generic mousedev joydev crc16 snd_pcm aesni_intel snd_timer crypto_simd ucsi_ccg typec_ucsi cfg80211 cryptd snd_soc_acpi platform_profile snd typec i2c_h>
Jul 16 02:43:30 hubble kernel: CPU: 8 PID: 14314 Comm: find Kdump: loaded Tainted: P      D    OE      6.4.2-arch1-1-kdump #1 989cf3c56bded11ab89eb8592e28e5c14d7d53c4
Jul 16 02:43:30 hubble kernel: Hardware name: ASUSTeK COMPUTER INC. ASUS TUF Gaming A15 FA506IU_FA506IU/FA506IU, BIOS FA506IU.319 04/26/2022
Jul 16 02:43:30 hubble kernel: RIP: 0010:kernfs_put.part.0+0xfb/0x150
Jul 16 02:43:30 hubble kernel: Code: 8b 4b 04 c6 05 84 26 b1 01 01 48 c7 c6 25 56 dc af 48 8b 53 10 48 85 ed 74 04 48 8b 75 10 48 c7 c7 b8 d8 e5 af e8 d5 db bf ff <0f> 0b e9 30 ff ff ff 48 8b 7b 40 48 85 ff 0f 84 34 ff ff ff f0 ff
Jul 16 02:43:30 hubble kernel: RSP: 0018:ffffa18093437e58 EFLAGS: 00010286
Jul 16 02:43:30 hubble kernel: RAX: 0000000000000000 RBX: ffff8db4c1d5a000 RCX: 0000000000000027
Jul 16 02:43:30 hubble kernel: RDX: ffff8db7df8216c8 RSI: 0000000000000001 RDI: ffff8db7df8216c0
Jul 16 02:43:30 hubble kernel: RBP: ffff8db4c1d5aa00 R08: 0000000000000000 R09: ffffa18093437ce8
Jul 16 02:43:30 hubble kernel: R10: 0000000000000003 R11: ffffffffb06ca868 R12: ffff8db4c01e6910
Jul 16 02:43:30 hubble kernel: R13: ffff8db4c01e6900 R14: ffff8db4ccc84a80 R15: ffff8db4d10b8900
Jul 16 02:43:30 hubble kernel: FS:  0000000000000000(0000) GS:ffff8db7df800000(0000) knlGS:0000000000000000
Jul 16 02:43:30 hubble kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 16 02:43:30 hubble kernel: CR2: 0000000000000010 CR3: 00000003aa020000 CR4: 0000000000350ee0
Jul 16 02:43:30 hubble kernel: Call Trace:
Jul 16 02:43:30 hubble kernel:  <TASK>
Jul 16 02:43:30 hubble kernel:  ? kernfs_put.part.0+0xfb/0x150
Jul 16 02:43:30 hubble kernel:  ? __warn+0x81/0x130
Jul 16 02:43:30 hubble kernel:  ? kernfs_put.part.0+0xfb/0x150
Jul 16 02:43:30 hubble kernel:  ? report_bug+0x171/0x1a0
Jul 16 02:43:30 hubble kernel:  ? prb_read_valid+0x1b/0x30
Jul 16 02:43:30 hubble kernel:  ? handle_bug+0x3c/0x80
Jul 16 02:43:30 hubble kernel:  ? exc_invalid_op+0x17/0x70
Jul 16 02:43:30 hubble kernel:  ? asm_exc_invalid_op+0x1a/0x20
Jul 16 02:43:30 hubble kernel:  ? kernfs_put.part.0+0xfb/0x150
Jul 16 02:43:30 hubble kernel:  ? kernfs_put.part.0+0xfb/0x150
Jul 16 02:43:30 hubble kernel:  kernfs_dir_fop_release+0x1f/0x30
Jul 16 02:43:30 hubble kernel:  __fput+0x89/0x250
Jul 16 02:43:30 hubble kernel:  task_work_run+0x5d/0x90
Jul 16 02:43:30 hubble kernel:  do_exit+0x377/0xb20
Jul 16 02:43:30 hubble kernel:  make_task_dead+0x81/0x170
Jul 16 02:43:30 hubble kernel:  rewind_stack_and_make_dead+0x17/0x20
Jul 16 02:43:30 hubble kernel: RIP: 0033:0x7f0c1b184577
Jul 16 02:43:30 hubble kernel: Code: Unable to access opcode bytes at 0x7f0c1b18454d.
Jul 16 02:43:30 hubble kernel: RSP: 002b:00007ffda19b5628 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
Jul 16 02:43:30 hubble kernel: RAX: ffffffffffffffda RBX: 00005636143ce4d0 RCX: 00007f0c1b184577
Jul 16 02:43:30 hubble kernel: RDX: 0000000000008000 RSI: 00005636143ce500 RDI: 0000000000000006
Jul 16 02:43:30 hubble kernel: RBP: 00005636143ce4d4 R08: 0000000000000040 R09: 0000000000000001
Jul 16 02:43:30 hubble kernel: R10: 0000000000000100 R11: 0000000000000293 R12: 00005636143ce500
Jul 16 02:43:30 hubble kernel: R13: ffffffffffffff88 R14: 0000000000000000 R15: 0000000000000000
Jul 16 02:43:30 hubble kernel:  </TASK>
Jul 16 02:43:30 hubble kernel: ---[ end trace 0000000000000000 ]---
Jul 16 02:43:31 hubble wpa_supplicant[764]: wlp3s0: CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-81 noise=9999 txrate=650000
Jul 16 02:43:33 hubble wpa_supplicant[764]: wlp3s0: CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-80 noise=9999 txrate=520000
Jul 16 02:43:37 hubble wpa_supplicant[764]: wlp3s0: CTRL-EVENT-SIGNAL-CHANGE above=0 signal=-81 noise=9999 txrate=195100

full log here http://ix.io/4AHW

#35 2023-07-16 05:45:31

seth
Member
Registered: 2012-09-03
Posts: 52,454

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

Jul 16 02:43:30 hubble kernel: CPU: 8 PID: 14314 Comm: find Kdump: loaded Tainted: P           OE      6.4.2-arch1-1-kdump #1 989cf3c56bded11ab89eb8592e28e5c14d7d53c4

Jul 16 02:43:30 hubble kernel: kernfs_put: pci:0000:01:00.0--pci:0000:01:00.1/sync_state_only: released with incorrect active_ref 0

That's the nvidia GPU, but i don't think it's the cause.

This is a g4m0rz notebook, right?
Did you try to choose more conservative RAM timings? (Yes, you ran memtest86+ for five cycles, but not for days …)
Alternatively try the LTS kernel, keep the cstate limitation in place for now.

Oh, and

Jul 15 23:06:38 hubble kernel: DMI: ASUSTeK COMPUTER INC. ASUS TUF Gaming A15 FA506IU_FA506IU/FA506IU, BIOS FA506IU.319 04/26/2022

Is that the most recent BIOS available?

#36 2023-07-16 15:39:09

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

seth wrote:

Jul 16 02:43:30 hubble kernel: CPU: 8 PID: 14314 Comm: find Kdump: loaded Tainted: P           OE      6.4.2-arch1-1-kdump #1 989cf3c56bded11ab89eb8592e28e5c14d7d53c4

Jul 16 02:43:30 hubble kernel: kernfs_put: pci:0000:01:00.0--pci:0000:01:00.1/sync_state_only: released with incorrect active_ref 0

That's the nvidia GPU, but i don't think it's the cause.

This is a g4m0rz notebook, right?
Did you try to choose more conservative RAM timings? (Yes, you ran memtest86+ for five cycles, but not for days …)
Alternatively try the LTS kernel, keep the cstate limitation in place for now.

Oh, and

Jul 15 23:06:38 hubble kernel: DMI: ASUSTeK COMPUTER INC. ASUS TUF Gaming A15 FA506IU_FA506IU/FA506IU, BIOS FA506IU.319 04/26/2022

Is that the most recent BIOS available?

I didn't change the RAM timings, but perhaps I should be more aggressive about that; I just have to figure out how to do it.
I will switch to the LTS kernel.
The BIOS is not the most recent, but I don't think that's the root cause because the problem is not old enough for that.
I also found a thread on the Arch Linux subreddit where people have a similar problem with AMD systems (https://www.reddit.com/r/archlinux/comm … w_kernels/); maybe the cause is described there.

#37 2023-07-16 19:32:58

seth
Member
Registered: 2012-09-03
Posts: 52,454

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

reddit wrote:

I'm now booting with the LTS kernel and it's completely fine

This is probably the most relevant part to see whether you face the same issue.

#38 2023-07-17 22:03:20

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

Some update: I'm using the LTS kernel and it still crashes. I removed one stick of RAM; if it crashes again I will swap it for the other one and test that. If both crash... well, I'm screwed.

#39 2023-07-18 05:35:56

seth
Member
Registered: 2012-09-03
Posts: 52,454

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

Iff the hardware manipulation gets you nowhere you might want to check https://bbs.archlinux.org/viewtopic.php … 4#p2074524
You'd be the third reporter of vastly detrimental impact, so it's not overly likely. But also not impossible.

#40 2023-07-18 14:27:05

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

seth wrote:

Iff the hardware manipulation gets you nowhere you might want to check https://bbs.archlinux.org/viewtopic.php … 4#p2074524
You'd be the third reporter of vastly detrimental impact, so it's not overly likely. But also not impossible.

It appears that it crashes with either RAM stick, so I will apply the service he posts at the end:

covid19 wrote:
[Unit]
Description=Friends do NOT let friends use transparent huge pages

[Service]
Type=oneshot
ExecStart=/bin/sh -c "/usr/bin/echo never | /usr/bin/tee /sys/kernel/mm/transparent_hugepage/enabled"
ExecStart=/bin/sh -c "/usr/bin/echo never | /usr/bin/tee /sys/kernel/mm/transparent_hugepage/defrag"
ExecStart=/bin/sh -c "/usr/bin/echo 0 | /usr/bin/tee /proc/sys/vm/compaction_proactiveness"

[Install]
WantedBy=multi-user.target

For now I won't change the kernel flags like he did ("rw loglevel=3 mitigations=off transparent_hugepage=never"), because I don't want to turn off "mitigations" for security's sake.
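
Whether the unit (or the boot parameter) actually took effect can be read back from sysfs, where the active mode is the one shown in square brackets. A minimal sketch, assuming the standard /sys/kernel/mm/transparent_hugepage/enabled file; the function name thp_mode and its optional file argument are hypothetical:

```shell
#!/bin/sh
# thp_mode [FILE]
# Prints the active transparent hugepage mode, i.e. the bracketed word in a
# line such as "always madvise [never]".
# FILE defaults to the sysfs "enabled" knob; the argument is hypothetical and
# only there so the parsing can be tried on a saved copy.
thp_mode() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' \
        "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}
```

If "thp_mode" does not print "never" after boot, the service did not run (or ran before something else changed the setting), and any stability comparison would be testing the wrong thing.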

#41 2023-07-18 14:47:49

seth
Member
Registered: 2012-09-03
Posts: 52,454

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

"transparent_hugepage=never" is the (only) relevant parameter here.

#42 2023-07-18 16:18:27

matryoshka
Member
Registered: 2022-05-14
Posts: 17

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

Not sure, but I would add that I have also started to see random kernel panics (caps lock blinks) on reboots, or it just boots to a black screen (no caps lock blink). It started some kernel versions ago with the stock and zen kernels. My laptop is a Lenovo IdeaPad 5 Pro 14ACN6 with a Ryzen CPU, and this is the first thread I came across that sounds relatable.

From the logs, when it looks like a kernel panic I can find kernel messages like these: https://logpaste.com/7vMpKHny

But sometimes it hangs in userspace(?) with gdm, or sometimes, like in this thread, journalctl returns nothing but the first lines from the kernel. It was always stable before these random kernel panics began to happen.

#43 2023-07-18 19:27:28

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

matryoshka wrote:

Not sure, but I would add that I have also started to see random kernel panics (caps lock blinks) on reboots, or it just boots to a black screen (no caps lock blink). It started some kernel versions ago with the stock and zen kernels. My laptop is a Lenovo IdeaPad 5 Pro 14ACN6 with a Ryzen CPU, and this is the first thread I came across that sounds relatable.

From the logs, when it looks like a kernel panic I can find kernel messages like these: https://logpaste.com/7vMpKHny

But sometimes it hangs in userspace(?) with gdm, or sometimes, like in this thread, journalctl returns nothing but the first lines from the kernel. It was always stable before these random kernel panics began to happen.

Very weird behavior for these newest kernels.

seth wrote:

"transparent_hugepage=never" is the (only) relevant parameter here.

So even with "transparent_hugepage=never" it still crashes all the same. For now I will continue to ignore the crashes because I can't do anything about them.

Edit:
About "transparent_hugepage=never": this kills performance. My PC started getting a lot slower with this kernel option, especially in games (e.g. Rocket League), so I will be removing it from my kernel options.

Last edited by Hubbleexplorer (2023-07-19 22:39:09)

#44 2023-07-20 19:07:05

Berbigou
Member
Registered: 2023-07-02
Posts: 5

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

Hello,
I tried kernel 6.4.1 on my ASUS VivoBook X412D, but it still froze sometimes.
I installed the amd-ucode package and ran 'grub-mkconfig -o /boot/grub/grub.cfg' as described in https://wiki.archlinux.org/title/Microcode, and since then I haven't had any freezes.
I now run 6.4.4 without problems.

Hope this will help someone.
Best regards.

#45 2023-07-20 20:34:05

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

Berbigou wrote:

Hello,
I tried kernel 6.4.1 on my ASUS VivoBook X412D, but it still froze sometimes.
I installed the amd-ucode package and ran 'grub-mkconfig -o /boot/grub/grub.cfg' as described in https://wiki.archlinux.org/title/Microcode, and since then I haven't had any freezes.
I now run 6.4.4 without problems.

Hope this will help someone.
Best regards.

Thanks for the contribution, but for me at least this doesn't solve the problem. Here is a list of what I have done so far to try to fix this:

  • Reinstalling the kernel

  • Tried the lts and zen kernels

  • Uninstalled xf86 drivers

  • Trying to debug with kdump

  • adding kernel options:

    • "processor.max_cstate=1"

    • "transparent_hugepage=never"

  • Reinstalling grub

  • Reinstalling amd-ucode

  • Testing ram with memtest

  • Changing the nvidia drivers from kms to normal nvidia drivers

I think this is all, and still no success: it keeps crashing. Sometimes it just freezes, other times the screen goes black. I'm beginning to suspect a hardware problem in the CPU or in the embedded GPU (maybe disabling it and running the PC only with the nvidia GPU would be a good way to test that, but I don't know how to do it). I don't know how to test for that, so if anyone has suggestions, help would be appreciated.

Last edited by Hubbleexplorer (2023-07-20 20:37:11)

#46 2023-07-20 20:51:28

seth
Member
Registered: 2012-09-03
Posts: 52,454

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

Try a different SW stack.
If it happens across various live-distros, chances of this being a software issue wither away…

#47 2023-07-21 01:36:10

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

seth wrote:

Try a different SW stack.
If it happens across various live-distros, chances of this being a software issue wither away…

That's a way to test it. For now I don't have the time, or honestly the patience, to do it.
For now I will just live with it and pray it doesn't corrupt any critical data.

#48 2023-07-22 21:23:40

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

Quick update: I replaced the nvidia-dkms driver with the normal nvidia drivers, and the system has been a lot more stable. The performance of the system is awful though; it dropped by around 10-20%, but it's usable.
I don't know why the dkms driver is causing such an issue; maybe some misconfiguration on my system or maybe a driver bug. For now I will use the normal nvidia drivers even if they perform a lot worse.

#49 2023-07-22 21:31:17

seth
Member
Registered: 2012-09-03
Posts: 52,454

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

What is currently the output of "glxinfo -B"?

#50 2023-07-23 14:08:31

Hubbleexplorer
Member
Registered: 2021-05-15
Posts: 89

Re: [CLOSE/Unable to find cause]Random kernel panics without log's

seth wrote:

What is currently the output of "glxinfo -B"?

So, without any modifications:

: glxinfo -B                                   
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: AMD (0x1002)
    Device: AMD Radeon Graphics (renoir, LLVM 15.0.7, DRM 3.52, 6.4.4-arch1-1) (0x1636)
    Version: 23.1.3
    Accelerated: yes
    Video memory: 512MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
    VBO free memory - total: 107 MB, largest block: 107 MB
    VBO free aux. memory - total: 7521 MB, largest block: 7521 MB
    Texture free memory - total: 107 MB, largest block: 107 MB
    Texture free aux. memory - total: 7521 MB, largest block: 7521 MB
    Renderbuffer free memory - total: 107 MB, largest block: 107 MB
    Renderbuffer free aux. memory - total: 7521 MB, largest block: 7521 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 512 MB
    Total available memory: 8216 MB
    Currently available dedicated video memory: 107 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon Graphics (renoir, LLVM 15.0.7, DRM 3.52, 6.4.4-arch1-1)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 23.1.3
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6 (Compatibility Profile) Mesa 23.1.3
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 23.1.3
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

With a little script I use to run games:

cat /bin/nvidiarun
#!/bin/zsh
export __NV_PRIME_RENDER_OFFLOAD=1;
export __GLX_VENDOR_LIBRARY_NAME=nvidia;

$@;
 nvidiarun glxinfo -B             
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 6144 MB
    Total available memory: 6144 MB
    Currently available dedicated video memory: 5928 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: NVIDIA GeForce GTX 1660 Ti/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 535.86.05
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6.0 NVIDIA 535.86.05
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)

OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 535.86.05
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
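
A possible variant of the wrapper, as a sketch only: it sets the same two variables (__NV_PRIME_RENDER_OFFLOAD and __GLX_VENDOR_LIBRARY_NAME, taken from the script above) just for the launched command, and quotes "$@" so arguments with spaces or empty arguments are passed through unchanged. The function name nvidia_offload is made up for illustration:

```shell
#!/bin/sh
# nvidia_offload COMMAND [ARGS...]
# Runs COMMAND with the PRIME render offload variables set for that command
# only, leaving the calling shell's environment untouched.
nvidia_offload() {
    # "$@" (quoted) keeps every argument exactly as it was given.
    env __NV_PRIME_RENDER_OFFLOAD=1 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        "$@"
}
```

Usage would mirror the original script, e.g. "nvidia_offload glxinfo -B"; the renderer string in the output then shows which GPU actually served the request.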
