To whom it may concern,
My current build is as follows:
Ryzen 5 3600 (stock settings, no overclock)
Gigabyte X570 Gaming X
Corsair Vengeance LPX 2X8GB (CMK16GX4M2B3200C16W) (Samsung E-die chips)
Unfortunately this system has been giving me nothing but trouble since the initial build date of July 2019. Things have been getting particularly bad since the start of 2020 with the symptoms primarily manifesting themselves as daily hard lockups (mainly above 2133 MHz DDR4 speeds) and segfaults (at any DDR4 speed).
When the system locks up I am unable to do anything other than a hard reset via the physical button. There is no response from SysRq, CAPS LOCK, CTRL+ALT+F2, or CTRL+ALT+BACKSPACE. I have not tried to SSH into the system when it is in this state, but I do not imagine it would be responsive if even the keyboard LEDs are not.
Things I have tried include:
- Various kernel parameters (acpi_enforce_resources=lax amdgpu.noretry=0 idle=nomwait rcu_nocbs=0-11 processor.max_cstate=1 <--- these are the ones I have enabled currently)
- Disabling C6 states via UEFI, ryzen-stabilizator, kernel parameters, and systemd files
- Disabling Cool'n'Quiet via UEFI
- Disabling PBO via UEFI
- Updating UEFI to latest stable version (F11)
- Loading factory default settings via UEFI
- Using DDR4 timings from DRAM Calculator For Ryzen
- Running memtest86 overnight (passed with 0 errors even at 3200 MHz DDR4 speeds)
- Running mprime overnight (passed with 0 errors for over 8 hours at 100% CPU and RAM usage)
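Since several of the items above depend on kernel parameters, it may be worth confirming that they are actually active on the running kernel. The following is only a minimal sketch: the helper names has_param and check_cmdline are made up for illustration, and the parameter names are simply the ones listed in this post.

```shell
#!/bin/sh
# Sketch only: has_param and check_cmdline are hypothetical helper names.

# has_param CMDLINE NAME
# Succeeds if NAME appears in CMDLINE either bare or as NAME=value.
has_param() {
    for word in $1; do            # rely on word splitting of the command line
        case "$word" in
            "$2"|"$2"=*) return 0 ;;
        esac
    done
    return 1
}

# check_cmdline CMDLINE NAME...
# Prints one "present: NAME" or "missing: NAME" line per parameter.
check_cmdline() {
    cmdline=$1
    shift
    for name in "$@"; do
        if has_param "$cmdline" "$name"; then
            echo "present: $name"
        else
            echo "missing: $name"
        fi
    done
}

# Check the running kernel against the parameters listed in this post.
if [ -r /proc/cmdline ]; then
    check_cmdline "$(cat /proc/cmdline)" \
        acpi_enforce_resources amdgpu.noretry idle \
        rcu_nocbs processor.max_cstate
fi
```

A parameter reported as "missing" would suggest the bootloader configuration was not regenerated or the wrong boot entry is in use. This only checks that the parameter was passed, not that the kernel honoured it.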
Things I have not tried include:
- Disabling features I feel are necessary to the "safe" operation of the CPU (ASLR, SMT, SVM)
In spite of everything I have attempted the system is not stable. Most of these crashes happen at idle and with DDR4 speeds above 2133 MHz. Dropping the RAM speeds to 2133 MHz seems to help with the system locking up (makes it happen once a month instead of every day) but the segfaults still happen. Both attached files from dmesg and journalctl are with DDR4 running at Auto (2133 MHz).
I am really getting to the point where I am running out of things to try, so I would greatly appreciate any and all suggestions you may have. I recently attempted to open an RMA with AMD and they were pretty adamant that they do not feel as though any of this is CPU related... but I do not see what else would cause these issues. I also do not feel as though their support bothered to actually read my ticket, as all of the questions they asked me to respond to had already been answered in the initial request. I have been running Arch on systems for over 15 years and I have never seen one suffer from these kinds of stability issues... which is a real shame because I had high hopes for Ryzen.
Here is a link to dmesg, journalctl, and the message I received from AMD support. The log files are indicative of the segfault issues I am experiencing. When the system hard locks there is no indication of it in the log files.
https://gist.github.com/poisonoushydra/ … 1f879508ae
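Because a hard lock leaves nothing in the logs, one way to at least pin down when it happened is a small heartbeat that writes a timestamp to disk at a fixed interval. This is only a rough sketch: heartbeat is a hypothetical helper (not an existing tool), and the file name and interval in the usage note are arbitrary.

```shell
#!/bin/sh
# Sketch only: heartbeat is a hypothetical helper, not an existing tool.

# heartbeat FILE COUNT INTERVAL
# Appends COUNT timestamps to FILE, INTERVAL seconds apart, and calls sync
# after each one so the most recent line has a chance to survive a hard reset.
heartbeat() {
    file=$1
    count=$2
    interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        date '+%Y-%m-%d %H:%M:%S' >> "$file"
        sync                      # ask the kernel to flush pending writes
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, heartbeat "$HOME/heartbeat.log" 8640 10 would write a line every 10 seconds for roughly a day. After a lockup, the last line in the file gives an approximate time of death that can be compared against the final journalctl entries.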
Thanks in advance for any and all help provided!
Last edited by raydnmalachi (2020-04-21 19:34:43)
To dwell in the past is to die in the present.
Later is always later than later.
While typing the above more "fun" stuff has happened:
[29815.870978] BUG: scheduling while atomic: firefox/36468/0x00000002
[29815.870980] Modules linked in: snd_hrtimer snd_seq snd_seq_device cfg80211 rfkill 8021q garp mrp stp llc nls_iso8859_1 nls_cp437 vfat fat fuse edac_mce_amd kvm_amd snd_hda_codec_realtek kvm snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg irqbypass input_leds xpad snd_hda_codec joydev r8169 crct10dif_pclmul ff_memless snd_hda_core crc32_pclmul snd_hwdep wmi_bmof ghash_clmulni_intel realtek ccp snd_pcm zenpower(OE) aesni_intel crypto_simd snd_timer cryptd glue_helper snd pcspkr k10temp i2c_piix4 rng_core libphy soundcore wmi mousedev evdev pinctrl_amd mac_hid acpi_cpufreq msr ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid crc32c_intel xhci_pci xhci_hcd i915 intel_gtt amdgpu gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm agpgart
[29815.871013] Preemption disabled at:
[29815.871016] [<0000000000000000>] 0x0
[29815.871019] CPU: 3 PID: 36468 Comm: firefox Tainted: G D OE 5.6.5-arch3-1 #1
[29815.871019] Hardware name: Gigabyte Technology Co., Ltd. X570 GAMING X/X570 GAMING X, BIOS F11 12/06/2019
[29815.871020] Call Trace:
[29815.871027] dump_stack+0x66/0x90
[29815.871030] __schedule_bug.cold+0x8e/0x9b
[29815.871032] __schedule+0x64c/0x7a0
[29815.871034] schedule+0x46/0xf0
[29815.871035] schedule_hrtimeout_range_clock+0x10a/0x120
[29815.871039] poll_schedule_timeout.constprop.0+0x42/0x70
[29815.871041] do_sys_poll+0x411/0x540
[29815.871042] ? __switch_to_asm+0x40/0x70
[29815.871046] ? poll_select_finish+0x280/0x280
[29815.871047] ? poll_select_finish+0x280/0x280
[29815.871048] ? poll_select_finish+0x280/0x280
[29815.871050] ? poll_select_finish+0x280/0x280
[29815.871051] ? poll_select_finish+0x280/0x280
[29815.871055] __x64_sys_poll+0x48/0x140
[29815.871058] do_syscall_64+0x4e/0x150
[29815.871059] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[29815.871061] RIP: 0033:0x7f74add2eabf
[29815.871062] Code: 54 24 1c 48 89 74 24 10 48 89 7c 24 08 e8 09 0f f9 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 48 8b 7c 24 08 b8 07 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 44 89 c7 89 44 24 08 e8 3d 0f f9 ff 8b 44
[29815.871063] RSP: 002b:00007ffc9b18a910 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
[29815.871064] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f74add2eabf
[29815.871064] RDX: 00000000ffffffff RSI: 0000000000000005 RDI: 00007f7482e48790
[29815.871065] RBP: 00007f74ad659600 R08: 0000000000000000 R09: 0000000000000001
[29815.871066] R10: 00007f747e415260 R11: 0000000000000293 R12: 00007f7482e48790
[29815.871067] R13: 00007f74ad668000 R14: 00000000ffffffff R15: 0000000000000005
[29839.090168] audit: type=1101 audit(1587497683.941:571): pid=43413 uid=1000 auid=1000 ses=1 msg='op=PAM:accounting grantors=pam_unix,pam_permit,pam_time acct="hydra" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[29839.090305] audit: type=1110 audit(1587497683.941:572): pid=43413 uid=0 auid=1000 ses=1 msg='op=PAM:setcred grantors=pam_unix,pam_permit,pam_env acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[29839.092655] audit: type=1105 audit(1587497683.945:573): pid=43413 uid=0 auid=1000 ses=1 msg='op=PAM:session_open grantors=pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[29839.098702] audit: type=1106 audit(1587497683.951:574): pid=43413 uid=0 auid=1000 ses=1 msg='op=PAM:session_close grantors=pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[29839.098772] audit: type=1104 audit(1587497683.951:575): pid=43413 uid=0 auid=1000 ses=1 msg='op=PAM:setcred grantors=pam_unix,pam_permit,pam_env acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
Apr 21 15:34:20 doppelganger kernel: BUG: scheduling while atomic: firefox/36468/0x00000002
Apr 21 15:34:20 doppelganger kernel: Modules linked in: snd_hrtimer snd_seq snd_seq_device cfg80211 rfkill 8021q garp mrp stp llc nls_iso8859_1 nls_cp437 vfat fat fuse edac_mce_amd kvm_amd snd_hda_codec_realtek kvm snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg irqbypass input_leds xpad snd_hda_codec joydev r8169 crct10dif_pclmul ff_memless snd_hda_core crc32_pclmul snd_hwdep wmi_bmof ghash_clmulni_intel realtek ccp snd_pcm zenpower(OE) aesni_intel crypto_simd snd_timer cryptd glue_helper snd pcspkr k10temp i2c_piix4 rng_core libphy soundcore wmi mousedev evdev pinctrl_amd mac_hid acpi_cpufreq msr ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid crc32c_intel xhci_pci xhci_hcd i915 intel_gtt amdgpu gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm agpgart
Apr 21 15:34:20 doppelganger kernel: Preemption disabled at:
Apr 21 15:34:20 doppelganger kernel: [<0000000000000000>] 0x0
Apr 21 15:34:20 doppelganger kernel: CPU: 3 PID: 36468 Comm: firefox Tainted: G D OE 5.6.5-arch3-1 #1
Apr 21 15:34:20 doppelganger kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 GAMING X/X570 GAMING X, BIOS F11 12/06/2019
Apr 21 15:34:20 doppelganger kernel: Call Trace:
Apr 21 15:34:20 doppelganger kernel: dump_stack+0x66/0x90
Apr 21 15:34:20 doppelganger kernel: __schedule_bug.cold+0x8e/0x9b
Apr 21 15:34:20 doppelganger kernel: __schedule+0x64c/0x7a0
Apr 21 15:34:20 doppelganger kernel: schedule+0x46/0xf0
Apr 21 15:34:20 doppelganger kernel: schedule_hrtimeout_range_clock+0x10a/0x120
Apr 21 15:34:20 doppelganger kernel: poll_schedule_timeout.constprop.0+0x42/0x70
Apr 21 15:34:20 doppelganger kernel: do_sys_poll+0x411/0x540
Apr 21 15:34:20 doppelganger kernel: ? __switch_to_asm+0x40/0x70
Apr 21 15:34:20 doppelganger kernel: ? poll_select_finish+0x280/0x280
Apr 21 15:34:20 doppelganger kernel: ? poll_select_finish+0x280/0x280
Apr 21 15:34:20 doppelganger kernel: ? poll_select_finish+0x280/0x280
Apr 21 15:34:20 doppelganger kernel: ? poll_select_finish+0x280/0x280
Apr 21 15:34:20 doppelganger kernel: ? poll_select_finish+0x280/0x280
Apr 21 15:34:20 doppelganger kernel: __x64_sys_poll+0x48/0x140
Apr 21 15:34:20 doppelganger kernel: do_syscall_64+0x4e/0x150
Apr 21 15:34:20 doppelganger kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 21 15:34:20 doppelganger kernel: RIP: 0033:0x7f74add2eabf
Apr 21 15:34:20 doppelganger kernel: Code: 54 24 1c 48 89 74 24 10 48 89 7c 24 08 e8 09 0f f9 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 48 8b 7c 24 08 b8 07 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 44 89 c7 89 44 24 08 e8 3d 0f f9 ff 8b 44
Apr 21 15:34:20 doppelganger kernel: RSP: 002b:00007ffc9b18a910 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
Apr 21 15:34:20 doppelganger kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f74add2eabf
Apr 21 15:34:20 doppelganger kernel: RDX: 00000000ffffffff RSI: 0000000000000005 RDI: 00007f7482e48790
Apr 21 15:34:20 doppelganger kernel: RBP: 00007f74ad659600 R08: 0000000000000000 R09: 0000000000000001
Apr 21 15:34:20 doppelganger kernel: R10: 00007f747e415260 R11: 0000000000000293 R12: 00007f7482e48790
Apr 21 15:34:20 doppelganger kernel: R13: 00007f74ad668000 R14: 00000000ffffffff R15: 0000000000000005
You should try to SSH to the system. I have experienced freezes that leave everything working except the display and keyboard. What GPU are you using?
Last edited by Pse (2020-04-21 21:03:08)
Thank you for your reply!
My GPU is the Gigabyte RX570 Gaming 4G. I was previously experiencing the odd GPU related lockup but I have not seen anything pertaining to that specifically since adding "amdgpu.noretry=0" to the kernel parameters.
I have enabled SSH and will attempt to log into the machine remotely if the lockup happens again. The frequency is sporadic, not easy to reproduce, and it currently has not happened since the 14th (7 days). I am hoping that the issue with the lockups is resolved but in the meantime am still experiencing these segfaults which I do not believe are related to the GPU.
Thanks again!
Looks like this Gigabyte motherboard uses the X570 chipset?
The F11 firmware may be the problem (see https://bbs.archlinux.org/viewtopic.php?id=252859).
Some other weird things:
[ 3.019038] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[ 3.019038] AMD-Vi: AMD IOMMUv2 functionality not available on this system
AMD processors have tended to rely on IOMMU functionality on x86_64 OSes for more than a decade, so you may want to check your firmware settings.
[ 5.921712] zenpower: module verification failed: signature and/or required key missing - tainting kernel
I'm guessing that comes from https://aur.archlinux.org/packages/zenpower-dkms ?
Have you tried running without that kernel module?
Last edited by Lone_Wolf (2020-04-21 21:51:32)
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
clean chroot building not flexible enough ?
Try clean chroot manager by graysky
I had seen this post previously, but I do not see the issues as being connected. The poster in the topic you linked to has an easily reproducible error, and one which he claims was fixed with the 5.6.3 Linux kernel (I am running 5.6.5-arch3-1). In fact the only thing that poster and I appear to have in common is our UEFI version. Even his motherboard is different from mine: the Gigabyte X570 UD is not the same as the Gigabyte X570 Gaming X (the X570 part is there to indicate the chipset).
For fun I decided to run
while true; do rocminfo; done
which ran for over an hour with no issues. The post you linked to mentioned that command crashing his system immediately.
I have removed the zenpower-dkms module as per your suggestion and reinstalled linux and linux-headers. Before I could reboot, more issues showed up in my log files:
[42479.018634] BUG: kernel NULL pointer dereference, address: 0000000000000000
[42479.018638] #PF: supervisor instruction fetch in kernel mode
[42479.018639] #PF: error_code(0x0010) - not-present page
[42479.018640] PGD 0 P4D 0
[42479.018643] Oops: 0010 [#2] PREEMPT SMP NOPTI
[42479.018645] CPU: 9 PID: 1093 Comm: dhcpcd-gtk Tainted: G D W OE 5.6.5-arch3-1 #1
[42479.018646] Hardware name: Gigabyte Technology Co., Ltd. X570 GAMING X/X570 GAMING X, BIOS F11 12/06/2019
[42479.018648] RIP: 0010:0x0
[42479.018651] Code: Bad RIP value.
[42479.018652] RSP: 0018:ffffb78b8243fd68 EFLAGS: 00010246
[42479.018653] RAX: ffff9202e2392000 RBX: ffffffffaf1262a0 RCX: 0000000000000000
[42479.018654] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9205c9530000
[42479.018655] RBP: ffff9205cc117800 R08: 0000000000000000 R09: ffff9202e2392000
[42479.018656] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[42479.018656] R13: 0000000000000dc0 R14: 0000000000000001 R15: 0000000000000000
[42479.018658] FS: 00007f504bdb68c0(0000) GS:ffff9205cec40000(0000) knlGS:0000000000000000
[42479.018659] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42479.018660] CR2: ffffffffffffffd6 CR3: 00000003a203c000 CR4: 0000000000340ee0
[42479.018661] Call Trace:
[42479.018667] ? sk_alloc+0x2c/0x270
[42479.018670] ? unix_create1+0x61/0x1f0
[42479.018672] ? unix_stream_connect+0xb4/0x761
[42479.018674] ? preempt_count_add+0x49/0xa0
[42479.018677] ? percpu_counter_add_batch+0x81/0xb0
[42479.018680] ? __sys_connect+0xad/0xe0
[42479.018682] ? alloc_file_pseudo+0xb5/0x120
[42479.018684] ? __x64_sys_connect+0x16/0x20
[42479.018687] ? do_syscall_64+0x4e/0x150
[42479.018689] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[42479.018693] Modules linked in: snd_hrtimer snd_seq snd_seq_device cfg80211 rfkill 8021q garp mrp stp llc nls_iso8859_1 nls_cp437 vfat fat fuse edac_mce_amd kvm_amd snd_hda_codec_realtek kvm snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg irqbypass input_leds xpad snd_hda_codec joydev r8169 crct10dif_pclmul ff_memless snd_hda_core crc32_pclmul snd_hwdep wmi_bmof ghash_clmulni_intel realtek ccp snd_pcm zenpower(OE) aesni_intel crypto_simd snd_timer cryptd glue_helper snd pcspkr k10temp i2c_piix4 rng_core libphy soundcore wmi mousedev evdev pinctrl_amd mac_hid acpi_cpufreq msr ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid crc32c_intel xhci_pci xhci_hcd i915 intel_gtt amdgpu gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm agpgart
[42479.018724] CR2: 0000000000000000
[42479.018726] ---[ end trace 3f3e7dd209414772 ]---
[42479.018729] RIP: 0010:up_read+0xc/0x40
[42479.018731] Code: 89 e6 e8 b7 a7 8b 00 48 89 e7 e8 3f 63 fd ff eb 95 e8 88 df f9 ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 c7 c0 00 ff ff ff <f0> 48 0f c1 07 48 2d 00 01 00 00 24 03 48 83 f8 02 74 01 c3 48 8b
[42479.018731] RSP: 0000:ffffb78b83007ef0 EFLAGS: 00010246
[42479.018732] RAX: ffffffffffffff00 RBX: ffff920566d6ee80 RCX: 0000000000000000
[42479.018733] RDX: 0000000000000000 RSI: 00007f0db499e000 RDI: 0000000000000000
[42479.018734] RBP: 0000000000000006 R08: 0000000000000000 R09: 00000000000ab6cd
[42479.018734] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb78b83007f58
[42479.018735] R13: 00007f0db499efe0 R14: ffff9204c70c5ac0 R15: 0000000000000055
[42479.018736] FS: 00007f504bdb68c0(0000) GS:ffff9205cec40000(0000) knlGS:0000000000000000
[42479.018737] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42479.018738] CR2: ffffffffffffffd6 CR3: 00000003a203c000 CR4: 0000000000340ee0
EDIT: Here is my dmesg contents after removing the zenpower-dkms module (notice it ends with two segfaults on different threads). Unfortunately the problem is not solved!
https://gist.github.com/poisonoushydra/ … ba63de101b
EDIT 2: The IOMMU "error" you mentioned appears to be a warning about the version support and not really an "error" after all:
https://www.linuxquestions.org/question … 175589036/
Last edited by raydnmalachi (2020-04-21 23:25:38)
IOMMUv2
That thread you linked to is from 2016, when IOMMUv2 was still uncommon on AMD non-server processors.
All Ryzen / Threadripper / EPYC (i.e. Zen family) processors support it in hardware and only show that message if IOMMU is disabled in firmware.
A disabled IOMMU works fine on 32-bit OSes but can severely hamper functionality on 64-bit OSes.
The thread also mentioned that AMD-Vi support is what matters, not the informational message. It's an oversimplification, but I can agree with that.
Taken from my Threadripper system:
$ dmesg | grep AMD-Vi
[ 1.411425] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.411516] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.443275] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 1.443276] pci 0000:00:00.2: AMD-Vi: Extended features (0xf77ef22294ada):
[ 1.443279] pci 0000:40:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 1.443280] pci 0000:40:00.2: AMD-Vi: Extended features (0xf77ef22294ada):
[ 1.443282] AMD-Vi: Interrupt remapping enabled
[ 1.443282] AMD-Vi: Virtual APIC enabled
[ 1.443782] AMD-Vi: Lazy IO/TLB flushing enabled
[ 1.459973] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
$
Run that on your system and note the differences.
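To make the comparison quicker, the kernel log can be piped through a small filter that looks for the two AMD-Vi messages quoted in this thread. Treat this as a sketch under assumptions: iommu_status is a hypothetical helper name, and it only recognises these two message strings, so other kernel versions may word things differently.

```shell
#!/bin/sh
# Sketch only: iommu_status is a hypothetical helper name.

# Reads kernel log text on stdin and prints a one-word summary based on the
# two AMD-Vi messages quoted in this thread.
iommu_status() {
    log=$(cat)
    case "$log" in
        *"AMD IOMMUv2 functionality not available"*)
            echo "iommuv2-unavailable" ;;
        *"AMD-Vi: Found IOMMU"*)
            echo "iommu-found" ;;
        *)
            echo "no-amd-vi-messages" ;;
    esac
}
```

It would be used as dmesg | iommu_status (dmesg may need root on some setups). A result of "iommuv2-unavailable" would be the cue to look at the IOMMU option in the firmware setup.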
In fact the only thing that poster and I appear to have in common is our UEFI version. Even his motherboard is different from mine: the Gigabyte X570 UD is not the same as the Gigabyte X570 Gaming X (the X570 part is there to indicate the chipset).
Firmware for AMD chipsets is supplied by AMD to motherboard manufacturers in the form of AGESA. The manufacturer usually does some tweaking and testing and adds a few things, but the majority of the functionality comes from the AGESA version.
The Gigabyte X570 UD and your Gigabyte X570 Gaming X use the same chipset and come from the same manufacturer. There's a very good chance they also use the same AGESA version.
It does look like you have different symptoms though.
The 2 segfaults in dmesg both appear to be related to Steam.
Try starting Steam directly from a console to verify whether those are generic to Steam or related to some application.
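To see at a glance whether the segfaults cluster around one program, the kernel log can be summarised per process name. The sketch below assumes the usual dmesg layout of a bracketed timestamp followed by "name[pid]: segfault at ..."; segfaults_by_program is a hypothetical helper name, and journalctl output would need a different prefix pattern.

```shell
#!/bin/sh
# Sketch only: segfaults_by_program is a hypothetical helper name.

# Reads dmesg-style text on stdin and prints "COUNT NAME" per process name,
# most frequent first. Names containing spaces (e.g. "Web Content") are kept.
segfaults_by_program() {
    # Keep only segfault lines and reduce each to the process name:
    #   "[ 123.456] Web Content[5123]: segfault at ..."  ->  "Web Content"
    sed -n 's/^\[ *[0-9.]*] \(.*\)\[[0-9][0-9]*]: segfault at .*/\1/p' |
        awk '{ count[$0]++ }
             END { for (name in count) printf "%d %s\n", count[name], name }' |
        sort -rn
}
```

Running dmesg | segfaults_by_program would then show whether one name dominates or whether the crashes are spread across unrelated programs, which would point more towards hardware or firmware than towards a single application.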
Good morning! Thank you for all of the suggestions you have provided so far!
After rebooting this morning and making sure IOMMU was enabled (it must have been disabled during the UEFI reset) I am unfortunately still experiencing the segfault issue. I wish it was program specific but I have seen it affect many different applications on this device. Here are my logs from this morning:
https://gist.github.com/poisonoushydra/ … 58c65e464c
Still scratching my head about this one...
Thanks again!
I looked through everything you've tried and you've pretty much covered everything I can come up with.
Are there sometimes boots where nothing bad happens and everything runs fine for hours and hours?
You mention the start of 2020. Does that mean you have been on the newest BIOS available for your board this whole time?
I remember seeing one or two people say that the AMD "AGESA 1.0.0.4 B" firmware has issues with something, and that the previous "AGESA 1.0.0.3 ABBA" was somehow better. I didn't see any details. If Gigabyte offers a BIOS with that 1.0.0.3 ABBA thingy on their support site, maybe try that and see what happens. You will have to set all your BIOS settings again if you try a different BIOS (don't load a saved settings profile from a different version).
I think I remember someone saying that their 3000 series CPU needed a tiny bit of extra voltage to run stable. There's an "offset voltage" setting that can add to (or subtract from) the CPU's core voltage. You would try a value like "+0.025 V" for that kind of setting. Something like "+0.1 V" would be too much; don't try that kind of large value.
On my Ryzen system here, I need "pcie_aspm=off" on the kernel command line. I get strange "machine check errors" when the CPU is under stress without that kernel parameter. This "pcie_aspm=off" is about power saving on the PCIe bus.
Thank you for the suggestion!
Unfortunately the issue persists even after booting with ASPM disabled. By the way, this segfault happened while Firefox was running in Safe Mode, and I performed a Refresh on it a few days ago...
https://gist.github.com/poisonoushydra/ … df7880586f
I will look into your suggestion regarding downgrading the BIOS and tweaking the offset voltage. I believe I was on version F4 of the UEFI until around January. I only began to flash versions more recent than F4 when the stability issues and hard locks began to present themselves.
F4
10.11 MB
2019/09/04
Update AGESA 1.0.0.3 ABB
Improve Destiny 2 gaming compatibility
Improve XMP DDR compatibility
Fix compatibility of SATA hot plug and RAIDXpert2 setting
It seems like F5b may in fact be the version I should downgrade to based on your suggestions regarding the AGESA versions. I believe I went straight from F4 to F10 (and then on to the F11 when that failed to resolve my issues).
F5b
10.12 MB
2019/10/18
Update AGESA 1.0.0.3 ABBA
I will probably just wind up trying all firmware versions until I find one which is stable.
I will continue to monitor the situation and report back if anything changes.
Many thanks!
Last edited by raydnmalachi (2020-04-22 16:59:59)
I have similar symptoms: random freezes and so on. I noticed the system does not freeze completely; it just seems like the graphics card borks and everything becomes very sluggish. The cursor moves very slowly, keystrokes are slow, and so on.
My configuration is:
Ryzen 5 3600 (stock settings, no overclock)
Gigabyte B450 Aorus Pro (BIOS F50)
Kingston KHX3200C18D4/8G x 2
With the latest kernels the freezes are rarer but still happen from time to time.
I read in some forums that the two BIOS settings below may help, so I'm trying them:
Global C-state Control --> Disabled
Power Supply Idle Control --> Typical Current Idle
I removed other tweaks and kernel parameters to see how it goes. If it does not freeze for a week or two it will be great.
Last edited by ieti (2020-04-22 18:17:53)
This afternoon I decided to downgrade to the F5b BIOS, making sure to Load Optimized Defaults. Unfortunately the issue is still manifesting itself (although it seems to be confined to the Firefox Web Content thread... for now):
https://gist.github.com/poisonoushydra/ … 9341705eaa
All UEFI settings are currently set to default except for:
- Full screen LOGO show = Disabled
- Precision Boost Overdrive = Disabled
- IOMMU = Enabled
- Power Supply Idle Control = Typical Current Idle
- SVM = Enabled
- Dynamic VCORE = +0.024V
I am starting to think it might be time to push AMD a bit harder for an RMA here...
Last edited by raydnmalachi (2020-04-22 20:24:25)
I saw on their site that there is an F12e BIOS whose notes say "Improve memory compatibility". Maybe this can fix your issues.