#1 2020-05-09 16:35:35

ricky
Member
Registered: 2020-04-27
Posts: 28

[SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

Hi folks,

Hardware
This is a fresh build with the following hardware, and I am trying to troubleshoot system instability:

  • Ryzen 3 3200G

  • AsRock B450M Pro4, BIOS v3.90 (latest FW)

  • 500GB NVME WD SSD

  • 16GB DDR4 RAM @ 3200MHz

  • I would like to note that the RAM I am using is not on the approved list for my motherboard, but I have about 14 hours of memtest on it without any errors.

OS
Kernel: 5.6.13-arch1-1, amd-ucode is being loaded prior to the kernel using the systemd-boot bootloader.

Symptom
The system locks up during use, but most often after sitting unattended, and a hard reset is required. I cannot switch to a different /dev/tty terminal. After a hard reboot, running:

$ journalctl --no-hostname -b -1 | tail -n 10

yields the following:

May 09 09:21:37 ksmserver[923]: MapNotify: 46140864
May 09 09:21:37 ksmserver[923]: CreateNotify: 46140865
May 09 09:21:37 ksmserver[923]: MapNotify: 46140865
May 09 09:21:37 ksmserver[923]: CreateNotify: 46140866
May 09 09:21:37 ksmserver[923]: MapNotify: 46140866
May 09 09:21:37 ksmserver[923]: UnmapNotify: 46139963
May 09 09:21:37 ksmserver[923]: UnmapNotify: 46139962
May 09 09:21:37 ksmserver[923]: UnmapNotify: 46139964
May 09 09:28:31 kernel: pcieport 0000:02:04.0: can't change power state from D3cold to D0 (config space inaccessible)
May 09 09:28:33 kernel: pcieport 0000:02:00.0: can't change power state from D3cold to D0 (config space inaccessible)

In the journal for each crash, a `pcieport` message is among the final log entries.
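To see at a glance which ports are affected in a crashed boot, the kernel messages can be reduced to just the device addresses. This is a minimal sketch: the function name `d3cold_ports` is made up for illustration, and it assumes the message wording stays exactly as in the log above.

```shell
# Hypothetical helper: read journal lines on stdin and print, once each,
# every PCI address that logged the "can't change power state from D3cold"
# message.
d3cold_ports() {
    grep -F "can't change power state from D3cold" \
        | sed -n 's/.*pcieport \([0-9a-f:.]*\): .*/\1/p' \
        | sort -u
}

# Usage (kernel messages of the previous boot):
#   journalctl --no-hostname -k -b -1 | d3cold_ports
```

Filtering with `-k` keeps only kernel messages, so desktop noise such as the ksmserver lines does not push the relevant entries out of view.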

# lspci -v

yields the following after redacting irrelevant output:

02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 32
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: [disabled]
        Memory behind bridge: [disabled]
        Prefetchable memory behind bridge: [disabled]
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Downstream Port (Slot+), MSI 00
        Capabilities: [c0] Subsystem: ASRock Incorporation 400 Series Chipset PCIe Port
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [200] Secondary PCI Express
        Capabilities: [400] L1 PM Substates
        Kernel driver in use: pcieport

02:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 35
        Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
        I/O behind bridge: [disabled]
        Memory behind bridge: [disabled]
        Prefetchable memory behind bridge: [disabled]
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Downstream Port (Slot+), MSI 00
        Capabilities: [c0] Subsystem: ASRock Incorporation 400 Series Chipset PCIe Port
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [200] Secondary PCI Express
        Capabilities: [400] L1 PM Substates
        Kernel driver in use: pcieport

My kernel command line is as follows:

$ cat /proc/cmdline
initrd=\amd-ucode.img initrd=\initramfs-linux.img root=/dev/nvme0n1p1 clocksource=hpet

I am specifying the clocksource because I was previously receiving clock instability messages followed by a switch to hpet, and I believed this was the cause of the system hang.

I am receiving an iommu error on each boot as well, which I believe could be related to my problem.

$ journalctl --no-hostname -b 0 | grep -i 'iommu'
May 09 12:23:22 kernel: iommu: Default domain type: Translated
May 09 12:23:22 kernel: AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
May 09 12:23:22 kernel: AMD-Vi: AMD IOMMUv2 functionality not available on this system
May 09 12:23:23 kernel: kfd kfd: error getting iommu info. is the iommu enabled?
May 09 12:23:23 kernel: kfd kfd: Error initializing iommuv2

I am unsure whether I should be using the amd_iommu=on and iommu=pt kernel parameters, and where I should go for further debugging. I appreciate all the help and feedback. Thanks in advance!

Last edited by ricky (2020-06-23 10:31:34)


#2 2020-05-09 21:47:11

Ropid
Member
Registered: 2015-03-09
Posts: 1,069

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

That "D3cold" and "D0" stuff is maybe a feature of PCIe "ASPM" = "active state power management"?  You can disable ASPM like this on the kernel command line:

pcie_aspm=off

About IOMMU, you don't need any kernel command line option for it. If it's not working for you, this means it's disabled in the BIOS. You need to enable it in the BIOS first. About where to find it there, I can't remember the exact location because it's pretty hidden on my ASRock board. It's somewhere in the "Advanced" section.
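After rebooting, it is worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming a helper named `has_kernel_param` (a made-up name, not part of any package):

```shell
# Hypothetical helper: succeed if the given parameter appears as a whole
# word in a kernel command line string.
has_kernel_param() {
    # $1 = command line text, $2 = parameter such as pcie_aspm=off
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage on a running system:
#   has_kernel_param "$(cat /proc/cmdline)" pcie_aspm=off && echo "ASPM disabled at boot"
#   cat /sys/module/pcie_aspm/parameters/policy   # active policy is shown in [brackets]
```

Padding both strings with spaces keeps a parameter from matching as a substring of a longer one.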

Last edited by Ropid (2020-05-09 21:48:32)


#3 2020-05-09 23:52:52

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

After researching ASPM I would tend to agree with you. Reference for reading about the different power states.

I have set pcie_aspm=off; I will run with this and report back.

I am still trying to understand what is going on under the hood and causing this fault. Not sure what PCIe device it is calling out either.

I plan on reviewing the following:

Cheers


#4 2020-05-10 02:01:56

Ropid
Member
Registered: 2015-03-09
Posts: 1,069

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

"AER" is disabled by default on my ASRock board, I have to enable it manually. PCIe has some sort of error detection for data transfers. What AER does is make the error correction visible (you'll get log messages if there's errors), so I think it's a good idea to enable it.


#5 2020-05-10 11:42:22

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

Ropid wrote:

That "D3cold" and "D0" stuff is maybe a feature of PCIe "ASPM" = "active state power management"?  You can disable ASPM like this on the kernel command line:

pcie_aspm=off

About IOMMU, you don't need any kernel command line option for it. If it's not working for you, this means it's disabled in the BIOS. You need to enable it in the BIOS first. About where to find it there, I can't remember the exact location because it's pretty hidden on my ASRock board. It's somewhere in the "Advanced" section.

Unfortunately, this does not seem to be the cure for the problem. I let it run overnight, but the journal log ended at 05:44 this morning, and I was faced with a computer that was completely frozen and outputting no video. Here is the end of the log (newest entries first):

May 10 05:44:18 kernel: perf: interrupt took too long (665481 > 642243), lowering kernel.perf_event_max_sample_rate to 300
May 10 05:44:17 kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 65.626 msecs
May 10 05:44:14 kernel: audit: type=1131 audit(1589100254.900:63): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 10 05:44:14 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 10 05:44:14 systemd[1]: NetworkManager-dispatcher.service: Succeeded.
May 10 05:44:05 kernel: audit: type=1130 audit(1589100245.137:62): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 10 05:44:05 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 10 05:44:05 systemd[1]: Started Network Manager Script Dispatcher Service.
May 10 05:44:05 dbus-daemon[406]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
May 10 05:44:05 systemd[1]: Starting Network Manager Script Dispatcher Service...
May 10 05:44:05 systemd[1]: Condition check resulted in Update is Completed being skipped.
May 10 05:44:05 systemd[1]: Condition check resulted in Create System Users being skipped.
May 10 05:44:05 systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
May 10 05:44:05 systemd[1]: Condition check resulted in Rebuild Journal Catalog being skipped.
May 10 05:44:05 systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
May 10 05:44:05 systemd[1]: Condition check resulted in First Boot Wizard being skipped.
May 10 05:44:05 systemd[1]: Condition check resulted in Store a System Token in an EFI Variable being skipped.
May 10 05:44:05 systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
May 10 05:44:05 systemd[1]: Condition check resulted in Rebuild Dynamic Linker Cache being skipped.
May 10 05:44:05 systemd[1]: Condition check resulted in FUSE Control File System being skipped.
May 10 05:44:05 dbus-daemon[406]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.1' (uid=0 pid=410 comm="/usr/bin/NetworkManager --no-daemon ")
May 10 05:44:05 NetworkManager[410]: <info>  [1589100245.1230] manager: NetworkManager state is now CONNECTED_SITE
May 10 05:43:28 kernel: pcieport 0000:02:00.0: can't change power state from D3cold to D0 (config space inaccessible)
May 10 05:43:27 kernel: perf: interrupt took too long (513795 > 2500), lowering kernel.perf_event_max_sample_rate to 300
May 10 05:43:27 kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.000 msecs
May 10 05:43:27 kernel: pcieport 0000:02:04.0: can't change power state from D3cold to D0 (config space inaccessible)
May 10 05:12:03 plasmashell[598]: libkcups: Request failed 1282 -1
May 10 05:12:03 plasmashell[598]: libkcups: Create-Printer-Subscriptions last error: 1282 Bad file descriptor
May 10 04:13:43 plasmashell[598]: libkcups: Request failed 1282 -1
May 10 04:13:43 plasmashell[598]: libkcups: Create-Printer-Subscriptions last error: 1282 Bad file descriptor
May 10 03:15:24 plasmashell[598]: libkcups: Request failed 1282 -1
May 10 03:15:24 plasmashell[598]: libkcups: Create-Printer-Subscriptions last error: 1282 Bad file descriptor
May 10 02:17:03 plasmashell[598]: libkcups: Request failed 1282 -1
May 10 02:17:03 plasmashell[598]: libkcups: Create-Printer-Subscriptions last error: 1282 Bad file descriptor
May 10 01:18:43 plasmashell[598]: libkcups: Request failed 1282 -1
May 10 01:18:43 plasmashell[598]: libkcups: Create-Printer-Subscriptions last error: 1282 Bad file descriptor

Perhaps the pcieport is a red herring and there's another issue? I am having a hard time gathering the root cause from the journal logs.

Note that I have now disabled SVM and SR-IOV in the BIOS. This should be unrelated to what we're dealing with here.

Last edited by ricky (2020-05-10 11:47:50)


#6 2020-05-15 22:28:09

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

I have lowered my RAM speed to 2133MHz instead of running it overclocked at 3200MHz, as the CPU technically only supports up to 2933MHz.

However, this has not stopped the system hangs. Log from the latest:

May 14 22:50:10 kernel: pcieport 0000:02:00.0: can't change power state from D3cold to D0 (config space inaccessible)
May 14 22:50:10 kernel: pcieport 0000:02:04.0: can't change power state from D3cold to D0 (config space inaccessible)
May 14 22:50:10 kernel: perf: interrupt took too long (1142238 > 2500), lowering kernel.perf_event_max_sample_rate to 300
May 14 22:50:08 kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.000 msecs
May 14 21:55:59 plasmashell[599]: libkcups: Request failed 1282 -1
May 14 21:55:59 plasmashell[599]: libkcups: Create-Printer-Subscriptions last error: 1282 Bad file descriptor
May 14 20:57:39 plasmashell[599]: libkcups: Request failed 1282 -1
May 14 20:57:39 plasmashell[599]: libkcups: Create-Printer-Subscriptions last error: 1282 Bad file descriptor
May 14 20:48:44 kded5[526]: ktp-kded-module: plugin queue activation: "xa" ""
May 14 20:48:44 kded5[526]: ktp-kded-module: "auto-away" presence change request: "xa" ""
May 14 20:38:44 kded5[526]: ktp-kded-module: plugin queue activation: "away" ""
May 14 20:38:44 kded5[526]: ktp-kded-module: "auto-away" presence change request: "away" ""

There are two things I would like to know to further debug this, but I am unsure how to find this information.

  1. How can I determine from lspci what these two PCI devices actually are? For example, could they be my two sticks of RAM? Refer to the first post for the lspci -v output.

  2. Why is the kernel trying to change the power state of these PCIe devices?
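On the first question, the lspci output in the first post labels both addresses as "PCI bridge ... 400 Series Chipset PCIe Port", so they are chipset ports rather than RAM. On the second, one possibility (an assumption, not something the logs confirm) is that the kernel's runtime power management suspends idle ports and later fails to wake them. The sketch below reads the runtime PM state of a device from sysfs; `pci_runtime_pm` is a made-up name for illustration.

```shell
# Hypothetical helper: print the runtime PM settings of one PCI device.
# $1 = PCI address, $2 = sysfs root (defaults to /sys; overridable for testing)
pci_runtime_pm() {
    dev="${2:-/sys}/bus/pci/devices/$1"
    [ -d "$dev/power" ] || { echo "no such device: $1" >&2; return 1; }
    printf '%s control=%s status=%s\n' "$1" \
        "$(cat "$dev/power/control")" \
        "$(cat "$dev/power/runtime_status")"
}

# Usage:
#   pci_runtime_pm 0000:02:00.0
#   lspci -nn -s 02:00.0      # identify the device (name plus vendor:device IDs)
```

A `control` value of `auto` means the kernel may suspend the device when idle; writing `on` to that file as root keeps it powered, which would be one way to test the assumption.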

I appreciate the help and insight. Cheers!


#7 2020-05-16 02:50:08

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

Leaving a few notes:

  • I have installed and enabled cups. I hadn't configured any printers, but I keep seeing the same failed output, so I'm hoping this cleans up my logs.

  • I have installed linux-tools, which includes perf. I know that perf is supposed to be included in the kernel, but the 'perf: interrupt took too long' lines are always among the final lines in my log before a hang.

  • I have removed clocksource=hpet from my kernel command line. I originally added it because I believed the clock source was part of my issue; I no longer believe the tsc clock is the issue.

  • I have raised the kernel debug level to 7 for an overnight run; hopefully I catch more useful information.
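For reference, the current level can be read back from procfs. This sketch assumes "kernel debug level" here means the console log level; `console_loglevel` is a made-up helper name.

```shell
# Hypothetical helper: print the current console log level, which is the
# first of the four fields in /proc/sys/kernel/printk.
# $1 = file to read (defaults to the real procfs path; overridable for testing)
console_loglevel() {
    awk '{ print $1 }' "${1:-/proc/sys/kernel/printk}"
}

# Usage:
#   console_loglevel        # current level
#   dmesg -n 7              # as root: raise it for the running system
# On the kernel command line the equivalent is: loglevel=7
```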

Here's hoping the system is still alive tomorrow morning!

Cheers

Last edited by ricky (2020-05-16 03:03:26)


#8 2020-05-16 12:46:10

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

The system ran for an hour and a half before locking up. Following up on my last post, it looks like I do need to keep clocksource=hpet on the kernel command line. In addition, the clock source instability occurs at the same time as the pcieport power state change. Here's the end of the log:

May 16 01:34:38 kernel: clocksource: Switched to clocksource hpet
May 16 01:34:37 kernel: pcieport 0000:02:00.0: can't change power state from D3cold to D0 (config space inaccessible)
May 16 01:34:37 kernel: [drm] Fence fallback timer expired on ring gfx
May 16 01:34:37 kernel: sched_clock: Marking unstable (5726924152410, -27090991)<-(5726900414914, -3353509)
May 16 01:34:37 kernel: TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
May 16 01:34:37 kernel: tsc: Marking TSC unstable due to clocksource watchdog
May 16 01:34:37 kernel: clocksource:                       'tsc' cs_now: 12c3278f19a8 cs_last: 12c2fc50c404 mask: ffffffffffffffff
May 16 01:34:37 kernel: clocksource:                       'hpet' wd_now: 178f997f wd_last: 17376bea mask: ffffffff
May 16 01:34:37 kernel: clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
May 16 01:34:36 kernel: pcieport 0000:02:04.0: can't change power state from D3cold to D0 (config space inaccessible)
May 16 01:34:35 kernel: [drm] Fence fallback timer expired on ring gfx
May 16 00:57:37 plasmashell[593]: libkcups: Renew-Subscription last error: 0 successful-ok
May 16 00:22:02 kded5[521]: ktp-kded-module: plugin queue activation: "xa" ""
May 16 00:22:02 kded5[521]: ktp-kded-module: "auto-away" presence change request: "xa" ""
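Since the log shows the kernel abandoning tsc for hpet at the moment of the hang, it can help to check which clock source is active at any given time. A minimal sketch using the standard sysfs clocksource files; the function name `clocksource_info` is an assumption for illustration.

```shell
# Hypothetical helper: show the active and available clock sources.
# $1 = sysfs root (defaults to /sys; overridable for testing)
clocksource_info() {
    cs="${1:-/sys}/devices/system/clocksource/clocksource0"
    printf 'current: %s\n' "$(cat "$cs/current_clocksource")"
    printf 'available: %s\n' "$(cat "$cs/available_clocksource")"
}

# Usage:
#   clocksource_info
```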

I am starting to believe this issue might be related to amdgpu. These kernel traces always occur while I'm using the system and are only warnings, not panics, but I see repeats of them hours before the system locks up:

May 15 23:59:13 kernel: ------------[ cut here ]------------
May 15 23:59:13 kernel: ---[ end trace 6da776008c9610c6 ]---
May 15 23:59:13 kernel: R13: 0000000000000001 R14: 0000564a06ad3760 R15: 0000564a06abc7a0
May 15 23:59:13 kernel: R10: 0000000000000002 R11: 0000000000000246 R12: 00007fb42671197d
May 15 23:59:13 kernel: RBP: 00007fb423cb7010 R08: 0000564a06ad84b0 R09: 000000000099bd90
May 15 23:59:13 kernel: RDX: 00007fb42671197d RSI: 000000000099bd81 RDI: 00007fb423cb7010
May 15 23:59:13 kernel: RAX: ffffffffffffffda RBX: 0000564a06abc7a0 RCX: 00007fb426a7b6ce
May 15 23:59:13 kernel: RSP: 002b:00007ffefea62838 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
May 15 23:59:13 kernel: Code: 48 8b 0d c5 f7 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 92 f7 0b 00 f7 d8 64 89 01 48
May 15 23:59:13 kernel: RIP: 0033:0x7fb426a7b6ce
May 15 23:59:13 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 15 23:59:13 kernel:  do_syscall_64+0x49/0x90
May 15 23:59:13 kernel:  __do_sys_init_module+0x172/0x1a0
May 15 23:59:13 kernel:  load_module+0x2137/0x23a0
May 15 23:59:13 kernel:  do_init_module+0x5c/0x260
May 15 23:59:13 kernel:  do_one_initcall+0x59/0x240
May 15 23:59:13 kernel:  ? 0xffffffffc0e36000
May 15 23:59:13 kernel:  driver_register+0x8b/0xe0
May 15 23:59:13 kernel:  bus_add_driver+0x12b/0x1e0
May 15 23:59:13 kernel:  bus_for_each_dev+0x89/0xd0
May 15 23:59:13 kernel:  ? device_driver_attach+0xb0/0xb0
May 15 23:59:13 kernel:  ? device_driver_attach+0xb0/0xb0
May 15 23:59:13 kernel:  __driver_attach+0x8a/0x150
May 15 23:59:13 kernel:  device_driver_attach+0xa1/0xb0
May 15 23:59:13 kernel:  driver_probe_device+0xb6/0x100
May 15 23:59:13 kernel:  really_probe+0x167/0x410
May 15 23:59:13 kernel:  pci_device_probe+0xfa/0x1b0
May 15 23:59:13 kernel:  ? pci_match_device+0xd7/0x100
May 15 23:59:13 kernel:  local_pci_probe+0x42/0x80
May 15 23:59:13 kernel:  ? __pm_runtime_resume+0x49/0x60
May 15 23:59:13 kernel:  amdgpu_pci_probe+0xec/0x150 [amdgpu]
May 15 23:59:13 kernel:  drm_dev_register+0x110/0x150 [drm]
May 15 23:59:13 kernel:  amdgpu_driver_load_kms+0x5c/0x1e0 [amdgpu]
May 15 23:59:13 kernel:  amdgpu_device_init.cold+0x1419/0x19c3 [amdgpu]
May 15 23:59:13 kernel:  amdgpu_fbdev_init+0xbc/0xf0 [amdgpu]
May 15 23:59:13 kernel:  __drm_fb_helper_initial_config_and_unlock+0x335/0x4b0 [drm_kms_helper]
May 15 23:59:13 kernel:  register_framebuffer+0x1f6/0x310
May 15 23:59:13 kernel:  do_fbcon_takeover+0x5c/0xc0
May 15 23:59:13 kernel:  do_take_over_console+0x116/0x180
May 15 23:59:13 kernel:  do_bind_con_driver.isra.0+0x1da/0x2e0
May 15 23:59:13 kernel:  visual_init+0xce/0x130
May 15 23:59:13 kernel:  fbcon_init+0x2b2/0x5e0
May 15 23:59:13 kernel:  drm_fb_helper_set_par+0x30/0x40 [drm_kms_helper]
May 15 23:59:13 kernel:  drm_fb_helper_restore_fbdev_mode_unlocked+0x49/0x90 [drm_kms_helper]
May 15 23:59:13 kernel:  ? set_inverse_trans_unicode.constprop.0+0xcc/0xf0
May 15 23:59:13 kernel:  drm_client_modeset_commit_force+0x54/0x150 [drm]
May 15 23:59:13 kernel:  drm_client_modeset_commit_atomic+0x1e1/0x220 [drm]
May 15 23:59:13 kernel:  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
May 15 23:59:13 kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
May 15 23:59:13 kernel:  amdgpu_dm_atomic_commit_tail+0x38c/0x2390 [amdgpu]
May 15 23:59:13 kernel:  dc_commit_state+0x2fc/0x840 [amdgpu]
May 15 23:59:13 kernel:  dce110_apply_ctx_to_hw+0x52d/0x570 [amdgpu]
May 15 23:59:13 kernel:  core_link_enable_stream+0x74d/0x790 [amdgpu]
May 15 23:59:13 kernel: Call Trace:
May 15 23:59:13 kernel: CR2: 000055a685da5020 CR3: 0000000409bfe000 CR4: 00000000003406e0
May 15 23:59:13 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 15 23:59:13 kernel: FS:  00007fb425b12a80(0000) GS:ffff8bef508c0000(0000) knlGS:0000000000000000
May 15 23:59:13 kernel: R13: 0000000000000000 R14: ffff8bef3d2e01b8 R15: ffffaca1006170fe
May 15 23:59:13 kernel: R10: ffff8bef446118c0 R11: 0000000000000010 R12: 000000000000005d
May 15 23:59:13 kernel: RBP: ffffaca100617145 R08: 0000000000000000 R09: ffffaca100616e38
May 15 23:59:13 kernel: RDX: 000000000032a403 RSI: a897e237d199855a RDI: 0000000000032080
May 15 23:59:13 kernel: RAX: 0000000000000000 RBX: 0000000000000002 RCX: 000000000032a603
May 15 23:59:13 kernel: RSP: 0018:ffffaca1006170e8 EFLAGS: 00010246
May 15 23:59:13 kernel: Code: 48 8d 54 24 14 48 8b 40 08 48 8b b8 10 01 00 00 e8 08 eb 00 00 84 c0 74 2d 0f b6 44 24 15 e9 7d fe ff ff 0f 0b e9 ec fc ff ff <0f> 0b e9 e8 fd ff ff 0f 0b e9 44 fe ff ff 0f 0b e9 59 ff ff ff 0f
May 15 23:59:13 kernel: RIP: 0010:write_i2c_retimer_setting+0x3dd/0x410 [amdgpu]
May 15 23:59:13 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P3.90 12/09/2019
May 15 23:59:13 kernel: CPU: 3 PID: 325 Comm: systemd-udevd Tainted: G        W         5.6.12-arch1-1 #1
May 15 23:59:13 kernel: Modules linked in: input_leds mousedev joydev rfkill hid_generic usbhid amdgpu(+) hid edac_mce_amd nls_iso8859_1 nls_cp437 snd_hda_codec_realtek kvm snd_hda_codec_generic vfat fat ledtrig_audio snd_hda_codec_hdmi gpu_sched irqbypass snd_hda_intel i2c_algo_bit snd_intel_dspcfg ttm wmi_bmof snd_hda_codec crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core sp5100_tco snd_hwdep aesni_intel drm_kms_helper crypto_simd r8169 cryptd glue_helper pcspkr k10temp snd_pcm realtek i2c_piix4 ccp snd_timer cec libphy rng_core snd rc_core syscopyarea sysfillrect sysimgblt fb_sys_fops soundcore wmi evdev mac_hid gpio_amdpt pinctrl_amd acpi_cpufreq drm usbip_host usbip_core crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 crc32c_intel xhci_pci xhci_hcd
May 15 23:59:13 kernel: WARNING: CPU: 3 PID: 325 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:1756 write_i2c_retimer_setting+0x3dd/0x410 [amdgpu]

The elevated logging level has given me more information; however, I still need to determine its relevance. I will continue to investigate.

Found the kernel doc page on PCI, time to delve into this to become a PCI wizard (hopefully).

Last edited by ricky (2020-05-16 13:00:04)


#9 2020-05-16 12:47:58

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,868

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

$ lspci -tv

should help to see what devices are using those PCIe ports (run as normal user).

Also, please post a full journal from the boot after you made those changes (you may want to use a pastebin client).


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.


(A works at time B)  && (time C > time B ) ≠  (A works at time C)


#10 2020-05-16 13:09:46

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

Lone_Wolf wrote:
$ lspci -tv

should help to see what devices are using those PCIe ports (run as normal user).

Also, please post a full journal from the boot after you made those changes (you may want to use a pastebin client).

Here's a pastebin of the full log.

I have added radeon.dpm=0 to my kernel command line to disable dynamic power management.

$ lspci -tv

yields the following:

-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex
           +-01.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
           +-01.2-[01-05]--+-00.0  Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller
           |               +-00.1  Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller
           |               \-00.2-[02-05]--+-00.0-[03]--
           |                               +-01.0-[04]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           |                               \-04.0-[05]--
           +-01.6-[06]----00.0  Sandisk Corp Device 5006
           +-08.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
           +-08.1-[07]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Picasso
           |            +-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio Controller
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
           |            +-00.3  Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
           |            +-00.4  Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
           |            \-00.6  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
           +-08.2-[08]----00.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
           +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
           +-18.0  Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0
           +-18.1  Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1
           +-18.2  Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2
           +-18.3  Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3
           +-18.4  Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4
           +-18.5  Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5
           +-18.6  Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6
           \-18.7  Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7

Oddly there is nothing listed at 02:00 and 02:04...
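In the tree, the entries under `00.2-[02-05]` are the devices on bus 02, so `00.0-[03]--` and `04.0-[05]--` are 02:00.0 and 02:04.0. The bracketed number is each bridge's secondary bus (03 and 05, matching the lspci -v output in the first post), and nothing follows it because no device sits behind either bridge. A rough sketch for picking such empty bridges out of the tree; `empty_bridges` is a made-up name, and the pattern assumes lspci prints no trailing whitespace.

```shell
# Hypothetical helper: read `lspci -tv` output on stdin and print every
# bridge entry whose secondary bus has nothing attached, i.e. a line that
# ends in "dev.fn-[bus]--" with no device after it.
empty_bridges() {
    grep -oE '[0-9a-f]{2}\.[0-9a-f]-\[[0-9a-f]{2}]--$'
}

# Usage:
#   lspci -tv | empty_bridges
```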

Last edited by ricky (2020-05-16 13:16:04)


#11 2020-05-16 13:44:45

judd1
Member
Registered: 2015-09-04
Posts: 260

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

pci=nomsi,noaer

noaer [PCIE] If the PCIEAER kernel config parameter is
enabled, this kernel boot option can be used to
disable the use of PCIE advanced error reporting.

nomsi [MSI] If the PCI_MSI kernel config parameter is
enabled, this kernel boot option can be used to
disable the use of MSI interrupts system-wide.

Source → https://www.kernel.org/doc/html/latest/ ... eters.html


This isn't right. This isn't even wrong.
-- Wolfgang Pauli --


#12 2020-05-16 13:56:00

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

judd1 wrote:

pci=nomsi,noaer

noaer [PCIE] If the PCIEAER kernel config parameter is
enabled, this kernel boot option can be used to
disable the use of PCIE advanced error reporting.

nomsi [MSI] If the PCI_MSI kernel config parameter is
enabled, this kernel boot option can be used to
disable the use of MSI interrupts system-wide.

Source → https://www.kernel.org/doc/html/latest/ ... eters.html

May I ask what the motivation is behind setting these parameters? i.e. What in my logs led you to believe that these would resolve my issue?

Is this issue as simple as I am missing some drivers for these PCIe devices?

Last edited by ricky (2020-05-16 14:08:04)


#13 2020-05-16 14:18:02

judd1
Member
Registered: 2015-09-04
Posts: 260

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

ricky wrote:

Is this issue as simple as I am missing some drivers for these PCIe devices?

I do not know. I just thought those parameters on the kernel command line could help.

Last edited by judd1 (2020-05-16 14:19:49)




#14 2020-05-16 14:57:54

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

judd1 wrote:
ricky wrote:

Is this issue as simple as I am missing some drivers for these PCIe devices?

I do not know. I just thought those parameters on the kernel command line could help.

I guess I am just confused how disabling Advanced Error Reporting and MSI interrupts could help me determine this. Apologies if I am missing something obvious. Nonetheless, I am currently running with these parameters configured.

Last edited by ricky (2020-05-16 15:04:10)


#15 2020-05-16 15:24:15

judd1
Member
Registered: 2015-09-04
Posts: 260

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

ricky wrote:

... how disabling Advanced Error Reporting and MSI interrupts

https://www.kernel.org/doc/html/latest/ … howto.html




#16 2020-05-16 21:08:51

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

Unfortunately, as I expected, those kernel parameters did not provide any additional information during the latest crash.

My understanding is that the kernel needs to be compiled with CONFIG_PCIEPORTBUS=y and CONFIG_PCIEAER=y to enable AER.
I exported my kernel config and both of these options are set, so AER should currently be enabled.
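The same check can be scripted against the running kernel's config. This is a minimal sketch, assuming the kernel exposes its config at /proc/config.gz; `aer_config_status` is a made-up helper name. Note that per the documentation quoted earlier in the thread, booting with pci=noaer disables the use of AER even when it is compiled in.

```shell
# Hypothetical helper: read a kernel config on stdin and report whether the
# two options named above are built in (=y).
aer_config_status() {
    cfg=$(cat)
    for opt in CONFIG_PCIEPORTBUS CONFIG_PCIEAER; do
        if printf '%s\n' "$cfg" | grep -qx "$opt=y"; then
            echo "$opt: enabled"
        else
            echo "$opt: not set"
        fi
    done
}

# Usage:
#   zcat /proc/config.gz | aer_config_status
```

Matching the whole line with `grep -x` avoids counting related options such as CONFIG_PCIEAER_INJECT as a hit.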

What additional steps should be taken to debug the PCI bus?

Last edited by ricky (2020-05-16 23:47:43)


#17 2020-05-17 20:07:31

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,868

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

May 15 23:59:12 kernel: pci 0000:02:00.0: [1022:43c7] type 01 class 0x060400
May 15 23:59:12 kernel: pci 0000:02:00.0: enabling Extended Tags
May 15 23:59:12 kernel: pci 0000:02:00.0: PME# supported from D3hot D3cold
May 15 23:59:12 kernel: pci 0000:02:01.0: [1022:43c7] type 01 class 0x060400
May 15 23:59:12 kernel: pci 0000:02:01.0: enabling Extended Tags
May 15 23:59:12 kernel: pci 0000:02:01.0: PME# supported from D3hot D3cold
May 15 23:59:12 kernel: pci 0000:02:04.0: [1022:43c7] type 01 class 0x060400
May 15 23:59:12 kernel: pci 0000:02:04.0: enabling Extended Tags
May 15 23:59:12 kernel: pci 0000:02:04.0: PME# supported from D3hot D3cold

according to https://pci-ids.ucw.cz/read/PC/1022 this gives

43c7	400 Series Chipset PCIe Port
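The vendor:device pairs can be pulled out of the journal in one pass rather than read off by eye. A small sketch, assuming the enumeration lines keep the format shown above; `pci_ids_from_log` is a made-up name.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "address vendor:device" for each PCI enumeration line.
pci_ids_from_log() {
    sed -n 's/.*pci \([0-9a-f:.]*\): \[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)] type.*/\1 \2/p'
}

# Usage:
#   journalctl --no-hostname -k -b 0 | pci_ids_from_log
#   lspci -nn -d 1022:43c7    # local lookup of the same ID via lspci
```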

It does appear nothing is connected to those ports.

You did try booting with pcie_aspm=off already, I think?

The log mentions iommu related issues several times.
I'm thinking the issue originates at low level, so firmware settings or kernel changes come to mind as possible causes.

Has the system worked reliably under archlinux with older kernels / firmwares ?


In the firmware
- advanced / AMD CBS/ IOMMU
It's probably disabled now, try enabling it.

- boot screen / CSM
I hope you're performing an EFI boot?
If yes, disable CSM.
If not, I need to know.

Last edited by Lone_Wolf (2020-05-17 20:08:34)




#18 2020-05-18 22:05:26

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

Thanks for the reply.

  • New hardware, Arch is the first OS I have run on it.

  • UEFI. I have disabled CSM in the BIOS.

  • I have enabled IOMMU in the BIOS, which clears the IOMMU errors from the kernel log.

  • Various sources have posts from people with stability/freezing issues on the Ryzen 3 3200G with Vega 8 integrated graphics, running Manjaro and Arch. The processor was released a year ago, but it is possible there is an underlying kernel issue at hand here. If so, I need to learn more about the kernel and try to pin down an exact cause, in the hope of contributing this info back so a fix can be added to the kernel.

  • I have added pcie_aspm=off back to my kernel command line, but you are correct: I had a hang with it before.
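With the IOMMU enabled in the firmware, the kernel also exposes the resulting groups under /sys/kernel/iommu_groups, which is easier to re-check than grepping the journal. A minimal sketch; `list_iommu_groups` is a made-up helper name.

```shell
# Hypothetical helper: print "group N: address" for every device in every
# IOMMU group. $1 = sysfs root (defaults to /sys; overridable for testing)
list_iommu_groups() {
    for dev in "${1:-/sys}"/kernel/iommu_groups/*/devices/*; do
        [ -e "$dev" ] || continue      # glob did not match: no groups exist
        group=${dev%/devices/*}        # strip "/devices/<address>"
        printf 'group %s: %s\n' "${group##*/}" "${dev##*/}"
    done
}

# Usage:
#   list_iommu_groups
```

If the function prints nothing, no groups exist, which usually means the IOMMU is still disabled.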

Promising excerpt from the kernel log:

$ journalctl --no-hostname -b 0 | grep iommu
May 18 17:57:42 kernel: iommu: Default domain type: Translated 
May 18 17:57:42 kernel: pci 0000:00:01.0: Adding to iommu group 0
May 18 17:57:42 kernel: pci 0000:00:01.2: Adding to iommu group 1
May 18 17:57:42 kernel: pci 0000:00:01.6: Adding to iommu group 2
May 18 17:57:42 kernel: pci 0000:00:08.0: Adding to iommu group 3
May 18 17:57:42 kernel: pci 0000:00:08.1: Adding to iommu group 4
May 18 17:57:42 kernel: pci 0000:00:08.2: Adding to iommu group 5
May 18 17:57:42 kernel: pci 0000:00:14.0: Adding to iommu group 6
May 18 17:57:42 kernel: pci 0000:00:14.3: Adding to iommu group 6
May 18 17:57:42 kernel: pci 0000:00:18.0: Adding to iommu group 7
May 18 17:57:42 kernel: pci 0000:00:18.1: Adding to iommu group 7
May 18 17:57:42 kernel: pci 0000:00:18.2: Adding to iommu group 7
May 18 17:57:42 kernel: pci 0000:00:18.3: Adding to iommu group 7
May 18 17:57:42 kernel: pci 0000:00:18.4: Adding to iommu group 7
May 18 17:57:42 kernel: pci 0000:00:18.5: Adding to iommu group 7
May 18 17:57:42 kernel: pci 0000:00:18.6: Adding to iommu group 7
May 18 17:57:42 kernel: pci 0000:00:18.7: Adding to iommu group 7
May 18 17:57:42 kernel: pci 0000:01:00.0: Adding to iommu group 8
May 18 17:57:42 kernel: pci 0000:01:00.1: Adding to iommu group 8
May 18 17:57:42 kernel: pci 0000:01:00.2: Adding to iommu group 8
May 18 17:57:42 kernel: pci 0000:02:00.0: Adding to iommu group 8
May 18 17:57:42 kernel: pci 0000:02:01.0: Adding to iommu group 8
May 18 17:57:42 kernel: pci 0000:02:04.0: Adding to iommu group 8
May 18 17:57:42 kernel: pci 0000:04:00.0: Adding to iommu group 8
May 18 17:57:42 kernel: pci 0000:06:00.0: Adding to iommu group 9
May 18 17:57:42 kernel: pci 0000:07:00.0: Adding to iommu group 10
May 18 17:57:42 kernel: pci 0000:07:00.0: Using iommu direct mapping
May 18 17:57:42 kernel: pci 0000:07:00.1: Adding to iommu group 11
May 18 17:57:42 kernel: pci 0000:07:00.2: Adding to iommu group 11
May 18 17:57:42 kernel: pci 0000:07:00.3: Adding to iommu group 11
May 18 17:57:42 kernel: pci 0000:07:00.4: Adding to iommu group 11
May 18 17:57:42 kernel: pci 0000:07:00.6: Adding to iommu group 11
May 18 17:57:42 kernel: pci 0000:08:00.0: Adding to iommu group 12
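
The listing above is easier to read when collapsed to one line per group. The following is a rough sketch (the function name iommu_groups is made up for illustration) that reads kernel log lines on stdin and prints the devices in each IOMMU group, in order of first appearance. It counts fields from the end of the line, so it should also work when the hostname column is present.

```shell
# Read kernel log lines on stdin and print "group N: dev dev ..." per IOMMU group.
iommu_groups() {
    awk '/Adding to iommu group/ {
             dev = $(NF - 5)            # e.g. "0000:02:04.0:"
             sub(/:$/, "", dev)         # drop the trailing colon
             grp = $NF                  # group number is the last field
             if (!(grp in members)) order[++n] = grp
             members[grp] = members[grp] " " dev
         }
         END {
             for (i = 1; i <= n; i++)
                 printf "group %s:%s\n", order[i], members[order[i]]
         }'
}

# Usage (not run here): journalctl --no-hostname -b 0 | iommu_groups
```

Fed the excerpt above, this shows that the two bridges named in the crash messages (0000:02:00.0 and 0000:02:04.0) sit in group 8 together with the 0000:01:00.x devices.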

Hopefully the system is still alive tomorrow AM! Thanks all for the help thus far.

Offline

#19 2020-05-19 07:25:42

mrlamud
Member
Registered: 2014-09-27
Posts: 104

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

Please report back.
I'm going to jump on the same CPU after 6 years of i5-4590's comfort zone.

Offline

#20 2020-05-19 13:48:42

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

No dice, crashed again. Same output as always.

View full log here.

I enabled a few settings in the BIOS to enable virtualization, including virtualization for PCIe (SR-IOV was one of them, I believe). This eliminated the kvm error messages from the kernel log on boot. Will report back.

Last edited by ricky (2020-05-19 13:58:58)

Offline

#21 2020-05-19 16:52:38

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

Ran about 15 minutes before crashing, same result as always.

➜  ~ journalctl --no-hostname -b -1 | tail
                                                        Primary: false
May 19 11:00:45 kscreen_backend_launcher[688]: kscreen.xrandr: Output 84 : connected = false , enabled = false
May 19 11:00:45 kded5[514]: kscreen.kded: Config does not have at least one screen enabled, WILL NOT save this config, this is not what user wants.
May 19 11:00:45 kscreen_backend_launcher[688]: kscreen.xrandr: Emitting configChanged()
May 19 11:00:46 kded5[514]: kscreen.kded: Config does not have at least one screen enabled, WILL NOT save this config, this is not what user wants.
May 19 11:00:46 kernel: usb 1-5: USB disconnect, device number 2
May 19 11:00:46 kernel: xhci_hcd 0000:01:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
May 19 11:01:27 kernel: usb 1-6: USB disconnect, device number 3
May 19 11:03:57 kernel: pcieport 0000:02:04.0: can't change power state from D3cold to D0 (config space inaccessible)
May 19 11:03:59 kernel: pcieport 0000:02:00.0: can't change power state from D3cold to D0 (config space inaccessible)

  1. Should I be looking towards the amdgpu warnings in the logs?

  2. Is there a way I can tell the kernel to ignore those PCIe ports that are not connected to anything?

  3. Is there a way for me to determine why the kernel is trying to modify the power state of these non-existent PCIe ports?

I would assume my next step would be to debug the kernel in some fashion. This will be new for me, but I am excited to learn!
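
As a starting point for questions 2 and 3, the runtime power-management settings of the two bridges can be read from sysfs. This is only a sketch: pci_power_info is a hypothetical helper name, and the optional second argument (an alternative sysfs root) is there purely for testing. The attributes it reads, power/control, power/runtime_status and d3cold_allowed, are standard PCI sysfs files, but they may be missing on some devices or kernels, in which case the helper says so.

```shell
# Print runtime power-management attributes for one PCI device.
# $1: PCI address, e.g. 0000:02:04.0
# $2: optional sysfs device root (defaults to /sys/bus/pci/devices)
pci_power_info() {
    dev_dir="${2:-/sys/bus/pci/devices}/$1"
    for attr in power/control power/runtime_status d3cold_allowed; do
        if [ -r "$dev_dir/$attr" ]; then
            printf '%s %s=%s\n' "$1" "$attr" "$(cat "$dev_dir/$attr")"
        else
            printf '%s %s=<unavailable>\n' "$1" "$attr"
        fi
    done
}

# The two bridges named in the crash messages.
pci_power_info 0000:02:04.0
pci_power_info 0000:02:00.0
```

Writing "on" to power/control (as root) keeps a device out of runtime suspend, and writing 0 to d3cold_allowed forbids D3cold for it. Whether either setting has any effect on the hang described here is untested.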

Last edited by ricky (2020-05-19 16:55:49)

Offline

#22 2020-05-19 18:21:18

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 412
Website

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

Been watching this thread for a bit and thought it might be time for me to chime in. I think the pcieport errors are a red herring and agree that the hangs are more likely due to Ryzen and/or amdgpu issues. (Granted, I'm on an Intel system.)

FWIW I also get the same "can't change power state from D3cold" errors:

May 18 20:32:11 lap kernel: xhci_hcd 0000:3c:00.0: can't change power state from D3cold to D0 (config space inaccessible)
May 18 20:32:11 lap kernel: xhci_hcd 0000:3c:00.0: can't change power state from D3hot to D0 (config space inaccessible)
May 18 20:32:11 lap kernel: xhci_hcd 0000:3c:00.0: PCI post-resume error -19!
May 18 20:32:11 lap kernel: xhci_hcd 0000:3c:00.0: HC died; cleaning up
May 18 20:32:16 lap kernel: pcieport 0000:05:00.0: can't change power state from D3cold to D0 (config space inaccessible)

Checking my logs, it looks like I started seeing that error on 2020-02-03. That was the same day I updated from linux 5.4.15 to 5.5.1, so I think it's related to the changes in that release, of which there were a lot.

$ grep '^\[2020\-02\-03' /var/log/pacman.log | grep -e upgraded -e installed
[2020-02-03T17:30:49-0500] [ALPM] upgraded cryptsetup (2.2.2-1 -> 2.3.0-1)
[2020-02-03T17:30:49-0500] [ALPM] upgraded libsecret (0.20.0-1 -> 0.20.1-1)
[2020-02-03T17:30:50-0500] [ALPM] upgraded linux (5.4.15.arch1-1 -> 5.5.1.arch1-1)
[2020-02-03T17:30:51-0500] [ALPM] upgraded linux-firmware (20191220.6871bff-1 -> 20200122.1eb2408-1)
[2020-02-03T17:30:54-0500] [ALPM] upgraded linux-headers (5.4.15.arch1-1 -> 5.5.1.arch1-1)
[2020-02-03T21:34:02-0500] [ALPM] upgraded linux-drm-tip-git (5.6.891985.727605cdef77-1 -> 5.6.891996.b7b84e9f2d00-1)
[2020-02-03T21:34:05-0500] [ALPM] upgraded linux-drm-tip-git-docs (5.6.891985.727605cdef77-1 -> 5.6.891996.b7b84e9f2d00-1)
[2020-02-03T21:34:10-0500] [ALPM] upgraded linux-drm-tip-git-headers (5.6.891985.727605cdef77-1 -> 5.6.891996.b7b84e9f2d00-1)
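
To line up the first appearance of an error with kernel updates over a longer period, the same log can be filtered by package instead of by date. A small sketch, assuming the ISO-style timestamp format shown in the excerpt above; pkg_upgrades is an illustrative name, not an existing tool.

```shell
# Read pacman.log lines on stdin and print "DATE OLD -> NEW" for each
# upgrade of the package named in $1.
pkg_upgrades() {
    awk -v pkg="$1" '$2 == "[ALPM]" && $3 == "upgraded" && $4 == pkg {
        # $1 looks like "[2020-02-03T17:30:50-0500]"; keep only the date part
        print substr($1, 2, 10), $5, $6, $7
    }' | tr -d '()'
}

# Usage (not run here): pkg_upgrades linux < /var/log/pacman.log
```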

In my case I think both devices reporting errors are part of the Thunderbolt subsystem?

$ journalctl -b | grep 'pci 0000:05'
May 18 20:31:50 lap kernel: pci 0000:05:00.0: [8086:1576] type 01 class 0x060400
...
$ journalctl -b | grep 'pci 0000:3c'
May 18 20:31:50 lap kernel: pci 0000:3c:00.0: [8086:15b5] type 00 class 0x0c0330

Even with the errors I've had no issues with system hangs or the like as you've reported.

Last edited by CarbonChauvinist (2020-05-19 18:29:22)


"the wind-blown way, wanna win? don't play"

Offline

#23 2020-05-20 13:07:47

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,868

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

Looks like it's time to check if other kernel versions have the same issue.

linux-lts is an option, also https://aur.archlinux.org/packages/linux-amd/ and https://aur.archlinux.org/packages/linu … -next-git/ .

They can all coexist, but you'll have to configure your bootloader to choose between them.
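
For systemd-boot (the bootloader mentioned in the first post), each kernel gets its own entry file under loader/entries/ on the ESP. The sketch below prints such an entry. It assumes the ESP is mounted at /boot and that the kernel and initramfs follow the usual Arch naming (vmlinuz-PKG, initramfs-PKG.img), which holds for linux-lts but should be checked for AUR kernels. The helper name, the target file name and the root= value are placeholders.

```shell
# Print a systemd-boot loader entry for a kernel package.
# $1: kernel package name (e.g. linux-lts); $2: kernel command line.
loader_entry() {
    cat <<EOF
title   Arch Linux ($1)
linux   /vmlinuz-$1
initrd  /amd-ucode.img
initrd  /initramfs-$1.img
options $2
EOF
}

# Placeholder root device: replace with the real one before use.
loader_entry linux-lts "root=UUID=<your-root-uuid> rw"
```

Redirecting the output to a file such as /boot/loader/entries/arch-lts.conf (hypothetical name) should make the kernel selectable in the boot menu.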



Offline

#24 2020-05-20 14:00:39

theDOC
Member
From: Aachen, Germany
Registered: 2009-06-18
Posts: 50

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

My system was very unstable, too. It crashed very often when idle until I discovered that something is broken with the power-saving C-states. You could try booting with:

processor.max_cstate=5

This "fixed" it for me. I have a Ryzen 7 1700X CPU, but maybe your system has the same issue.

Offline

#25 2020-05-22 10:53:16

ricky
Member
Registered: 2020-04-27
Posts: 28

Re: [SOLVED] Ryzen 3 3200G, AsRock B450M Pro4 system crashes

Yesterday I set

processor.max_cstate=5

in my kernel command line.
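
To confirm that the parameter actually reached the running kernel, /proc/cmdline can be checked after a reboot. A minimal sketch; cmdline_param is an illustrative helper, and its optional second argument (a file to read instead of /proc/cmdline) is there only for testing. It does not handle quoted values that contain spaces.

```shell
# Print the value of a kernel command-line parameter, or nothing if absent.
# $1: parameter name; $2: optional file to read (defaults to /proc/cmdline)
cmdline_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | sed -n "s/^$1=//p"
}

if [ -r /proc/cmdline ]; then
    cmdline_param processor.max_cstate
fi
```

With the setting from this post in effect, the call should print 5.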

For the first time ever, my computer was still on and usable (writing this post) after running all night long! Insert Lionel Richie.

Hopefully I can spend some time using the computer this weekend and test it further. So far the only thing it has done is idle.

I am hard pressed to call this "solved", because I still have a boatload of kernel panics with amdgpu/dc, and I feel like a formal bug report needs to be filed with the kernel developers to improve support for this processor.

I will report back after the weekend.

Thanks to all who have provided help and insight!

Offline
