#1 2021-05-10 03:07:59

jamdox
Member
Registered: 2015-05-02
Posts: 46

pci=nomsi kernel flag, AER NVME drive errors, and "system" use

So I have AER enabled on my X570 motherboard along with IOMMU and other such options for enabling VFIO (which I will use for a VM any day now).

With AER enabled, journalctl gets spammed with "correctable" PCIe errors from my NVMe drive.  Since I dislike log spam, I dug around and found some references to the "pci=nomsi" kernel parameter, which I have tried.

This did end the AER errors. However, my "system" stat (as shown by top) now sits at ~4 all the time, the system won't sleep automatically, and libinput is spamming journalctl with its charming "your system is too slow" messages (though things seem to be running ok).

I'm curious if there's a way to resolve the AER error spam without doing whatever the nomsi flag is doing, or if there is something wrong with nomsi.  Thanks in advance.


#2 2021-05-10 03:58:53

loqs
Member
Registered: 2014-03-06
Posts: 17,373

Re: pci=nomsi kernel flag, AER NVME drive errors, and "system" use

Have you tried "pci=noaer"?


#3 2021-05-10 04:07:06

jamdox
Member
Registered: 2015-05-02
Posts: 46

Re: pci=nomsi kernel flag, AER NVME drive errors, and "system" use

loqs wrote:

Have you tried "pci=noaer"?

I'm concerned that it would interfere with whatever enabling AER in UEFI is intended to do.


#4 2021-05-10 04:12:05

loqs
Member
Registered: 2014-03-06
Posts: 17,373

Re: pci=nomsi kernel flag, AER NVME drive errors, and "system" use

What is it supposed to do?
Edit:
The kernel disables AER when MSI is disabled, so with "pci=nomsi" you have AER disabled anyway:
https://github.com/torvalds/linux/blob/ … aer.c#L114
https://github.com/torvalds/linux/blob/ … er.c#L1451

Last edited by loqs (2021-05-10 04:20:11)


#5 2021-05-10 04:23:34

jamdox
Member
Registered: 2015-05-02
Posts: 46

Re: pci=nomsi kernel flag, AER NVME drive errors, and "system" use

Setting AER to enabled is generally listed in VFIO guides for X570 motherboards.  I'm not sure if it is needed to enable PCIe passthrough or just helps with the IOMMU groups.  But it's good to know that nomsi is no panacea.

Anyway, this is the error that gets spammed:

May 09 21:22:09 hostname kernel: pcieport 0000:00:01.1: AER: Corrected error received: 0000:01:00.0
May 09 21:22:09 hostname kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
May 09 21:22:09 hostname kernel: nvme 0000:01:00.0:   device [xxxx:xxxx] error status/mask=00000001/0000e000
May 09 21:22:09 hostname kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
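
To get a feel for how often each device is reporting, the "PCIe Bus Error" lines can be tallied. This is only a rough sketch that assumes the kernel log format shown above; the function name summarize_aer and the regular expression are made up for illustration and may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)"
# The pattern is an assumption based on the log excerpt in this thread.
BUS_ERROR = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<layer>[^,]+),"
)


def summarize_aer(lines):
    """Count AER bus errors per (device address, driver, severity, layer)."""
    counts = Counter()
    for line in lines:
        match = BUS_ERROR.search(line)
        if match:
            key = (match["bdf"], match["driver"], match["severity"], match["layer"])
            counts[key] += 1
    return counts


# Sample input copied from the log excerpt above.
SAMPLE = [
    "May 09 21:22:09 hostname kernel: pcieport 0000:00:01.1: AER: Corrected error received: 0000:01:00.0",
    "May 09 21:22:09 hostname kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)",
    "May 09 21:22:09 hostname kernel: nvme 0000:01:00.0:   device [xxxx:xxxx] error status/mask=00000001/0000e000",
    "May 09 21:22:09 hostname kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)",
]

for key, count in summarize_aer(SAMPLE).items():
    print(count, *key)
```

For real use, the output of "journalctl -k" could be passed in line by line (for example via sys.stdin) instead of SAMPLE.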

Last edited by jamdox (2021-05-10 04:28:33)


#6 2021-05-10 22:31:46

Ropid
Member
Registered: 2015-03-09
Posts: 1,069

Re: pci=nomsi kernel flag, AER NVME drive errors, and "system" use

Adding "pcie_aspm=off" to the kernel command line fixes the PCIe AER error events on my system.

My system is PCIe 3.0; I have no PCIe 4.0 hardware. It's an X470 board and R7 2700X CPU, with a GPU, an NVMe drive and a WiFi card in the board's PCIe slots. The AER events I got were generally for the GPU slot. Here is an example message from the last time I tried removing "pcie_aspm=off" from the kernel command line:

kernel: pcieport 0000:00:03.1: AER: Multiple Corrected error received: 0000:0e:00.0
kernel: amdgpu 0000:0e:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
kernel: amdgpu 0000:0e:00.0:   device [1002:67df] error status/mask=00000040/00002000
kernel: amdgpu 0000:0e:00.0:    [ 6] BadTLP                
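
The "error status/mask" pair in these messages is a pair of bitmasks, and the bracketed number on the following line is the index of the set status bit. A small sketch for decoding them, with some assumptions: bits 0 (RxErr) and 6 (BadTLP) are confirmed by the logs in this thread, while the other names are how I remember the kernel printing the PCIe Correctable Error Status bits, so double-check them against the kernel source. The function name decode_correctable is made up.

```python
# Bit positions in the PCIe Correctable Error Status / Mask registers.
# Bits 0 and 6 match the log excerpts in this thread; the remaining
# names are an assumption based on the kernel's AER output.
COR_STATUS_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}


def decode_correctable(status_hex, mask_hex):
    """Return (names of set status bits, names of set mask bits)."""
    status = int(status_hex, 16)
    mask = int(mask_hex, 16)
    ordered = sorted(COR_STATUS_BITS.items())
    reported = [name for bit, name in ordered if (status >> bit) & 1]
    masked = [name for bit, name in ordered if (mask >> bit) & 1]
    return reported, masked


# Values taken from the two log excerpts in this thread.
print(decode_correctable("00000040", "00002000"))
print(decode_correctable("00000001", "0000e000"))
```

So 00000040 is just bit 6 (the BadTLP above), and 00000001 from the NVMe log earlier in the thread is bit 0 (RxErr).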

If you can't find a way to fix the AER events, I guess the thing to do is to just disable AER. As far as I know, AER is only the error reporting feature; it doesn't add anything beyond those log messages.
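
To check which of the parameters discussed in this thread are actually active on the running kernel, something like the following could be used. It is only a sketch: pci_flags is a made-up helper, it assumes "pci=" takes a comma-separated option list, and the "aer_off" entry simply encodes the point from post #4 that disabling MSI also disables AER.

```python
def pci_flags(cmdline):
    """Report which PCI-related parameters from this thread appear in a kernel command line."""
    flags = {"nomsi": False, "noaer": False, "aspm_off": False}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "pci":
            # "pci=" accepts several options separated by commas, e.g. pci=nomsi,noaer
            options = value.split(",")
            flags["nomsi"] = flags["nomsi"] or "nomsi" in options
            flags["noaer"] = flags["noaer"] or "noaer" in options
        elif key == "pcie_aspm" and value == "off":
            flags["aspm_off"] = True
    # Per post #4: the kernel disables AER when MSI is disabled.
    flags["aer_off"] = flags["noaer"] or flags["nomsi"]
    return flags


# Hypothetical command line, for illustration only.
print(pci_flags("root=/dev/nvme0n1p2 rw pcie_aspm=off quiet"))
```

On a live system the input would be the contents of /proc/cmdline, e.g. pci_flags(open("/proc/cmdline").read()).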

Are all the messages you got "corrected" errors or were there also some "uncorrected" ones?


#7 2021-05-10 23:39:09

jamdox
Member
Registered: 2015-05-02
Posts: 46

Re: pci=nomsi kernel flag, AER NVME drive errors, and "system" use

"pcie_aspm=off" appears to have worked!  Thanks!

I have not noticed any uncorrected errors, fortunately.
