#1 2017-12-23 11:52:24

mouseman
Member
From: Outta nowhere
Registered: 2014-04-04
Posts: 291

[solved] NVMe PCIe errors

Searching the net I come across bugs from Fedora from 2015 closed without a solution, to askubuntu articles without an answer. Hopefully someone here can help me.

I installed a PCI-e card for NVMe SSD in my home server. It works, the system boots from it and it seems all fine. The system is stable, never any trouble so far.

I do get the following errors on the console though:

Dec 23 09:03:06 hostname kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=0100
Dec 23 09:03:06 hostname kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0100(Receiver ID)
Dec 23 09:03:06 hostname kernel: nvme 0000:01:00.0:   device [144d:a802] error status/mask=00000001/00006000
Dec 23 09:03:06 hostname kernel: nvme 0000:01:00.0:    [ 0] Receiver Error

Although it says corrected, I am worried. I don't think I should be seeing these at all.

If you have any ideas please share. Thanks!

Last edited by mouseman (2017-12-23 13:53:17)

#2 2017-12-23 13:05:10

ooo
Member
Registered: 2013-04-10
Posts: 1,638

Re: [solved] NVMe PCIe errors

Corrected AER errors shouldn't be a cause for concern.
From kernel documentation:

3.2.2.1 Correctable errors

Correctable errors pose no impacts on the functionality of
the interface. The PCI Express protocol can recover without any
software intervention or any loss of data. These errors do not
require any recovery actions. The AER driver clears the device's
correctable error status register accordingly and logs these errors.

You can disable AER with pci=noaer kernel boot parameter if you want to get rid of the error messages, although that also disables reporting of non-correctable AER errors.
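For what it's worth, the status/mask pair in the log from the first post can be decoded by hand. Below is a minimal Python sketch of that, assuming the bit layout of the AER Correctable Error Status and Mask registers as listed in the kernel's pci_regs.h header. The function names and the human-readable bit labels are my own and not something the kernel exposes under these names.

```python
import re

# Bit positions in the AER Correctable Error Status / Mask registers.
# Labels are descriptive names chosen for this sketch.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

# Matches the "error status/mask=XXXXXXXX/XXXXXXXX" part of the kernel message.
STATUS_MASK_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_bits(value):
    """Return the labels of all known bits that are set in value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if value & (1 << bit)]


def decode_status_mask(line):
    """Decode one kernel log line; return None if it has no status/mask field."""
    match = STATUS_MASK_RE.search(line)
    if match is None:
        return None
    status, mask = (int(group, 16) for group in match.groups())
    return {"reported": decode_bits(status), "masked": decode_bits(mask)}


if __name__ == "__main__":
    line = "nvme 0000:01:00.0:   device [144d:a802] error status/mask=00000001/00006000"
    print(decode_status_mask(line))
```

With the line from the first post, only bit 0 (Receiver Error) is set in the status word, which matches the "[ 0] Receiver Error" line the kernel printed. The mask word has bits 13 and 14 set, so those two error types are masked on that device.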

#3 2017-12-23 13:53:02

mouseman
Member
From: Outta nowhere
Registered: 2014-04-04
Posts: 291

Re: [solved] NVMe PCIe errors

Thanks for the reply, good to know they are of no consequence.

So, just for my information: even though the error is corrected when it occurs, should it be occurring at all, or is this perfectly normal?

#4 2017-12-23 19:32:35

ooo
Member
Registered: 2013-04-10
Posts: 1,638

Re: [solved] NVMe PCIe errors

Frankly, I know nothing about PCI specifications, so I have no clue what can cause these errors. Best explanation of AER I've found is the pciaer-howto.txt from my last post.

However, from what I've gathered, this seems to be very common, yet I've never read about the errors being a symptom of any kind of issue. Also, it seems disabling AER should be safe, and you are still left with basic PCIe error reporting capabilities. AER is just for "advanced" error reporting.

#5 2023-12-02 03:50:59

dufresnep
Member
Registered: 2008-02-03
Posts: 14

Re: [solved] NVMe PCIe errors

Could you try pcie_aspm=off? "This seems to disable power management mode which is throwing the error."
Taken from:
https://forums.unraid.net/topic/118286- … nt=1165004
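
To see which of the two boot parameters mentioned in this thread (pci=noaer and pcie_aspm=off) is actually in effect after a reboot, the kernel command line can be inspected. This is a rough Python sketch, assuming the command line is read from /proc/cmdline and that no parameter value contains quoted spaces. The function name aer_related_settings is made up for illustration.

```python
def aer_related_settings(cmdline):
    """Report whether AER or ASPM was disabled on the given kernel command line.

    cmdline is the text of /proc/cmdline. The pci= parameter takes a
    comma-separated option list, so "noaer" may appear next to other options.
    """
    pci_options = set()
    aspm_value = None
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "pci":
            pci_options.update(value.split(","))
        elif key == "pcie_aspm":
            aspm_value = value
    return {
        "aer_disabled": "noaer" in pci_options,
        "aspm_disabled": aspm_value == "off",
    }


if __name__ == "__main__":
    # Sample command line for illustration; on a real system use
    # open("/proc/cmdline").read() instead.
    sample = "root=/dev/nvme0n1p2 rw quiet pcie_aspm=off"
    print(aer_related_settings(sample))
```

Keeping the parsing separate from the file read makes it easy to try against a proposed command line before editing the bootloader configuration.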

#6 2023-12-02 04:22:31

Mrkd1904
Member
Registered: 2023-11-08
Posts: 51

Re: [solved] NVMe PCIe errors

mouseman wrote:

Searching the net I come across bugs from Fedora from 2015 closed without a solution, to askubuntu articles without an answer. Hopefully someone here can help me.

I installed a PCI-e card for NVMe SSD in my home server. It works, the system boots from it and it seems all fine. The system is stable, never any trouble so far.

I do get the following errors on the console though:

Dec 23 09:03:06 hostname kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=0100
Dec 23 09:03:06 hostname kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0100(Receiver ID)
Dec 23 09:03:06 hostname kernel: nvme 0000:01:00.0:   device [144d:a802] error status/mask=00000001/00006000
Dec 23 09:03:06 hostname kernel: nvme 0000:01:00.0:    [ 0] Receiver Error

Although it says corrected, I am worried. I don't think I should be seeing these at all.

If you have any ideas please share. Thanks!


Which PCIe card? In my experience, lower-grade or cheaper NVMe add-in cards can trip the error. As far as I can tell, the corrected errors seem to come from an "unclean" or substandard connection between card and port. This is just anecdotal though, so take it with a grain of salt.

#7 2023-12-02 04:36:06

OpusOne
Member
Registered: 2023-05-31
Posts: 83

Re: [solved] NVMe PCIe errors

I got the same issue on my machine with a Samsung SSD. Since this error was literally flooding dmesg without any other impact, I disabled AER with pci=noaer rather than disabling power management for PCIe, which seemed to me a much worse solution.
Of course, this way I can't get potentially severe AER errors, but I did check that this AER error with this SSD was absolutely the only one I got, so no harm done. If I ever run into an odd problem with some PCIe device, I'll re-enable it momentarily to debug the problem.

This is apparently a relatively "common" problem with some SSD controllers. Mine is the controller on a Samsung 960 Pro, so not a particularly cheap device.
And I have 3 other Samsung SSDs in this machine (with different controllers) which give absolutely no AER errors.

Just like BIOSes, some PCIe controllers don't fully implement the specs, and as a general rule, the Linux kernel tries to implement specs to the letter.
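
Before turning AER off, it is worth checking that corrected errors really are the only kind in the log, as described above. Here is a small Python sketch of that check, assuming the kernel messages use the "PCIe Bus Error: severity=..., type=..." format shown in the first post. The function names are hypothetical and the input is plain log text, for example saved from journalctl -k.

```python
import re
from collections import Counter

# Captures the severity and type fields of an AER "PCIe Bus Error" message.
BUS_ERROR_RE = re.compile(r"PCIe Bus Error: severity=([^,]+), type=([^,]+)")


def summarize_aer(log_text):
    """Count AER bus error messages per (severity, type) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = BUS_ERROR_RE.search(line)
        if match:
            counts[(match.group(1).strip(), match.group(2).strip())] += 1
    return counts


def only_corrected(counts):
    """True if every counted message has severity 'Corrected' (or none were found)."""
    return all(severity == "Corrected" for severity, _ in counts)


if __name__ == "__main__":
    sample = (
        "Dec 23 09:03:06 hostname kernel: pcieport 0000:00:1c.0: AER: Corrected error received: id=0100\n"
        "Dec 23 09:03:06 hostname kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, "
        "type=Physical Layer, id=0100(Receiver ID)\n"
    )
    counts = summarize_aer(sample)
    print(dict(counts), only_corrected(counts))
```

If only_corrected ever returns False on a real log, that would be a reason to leave AER enabled and look at the uncorrected messages first.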

#8 2023-12-02 08:12:11

seth
Member
Registered: 2012-09-03
Posts: 51,165

Re: [solved] NVMe PCIe errors

https://wiki.archlinux.org/title/Solid_ … ST_support in case it results in more than correctable bus errors.
