
#1 2021-03-09 12:50:58

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,911

ASPM AER: disable for specific device

I have booted with pcie_aspm=off for a long time due to ASPM issues with my video card.
Recent improvements in the amdgpu kernel driver allow the video card to function properly with ASPM, which reduces power drain and heat.

Another device in my system doesn't work well with ASPM, however.

This is the first occurrence of the error in dmesg (full log at [1]):

[di mrt  9 11:57:30 2021] pcieport 0000:00:01.1: AER: Corrected error received: 0000:00:00.0
[di mrt  9 11:57:30 2021] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
[di mrt  9 11:57:30 2021] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
[di mrt  9 11:57:30 2021] pcieport 0000:00:01.1:    [ 6] BadTLP

Device info (full output at [2]):

$ lspci -s 00:01.1 -kvn
00:01.1 0604: 1022:1453 (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 28, NUMA node 0, IOMMU group 1
        Bus: primary=00, secondary=01, subordinate=07, sec-latency=0
        I/O behind bridge: 00001000-00002fff [size=8K]
        Memory behind bridge: ba000000-ba3fffff [size=4M]
        Prefetchable memory behind bridge: [disabled]
        Capabilities: <access denied>
        Kernel driver in use: pcieport
$ 
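The "Capabilities: <access denied>" line above means lspci was run as a regular user, so the link capability and control registers are hidden. A minimal sketch, assuming root access through sudo: the helper name aspm_lines is made up for illustration and only filters the LnkCap/LnkCtl lines of lspci -vv output, which is where the supported and currently enabled ASPM states are reported.

```shell
# Hypothetical helper (not part of pciutils): keep only the link
# capability/control lines that mention ASPM.
aspm_lines() {
    grep -E 'LnkCap:|LnkCtl:' | grep 'ASPM'
}

# Usage (root is needed to read the capability list):
#   sudo lspci -vv -s 00:01.1 | aspm_lines
```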

That controller has some rather important parts connected to it:

$ lspci -s :00:01.1 -tv
0000:00:01.1-[01-07]--+-00.0  Advanced Micro Devices, Inc. [AMD] X399 Series Chipset USB 3.1 xHCI Controller
                      +-00.1  Advanced Micro Devices, Inc. [AMD] X399 Series Chipset SATA Controller
                      \-00.2-[02-07]--+-00.0-[03]--
                                      +-04.0-[04]----00.0  Intel Corporation I211 Gigabit Network Connection
                                      +-05.0-[05]----00.0  Intel Corporation Dual Band Wireless-AC 3168NGW [Stone Peak]
                                      +-06.0-[06]----00.0  Intel Corporation I211 Gigabit Network Connection
                                      \-07.0-[07]--
$ 

A Phoronix article [3] led me to [4] and [5].

Copied from [6]:

What:		/sys/bus/pci/devices/.../link/clkpm
		/sys/bus/pci/devices/.../link/l0s_aspm
		/sys/bus/pci/devices/.../link/l1_aspm
		/sys/bus/pci/devices/.../link/l1_1_aspm
		/sys/bus/pci/devices/.../link/l1_2_aspm
		/sys/bus/pci/devices/.../link/l1_1_pcipm
		/sys/bus/pci/devices/.../link/l1_2_pcipm
Date:		October 2019
Contact:	Heiner Kallweit <hkallweit1@gmail.com>
Description:	If ASPM is supported for an endpoint, these files can be
		used to disable or enable the individual power management
		states. Write y/1/on to enable, n/0/off to disable.

So there does appear to be a method for the root user to disable ASPM for specific devices.
/sys/bus/pci/devices/0000:00:01.1/ does have a link folder, but it's empty.

Does anybody have an idea which commands are needed to disable ASPM for just that one troublesome device?



[1] http://ix.io/2SeK
[2] http://ix.io/2SeQ
[3] https://www.phoronix.com/scan.php?page= … Knob-Sysfs
[4] https://lkml.org/lkml/2019/12/2/670
[5] https://patchwork.kernel.org/project/li … gmail.com/
[6] https://git.kernel.org/pub/scm/linux/ke … ?h=v5.11.5


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.


(A works at time B)  && (time C > time B ) ≠  (A works at time C)


#2 2021-03-15 18:44:19

Lone_Wolf

Re: ASPM AER: disable for specific device

# echo n > /sys/bus/pci/devices/0000\:00\:01.1/link/l1_aspm
-bash: /sys/bus/pci/devices/0000:00:01.1/link/l1_aspm: Permission denied
# echo n > /sys/bus/pci/devices/0000\:00\:01.1/link/l1_aspm
-bash: /sys/bus/pci/devices/0000:00:01.1/link/l1_aspm: Permission denied
#

Trying to write to a file in the empty link folder fails. I searched for whether the testing ABI needs something to be set up first, but couldn't find instructions for that.

# find /sys/bus/pci/devices/0000\:00\:01.1/ -name '*aspm'
/sys/bus/pci/devices/0000:00:01.1/0000:01:00.2/0000:02:05.0/0000:05:00.0/link/l1_1_aspm
/sys/bus/pci/devices/0000:00:01.1/0000:01:00.2/0000:02:05.0/0000:05:00.0/link/l1_aspm
/sys/bus/pci/devices/0000:00:01.1/0000:01:00.2/link/l1_1_aspm
/sys/bus/pci/devices/0000:00:01.1/0000:01:00.0/link/l1_1_aspm
/sys/bus/pci/devices/0000:00:01.1/0000:01:00.1/link/l1_1_aspm
# 

The find command shows there are devices below that port that do have those *aspm 'files'.

The first 2 belong to the wireless card, while the last 3 are one level higher in the lspci -tv output.

The AER message doesn't give details, so I guess I'll have to experiment to find which of those devices is causing the errors.

# echo n > /sys/bus/pci/devices/0000\:00\:01.1/0000\:01\:00.0/link/l1_1_aspm

does not give an error.

Maybe slightly fewer errors. Next I disabled it for the wireless card as well (I'm using a wired connection anyway).

# echo n > /sys/bus/pci/devices/0000:00:01.1/0000:01:00.2/0000:02:05.0/0000:05:00.0/link/l1_1_aspm
# echo n > /sys/bus/pci/devices/0000:00:01.1/0000:01:00.2/0000:02:05.0/0000:05:00.0/link/l1_aspm

Edit:
Even disabling all 5 entries doesn't get rid of the errors; it looks like I need to search further.
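Disabling every entry by hand gets tedious when repeating the experiment. A rough sketch of how the find output above could drive the writes; the function name disable_aspm_below is hypothetical, and the pattern deliberately mirrors the *aspm search, so clkpm and the *_pcipm files are left alone. It has to run as root, and the settings do not survive a reboot.

```shell
# Hypothetical helper: write "n" to every link/*aspm file found below a
# sysfs device directory, e.g. /sys/bus/pci/devices/0000:00:01.1
disable_aspm_below() {
    # The trailing slash makes find descend through the sysfs symlink.
    find "$1/" -path '*/link/*' -name '*aspm' | while IFS= read -r f; do
        if echo n > "$f"; then
            echo "disabled: $f"
        else
            echo "failed:   $f" >&2
        fi
    done
}

# Usage (as root):
#   disable_aspm_below /sys/bus/pci/devices/0000:00:01.1
```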

Last edited by Lone_Wolf (2021-03-15 20:52:38)



#3 2021-05-26 13:53:03

Lone_Wolf

Re: ASPM AER: disable for specific device

I have not been able to get rid of those errors by disabling ASPM for specific devices.

Setting amdgpu.aspm=0 did reduce the number of error messages, but only pcie_aspm=off made all of them go away.
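To compare settings by more than feel, the messages can be counted per boot. A small sketch, assuming the log lines look like the dmesg excerpt in the first post; the function name count_aer_corrected is made up for this example.

```shell
# Hypothetical helper: count "Corrected error received" lines on stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_aer_corrected() {
    grep -c 'AER: Corrected error received' || true
}

# Usage:
#   dmesg | count_aer_corrected
```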

ASPM appears to be good at ramping the video card fan speed up and down gradually, resulting in low noise and RPM overall.

Without ASPM there are occasional bursts where fan RPM and noise go up a lot for 10+ seconds.
I have not seen big differences in power use, though.

Last edited by Lone_Wolf (2021-05-26 13:53:26)

