#1 2018-08-27 09:46:55

yaogen
Member
Registered: 2018-08-03
Posts: 4

dmesg PCIe Bus Errors after upgrade

After upgrading yesterday, I started seeing these errors in dmesg:

[ 8877.164418] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 8877.164424] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 8877.164428] pcieport 0000:00:1c.0:   device [8086:a110] error status/mask=00001000/00002000
[ 8877.164430] pcieport 0000:00:1c.0:    [12] Replay Timer Timeout  
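
For readers wondering how the "error status/mask=00001000/00002000" line relates to the "[12] Replay Timer Timeout" line: both hex values are bitfields over the PCIe AER Correctable Error Status and Mask registers. The sketch below is only an illustration, not something taken from the kernel source. The bit names follow the PCIe specification's wording and may differ slightly from what the kernel prints, and `decode_correctable` is a hypothetical helper name.

```python
# Bit positions in the PCIe AER Correctable Error Status/Mask registers.
# Names follow the PCIe spec wording; the kernel's strings may differ slightly.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(value_hex):
    """Return the names of all known bits set in a hex register value."""
    value = int(value_hex, 16)
    return [
        name
        for bit, name in sorted(CORRECTABLE_BITS.items())
        if value & (1 << bit)
    ]


# Values copied from the dmesg line above.
status, mask = "00001000/00002000".split("/")
print("status:", decode_correctable(status))
print("mask:  ", decode_correctable(mask))
```

Under these assumptions, the status value decodes to bit 12 (the reported Replay Timer Timeout) and the mask value to bit 13, meaning only advisory non-fatal errors are masked from reporting.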

Being a bit of an Arch newbie I'm struggling to find out what is going on, but after a bit of searching I think I understand what device it's complaining about:

$ lspci -kvvvt
-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
           +-01.0-[01]----00.0  NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]
           +-02.0  Intel Corporation Device 591b
           +-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
           +-14.0  Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller
           +-14.2  Intel Corporation Sunrise Point-H Thermal subsystem
           +-15.0  Intel Corporation Sunrise Point-H Serial IO I2C Controller #0
           +-15.1  Intel Corporation Sunrise Point-H Serial IO I2C Controller #1
           +-16.0  Intel Corporation Sunrise Point-H CSME HECI #1
           +-17.0  Intel Corporation Sunrise Point-H SATA Controller [AHCI mode]
           +-1c.0-[02]----00.0  Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
           +-1c.1-[03]----00.0  Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader
           +-1d.0-[04]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
           +-1d.4-[05]--
           +-1d.6-[06-3e]--
           +-1f.0  Intel Corporation Sunrise Point-H LPC Controller
           +-1f.2  Intel Corporation Sunrise Point-H PMC
           +-1f.3  Intel Corporation CM238 HD Audio Controller
           \-1f.4  Intel Corporation Sunrise Point-H SMBus
$ sudo lshw
<...>
        *-pci:1
             description: PCI bridge
             product: Sunrise Point-H PCI Express Root Port #1
             vendor: Intel Corporation
             physical id: 1c
             bus info: pci@0000:00:1c.0
             version: f1
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:123 memory:ed200000-ed3fffff
           *-network
                description: Wireless interface
                product: QCA6174 802.11ac Wireless Network Adapter
                vendor: Qualcomm Atheros
                physical id: 0
                bus info: pci@0000:02:00.0
                logical name: wlp2s0
                version: 32
                serial: 9c:b6:d0:f6:1a:a3
                width: 64 bits
                clock: 33MHz
                capabilities: pm msi pciexpress bus_master cap_list ethernet physical wireless
                configuration: broadcast=yes driver=ath10k_pci driverversion=4.18.5-arch1-1-ARCH firmware=WLAN.RM.4.4.1-00079-QCARMSWPZ-1 ip=192.168.0.100 latency=0 link=yes multicast=yes wireless=IEEE 802.11
                resources: irq:141 memory:ed200000-ed3fffff
<...>

... but that is as far as I can get, it seems.

If this is correct, I can say that I haven't noticed any problems with my WiFi, and there seem to be no interruptions while pinging on my LAN.

The dmesg error message refers to "device [8086:a110]", which I don't know how to locate based on the ID. Any help appreciated.

Edit: Perhaps this shows the device. If so then it's not the WiFi card after all.

$ sudo lspci -nnkvvv -d 8086:a110
00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #1 [8086:a110] (rev f1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 123
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 0000f000-00000fff [empty]
	Memory behind bridge: ed200000-ed3fffff [size=2M]
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [empty]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 256 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
		LnkCap:	Port #1, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L1 <16us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (downgraded), Width x1 (ok)
			TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #4, PowerLimit 10.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
			 AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
			 AtomicOpsCtl: ReqEn- EgressBlck-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00238  Data: 0000
	Capabilities: [90] Subsystem: Dell Sunrise Point-H PCI Express Root Port [1028:07be]
	Capabilities: [a0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
		RootCmd: CERptEn+ NFERptEn+ FERptEn+
		RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
			 FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
		ErrorSrc: ERR_COR: 00e0 ERR_FATAL/NONFATAL: 0000
	Capabilities: [140 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [220 v1] Secondary PCI Express <?>
	Kernel driver in use: pcieport

Finally, I don't know how to resolve this error. If anyone can point me in the right direction, it would be much appreciated.

Thanks,

Last edited by yaogen (2018-08-27 11:09:43)

#2 2018-08-28 08:24:57

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 25,237

Re: dmesg PCIe Bus Errors after upgrade

How often is this occurring? Is it a single one-off event? It might be a slight incompatibility with power management; however, "Corrected" errors mean the hardware managed to recover. Does this only occur on a 4.18 kernel? You might disable PCIe Active State Power Management (ASPM) with the kernel parameter 'pcie_aspm=off' to check whether this "fixes" the issue. FWIW, to get some more context, could you share a complete dmesg?
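
To answer the "how often" question from a saved log, something like the following could help. It is a minimal sketch that assumes the dmesg output has been captured as text with the default bracketed timestamps; `summarize_aer` is a hypothetical helper name, and the regular expression only matches the "AER: Corrected error received" line format quoted in the first post.

```python
import re

# Matches lines such as:
# [ 8877.164418] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
AER_LINE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+pcieport\s+(\S+):\s+AER: Corrected error received"
)


def summarize_aer(dmesg_text):
    """Map each reporting port to the list of timestamps of its corrected errors."""
    events = {}
    for line in dmesg_text.splitlines():
        match = AER_LINE.match(line)
        if match:
            timestamp, port = float(match.group(1)), match.group(2)
            events.setdefault(port, []).append(timestamp)
    return events


# Sample taken from the first post; a real run would read the full log instead.
sample = (
    "[ 8877.164418] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0\n"
    "[ 8877.164430] pcieport 0000:00:1c.0:    [12] Replay Timer Timeout\n"
)
for port, stamps in summarize_aer(sample).items():
    print(port, "count:", len(stamps), "first:", stamps[0], "last:", stamps[-1])
```

Comparing the counts and the gaps between timestamps with and without pcie_aspm=off would show whether the parameter actually changes anything.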

#3 2018-08-28 11:16:32

yaogen
Member
Registered: 2018-08-03
Posts: 4

Re: dmesg PCIe Bus Errors after upgrade

Hi and thanks for your reply.
I didn't see this before upgrading to 4.18 (from 4.17) but I can't swear it didn't occur before.

I've pasted the dmesg log here

FWIW I'm on a Dell XPS 15 9560 laptop.

I have not yet tried disabling the PCI power management (will get back on that one).

I did some testing: I added pcie_aspm=off to the boot parameters and rebooted. The messages don't appear anymore, but I was curious to see whether they would reappear with pcie_aspm=on. They didn't. As it stands now, I don't know why these messages appeared. Edit: The messages reappeared after 1588 sec.
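
When flipping the parameter back and forth like this, it can be worth confirming which value the running kernel was actually booted with. The sketch below is an assumption-laden illustration: `aspm_setting` is a hypothetical helper, and the sample command line is made up for the example. On a live system the string would come from /proc/cmdline.

```python
def aspm_setting(cmdline):
    """Return the value of the last pcie_aspm= token, or None if it is absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("pcie_aspm="):
            value = token.split("=", 1)[1]
    return value


# Hypothetical command line for illustration only. On a running system use:
#     with open("/proc/cmdline") as f: cmdline = f.read()
cmdline = "root=/dev/nvme0n1p2 rw quiet pcie_aspm=off"
print("pcie_aspm:", aspm_setting(cmdline))
```

If the helper returns None, the parameter was not passed at all and the kernel is using its default ASPM policy.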

I am tinkering with PCI passthrough to QEMU and have installed vfio, libvirt etc etc -- perhaps that could be a source?

Last edited by yaogen (2018-08-28 16:09:56)
