After upgrading yesterday, I started seeing these errors in dmesg:
[ 8877.164418] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 8877.164424] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 8877.164428] pcieport 0000:00:1c.0: device [8086:a110] error status/mask=00001000/00002000
[ 8877.164430] pcieport 0000:00:1c.0: [12] Replay Timer Timeout
Being a bit of an Arch newbie I'm struggling to find out what is going on, but after a bit of searching I think I understand what device it's complaining about:
$ lspci -kvvvt
-[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
+-01.0-[01]----00.0 NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]
+-02.0 Intel Corporation Device 591b
+-04.0 Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
+-14.0 Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller
+-14.2 Intel Corporation Sunrise Point-H Thermal subsystem
+-15.0 Intel Corporation Sunrise Point-H Serial IO I2C Controller #0
+-15.1 Intel Corporation Sunrise Point-H Serial IO I2C Controller #1
+-16.0 Intel Corporation Sunrise Point-H CSME HECI #1
+-17.0 Intel Corporation Sunrise Point-H SATA Controller [AHCI mode]
+-1c.0-[02]----00.0 Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
+-1c.1-[03]----00.0 Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader
+-1d.0-[04]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
+-1d.4-[05]--
+-1d.6-[06-3e]--
+-1f.0 Intel Corporation Sunrise Point-H LPC Controller
+-1f.2 Intel Corporation Sunrise Point-H PMC
+-1f.3 Intel Corporation CM238 HD Audio Controller
\-1f.4 Intel Corporation Sunrise Point-H SMBus
$ sudo lshw
<...>
*-pci:1
description: PCI bridge
product: Sunrise Point-H PCI Express Root Port #1
vendor: Intel Corporation
physical id: 1c
bus info: pci@0000:00:1c.0
version: f1
width: 32 bits
clock: 33MHz
capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:123 memory:ed200000-ed3fffff
*-network
description: Wireless interface
product: QCA6174 802.11ac Wireless Network Adapter
vendor: Qualcomm Atheros
physical id: 0
bus info: pci@0000:02:00.0
logical name: wlp2s0
version: 32
serial: 9c:b6:d0:f6:1a:a3
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list ethernet physical wireless
configuration: broadcast=yes driver=ath10k_pci driverversion=4.18.5-arch1-1-ARCH firmware=WLAN.RM.4.4.1-00079-QCARMSWPZ-1 ip=192.168.0.100 latency=0 link=yes multicast=yes wireless=IEEE 802.11
resources: irq:141 memory:ed200000-ed3fffff
<...>
... but that is as far as I can get, it seems.
If this is correct, I can say that I haven't noticed any problems with my WiFi, and there seem to be no interruptions while pinging on my LAN.
The dmesg error message refers to "device [8086:a110]", which I don't know how to locate based on the ID. Any help appreciated.
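For reference, the "error status/mask=00001000/00002000" part of the dmesg line is a pair of bit fields, and the "[12]" on the following line is the bit that is set in the status. Below is a minimal sketch of decoding such a value, assuming the standard bit layout of the AER Correctable Error Status register; the function name decode_cor_status and the wording of the labels are made up for this illustration.

```shell
# Decode an AER Correctable Error Status (or Mask) value into bit names.
# Pass the value with a 0x prefix, e.g. 0x00001000.
# Assumption: bit positions follow the standard AER correctable error layout;
# the label strings are illustrative and may differ from kernel or lspci output.
decode_cor_status() {
    value=$(( $1 ))
    for entry in \
        "0:Receiver Error" \
        "6:Bad TLP" \
        "7:Bad DLLP" \
        "8:REPLAY_NUM Rollover" \
        "12:Replay Timer Timeout" \
        "13:Advisory Non-Fatal Error"
    do
        bit=${entry%%:*}    # part before the first colon
        name=${entry#*:}    # part after the first colon
        if [ $(( ($value >> $bit) & 1 )) -eq 1 ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Values taken from the dmesg line above.
decode_cor_status 0x00001000   # status: prints "Replay Timer Timeout"
decode_cor_status 0x00002000   # mask:   prints "Advisory Non-Fatal Error"
```

The mask value matches the "CEMsk: ... AdvNonFatalErr+" line that lspci reports for the same port.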
Edit: Perhaps this shows the device. If so then it's not the WiFi card after all.
$ sudo lspci -nnkvvv -d 8086:a110
00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #1 [8086:a110] (rev f1) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 123
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
I/O behind bridge: 0000f000-00000fff [empty]
Memory behind bridge: ed200000-ed3fffff [size=2M]
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [empty]
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 256 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #1, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L1 <16us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #4, PowerLimit 10.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet- LinkState+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee00238 Data: 0000
Capabilities: [90] Subsystem: Dell Sunrise Point-H PCI Express Root Port [1028:07be]
Capabilities: [a0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
RootCmd: CERptEn+ NFERptEn+ FERptEn+
RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
ErrorSrc: ERR_COR: 00e0 ERR_FATAL/NONFATAL: 0000
Capabilities: [140 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [220 v1] Secondary PCI Express <?>
Kernel driver in use: pcieport
Finally, I don't know how to resolve this error. Anyone that can point me in a direction would be much appreciated.
Thanks,
Last edited by yaogen (2018-08-27 11:09:43)
How often is this occurring? Is it a one-off, or does it repeat? It might be a slight incompatibility with power management; however, "Corrected" errors mean the hardware managed to recover. Does this only occur on a 4.18 kernel? You might disable PCIe power management (ASPM) with the kernel parameter pcie_aspm=off to check whether this "fixes" the issue. FWIW, to get some more context, could you share a complete dmesg?
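To confirm that the parameter actually took effect after a reboot, something along these lines could be used. It is only a sketch: the helper name aspm_from_cmdline is made up here, it returns the first pcie_aspm= entry it finds, and the sysfs policy file is assumed to be present only on kernels that expose it.

```shell
# Print the value of the first pcie_aspm= entry in a kernel command line
# string, or "unset" if there is none. (Hypothetical helper for illustration.)
aspm_from_cmdline() {
    for word in $1; do    # intentionally unquoted: split on whitespace
        case $word in
            pcie_aspm=*)
                printf '%s\n' "${word#pcie_aspm=}"
                return 0
                ;;
        esac
    done
    printf 'unset\n'
}

# On a running system, report the boot parameter and, if the kernel exposes it,
# the ASPM policy currently in use.
if [ -r /proc/cmdline ]; then
    echo "pcie_aspm parameter: $(aspm_from_cmdline "$(cat /proc/cmdline)")"
fi
if [ -r /sys/module/pcie_aspm/parameters/policy ]; then
    echo "ASPM policy: $(cat /sys/module/pcie_aspm/parameters/policy)"
fi
```

The per-port state is also visible in the "LnkCtl: ASPM ..." line of the lspci -vvv output for 00:1c.0.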
Hi and thanks for your reply.
I didn't see this before upgrading to 4.18 (from 4.17) but I can't swear it didn't occur before.
I've pasted the dmesg log here
FWIW I'm on a Dell XPS 15 9560 laptop.
I did not yet try disabling the pci power management (will get back on that one).
I did some testing: I added pcie_aspm=off to the boot parameters and rebooted. The messages didn't appear anymore, so I was curious to see whether they would reappear with pcie_aspm=on. At first they didn't, so as it stands I don't know why these messages appeared. Edit: The messages reappeared after 1588 sec.
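To answer the "how often" question with numbers rather than by eyeballing dmesg, a small counter like the one below may help. It is a sketch only: the function name count_aer_corrected is made up, and the pattern matches the message wording seen in this thread, which may differ on other kernel versions.

```shell
# Read kernel log text on stdin and print "<count> <device>" for every device
# named in an "AER: Corrected error received:" line.
# Assumption: the device address is the last field of that line, as in the
# log excerpt from this thread.
count_aer_corrected() {
    awk '/AER: Corrected error received:/ { n[$NF]++ }
         END { for (d in n) print n[d], d }'
}

# Example with two lines copied from the log in the first post.
count_aer_corrected <<'EOF'
[ 8877.164418] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[ 8877.164424] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
EOF
# prints: 1 0000:00:1c.0
```

On the live system the input would come from the kernel log, e.g. dmesg | count_aer_corrected (dmesg may require root depending on configuration).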
I am tinkering with PCI passthrough to QEMU and have installed vfio, libvirt etc etc -- perhaps that could be a source?
Last edited by yaogen (2018-08-28 16:09:56)