
#1 2021-11-18 13:09:42

HenriqueHCM
Member
Registered: 2021-11-10
Posts: 15

Arch partially hangs and multiple hardware errors

Good morning,

I'm running Arch (kernel 5.14.16-arch1-1) with LXDE on an HPe Probook 260 G6.

Yesterday, out of nowhere, my system "partially hung". To explain:

- LXDE was still operable so I could do things like open LXTerminal;
- Dhcpcd went down;
- Any sudo command would end with a frozen terminal. I could close that terminal, but "ps aux" would still show the "sudo" command running (frozen) in the background;
- I had to hard reset the system because LXDM would not allow shutdown/reboot/logoff or any other command (nothing happened; it had probably hung just like the "sudo" commands above);
- System works fine after reboot until it hangs again randomly.
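
In case it helps anyone narrow down the frozen "sudo" processes: a minimal Python sketch that picks them out of "ps" output together with their STAT column, so one can tell whether they are in uninterruptible sleep (D) or ordinary sleep (S). The function names are my own, and I'm assuming the procps "ps -eo pid,stat,comm" column layout.

```python
import subprocess


def find_processes(ps_text, command="sudo"):
    """Return (pid, stat) pairs for rows whose command name equals `command`.

    Expects the output of `ps -eo pid,stat,comm`, including its header row.
    """
    matches = []
    for line in ps_text.splitlines()[1:]:  # skip the "PID STAT COMMAND" header
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[2].strip() == command:
            matches.append((int(parts[0]), parts[1]))
    return matches


def live_ps_output():
    """Run ps on the current machine (assumes procps `ps` is installed)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling find_processes(live_ps_output()) during a hang would list the stuck sudo PIDs. A STAT starting with "D" would point towards something blocking in the kernel, but that is only a hint, not a diagnosis.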

One thing I've been putting off looking into is a series of messages that show up on the console when I do a graceful shutdown/restart. I can see those same messages when running "journalctl -b -r":

-- Journal begins at Thu 2021-11-18 09:49:31 -03, ends at Thu 2021-11-18 09:57:40 -03. --
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 33 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 29 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0: AER: can't find device of ID00e8
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 20 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:   device [8086:9d18] error status/mask=00000001/00002000
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 29 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 22 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 20 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 23 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:   device [8086:9d18] error status/mask=00000001/00002000
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 20 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:    [ 0] RxErr                  (First)
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 24 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:   device [8086:9d18] error status/mask=00000001/00002000
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 37 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 30 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:    [ 0] RxErr                  (First)
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 19 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:    [ 0] RxErr                  (First)
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 21 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:    [ 0] RxErr                  (First)
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 34 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 25 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:   device [8086:9d18] error status/mask=00000001/00002000
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 27 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 26 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:    [ 0] RxErr                  (First)
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 21 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0: AER: can't find device of ID00e8
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 18 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:   device [8086:9d18] error status/mask=00000001/00002000
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 15 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:   device [8086:9d18] error status/mask=00000001/00002000
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 25 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:   device [8086:9d18] error status/mask=00000001/00002000
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 33 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:   device [8086:9d18] error status/mask=00000001/00002000
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 27 kernel messages
Nov 18 09:57:40 tldhcm kernel: pcieport 0000:00:1d.0:   device [8086:9d18] error status/mask=00000001/00002000
Nov 18 09:57:40 tldhcm systemd-journald[4111]: Missed 18 kernel messages
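
Since the output above is very repetitive, here is a rough Python sketch for condensing it: it adds up the "Missed N kernel messages" counts and tallies each distinct pcieport message per device. The regular expressions are only fitted to the lines shown here, and the function name is my own.

```python
import re
from collections import Counter

MISSED = re.compile(r"systemd-journald\[\d+\]: Missed (\d+) kernel messages")
PCIEPORT = re.compile(r"kernel: pcieport (\S+): (.*)")


def summarize(journal_text):
    """Return (total missed messages, Counter keyed by (device, message))."""
    missed = 0
    kinds = Counter()
    for line in journal_text.splitlines():
        m = MISSED.search(line)
        if m:
            missed += int(m.group(1))
            continue
        k = PCIEPORT.search(line)
        if k:
            device = k.group(1)
            # Collapse the column padding the kernel adds to some lines.
            message = " ".join(k.group(2).split())
            kinds[(device, message)] += 1
    return missed, kinds
```

Feeding it the text of "journalctl -b -r" and printing kinds.most_common() gives one line per message type instead of a wall of repeats.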

I ran "lspci -k" and found this about the 00:1d.0 device:

00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
	Kernel driver in use: pcieport

Does anyone have any leads on where to go next to troubleshoot this, and on whether those hardware messages are worrisome or not?


#2 2021-11-18 13:40:11

HenriqueHCM
Member
Registered: 2021-11-10
Posts: 15

Re: Arch partially hangs and multiple hardware errors

Well, those messages disappeared after I added the "pcie_aspm=off" boot parameter. I know that will disable some power saving features, but that's not actually an issue.
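
For anyone wanting to double-check that a boot parameter like this actually reached the running kernel, a minimal Python sketch that looks it up in /proc/cmdline. The helper names are my own.

```python
def kernel_param(cmdline, name):
    """Look up `name` in a kernel command line string.

    Returns the value for `name=value`, True for a bare flag,
    and None if the parameter is absent.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else True
    return None


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()
```

After a reboot, kernel_param(read_cmdline(), "pcie_aspm") should come back as "off". If it returns None, the bootloader configuration probably was not regenerated or the wrong entry was booted.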

Now that it's a lot cleaner, I noticed another message in the journalctl output:

[Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0xb2 (or later)

I installed the intel-ucode package, but the message is still showing up, so I'll keep digging into it and watch for another freeze.
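
A quick way to see whether the microcode really got updated is to compare the revision the kernel reports against the 0xb2 from the message. This Python sketch assumes the usual "microcode : 0x.." lines in /proc/cpuinfo on x86; the function names are my own.

```python
import re


def microcode_revisions(cpuinfo_text):
    """Return the distinct microcode revisions found, as sorted integers."""
    found = re.findall(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)",
                       cpuinfo_text, re.MULTILINE)
    return sorted({int(rev, 16) for rev in found})


def needs_update(cpuinfo_text, minimum=0xB2):
    """True if any reported revision is older than `minimum`.

    Returns False when no revision line is found at all, so an empty
    result from microcode_revisions() should be checked separately.
    """
    return any(rev < minimum for rev in microcode_revisions(cpuinfo_text))


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the CPU information exposed by the running kernel."""
    with open(path) as handle:
        return handle.read()
```

If needs_update(read_cpuinfo()) is still True after installing the package, the new microcode is most likely not being loaded at boot yet.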

Last edited by HenriqueHCM (2021-11-18 13:40:39)


#3 2021-11-18 16:31:05

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,950

Re: Arch partially hangs and multiple hardware errors

https://wiki.archlinux.org/title/Microc … ly_loading

- Dhcpcd went down

This is more interesting, in particular if it (or the same cause) leads to a failure in resolving localhost.
Do you have anything in the journal in that regard?


#4 2021-11-18 17:47:39

HenriqueHCM
Member
Registered: 2021-11-10
Posts: 15

Re: Arch partially hangs and multiple hardware errors

And with that all error messages are gone.

For the localhost resolution test, I'll have to wait and see whether it freezes again.
I'll keep you updated.
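
For the resolution test itself, this is roughly what I plan to run during the next hang: a Python sketch that resolves a name in a background thread so the check cannot freeze the terminal the way sudo did. The function name and the timeout value are my own choices.

```python
import queue
import socket
import threading


def resolve_with_timeout(name, timeout=5.0, resolver=socket.getaddrinfo):
    """Try to resolve `name` without blocking longer than `timeout` seconds.

    Returns a sorted list of addresses on success, the OSError instance
    if resolution failed, or None if the resolver did not answer in time.
    """
    result = queue.Queue(maxsize=1)

    def worker():
        try:
            infos = resolver(name, None)
            # Each entry is (family, type, proto, canonname, sockaddr).
            result.put(sorted({info[4][0] for info in infos}))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            result.put(exc)

    # Daemon thread: a resolver that never returns will not block exit.
    threading.Thread(target=worker, daemon=True).start()
    try:
        return result.get(timeout=timeout)
    except queue.Empty:
        return None
```

Running it for both "localhost" and the machine's own hostname (socket.gethostname()) should show whether either lookup errors out or times out while the system is in the hung state.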

