
#1 2023-03-16 11:30:57

Harey
Member
From: Bavaria, Germany
Registered: 2007-03-24
Posts: 352

Network interface not being initialized correctly with lts kernel

I operate three Supermicro servers (DMI: Supermicro X8SIL/X8SIL, BIOS 1.2a  06/27/2012), each with two integrated Ethernet controllers. All currently run the LTS kernel: 6.1.18-1-lts

Every now and then the network interface does not come up, leaving the server inaccessible over its 'normal' interfaces. The IPMI interface still responds, and a reboot normally cures the situation. This happens irregularly, once or twice a week.

The journalctl output in these cases is:

[root@nullnullsix ~]# journalctl -b-1 | grep kernel | grep e1000
Mär 15 23:15:30 nullnullsix kernel: e1000e: Intel(R) PRO/1000 Network Driver
Mär 15 23:15:30 nullnullsix kernel: e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0: Disabling ASPM L0s L1
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0 0000:04:00.0 (uninitialized): Failed to initialize MSI-X interrupts.  Falling back to MSI interrupts.
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0 0000:04:00.0 (uninitialized): Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
Mär 15 23:15:30 nullnullsix kernel: e1000e: probe of 0000:04:00.0 failed with error -2
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0: Disabling ASPM L0s L1
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0 0000:05:00.0 (uninitialized): registered PHC clock
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:25:90:09:bb:41
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0 eth0: Intel(R) PRO/1000 Network Connection
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0 eth0: MAC: 3, PHY: 8, PBA No: 0101FF-0FF
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0 enp5s0: renamed from eth0

A normal boot looks like this:

[root@nullnullsix ~]# journalctl -b | grep kernel | grep e1000
Mär 16 12:02:48 nullnullsix kernel: e1000e: Intel(R) PRO/1000 Network Driver
Mär 16 12:02:48 nullnullsix kernel: e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0: Disabling ASPM L0s L1
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0 0000:04:00.0 (uninitialized): registered PHC clock
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:25:90:09:bb:40
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0 eth0: Intel(R) PRO/1000 Network Connection
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0 eth0: MAC: 3, PHY: 8, PBA No: 0101FF-0FF
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0: Disabling ASPM L0s L1
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0 0000:05:00.0 (uninitialized): registered PHC clock
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0 eth1: (PCI Express:2.5GT/s:Width x1) 00:25:90:09:bb:41
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0 eth1: Intel(R) PRO/1000 Network Connection
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0 eth1: MAC: 3, PHY: 8, PBA No: 0101FF-0FF
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0 enp4s0: renamed from eth0
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0 enp5s0: renamed from eth1
Mär 16 12:02:56 nullnullsix kernel: e1000e 0000:04:00.0 enp4s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
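Comparing the two logs, the bad boot shows the D3cold error followed by `probe of 0000:04:00.0 failed with error -2`, and only one interface gets renamed. To tell good and bad boots apart across the three servers without reading every log by hand, a minimal Python sketch could extract both facts from kernel log lines. The function name `summarize_e1000e` and the regular expressions are hypothetical, written only against the message format quoted above:

```python
import re

# Patterns written against the e1000e messages quoted above (assumption:
# other kernel versions may word these lines differently).
PROBE_FAILED = re.compile(r"e1000e: probe of (\S+) failed with error (-?\d+)")
RENAMED = re.compile(r"e1000e (\S+) ([^\s:]+): renamed from \S+")


def summarize_e1000e(lines):
    """Scan kernel log lines for e1000e probe results.

    Returns (renamed, failed):
      renamed maps PCI address -> final interface name (e.g. "enp5s0")
      failed  maps PCI address -> negative error code of the failed probe
    """
    renamed, failed = {}, {}
    for line in lines:
        m = PROBE_FAILED.search(line)
        if m:
            failed[m.group(1)] = int(m.group(2))
            continue
        m = RENAMED.search(line)
        if m:
            renamed[m.group(1)] = m.group(2)
    return renamed, failed
```

Fed with the failing boot above, it reports 0000:04:00.0 as failed with -2 and only 0000:05:00.0 as renamed (to enp5s0). The input lines could come from e.g. `journalctl -k -b -1`.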

lspci -vv output:

04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: Super Micro Computer Inc X8SIL
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fb5e0000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at dc00 [size=32]
        Region 3: Memory at fb5dc000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout+ AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [140 v1] Device Serial Number xx-xx-xx-xx-xx-xx-xx-xx
        Kernel driver in use: e1000e
        Kernel modules: e1000e

Does anyone have a clue what is going on here?

And no, I can't switch to kernel 6.2, because Wake-on-LAN does not work there.


Greetings
Harvey


Linux is like a wigwam: No Gates, no Windows and an Apache inside


#2 2023-03-16 12:57:20

seth
Member
Registered: 2012-09-03
Posts: 36,837

Re: Network interface not being initialized correctly with lts kernel

Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible

Does cold vs. warm boot matter?
Does the device show up and behave correctly on a rescan?
(e.g. https://stackoverflow.com/questions/323 … f-pcie-bus )
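The linked answer is about removing a single PCIe device and then rescanning the bus. A minimal Python sketch of that sequence, assuming the standard sysfs `remove` and `rescan` attributes and root privileges. The helper name `remove_and_rescan` is hypothetical, and the `sysfs_root` parameter exists only so the sketch can be tried against a scratch directory:

```python
from pathlib import Path


def remove_and_rescan(pci_addr, sysfs_root="/sys"):
    """Detach one PCI function, then ask the kernel to re-enumerate the bus.

    Equivalent to (as root):
      echo 1 > /sys/bus/pci/devices/<addr>/remove
      echo 1 > /sys/bus/pci/rescan
    """
    pci = Path(sysfs_root) / "bus" / "pci"
    # Removing the device unbinds its driver and drops it from the bus.
    (pci / "devices" / pci_addr / "remove").write_text("1")
    # A rescan re-discovers the device, which triggers a fresh driver probe.
    (pci / "rescan").write_text("1")
```

On the affected server this would be `remove_and_rescan("0000:04:00.0")`. Whether a device that reported the D3cold error comes back this way is exactly what the test would show; the sketch does not assume it does.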


#3 2023-03-16 16:28:37

Harey
Member
From: Bavaria, Germany
Registered: 2007-03-24
Posts: 352

Re: Network interface not being initialized correctly with lts kernel

Seth,

First, thanks for your input!

seth wrote:
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible

Does cold vs. warm boot matter?

No. I've had it after a cold boot as well as after a reboot. It turns up sporadically, without any pattern as far as I can tell.

seth wrote:

Does the device show up and behave correctly on a rescan?
(e.g. https://stackoverflow.com/questions/323 … f-pcie-bus )

I'll have to wait for the next failure to test that, but it's a good thing to try.
At some point I suspected it could be caused by the BMC sharing the same network port with the 'normal' LAN. I'll have to connect an additional network cable the next time I'm at the server to rule that out. But then why did it work with pre-6 kernels? Hmm.

Greetings
Harvey



#4 2023-03-18 12:41:21

Harey
Member
From: Bavaria, Germany
Registered: 2007-03-24
Posts: 352

Re: Network interface not being initialized correctly with lts kernel

Okay, today it failed again. I logged into the machine via the IPMI console, and the network devices do show up on the PCI bus:

[root@nullnullsix ~]# lspci | grep Ethernet
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
05:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

Hence, I don't think rescanning the PCI bus is the way to go.
Nevertheless, I tried

echo 1 > /sys/bus/pci/rescan

without any change.
The second network interface, however, seems to be functional (no cable is connected to it):

[root@nullnullsix ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp5s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 00:25:90:09:bb:41 brd ff:ff:ff:ff:ff:ff

Note that enp4s0 is missing...
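Since 04:00.0 is still listed by `lspci` but has no interface, the next question is whether any driver is bound to it at all, and whether asking e1000e to probe it again helps. A minimal Python sketch for both, assuming the standard sysfs `driver` symlink and `net/` directory under each PCI device and the driver's `bind` attribute. `nic_status` and `request_bind` are hypothetical helper names, and whether a manual bind actually brings enp4s0 back after the D3cold error is untested:

```python
from pathlib import Path


def nic_status(pci_addr, sysfs_root="/sys"):
    """Classify a PCI network function by what sysfs shows for it.

    Returns (state, interface_names) where state is one of:
      "absent"    - no such device under bus/pci/devices
      "no-driver" - device enumerated, but no driver bound (e.g. probe failed)
      "no-netdev" - driver bound, but no network interface registered
      "ok"        - at least one interface under the device's net/ directory
    """
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr
    if not dev.exists():
        return "absent", []
    # The "driver" symlink only exists while a driver is bound;
    # a failed probe leaves none behind.
    if not (dev / "driver").exists():
        return "no-driver", []
    net = dev / "net"
    names = sorted(p.name for p in net.iterdir()) if net.is_dir() else []
    return ("ok" if names else "no-netdev"), names


def request_bind(pci_addr, driver="e1000e", sysfs_root="/sys"):
    """Ask a driver to probe a device again (needs root on a real system).

    Equivalent to: echo <addr> > /sys/bus/pci/drivers/<driver>/bind
    """
    bind = Path(sysfs_root) / "bus" / "pci" / "drivers" / driver / "bind"
    bind.write_text(pci_addr)
```

In the failed state described here, one would expect `nic_status("0000:04:00.0")` to return `("no-driver", [])`, which would explain why a plain bus rescan changed nothing: the device is already enumerated, it just has no driver.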
I have a strong feeling that this could be related to the IPMI device and the 'normal' interface sharing the same physical port. I will try giving the IPMI a dedicated network interface and cable and see whether the problem persists. Weird.

