
#1 2023-03-16 11:30:57

Harey
Member
From: Bavaria, Germany
Registered: 2007-03-24
Posts: 359

Network interface not being initialized correctly with lts kernel

I operate 3 Supermicro servers (DMI: Supermicro X8SIL/X8SIL, BIOS 1.2a  06/27/2012) which have 2 integrated Ethernet controllers. All use the lts kernel atm: 6.1.18-1-lts

Every now and then the network interface does not come up, leaving the server inaccessible over the 'normal' interfaces. The IPMI interface still responds, and a reboot normally cures the situation. This happens irregularly, about once or twice a week.

The journalctl output in these cases is:

[root@nullnullsix ~]# journalctl -b-1 | grep kernel | grep e1000
Mär 15 23:15:30 nullnullsix kernel: e1000e: Intel(R) PRO/1000 Network Driver
Mär 15 23:15:30 nullnullsix kernel: e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0: Disabling ASPM L0s L1
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0 0000:04:00.0 (uninitialized): Failed to initialize MSI-X interrupts.  Falling back to MSI interrupts.
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0 0000:04:00.0 (uninitialized): Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
Mär 15 23:15:30 nullnullsix kernel: e1000e: probe of 0000:04:00.0 failed with error -2
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0: Disabling ASPM L0s L1
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0 0000:05:00.0 (uninitialized): registered PHC clock
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:25:90:09:bb:41
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0 eth0: Intel(R) PRO/1000 Network Connection
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0 eth0: MAC: 3, PHY: 8, PBA No: 0101FF-0FF
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:05:00.0 enp5s0: renamed from eth0

A normal boot looks like this:

[root@nullnullsix ~]# journalctl -b | grep kernel | grep e1000
Mär 16 12:02:48 nullnullsix kernel: e1000e: Intel(R) PRO/1000 Network Driver
Mär 16 12:02:48 nullnullsix kernel: e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0: Disabling ASPM L0s L1
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0 0000:04:00.0 (uninitialized): registered PHC clock
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:25:90:09:bb:40
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0 eth0: Intel(R) PRO/1000 Network Connection
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0 eth0: MAC: 3, PHY: 8, PBA No: 0101FF-0FF
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0: Disabling ASPM L0s L1
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0 0000:05:00.0 (uninitialized): registered PHC clock
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0 eth1: (PCI Express:2.5GT/s:Width x1) 00:25:90:09:bb:41
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0 eth1: Intel(R) PRO/1000 Network Connection
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0 eth1: MAC: 3, PHY: 8, PBA No: 0101FF-0FF
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:04:00.0 enp4s0: renamed from eth0
Mär 16 12:02:48 nullnullsix kernel: e1000e 0000:05:00.0 enp5s0: renamed from eth1
Mär 16 12:02:56 nullnullsix kernel: e1000e 0000:04:00.0 enp4s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
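
The failed and the normal boot differ mainly in the "probe of ... failed with error" line. As a minimal sketch for spotting affected devices quickly, the helper below (the function name failed_e1000e_probes is made up for illustration) reads kernel log lines on stdin and prints the PCI address of each e1000e device whose probe failed:

```shell
# Print the PCI address of every e1000e device whose probe failed.
# Reads kernel log lines on stdin, e.g. from: journalctl -k -b -1
failed_e1000e_probes() {
    # -o keeps only the matching part, so the timestamp prefix is dropped;
    # the address is then the 4th space-separated field of that match.
    grep -o 'e1000e: probe of [0-9a-f:.]* failed' | cut -d' ' -f4
}
```

It could be fed with something like `journalctl -k -b -1 | failed_e1000e_probes`; no output would mean that no probe failure was logged for that boot.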

lspci -vv output:

04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: Super Micro Computer Inc X8SIL
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fb5e0000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at dc00 [size=32]
        Region 3: Memory at fb5dc000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout+ AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [140 v1] Device Serial Number xx-xx-xx-xx-xx-xx-xx-xx
        Kernel driver in use: e1000e
        Kernel modules: e1000e

Maybe someone has a clue here?

And no, I can't switch to kernel 6.2, because Wake-on-LAN is not working there :(


Greetings
Harvey


Linux is like a wigwam: No Gates, no Windows and an Apache inside


#2 2023-03-16 12:57:20

seth
Member
Registered: 2012-09-03
Posts: 50,008

Re: Network interface not being initialized correctly with lts kernel

Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible

Does cold vs. warm boot matter?
Does the device show up and behave correctly on a rescan?
(e.g. https://stackoverflow.com/questions/323 … f-pcie-bus )


#3 2023-03-16 16:28:37

Harey
Member
From: Bavaria, Germany
Registered: 2007-03-24
Posts: 359

Re: Network interface not being initialized correctly with lts kernel

Seth,

first, thanks for your input!

seth wrote:
Mär 15 23:15:30 nullnullsix kernel: e1000e 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible

Does cold vs. warm boot matter?

No. I have had it after a cold boot as well as after a reboot. It turns up sporadically, without any pattern afaict.

seth wrote:

Does the device show up and behave correctly on a rescan?
(e.g. https://stackoverflow.com/questions/323 … f-pcie-bus )

I will have to wait for the next failure to test that, but it is a good thing to try.
At some point I suspected that it could be the BMC sharing the same network port with the 'normal' LAN. I will have to connect an additional network cable next time I am on site with the server to rule that out. But then why did it work with pre-6 kernels... :/

Greetings
Harvey



#4 2023-03-18 12:41:21

Harey
Member
From: Bavaria, Germany
Registered: 2007-03-24
Posts: 359

Re: Network interface not being initialized correctly with lts kernel

Okay, today it did fail again. So I logged into the machine using IPMI console and it looks like the network devices do show up on the PCI bus:

[root@nullnullsix ~]# lspci | grep Ethernet
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
05:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

Hence rescanning the PCI bus is probably not the way to go.
Nevertheless, I did try

echo 1 > /sys/bus/pci/rescan

without any change.
But the second network interface seems to be functional (no cable is connected to it):

[root@nullnullsix ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp5s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 00:25:90:09:bb:41 brd ff:ff:ff:ff:ff:ff

Note that enp4s0 is missing...
I get a strong feeling that this could be related to the IPMI device and the 'normal' interface sharing the same physical interface. I will try to give the IPMI a dedicated network interface and cable and see if the problem persists... Weird :/
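
Regarding the rescan attempt: as far as I understand, writing to /sys/bus/pci/rescan only looks for devices the kernel does not know about yet, so a device that is still listed but has no driver bound may simply be skipped. A variation that might be worth trying is to remove the device first and rescan afterwards. The following is only a sketch; the function name pci_reprobe and its second parameter are made up for illustration, and whether a device stuck in D3cold comes back this way is untested:

```shell
# Remove a PCI device from the bus, then rescan so the kernel probes it again.
# $1: PCI address, e.g. 0000:04:00.0
# $2: sysfs PCI root (optional, defaults to /sys/bus/pci)
pci_reprobe() {
    addr=$1
    root=${2:-/sys/bus/pci}
    # Drop the stale device entry first, if it exists.
    if [ -e "$root/devices/$addr/remove" ]; then
        echo 1 > "$root/devices/$addr/remove"
    fi
    # Ask the kernel to rescan the bus and pick the device up again.
    echo 1 > "$root/rescan"
}
```

Run as root it would be called as `pci_reprobe 0000:04:00.0`, followed by a look at `ip addr show` to see whether enp4s0 has appeared.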



#5 2023-03-29 14:14:58

Harey
Member
From: Bavaria, Germany
Registered: 2007-03-24
Posts: 359

Re: Network interface not being initialized correctly with lts kernel

Okay, so that was the wrong idea. I gave the BMC a dedicated cable and set the IPMI network to 'dedicated', which keeps it off the normal network interfaces. And right after the next boot:

[root@numalfix ~]# journalctl -b-1 | grep e1000
Mär 29 16:04:05 numalfix kernel: e1000e: Intel(R) PRO/1000 Network Driver
Mär 29 16:04:05 numalfix kernel: e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
Mär 29 16:04:05 numalfix kernel: e1000e 0000:04:00.0: Disabling ASPM L0s L1
Mär 29 16:04:05 numalfix kernel: e1000e 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Mär 29 16:04:05 numalfix kernel: e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Mär 29 16:04:05 numalfix kernel: e1000e 0000:04:00.0 0000:04:00.0 (uninitialized): Failed to initialize MSI-X interrupts.  Falling back to MSI interrupts.
Mär 29 16:04:05 numalfix kernel: e1000e 0000:04:00.0 0000:04:00.0 (uninitialized): Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
Mär 29 16:04:05 numalfix kernel: e1000e: probe of 0000:04:00.0 failed with error -2
Mär 29 16:04:05 numalfix kernel: e1000e 0000:05:00.0: Disabling ASPM L0s L1
Mär 29 16:04:05 numalfix kernel: e1000e 0000:05:00.0: Unable to change power state from D3cold to D0, device inaccessible
Mär 29 16:04:05 numalfix kernel: e1000e 0000:05:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Mär 29 16:04:05 numalfix kernel: e1000e 0000:05:00.0 0000:05:00.0 (uninitialized): Failed to initialize MSI-X interrupts.  Falling back to MSI interrupts.
Mär 29 16:04:05 numalfix kernel: e1000e 0000:05:00.0 0000:05:00.0 (uninitialized): Failed to initialize MSI interrupts.  Falling back to legacy interrupts.
Mär 29 16:04:05 numalfix kernel: e1000e: probe of 0000:05:00.0 failed with error -2

Both network interfaces were inactive. After a reset (via the management interface which is working...) all is back to normal:

[root@numalfix ~]# journalctl -b | grep e1000
Mär 29 16:05:08 numalfix kernel: e1000e: Intel(R) PRO/1000 Network Driver
Mär 29 16:05:08 numalfix kernel: e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
Mär 29 16:05:08 numalfix kernel: e1000e 0000:04:00.0: Disabling ASPM L0s L1
Mär 29 16:05:08 numalfix kernel: e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Mär 29 16:05:08 numalfix kernel: e1000e 0000:04:00.0 0000:04:00.0 (uninitialized): registered PHC clock
Mär 29 16:05:08 numalfix kernel: e1000e 0000:04:00.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:25:90:37:67:f4
Mär 29 16:05:08 numalfix kernel: e1000e 0000:04:00.0 eth0: Intel(R) PRO/1000 Network Connection
Mär 29 16:05:08 numalfix kernel: e1000e 0000:04:00.0 eth0: MAC: 3, PHY: 8, PBA No: 0101FF-0FF
Mär 29 16:05:08 numalfix kernel: e1000e 0000:05:00.0: Disabling ASPM L0s L1
Mär 29 16:05:08 numalfix kernel: e1000e 0000:05:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Mär 29 16:05:08 numalfix kernel: e1000e 0000:05:00.0 0000:05:00.0 (uninitialized): registered PHC clock
Mär 29 16:05:08 numalfix kernel: e1000e 0000:05:00.0 eth1: (PCI Express:2.5GT/s:Width x1) 00:25:90:37:67:f5
Mär 29 16:05:08 numalfix kernel: e1000e 0000:05:00.0 eth1: Intel(R) PRO/1000 Network Connection
Mär 29 16:05:08 numalfix kernel: e1000e 0000:05:00.0 eth1: MAC: 3, PHY: 8, PBA No: 0101FF-0FF
Mär 29 16:05:08 numalfix kernel: e1000e 0000:04:00.0 enp4s0: renamed from eth0
Mär 29 16:05:08 numalfix kernel: e1000e 0000:05:00.0 enp5s0: renamed from eth1
Mär 29 16:05:11 numalfix kernel: e1000e 0000:04:00.0 enp4s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

That was my best guess so far. Does anyone else have an idea? :/

Last edited by Harey (2023-03-29 14:15:34)



#6 2023-03-29 14:58:30

seth
Member
Registered: 2012-09-03
Posts: 50,008

Re: Network interface not being initialized correctly with lts kernel

"pcie_aspm=off"?
Add e1000e to the initramfs?

Did you check the journal for bus errors preceding the device failure?
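
For reference, on a system using mkinitcpio the second suggestion usually amounts to listing e1000e in the MODULES array of /etc/mkinitcpio.conf and regenerating the images, while pcie_aspm=off goes on the kernel command line in the boot loader configuration. A small sketch, with a hypothetical helper initramfs_has_module to double-check the config edit:

```shell
# In /etc/mkinitcpio.conf (keep any modules that are already listed):
#   MODULES=(e1000e)
# Then regenerate all initramfs images as root:
#   mkinitcpio -P
# pcie_aspm=off is a kernel parameter and belongs in the boot loader config.

# Succeeds if module $1 is listed in the MODULES=(...) line of config file $2
# (defaults to /etc/mkinitcpio.conf).
initramfs_has_module() {
    grep -Eq "^MODULES=\((.*[[:space:]])?$1([[:space:]].*)?\)" "${2:-/etc/mkinitcpio.conf}"
}
```

For example, `initramfs_has_module e1000e && echo listed` would print "listed" once the edit is in place.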


#7 2023-03-29 15:51:48

Harey
Member
From: Bavaria, Germany
Registered: 2007-03-24
Posts: 359

Re: Network interface not being initialized correctly with lts kernel

I did check the journal: no bus errors, not even warnings beforehand. For now I have added the module to the initramfs. Let's see what happens.
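
A sketch of how such a journal check could be narrowed down: the helper below (pcie_error_lines is a made-up name, and the pattern list is an assumption, not an exhaustive set of kernel messages) keeps only lines that hint at PCIe trouble.

```shell
# Keep only kernel log lines that hint at PCIe bus or power-state trouble.
# Reads log lines on stdin, e.g. from: journalctl -k -b -1
pcie_error_lines() {
    # Assumed patterns: AER reports, "PCIe Bus Error" summaries and the
    # power-state message seen in the failing boots of this thread.
    grep -Ei 'AER:|PCIe Bus Error|Unable to change power state'
}
```

Usage would be along the lines of `journalctl -k -b -1 | pcie_error_lines`, looking at whatever appears before the first e1000e line.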



#8 2023-04-08 01:28:30

prokrypt
Member
Registered: 2006-04-06
Posts: 6

Re: Network interface not being initialized correctly with lts kernel

seth wrote:

"pcie_aspm=off"?
Add e1000e to the initramfs?

Did both and it fixed my problem on an X8SIE upon reboot. Then I removed the kernel switch and it remained fixed, so most likely the mkinitcpio.conf edit made it work.
Not sure when my ports stopped working, as the machine was connected to my Google Wifi mesh, which falls back to wireless...

Last edited by prokrypt (2023-04-08 01:30:18)


#9 2023-04-08 11:05:14

Harey
Member
From: Bavaria, Germany
Registered: 2007-03-24
Posts: 359

Re: Network interface not being initialized correctly with lts kernel

@prokrypt: Is this on 6.2 or the lts kernel? At least I am not alone :/
The mkinitcpio.conf edit makes it a lot more stable for me too, but it's not fixed completely... Yesterday I played around with one of the servers and had to reboot several times, and look, here it is again... But only once. So far I can't tell why this is happening. I had hoped that the move to the 6.2 kernel with the Wake-on-LAN problem fixed would squash this bug as well :P



#10 2023-04-08 19:48:09

prokrypt
Member
Registered: 2006-04-06
Posts: 6

Re: Network interface not being initialized correctly with lts kernel

I was on 6.2.8.
Maybe I should upgrade to 6.2.10 and roll the dice again? Or perhaps just enjoy my ethernet while it's still working :D

Last edited by prokrypt (2023-04-08 19:49:08)

