Detected Hardware Unit Hang : Reset adapter unexpectedly

erikvv · 2013-05-06 22:28:13

Hello all!

### summary ###

I upgraded my kernel from 3.5 to 3.8, and since then I have been experiencing timeouts on one of the network interfaces. When this happens, journalctl shows the following error:

May 06 23:23:41 PRIME kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
  TDH                  <3d>
  TDT                  <92>
  next_to_use          <92>
  next_to_clean        <3d>
buffer_info[next_to_clean]:
  time_stamp           <100039539>
  next_to_watch        <3e>
  jiffies              <100039856>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
May 06 23:23:43 PRIME kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
  TDH                  <3d>
  TDT                  <92>
  next_to_use          <92>
  next_to_clean        <3d>
buffer_info[next_to_clean]:
  time_stamp           <100039539>
  next_to_watch        <3e>
  jiffies              <100039aae>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
May 06 23:23:44 PRIME kernel: e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
May 06 23:23:44 PRIME kernel: br0: port 1(eno1) entered disabled state
May 06 23:23:48 PRIME kernel: e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 06 23:23:48 PRIME kernel: br0: port 1(eno1) entered forwarding state
May 06 23:23:48 PRIME kernel: br0: port 1(eno1) entered forwarding state
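The hang reports can be read with a little hex arithmetic. The bash sketch below computes two values. The first is how many jiffies the stuck descriptor has been waiting (jiffies minus time_stamp). The second is how many descriptors sit between the hardware head (TDH) and the tail the driver has written (TDT). The function names are made up for illustration. The ring size of 256 is an assumption; `ethtool -g eno1` shows the real value.

```bash
#!/usr/bin/env bash
# Decode two fields from an e1000e "Detected Hardware Unit Hang" report.
# Bash arithmetic accepts the 0x-prefixed hex values exactly as logged.

# Jiffies elapsed since the stuck transmit descriptor was queued.
stall_jiffies() {
    local time_stamp=$1 jiffies=$2
    echo $(( jiffies - time_stamp ))
}

# Descriptors queued by the driver (TDT) but not yet processed by the
# hardware (TDH). ring_size defaults to 256, which is an ASSUMPTION.
pending_descriptors() {
    local tdh=$1 tdt=$2 ring_size=${3:-256}
    echo $(( (tdt - tdh + ring_size) % ring_size ))
}

# Values copied from the first report above.
stall_jiffies 0x100039539 0x100039856   # prints 797
pending_descriptors 0x3d 0x92           # prints 85
```

The second report gives 1397 jiffies for the same descriptor. In both reports TDH is unchanged and next_to_clean still equals TDH, so the hardware head appears not to have moved at all. The reports are about 2 s apart in the journal and 600 jiffies apart. That is consistent with a kernel built with HZ=300, though this is an inference rather than something the log states.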

### Distro ###

Originally I was using Ubuntu. I upgraded from 12.10 to 13.04, and then the issue started. I borked my kernel in an attempt to fix things, after which I switched to Arch, but the issue remains.

### diagnostics ###

The motherboard is an ASUS P9X79 Deluxe (lspci incorrectly reports it as a P8P67 Deluxe). It has 3 network interfaces: one wireless, one Realtek 8111E (enp10s0) and one Intel 82579V (eno1). Only the Intel is affected by this problem.

lspci -vvv

....
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
        Subsystem: ASUSTeK Computer Inc. P8P67 Deluxe Motherboard
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 92
        Region 0: Memory at fbf00000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at fbf28000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at f040 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00000  Data: 4055
        Capabilities: [e0] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: e1000e
        Kernel modules: e1000e
...

ethtool -i eno1

driver: e1000e
version: 2.3.2-NAPI
firmware-version: 0.13-4
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

ethtool -k eno1

Features for eno1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off
rx-all: off
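TCP segmentation offload (tcp-segmentation-offload) is on in the output above. It is the offload that later replies in this thread suggest disabling. A small bash helper can pull just that line's state out of `ethtool -k` output; the name `tso_state` is hypothetical.

```bash
#!/usr/bin/env bash
# Print the state of TCP segmentation offload ("on", "off" or "off [fixed]")
# from `ethtool -k` output read on stdin. Indented sub-features such as
# tx-tcp-segmentation do not match because of their leading whitespace.
tso_state() {
    awk -F': ' '$1 == "tcp-segmentation-offload" { print $2 }'
}

# Usage on a live system (requires ethtool):
#   ethtool -k eno1 | tso_state
```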

### use case ###

I am using this Linux box as a router, as shown below. It also runs Samba and dhcpd for the LAN.

                           #----------#   #-----#   #--------#
                           |          |   |     |---|  eno1  |---(LAN)
             #---------#   |          |   |     |   #--------#   
(internet)---| enp10s0 |---| COMPUTER |---| br0 |         
             #---------#   |          |   |     |   #--------#   
                           |          |   |     |---| wlp8s0 |---(LAN)
                           #----------#   #-----#   #--------#

I use netctl to set up enp10s0 (dhcp) and br0 (static), and hostapd to set up wlp8s0.

iptables-save output (I'm running a minimal rule set to keep things simple for now):

*nat
:PREROUTING ACCEPT [7840:680033]
:INPUT ACCEPT [2619:163187]
:OUTPUT ACCEPT [283:17319]
:POSTROUTING ACCEPT [929:78831]
-A POSTROUTING -s 10.0.0.0/24 -o enp10s0 -j MASQUERADE
COMMIT

*filter
:INPUT ACCEPT [98426:606304255]
:FORWARD ACCEPT [7047:594273]
:OUTPUT ACCEPT [106432:61533766]
COMMIT

### reproducibility ###

The issue seems random and occurs under different circumstances, but I've found a 100% reproducible case: on a PC on the LAN, I use a Firefox download manager to download a large file at 10 MB/s and save it directly to the router over SMB or SCP. The error occurs within a few seconds.

Oddly, when I download a file to a local drive and transfer it to the router afterwards (at much higher speed), the issue does not occur.
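While reproducing the transfer, it can help to count or watch the hang messages in the kernel log. Below is a minimal bash sketch, assuming systemd's journalctl (already used above); `count_hangs` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Count e1000e hang reports in kernel log text read on stdin.
# grep -c prints 0 (and exits with status 1) when nothing matches.
count_hangs() {
    grep -c 'Detected Hardware Unit Hang'
}

# On the router (may need root or membership in the systemd-journal group):
#   journalctl -k -b | count_hangs    # hangs since the current boot
#   journalctl -k -f | grep --line-buffered 'Detected Hardware Unit Hang'
```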

### what I've tried ###

- fresh installation
- updated the BIOS
- installed the latest drivers from Intel
- removed the network bridge and wireless interface, and ran the system with only the 2 wired interfaces
- replaced the physical cables
- turned off auto-negotiation
- disabled RX flow control
- enabled ARP filtering

Nothing changed.

### Final notes ###

I must say I'm very content with Arch so far. The wiki is great and many actions are simpler than on Ubuntu.

This is the only Linux PC I use, so please take it slow with me.

One more thing I could try is swapping the interfaces: using the Realtek for the LAN and the Intel for the internet. But then, when the issue occurs, I might not notice it, while the people using the server remotely (of whom there are many more) would be affected.

I really hope someone can help.

Last edited by erikvv (2013-05-22 20:45:32)

erikvv · 2013-05-06 23:18:53

I've swapped the network interfaces and the issue is gone. I'm still curious about the cause, though; Intel NICs are supposed to be better than Realtek ones.

Last edited by erikvv (2013-05-06 23:19:21)

erikvv · 2013-05-10 00:01:11

Alas, the problem appeared in the new configuration too, so it is not solved.

erikvv · 2013-05-14 13:23:53

I've contacted Asus support, and they want me to test on Windows.

Nothing against Windows, but I'd have to buy a license and take my entire server offline again (I don't have another system to replace it with; I lease Minecraft servers on it, which requires powerful hardware).

Last edited by erikvv (2013-05-14 13:25:14)

erikvv · 2013-05-20 19:25:43

nvm

Last edited by erikvv (2013-05-22 20:45:41)

demize · 2013-06-26 16:06:34

From #archlinux:

18:03:45 onox | demize: could you post at https://bbs.archlinux.org/viewtopic.php?id=162841 the following: I tried to disable tcp-segmentation-offload with: ethtool -K eno1 tso off (seems to work for my 82579LM)

Last edited by demize (2013-06-26 16:07:27)

adamcstephens · 2014-02-08 05:00:17

This worked for me on the 82566MM in my laptop, which I use as a NAT gateway. In my case the hangs were on the internal (private) interface.

ethtool -K enp0s25 tso off

To make it permanent, add this to /etc/netctl/<profile>:

ExecUpPost='/usr/bin/ethtool -K enp0s25 tso off'
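In the original poster's setup, eno1 has no netctl profile of its own; it is enslaved to br0, so the hook could go in the bridge profile instead. Below is a sketch of such a profile (netctl profiles are sourced as bash). The file name /etc/netctl/br0 and the 10.0.0.1/24 address are assumptions; the address is only guessed from the 10.0.0.0/24 MASQUERADE rule above.

```bash
# /etc/netctl/br0 (hypothetical file name): static LAN bridge over eno1
Description='LAN bridge'
Interface=br0
Connection=bridge
BindsToInterfaces=(eno1)
IP=static
Address=('10.0.0.1/24')   # ASSUMED router address on the 10.0.0.0/24 LAN
# Work around the e1000e hang by disabling TCP segmentation offload on eno1
ExecUpPost='/usr/bin/ethtool -K eno1 tso off'
```

After `netctl restart br0`, `ethtool -k eno1` should report `tcp-segmentation-offload: off`.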

Last edited by adamcstephens (2014-02-08 05:07:19)

Arch Linux
