I've noticed that my machine drops up to 10% RX packets, as reported by ifconfig. I see no packet loss on TX, nor anywhere else on my network. The NIC/machine is hanging off a Netgear switch, and I tried changing the cable/switch port to no effect.
Googling led me to believe that there is an issue with the r8169 module used for my Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller hardware. I tried installing the r8168 package and blacklisting the r8169 module. I verified with lspci -v that the r8168 module was in use, but it didn't resolve the RX packet drops, so I've reverted to the r8169 module for now.
Here's some of what I've done so far. Default config:
lspci -v
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 11)
Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
Flags: bus master, fast devsel, latency 0, IRQ 18
I/O ports at e000 [size=256]
Memory at f7c00000 (64-bit, non-prefetchable) [size=4K]
Memory at f0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 01-00-00-00-58-6c-e0-00
Capabilities: [170] Latency Tolerance Reporting
Kernel driver in use: r8169
Kernel modules: r8169
ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.0.10 netmask 255.255.255.0 broadcast 10.0.0.255
inet6 fe50::feaa:13ff:fe78:b543 prefixlen 64 scopeid 0x20<link>
ether fc:aa:14:78:b4:43 txqueuelen 1000 (Ethernet)
RX packets 13762 bytes 7634587 (7.2 MiB)
RX errors 0 dropped 988 overruns 0 frame 0
TX packets 4357 bytes 1028463 (1004.3 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
And here is the output after installing/loading r8168 and blacklisting/unloading r8169:
lspci -v
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 11)
Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet
Flags: bus master, fast devsel, latency 0, IRQ 30
I/O ports at e000 [size=256]
Memory at f7c00000 (64-bit, non-prefetchable) [size=4K]
Memory at f0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 01-00-00-00-58-6c-e0-00
Capabilities: [170] Latency Tolerance Reporting
Kernel driver in use: r8168
Kernel modules: r8169, r8168
ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.0.10 netmask 255.255.255.0 broadcast 10.0.0.255
inet6 fe80::feaa:14ff:fe98:b543 prefixlen 64 scopeid 0x20<link>
ether fc:aa:14:78:b4:43 txqueuelen 1000 (Ethernet)
RX packets 5076 bytes 1805105 (1.7 MiB)
RX errors 0 dropped 456 overruns 0 frame 0
TX packets 1792 bytes 411024 (401.3 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 30 base 0xb000
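For a quick comparison of the two runs, the drop rate can be worked out from the counters in the ifconfig outputs above. This is only a sketch: the helper name drop_ratio is made up for illustration, and it assumes the "dropped" count should be compared against the "RX packets" count exactly as ifconfig prints them.

```python
def drop_ratio(rx_packets: int, rx_dropped: int) -> float:
    """Return rx_dropped as a percentage of rx_packets."""
    if rx_packets <= 0:
        raise ValueError("rx_packets must be positive")
    return 100.0 * rx_dropped / rx_packets


# Counters copied from the two ifconfig outputs above: (RX packets, dropped).
samples = {
    "r8169": (13762, 988),
    "r8168": (5076, 456),
}

for driver, (packets, dropped) in samples.items():
    print(f"{driver}: {drop_ratio(packets, dropped):.1f}% of RX packets dropped")
# r8169: 7.2% of RX packets dropped
# r8168: 9.0% of RX packets dropped
```

Both drivers land in the same range, which fits the observation that switching modules made no difference.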
I booted the hardware off a MintXFCE 19 LiveCD and didn't see the packet loss while in the Live environment. My only thought at this time is to further explore the differences between the module in the Live environment and the one in my Arch instance. One difference that may be noteworthy is that in the Live environment lsmod shows
r8169 86016 0
mii 16384 1 r8169
and my Arch instance shows:
r8169 90112 0
libphy 77824 2 r8169
This is the only difference I can find, but I'm not sure where to go from here, even if it's noteworthy...any input?
*edit*
I kept digging and noticed that ethtool does not show the dropped packets, even though they appear in ifconfig output. Along the way, I learned that there have been many issues with the r8168 and r8169 modules, but that r8169 is now the correct module for my hardware. I also found this document, which says that ethtool output should be preferred over ifconfig output: https://access.redhat.com/solutions/504293
I found this "issue" initially because I started running netdata on my server and then got into network tuning. I have been able to tune out all of the buffer issues which were causing legitimate packet loss, but netdata is still reporting the high RX packet loss. netdata must be pulling this data from the same source as ifconfig. I suspect it's /sys/class/net/eth0/statistics/ but I will have to dig into netdata to confirm.
If the RH doc is to be believed, ifconfig and /sys/class/net/eth0/statistics/ are not to be believed over ethtool, and I should find a way to squelch this alarm in netdata. I'm not 100% satisfied with all of this, but maybe I've solved my own problem by coming to the conclusion that there is no problem...
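To see what the sysfs counters say without going through ifconfig or netdata, the files under /sys/class/net/eth0/statistics/ can be read directly. A minimal sketch: the function name read_rx_counters and its base parameter are assumptions added for illustration, and whether netdata reads these same files is still unconfirmed.

```python
from pathlib import Path


def read_rx_counters(iface: str = "eth0", base: str = "/sys/class/net") -> dict:
    """Read a few RX counters for one interface from sysfs.

    `base` is overridable only so the function can be pointed at a
    test directory; on a real system the default is the sysfs path.
    """
    stats_dir = Path(base) / iface / "statistics"
    counters = {}
    for name in ("rx_packets", "rx_dropped", "rx_errors"):
        # Each file holds a single decimal integer followed by a newline.
        counters[name] = int((stats_dir / name).read_text().strip())
    return counters


if __name__ == "__main__":
    try:
        print(read_rx_counters("eth0"))
    except OSError as exc:
        # No eth0 on this machine, or sysfs is not available.
        print(f"could not read counters: {exc}")
```

Comparing rx_dropped here against the ethtool statistics shows whether the drops are counted by the kernel's network stack rather than by the NIC.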
Last edited by tixetsal (2018-11-21 20:49:42)
It turns out that the dropped packets were some type of traffic being generated by the 2 small switches on my network, which are not even involved in this scenario. I noticed that the dropped packets were always reported in twos. I unplugged one switch, and the dropped packets decreased to one at a time. I unplugged the other switch, and the dropped packets stopped!
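The "always in twos" pattern is easiest to spot by looking at the difference between successive counter readings rather than the running total. A small sketch of that idea: counter_deltas is a made-up helper, and the readings below are invented illustrative numbers, not values captured from this machine.

```python
def counter_deltas(samples):
    """Return the increase between each pair of consecutive readings."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


# Hypothetical rx_dropped readings taken at a fixed interval.
both_switches = [100, 102, 104, 106]   # two sources: counter steps by 2
one_switch = [106, 107, 108, 109]      # one source left: steps by 1
no_switches = [109, 109, 109, 109]     # both unplugged: counter is flat

print(counter_deltas(both_switches))   # [2, 2, 2]
print(counter_deltas(one_switch))      # [1, 1, 1]
print(counter_deltas(no_switches))     # [0, 0, 0]
```

A constant step size that changes when a device is unplugged points at periodic traffic from that device rather than at random loss.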
This document https://access.redhat.com/solutions/657483 leads me to believe that the switches are sending frames that either 1) carry a bad VLAN tag or 2) use an unknown or unregistered protocol. All of the data I collected agrees with the RH doc and the general consensus that it's better to trust ethtool than ifconfig. Resolved!
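To make the "unknown protocol" case concrete, here is a rough sketch of how a raw Ethernet frame's protocol field can be inspected. It is not how the kernel decides what to drop: classify_frame and the short KNOWN_ETHERTYPES table are assumptions for illustration, and a real system registers handlers for many more protocols.

```python
import struct

# A deliberately small table; real systems handle many more protocols.
KNOWN_ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}
VLAN_TPID = 0x8100  # 802.1Q tag marker


def classify_frame(frame: bytes) -> str:
    """Name the protocol carried by an Ethernet frame, if recognised."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: type/length.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == VLAN_TPID:
        if len(frame) < 18:
            raise ValueError("truncated 802.1Q header")
        # A 4-byte VLAN tag pushes the real EtherType to bytes 16-17.
        (ethertype,) = struct.unpack("!H", frame[16:18])
    if ethertype < 0x0600:
        # Values below 0x0600 are an 802.3 length field, not an EtherType.
        return "802.3 length field (LLC frame)"
    return KNOWN_ETHERTYPES.get(ethertype, f"unknown (0x{ethertype:04x})")


# Broadcast destination, zeroed source MAC, EtherType 0x88b5
# (reserved for local experimental use), then padding.
frame = b"\xff" * 6 + b"\x00" * 6 + b"\x88\xb5" + b"\x00" * 46
print(classify_frame(frame))  # unknown (0x88b5)
```

A frame that falls into the "unknown" branch is the kind of traffic the RH doc describes as being counted as dropped even though nothing is wrong with the link.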
https://en.wikipedia.org/wiki/Netgear_NSDP is probably the culprit in my situation. I suspect that if I had sniffed in promiscuous mode, I would have seen the traffic. It's interesting that the drops only register on my Arch machine...
*5 mins later*
Turning off "loop detection" on the Netgear switches stopped the counter from going up! Stick a fork in this turkey. It's done.
Last edited by tixetsal (2018-11-21 22:24:54)