You are not logged in.

#1 2022-02-14 18:53:50

drumino
Member
Registered: 2022-02-14
Posts: 1

DHCP lease drop under heavy bandwidth load

I've been hunting the cause for random dhcp lease drops for two days now, after trying to upload a huge amount of data and constantly losing my ipv4 lease lease every 5-10 minutes.

Some things I noticed:

- Issue only occurs when my bandwidth is close to being maxed out, but happens with high %of my maximum upload as well as download
- No disconnects / DHCP lease renewals occured when I just let the system idle for a night without an active network transfer
- My router does not disconnect, nor do my other devices in the network. Just my machine and the router lose all desire to communicate for a few seconds
- I already tried another cable and the observed behaviour did not change at all, so the proposed solution here does not work for me with a sample size of 2 cables
- I've swapped dhcpcd for NetworkManager and thought for ~30 minutes that the problem has been fixed, but it returned after a short while and results in the same bug
- Probably unrelated (and nothing new in the setup) but I do have a pihole which acts as my DNS server. The router manages the dhcp lease though, so I do not see any significance for this
- ip addr shows the device as DOWN in the few seconds where this bug occurs
- Back when i was using dhcpcd I tried disabling ipv6 in the dhcpcd conf, but no difference


NetworkManager shows the following logs everytime the drop occurs:

Feb 14 19:16:26 hostname NetworkManager[690]: <info>  [1644862586.7813] device (enp5s0): carrier: link connected
Feb 14 19:16:26 hostname NetworkManager[690]: <info>  [1644862586.7814] device (enp5s0): DHCPv4 lease renewal requested
Feb 14 19:16:26 hostname NetworkManager[690]: <info>  [1644862586.8140] dhcp4 (enp5s0): canceled DHCP transaction
Feb 14 19:16:26 hostname NetworkManager[690]: <info>  [1644862586.8140] dhcp4 (enp5s0): state changed bound -> terminated
Feb 14 19:16:26 hostname NetworkManager[690]: <info>  [1644862586.8143] dhcp4 (enp5s0): activation: beginning transaction (timeout in 45 seconds)
Feb 14 19:16:28 hostname NetworkManager[690]: <info>  [1644862588.8194] dhcp4 (enp5s0): state changed unknown -> bound, address=192.168.178.2

dhcpcd showed the equivalent, I rebooted since i swapped to NetworkManager and therefore lost my logs, but the only significant difference was "carrier lost"  in dhcpcd logs while NetworkManager stays on the positive side and only informs about a successful connection.
Not helpful though, even though I do appreciate the positivity here. /s

I do see quite a few errors with:

$ ip -s link
...
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a8:a1:59:18:0e:5f brd ff:ff:ff:ff:ff:ff
    RX:   bytes  packets errors dropped  missed   mcast           
    97133225704 66428667    350   10110       0    3047 
    TX:   bytes  packets errors dropped carrier collsns           
      783812410 11008235      0       0       0       0 
...

systemd services:

$ find /etc/systemd -type l -exec test -f {} \; -print | awk -F'/' '{ printf ("%-40s | %s\n", $(NF-0), $(NF-1)) }' | sort -f
bluetooth.service                        | bluetooth.target.wants
containerd.service                       | multi-user.target.wants
dbus-org.bluez.service                   | system
dbus-org.freedesktop.nm-dispatcher.service | system
dirmngr.socket                           | sockets.target.wants
display-manager.service                  | system
docker.service                           | multi-user.target.wants
gcr-ssh-agent.socket                     | sockets.target.wants
getty@tty1.service                       | getty.target.wants
gpg-agent-browser.socket                 | sockets.target.wants
gpg-agent-extra.socket                   | sockets.target.wants
gpg-agent-ssh.socket                     | sockets.target.wants
gpg-agent.socket                         | sockets.target.wants
NetworkManager-wait-online.service       | network-online.target.wants
NetworkManager.service                   | multi-user.target.wants
p11-kit-server.socket                    | sockets.target.wants
pipewire-media-session.service           | pipewire.service.wants
pipewire-session-manager.service         | user
pipewire.socket                          | sockets.target.wants
pulseaudio.socket                        | sockets.target.wants
remote-fs.target                         | multi-user.target.wants
sshd.service                             | multi-user.target.wants
xdg-user-dirs-update.service             | default.target.wants

Maybe someone can inspire any further investigations.
Thanks in advance!

Edit:

I just noticed a drop without any significant bandwidth usage and had another look at

$ journalctl -b

Found the following, potentially interesting part, but can't really identify a cause:

...
Feb 14 20:25:46 hostname kernel: igb 0000:05:00.0 enp5s0: igb: enp5s0 NIC Link is Down
Feb 14 20:25:49 hostname kernel: igb 0000:05:00.0 enp5s0: igb: enp5s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 14 20:25:51 hostname kernel: igb 0000:05:00.0: exceed max 2 second
Feb 14 20:25:51 hostname kernel: igb 0000:05:00.0 enp5s0: igb: enp5s0 NIC Link is Down
Feb 14 20:25:52 hostname NetworkManager[694]: <info>  [1644866752.0769] device (enp5s0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 14 20:25:52 hostname NetworkManager[694]: <info>  [1644866752.0942] dhcp4 (enp5s0): canceled DHCP transaction
Feb 14 20:25:52 hostname NetworkManager[694]: <info>  [1644866752.0942] dhcp4 (enp5s0): state changed bound -> terminated
Feb 14 20:25:52 hostname NetworkManager[694]: <info>  [1644866752.0987] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 14 20:25:52 hostname dbus-daemon[693]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.2' (uid=0 pid=694 comm="/usr/bin/NetworkManager --no-daemon ")
Feb 14 20:25:52 hostname systemd[1]: Starting Network Manager Script Dispatcher Service...
Feb 14 20:25:52 hostname dbus-daemon[693]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Feb 14 20:25:52 hostname systemd[1]: Started Network Manager Script Dispatcher Service.
Feb 14 20:25:52 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 14 20:25:52 hostname kernel: audit: type=1130 audit(1644866752.124:120): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 14 20:25:52 hostname kernel: igb 0000:05:00.0 enp5s0: igb: enp5s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 14 20:25:53 hostname NetworkManager[694]: <info>  [1644866753.2218] device (enp5s0): carrier: link connected
Feb 14 20:25:53 hostname NetworkManager[694]: <info>  [1644866753.2220] device (enp5s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 14 20:25:53 hostname NetworkManager[694]: <info>  [1644866753.2237] policy: auto-activating connection 'Wired connection 1' (7c8c9eaa-7b43-3e69-938d-56765eabf0ef)
Feb 14 20:25:53 hostname NetworkManager[694]: <info>  [1644866753.2241] device (enp5s0): Activation: starting connection 'Wired connection 1' (7c8c9eaa-7b43-3e69-938d-56765eabf0ef)
Feb 14 20:25:53 hostname NetworkManager[694]: <info>  [1644866753.2242] device (enp5s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Feb 14 20:25:53 hostname NetworkManager[694]: <info>  [1644866753.2244] manager: NetworkManager state is now CONNECTING
Feb 14 20:25:53 hostname NetworkManager[694]: <info>  [1644866753.2246] device (enp5s0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Feb 14 20:25:53 hostname NetworkManager[694]: <info>  [1644866753.2255] device (enp5s0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Feb 14 20:25:53 hostname NetworkManager[694]: <info>  [1644866753.2258] dhcp4 (enp5s0): activation: beginning transaction (timeout in 45 seconds)
Feb 14 20:25:53 hostname kded5[837]: org.kde.plasma.nm.kded: Unhandled active connection state change:  1
Feb 14 20:25:54 hostname kwin_x11[840]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 28325, resource id: 33554651, major code: 15 (QueryTree), minor code: 0
Feb 14 20:25:55 hostname NetworkManager[694]: <info>  [1644866755.2325] dhcp4 (enp5s0): state changed unknown -> bound, address=192.168.178.2
Feb 14 20:25:55 hostname NetworkManager[694]: <info>  [1644866755.2334] device (enp5s0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Feb 14 20:25:55 hostname NetworkManager[694]: <info>  [1644866755.2342] policy: set 'Wired connection 1' (enp5s0) as default for IPv4 routing and DNS
Feb 14 20:25:55 hostname NetworkManager[694]: <info>  [1644866755.2357] device (enp5s0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Feb 14 20:25:55 hostname NetworkManager[694]: <info>  [1644866755.2359] device (enp5s0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Feb 14 20:25:55 hostname NetworkManager[694]: <info>  [1644866755.2362] manager: NetworkManager state is now CONNECTED_SITE
Feb 14 20:25:55 hostname NetworkManager[694]: <info>  [1644866755.2365] device (enp5s0): Activation: successful, device activated.
Feb 14 20:25:55 hostname NetworkManager[694]: <info>  [1644866755.3372] manager: NetworkManager state is now CONNECTED_GLOBAL
Feb 14 20:25:57 hostname kwin_x11[840]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 33039, resource id: 33554657, major code: 15 (QueryTree), minor code: 0
Feb 14 20:26:02 hostname systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Feb 14 20:26:02 hostname audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 14 20:26:02 hostname kernel: audit: type=1131 audit(1644866762.131:121): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 14 20:26:25 hostname dbus-daemon[693]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.138' (uid=0 pid=60759 comm="sudo systemctl status NetworkManager ")
Feb 14 20:26:25 hostname dbus-daemon[693]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.home1.service': Unit dbus-org.freedesktop.home1.service not found.
Feb 14 20:26:25 hostname sudo[60759]: pam_systemd_home(sudo:auth): systemd-homed is not available: Unit dbus-org.freedesktop.home1.service not found.
Feb 14 20:26:27 hostname audit[60759]: USER_AUTH pid=60759 uid=1000 auid=1000 ses=1 msg='op=PAM:authentication grantors=pam_faillock,pam_permit,pam_faillock acct="drumino" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/8 res=success'
Feb 14 20:26:27 hostname audit[60759]: USER_ACCT pid=60759 uid=1000 auid=1000 ses=1 msg='op=PAM:accounting grantors=pam_unix,pam_permit,pam_time acct="drumino" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/8 res=success'
Feb 14 20:26:27 hostname sudo[60759]:  drumino : TTY=pts/8 ; PWD=/mnt ; USER=root ; COMMAND=/usr/bin/systemctl status NetworkManager
...

Last edited by drumino (2022-02-14 19:35:27)

Offline

#2 2022-02-14 21:01:26

-thc
Member
Registered: 2017-03-15
Posts: 775

Re: DHCP lease drop under heavy bandwidth load

The adapter repeatedly changes its "up" status - seemingly on its own. Together with the RX errors I suspect faulty hardware.

I would suggest either testing another network adapter or another operating system (Live DVD / USB drive) to verify this.

Offline

Board footer

Powered by FluxBB