[Solved] WiFi times out but does not disconnect, especially under load

niisse · 2022-02-14 09:54:51

Hey all, first post here.

I'd appreciate any help tracking down my WiFi problem. I have no issues connecting to any WiFi network, and the problem does not occur on Windows. Whenever I'm connected to any network, after anywhere between 2 and 60 minutes, my connection seems to time out (webpages load indefinitely, occasionally giving a 'host not found' or DNS error). These issues seem to get worse when load is high: this morning I was on a Teams call; opening a browser window and loading a page would more or less instantly make my connection time out. The system will still report being connected to the WiFi network, however.

I can't seem to find relevant errors in the output of 'dmesg', 'systemctl status NetworkManager', or 'journalctl'.

I'd like to know what system info is relevant to share in order to help diagnose the issue - please let me know what commands to run and I'll post their output!

Thanks in advance.

Last edited by niisse (2022-02-25 11:02:55)

V1del · 2022-02-14 10:27:14

A full unaltered and unfiltered

sudo journalctl -b

covering the issue would be best: https://wiki.archlinux.org/title/List_o … n_services

Since you do mention Windows, a basic precaution should be to disable fast boot: https://wiki.archlinux.org/title/Dual_b … ibernation

Last edited by V1del (2022-02-14 10:28:11)

niisse · 2022-02-14 10:37:42

Here's a link to the journalctl output. This one was taken just after I noticed another timeout had occurred.

Fast boot on Windows is already disabled.

Last edited by niisse (2022-02-14 10:38:03)

V1del · 2022-02-14 11:09:46

Loads of BT errors towards the end there. And "under load" and "Teams call" likely mean both BT and WiFi were heavily utilised. That's always a tricky situation that leads to conflicts, and one that Windows (or rather the Windows firmware) often has better mitigations for. The "most" proper way to fix this would be to move the wireless connection "out of range" of the Bluetooth band, i.e. make it a 5 GHz connection if possible. Or avoid using BT and WiFi at the same time altogether.

Otherwise, from a cursory google, it's also possible that the current standard ath10k firmware does not enable BT coexistence by default, which would definitely show itself in the manner you're seeing. I've found https://gist.github.com/jmfernandez/0b0 … c774e25fe6 for a procedure to try and enable that for a firmware, but that's of course a bit involved.

Last edited by V1del (2022-02-14 11:10:40)

niisse · 2022-02-14 11:35:49

The BT errors are a separate issue, actually. I'll open another thread for that sometime.

I'm aware of BT/WiFi coexistence woes - ever since acquiring a BT mouse & keyboard I've configured my router to send out separate SSIDs for 2.4GHz and 5GHz. That definitely made the coexistence a lot better, but it's not related to my current issues as far as I'm aware. During the Teams call, I used my headphones without Bluetooth (good ol' aux cable).

The WiFi issues I'm having persist when I'm not using any bluetooth devices at all.

seth · 2022-02-15 09:56:50

When this happens, can you still

ping _gateway

and also obtain

iw dev wlp1s0 station dump

and when in doubt, disable https://wiki.archlinux.org/title/Networ … domization

niisse · 2022-02-15 11:05:09

It just happened again (before I read your reply, unfortunately). Will report back here with that data once I get the chance.

In the meantime, I poked around with the ip command. 'ip link show wlp1s0' returned

GENERAL.STATE: 100 (Connected)

but 'ip monitor' gave as output:

ip monitor
192.168.2.254 dev wlp1s0 FAILED 
	fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED 
192.168.2.254 dev wlp1s0 lladdr 44:fb:5a:d6:d7:b9 DELAY 
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED 
192.168.2.254 dev wlp1s0 lladdr 44:fb:5a:d6:d7:b9 PROBE 
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED 
192.168.2.254 dev wlp1s0 FAILED 
192.168.2.254 dev wlp1s0 lladdr 44:fb:5a:d6:d7:b9 STALE 
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED 
Deleted 2: wlp1s0    inet 192.168.2.9/24 brd 192.168.2.255 scope global dynamic noprefixroute wlp1s0
       valid_lft 73443sec preferred_lft 73443sec
Deleted broadcast 192.168.2.255 dev wlp1s0 table local proto kernel scope link src 192.168.2.9 
Deleted local 192.168.2.9 dev wlp1s0 table local proto kernel scope host src 192.168.2.9 
Deleted 192.168.2.3 dev wlp1s0 lladdr 7c:d9:5c:0e:93:89 STALE 
Deleted 239.255.255.250 dev wlp1s0 lladdr 01:00:5e:7f:ff:fa NOARP 
Deleted 224.0.0.251 dev wlp1s0 lladdr 01:00:5e:00:00:fb NOARP 
Deleted 225.0.0.222 dev wlp1s0 lladdr 01:00:5e:00:00:de NOARP 
Deleted 192.168.2.1 dev wlp1s0 lladdr 14:c1:4e:22:e6:4b STALE 
Deleted 192.168.2.254 dev wlp1s0 lladdr 44:fb:5a:d6:d7:b9 DELAY 
Deleted 224.0.0.2 dev wlp1s0 lladdr 01:00:5e:00:00:02 NOARP 
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED 
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED 
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED 
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED 
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED 
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED 
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED 
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED

Wonder if that's a clue?
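
For context, FAILED in the output above is a neighbour-table state: the kernel could no longer resolve the gateway's address (192.168.2.254 over ARP, fe80::46fb:5aff:fed6:d7b9 over NDP). A minimal sketch for checking that state on demand instead of watching 'ip monitor'. The helper name neigh_state is made up for illustration, and it only assumes that the address is the first field and the state the last field of each 'ip neigh' line:

```shell
# Print the neighbour-table state (REACHABLE, STALE, FAILED, ...) recorded
# for one address. Expects `ip neigh show` style output on stdin.
# neigh_state is a hypothetical helper, not part of iproute2.
neigh_state() {
    # $1 = address to look up; the state is the last field of a matching line
    awk -v addr="$1" '$1 == addr { print $NF }'
}
```

Possible usage while the connection is stalled: ip neigh show dev wlp1s0 | neigh_state 192.168.2.254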

Finally, I noticed

nmcli device reapply wlp1s0

instantly fixed my link, without having to disconnect and reconnect first (which would often take a minute). Perhaps I could use that as a workaround (although I hope that won't be necessary).

Last edited by niisse (2022-02-15 11:06:52)

niisse · 2022-02-15 14:40:24

❯ ping _gateway
PING _gateway(_gateway (fe80::46fb:5aff:fed6:d7b9%wlp1s0)) 56 data bytes
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available

..ad infinitum

--- _gateway ping statistics ---
77 packets transmitted, 0 received, 100% packet loss, time 95997ms

❯ iw dev wlp1s0 station dump
Station 44:fb:5a:d6:d7:ba (on wlp1s0)
	inactive time:	107607 ms
	rx bytes:	601549278
	rx packets:	450857
	tx bytes:	24867732
	tx packets:	127205
	tx retries:	0
	tx failed:	0
	beacon loss:	0
	beacon rx:	0
	rx drop misc:	1213
	signal:  	-57 [-57, -73] dBm
	signal avg:	-56 [-56, -73] dBm
	beacon signal avg:	0 dBm
	tx bitrate:	6.0 MBit/s
	tx duration:	51967057 us
	rx bitrate:	97.6 MBit/s VHT-MCS 2 80MHz short GI VHT-NSS 1
	rx duration:	0 us
	authorized:	yes
	authenticated:	yes
	associated:	yes
	preamble:	long
	WMM/WME:	yes
	MFP:		no
	TDLS peer:	no
	DTIM period:	1
	beacon interval:100
	short slot time:yes
	connected time:	7500 seconds
	associated at [boottime]:	8.471s
	associated at:	1644927971607 ms
	current time:	1644935471802 ms

I did run the reapply command before above output, I don't know if that influences things.
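
Side note on the dump above: "inactive time" is the time since the last activity with the access point, so 107607 ms suggests nothing was exchanged for well over a minute while the station still shows as associated. A small sketch for pulling that number out, e.g. for logging. inactive_ms is a hypothetical helper name, and it assumes the "inactive time:" line looks like the one in the dump above:

```shell
# Extract the "inactive time" value (milliseconds) from the output of
# `iw dev <interface> station dump`, read on stdin.
# inactive_ms is a hypothetical helper, not part of iw.
inactive_ms() {
    # fields on that line: "inactive" "time:" "<value>" "ms"
    awk '/inactive time:/ { print $3 }'
}
```

Possible usage: iw dev wlp1s0 station dump | inactive_ms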

niisse wrote:

Finally, I noticed nmcli device reapply wlp1s0 instantly fixed my link

This time, it did not work.

seth · 2022-02-15 15:55:24

The system is probably waiting for ACKs
Pass

ath10k_core.cryptmode=1

to the kernel and if you've not done so far, disable MAC randomization (if this works, try whether you can re-enable HW encryption)
https://wiki.archlinux.org/title/Kernel_parameters
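
To confirm that a parameter actually reached the kernel after the reboot, something along these lines could be used. It is only a sketch: has_kernel_param is a hypothetical helper, and all it does is look for the exact string in the kernel command line (read from /proc/cmdline unless a second argument is given).

```shell
# Succeed if the exact parameter string appears in the kernel command line.
# has_kernel_param is a hypothetical helper name.
#   $1 = parameter to look for, e.g. "module.option=1"
#   $2 = command line text (optional; defaults to the running kernel's)
has_kernel_param() {
    cmdline=${2:-$(cat /proc/cmdline)}
    # pad with spaces so only whole, space-separated parameters match
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Possible usage: has_kernel_param "<the parameter you added>" && echo "parameter is active"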

niisse · 2022-02-15 17:57:59

Tried both things, but there's no change, unfortunately.

seth · 2022-02-15 23:24:31

When this happens and you leave the system alone (don't try to reconnect), does the connection magically revive?
If so, and in general, make sure that Windows fast boot is still disabled (because MS likes to re-enable it with updates): your DHCP server (router) might silently drop you because it is waiting for a lease renewal on account of a stale Windows lease.
Can you inspect the router from a 3rd system to see whether the failing-but-not-disconnected system is still registered there?

niisse · 2022-02-16 11:05:52

From what I've seen, the connection doesn't revive. Windows Fast Boot is still off (I generally don't update Windows much, now that they've added an option to restart without updating hehe).

The router still considers the computer connected.

(By the way, I really appreciate you thinking along with me! Thanks man)

Last edited by niisse (2022-02-16 12:05:50)

seth · 2022-02-16 13:19:35

Let's see whether the tcp buffer gets jammed from a temporary burst … and whether growing the buffer will help with that.

cat /proc/sys/net/core/wmem_max
echo 2097152 > /proc/sys/net/core/wmem_max

NB: the echo needs to be run as root, and the change will not survive a reboot; you'll need a sysctl rule if it's a viable mitigation.

niisse · 2022-02-17 13:55:22

So far so good. Haven't noticed any issues anymore, although that might just be coincidence. I'll keep an eye out and report back later.

How would I go about creating a sysctl rule for the buffer size?

seth · 2022-02-17 13:59:49

/etc/sysctl.d/<number>-<name>.conf w/ "net.core.wmem_max = 2097152" inside (just that line)
What value was it before? 212992?
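
A sketch of how that file could be created and checked. The file name 90-wmem.conf and the helper write_wmem_rule are only examples, not anything prescribed here; the value is the one used above.

```shell
# Write a one-line sysctl drop-in that sets net.core.wmem_max.
# write_wmem_rule is a hypothetical helper name.
#   $1 = target file (e.g. a file under /etc/sysctl.d/, which needs root)
#   $2 = buffer size in bytes
write_wmem_rule() {
    printf 'net.core.wmem_max = %s\n' "$2" > "$1"
}
```

Possible usage, as root: write_wmem_rule /etc/sysctl.d/90-wmem.conf 2097152 && sysctl --system
Afterwards "sysctl net.core.wmem_max" should print the new value, which also catches a typo in the file before the next reboot does.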

niisse · 2022-02-21 07:33:25

It was 212992 before, yeah. I hadn't had any issues for the past few days(!), but just as I was about to head into that Monday-morning Teams call I started having problems again - before the call even started! I've restarted my computer and router, I'll see how it holds up now.

seth · 2022-02-21 13:15:33

You can grow the buffer to even bigger values, but there's probably something else creeping up in that buffer that now just has more space to grow.
When this happens the next time and you've no pressure, try to only reboot the router.

niisse · 2022-02-22 11:03:09

Rebooting the router fixes the connection - it disconnects and reconnects once the router is initialized.

seth · 2022-02-22 14:11:27

it disconnects and reconnects once the router is initialized

Since afaiu reconnecting would fix the situation anyway, this is unfortunately inconclusive.

Check "netstat -s" and raise the buffer even more, see eg. https://support.oracle.com/knowledge/Or … 140_1.html and https://fasterdata.es.net/host-tuning/l … st-tuning/ recommending really high values…
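
A minimal sketch for narrowing the "netstat -s" output down to the buffer-related counters, so they can be compared before and after a stall. buffer_counters is a hypothetical helper, and the exact counter wording is an assumption that may differ between net-tools versions:

```shell
# Keep only the counters that mention buffers from `netstat -s` output
# (read on stdin) and strip the leading indentation.
# buffer_counters is a hypothetical helper name.
buffer_counters() {
    grep -i 'buffer' | sed 's/^[[:space:]]*//'
}
```

Possible usage: netstat -s | buffer_counters
If the numbers keep growing between two runs, the buffer is presumably still being overrun.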

Last edited by seth (2022-02-22 14:11:58)

niisse · 2022-02-24 14:35:58

So, I figured out I had a typo in the sysctl rule and it didn't get properly applied on boot. Fixed that, and I haven't had any issues other than that Monday morning... I'll take a look at those articles.

Seeing as increasing the buffer size appears to have solved the issue, should I mark this thread as solved?

Again, thank you for all your help!! Really appreciate it!

seth · 2022-02-24 15:20:19

I'd keep an occasional look at the netstat statistics on whether you're still hitting buffer overruns.
But if you think the problem is solved, you should mark the thread by editing your initial post's subject - so others will know that there's no task left, but maybe a solution to find.
Thanks.
