Hey all, first post here.
I'd appreciate any help tracking down my WiFi problem. Connecting to WiFi networks works fine, and the problem described below does not occur on Windows. Whenever I'm connected to any network, after anywhere between 2 and 60 minutes, my connection seems to time out (webpages load indefinitely, occasionally giving a 'host not found' or DNS error). These issues seem to get worse when load is high: this morning I was on a Teams call; opening a browser window and loading a page would more or less instantly make my connection time out. The system still reports being connected to the WiFi network, however.
I can't seem to find relevant errors in the output of 'dmesg', 'systemctl status NetworkManager', or 'journalctl'.
I'd like to know what system info is relevant to share in order to help diagnose the issue - please let me know what commands to run and I'll post their output!
Thanks in advance.
Last edited by niisse (2022-02-25 11:02:55)
Offline
A full unaltered and unfiltered
sudo journalctl -b
covering the issue would be best: https://wiki.archlinux.org/title/List_o … n_services
Since you do mention Windows, a basic precaution should be to disable fast boot: https://wiki.archlinux.org/title/Dual_b … ibernation
Last edited by V1del (2022-02-14 10:28:11)
Online
Here's a link to the journalctl output. This one was taken just after I noticed another timeout had occurred.
Fast boot on Windows is already disabled.
Last edited by niisse (2022-02-14 10:38:03)
Offline
Loads of BT errors towards the end there. And "under load" and "Teams call" likely means both BT and WiFi were heavily utilised. That's always a tricky situation that leads to conflicts, and one that Windows (or rather the Windows firmware) often has better mitigations for. The "most" proper way to fix this would be to move the wireless connection "out of range" of the Bluetooth band, i.e. make it a 5 GHz connection if possible. Or avoid using BT and WiFi at the same time altogether.
Otherwise, from a cursory google, it's also possible that the current standard ath10k firmware does not enable BT coexistence by default, which would definitely show up in the way you're seeing. I've found https://gist.github.com/jmfernandez/0b0 … c774e25fe6 for a procedure to try and enable that for a firmware, but that's of course a bit involved.
Last edited by V1del (2022-02-14 11:10:40)
Online
The BT errors are a separate issue, actually. I'll open another thread for that sometime.
I'm aware of BT/WiFi coexistence woes - ever since acquiring a BT mouse & keyboard I've configured my router to send out separate SSIDs for 2.4GHz and 5GHz. That definitely made the coexistence a lot better, but it's not related to my current issues as far as I'm aware. During the Teams call, I used my headphones without Bluetooth (good ol' aux cable).
The WiFi issues I'm having persist when I'm not using any bluetooth devices at all.
Offline
When this happens, can you still
ping _gateway
and also obtain
iw dev wlp1s0 station dump
and when in doubt, disable https://wiki.archlinux.org/title/Networ … domization
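For collecting those outputs in one go while the connection is stalled, a rough sketch follows. It assumes the interface is wlp1s0 as in this thread; the helper names run_logged and capture_stall and the log file name are made up for illustration, and the neighbour table is included as an extra.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any package.

# Print a header, then run the given command with stderr merged into stdout.
run_logged() {
    printf '=== %s ===\n' "$*"
    "$@" 2>&1
}

# Gather the diagnostics for one interface (defaults to wlp1s0).
capture_stall() {
    iface="${1:-wlp1s0}"
    run_logged date
    # -c 5: send five probes; -w 10: give up after ten seconds in total
    run_logged ping -c 5 -w 10 _gateway
    run_logged iw dev "$iface" station dump
    run_logged ip neigh show dev "$iface"
}

# Example call while the stall is ongoing:
#   capture_stall wlp1s0 > "wifi-stall-$(date +%Y%m%d-%H%M%S).log"
```

Writing everything to one file keeps the timestamps and outputs together, which makes it easier to post the result unaltered.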
Offline
It just happened again (before I read your reply, unfortunately). Will report back here with that data once I get the chance.
In the meantime, I poked around with the ip command. 'ip link show wlp1s0' returned
GENERAL.STATE: 100 (Connected)
but 'ip monitor' gave as output:
ip monitor
192.168.2.254 dev wlp1s0 FAILED
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED
192.168.2.254 dev wlp1s0 lladdr 44:fb:5a:d6:d7:b9 DELAY
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED
192.168.2.254 dev wlp1s0 lladdr 44:fb:5a:d6:d7:b9 PROBE
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED
192.168.2.254 dev wlp1s0 FAILED
192.168.2.254 dev wlp1s0 lladdr 44:fb:5a:d6:d7:b9 STALE
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED
Deleted 2: wlp1s0 inet 192.168.2.9/24 brd 192.168.2.255 scope global dynamic noprefixroute wlp1s0
valid_lft 73443sec preferred_lft 73443sec
Deleted broadcast 192.168.2.255 dev wlp1s0 table local proto kernel scope link src 192.168.2.9
Deleted local 192.168.2.9 dev wlp1s0 table local proto kernel scope host src 192.168.2.9
Deleted 192.168.2.3 dev wlp1s0 lladdr 7c:d9:5c:0e:93:89 STALE
Deleted 239.255.255.250 dev wlp1s0 lladdr 01:00:5e:7f:ff:fa NOARP
Deleted 224.0.0.251 dev wlp1s0 lladdr 01:00:5e:00:00:fb NOARP
Deleted 225.0.0.222 dev wlp1s0 lladdr 01:00:5e:00:00:de NOARP
Deleted 192.168.2.1 dev wlp1s0 lladdr 14:c1:4e:22:e6:4b STALE
Deleted 192.168.2.254 dev wlp1s0 lladdr 44:fb:5a:d6:d7:b9 DELAY
Deleted 224.0.0.2 dev wlp1s0 lladdr 01:00:5e:00:00:02 NOARP
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED
fe80::46fb:5aff:fed6:d7b9 dev wlp1s0 router FAILED
Wonder if that's a clue?
Finally, I noticed
nmcli device reapply wlp1s0
instantly fixed my link, without having to disconnect and reconnect first (which would often take a minute). Perhaps I could use that as a workaround (although I hope that won't be necessary).
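If the reapply turns out to help reliably, it could be wrapped in a small check like the sketch below. This is only an assumption about how such a workaround might look: the function name reapply_if_down is made up, and the probe defaults to pinging _gateway. It would paper over the problem rather than fix it.

```shell
#!/bin/sh
# Hypothetical workaround helper.
# $1 = interface; any remaining arguments are the probe command to run.
reapply_if_down() {
    iface="$1"
    shift
    # Default probe: three pings to the gateway, at most five seconds in total.
    [ "$#" -gt 0 ] || set -- ping -c 3 -w 5 _gateway
    if "$@" >/dev/null 2>&1; then
        echo "link ok"
    else
        echo "link down, reapplying $iface"
        nmcli device reapply "$iface"
    fi
}

# Example: check every 30 seconds.
#   while sleep 30; do reapply_if_down wlp1s0; done
```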
Last edited by niisse (2022-02-15 11:06:52)
Offline
❯ ping _gateway
PING _gateway(_gateway (fe80::46fb:5aff:fed6:d7b9%wlp1s0)) 56 data bytes
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
..ad infinitum
--- _gateway ping statistics ---
77 packets transmitted, 0 received, 100% packet loss, time 95997ms
❯ iw dev wlp1s0 station dump
Station 44:fb:5a:d6:d7:ba (on wlp1s0)
inactive time: 107607 ms
rx bytes: 601549278
rx packets: 450857
tx bytes: 24867732
tx packets: 127205
tx retries: 0
tx failed: 0
beacon loss: 0
beacon rx: 0
rx drop misc: 1213
signal: -57 [-57, -73] dBm
signal avg: -56 [-56, -73] dBm
beacon signal avg: 0 dBm
tx bitrate: 6.0 MBit/s
tx duration: 51967057 us
rx bitrate: 97.6 MBit/s VHT-MCS 2 80MHz short GI VHT-NSS 1
rx duration: 0 us
authorized: yes
authenticated: yes
associated: yes
preamble: long
WMM/WME: yes
MFP: no
TDLS peer: no
DTIM period: 1
beacon interval:100
short slot time:yes
connected time: 7500 seconds
associated at [boottime]: 8.471s
associated at: 1644927971607 ms
current time: 1644935471802 ms
I did run the reapply command before capturing the above output; I don't know if that influences things.
Finally, I noticed nmcli device reapply wlp1s0 instantly fixed my link
This time, it did not work.
Offline
The system is probably waiting for ACKs
Pass
ath10k_core.cryptmode=1
to the kernel and, if you haven't done so yet, disable MAC randomization (if this works, try whether you can re-enable HW encryption)
https://wiki.archlinux.org/title/Kernel_parameters
Offline
Tried both things, but there's no change, unfortunately.
Offline
When this happens and you leave the system alone (don't try to reconnect), does the connection magically revive?
If so, and in general, make sure that Windows fast boot is still disabled (because MS likes to re-enable it with updates) - your DHCP server (router) might silently drop you because it awaits a re-lease due to a stale Windows lease.
Can you inspect the router from a 3rd system to see whether the failing-but-not-disconnected system is still registered there?
Offline
From what I've seen, the connection doesn't revive. Windows Fast Boot is still off (I generally don't update Windows much, now that they've added an option to restart without updating hehe).
The router still considers the computer connected.
(By the way, I really appreciate you thinking along with me! Thanks man)
Last edited by niisse (2022-02-16 12:05:50)
Offline
Let's see whether the tcp buffer gets jammed from a temporary burst … and whether growing the buffer will help with that.
cat /proc/sys/net/core/wmem_max
echo 2097152 > /proc/sys/net/core/wmem_max
NB: the echo needs a root shell, and the change will not survive a reboot; you'll need a sysctl rule if it turns out to be a viable mitigation.
Offline
So far so good. Haven't noticed any issues anymore, although that might just be coincidence. Will keep an eye out and report back later.
How would I go about creating a sysctl rule for the buffer size?
Offline
/etc/sysctl.d/<number>-<name>.conf w/ "net.core.wmem_max = 2097152" inside (just that line)
What value was it before? 212992?
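Spelled out as commands, that could look like the sketch below. The file name 90-wmem-max.conf and the helper write_wmem_rule are placeholders chosen for illustration; writing to /etc/sysctl.d needs root.

```shell
#!/bin/sh
# Hypothetical helper: write a one-line sysctl drop-in.
# $1 = target directory, $2 = buffer size in bytes
write_wmem_rule() {
    printf 'net.core.wmem_max = %s\n' "$2" > "$1/90-wmem-max.conf"
}

# As root:
#   write_wmem_rule /etc/sysctl.d 2097152
#   sysctl --system              # reload all sysctl configuration files
#   sysctl net.core.wmem_max     # confirm the running value
```

Reloading and then reading the value back shows right away whether the rule was picked up, without waiting for the next reboot.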
Offline
It was 212992 before, yeah. I haven't had any issues for the past few days(!), but just as I was about to head into that Monday-morning Teams call I started having problems again - before the call even started! I've restarted my computer and router; I'll see how it holds up now.
Offline
You can grow the buffer to even bigger values, but there's probably something else creeping up in that buffer that now just has more space to grow.
When this happens the next time and you've no pressure, try to only reboot the router.
Offline
Rebooting the router fixes the connection - it disconnects and reconnects once the router is initialized.
Offline
it disconnects and reconnects once the router is initialized
Since afaiu reconnecting would fix the situation anyway, this is unfortunately inconclusive.
Check "netstat -s" and raise the buffer even more, see eg. https://support.oracle.com/knowledge/Or … 140_1.html and https://fasterdata.es.net/host-tuning/l … st-tuning/ recommending really high values…
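To avoid reading the whole netstat -s output each time, the buffer-related counters can be filtered out. A minimal sketch, assuming net-tools' netstat is installed; buffer_counters is a made-up helper name, and the exact counter wording varies between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines that mention a buffer
# (case-insensitive). Reads from stdin.
buffer_counters() {
    grep -i 'buffer'
}

# Example: save a snapshot now and compare it with a later one.
#   netstat -s | buffer_counters > before.txt
#   ... wait, or reproduce the stall ...
#   netstat -s | buffer_counters > after.txt
#   diff before.txt after.txt
```

Counters that keep increasing between snapshots would suggest the buffers are still overrunning.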
Last edited by seth (2022-02-22 14:11:58)
Offline
So, I figured out I had a typo in the sysctl rule and it didn't get properly applied on boot. Fixed that, and I haven't had any issues other than that Monday morning... I'll take a look at those articles.
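Whatever the typo was, a misspelled key means the running value just stays at its default. One way to catch that, sketched below, is to compare what the file assigns with what the kernel reports. rule_value is a made-up helper, and the file path in the example is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the value a sysctl.d file assigns to a key.
# Prints nothing if the key is absent, for example because it was misspelled.
# $1 = file, $2 = key
rule_value() {
    awk -F= -v key="$2" '
        { k = $1; v = $2; gsub(/[ \t]/, "", k); gsub(/[ \t]/, "", v) }
        k == key { print v }
    ' "$1"
}

# Example: both commands should print the same number.
#   rule_value /etc/sysctl.d/90-wmem-max.conf net.core.wmem_max
#   sysctl -n net.core.wmem_max
```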
Seeing as increasing the buffer size appears to have solved the issue, should I mark this thread as solved?
Again, thank you for all your help!! Really appreciate it!
Offline
I'd take an occasional look at the netstat statistics to see whether you're still hitting buffer overruns.
But if you think the problem is solved, you should mark the thread by editing your initial post's subject - so others will know that there's no task left, but maybe a solution to find.
Thanks.
Offline