I've been getting some unusual errors lately where all clients fail to connect to any remote host out of the blue, claiming the host is unreachable.
e.g. when 10.1.0.254 is my gateway
$ ping -n 10.1.0.254
PING 10.1.0.254 (10.1.0.254) 56(84) bytes of data.
From 10.1.0.172 icmp_seq=1 Destination Host Unreachable
From 10.1.0.172 icmp_seq=2 Destination Host Unreachable
[..]
for any client or host. The odd part is that my wireless connection appears fine, ip addr has no change, and even ip route reports routes correctly!
$ ip route get 10.1.0.254
10.1.0.254 dev wlan0 src 10.1.0.172 uid 1000
cache
which has the correct iface and saddr. I'm also not running a firewall of any kind.
It happens once or twice a day with no apparent trigger. Restarting iwd usually fixes it, and if I leave it alone for about 10 minutes normal connectivity comes back on its own. It starts and ends without any additional logging in my journal, so I'm wondering if there are some debugging flags I could set to try and understand why I'm getting EHOSTUNREACH, and why it doesn't affect ip route. Or has anyone seen a similar issue? I'm using iwlwifi+iwd+networkd.
It started sometime in the past month I think. In case it's relevant here's some hardware and package info,
$ lspci -kd::280
02:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)
Subsystem: Intel Corporation Dual Band Wireless-AC 8265
Kernel driver in use: iwlwifi
Kernel modules: iwlwifi
$ paclog --package linux --package linux-firmware --package iwd --package systemd | tail
[2021-10-31T18:00:26-0700] [ALPM] upgraded linux (5.14.14.arch1-1 -> 5.14.15.arch1-1)
[2021-11-03T22:17:34-0700] [ALPM] upgraded linux (5.14.15.arch1-1 -> 5.14.16.arch1-1)
[2021-11-04T02:26:54-0700] [ALPM] upgraded iwd (1.18-1 -> 1.19-1)
[2021-11-05T12:17:29-0700] [ALPM] upgraded linux-firmware (20210919.d526e04-1 -> 20211027.1d00989-1)
[2021-11-11T03:11:21-0700] [ALPM] upgraded iwd (1.19-1 -> 1.19-2)
[2021-11-12T11:52:57-0700] [ALPM] upgraded systemd (249.5-3 -> 249.6-3)
[2021-11-12T17:16:12-0700] [ALPM] upgraded linux (5.14.16.arch1-1 -> 5.15.2.arch1-1)
[2021-11-20T10:49:25-0700] [ALPM] upgraded iwd (1.19-2 -> 1.20-1)
[2021-11-21T01:54:03-0700] [ALPM] upgraded linux (5.15.2.arch1-1 -> 5.15.3.arch1-1)
[2021-11-21T14:45:11-0700] [ALPM] upgraded systemd (249.6-3 -> 249.7-1)
Looking at the package list, linux-firmware seems like a possible culprit, but I have no reliable repro, so it's hard to test.
Last edited by Brocellous (2021-12-04 09:51:37)
When it fails, check the output of `ip neigh`.
I'm willing to bet that the ARP entry has been lost, and what is really being reported is that there is no L2 route to the host because it's expired due to a lack of ARP replies to requests.
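A minimal sketch for spotting this quickly when it happens, assuming the interface is wlan0 as in the first post. `unresolved_neighbors` is a hypothetical helper name, not part of iproute2; it only filters `ip neigh` output for entries whose last field is INCOMPLETE or FAILED, i.e. addresses with no usable link-layer address.

```shell
# Hypothetical helper: read `ip neigh` output on stdin and print the
# addresses whose neighbour state (the last field) is INCOMPLETE or FAILED.
unresolved_neighbors() {
    awk '$NF == "INCOMPLETE" || $NF == "FAILED" { print $1 }'
}

# Example use during an outage (wlan0 is an assumption):
#   ip neigh show dev wlan0 | unresolved_neighbors
```

If the gateway address shows up in that list while `ip route get` still looks correct, the problem is at L2 (neighbour resolution) rather than in the routing table.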
Good idea, I'll be sure to try it.
@rsmarples You were right. ip neigh output shows that all the neighbor entries are lost. In fact, here is the output:
$ ip neigh
68.xxx.xxx.xxx dev wlan0 lladdr 40:b0:76:af:15:78 STALE
10.1.0.254 dev wlan0 INCOMPLETE
10.1.0.253 dev wlan0 INCOMPLETE
fe80::42b0:76ff:feaf:1578 dev wlan0 router INCOMPLETE
254 and 253 are my router and dns so those are expected, fe80:: is the ipv6 link local addr of my router so that's fine too.
The 68.xxx.xxx.xxx entry is unexpected. It is the public ipv4 addr of my router, same as is returned by
$ curl --ipv4 ipapi.co/json | jq -r .ip
68.xxx.xxx.xxx
when my connection is working. Not sure why I have a neighbor entry for that.
I tried to see if my laptop was sending ARP requests while my connection was bad, and it appears it was:
$ tshark arp
1 0.000000000 IntelCor_cd:xx.xx → Broadcast ARP 42 Who has 10.1.0.253? Tell 10.1.0.172
2 0.026894775 IntelCor_cd:xx.xx → Broadcast ARP 42 Who has 10.1.0.254? Tell 10.1.0.172
3 0.723413432 Grandstr_5f:xx:xx → Broadcast ARP 64 Who has 10.1.0.21? Tell 0.0.0.0
4 1.013461504 IntelCor_cd:xx.xx → Broadcast ARP 42 Who has 10.1.0.253? Tell 10.1.0.172
5 1.040195147 IntelCor_cd:xx.xx → Broadcast ARP 42 Who has 10.1.0.254? Tell 10.1.0.172
6 2.030595138 IntelCor_cd:xx.xx → Broadcast ARP 42 Who has 10.1.0.253? Tell 10.1.0.172
7 2.053669152 IntelCor_cd:xx.xx → Broadcast ARP 42 Who has 10.1.0.254? Tell 10.1.0.172
8 3.040061163 IntelCor_cd:xx.xx → Broadcast ARP 42 Who has 10.1.0.253? Tell 10.1.0.172
[...]
I saw a request for 254 and 253 each at a rate of about 1/second, but no response I guess. Not sure what to make of that.
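To make the "requests but no replies" observation less of a guess, here is a rough sketch that tallies both kinds of ARP packet from tshark's default one-line summaries. `arp_summary` is a hypothetical helper name, and it assumes the usual Info column wording ("Who has X? Tell Y" for requests, "X is at MAC" for replies).

```shell
# Hypothetical helper: read tshark summary lines on stdin and count
# ARP requests ("Who has ...") and ARP replies ("... is at ...").
arp_summary() {
    awk '
        / Who has / { requests++ }
        / is at /   { replies++ }
        END { printf "requests=%d replies=%d\n", requests, replies }
    '
}

# Example use: capture ARP on wlan0 for 10 seconds (interface is an assumption):
#   tshark -i wlan0 -a duration:10 -f arp 2>/dev/null | arp_summary
```

A result with many requests and zero replies for the capture window would match what the listing above suggests: the laptop keeps asking but never sees an answer.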
Today I tested from another machine on the local network while my laptop was having this issue. From there I determined that my router did respond to arping, both broadcast and unicast arp pings.
I also tried to arping my laptop while it was down. Interestingly, it does reliably respond to the broadcast arping requests, but not the unicast ones.
EDIT:
The repetitive arp requests from my laptop are also heard by the other machine in tshark arp. I'm guessing that means my laptop is failing to receive the arp replies somehow.
Last edited by Brocellous (2021-12-04 09:41:18)