
#1 2024-03-06 01:16:57

dakota
Member
Registered: 2016-05-20
Posts: 415

[SOLVED] Wireguard Client Loses Connection to Server

Periodically, wireguard peers stop communicating with my wireguard server.

EDIT - I think this might be an ISP issue, but I'm not sure (see next post).
EDIT 2 - I'm pretty sure this was a cellular modem issue.
EDIT 3 - Updating the cellular modem firmware fixed the problem. There have been *no* sustained outages for over 3 months.

For testing purposes

  • both peers are located in my office

  • both peers are connected to the same office router

  • the office router is connected to a cellular modem that connects to my ISP

  • the ISP connects to my cloud-based VPS (which is where the wireguard server is located)

Symptoms

  • they do *not* both fail at the same time

  • when a peer fails, it can still ping _gateway, 9.9.9.9, and archlinux.org

  • "ip a" and "ip r" look the same whether the peer is communicating or not ("ip r" output is shown below)

  • tcpdump shows data leaving the peer

  • tcpdump shows the same data *not* arriving at the server

  • occasionally, a peer will reconnect and start communicating again, but mostly not
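One way to see from the server side which peer has gone quiet is to look at WireGuard's handshake timestamps. With PersistentKeepalive set, a working peer normally re-handshakes about every two minutes, so a last handshake much older than that suggests the peer's packets are not reaching the server. A minimal sketch, assuming bash and awk; the function name stale_peers and the three-minute threshold are my own choices:

```shell
#!/usr/bin/env bash
# stale_peers NOW MAX_AGE
# Reads "wg show <iface> latest-handshakes" output on stdin (public key,
# TAB, Unix timestamp of the last handshake; 0 means no handshake yet)
# and prints the public key of every peer whose last handshake is more
# than MAX_AGE seconds older than NOW, or which never completed one.
stale_peers() {
  local now=$1 max_age=$2
  awk -F '\t' -v now="$now" -v max="$max_age" \
    '$2 == 0 || now - $2 > max { print $1 }'
}
```

On the server this would be run as: wg show wg0 latest-handshakes | stale_peers "$(date +%s)" 180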

Firewall

  • the VPS is configured with an external* firewall which

  • ... drops all connections except those specifically permitted

  • ... allows all incoming IPv4 UDP traffic on the specified port

  • ... allows incoming TCP from *some* subnets (for ssh connections)

*external, meaning "cloud-based": it filters network traffic before that traffic reaches the VPS (which has its own internal firewall)

Actions

Beyond this, nothing makes sense. All of the following have worked at least once, but none reliably (and none of them *should* work).

  • taking wg0 down and bringing back up

  • disabling the firewall, then enabling it again

  • enabling PersistentKeepalive (at 25 sec)

  • adding a script on the peer to ping the server every 30 sec, using the wireguard interface (inside the tunnel)

  • sometimes none of these works on its own, and I have to use them one after another... or all at once
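The ping-every-30-seconds script could be extended into a watchdog that also restarts the tunnel when pings keep failing, automating the "take wg0 down and bring it back up" step. This is a sketch, not the script used here: the function names, the four-failure threshold (two minutes at 30 s intervals) and restarting the whole wg-quick@wg0 unit are all assumptions.

```shell
#!/usr/bin/env bash
# next_failure_count COUNT STATUS
# Prints the new number of consecutive failures, given the previous COUNT
# and the exit STATUS of the latest ping (0 = a reply was received).
next_failure_count() {
  if [ "$2" -eq 0 ]; then echo 0; else echo $(( $1 + 1 )); fi
}

# tunnel_watchdog TARGET [INTERVAL] [THRESHOLD]
# Pings TARGET (e.g. the server's tunnel address) through wg0 every
# INTERVAL seconds; after THRESHOLD consecutive failures it restarts the
# wg-quick@wg0 unit and starts counting again. Runs forever; needs root.
tunnel_watchdog() {
  local target=$1 interval=${2:-30} threshold=${3:-4}
  local count=0 status
  while true; do
    ping -c 1 -W 5 -I wg0 "$target" > /dev/null 2>&1
    status=$?
    count=$(next_failure_count "$count" "$status")
    if [ "$count" -ge "$threshold" ]; then
      logger -t tunnel_watchdog "no reply from $target, restarting wg0"
      systemctl restart wg-quick@wg0
      count=0
    fi
    sleep "$interval"
  done
}
```

On a peer, as root: tunnel_watchdog 10.10.64.1 30 4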

Additional Actions

  • added a PostUp command (a host route to the server) to the [Interface] section of the peer .conf

  • added PersistentKeepalive to the [Peer] section of the peer .conf

  • reconfigured the cellular modem to act as a pass-through, so all decision-making is on the router

It does not make any difference whether wireguard is started manually or as a systemd unit; either way, wg-quick does the work on both the peers and the server.

wg-quick
# wg-quick up wg0
# wg-quick down wg0

systemd
# systemctl enable --now wg-quick@wg0
# systemctl stop wg-quick@wg0

Wireguard config

Server
--------------------
[Interface]
# <serverName>
Address = 10.10.64.1/24
ListenPort = <portNum>
PrivateKey = <server_privateKey>
PostUp = iptables -A FORWARD -i %i -j ACCEPT; iptables -A FORWARD -o %i -j ACCEPT; iptables -t nat -A POSTROUTING -o %i -j MASQUERADE
PostDown = iptables -D FORWARD -i %i -j ACCEPT; iptables -D FORWARD -o %i -j ACCEPT; iptables -t nat -D POSTROUTING -o %i -j MASQUERADE

[Peer]
# <peer1_name>
PublicKey = <peer1_publicKey>
AllowedIPs = 10.10.64.2/32

[Peer]
# <peer2_name>
PublicKey = <peer2_publicKey>
AllowedIPs = 10.10.64.3/32

[Peer]
# <peer3_name>
PublicKey = <peer3_publicKey>
AllowedIPs = 10.10.64.4/32

[Peer]
# <peer4_name>
PublicKey = <peer4_publicKey>
AllowedIPs = 10.10.64.5/32

Peer (typical)
--------------------
[Interface]
# <peer1_name>
Address = 10.10.64.<nodeNum>/24
ListenPort = <portNum>
PrivateKey = <peer1_privateKey>
PostUp = ip route add <server_public_ipAddress> via 10.10.32.1 dev eno1

[Peer]
# <server_name>
PublicKey = <server_publicKey>
AllowedIPs = 10.10.64.1/32, 10.10.64.2/32, 10.10.64.3/32, 10.10.64.4/32, 10.10.64.5/32
Endpoint = <server_public_ipAddress>:<portNum>
PersistentKeepalive = 25

ip r (typical)
--------------------
default via 10.10.32.1 dev eno1 proto dhcp src 10.10.32.64 metric 1002
10.10.32.0/24 dev eno1 proto dhcp scope link src 10.10.32.64 metric 1002
10.10.64.0/24 dev wg0 proto kernel scope link src 10.10.64.3
<server_public_ipAddress> via 10.10.32.1 dev eno1
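Since "ip r" looks the same whether or not the peer is communicating, it can still be worth confirming mechanically that the host route added by the peer's PostUp is present before digging further. A minimal sketch that checks "ip r" output for that route; the function name has_host_route is hypothetical:

```shell
#!/usr/bin/env bash
# has_host_route SERVER_IP GATEWAY DEV
# Reads "ip r" output on stdin and exits 0 if it contains the host route
# "SERVER_IP via GATEWAY dev DEV", otherwise exits 1.
has_host_route() {
  awk -v dst="$1" -v gw="$2" -v dev="$3" '
    $1 == dst && $2 == "via" && $3 == gw && $4 == "dev" && $5 == dev { found = 1 }
    END { exit !found }'
}
```

Usage, with the documentation address 203.0.113.10 standing in for <server_public_ipAddress>: ip r | has_host_route 203.0.113.10 10.10.32.1 eno1 && echo present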

Cheers,

Last edited by dakota (2024-11-01 03:13:18)


"Before Enlightenment chop wood, carry water. After Enlightenment chop wood, carry water." -- Zen proverb


#2 2024-05-07 17:54:31

dakota
Member
Registered: 2016-05-20
Posts: 415

Re: [SOLVED] Wireguard Client Loses Connection to Server

Two things bother me:

  • what breaks the wireguard connection in the first place?

  • why doesn't the wireguard connection re-establish itself when the primary cause is resolved?

Breakage

I have seen evidence of the following:

  • cellular radio crash and subsequent reboot (determined from looking at the radio logs)

  • poor cell signal (a known problem)

  • ISP lease change (but not yet updated on the router, w/ remaining lease time = 4 hr)*

Edit 1 - I have now seen multiple cases of ISP lease change that do not cause wg breakage, so I doubt this is the problem.
Edit 2 - A cellular radio crash might have been the root cause of the network failure (see next post), but poor cell signal is still a possibility.

Reconnection

Edit 3 - I've gone down a lot of rabbit holes in the last 6 months and followed a lot of dead ends.

My gut feeling is that this is a modem or ISP problem, compounded by a bad wireguard configuration. I discovered that if the route to the server was still defined when wg0 was restarted, then wireguard would fail (the PostUp "ip route add" cannot add a route that already exists). I added the following to the [Interface] section of the peer's wg0.conf file:

PostDown = ip route delete <server_public_ipAddress> via 10.10.32.1 dev eno1

Hopefully this will keep that from happening again.
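An alternative to pairing "ip route add" with a PostDown delete is to make the PostUp itself idempotent: "ip route replace" creates the route if it is missing and overwrites it if it already exists, so a leftover route no longer makes the PostUp fail. wg-quick runs these hooks with bash, so "|| true" also works in a PostDown line. The helper below only prints the two [Interface] lines for given values; route_hooks and the example address are hypothetical.

```shell
#!/usr/bin/env bash
# route_hooks SERVER_IP GATEWAY DEV
# Prints PostUp/PostDown lines for the peer's [Interface] section.
# "ip route replace" succeeds whether or not the route already exists;
# "|| true" keeps PostDown from reporting an error if the route is gone.
route_hooks() {
  printf 'PostUp = ip route replace %s via %s dev %s\n' "$1" "$2" "$3"
  printf 'PostDown = ip route del %s via %s dev %s || true\n' "$1" "$2" "$3"
}
```

For example, route_hooks 203.0.113.10 10.10.32.1 eno1 prints the two lines to paste into wg0.conf in place of the existing PostUp/PostDown pair.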

Edit 4 - I also added Restart options to the [Service] section of /usr/lib/systemd/system/wg-quick@.service:

RestartSec=2min
Restart=on-failure
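Files under /usr/lib/systemd/system belong to the package and are replaced when it is upgraded, so the same two settings may be safer in a drop-in override, which systemd applies to every instance of the wg-quick@ template. A sketch that writes such a drop-in; the function name and the file name restart.conf are arbitrary choices:

```shell
#!/usr/bin/env bash
# write_restart_dropin DIR
# Writes the Restart settings into DIR/restart.conf as a systemd drop-in
# instead of editing the packaged unit file.
write_restart_dropin() {
  local dir=$1
  mkdir -p "$dir"
  cat > "$dir/restart.conf" <<'EOF'
[Service]
RestartSec=2min
Restart=on-failure
EOF
}
```

On a real system DIR would be /etc/systemd/system/wg-quick@.service.d, followed by "systemctl daemon-reload" so systemd picks up the change.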

Questions

Q. Is it possible that the root cause of what I'm seeing is an ISP disconnect (either through the cellular modem crashing and rebooting, or the ISP lease expiring), and that reconnection fails because the IP address of the peer has changed and is being blocked by the firewall?
A. Yes and No. I believe the root cause is a cellular modem/ISP disconnect and that reconnection happens automatically (unless there are lingering modem/ISP problems).

Q. Does wireguard use TCP for the *initial* connection and then UDP for subsequent data exchange? (Why else would the TCP firewall rules matter?) And if this is true, why doesn't it affect both peers equally?
A. No. The initial connection is through UDP. I think that reconnection after changing the firewall rules was a coincidence.
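On the first question: the server config has no Endpoint lines for its peers, so the server learns each peer's current public address from the most recent authenticated packet it receives from that peer (WireGuard's built-in roaming). A changed ISP address should therefore not block reconnection by itself, as long as the external firewall lets the new address through and the peer sends first, which PersistentKeepalive takes care of. To check whether outages line up with address changes, snapshots of "wg show wg0 endpoints" taken on the server before and after an outage can be compared. A minimal sketch; changed_endpoints is a hypothetical name:

```shell
#!/usr/bin/env bash
# changed_endpoints OLD NEW
# OLD and NEW each hold "wg show <iface> endpoints" output (public key,
# TAB, ip:port). Prints "key old -> new" for every peer present in both
# snapshots whose endpoint differs.
changed_endpoints() {
  awk -F '\t' '
    NR == FNR { old[$1] = $2; next }
    ($1 in old) && old[$1] != $2 { print $1, old[$1], "->", $2 }' "$1" "$2"
}
```

Usage: wg show wg0 endpoints > before.txt, later wg show wg0 endpoints > after.txt, then changed_endpoints before.txt after.txt.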

Cheers,

Last edited by dakota (2024-08-18 22:13:46)
