I have a local file server that I manage through OpenSSH.
Sometimes, after a reboot, sshd fails to bind to port 22.
I assume this is a timing issue (that it starts before the network is ready), because it always helps to just restart the service after it fails - but I'm struggling to understand how I can ensure it doesn't happen.
My sshd.service is just the default OpenSSH service definition, with an override I added in accordance with the Wiki, in order to ensure it waits for the network.
I also changed the type to exec because of this recommendation:
It is recommended to use Type=exec for long-running services, as it ensures that process setup errors (e.g. errors such as a missing service executable, or missing user) are properly tracked.
I have no idea if proper tracking will increase stability, though.
$ systemctl cat sshd
# /usr/lib/systemd/system/sshd.service
[Unit]
Description=OpenSSH Daemon
Wants=sshdgenkeys.service
After=sshdgenkeys.service
After=network.target
[Service]
ExecStart=/usr/bin/sshd -D
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=always
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/sshd.service.d/override.conf
[Unit]
Wants=network-online.target
After=network-online.target
[Service]
Type=exec
I use NetworkManager for networking, and NetworkManager-wait-online.service is enabled:
$ systemctl is-enabled NetworkManager-wait-online.service
enabled
But none of this makes the sshd startup foolproof: sometimes both addresses bind, sometimes only one of them does, and sometimes neither does.
Here's the systemctl status when one bind fails (and I log in on the other):
$ systemctl status sshd
● sshd.service - OpenSSH Daemon
Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; preset: disabled)
Drop-In: /etc/systemd/system/sshd.service.d
└─override.conf
Active: active (running) since Sun 2024-01-21 15:19:21 CET; 13s ago
Main PID: 394 (sshd)
Tasks: 1 (limit: 9458)
Memory: 5.2M (peak: 5.9M)
CPU: 139ms
CGroup: /system.slice/sshd.service
└─394 "sshd: /usr/bin/sshd -D [listener] 0 of 10-100 startups"
Jan 21 15:19:21 <servername> systemd[1]: Started OpenSSH Daemon.
Jan 21 15:19:22 <servername> sshd[394]: error: Bind to port 22 on <IP-address2-server> failed: Cannot assign requested address.
Jan 21 15:19:22 <servername> sshd[394]: Server listening on <IP-address1-server> port 22.
Jan 21 15:19:27 <servername> sshd[451]: Accepted publickey for <username> from <IP-address-client> port 38126 ssh2: ED25519 SHA256:<hash>
Jan 21 15:19:27 <servername> sshd[451]: pam_unix(sshd:session): session opened for user <username>(uid=1000) by <username>(uid=0)
I hid identifiable data, but the server and client both use addresses in 10.0.0.0 with 24-bit subnet masks. One server address is tied to a normal NIC and the other is used by zerotier-one, but the bind failure doesn't seem to occur more frequently on either one.
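One way to test the timing assumption is to check whether the address is actually configured at the moment sshd starts, since "Cannot assign requested address" usually means the kernel does not have that address on any interface yet. Below is a minimal sketch, not something from the thread: the function names `addr_present` and `wait_for_addr` are made up for illustration, and it assumes the usual `ip -o -4 addr show` layout where the fourth field is `ADDRESS/PREFIX`.

```shell
# Hypothetical helpers, not part of OpenSSH or systemd.

# addr_present ADDRESS
# Reads `ip -o -4 addr show` output on stdin and succeeds if ADDRESS
# appears as the address part of the 4th field (ADDRESS/PREFIX).
addr_present() {
    awk -v want="$1" '
        { split($4, parts, "/"); if (parts[1] == want) found = 1 }
        END { exit !found }
    '
}

# wait_for_addr ADDRESS [TRIES]
# Polls once per second until ADDRESS is configured or TRIES
# (default 30, an arbitrary choice) attempts have been used up.
wait_for_addr() {
    want_addr=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if ip -o -4 addr show | addr_present "$want_addr"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

Something like `wait_for_addr 10.0.0.5 60` (placeholder address) could then be run from a script before sshd starts, for instance via an ExecStartPre line, to see whether the late bind failures line up with a late address.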
Just for reference, my zerotier-one unit is not modified and zerotier works just fine otherwise:
$ systemctl cat zerotier-one
# /usr/lib/systemd/system/zerotier-one.service
[Unit]
Description=ZeroTier One
After=network-online.target network.target
Wants=network-online.target
[Service]
ExecStart=/usr/bin/zerotier-one
Restart=always
KillMode=process
[Install]
WantedBy=multi-user.target
As sshd doesn't exit, the Restart directive doesn't come into play, but I guess I could use ExecStartPost to check that sshd is listening on the port and fail the service if it isn't?
Is that the way to go, or is there anything else I should do to ensure both sshd binds are successful?
Last edited by Ferdinand (2024-01-29 07:53:51)
Did you try removing the -s from ExecStart for NetworkManager-wait-online (see: https://wiki.archlinux.org/title/Networ … it-online)? Be aware of the "other issues" described with doing that.
Thanks for the tip - that got me investigating a bit.
It turns out removing -s doesn't make much actual difference, although it seems a potentially good idea:
-s | --wait-for-startup
Wait for NetworkManager startup to complete, rather than waiting for network connectivity specifically.
What I ended up doing was to override sshd.service with:
[Service]
IgnoreSIGPIPE=false
ExecStartPost=/usr/bin/bash -c '/usr/bin/ss -ltn | /usr/bin/grep -q "<IP-address1-server>:22" && /usr/bin/ss -ltn | /usr/bin/grep -q "<IP-address2-server>:22"'
RestartSec=30s
The ExecStartPost command returns a non-zero exit code unless something is listening on port 22 on both <IP-address1-server> and <IP-address2-server>. That fails the service and causes a restart after 30 seconds. The long delay is actually needed; with 10 seconds, the service only came up on the fourth restart.
The IgnoreSIGPIPE=false was needed to get the pipes to work. As far as I understand, SIGPIPE is what stops the writing side of a pipe once the reader has closed it (man 7 pipe), and systemd ignores that signal in executed processes by default (IgnoreSIGPIPE=true).
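For comparison, here is a sketch of the same listening check written so that the reader consumes all of the `ss` output before deciding. Since nothing closes the pipe early, it should not depend on how SIGPIPE is handled, though that has not been verified under systemd here. The function name `listening_on` is invented for this example, and it simply looks for any whitespace-separated field equal to `ADDRESS:PORT` rather than relying on a fixed column.

```shell
# Hypothetical helper, not part of the thread's override.

# listening_on ADDRESS:PORT [ADDRESS:PORT ...]
# Reads `ss -ltn` output on stdin and succeeds only if every given
# ADDRESS:PORT appears as a field on some line of that output.
listening_on() {
    # Read the whole input first, so the writer is never cut off.
    snapshot=$(cat) || return 1
    for want in "$@"; do
        printf '%s\n' "$snapshot" |
            awk -v want="$want" '
                { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
                END { exit !found }
            ' || return 1
    done
    return 0
}
```

With placeholder addresses, the call would look like `ss -ltn | listening_on 10.0.0.5:22 10.0.0.6:22`. Putting this in a small script and pointing ExecStartPost at it would also keep the unit file line shorter.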
Edit: Just a further improvement: I allow 280 seconds to make 25 attempts at 10 second intervals, and reboot if none of those succeeds.
It will go on and on until it either succeeds, or I return from never-never land.
[Unit]
Wants=network-online.target
After=network-online.target
StartLimitIntervalSec=280
StartLimitBurst=25
StartLimitAction=reboot
[Service]
Type=exec
IgnoreSIGPIPE=false
ExecStartPost=/usr/bin/bash -c '/usr/bin/ss -ltn | /usr/bin/grep -q "<IP-address1-server>:22" && /usr/bin/ss -ltn | /usr/bin/grep -q "<IP-address2-server>:22"'
RestartSec=10s
Last edited by Ferdinand (2024-01-29 16:23:00)
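A rough sanity check of the 280 / 25 / 10 numbers, under two assumptions that are worth verifying against systemd.unit(5): that the start limit trips on start number StartLimitBurst + 1 if it falls inside the interval counted from the first start, and that each failed cycle takes a whole number of seconds (RestartSec plus however long the start and the ExecStartPost check take). The function name `start_limit_trips` is made up for this sketch.

```shell
# Hypothetical helper for back-of-the-envelope arithmetic only.

# start_limit_trips BURST INTERVAL_SEC CYCLE_SEC
# Succeeds if start number BURST + 1 would happen within INTERVAL_SEC
# of the first start, given CYCLE_SEC seconds between consecutive starts.
start_limit_trips() {
    burst=$1
    interval=$2
    cycle=$3
    # Start number (burst + 1) comes burst * cycle seconds after the first.
    [ $((burst * cycle)) -lt "$interval" ]
}
```

Under those assumptions, `start_limit_trips 25 280 11` succeeds (275 s is inside the window) while `start_limit_trips 25 280 12` does not (300 s is outside it). In other words, the reboot action would only fire as long as each failed cycle stays at roughly 11 seconds or less; with slower cycles the restarts would simply continue.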