Both Wi-Fi and Ethernet randomly stop working after resuming from sleep, and I have to reboot to get them working again.
Here's the output of "systemctl status NetworkManager"
● NetworkManager.service - Network Manager
Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled; preset: disabled)
Active: active (running) since Thu 2024-03-07 10:18:43 -03; 1 day 5h ago
Docs: man:NetworkManager(8)
Main PID: 444 (NetworkManager)
Tasks: 4 (limit: 9160)
Memory: 38.2M (peak: 39.5M)
CPU: 2min 196ms
CGroup: /system.slice/NetworkManager.service
└─444 /usr/bin/NetworkManager --no-daemon
Mar 08 05:58:16 host NetworkManager[444]: <info> [1709888296.3989] device (wlo1): state change: deactivating -> disconnected (reason 'sleeping', sys-iface-state: 'managed')
Mar 08 05:58:16 host NetworkManager[444]: <info> [1709888296.3992] dhcp4 (wlo1): canceled DHCP transaction
Mar 08 05:58:16 host NetworkManager[444]: <info> [1709888296.3992] dhcp4 (wlo1): activation: beginning transaction (timeout in 45 seconds)
Mar 08 05:58:16 host NetworkManager[444]: <info> [1709888296.3992] dhcp4 (wlo1): state changed no lease
Mar 08 05:58:16 host NetworkManager[444]: <info> [1709888296.3993] dhcp6 (wlo1): canceled DHCP transaction
Mar 08 05:58:16 host NetworkManager[444]: <info> [1709888296.3993] dhcp6 (wlo1): activation: beginning transaction (timeout in 45 seconds)
Mar 08 05:58:16 host NetworkManager[444]: <info> [1709888296.3993] dhcp6 (wlo1): state changed no lease
Mar 08 05:58:16 host NetworkManager[444]: <info> [1709888296.4146] device (wlo1): set-hw-addr: set MAC address to xx:xx:xx:xx:xx:xx (scanning)
Mar 08 05:58:16 host NetworkManager[444]: <info> [1709888296.4361] device (wlo1): state change: disconnected -> unmanaged (reason 'sleeping', sys-iface-state: 'managed')
Mar 08 05:58:16 host NetworkManager[444]: <info> [1709888296.5643] device (wlo1): set-hw-addr: reset MAC address to xx:xx:xx:xx:xx:xx (unmanage)
Thanks.
Offline
The NM status isn't really useful, please post your complete system journal for the boot covering the incident, eg
sudo journalctl -b -1 | curl -F 'file=@-' 0x0.st
for the previous one.
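To look at the same journal locally, it can be narrowed down to the suspend/resume transitions and NetworkManager's connectivity state. This is only a minimal sketch: the function name `sleep_events` is made up for illustration, and the patterns are simply the message strings that systemd and NetworkManager print in this thread's logs.

```shell
#!/bin/sh
# Keep only sleep-target transitions and NetworkManager global-state changes.
# Reads journal text on stdin.
sleep_events() {
    grep -E 'target Sleep\.|NetworkManager state is now'
}
```

Usage would be something like `sudo journalctl -b -1 | sleep_events`. A resume ("Stopped target Sleep.") that is not followed by a "CONNECTED_GLOBAL" line is the interesting case.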
Offline
Mar 07 10:18:51 host NetworkManager[444]: <info> [1709817531.3514] manager: NetworkManager state is now CONNECTED_GLOBAL
Mar 07 12:06:14 host systemd[1]: Reached target Sleep.
Mar 07 14:03:25 host systemd[1]: Stopped target Sleep.
Mar 07 14:03:32 host NetworkManager[444]: <info> [1709831012.7394] manager: NetworkManager state is now CONNECTED_GLOBAL
Mar 07 17:19:30 host systemd[1]: Reached target Sleep.
Mar 07 18:23:33 host systemd[1]: Stopped target Sleep.
Mar 07 18:23:41 host NetworkManager[444]: <info> [1709846621.9889] manager: NetworkManager state is now CONNECTED_GLOBAL
Mar 07 18:55:49 host systemd[1]: Reached target Sleep.
Mar 07 19:53:26 host systemd[1]: Stopped target Sleep.
Mar 07 19:53:31 host NetworkManager[444]: <info> [1709852011.9353] manager: NetworkManager state is now CONNECTED_GLOBAL
Mar 07 20:55:21 host systemd[1]: Reached target Sleep.
Mar 08 04:58:29 host systemd[1]: Stopped target Sleep.
Mar 08 04:58:36 host NetworkManager[444]: <info> [1709884716.3021] manager: NetworkManager state is now CONNECTED_GLOBAL
Mar 08 05:58:16 host systemd[1]: Reached target Sleep.
Mar 08 15:23:39 host systemd[1]: Stopped target Sleep.
The only case where you don't reach a global connection is at 15:23:39, correct?
(You afterward get a bunch of network failures and ultimately shut down)
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Killing process 448 (systemd-logind) with signal SIGABRT.
Mar 08 15:23:39 host kernel: Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
Mar 08 15:23:39 host kernel: Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
…
Mar 08 15:23:39 host systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Mar 08 15:23:39 host systemd[1]: systemd-journald.service: Main process exited, code=killed, status=6/ABRT
Mar 08 15:23:39 host systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
Mar 08 15:23:39 host systemd[1]: systemd-journald.service: Consumed 1.284s CPU time.
Mar 08 15:23:39 host systemd[1]: systemd-machined.service: Main process exited, code=killed, status=6/ABRT
Mar 08 15:23:39 host systemd[1]: systemd-machined.service: Failed with result 'watchdog'.
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Main process exited, code=killed, status=6/ABRT
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Failed with result 'watchdog'.
Mar 08 15:23:39 host systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 1.
So the problem looks like
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Killing process 448 (systemd-logind) with signal SIGABRT.
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Main process exited, code=killed, status=6/ABRT
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Failed with result 'watchdog'.
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 1.
Mar 08 15:23:39 host systemd-logind[1166468]: New seat seat0.
Does a re-login (if in doubt, restart the DM) suffice instead of a reboot?
Does restarting NM suffice?
As for the cause, check the logind service status and https://wiki.archlinux.org/title/Core_d … _core_dump
Wild guess:
Mar 07 10:20:11 host systemd[1]: Dependency failed for /HDD.
Mar 07 10:21:00 host CROND[6207]: (user) CMDOUT (/home/user/.local/bin/custom/sync-home: line 2: mountpoint -q /HDD: No such file or directory)
Disable that?
Offline
Wild guess:
Mar 07 10:20:11 host systemd[1]: Dependency failed for /HDD.
Mar 07 10:21:00 host CROND[6207]: (user) CMDOUT (/home/user/.local/bin/custom/sync-home: line 2: mountpoint -q /HDD: No such file or directory)
Disable that?
"/home/user/.local/bin/custom/sync-home" is a script that executes "rsync -av --delete /home/user/ /HDD/sync-home/" whenever "/HDD" is a mount point. I defined a cronjob to execute this file every 1 hour.
For the last few weeks "/HDD" hasn't been a mount point, but it is still an existing directory, so "mountpoint -q /HDD: No such file or directory" is strange.
By disabling it, do you mean removing this cronjob?
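One possible explanation for that message, offered as a guess since the script itself was not posted: when a shell is handed the whole string "mountpoint -q /HDD" as a single word (for example because it is quoted as one string), it treats it as a command name containing a slash and reports "No such file or directory" without looking at /HDD at all. A sketch of how such a script could be written, with the function names `is_mounted` and `sync_home` being hypothetical and the paths taken from the thread:

```shell
#!/bin/sh
# Hypothetical rewrite of sync-home; paths are the ones quoted in the thread.
SRC="/home/user/"
MOUNT="/HDD"

# Succeeds only if the given directory is currently a mount point.
# The command and its arguments are separate words, not one quoted string.
is_mounted() {
    mountpoint -q "$1"
}

# Run the backup only when the target drive is actually mounted;
# otherwise skip quietly so cron does not report a failure.
sync_home() {
    if is_mounted "$MOUNT"; then
        rsync -av --delete "$SRC" "$MOUNT/sync-home/"
    else
        echo "sync-home: $MOUNT is not mounted, skipping" >&2
        return 0
    fi
}
```

The script would end with a plain `sync_home` call. Checking the mount point first matters here because `rsync --delete` into an unmounted /HDD would write to the root filesystem instead of the backup drive.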
Offline
Yes.
Also systemd still tries to mount /HDD, so there's probably an fstab entry.
Otherwise, do the coredumps reveal why logind aborts?
There were a bunch of threads about weird time shifts (rtc and/or system) after suspends, did/can you notice anything like that?
("sudo hwclock" and "date" and a clock on your wall)
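A rough way to compare the two clocks from a script, as a sketch only: it assumes the kernel exposes the hardware clock as `rtc0` in sysfs and that the RTC is kept in UTC (if it is kept in local time, the result will include the timezone offset). The function names are made up for illustration.

```shell
#!/bin/sh
# Absolute difference between two epoch timestamps, in seconds.
abs_diff() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# Print the offset between the RTC and the system clock, in seconds.
# Assumption: the RTC is available as /sys/class/rtc/rtc0.
rtc_offset() {
    rtc="$(cat /sys/class/rtc/rtc0/since_epoch)" || return 1
    abs_diff "$rtc" "$(date +%s)"
}
```

Running `rtc_offset` right after a resume, before the network has had a chance to sync the time, should print a value close to 0 if there is no shift.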
Offline
Also systemd still tries to mount /HDD, so there's probably an fstab entry.
I have now removed both the fstab entry and the cronjob. I'll wait to see whether the problem returns.
There were a bunch of threads about weird time shifts (rtc and/or system) after suspends, did/can you notice anything like that?
Never noticed.
I have an external HDD for backups, and I was using the fstab entry and the cronjob+script for that purpose. Do you know of an alternative way to do this?
Offline
udev rule that triggers the backup when you plug the HDD? (maybe after asking back first)
Offline
My post turned from "bug in internet network" to something quite different.
After what you said I tried working with udev, but it wasn't working right and, given my lack of knowledge, the result would have been inconsistent; I decided to do the following instead:
AUTO-MOUNT PART
1 - Install
pacman -S udisks2 udiskie
2 - Create the file "/etc/systemd/user/udiskie.service" with the following content:
[Unit]
Description=Handle automounting
[Service]
Type=simple
ExecStart=/usr/bin/udiskie
[Install]
WantedBy=default.target
3 - Do
systemctl --user enable udiskie
All devices you plug in via USB will then be mounted automatically (the service starts at the next login, or immediately if you add "--now" to the command above).
4 - I have two scripts (whose original source I don't know) that I had to edit to work with udiskie. Make sure you have "dmenu" installed, plus "simple-mtpfs" if you plan to mount Android devices:
4.1 - mounter: opens "dmenu" to choose among the available devices to mount
#!/bin/bash
# Mounts Android Phones and USB drives (encrypted or not). This script will
# replace the older `dmenumount` which had extra steps and couldn't handle
# encrypted drives.
# TODO: Try decrypt for drives in crypttab
# TODO: Add some support for connecting iPhones (although they are annoying).
# Each menu entry is tagged with a one-character prefix so the case statement
# at the end can tell the kinds apart: 📱 phone, 🔒 locked LUKS drive,
# 💾 normal partition.
IFS='
'
# Function for escaping cell-phone names.
escape(){ echo "$@" | iconv -cf UTF-8 -t ASCII//TRANSLIT | tr -d '[:punct:]' | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | sed "s/-\+/-/g;s/\(^-\|-\$\)//g" ;}
# Check for phones.
phones="$(simple-mtpfs -l 2>/dev/null | sed "s/^/📱/")"
mountedphones="$(grep "simple-mtpfs" /etc/mtab)"
# If there are already mounted phones, remove them from the list of mountables.
[ -n "$mountedphones" ] && phones="$(for phone in $phones; do
    for mounted in $mountedphones; do
        escphone="$(escape "$phone")"
        [[ "$mounted" =~ "$escphone" ]] && break 1
    done && continue 1
    echo "$phone"
done)"
# Check for drives.
lsblkoutput="$(lsblk -rpo "uuid,name,type,size,label,mountpoint,fstype")"
# Get all LUKS drives
allluks="$(echo "$lsblkoutput" | grep crypto_LUKS)"
# Get a list of the LUKS drive UUIDs already decrypted.
decrypted="$(find /dev/disk/by-id/dm-uuid-CRYPT-LUKS2-* | sed "s|.*LUKS2-||;s|-.*||")"
# Function for formatting drives correctly for dmenu:
filter() { sed "s/ /:/g" | awk -F':' '$7==""{printf "%s%s (%s) → %s\n",$1,$3,$5,$6}' | sed 's/\\x20/ /g' ; }
# Get only LUKS drives that are not decrypted.
unopenedluks="$(for drive in $allluks; do
    uuid="${drive%% *}"
    uuid="${uuid//-}" # This is a bashism.
    [ -n "$decrypted" ] && for open in $decrypted; do
        [ "$uuid" = "$open" ] && break 1
    done && continue 1
    echo "🔒 $drive"
done | filter)"
# Get all normal, non-encrypted or decrypted partitions that are not mounted.
normalparts="$(echo "$lsblkoutput"| grep -v crypto_LUKS | grep 'part\|rom\|crypt' | sed "s/^/💾 /" | filter )"
# Add all to one variable. If no mountable drives found, exit.
alldrives="$(echo "$phones
$unopenedluks
$normalparts" | sed "/^$/d;s/ *$//")"
# Quit the script if a sequential command fails.
set -e
test -n "$alldrives"
# Feed all found drives to dmenu and get user choice.
chosen="$(echo "$alldrives" | dmenu -p "Mount which drive?" -i)"
# Function for prompting user for a mountpoint.
getmount(){
    mp="$(find /media/ -type d 2>/dev/null | dmenu -i -p "Mount this drive where?")"
    test -n "$mp"
    if [ ! -d "$mp" ]; then
        mkdiryn=$(printf "No\\nYes" | dmenu -i -p "$mp does not exist. Create it?")
        [ "$mkdiryn" = "Yes" ] && (mkdir -p "$mp" || sudo -A mkdir -p "$mp")
    fi
}
attemptmount(){
    # Attempt to mount without a mountpoint, to see if drive is in fstab.
    sudo -A udiskie-mount "$chosen" || return 1
    # notify-send "Drive Mounted." "$chosen mounted."
    exit
}
case "$chosen" in
    💾*)
        chosen="${chosen%% *}"
        chosen="${chosen:1}" # This is a bashism.
        attemptmount || getmount
        sudo -A udiskie-mount "$chosen"
        # sudo -A mount "$chosen" "$mp" -o uid="$(id -u)",gid="$(id -g)"
        # notify-send "Drive Mounted." "$chosen mounted to $mp."
        ;;
    🔒*)
        chosen="${chosen%% *}"
        chosen="${chosen:1}" # This is a bashism.
        # Number the drive.
        while true; do
            [ -f "/dev/mapper/usb$num" ] || break
            num="$(printf "%02d" "$((num +1))")"
        done
        # Decrypt in a terminal window
        ${TERMINAL:-st} -n floatterm -g 60x1 -e sudo cryptsetup open "$chosen" "usb$num"
        # Check if now decrypted.
        test -b "/dev/mapper/usb$num"
        attemptmount || getmount
        sudo -A udiskie-mount "$chosen"
        # sudo -A mount "/dev/mapper/usb$num" "$mp" -o uid="$(id -u)",gid="$(id -g)"
        # notify-send "Decrypted drive Mounted." "$chosen decrypted and mounted to $mp."
        ;;
    📱*)
        notify-send "❗Note" "Remember to allow file access on your phone now."
        getmount
        # Extract the device number from an entry like "📱1: Phone name".
        number="${chosen%%:*}"
        number="${number:1}" # This is a bashism.
        sudo -A simple-mtpfs -o allow_other -o fsname="simple-mtpfs-$(escape "$chosen")" --device "$number" "$mp"
        notify-send "Android Mounted." "Android device mounted to $mp."
        ;;
esac
4.2 - umounter: opens "dmenu" to choose among the mounted devices to unmount
#!/bin/sh
# Unmount USB drives or Android phones. Replaces the older `dmenuumount`. Fewer
# prompt and also de-decrypts LUKS drives that are unmounted.
set -e
mounteddroids="$(grep simple-mtpfs /etc/mtab | awk '{print "" $2}')"
lsblkoutput="$(lsblk -nrpo "name,type,size,mountpoint")"
mounteddrives="$(echo "$lsblkoutput" | awk '($2=="part"||$2=="crypt")&&$4!~/\/boot|\/var\/log|\/home$|SWAP/&&length($4)>1{printf "%s (%s) → %s\n",$1,$3,$4}' | sed 's/\\x20/ /g')"
allunmountable="$(echo "$mounteddroids
$mounteddrives" | sed "/^$/d;s/ *$//")"
test -n "$allunmountable"
chosen="$(echo "$allunmountable" | dmenu -i -p "Unmount which drive?")"
chosen="${chosen%% *}"
test -n "$chosen"
abc="$(echo "$chosen" | sed 's/://')"
sudo -A udiskie-umount "$abc"
# notify-send "Device unmounted." "$chosen has been unmounted."
# Close the chosen drive if decrypted.
cryptid="$(echo "$lsblkoutput" | grep "/${chosen#*/}$")"
cryptid="${cryptid%% *}"
test -b /dev/mapper/"${cryptid##*/}"
sudo -A cryptsetup close "$cryptid"
# notify-send "?Device dencryption closed." "Drive is now securely locked again."
5 - I went to "/etc/fstab" and wrote a line:
UUID=<my-device-uuid> /HDD/ btrfs defaults,noauto 0 0
The "noauto" option prevents the device from being mounted automatically at boot, and specifying the /HDD/ directory makes it mount there by default, while other devices get mounted in a different directory.
6 - Create the file "/etc/udev/rules.d/99-udisks2.rules" with the following content:
ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{UDISKS_FILESYSTEM_SHARED}="1"
This makes all devices not listed in "/etc/fstab" mount under "/media/" rather than "/run/media/<your-user>".
BACK-UP PART
1 - My advice: you can run two distinct backup schedules at the same time:
1.1 - One for the whole system in general, which will be large.
1.2 - One for the things that change considerably more often than the rest (in my case, less than 130 MB), to avoid losing progress on what you are working on if anything unusual happens.
I have two backup schedules: one for the whole system, which runs every 2 days; and another for all the non-directory files and selected directories of my home directory, which runs every 15 minutes.
2 - Install
pacman -S restic
3 - You can then follow https://wiki.archlinux.org/title/Restic#Scheduling; unfortunately, I don't have much time to make a nice guide of everything here.
Offline
The problem has returned. I now see that I have to manually restart NetworkManager.service for the network to work again.
The custom "udiskie" service I set up also stops working, and I have to restart it manually as well.
The udiskie service is the following:
[Unit]
Description=Handle automounting
[Service]
Type=simple
ExecStart=/usr/bin/udiskie
[Install]
WantedBy=default.target
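Until the cause is found, the manual NetworkManager restart could be automated with a systemd-sleep hook. This is a workaround sketch, not a fix: systemd-sleep runs executables placed in /usr/lib/systemd/system-sleep/ with two arguments ("pre" or "post", then the sleep action), while the helper name `action_for` and the decision to restart on every resume are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a hook for /usr/lib/systemd/system-sleep/ (file name is up to you).
# systemd-sleep calls it with $1 = pre|post and $2 = suspend|hibernate|...

# Decide what to do for a given phase/action pair:
# restart after any kind of resume, do nothing before sleeping.
action_for() {
    case "$1/$2" in
        post/*) echo restart ;;
        *)      echo none ;;
    esac
}

# Only act when called by systemd-sleep with a "post" phase.
if [ "$(action_for "${1:-}" "${2:-}")" = restart ]; then
    systemctl restart NetworkManager.service
fi
```

The file has to be executable. The hook runs as root, so it does not cover udiskie, which is a user service and would still need `systemctl --user restart udiskie` from within the session.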
Offline
The ultimate problem is going to remain
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Killing process 448 (systemd-logind) with signal SIGABRT.
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Main process exited, code=killed, status=6/ABRT
Mar 08 15:23:39 host systemd[1]: systemd-logind.service: Failed with result 'watchdog'.
(shows up in your recent logs?)
The hope was that this was caused by the "sync-home" cronjob, but apparently it's not.
Maybe the (now cleaner) log can hint at the abort.
https://wiki.archlinux.org/title/Core_d … _core_dump
…
Otherwise, do the coredumps reveal why logind aborts?
Offline
shows up in your recent logs?
Mar 17 11:27:45 host systemd[1]: systemd-logind.service: Main process exited, code=killed, status=6/ABRT
Mar 17 11:27:45 host systemd[1]: systemd-logind.service: Failed with result 'watchdog'.
Mar 17 11:27:45 host systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 1.
Mar 17 11:27:45 host systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
Mar 17 11:27:45 host systemd[1]: systemd-logind.service: Killing process 459 (systemd-logind) with signal SIGABRT.
Otherwise, do the coredumps reveal why logind aborts?
However, the coredump shows a timestamp from 1 week 3 days ago (maybe it hasn't been updated yet?).
[user@host ~]$ sudo coredumpctl debug /usr/lib/systemd/systemd-logind
PID: 460 (systemd-logind)
UID: 0 (root)
GID: 0 (root)
Signal: 6 (ABRT)
Timestamp: Thu 2024-03-07 10:02:36 -03 (1 week 3 days ago)
Command Line: /usr/lib/systemd/systemd-logind
Executable: /usr/lib/systemd/systemd-logind
Control Group: /system.slice/systemd-logind.service
Unit: systemd-logind.service
Slice: system.slice
Boot ID: 0d0e498251454a928298c0bf58c10403
Machine ID: 76c6c4e0767844ac82fc8eead6d3f2e5
Hostname: host
Storage: /var/lib/systemd/coredump/core.systemd-logind.0.0d0e498251454a928298c0bf58c10403.460.1709816556000000.zst (present)
Size on Disk: 214.7K
Message: Process 460 (systemd-logind) of user 0 dumped core.
Stack trace of thread 460:
#0 0x0000735a28b26e27 epoll_wait (libc.so.6 + 0x108e27)
#1 0x0000735a28ed0162 sd_event_wait (libsystemd-shared-255.4-2.so + 0x2d0162)
#2 0x0000735a28ed18dc sd_event_run (libsystemd-shared-255.4-2.so + 0x2d18dc)
#3 0x00005f908f034d6c n/a (systemd-logind + 0xad6c)
#4 0x0000735a28a43cd0 n/a (libc.so.6 + 0x25cd0)
#5 0x0000735a28a43d8a __libc_start_main (libc.so.6 + 0x25d8a)
#6 0x00005f908f035ae5 n/a (systemd-logind + 0xbae5)
ELF object binary architecture: AMD x86-64
GNU gdb (GDB) 14.2
[...]
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/systemd/systemd-logind...
(No debugging symbols found in /usr/lib/systemd/systemd-logind)
warning: Can't open file /usr/lib/liblzma.so.5.6.0 during file-backed mapping note processing
warning: Can't open file /usr/lib/libkmod.so.2.4.1 during file-backed mapping note processing
[New LWP 460]
warning: .dynamic section for "/usr/lib/libkmod.so.2" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/usr/lib/libaudit.so.1" is not at the expected address (wrong library or version mismatch?)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `/usr/lib/systemd/systemd-logind'.
Program terminated with signal SIGABRT, Aborted.
#0 0x0000735a28b26e27 in epoll_wait () from /usr/lib/libc.so.6
(gdb) bt
#0 0x0000735a28b26e27 in epoll_wait () from /usr/lib/libc.so.6
#1 0x0000735a28ed0162 in sd_event_wait () from /usr/lib/systemd/libsystemd-shared-255.4-2.so
#2 0x0000735a28ed18dc in sd_event_run () from /usr/lib/systemd/libsystemd-shared-255.4-2.so
#3 0x00005f908f034d6c in ?? ()
#4 0x0000735a28a43cd0 in ?? () from /usr/lib/libc.so.6
#5 0x0000735a28a43d8a in __libc_start_main () from /usr/lib/libc.so.6
#6 0x00005f908f035ae5 in ?? ()
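The old timestamp may simply mean that no newer dump was stored: the journal posted earlier contains "Core dump to |/usr/lib/systemd/systemd-coredump pipe failed" at the time of the watchdog abort. A small sketch to check whether that happened again; the function name `failed_dumps` is hypothetical and the pattern is the kernel message quoted in this thread.

```shell
#!/bin/sh
# Count kernel messages saying a core dump could not be handed to
# systemd-coredump. Reads journal text on stdin and prints the count.
failed_dumps() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c 'systemd-coredump pipe failed' || true
}
```

Usage would be `journalctl -k -b | failed_dumps` for the current boot; `coredumpctl list systemd-logind` shows which dumps were actually stored and when.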
Last edited by user11 (2024-03-17 17:25:08)
Offline
https://github.com/systemd/systemd/issues/23032 (semi-related)
What happens if you edit the systemd-logind.service and simply disable that ("WatchdogSec=0")?
a) does logind still abort (hopefully not)
b) your network survives the S3?
c) other symptoms that might hint at the cause of the watchdog timeout?
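For reference, the override can be written as a drop-in instead of editing the packaged unit. A minimal sketch: the function name `write_watchdog_override` is made up, and it takes the target directory as a parameter so it can be tried out somewhere harmless first. For the real unit the directory would be /etc/systemd/system/systemd-logind.service.d, which needs root.

```shell
#!/bin/sh
# Write a drop-in that disables the service watchdog into the given directory.
write_watchdog_override() {
    dir="$1"
    mkdir -p "$dir" || return 1
    printf '[Service]\nWatchdogSec=0\n' > "$dir/override.conf"
}
```

`sudo systemctl edit systemd-logind.service` creates the same kind of file interactively. Afterwards, `systemctl daemon-reload` followed by a reboot applies it, and `systemctl show systemd-logind.service -p WatchdogUSec` shows the value systemd is actually using.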
Offline
What happens if you edit the systemd-logind.service and simply disable that ("WatchdogSec=0")?
Nothing different is happening: logind hasn't aborted so far, so the network is still alive. I'll need to wait a few days to see whether it aborts again or the network dies.
How should I proceed now?
Offline
Wait and see - the watchdog is either some false positive (I'm still betting on sth. related to the system time) or something™ else will flare up (because something would have to stall it for 3+ minutes)
If not and if you really can't spot any system time issue (which might be transient and very short only, though) you'd have to file a bug against systemd.
Even if you don't have a reliable RTC, the watchdogs probably should™ account for that and stay down for a moment after an S3 to give the system a chance to sync the time.
Speaking of which, you could try to S3 w/o any network access and see whether that results in a time shift.
Offline
Something distinct happened today.
I used "WatchdogSec=0" for about 1 week 2 days, nothing out of the ordinary happened.
I decided to switch it back to the default value (by removing the override file in "/etc/systemd/system/").
Today the following happened:
1 - After opening my lid, I realized my bluetooth mouse didn't work.
2 - "bluetoothctl" is giving me "No default controller available".
3 - Checked "systemctl status bluetooth", but nothing seems to indicate the problem.
4 - After "systemctl restart bluetooth", my mouse worked again (thus bluetooth worked again).
5 - Here is my journalctl: https://0x0.st/XskJ.txt
I set "WatchdogSec=0" again to see if anything of the sort happens.
you could try to S3 w/o any network access and see whether that results in a time shift
I'll test it now.
edit: I disabled the network connection, put the laptop to sleep, waited until it reached S3, resumed, and ran "sudo hwclock" and "date": no time shift.
Last edited by user11 (2024-03-28 19:43:50)
Offline