You are not logged in.
Hey,
I'm having a problem with my computer where when it resumes from suspend or hibernate, it freezes for 30 seconds, and then another 60 seconds before continuing.
pam_unix(runuser-l:session) opens a session at 22:40:56 and exactly 30 seconds late, at 22:41:26, we see a timeout: "pam_systemd(runuser-l:session): Failed to create session: Connection timed out". Exactly 60 seconds after that, at 22:42:26, the other services start.
Given the exact delays here, this looks like a series of successive timeouts that unfortunately add a minute and a half each time I open my lid, making using the computer quite inconvenient.
I've filed a ticket on systemd's Github (link: https://github.com/systemd/systemd/issues/33020) that references spurious occurrences of this going back to about 2014. I also see https://bbs.archlinux.org/viewtopic.php?id=205452 and https://bbs.archlinux.org/viewtopic.php?id=223930 here, but those threads are quite old. That said, these and the discussions on Github suggest a bad user service, combined with a systemd bug. This has been patched before (link: https://github.com/systemd/systemd/issues/2863) but maybe it's here again, or perhaps this is a different bug.
Jun 10 22:40:56 <REDACTED> kernel: PM: suspend exit
Jun 10 22:40:56 <REDACTED> rtkit-daemon[816]: The canary thread is apparently starving. Taking action.
Jun 10 22:40:56 <REDACTED> systemd-logind[620]: Lid opened.
Jun 10 22:40:56 <REDACTED> systemd-sleep[56916]: logname: no login name
Jun 10 22:40:56 <REDACTED> rtkit-daemon[816]: Demoting known real-time threads.
Jun 10 22:40:56 <REDACTED> systemd-sleep[56808]: System returned from sleep operation 'suspend-then-hibernate'.
Jun 10 22:40:56 <REDACTED> rtkit-daemon[816]: Successfully demoted thread 41894 of process 41879.
Jun 10 22:40:56 <REDACTED> rtkit-daemon[816]: Demoted 1 threads.
Jun 10 22:40:56 <REDACTED> bluetoothd[618]: Controller resume with wake event 0x0
Jun 10 22:40:56 <REDACTED> runuser[56917]: pam_unix(runuser-l:session): session opened for user root(uid=0) by root(uid=0)
Jun 10 22:41:00 <REDACTED> kernel: ucsi_acpi USBC000:00: ucsi_handle_connector_change: ACK failed (-110)
Jun 10 22:41:26 <REDACTED> runuser[56812]: pam_systemd(runuser-l:session): Failed to create session: Connection timed out
Jun 10 22:41:26 <REDACTED> runuser[56812]: pam_unix(runuser-l:session): session closed for user root
Jun 10 22:42:26 <REDACTED> syncthing[790]: [SXTJO] INFO: Relay listener (dynamic+https://relays.syncthing.net/endpoint) starting
Jun 10 22:42:26 <REDACTED> syncthing[790]: [SXTJO] INFO: Relay listener (dynamic+https://relays.syncthing.net/endpoint) shutting down
Jun 10 22:42:26 <REDACTED> syncthing[790]: [SXTJO] INFO: listenerSupervisor@dynamic+https://relays.syncthing.net/endpoint: service dynamic+https://relays.syncthing.net/endpoint failed: Get "[url]https://relays.syncthing.net/endpoint[/url]": dial tcp: lookup relays.syncthing.net: Temporary failure in name resolution
Jun 10 22:42:26 <REDACTED> syncthing[790]: [SXTJO] INFO: Relay listener (dynamic+https://relays.syncthing.net/endpoint) starting
Jun 10 22:42:26 <REDACTED> syncthing[790]: [SXTJO] INFO: Relay listener (dynamic+https://relays.syncthing.net/endpoint) shutting down
Jun 10 22:42:26 <REDACTED> syncthing[790]: [SXTJO] INFO: Relay listener (dynamic+https://relays.syncthing.net/endpoint) starting
Jun 10 22:42:26 <REDACTED> syncthing[790]: [SXTJO] INFO: listenerSupervisor@dynamic+https://relays.syncthing.net/endpoint: service dynamic+https://relays.syncthing.net/endpoint failed: Get "[url]https://relays.syncthing.net/endpoint[/url]": dial tcp: lookup relays.syncthing.net: Temporary failure in name resolution
Jun 10 22:42:26 <REDACTED> syncthing[790]: [SXTJO] INFO: Relay listener (dynamic+https://relays.syncthing.net/endpoint) shutting down
Jun 10 22:42:26 <REDACTED> syncthing[790]: [SXTJO] INFO: listenerSupervisor@dynamic+https://relays.syncthing.net/endpoint: service dynamic+https://relays.syncthing.net/endpoint failed: Get "[url]https://relays.syncthing.net/endpoint[/url]": dial tcp: lookup relays.syncthing.net: Temporary failure in name resolution
Jun 10 22:42:26 <REDACTED> systemd[1]: systemd-suspend-then-hibernate.service: Deactivated successfully.
Jun 10 22:42:26 <REDACTED> systemd[1]: Finished System Suspend then Hibernate.
Jun 10 22:42:26 <REDACTED> systemd[1]: Stopped target Sleep.
Jun 10 22:42:26 <REDACTED> systemd[1]: Reached target Suspend; Hibernate if not used for a period of time.
Jun 10 22:42:26 <REDACTED> systemd-logind[620]: Operation 'suspend-then-hibernate' finished.
Last edited by nroth (2024-06-15 02:27:38)
Offline
Please use [code][/code] tags. Edit your post in this regard.
While at it, also remove the redundant plea for help.
Jun 10 22:41:00 <REDACTED> kernel: ucsi_acpi USBC000:00: ucsi_handle_connector_change: ACK failed (-110)
That's likely the USB host on your nvidia GPU?
https://download.nvidia.com/XFree86/Lin … sAndW6426e
Itr, the runuser is likely cause by some sleep hook, is it the nvidia VRAM preservation one?
Online
Please use tags....
Updated formatting and removed "help me!!!!" messages.
kernel: ucsi_acpi USBC000:00: ucsi_handle_connector_change: ACK failed (-110)
That's likely the USB host on your nvidia GPU?
This is a Framework 16 laptop, which is fully working according to the wiki (link: https://wiki.archlinux.org/title/Framework_Laptop_16) and the GPU is an AMD RX 7700S. I don't have any NVIDIA drivers or hardware installed. That's an interesting clue though. The GPU has a USB-C port on the back to support direct video output to external devices. Reading the docs, this also doesn't appear to be explicitly NVIDIA-specific:
As of this writing, The USB xHCI Host controller and USB Type-C UCSI controller drivers present in most Linux distributions do not fully support runtime power management. Several upstream kernel changes are being done to fix the issues. In the interim, these two PCI functions have to be disabled for this feature to work properly. This can be done using the following command.
echo 1 > /sys/bus/pci/devices/<Domain>:<Bus>:<Device>.2/remove echo 1 > /sys/bus/pci/devices/<Domain>:<Bus>:<Device>.3/remove
For example:
echo 1 > /sys/bus/pci/devices/0000\:01\:00.2/remove echo 1 > /sys/bus/pci/devices/0000\:01\:00.3/remove
It would probably be best for me to fix this anyway, but it's unclear why this would cause a hang. Also, these docs are quite old-- from August 2019. A more recent version of this document at https://download.nvidia.com/XFree86/Lin … ement.html adds:
Full support for runtime power management in these drivers is available in kernel version 5.5. For kernel versions before 5.5, these two PCI functions have to be disabled for this feature to work properly.
Since I'm running 6.9.3 as of today, this is unlikely to resolve the error, though it may be worth a try if the driver for the controller in the Framework 16 does not support this properly.
Itr, the runuser is likely cause by some sleep hook, is it the nvidia VRAM preservation one?
I doubt an NVIDIA sleep hook would be affecting my system. However, I do see "fw-fanctrl-suspend" in /usr/lib/systemd/system-sleep. It looks like a quick fix for this might possibly be to remove the sleep hook that turns off the fans:
#!/bin/sh
case $1 in
pre) runuser -l $(logname) -c "fw-fanctrl sleep" ;;
post) runuser -l $(logname) -c "fw-fanctrl defaultStrategy" ;;
esac
Even if this works, I'd still be interested in the root cause, and why this bug has been regressing on and off in systemd for the last 10 years.
Last edited by nroth (2024-06-13 01:05:28)
Offline
EDIT: This works!!! Thank you so much-- this was such a pain in my neck.
I've also updated the systemd bug to include context that removing sleep hooks solves the issue. Any idea if this even is a systemd bug, and where to file if not? It looks like https://github.com/util-linux/util-linux owns runuser, but
pam_systemd
appears to be what's failing.
I've also updated the archwiki page for the laptop that recommends the userspace fan control AUR package that installed this, and on the AUR package itself.
Last edited by nroth (2024-06-13 01:11:22)
Offline
The upstream hook looks somewhat different?
https://github.com/TamtamHero/fw-fanctr … rl-suspend
It was subject to multiple changes in May, https://github.com/TamtamHero/fw-fanctr … rl-suspend
https://github.com/TamtamHero/fw-fanctr … 017730da89
And idk know what $(logname)is supposed to be or why runuser was used at all - the hooks run as root.
Online
This fixes the issue-- but I'm still not sure if pam+runuser work correctly... $LOGNAME is the username of the current user. So `runuser -l "$LOGNAME" -c "fw-fanctrl --pause"` should expand to `runuser -l "nroth" -c "fw-fanctrl --pause"` in my case, which looks valid per https://www.man7.org/linux/man-pages/ma … ser.1.html
Updated the wiki, AUR packaging bug, and AUR package comment.
Last edited by nroth (2024-06-14 01:26:46)
Offline
$LOGNAME is the username of the current user.
"in the context of the sleep hook"?
Can you in an interactive shell
sudo -i
runuser -l $LOGNAME -c whoami
And what is
echo $LOGNAME > /tmp/sleeping.user
in that sleep hook script?
Please always remember to mark resolved threads by editing your initial posts subject - so others will know that there's no task left, but maybe a solution to find.
Thanks.
Online
Ok, good point-- it's root, but it does work. I'm not saying that runuser is necessary there-- just that it doesn't seem like it should fail. And in the interactive shell, as you can see, it works. Just apparently not in a systemd sleep hook.
[REDACTED@REDACTED ~]$ sudo -i
Place your right index finger on the fingerprint reader
[root@REDACTED ~]# runuser -l $LOGNAME -c whoami
root
And what is
echo $LOGNAME > /tmp/sleeping.user
in that sleep hook script?
$ cat /usr/lib/systemd/system-sleep/broken-fanctrl-suspend
#!/bin/sh
case $1 in
pre) runuser -l $(logname) -c "echo \"$LOGNAME $(logname)\" > /tmp/sleeping.user.pre" ;;
post) runuser -l $(logname) -c "echo \"$LOGNAME $(logname)\" > /tmp/sleeping.user.post" ;;
esac
That definitely triggers the issue. The system is both very slow to suspend (I opened the lid after a minute and the picture was still on it) and very slow to wake up (eventually the screen blacked out, an image came on the screen, and much later, I got control back). Interesting-- it looks like this variable might not be present when systemd invokes sleep hooks:
[REDACTED@REDACTED ~]$ cat /tmp/sleeping.user.pre
[REDACTED@REDACTED ~]$ cat /tmp/sleeping.user.post
cat: /tmp/sleeping.user.post: No such file or directory
It looks like the pre-hook runs, but this is somehow empty. The post hook clearly does not even start. Strange.
Offline
The plan would have been to just
#!/bin/sh
case $1 in
pre) echo \"$LOGNAME $(logname)\" > /tmp/sleeping.user.pre;;
post) echo \"$LOGNAME $(logname)\" > /tmp/sleeping.user.post ;;
esac
to debug the condition, the runuser pam issue will hit you no matter what command you pass to runuser, but that deosn't explain why runuser doesn't work itc.
Online
Sleep hook:
[REDACTED@REDACTED ~]$ cat /usr/lib/systemd/system-sleep/diagnose-runuser
#!/bin/sh
case $1 in
pre) echo "$LOGNAME $(logname)" > /tmp/sleeping.user.pre ;;
post) echo "$LOGNAME $(logname)" > /tmp/sleeping.user.post ;;
esac
The hook outputs (note the single space in each):
[REDACTED@REDACTED ~]$ cat /tmp/sleeping.user.pre
[REDACTED@REDACTED ~]$ cat /tmp/sleeping.user.post
Note that runuser should work fine with an empty "-l" argument:
[REDACTED@REDACTED ~]$ sudo runuser -l -c "echo Hello"
Hello
What makes you think runuser isn't working? It looks like runuser encounters a systemd-pam issue while running in the sleep hook, but I don't see anything to suggest a problem with runuser itself beyond perhaps its systemd-pam integration, if there's anything special required for that.
Offline
For context, https://github.com/systemd/systemd/issues/33020
What makes you think runuser isn't working?
Your log, though I'm not thinking that "it's not working" but that it's hitting a wall in the context.
You could try to 'runuser -l root -c "echo foo"' but the problem is most likely that pam_systemd tries to access the system bus and that'll likely time out at this point
#!/bin/sh
case $1 in
pre) strace -ftt -o /tmp/runuser.pre.strace runuser -l $LOGNAME -c "echo foo" ;;
post) strace -ftt -o /tmp/runuser.post.strace runuser -l $LOGNAME -c "echo foo" ;;
esac
should™ reveal that.
Online
TL;DR-- This appears to time out trying to read from "/run/dbus/system_bus_socket".
So it seems like a bug, if runuser is supposed to work at that stage of sleep. Otherwise, I suppose this is just a somewhat counter-intuitive quirk. This feels like a bug-- something as fundamental as runuser seems like it should just work, even if the system is changing power states or coming out of boot. If systemd is going to give something a shell, the system should be initialized enough for most things to work, and dbus seems like something quite fundamental. Curious-- what are your thoughts on this?
Here's a relevant excerpt of the log from /tmp/runuser.pre.strace:
...
8218 18:36:51.415551 access("/run/systemd/seats/", F_OK) = 0
8218 18:36:51.415569 getrandom("\x20\xf3\x33\x23\x8a\x96\xef\x05", 8, GRND_INSECURE) = 8
8218 18:36:51.415584 futex(0x7e6853764af8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
8218 18:36:51.415598 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4
8218 18:36:51.415614 prctl(PR_GET_NAME, "runuser") = 0
8218 18:36:51.415628 getrandom("\x8c\x3d\xe4\xe6\xc1\x2e\x5f\xcd", 8, GRND_INSECURE) = 8
8218 18:36:51.415642 bind(4, {sa_family=AF_UNIX, sun_path=@"cd5f2ec1e6e43d8c/bus/runuser/s
ystem"}, 38) = 0
8218 18:36:51.415656 getsockopt(4, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0
8218 18:36:51.415670 setsockopt(4, SOL_SOCKET, SO_RCVBUF, [8388608], 4) = 0
8218 18:36:51.415684 getsockopt(4, SOL_SOCKET, SO_RCVBUF, [14680064], [4]) = 0
8218 18:36:51.415698 setsockopt(4, SOL_SOCKET, SO_RCVBUFFORCE, [8388608], 4) = 0
8218 18:36:51.415712 getsockopt(4, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
8218 18:36:51.415726 setsockopt(4, SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0
8218 18:36:51.415739 getsockopt(4, SOL_SOCKET, SO_SNDBUF, [14680064], [4]) = 0
8218 18:36:51.415754 setsockopt(4, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = 0
8218 18:36:51.415768 connect(4, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"
}, 30) = 0
8218 18:36:51.415786 getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=1, uid=0, gid=0}, [12]) =
0
8218 18:36:51.415801 getsockopt(4, SOL_SOCKET, SO_PEERSEC, 0x58d54635dca0, [64]) = -1 ENOP
ROTOOPT (Protocol not available)
8218 18:36:51.415816 getsockopt(4, SOL_SOCKET, SO_PEERGROUPS, 0x58d54635cd00, [256 => 0])
= 0
8218 18:36:51.415830 getpeername(4, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_soc
ket"}, [127 => 30]) = 0
8218 18:36:51.415846 fstat(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
8218 18:36:51.415859 getsockopt(4, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0
8218 18:36:51.415873 getsockname(4, {sa_family=AF_UNIX, sun_path=@"cd5f2ec1e6e43d8c/bus/ru
nuser/system"}, [128 => 38]) = 0
8218 18:36:51.415888 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0AUTH
EXTERNAL\r\nDATA\r\n", iov_len=22}, {iov_base="NEGOTIATE_UNIX_FD\r\n", iov_len=19}, {iov_ba
se="BEGIN\r\n", iov_len=7}], msg_iovlen=3, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG
_NOSIGNAL) = 48
8218 18:36:51.415911 pidfd_open(8218, 0) = 5
8218 18:36:51.415928 fcntl(5, F_DUPFD_CLOEXEC, 3) = 6
8218 18:36:51.415944 close(5) = 0
8218 18:36:51.415957 recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="DATA\r\
nOK 4204df748b47176d31d73cb"..., iov_len=256}], msg_iovlen=1, msg_controllen=0, msg_flags=M
SG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 58
8218 18:36:51.415980 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1\0\0\0\0\1\0\0\0m\0\0\0\1\1o\0\25\0\0\0/org/fre"..., iov_len=128}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 128
8218 18:36:51.416001 recvmsg(4, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
8218 18:36:51.416016 ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=24, tv_nsec=999964000}, NULL, 8) = 1 ([{fd=4, revents=POLLIN}], left {tv_sec=24, tv_nsec=999956576})
8218 18:36:51.416041 recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\2\1\1\n\0\0\0\377\377\377\377?\0\0\0\5\1u\0\1\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
8218 18:36:51.416060 recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0"..., iov_len=66}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 66
8218 18:36:51.416085 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1\210\0\0\0\2\0\0\0\260\0\0\0\1\1o\0\27\0\0\0/org/fre"..., iov_len=192}, {iov_base="\0\0\0\0\0\0\0\0\t\0\0\0runuser-l\0\0\0\v\0\0\0unsp"..., iov_len=136}], msg_iovlen=2, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[6]}], msg_controllen=24, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 328
8218 18:36:51.416126 recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\n\0\0\0\377\377\377\377\217\0\0\0\7\1s\0\24\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
8218 18:36:51.416151 recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="org.freedesktop.DBus\0\0\0\0\6\1s\0\5\0\0\0"..., iov_len=146}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 146
8218 18:36:51.416173 recvmsg(4, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
8218 18:36:51.416187 ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=119, tv_nsec=999936000}, NULL, 8) = ? ERESTARTNOHAND (To be restarted if no handler)
8218 18:38:27.444223 ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=29, tv_nsec=898920146}, NULL, 8) = 0 (Timeout)
8218 18:38:57.345071 openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 5
8218 18:38:57.345460 fstat(5, {st_mode=S_IFREG|0644, st_size=2998, ...}) = 0
8218 18:38:57.345617 read(5, "# Locale name alias data base.\n#"..., 4096) = 2998
8218 18:38:57.345695 read(5, "", 4096) = 0
8218 18:38:57.345721 close(5) = 0
8218 18:38:57.345753 openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
...
It looks like the sequence goes something like:
1. Open domain socket connection to "/run/dbus/system_bus_socket" and set some socket parameters
2. Send a message over the socket
3. Attempt to receive on the socket, then enter a polling state until timing out 1 minute, 30 seconds later
Here's some similar output from the post-hook, which looks like it gets killed by the system (probably why we did not see an output earlier, though I'm curious as to why):
246 18:38:27.474301 recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="DATA\r\nOK 4204df748b47176d31d73cb"..., iov_len=256}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 58
8246 18:38:27.474324 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1\0\0\0\0\1\0\0\0m\0\0\0\1\1o\0\25\0\0\0/org/fre"..., iov_len=128}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 128
8246 18:38:27.474346 recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\2\1\1\v\0\0\0\377\377\377\377?\0\0\0\5\1u\0\1\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
8246 18:38:27.474369 recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0"..., iov_len=67}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 67
8246 18:38:27.474397 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1\210\0\0\0\2\0\0\0\260\0\0\0\1\1o\0\27\0\0\0/org/fre"..., iov_len=192}, {iov_base="\0\0\0\0\0\0\0\0\t\0\0\0runuser-l\0\0\0\v\0\0\0unsp"..., iov_len=136}], msg_iovlen=2, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[6]}], msg_controllen=24, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 328
8246 18:38:27.474474 recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\v\0\0\0\377\377\377\377\217\0\0\0\7\1s\0\24\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
8246 18:38:27.474517 recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="org.freedesktop.DBus\0\0\0\0\6\1s\0\6\0\0\0"..., iov_len=147}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 147
8246 18:38:27.474547 recvmsg(4, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
8246 18:38:27.474568 ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=119, tv_nsec=999880000}, NULL, 8) = ? ERESTARTNOHAND (To be restarted if no handler)
8246 18:39:57.468837 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
8246 18:39:57.469425 +++ killed by SIGTERM +++
Feel free to DM me for the full strace logs-- I'll move them off of tmpfs in case anyone wants them, but don't want to post these here for the whole Internet to see.
Offline
The most simple explanation would be that systemd executes the hooks sequentially and blocks for the execution, causing a recursive lock, but at least the wiki explicitly warns that they're all fired cocurrently.
The manpage warns of them as "hacks" (not sure what's special about that with systemd ) and wants you to use the inhibitor dbus API, but oc. that makes no sense in this context as this isn't some long running process that needs to respond to or inhibit the suspend - it is a "hack" to get parts of the system bed-time ready.
Can you
dbus-send --system --print-reply --dest=org.freedesktop.DBus /org/freedesktop/DBus org.freedesktop.DBus.ListNames
instead of "runuser anything"?
Online
Script:
#!/bin/sh
case $1 in
pre) dbus-send --system --print-reply --dest=org.freedesktop.DBus /org/freedesktop/DBus org.freedesktop.DBus.ListNames &> /tmp/dbus.pre ;;
post) dbus-send --system --print-reply --dest=org.freedesktop.DBus /org/freedesktop/DBus org.freedesktop.DBus.ListNames &> /tmp/dbus.post ;;
esac
Output:
[REDACTED@REDACTED ~]$ cat /tmp/dbus.pre
method return time=1718556886.306202 sender=org.freedesktop.DBus -> destination=:1.57 serial=4294967295 reply_serial=2
array [
string "org.freedesktop.DBus"
string ":1.0"
string ":1.1"
string ":1.2"
string ":1.3"
string ":1.4"
string ":1.6"
string ":1.10"
string ":1.15"
string ":1.16"
string ":1.17"
string ":1.18"
string ":1.19"
string ":1.20"
string ":1.21"
string ":1.22"
string ":1.24"
string ":1.25"
string ":1.26"
string ":1.27"
string ":1.28"
string ":1.31"
string ":1.33"
string ":1.34"
string ":1.35"
string ":1.37"
string ":1.38"
string ":1.39"
string ":1.56"
string ":1.57"
string "net.connman.iwd"
string "net.hadess.PowerProfiles"
string "org.bluez"
string "org.freedesktop.NetworkManager"
string "org.freedesktop.PolicyKit1"
string "org.freedesktop.RealtimeKit1"
string "org.freedesktop.UPower"
string "org.freedesktop.UPower.PowerProfiles"
string "org.freedesktop.login1"
string "org.freedesktop.nm_dispatcher"
string "org.freedesktop.systemd1"
]
[REDACTED@REDACTED ~]$ cat /tmp/dbus.post
method return time=1718556889.749607 sender=org.freedesktop.DBus -> destination=:1.59 serial=4294967295 reply_serial=2
array [
string "org.freedesktop.DBus"
string ":1.0"
string ":1.1"
string ":1.2"
string ":1.3"
string ":1.4"
string ":1.6"
string ":1.10"
string ":1.15"
string ":1.16"
string ":1.17"
string ":1.18"
string ":1.19"
string ":1.20"
string ":1.21"
string ":1.22"
string ":1.24"
string ":1.25"
string ":1.26"
string ":1.27"
string ":1.28"
string ":1.31"
string ":1.33"
string ":1.34"
string ":1.35"
string ":1.37"
string ":1.38"
string ":1.39"
string ":1.56"
string ":1.59"
string "net.connman.iwd"
string "net.hadess.PowerProfiles"
string "org.bluez"
string "org.freedesktop.NetworkManager"
string "org.freedesktop.PolicyKit1"
string "org.freedesktop.RealtimeKit1"
string "org.freedesktop.UPower"
string "org.freedesktop.UPower.PowerProfiles"
string "org.freedesktop.login1"
string "org.freedesktop.nm_dispatcher"
string "org.freedesktop.systemd1"
]
And here's what I get running that same command from a terminal emulator in a graphical session:
[nroth@nickadeck ~]$ dbus-send --system --print-reply --dest=org.freedesktop.DBus /org/freedesktop/DBus org.freedesktop.DBus.ListNames
method return time=1718556725.946102 sender=org.freedesktop.DBus -> destination=:1.54 serial=4294967295 reply_serial=2
array [
string "org.freedesktop.DBus"
string ":1.0"
string ":1.1"
string ":1.2"
string ":1.3"
string ":1.4"
string ":1.6"
string ":1.10"
string ":1.15"
string ":1.16"
string ":1.17"
string ":1.18"
string ":1.19"
string ":1.20"
string ":1.21"
string ":1.22"
string ":1.24"
string ":1.25"
string ":1.26"
string ":1.27"
string ":1.28"
string ":1.31"
string ":1.33"
string ":1.34"
string ":1.35"
string ":1.37"
string ":1.38"
string ":1.39"
string ":1.54"
string "net.connman.iwd"
string "net.hadess.PowerProfiles"
string "org.bluez"
string "org.freedesktop.NetworkManager"
string "org.freedesktop.PolicyKit1"
string "org.freedesktop.RealtimeKit1"
string "org.freedesktop.UPower"
string "org.freedesktop.UPower.PowerProfiles"
string "org.freedesktop.login1"
string "org.freedesktop.systemd1"
]
So perhaps everything is there but not fully initialized? In the strace output, there did appear to be some back-and-forth communication with dbus services. I wish I knew how to get strace to show the entire payload of these syscalls. I'd like enough information to at least file a good bug report of what's happening, and know what's causing the issue. I don't currently need a runuser-based hook, but it's conceivable that I might sometime in the future, or that this might break something else on my system.
Last edited by nroth (2024-06-16 17:01:13)
Offline
The system bus responds, the problem seems to be pam_systemd specific.
Locally, only runuser*, system-remote-login and systemd-user source system-login which is the only pam config using pam_systemd so this is probably not a common scenario.
Eg. sudo/su would/should not be affected.
Online