You are not logged in.
A month or two ago I noticed an issue on several of my servers -- pacman worked, but started taking about 1-2 minutes to check keys in keyring, and even longer to check package integrity.
Today I finally investigated this issue, and found some definite weirdness.
First, an strace for
sudo strace pacman -Sy --noconfirm bash
: https://gist.github.com/anonymous/a4c0a … 4c87346eab (from 'proceed to installation?' to 'loading package files'). I've inserted newlines where there was a pause of over 2 sec.
Here I noticed that some read()s were taking a while, and also that pacman kept changing its pid. So I ran
sudo strace -f pacman -Sy --noconfirm bash
, which follows forks. The results were even weirder: https://gist.github.com/anonymous/52df2 … 4048789f34
Now there is a loop of
[pid 27108] close(<n>) = -1 EBADF (Bad file descriptor)
, with n starting from 0, and continuing indefinitely. (for reference,
ulimit -n
is 33554432)
I also noticed that the PID of the process kept incrementing, but there are no coredumps from pacman in coredumpctl. So I assume it's forking and killing the parent, or something like that.
I've tried clearing /etc/pacman.d/gnupg and regenerating with pacman-keygen, and it didn't help.
The systems with these issues are all current, with pacman 5.0.1.
> uname -a
Linux flexo.slact 4.4.30-1-lts #1 SMP Tue Nov 1 22:09:20 CET 2016 x86_64 GNU/Linux
> pacman -V 1 ↵
.--. Pacman v5.0.1 - libalpm v10.0.1
/ _.-' .-. .-. .-. Copyright (C) 2006-2016 Pacman Development Team
\ '-. '-' '-' '-' Copyright (C) 2002-2006 Judd Vinet
'--'
This program may be freely redistributed under
the terms of the GNU General Public License.
Any ideas and suggestions welcome.
Last edited by slact (2017-11-25 06:39:17)
Offline
what GPGme version do you run (works for me™ appears to be 1.7.1-2)?
also, actually, you should filter out the close() calls, they're something the library appears to do, which is bad but not technically harmful.
we might see better what exactly the problem is, once this is out of the way.
the thing that worries me is that clearing the gpgme directory didn't fix it. maybe if you grep for all open() calls during the strace -f execution, we might see what other things the library loads.
could you also provide us with as many configuration files as you can come up with? pacman.conf, your mirror, the generated gpg.conf, etc.
and finally, um, I have to say this is rather suspicious. How well did you check for intruders?
EDIT: for completeness, I decided to create a "healthy " strace log so we can compare the results. https://gist.github.com/mar77i/f13f80d8 … 0d5273b958
Last edited by mar77i (2016-11-21 08:47:56)
<rickh> f(Arch) = ((Gentoo - Compiletime) + (Slackware + Depency resolution and Downloading))
Offline
actually, could you post the output of `ls /proc/self/fd` in the environment you run strace?
<rickh> f(Arch) = ((Gentoo - Compiletime) + (Slackware + Depency resolution and Downloading))
Offline
Uh, well, it looks like gpgme is the source of the problem. I've updated some of the machines, and gpgme got upgraded from 1.6.0-3 to 1.7.1-2 . Downgrading gpgme to 1.6.0-3 brings the problem back, and upgrading to 1.7.1-2 fixes it. Still, I'd like to figure out what was going on. Both packages have legit checksums.
grep for all open() calls during the strace -f execution
waiting an hour to finish the close() calls now. Seems to get stuck pretty hard on that close() loop when running with strace, so I don't know what happens afterwards. Will update if it ever does finish.
as many configuration files as you can come up with?
/etc/pacman.conf
/etc/pacman.d/mirrorlist
/etc/pacman.d/gnupg/gpg.conf
I have to say this is rather suspicious. How well did you check for intruders?
I've seen nothing else suspicious on the machnies or at the firewall to suggest an intrusion has taken place. No other suspicious behavior as well. And it seems like upgrading gpgme fixes it, and the old version is legit. Technically, of course I can't rule out an intrusion, but this seems more like a weird bug to me right now. (Which is, of course, what they'd want me to think...)
could you post the output of `ls /proc/self/fd` in the environment you run strace?
> pgrep pacman 1 ↵
22835 <- strace attached here
28559
> sudo ls /proc/22835/fd
0 1 2 3 4 5 6 7
> sudo ls /proc/28559/fd
8
Offline
Uh, well, it looks like gpgme is the source of the problem. I've updated some of the machines, and gpgme got upgraded from 1.6.0-3 to 1.7.1-2 . Downgrading gpgme to 1.6.0-3 brings the problem back, and upgrading to 1.7.1-2 fixes it. Still, I'd like to figure out what was going on. Both packages have legit checksums.
Nothing really stands out in the logs, so if you can reproduce it you can do a git-bisect to figure out which commit fixed it.
Offline
Necro-bumping my old thread, but I finally figured out what the issue was. It had returned in gpgme version 1.9.0, however, it was fixed in gpgme-master. I finally tracked the fix commit to https://git.gnupg.org/cgi-bin/gitweb.cg … b19c59f738 .
This is when I realized what the core issue had been. The machines affected are primarily servers, and they have very large max open fds (10M or more), whereas unaffected machines had max fds on the order of 10-100K. So the way gpgme was handling iterating through file descriptors, coupled with the very large max fd value, made gpgme, and therefore pacman, take forever. With the linux optimization in the linked commit, the issue goes away.
In short, the next release of gpgme to include that commit will fix the issue.
Offline