It seems that since the update to systemd 210-2 it segfaults on one of my machines. This is about 6 to 10 hours after a reboot.
...
Mar 06 07:50:01 angara kernel: type=1006 audit(1394088601.290:235): pid=11582 uid=0 old auid=4294967295 new auid=998 old ses=4294967295 new ses=234 res=1
Mar 06 07:50:46 angara CROND[11582]: pam_unix(crond:session): session closed for user munin
Mar 06 07:55:01 angara crond[12549]: pam_unix(crond:session): session opened for user munin by (uid=0)
Mar 06 07:55:01 angara CROND[12550]: (munin) CMD (test -x /usr/bin/munin-cron && /usr/bin/munin-cron)
Mar 06 07:55:01 angara kernel: type=1006 audit(1394088901.697:236): pid=12549 uid=0 old auid=4294967295 new auid=998 old ses=4294967295 new ses=235 res=1
Mar 06 07:55:05 angara kernel: systemd[1]: segfault at 0 ip (null) sp 00007fff86ef4868 error 14 in systemd[400000+106000]
Mar 06 07:55:05 angara systemd[1]: Caught <SEGV>, dumped core as pid 13023.
Mar 06 07:55:05 angara systemd[1]: Freezing execution.
Mar 06 07:55:05 angara systemd-coredump[13025]: Process 13023 (systemd) dumped core.
Mar 06 07:55:46 angara CROND[12549]: pam_unix(crond:session): session closed for user munin
Mar 06 08:00:01 angara crond[13521]: pam_unix(crond:session): session opened for user munin by (uid=0)
Mar 06 08:00:01 angara crond[13520]: pam_unix(crond:session): session opened for user root by (uid=0)
Mar 06 08:00:01 angara CROND[13522]: (munin) CMD (test -x /usr/bin/munin-cron && /usr/bin/munin-cron)
Mar 06 08:00:01 angara CROND[13523]: (root) CMD (ntpdate be.pool.ntp.org)
Mar 06 08:00:01 angara kernel: type=1006 audit(1394089201.887:237): pid=13520 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=236 res=1
...
Mar 07 06:01:01 angara kernel: type=1006 audit(1394168461.245:131): pid=32196 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=130 res=1
Mar 07 06:01:01 angara CROND[32196]: pam_unix(crond:session): session closed for user root
Mar 07 06:05:01 angara kernel: type=1006 audit(1394168701.257:132): pid=32210 uid=0 old auid=4294967295 new auid=998 old ses=4294967295 new ses=131 res=1
Mar 07 06:05:01 angara crond[32210]: pam_unix(crond:session): session opened for user munin by (uid=0)
Mar 07 06:05:01 angara CROND[32211]: (munin) CMD (test -x /usr/bin/munin-cron && /usr/bin/munin-cron)
Mar 07 06:05:46 angara CROND[32210]: pam_unix(crond:session): session closed for user munin
Mar 07 06:10:01 angara crond[793]: pam_unix(crond:session): session opened for user munin by (uid=0)
Mar 07 06:10:01 angara CROND[794]: (munin) CMD (test -x /usr/bin/munin-cron && /usr/bin/munin-cron)
Mar 07 06:10:01 angara kernel: type=1006 audit(1394169001.764:133): pid=793 uid=0 old auid=4294967295 new auid=998 old ses=4294967295 new ses=132 res=1
Mar 07 06:10:10 angara kernel: systemd[1]: segfault at 7fa1ea8f6af8 ip 00007fa1ea8f6af8 sp 00007ffffd8bb7f8 error 15 in libc-2.19.so[7fa1ea8f6000+2000]
Mar 07 06:10:10 angara systemd[1]: Caught <SEGV>, dumped core as pid 1532.
Mar 07 06:10:10 angara systemd[1]: Freezing execution.
Mar 07 06:10:10 angara systemd-coredump[1534]: Process 1532 (systemd) dumped core.
Mar 07 06:10:46 angara CROND[793]: pam_unix(crond:session): session closed for user munin
Mar 07 06:15:01 angara crond[1937]: pam_unix(crond:session): session opened for user munin by (uid=0)
Mar 07 06:15:01 angara CROND[1938]: (munin) CMD (test -x /usr/bin/munin-cron && /usr/bin/munin-cron)
Mar 07 06:15:01 angara kernel: type=1006 audit(1394169301.801:134): pid=1937 uid=0 old auid=4294967295 new auid=998 old ses=4294967295 new ses=133 res=1
...This raises a few issues in turn:
Munin relies on systemd (as PID 1) to wait() on the zombie processes it leaves behind, so zombies now accumulate
A normal reboot no longer works
Other systemd-related controls are broken as well
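A quick way to confirm the first point is to look for accumulating zombies; a minimal sketch using standard procps tools (nothing here is specific to systemd or munin):

```shell
# List zombie (defunct) processes with their parent PIDs, to confirm
# that children are piling up unreaped.
ps -eo pid=,ppid=,stat=,comm= | awk '$3 ~ /^Z/'

# Count them; a steadily growing count points at a broken reaper (PID 1).
ps -eo stat= | grep -c '^Z' || true
```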
Can anybody point me in the right direction?
Can you try systemd 210-3 from testing? If I understand this correctly, it should fix this crash.
210-3 is installed; now I'm waiting to see whether it occurs again or not. Thanks for the feedback!
Last edited by Ibex (2014-03-07 13:05:42)
Hello, I have the same issue on my server.
Mar 6 21:02:38 localhost kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Mar 7 01:16:22 localhost kernel: type=1326 audit(1394151381.540:2): auid=4294967295 uid=99 gid=99 ses=4294967295 pid=8810 comm="sshd" sig=31 syscall=48 compat=0 ip=0x7f74ddc39e67 code=0x0
Mar 7 02:28:20 localhost kernel: systemd[1]: segfault at b8 ip 000000000048101e sp 00007fff77d9c920 error 4 in systemd[400000+106000]
Mar 7 02:28:20 localhost systemd-coredump[12958]: Process 12957 (systemd) dumped core.

But my actual problem is: how can I reboot without systemd? I don't have physical access to the server.
And is it safe to downgrade to the last working systemd (version 208-11)?
Many thanks
I just did a hard reset of my machine through the IPMI console. But now that I think of it, you might try to run:
# telinit 6

I don't know whether it's safe to downgrade to the previous version of systemd, unfortunately.
Last edited by Ibex (2014-03-07 14:27:15)
But my actual problem is: How can I reboot without systemd? I don't have physical access to the server.
Save your data, run 'sync' and then '/usr/lib/initcpio/busybox reboot -f'. However, using IPMI is probably safer.
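For reference, that sequence can be wrapped in a small guarded script. This is only a sketch: the busybox path is the Arch initcpio one quoted above, and the commented SysRq fallback assumes the kernel.sysrq sysctl permits it.

```shell
#!/bin/sh
# Emergency reboot when PID 1 is frozen and `reboot` / `telinit 6` no
# longer respond: flush data, then ask the kernel (not systemd) to reboot.
# DANGEROUS: this skips a clean shutdown. Run as root with CONFIRM=yes.

if [ "${CONFIRM:-no}" = "yes" ]; then
    sync                                  # flush dirty pages to disk
    /usr/lib/initcpio/busybox reboot -f   # reboot via the kernel, bypassing PID 1
else
    echo "Dry run; set CONFIRM=yes to actually reboot." >&2
fi

# Fallback if busybox is unavailable: magic SysRq (needs kernel.sysrq enabled):
#   echo s > /proc/sysrq-trigger   # emergency sync
#   echo u > /proc/sysrq-trigger   # remount filesystems read-only
#   echo b > /proc/sysrq-trigger   # immediate reboot
```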
And is it safe to downgrade to last working systemd? (version 208-11)
I wouldn't do it. It's better to upgrade to 210-3.
Save your data, run 'sync' and then '/usr/lib/initcpio/busybox reboot -f'. However, using IPMI is probably safer.
Thanks, that one worked. IPMI doesn't.
Everything looks fine now, but I'll see if systemd 210-3 really helped in few hours.
Again many thanks.
I have the same problem: systemd 210-2, which I updated to yesterday, segfaulted overnight. I was checking my munin logs and noticed a ridiculous number of zombie processes (mostly munin-html, ironically enough). I'll wait and see if it segfaults again; if so, I'll install 210-3 from testing and try it out.
Also, for what it's worth, the issue looks like it started exactly at midnight for me (when zombie processes started accumulating).
I had the same issue after updating to 210-2.
systemd segfaulted on specific circumstances reliably.
Upgrade to 210-3 from testing fixed this.
FYI: I noticed 210-3 is now in [core].
Of course it is - it contained important bugfixes compared to 210-2, so it had to be rushed to [core].
And after quite a while without any trouble, it seems that 210-3 indeed fixes the issue. Thanks all for the help.