
#1 2014-02-19 21:03:20

oktobermoon
Member
Registered: 2008-02-26
Posts: 5

A Flaw with LXC and Systemd

I am running quite a few machines with LXC and systemd. One day I noticed that one of my more heavily accessed machines was very slow to respond when I tried to log in through ssh: it sat there for a good 15-30 seconds before letting me in. No errors were presented, but upon inspection I found that /sbin/init was consuming 100% of a CPU. We hoped this was a fluke and that init was just borked, so we monitored it closely. When the time came to reboot the server, it came back up fine, but as soon as traffic started hitting it, the /usr/lib/systemd/systemd-logind process went crazy, pegging a CPU at 100%. After a day, I found that /sbin/init had taken over as the culprit.

After investigating systemd and finding that I can list all the units by just typing systemctl at the prompt, I found that there were upwards of 15000 scope sessions (this took several days). They can be counted with:

systemctl | grep session-c | wc -l

The scope sessions are identified by the following lines in the systemd unit list:

session-c43459.scope              loaded active running   Session c43459 of user root
session-c43460.scope              loaded active running   Session c43460 of user root
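For a stricter count than the grep above, here is a minimal sketch that matches only the unit-name column. It assumes the session units are named session-<id>.scope as in the two lines quoted; count_session_scopes is a made-up helper name, not a systemd tool.

```shell
#!/bin/sh
# Count session scopes in systemctl unit-list text read from stdin.
# Assumption: session units are named session-<id>.scope, as in the
# listing quoted in the post.
count_session_scopes() {
    awk '$1 ~ /^session-.*\.scope$/ { n++ } END { print n + 0 }'
}

# Demo on the two lines quoted in the post; this prints 2.
count_session_scopes <<'EOF'
session-c43459.scope              loaded active running   Session c43459 of user root
session-c43460.scope              loaded active running   Session c43460 of user root
EOF
```

On a live box the input would come from something like `systemctl --no-legend --no-pager list-units --type=scope | count_session_scopes`.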

After digging further and finding out how to inspect these sessions, I confirmed what I had suspected: these are zombie sessions that have persisted.

[root@mybox ~]# systemctl status session-c43460.scope
session-c43460.scope - Session c43460 of user root
   Loaded: loaded (/run/systemd/system/session-c43460.scope; static)
  Drop-In: /run/systemd/system/session-c43460.scope.d
           `-90-After-systemd-user-sessions\x2eservice.conf, 90-Description.conf, 90-KillMode.conf, 90-SendSIGHUP.conf, 90-Slice.conf, 90-TimeoutStopUSec.conf
   Active: active (running) since Wed 2014-02-19 13:18:44 MST; 29min ago

Feb 19 13:18:44 mybox systemd[1]: Started Session c43460 of user root.
Feb 19 13:18:45 mybox sshd[27105]: Received disconnect from 10.10.10.10: disconnecte...user
Feb 19 13:18:45 mybox sshd[27105]: pam_unix(sshd:session): session closed for user root
Hint: Some lines were ellipsized, use -l to show in full.

This clearly shows that the session was started for an ssh connection but persisted even after the user disconnected.
After I stopped the 15000+ zombie sessions, voila! No more CPU grab by init. The box is running significantly better, and everything is as it should be.

I know I have an unusual environment, with an application that connects to this server several times a minute just to get data, but this is potentially a problem for everyone. All my other LXC instances have zombie sessions on them too, and while they see far fewer connections, I expect to see degraded performance on them within a year.

I've written a script to find and kill these zombie sessions. A single session can be stopped by issuing

systemctl stop session-c43460.scope

but it would be nice if these scope sessions exited normally, as they should, when an ssh disconnect occurs.
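This is not the script mentioned above, only a hedged sketch of one way such a cleanup could look. It stops a session scope only if its cgroup no longer lists any processes, and it defaults to a dry run. The names CGROOT, DRY_RUN, session_scopes, cgroup_is_empty and stop_empty_scopes are made up for illustration, and the cgroup mount point is an assumption that may well differ inside an LXC container.

```shell
#!/bin/sh
# Sketch: stop session scopes that no longer contain any processes.
# Assumptions (not from the original post): the systemd cgroup hierarchy
# is mounted at $CGROOT and every scope directory has a cgroup.procs file.
CGROOT="${CGROOT:-/sys/fs/cgroup/systemd}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print what would be stopped

# Print the session scope unit names found in unit-list text on stdin.
session_scopes() {
    awk '$1 ~ /^session-.*\.scope$/ { print $1 }'
}

# Succeed if the given cgroup directory lists no process IDs.
cgroup_is_empty() {
    [ -f "$1/cgroup.procs" ] && ! grep -q . "$1/cgroup.procs"
}

# Stop (or, in dry-run mode, just report) every empty scope named on stdin.
stop_empty_scopes() {
    while read -r unit; do
        # ControlGroup is the unit's cgroup path relative to the hierarchy root.
        cg=$(systemctl show -p ControlGroup "$unit" </dev/null | cut -d= -f2-)
        [ -n "$cg" ] || continue
        if cgroup_is_empty "$CGROOT$cg"; then
            if [ "$DRY_RUN" = 1 ]; then
                echo "would stop $unit"
            else
                systemctl stop "$unit" </dev/null
            fi
        fi
    done
}

# Usage on a live system (nothing is run automatically by this file):
#   systemctl --no-legend --no-pager list-units --type=scope \
#       | session_scopes | stop_empty_scopes
```

Running it first with the default DRY_RUN=1 shows which scopes would be stopped; setting DRY_RUN=0 makes it actually call systemctl stop. Checking for an empty cgroup is meant to avoid killing sessions that are still in use.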

So, I believe this is a problem between LXC and systemd, as the hosts are perfectly fine and have no zombie sessions.

Is anyone else out there having problems with LXC and systemd? Any suggestions or solutions are welcome.

Last edited by oktobermoon (2014-02-20 22:36:21)


#2 2014-02-20 21:55:47

clfarron4
Member
From: London, UK
Registered: 2013-06-28
Posts: 2,163

Re: A Flaw with LXC and Systemd

Please use code tags... I can't tell what's what.

Last edited by clfarron4 (2014-02-20 21:56:15)


Claire is fine.
Problems? I have dysgraphia, so clear and concise please.


#3 2014-02-20 22:36:43

oktobermoon
Member
Registered: 2008-02-26
Posts: 5

Re: A Flaw with LXC and Systemd

Sorry, it's fixed.

