You are not logged in.
I have a system that is set to auto-login-in a user (it will be a standalone system, without need of physical security at that level.) When I boot this system, it takes about a minute and a half, which is about 80 seconds more than I think it should take. The system is a mini-ITX board with a quad core Core i5-2400, 8 GB of RAM, and a 32 GB SATA SSD, running latest-and-greatest x86_64 Arch Linux.
I've been trying to debug this for a while. For example, I thought it was my webcams, because the sound driver for them was the last thing before the long empty wait, but unplugging these didn't improve things. However, googling around, I have found out about systemd-analyze. Most of the textual dumps focus on CPU time, rather than wallclock time, and my problem is unfortunately a wallclock time problem, for reasons I don't understand.
bootchart says that the long pole is dhcpcd on the Ethernet interface, which takes a little over 7 seconds. Then it starts network, and then it starts sshd. At that point, it's about 10 seconds into wallclock time of the boot.
Then system system does ... nothing, for over a minute!
Then it starts systemd-journal-flush.service, and my graphical session, and everything comes up the way it should.
Using "systemd-analyze plot," here's an overview of the problem. The things on the left all complete in less than ten seconds. Then the long pole of nothing. Then the system starts on the right and everything's fine. I'd like to identify and remove this long pole, but I don't know how to do that.
http://watte.net/slow-boot-b.png
Here's a readable detail of the bottom-left of that diagram (before the long pole):
http://watte.net/slow-boot-a.png
And the detail of the bottom-right of that diagram (after the long pole):
http://watte.net/slow-boot-c.png
Any suggestions for how I can identify what's causing the long pole and eliminate it would be very welcome!
-- mod edit: read the Forum Etiquette and only post thumbnails http://wiki.archlinux.org/index.php/For … s_and_Code [jwr] --
Offline
Does anybody know why Arch Linux takes 90 seconds to boot into LXDE, with very little trace of what's taking the time in systemd?
Does nobody know how to debug a slow Arch boot past using systemd-analyze like described above?
Offline
Did you look through the logs?
Offline
Thought not directly helpful I'm afraid, I have a VERY long boot time as well (> 1 min 20 seconds) -- but ONLY if something starts DBUS. Booting with GDM takes forever. Booting with LightDM Shaves off about 1 minute.
My unscientific method for arriving at DBUS being the issue:
Booting with GDM into Gnome3 -- slow (dbus runs at boot time)
Booting with GDM into KDE -- slow (dbus runs at boot time)
Booting LightDM into Gnome -- boot is about 1 min faster, but after entering login info and Gnome starting -- about 1 minute delay
Booting LightDM into E17 (no Gnome env autostart stuff) -- boot is about 1 min faster and logging in is fast. Run Nautilus or something (seemingly) needing DBus -- 30+ seconds delay.
YMMV, but I just resolved to live with it because I really have exhausted my knowledge on what makes DBus so darn slow. The only thing my small brain can relate this to is that it seemed to get slower after I upgraded my RAM. I went from 8GB to 16GB of RAM. Is it possible that this much RAM is causing some issue? Does Dbus (or something DBus calls) try to create a 16GB swap file and take forever? Does it try to do something else?
Sorry I am not of more help.
Last edited by mrunion (2014-01-03 15:19:36)
Matt
"It is very difficult to educate the educated."
Offline
Thanks for this anecdote; it may help point at a culprit (but still not solve anything.)
I'm using the plain LXDE environment, because I pretty much only need support for LXTerminal, my own FLTK apps, and to run Chromium.
I think I'm using things that need dbus, though, and dbus is supposed to solve a bunch of problems that we've been having previously in the Linux environment. I can't believe it would take a minute to start -- it would have to have a lot of sleep() calls in it if that were the case...
Offline
Did you look through the logs?
Yes, I did. Although, with journald, "logs" isn't what they are anymore. Rather than use any text editor of my choosing and "grep" through /var/log, I now have to use various options to "journalctl," and if journalctl doesn't have an option for what I want to do, well, I'm screwed. So I think journalctl is a step backwards. But that's not the problem in this question.
I looked through the logs. Things just ... stop. For 80-90 seconds. Then they start up again, with journald-flush.service as the first thing.
There's a bug files against Fedora 20 where apparently journald is copying each log item, one at a time, out of /run/log/journal into /var/log/journal on start, on that OS, and that took a while for people. However, not 80 seconds. And I have a SSD -- operations like that "should" be fast.
Offline
IMO, the journal is awesome. It provides everything in one place with all kinds of filter options, and if that isn't to your liking, you can simply pipe it to the tools of your choice. Sending it through a pipe will remove any coloring/formatting that might screw with grepping. I admit that it did take some getting used to, and I had to do away with any preconceived ideas of what I thought was right.
As far as your issue, what is the last thing that happens before the stall? Have you tried booting with debug level logging and see if you can find anythign there? Turning that on will inevitably slow the system down, but maybe it can shed light on why it is freezing.
You might also check to make sure that there aren't any devices that it is expecting to show up that never do. This can occur when a service 'BindsTo' something that doesn't show. An example would be the dhcpcd@.service has 'BindsTo=sys-subsystem-net-devices-%i.device'. So if you enable dhcpcd@wlp3s0.service, it will sit and wait for /sys/subsystem/net/devices/wlp3s0 to show up before continuing.
Offline
There is a happy solution to this problem. It turns out that the journal flush was somehow waiting for a slow NFS server on start-up.
I don't know why it's doing this, but that's what it is. Deferring those mounts until later (removing them from fstab) solved this problem.
The longest pole is now getting a carrier and a lease for the Ethernet interface, which I'll live with.
That being said, why doesn't the systemd-startup show this dependency? Why is the graph just 80 seconds of "nothing"?
Regarding journalctl: Yes, it has a number of selectors. That's not the right direction to go, though. The Windows Event Viewer also has a number of selectors, but it's a pretty darn crappy way of trying to figure out what's going on with a system. Putting the selectors in the extractor tool means that EVERY TOOL needs to support selectors, which in turn means every tool does it differently. Using text files, and the standard "grep," for finding things, brings a level of consistency and composability which is a real strength of Unix over Windows Server. I'm sad to see it gone.
Offline
Not every tool needs to support those filtering options in the journal. In fact, the design is such that a standard text stream can be had when putting it though a pipe, which will allow you to use the regular tools you enjoy. Additionally, pointing a regular syslog daemon at the runtime journal will result in the traditional logging you are used to. Even further, you can use the systemd-journal-gatewayd.service to have the logs output in plain text over a socket. So I don't quite agree that this functionality is gone at all.
In regard to your issue. Instead of removing the nfs mounts from the fstab entirely, you can add 'noauto,x-systemd.automount' to the options and then it will only attempt to mount them on first access. It is similar to autofs, but without the umount capabilities. You may also want to put 'nofail' in the options as well so that you won't be dropped to the emergency.target in the event that the nfs shares may not be available.
Offline
you can add 'noauto,x-systemd.automount' to the options
Thanks; I will try that!
Offline
@jwatte: Do you have the -git version of libplist or libimobiledevice from AUR installed?
Last edited by migerh (2014-01-08 09:30:17)
Offline
@jwatte: Do you have the -git version of libplist or libimobiledevice from AUR installed?
No.
Offline
I had the very same symptoms: Didn't use my desktops over the holidays and last weekend it suddenly took ~90s (60 to gdm, another 30 for loading awesome) until the startup was finished and I was able to use my desktop - before the holidays it was about 10s. Weird thing was it started at the same time with both desktop machines I am using. Removing nfs mounts from /etc/fstab didn't fix this (at least on Desktop #1, haven't tested it yet on Desktop #2). Here's what fixed this on Desktop #1:
I have the -git versions of libplist, libimobiledevice, usbmuxd and libusbmuxd installed on my desktops and recently libplist-git stopped providing /usr/lib/libplist.so.1. But libplist.so.1 is required by /usr/lib/upower/upowerd which is started by dbus and apparently dbus has a timeout that blocks the entire boot process as soon as something goes wrong. To verify that this really is the reason for the massive boot delay I made a quick and very dirty hack and added a symlink /usr/lib/libplist.so.1 -> /usr/lib/libplist.so.2.0.0. After that upowerd is starting without any problems and boot time is back at 10s.
Offline