
#1 2016-08-07 09:58:39

akobel
Member
From: Saarbrücken, Germany
Registered: 2016-02-12
Posts: 22

Converting /var to a btrfs subvolume

Dear all,

Until recently I had a stable system with a single hard disk and a fairly basic LVM-on-LUKS partitioning scheme.
Besides a boot partition, it had two separate btrfs partitions for / and /home as well as a swap partition, all inside the same volume group on one LUKS partition.

After (successfully, as it seems) setting up snapper and snap-pac, I decided to try to make /var a subvolume of the root btrfs partition, to avoid snapshotting (in particular) /var/cache and /var/log.
The problem is that now, whenever I log out and back in (from i3, via the LightDM login manager), I get various crashes that I cannot track down.

My procedure was as follows:

  • btrfs subvolume snapshot / /newvar

  • delete everything except var from /newvar

  • move everything inside /newvar/var to /newvar

  • mv /var /var.org; mv /newvar /var
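Steps 2 and 3 are where details are easiest to lose. Below is a minimal Python sketch of just those two steps, run against the snapshot after step 1. The function name `prune_and_flatten` and the dry-run default are my own assumptions, not part of the procedure above. One btrfs detail worth knowing here: a snapshot does not descend into nested subvolumes, so /newvar/var/lib/machines is only an empty directory, which is consistent with the separate /var/lib/machines subvolume having to be moved by hand.

```python
import os
import shutil


def prune_and_flatten(snap, keep="var", dry_run=True):
    """Steps 2 and 3 of the procedure, applied to the snapshot at `snap`:
    delete every top-level entry except `keep`, then move the contents of
    snap/keep up one level. Returns the planned (or performed) actions."""
    if os.path.realpath(snap) == "/":
        raise ValueError("refusing to operate on the live root filesystem")
    actions = []

    # Step 2: delete everything in the snapshot except var/.
    for name in sorted(os.listdir(snap)):
        if name == keep:
            continue
        path = os.path.join(snap, name)
        actions.append(("delete", path))
        if not dry_run:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)

    # Step 3: move var/* up one level. os.listdir() includes dotfiles, which
    # a plain shell glob such as `mv /newvar/var/* /newvar/` would skip.
    inner = os.path.join(snap, keep)
    for name in sorted(os.listdir(inner)):
        src, dst = os.path.join(inner, name), os.path.join(snap, name)
        actions.append(("move", src, dst))
        if not dry_run:
            os.rename(src, dst)  # same filesystem: a rename, no data is copied
    actions.append(("rmdir", inner))
    if not dry_run:
        os.rmdir(inner)
    return actions
```

Printing `prune_and_flatten("/newvar")` first shows the plan without touching anything; `dry_run=False` should only be passed once that plan looks right. Steps 1 and 4 remain the plain shell commands listed above.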

At this point, my live system looked as before, except that /var was now on a subvolume. (I later found out that there is another subvolume located at /var/lib/machines; I moved it over to the new place, too.) The corresponding output of `btrfs sub list -a /` (without snapper's automatic snapshots) is:

ID 274 gen 45595 top level 381 path <FS_TREE>/var/lib/machines
ID 289 gen 46764 top level 5 path .snapshots
ID 381 gen 46790 top level 5 path var
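In this listing, the `top level` column is each subvolume's parent ID, and ID 5 is the filesystem's top-level subvolume. A small Python sketch that parses exactly the line format shown above (an assumption; other btrfs-progs options change the columns) and reports subvolumes that live inside another subvolume rather than directly below the top level. For the output above it flags var/lib/machines as nested inside var (ID 381).

```python
import re

# One line of `btrfs subvolume list -a` output, e.g.
# "ID 381 gen 46790 top level 5 path var"
LINE = re.compile(r"^ID (\d+) gen (\d+) top level (\d+) path (.+)$")
TOP_LEVEL_ID = 5  # ID of the filesystem's top-level subvolume (<FS_TREE>)


def parse_subvolume_list(text):
    """Map subvolume ID -> {"parent": parent ID, "path": path below top level}."""
    subvols = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip blank or unrelated lines
        sid, _gen, parent, path = m.groups()
        if path.startswith("<FS_TREE>/"):
            path = path[len("<FS_TREE>/"):]
        subvols[int(sid)] = {"parent": int(parent), "path": path}
    return subvols


def nested_subvolumes(subvols):
    """(id, path, parent path) for every subvolume whose parent is another
    subvolume instead of the top level."""
    nested = []
    for sid, info in sorted(subvols.items()):
        parent = info["parent"]
        if parent != TOP_LEVEL_ID:
            parent_path = subvols.get(parent, {}).get("path", f"<ID {parent}>")
            nested.append((sid, info["path"], parent_path))
    return nested
```

On a live system the input would be the stdout of `btrfs subvolume list -a /`, run as root.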

Everything kinda works, unless I log out and log back in via LightDM without rebooting in between. Then i3bar does not show up, feh cannot load a background, sometimes (seemingly at random) no key agent is loaded, and journalctl shows crashes like the following:

Aug 07 10:45:37 s9 systemd-coredump[6922]: Process 881 (xss-lock) of user 1000 dumped core.
                                           
                                           Stack trace of thread 881:
                                           #0  0x00007f00eedc470b g_logv (libglib-2.0.so.0)
                                           #1  0x00007f00eedc487f g_log (libglib-2.0.so.0)
                                           #2  0x00000000004034bb n/a (xss-lock)
                                           #3  0x000000000040381f n/a (xss-lock)
                                           #4  0x00007f00eedbdc8a g_main_context_dispatch (libglib-2.0.so.0)
                                           #5  0x00007f00eedbe040 n/a (libglib-2.0.so.0)
                                           #6  0x00007f00eedbe362 g_main_loop_run (libglib-2.0.so.0)
                                           #7  0x0000000000402890 main (xss-lock)
                                           #8  0x00007f00ee3c32d1 __libc_start_main (libc.so.6)
                                           #9  0x0000000000402989 _start (xss-lock)
                                           
                                           Stack trace of thread 885:
                                           #0  0x00007f00ee4824cd poll (libc.so.6)
                                           #1  0x00007f00eedbdfd6 n/a (libglib-2.0.so.0)
                                           #2  0x00007f00eedbe0ec g_main_context_iteration (libglib-2.0.so.0)
                                           #3  0x00007f00eedbe131 n/a (libglib-2.0.so.0)
                                           #4  0x00007f00eede42b5 n/a (libglib-2.0.so.0)
                                           #5  0x00007f00ed6e0474 start_thread (libpthread.so.0)
                                           #6  0x00007f00ee48b81f __clone (libc.so.6)
                                           
                                           Stack trace of thread 887:
                                           #0  0x00007f00ee4824cd poll (libc.so.6)
                                           #1  0x00007f00eedbdfd6 n/a (libglib-2.0.so.0)
                                           #2  0x00007f00eedbe362 g_main_loop_run (libglib-2.0.so.0)
                                           #3  0x00007f00ef3b9726 n/a (libgio-2.0.so.0)
                                           #4  0x00007f00eede42b5 n/a (libglib-2.0.so.0)
                                           #5  0x00007f00ed6e0474 start_thread (libpthread.so.0)
                                           #6  0x00007f00ee48b81f __clone (libc.so.6)
-- Subject: Process 881 (xss-lock) dumped core

Aug 07 10:45:38 s9 systemd-coredump[6931]: Process 626 (Xorg) of user 0 dumped core.
                                           
                                           Stack trace of thread 626:
                                           #0  0x00007ff2fd2251e7 n/a (intel_drv.so)
                                           #1  0x00000000005188d7 n/a (Xorg)
                                           #2  0x0000000000518a22 n/a (Xorg)
                                           #3  0x00000000005199c6 present_event_notify (Xorg)
                                           #4  0x00007ff2fd30a5bd n/a (intel_drv.so)
                                           #5  0x00007ff2fd26a8f0 n/a (intel_drv.so)
                                           #6  0x00007ff2fd26c172 n/a (intel_drv.so)
                                           #7  0x00007ff2fd26e4a8 n/a (intel_drv.so)
                                           #8  0x000000000047b2df AbortDDX (Xorg)
                                           #9  0x00000000005a5072 n/a (Xorg)
                                           #10 0x00000000005a5e7d FatalError (Xorg)
                                           #11 0x000000000059cc8e n/a (Xorg)
                                           #12 0x00007ff301c7b0f0 __restore_rt (libc.so.6)
                                           #13 0x00007ff2fd2251e7 n/a (intel_drv.so)
                                           #14 0x00000000005188d7 n/a (Xorg)
                                           #15 0x00000000005189a5 n/a (Xorg)
                                           #16 0x0000000000519112 n/a (Xorg)
                                           #17 0x0000000000517ce3 n/a (Xorg)
                                           #18 0x00000000004c8f38 n/a (Xorg)
                                           #19 0x0000000000515fa4 n/a (Xorg)
                                           #20 0x000000000043a881 n/a (Xorg)
                                           #21 0x00007ff301c682d1 __libc_start_main (libc.so.6)
                                           #22 0x00000000004246e9 _start (Xorg)
                                           
                                           Stack trace of thread 640:
                                           #0  0x00007ff301a3812f pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                                           #1  0x00007ff2fd2a3d69 n/a (intel_drv.so)
                                           #2  0x00007ff301a32474 start_thread (libpthread.so.0)
                                           #3  0x00007ff301d3081f __clone (libc.so.6)
-- Subject: Process 626 (Xorg) dumped core

Aug 07 10:45:50 s9 systemd-coredump[7075]: Process 7071 (i3bar) of user 1000 dumped core.
                                           
                                           Stack trace of thread 7071:
                                           #0  0x00007f529539408f raise (libc.so.6)
                                           #1  0x00007f52953954ba abort (libc.so.6)
                                           #2  0x00007f529591f1fe n/a (libev.so.4)
                                           #3  0x00007f5295923479 ev_run (libev.so.4)
                                           #4  0x0000000000404277 main (i3bar)
                                           #5  0x00007f52953812d1 __libc_start_main (libc.so.6)
                                           #6  0x0000000000404359 _start (i3bar)
-- Subject: Process 7071 (i3bar) dumped core

My first idea was to revert to the working state by swapping /var and /var.org, but that didn't work. Creating an fstab entry to mount the new subvolume on /var did not help either. (My understanding is lacking here: as I read the SysAdmin Guide on the btrfs wiki, that is what distinguishes the flat (mounted) from the nested (unmounted) layout, but shouldn't the two be functionally equivalent?)


So, I assume a major PEBKAC in the sense that I

  • lack background knowledge about btrfs subvolumes despite digging through the Arch Btrfs docs and the Btrfs wiki. In that case I'd appreciate a hint about what I might have missed...

  • should have created a fresh subvolume and moved the contents of the old /var into it. I wanted to avoid duplicating all the data in there; maybe that was too ambitious?

  • overlooked something important about moving/converting /var that is totally unrelated to btrfs subvolumes.
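For the second option, here is a hedged sketch of the "fresh subvolume" route as a Python dry-run plan. `/var.new` is a hypothetical staging name (the backup name /var.org is taken from this post), and the plan assumes /var is quiet while it runs, e.g. from a rescue system, since running services write to it constantly. `cp --reflink=always` shares extents instead of duplicating data, assuming source and target sit on the same btrfs filesystem, and `cp -a` preserves ownership, permissions and timestamps. One caveat: `cp -a` copies a nested subvolume such as /var/lib/machines as an ordinary directory.

```python
import shlex
import subprocess


def fresh_subvolume_plan(target="/var", staging="/var.new", backup="/var.org"):
    """Commands for the fresh-subvolume route: create an empty subvolume,
    copy the old directory's contents into it, then swap the two.
    `staging` is a hypothetical name chosen for this sketch."""
    return [
        ["btrfs", "subvolume", "create", staging],
        # -a keeps owners, permissions, timestamps, xattrs and hard links;
        # --reflink=always shares extents and fails instead of silently
        # falling back to a full copy. "target/." copies the contents,
        # including dotfiles, into the existing staging subvolume.
        ["cp", "-a", "--reflink=always", f"{target}/.", staging],
        ["mv", target, backup],
        ["mv", staging, target],
    ]


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step
```

`run(fresh_subvolume_plan())` only prints the four commands; `run(fresh_subvolume_plan(), dry_run=False)` executes them and stops at the first failure.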

Can anyone help me out there? Thanks in advance!

Last edited by jasonwryan (2016-08-07 16:29:40)


#2 2016-08-07 16:00:32

Awebb
Member
Registered: 2010-05-06
Posts: 6,282

Re: Converting /var to a btrfs subvolume

I might be completely wrong and guilty of spreading FUD, but I have had such bad experiences with LightDM that I suggest you try something else (like SDDM or .xinitrc+startx) before you lose your mind over what looks like a clusterfuck right now.

your btrfs docs link wrote:

A btrfs subvolume is not a block device (and cannot be treated as one) instead, a btrfs subvolume can be thought of as a POSIX file namespace. This namespace can be accessed via the top-level subvolume of the filesystem, or it can be mounted in its own right.

I suspect (but don't know) that this somehow collides with the LVM/LUKS business. The Arch wiki article titled "dm-crypt/Encrypting an entire system" suggests btrfs on top of LUKS, but not with LVM in between. It doesn't say so explicitly, but there are reasons not to use btrfs in conjunction with LVM: read section 6.1, "Btrfs has subvolumes, does this mean I don't need a logical volume manager and I can create a big Btrfs filesystem on a raw partition?", in the Btrfs FAQ on kernel.org.

However, as an indication of a possible PEBKAC, there is one thing I'm not sure about: I have no clue what "btrfs subvolume snapshot / /newvar" does in detail. Wouldn't this "copy" everything from / into /newvar?
... duplicate / into /newvar?
... mean that deleting everything but /newvar/var and moving everything from /newvar/var to /newvar resulted in /newvar being identical to /var?
... mean that moving /var to /var.org.
... move /newvar out of the namespace of the subvolume and move it back to / as /var, but you simply don't notice, because copying files on the same physical device does not actually duplicate data, thanks to some file system magic in btrfs?

It's very likely that this doesn't have much to do with your problem. In any case, try collecting journalctl -u output (as well as systemctl status) for whatever LUKS/LVM/btrfs-related units you have running, as well as for LightDM, and try to get a log from i3.
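One way to follow that advice and keep everything in a single file is a small Python helper. This is a sketch: `diagnostic_commands` and `collect` are hypothetical helpers, not existing tools, and the unit names in the usage line are examples only; it relies just on standard `systemctl status` and `journalctl -u` options.

```python
import subprocess


def diagnostic_commands(units, boot=None):
    """systemctl status and journalctl -u for each unit. boot=None means the
    current boot; boot=-1 the previous one (journalctl --boot=-1)."""
    boot_arg = "--boot" if boot is None else f"--boot={boot}"
    cmds = []
    for unit in units:
        cmds.append(["systemctl", "status", "--no-pager", unit])
        cmds.append(["journalctl", "--no-pager", boot_arg, "-u", unit])
    return cmds


def collect(units, outfile="diagnostics.txt", boot=None):
    """Run the commands and write their combined output to one file."""
    with open(outfile, "w") as fh:
        for argv in diagnostic_commands(units, boot):
            fh.write("$ " + " ".join(argv) + "\n")
            # check=False: systemctl status exits non-zero for failed or
            # inactive units, and that output is exactly what we want to keep
            res = subprocess.run(argv, capture_output=True, text=True, check=False)
            fh.write(res.stdout + res.stderr + "\n")
```

For example, `collect(["lightdm.service", "var.mount"])` writes diagnostics.txt for the current boot; `boot=-1` looks at the previous boot instead, which helps right after a crash and reboot.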


#3 2016-08-07 16:20:43

akobel

Re: Converting /var to a btrfs subvolume

Awebb wrote:

I might be completely wrong and guilty of spreading FUD, but I have had such bad experiences with LightDM that I suggest you try something else (like SDDM or .xinitrc+startx) before you lose your mind over what looks like a clusterfuck right now.

Hm, okay. That makes me wonder why I actually switched to LightDM in the first place; .xinitrc+startx never failed me. Maybe it's time to go back to the good ol' days...

The Arch wiki article titled "dm-crypt/Encrypting an entire system" suggests btrfs on top of LUKS, but not with LVM in between. It doesn't say so explicitly, but there are reasons not to use btrfs in conjunction with LVM.

Good point. However, I never had problems with the two separate btrfs partitions for / and /home in the same LUKS+LVM volume group, and I want to keep the encrypted swap in there (for hibernation).

However, as an indication of a possible PEBKAC, there is one thing I'm not sure about: I have no clue what "btrfs subvolume snapshot / /newvar" does in detail. Wouldn't this "copy" everything from / into /newvar?
... duplicate / into /newvar?
... mean that deleting everything but /newvar/var and moving everything from /newvar/var to /newvar resulted in /newvar being identical to /var?
... mean that moving /var to /var.org.
... move /newvar out of the namespace of the subvolume and move it back to / as /var, but you simply don't notice, because copying files on the same physical device does not actually duplicate data, thanks to some file system magic in btrfs?

That was exactly my intention. I read this random StackExchange answer and figured it was a good idea: the files stay in the same extents, so it should be reasonably safe from deeply hidden issues (e.g., with modification times, user permissions, etc.). Maybe it wasn't a good idea to try it online on /var...

It's very likely that this doesn't have much to do with your problem. In any case, try collecting journalctl -u output (as well as systemctl status) for whatever LUKS/LVM/btrfs-related units you have running, as well as for LightDM, and try to get a log from i3.

Hmmm... Good catch. There's not much fs-related in there, except fsck, but...

Aug 07 11:11:00 s9 systemd[1]: var.mount: Directory /var to mount over is not empty, mounting anyway.
Aug 07 11:11:00 s9 systemd[1]: Mounting /var...
Aug 07 11:11:01 s9 systemd[1]: Mounted /var.
Aug 07 12:53:45 s9 systemd[1]: Unmounting /var...
Aug 07 12:53:45 s9 umount[12908]: umount: /var: target is busy
Aug 07 12:53:45 s9 umount[12908]:         (In some cases useful info about processes that
Aug 07 12:53:45 s9 umount[12908]:          use the device is found by lsof(8) or fuser(1).)
Aug 07 12:53:45 s9 systemd[1]: var.mount: Mount process exited, code=exited status=32
Aug 07 12:53:45 s9 systemd[1]: Failed unmounting /var.

looks suspiciously like I should look into it...
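The `target is busy` failure means some process still holds files under /var when systemd tries to unmount it. The real tools for that are `fuser -vm /var` or `lsof`; the Python sketch below only approximates them by scanning /proc (the function name `pids_using` is hypothetical). Unlike fuser it ignores memory-mapped files, so treat its result as a first look, and note that other users' processes are only visible when it runs as root.

```python
import os


def pids_using(path):
    """Rough approximation of `fuser -vm PATH`: map PID -> list of its
    cwd/root/exe links and open file descriptors at or below PATH."""
    prefix = os.path.realpath(path)
    hits = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join("/proc", entry)
        links = [os.path.join(base, name) for name in ("cwd", "root", "exe")]
        try:
            fd_dir = os.path.join(base, "fd")
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited meanwhile, or permission denied
        matches = []
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if target == prefix or target.startswith(prefix + "/"):
                matches.append(target)
        if matches:
            hits[int(entry)] = matches
    return hits
```

Running `pids_using("/var")` as root shortly before shutdown should show which processes would keep var.mount busy.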


Thanks!


#4 2016-08-07 16:53:30

Awebb

Re: Converting /var to a btrfs subvolume

You did this live? A brave martyr, indeed!

I'm curious enough to subscribe to this topic. Let us know if you find out anything.

EDIT: You're from Kirkel? If it's the one I think, I'd actually live close enough to come over and have a look myself :-D

Last edited by Awebb (2016-08-07 16:55:00)


#5 2016-08-07 17:21:43

akobel

Re: Converting /var to a btrfs subvolume

Well, isn't this snapshot stuff meant for assisting the brave at open-heart surgeries? ;-)

News 1: the var.mount messages only indicate that I should not have an /etc/fstab entry for /var, since the nested subvolume already shows up there without one. Still, I'm not 100% sure I understand the background of the nested vs. flat layouts and the subvolume naming schemes.
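The nested-vs-flat difference can be checked directly in /proc/self/mountinfo, whose fourth field is the path inside the filesystem that a mount exposes (for btrfs, the subvolume path). The Python sketch below flags mounts whose content is already visible at the same place through their parent mount of the same filesystem, which is what an fstab entry for a nested /var produces. The helper names are hypothetical, and the parser ignores mountinfo's octal escapes (such as \040 for a space).

```python
import os


def parse_mountinfo(text):
    """Parse /proc/self/mountinfo lines into dicts (escapes not decoded)."""
    mounts = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # optional fields end at the " - " separator, then fstype and source
        left, _, right = line.partition(" - ")
        f = left.split()
        mounts.append({"id": int(f[0]), "parent": int(f[1]), "dev": f[2],
                       "root": f[3], "mountpoint": f[4],
                       "fstype": right.split()[0]})
    return mounts


def shadowing_mounts(mounts):
    """Mount points whose content the parent mount (same filesystem) already
    shows at that very path, e.g. a nested subvolume mounted onto itself."""
    by_id = {m["id"]: m for m in mounts}
    found = []
    for m in mounts:
        p = by_id.get(m["parent"])
        if p is None or p["dev"] != m["dev"]:
            continue  # parent not listed, or a different filesystem
        rel = os.path.relpath(m["mountpoint"], p["mountpoint"])
        if rel.startswith(".."):
            continue
        # what the parent mount already shows at m's mount point
        visible = os.path.normpath(os.path.join(p["root"], rel))
        if visible == m["root"]:
            found.append(m["mountpoint"])
    return found
```

With the top level mounted on / and the fstab entry for /var in place, `shadowing_mounts(parse_mountinfo(open("/proc/self/mountinfo").read()))` should report ["/var"]; in a flat layout where / and /var come from sibling subvolumes it should report nothing.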

News 2: LightDM is at least part of the culprit. I cleared out lightdm* and reinstalled all of it. Result: after a proper systemctl restart lightdm.service, I can log back in perfectly; keeping the same LightDM instance still doesn't work. And the classic startx works like a charm. So, for the time being, replacing LightDM seems to be a good solution.

And there is only the one Kirkel. ;-)


#6 2016-08-07 17:53:42

Awebb

Re: Converting /var to a btrfs subvolume

1: I don't have enough Btrfs under my belt to be of assistance here. My setups have magically worked so far, but then again I'm more of the "LVM/ext4" type. :-)

2: Thank you for reinforcing my prejudice against LightDM. However, it is still possible that your open-heart surgery left some files in /var in an unwanted state. Make sure you have another backup before you start selectively nuking files, and better run something like "find ... -exec pacman -Qo" to see whether anything is directly owned by a package. The most sensitive data in /var are logs, your production data and the pacman state files.
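The find/pacman check can also be scripted. A Python sketch, assuming pacman reports unowned files with its English message `error: No package owns PATH` (hence LC_ALL=C); the helper names are hypothetical, and the directory in the usage line is only an example.

```python
import os
import subprocess

# pacman's error line for files that no installed package owns (C locale)
UNOWNED_PREFIX = "error: No package owns "


def unowned_from_stderr(stderr):
    """Extract the paths from pacman -Qo's 'No package owns' error lines."""
    return [line[len(UNOWNED_PREFIX):] for line in stderr.splitlines()
            if line.startswith(UNOWNED_PREFIX)]


def unowned_files(directory, batch=200):
    """Non-directory entries below `directory` that no package claims."""
    files = [os.path.join(dirpath, name)
             for dirpath, _dirs, names in os.walk(directory)
             for name in names]
    env = dict(os.environ, LC_ALL="C")  # keep pacman's messages in English
    unowned = []
    for i in range(0, len(files), batch):
        res = subprocess.run(["pacman", "-Qo", *files[i:i + batch]],
                             capture_output=True, text=True, env=env, check=False)
        unowned += unowned_from_stderr(res.stderr)
    return unowned
```

For example, `unowned_files("/var/lib/lightdm")`. Most of /var is probably unowned anyway, since logs, caches and state are created at runtime, so the useful signal is which files a package *does* claim.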

Kirkel: the one I was thinking of, indeed; I often drove past Kirkel on my way to St. Ingbert. I clicked the website link in your profile, and it is entirely possible that we have met at some point; I was on campus in 2012 for some CoLi. Sometimes the world is not as huge as it seems.

Last edited by Awebb (2016-08-07 17:55:50)


#7 2016-08-07 18:41:36

akobel

Re: Converting /var to a btrfs subvolume

As to 2.), that's actually what I did already: I cleared everything LightDM-related in /var (and everywhere else) and set it up again from a freshly downloaded package. To no avail, though. That leaves two options, AFAICS: either the LightDM issue is entirely unrelated to the /var subvolume story, and I only noticed the problem now (no clue whether I ever switched users via LightDM on that machine after setting it up years ago...), or something only remotely related to LightDM broke.
If it's the first, I'll find out sooner or later; if it's the second, I'm afraid I'll find out sooner or later with some nasty side effects. Anyway, for the time being, I have my workaround ready. Thanks for your pointers!

And indeed, if you were on the CoLi campus in 2012, it's actually likely that we've met: a close friend of mine finished her PhD in CoLi in 2013, and my PhD advisor's wife became a research group leader around that time. Thus, I've seen a few of the people...

