I've lost the ability to log in to my systemd-homed user account.
If I log in as root and try to authenticate:
# homectl authenticate azymohliad
Then after typing my password I get the following output:
Operation on home azymohliad failed: Not enough disk space for home azymohliad
It also produces the following system logs:
Aug 22 09:11:08 az-wolf-pc systemd-homed[425]: azymohliad: changing state inactive → authenticating
Aug 22 09:11:08 az-wolf-pc systemd-homework[1215]: None of the supplied plaintext passwords unlocks the user record's hashed passwords.
Aug 22 09:11:08 az-wolf-pc systemd-homed[425]: Authentication failed: Required key not available
Aug 22 09:11:08 az-wolf-pc systemd-homed[425]: azymohliad: changing state authenticating → inactive
Aug 22 09:11:23 az-wolf-pc systemd-homed[425]: azymohliad: changing state inactive → authenticating
Aug 22 09:11:23 az-wolf-pc systemd-homework[1216]: Provided password unlocks user record.
Aug 22 09:11:23 az-wolf-pc systemd-homed[425]: Authentication failed: No space left on device
Aug 22 09:11:23 az-wolf-pc systemd-homed[425]: azymohliad: changing state authenticating → inactive
And here is the full log since the last boot.
My root filesystem is BTRFS, and home is LUKS-encrypted BTRFS on a loopback file. Here are the details:
# homectl inspect azymohliad
User name: azymohliad
State: inactive
Disposition: regular
Last Change: Thu 2020-06-25 17:41:52 EEST
Last Passw.: Thu 2020-06-04 19:04:43 EEST
Login OK: yes
Password OK: yes
UID: 60265
GID: 60265 (azymohliad)
Aux. Groups: audio
docker
wheel
Real Name: Andrii Zymohliad
Directory: /home/azymohliad
Storage: luks (strong encryption)
Image Path: /home/azymohliad.home
Removable: no
Shell: /usr/bin/fish
LUKS Discard: online=no offline=yes
LUKS UUID: 4ed4c05040e4429ca0163bb40587ec2d
Part UUID: 3ed283c030ab42778c1fb75aeeccc88e
FS UUID: 4ffae38b42c94e5389a13d21cd862938
File System: btrfs
LUKS Cipher: aes
Cipher Mode: xts-plain64
Volume Key: 256bit
Mount Flags: nosuid nodev exec
Disk Size: 402.7G
Disk Floor: 256.0M
Disk Ceiling: 429.3G
Good Auth.: 362
Last Good: Fri 2020-08-21 19:23:27 EEST
Bad Auth.: 128
Last Bad: Sat 2020-08-22 09:45:32 EEST
Next Try: anytime
Auth. Limit: 30 attempts per 1min
Passwords: 1
Local Sig.: yes
Service: io.systemd.Home
Before I try to authenticate, my root filesystem usage looks like this (every time after boot):
# btrfs fi usage /
Overall:
Device size: 476.44GiB
Device allocated: 352.02GiB
Device unallocated: 124.42GiB
Device missing: 0.00B
Used: 302.98GiB
Free (estimated): 173.01GiB (min: 173.01GiB)
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 68.67MiB (used: 0.00B)
Multiple profiles: no
Data,single: Size:351.01GiB, Used:302.42GiB (86.16%)
/dev/nvme0n1p2 351.01GiB
Metadata,single: Size:1.01GiB, Used:574.05MiB (55.62%)
/dev/nvme0n1p2 1.01GiB
System,single: Size:4.00MiB, Used:64.00KiB (1.56%)
/dev/nvme0n1p2 4.00MiB
Unallocated:
/dev/nvme0n1p2 124.42GiB
Interestingly, the allocated size is only 352G, although the home image size is 400G (with a lot of free space inside).
# ls -lh /home
total 257G
drwx------ 1 root root 0 Jun 4 17:50 azymohliad/
-rw------- 1 root root 403G Aug 21 19:26 azymohliad.home
I wonder whether that's some BTRFS magic, or whether it already means my home is corrupted.
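A size mismatch like this is what a sparse file looks like, and it does not by itself indicate corruption. Below is a minimal sketch on a throwaway file (the temporary path and the 64M size are illustrative, not taken from this thread) that compares the apparent size with the space actually allocated:

```shell
#!/bin/sh
# Create a throwaway sparse file: 64 MiB apparent size, no data written.
tmpfile=$(mktemp)
truncate -s 64M "$tmpfile"

# Apparent size in bytes (what `ls -l` reports).
apparent=$(stat -c %s "$tmpfile")
# Space actually allocated, in bytes: block count times block unit size.
allocated=$(( $(stat -c %b "$tmpfile") * $(stat -c %B "$tmpfile") ))

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

# A file is sparse when it occupies less space than its apparent size.
if [ "$allocated" -lt "$apparent" ]; then
    sparse=yes
else
    sparse=no
fi
echo "sparse: $sparse"

rm -f "$tmpfile"
```

On the real image, the same comparison is visible in the `ls -lh /home` output above: the file is listed as 403G while the directory total is only 257G.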
After trying to authenticate, the root filesystem usage is this:
# btrfs fi usage /
Overall:
Device size: 476.44GiB
Device allocated: 476.44GiB
Device unallocated: 1.00MiB
Device missing: 0.00B
Used: 302.98GiB
Free (estimated): 173.00GiB (min: 173.00GiB)
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 68.67MiB (used: 0.00B)
Multiple profiles: no
Data,single: Size:475.43GiB, Used:302.42GiB (63.61%)
/dev/nvme0n1p2 475.43GiB
Metadata,single: Size:1.01GiB, Used:574.08MiB (55.63%)
/dev/nvme0n1p2 1.01GiB
System,single: Size:4.00MiB, Used:80.00KiB (1.95%)
/dev/nvme0n1p2 4.00MiB
Unallocated:
/dev/nvme0n1p2 1.00MiB
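The two `btrfs fi usage /` outputs can be cross-checked with a little arithmetic. This sketch assumes that, for a single-profile filesystem, "Free (estimated)" is roughly the unallocated space plus the unused room inside already allocated data chunks. All figures are copied from the outputs above; `calc` is an illustrative helper, not part of any tool.

```shell
#!/bin/sh
# calc EXPR: evaluate a decimal expression with awk, two decimal places.
calc() { LC_ALL=C awk "BEGIN { printf \"%.2f\", $1 }"; }

# Before authentication: unallocated + (data size - data used), in GiB.
free_before=$(calc "124.42 + (351.01 - 302.42)")
# After the failed attempt: 1.00MiB unallocated is about 0.00 GiB.
free_after=$(calc "0.00 + (475.43 - 302.42)")
# Growth of the data chunk allocation during the attempt.
grown=$(calc "475.43 - 351.01")

echo "free before: ${free_before}GiB"
echo "free after:  ${free_after}GiB"
echo "data chunks grew by: ${grown}GiB"
```

With these inputs the sketch prints 173.01, 173.01 and 124.42. The first matches the reported "Free (estimated)", the second is within rounding of the reported 173.00GiB, and the third equals the space that was previously unallocated. In other words, the attempt turned all unallocated space into data chunks while "Used" stayed at 302.98GiB.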
Most likely this is unrelated, but the last thing I did on the working system was to add an /etc/security/limits.d/audio.conf file, following this article. Then I logged out and wasn't able to log in again. However, I did the same on another computer with the same homed configuration (but more disk space) and it didn't break anything.
More likely I've just filled up the root partition, I guess (it was probably stupid to allocate 400G for home and leave only 75G for root in the first place). But I've cleaned the pacman cache, removed some flatpak runtimes, etc., so overall I think I should have freed more space than I could have used during the last working session. Still, it doesn't help...
Thanks for reading this far. Any help or ideas would be much appreciated!
Last edited by andrii_zymohliad (2020-08-24 09:20:19)
Offline
Ok, at least I can confirm that my home is not corrupted. I'm able to mount it using:
losetup -fP /home/azymohliad.home
cryptsetup open /dev/loop0p1 home
mount /dev/mapper/home /mnt
btrfs check on both home and root reports no errors.
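For completeness, here is a hedged sketch of the matching teardown, in reverse order of the setup commands above. It assumes the loop device really is /dev/loop0 and the mapping is named home, as in this post (`losetup -f` picks the first free device, so this may differ on other systems). `DRY_RUN` and `run` are illustrative helpers: by default the commands are only printed, not executed.

```shell
#!/bin/sh
# DRY_RUN and run() are illustrative helpers, not part of any tool.
# With DRY_RUN=1 (the default here) each command is printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Undo the manual mount: unmount, close the LUKS mapping, detach the loop device.
teardown() {
    run umount /mnt
    run cryptsetup close home
    run losetup -d /dev/loop0
}

teardown
```

Running it with `DRY_RUN=0` as root would execute the three commands for real.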
Offline
btrfs fi usage /mnt ?
Does it have free space?
Maybe homed sees the opposite...
I'm not savvy with btrfs, but at least your password and home loop file are OK...
Offline
btrfs fi usage /mnt ?
Does it have free space?
Yes, there is plenty:
# btrfs fi usage /mnt
Overall:
Device size: 402.72GiB
Device allocated: 258.02GiB
Device unallocated: 144.70GiB
Device missing: 0.00B
Used: 221.92GiB
Free (estimated): 179.53GiB (min: 179.53GiB)
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 297.11MiB (used: 0.00B)
Multiple profiles: no
Data,single: Size:256.01GiB, Used:221.18GiB (86.40%)
/dev/mapper/home 256.01GiB
Metadata,single: Size:2.01GiB, Used:749.92MiB (36.47%)
/dev/mapper/home 2.01GiB
System,single: Size:4.00MiB, Used:48.00KiB (1.17%)
/dev/mapper/home 4.00MiB
Unallocated:
/dev/mapper/home 144.70GiB
Offline
The audio thing may be unrelated. Did you have a systemd update recently?
Or the pam update that deprecated tally2?
You might need to check for .pacnew files in /etc/pam.d
If that's not it, then try downgrading systemd until the issue no longer reproduces. Too bad that you wiped your pacman cache...
You can use the ArchLinux Rollback Machine.
If a systemd downgrade makes the issue go away, then report it to systemd as an issue, or keep watching an existing one if there is any...
Now that I think of it, can’t you use btrfs snapshots to go back a couple of versions?
If you can do so without breaking something that may make it faster for you...
Offline
During the last working session I didn't upgrade any packages, so I doubt a downgrade would help.
I had a system-auth.pacnew in /etc/pam.d; I've already applied it, and it didn't change anything.
Unfortunately, I didn't create any btrfs snapshots before the issue...
I've asked on the systemd mailing list, and with debug logs they figured out that a fallocate call fails when I try to authenticate. They suggested double-checking it manually:
fallocate -l 403G -n /home/azymohliad.home
and if it fails, to ask on the BTRFS list instead.
It fails, so I'm going to the BTRFS mailing list now.
If that doesn't lead to anything, I think I'll try downgrading...
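A minimal sketch of what that check exercises, using a small throwaway file instead of the real image (the temporary path and the 8M size are illustrative). `fallocate -l SIZE -n` asks the filesystem to reserve blocks up to SIZE without changing the apparent size (`-n` is `--keep-size`). On a sparse file this means backing the holes with real allocations, which is the step where the underlying filesystem can answer "No space left on device".

```shell
#!/bin/sh
# Start from a sparse 8 MiB file, similar in shape to a sparse home image.
tmpfile=$(mktemp)
truncate -s 8M "$tmpfile"
blocks_before=$(stat -c %b "$tmpfile")

# Ask the filesystem to back the whole range with real blocks.
# This may fail if the filesystem lacks space or fallocate support.
if fallocate -l 8M -n "$tmpfile"; then
    result=ok
else
    result=failed
fi

blocks_after=$(stat -c %b "$tmpfile")
size_after=$(stat -c %s "$tmpfile")

echo "fallocate: $result"
echo "allocated blocks: $blocks_before -> $blocks_after"
echo "apparent size: $size_after bytes"

rm -f "$tmpfile"
```

When the call succeeds, the allocated block count grows while the apparent size stays the same.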
Thanks for your ideas!
Offline
So thanks to people on the BTRFS and systemd mailing lists, I can log in again. The workaround was to enable discard on the home partition:
homectl update --luks-discard=on azymohliad
Although, discard on encrypted data is often not recommended.
In my case, the /home/azymohliad.home image had somehow become sparse earlier (403GiB apparent size, but it takes only 257GiB on the root fs), possibly due to discard on logout.
I had this line in my homectl inspect:
LUKS Discard: online=no offline=yes
And from homectl --help it seems that "offline discard" here means that discard is performed on logout. But if "online discard" is disabled, systemd tries to allocate the full home filesystem size (i.e. 403GiB here) on login, which failed (it's still not clear why, since the rootfs seems to have enough free space). If "online discard" is enabled, though, it doesn't try to allocate the full home size, so that fixes the issue for me.
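To check these settings from a script, one hedged option is to parse the `LUKS Discard:` line of `homectl inspect`. The sketch below runs on a copy of the line quoted above rather than on live output. `discard_setting` is a hypothetical helper name, and the line format is assumed to match what this thread shows.

```shell
#!/bin/sh
# discard_setting KEY: read `homectl inspect`-style text on stdin and print
# the value of KEY (online or offline) from the "LUKS Discard:" line.
discard_setting() {
    awk -v key="$1" '
        /LUKS Discard:/ {
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == key) print kv[2]
            }
        }'
}

# Sample line copied from the inspect output in this thread.
sample='LUKS Discard: online=no offline=yes'
online=$(printf '%s\n' "$sample" | discard_setting online)
offline=$(printf '%s\n' "$sample" | discard_setting offline)

echo "online=$online offline=$offline"
```

On a live system the same function could be fed with `homectl inspect azymohliad | discard_setting online`.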
The investigation is still going on here on systemd mailing list (whether or not it is an issue of systemd-homed and what to do about it).
And here is the relevant discussion on BTRFS mailing list.
Last edited by andrii_zymohliad (2020-08-24 11:32:43)
Offline
Although, discard on encrypted data is often not recommended.
You have a gigantic loop file and discard operation in this case simply means telling the backing filesystem to free unused space (aka hole punching). This is unrelated to SSD, it would work the same way on HDD, since it's just the filesystem managing the space allocated to the loop file.
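Hole punching can be seen in isolation on a throwaway file. This is only a sketch: the temporary path and the 4M size are illustrative, and it needs a filesystem that supports punching holes. It writes real data, then tells the filesystem to release the range again while the apparent size stays unchanged.

```shell
#!/bin/sh
# Write 4 MiB of real data so the file is fully allocated.
tmpfile=$(mktemp)
dd if=/dev/urandom of="$tmpfile" bs=1M count=4 conv=fsync 2>/dev/null
blocks_full=$(stat -c %b "$tmpfile")

# Release the whole range back to the filesystem.
# --punch-hole keeps the apparent file size unchanged.
if fallocate --punch-hole --offset 0 --length 4M "$tmpfile"; then
    punched=yes
else
    punched=no
fi

blocks_punched=$(stat -c %b "$tmpfile")
size_after=$(stat -c %s "$tmpfile")

echo "punched: $punched"
echo "allocated blocks: $blocks_full -> $blocks_punched"
echo "apparent size: $size_after bytes"

rm -f "$tmpfile"
```

A discard passed down from the loop filesystem has the same kind of effect on the image file: the blocks go back to the backing filesystem and the image becomes sparse.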
Using loop files this way is a bit risky, it's twice the filesystem overhead and fragmentation, in case of filesystem corruption you can't recover anything, with sparse loop file you can inadvertently overallocate and run into unexpected errors - no space left on the backing filesystem translates into arbitrary I/O errors on the loop filesystem.
systemd-homed preventing you from using it at all while the backing filesystem is uncooperative, might be a stroke of luck actually.
Can't really recommend this approach for production use, if you want it reliable, use a genuine block device (LVM or similar) instead of looping a 400GB file.
Otherwise just be sure to have a good backup at all times.
Offline
with sparse loop file you can inadvertently overallocate and run into unexpected errors - no space left on the backing filesystem translates into arbitrary I/O errors on the loop filesystem.
Yeah, this was also mentioned in that systemd mailing list thread. I'm thinking of running homectl resize azymohliad 300G for now to reduce that risk.
Using loop files this way is a bit risky, it's twice the filesystem overhead and fragmentation, in case of filesystem corruption you can't recover anything
Can't really recommend this approach for production use, if you want it reliable, use a genuine block device (LVM or similar) instead of looping a 400GB file.
Yeah, I've been thinking about the risks of this approach too. It's interesting that this is the default setup for systemd-homed (as I understand from the wiki). I would love to have it on an LVM volume or a separate partition, but as I understand for LUKS storage option, systemd-homed requires a full device (with partition table) and not just one partition. Or am I wrong?
Offline
You're correct, systemd-homed requires a whole disk. See systemd issue 15273.
Please take the time to document your issue in the wiki.
Offline
Please take the time to document your issue in the wiki.
Ok, I will try (I've never edited the wiki before). I'd like to wait for more replies on the systemd mailing list first, because it's still not entirely clear why this happened.
Offline
but as I understand for LUKS storage option, systemd-homed requires a full device (with partition table) and not just one partition. Or am I wrong?
You're right. It was not clear to me from the documentation, which simply mentions that block devices are possible... It does give a USB stick as an example, and I guess it makes sense to partition that, but isn't that a bit too specific a use case?
Wonder what happens when you symlink /home/user.home -> /dev/VG/LV but ... it's probably better to not even make the attempt. ;-)
Offline
Wonder what happens when you symlink /home/user.home -> /dev/VG/LV but ... it's probably better to not even make the attempt. ;-)
This is almost exactly what was tried in systemd issue #15273 (which nl6720 posted above). See the last snippet in the description. The only difference is that it uses --image-path and not a symlink. And as Lennart Poettering replied, it wouldn't work because systemd-homed wants the top-level disk device so it can put a partition table on it for exclusive use. The good news is that he seems to agree that partition/volume support would be good to have, so hopefully we'll see it in a future version.
Last edited by andrii_zymohliad (2020-08-24 12:15:55)
Offline