Hello,
I'm using a setup based on an encrypted zfs partition. This worked for many years until the zpool somehow got corrupted. I added an additional disk, created a new zpool and copied over as much of the original pool as possible. Afterwards I also recreated my UEFI partition on the new disk, formatted the old disk and added it as an additional vdev to the new pool (smartctl did not report any errors for the old disk). I've also reinstalled all packages.
Since then I'm observing the following issues:
The NetworkManager and cups units do not start. They are shown as dead, but starting them manually is no problem:
○ cups.service - CUPS Scheduler
Loaded: loaded (/usr/lib/systemd/system/cups.service; enabled; preset: disabled)
Active: inactive (dead)
TriggeredBy: ○ cups.socket
Docs: man:cupsd(8)
/boot is not mounted. I've already found a post stating that the partition was dirty. I fixed it with dosfsck, but that did not help. Now even the unit boot.mount does not exist anymore.
The only failing unit is tpm2-abrmd.service. I've found out that the permissions of /dev/tpm0 are not correct:
crw-rw---- 1 963 root 10, 224 May 10 10:36 /dev/tpm0
My tss user has uid 974 and my udev rules are in place. After calling
udevadm trigger
the permissions are set correctly, so it seems that even udev is not running as it should.
I've looked at
journalctl -xw
and tried different systemctl commands. I've scrolled through
dmesg
but I'm not getting what is going on here.
Any diagnostic help or idea would be appreciated. Thank you!
Last edited by lepokle (2024-06-10 12:26:56)
TriggeredBy: ○ cups.socket
The service starts when you try to use it.
Now even the unit boot.mount does not exist anymore.
fstab/lsblk -f ?
my udev rules are in place
Try to change the rule to sth. more obvious, eg. "touch /tmp/tpm.udev" and see whether that's generated (ie. is the rule not applied or does the rule use the wrong UID)
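The ownership question above (the numeric owner of the device node versus the UID that the name tss currently resolves to) can be checked with a small sketch like the following. The function name owner_matches is made up for illustration and is not part of udev or any package; it only assumes GNU stat and id.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): compare the numeric owner of a
# path with the UID that a user name resolves to right now.
owner_matches() {
    local path user actual expected
    path="$1"
    user="$2"
    actual=$(stat -c '%u' "$path") || return 2   # numeric owner of the node
    expected=$(id -u "$user") || return 2        # UID behind the user name
    if [ "$actual" = "$expected" ]; then
        echo "ok: $path is owned by $user (uid $expected)"
    else
        echo "mismatch: $path has uid $actual, but $user is uid $expected"
        return 1
    fi
}

# Example on the affected machine:
#   owner_matches /dev/tpm0 tss
```

A mismatch right after boot that disappears after udevadm trigger would suggest the rule was applied, but with a different user database than the one visible later.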
Hi,
Unfortunately, the service no longer seems to start when I want to print. I have to call
systemctl start cups
first (this was not necessary before).
Here is the output:
leo@lepobookng ~ % lsblk -f
NAME        FSTYPE     FSVER LABEL            UUID                FSAVAIL FSUSE% MOUNTPOINTS
nvme1n1
├─nvme1n1p1
├─nvme1n1p2 zfs_member 5000  root_pool_2m4iog 9068665034321893512
└─nvme1n1p3
nvme0n1
├─nvme0n1p1 vfat       FAT32                  91C7-E83D              3.9G     2% /boot
├─nvme0n1p2 zfs_member 5000  root_pool_2m4iog 9068665034321893512
└─nvme0n1p3
leo@lepobookng ~ % cat /etc/fstab
# Static information about the filesystems.
# See fstab(5) for details.
# <file system> <dir> <type> <options> <dump> <pass>
# ESP/BOOT partition
UUID=91C7-E83D /boot vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro 0 2
# SWAP space
/dev/mapper/swap none swap defaults 0 0
Running
mount /boot
after boot works without any errors.
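The comparison done by eye above (does every UUID= entry in fstab match a filesystem that actually exists?) can be sketched as a small script. The function name check_fstab_uuids is hypothetical; the only assumption about the system is that udev populates /dev/disk/by-uuid, which is the default second argument.

```shell
#!/bin/sh
# Hypothetical helper: for each UUID= entry in an fstab-style file, report
# whether an entry with that name exists in a by-uuid directory.
check_fstab_uuids() {
    local fstab dir spec mountpoint rest uuid
    fstab="${1:-/etc/fstab}"
    dir="${2:-/dev/disk/by-uuid}"
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r spec mountpoint rest || [ -n "$spec" ]; do
        case "$spec" in
            UUID=*) uuid="${spec#UUID=}" ;;
            *) continue ;;   # comments, blank lines, device paths
        esac
        if [ -e "$dir/$uuid" ]; then
            echo "present $uuid $mountpoint"
        else
            echo "missing $uuid $mountpoint"
        fi
    done < "$fstab"
}

# Example: check_fstab_uuids /etc/fstab
```

If every entry is reported as present and the mount still does not happen at boot, the file that is read during boot may not be the same one that is visible afterwards.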
The rule contains the right UID:
leo@lepobookng ~ % cat /usr/lib/udev/rules.d/60-tpm-udev.rules
# tpm devices can only be accessed by the tss user but the tss
# group members can access tpmrm devices
KERNEL=="tpm[0-9]*", TAG+="systemd", MODE="0660", OWNER="tss"
KERNEL=="tpmrm[0-9]*", TAG+="systemd", MODE="0660", GROUP="tss"
It seems that some basic things are not run on startup.
That's not a UID but a user/groupname but since fstab is apparently not parsed (or does the swap activate?)…
Please post your complete system journal for the boot:
sudo journalctl -b | curl -F 'file=@-' 0x0.st
No, swap is not activated either. I've set up encrypted swap, but it isn't loaded.
root@lepobookng ~ # cat /etc/crypttab
# Configuration for encrypted block devices.
# See crypttab(5) for details.
# NOTE: Do not list your root (/) partition here, it must be set up
# beforehand by the initramfs (/etc/mkinitcpio.conf).
# <name> <device> <password> <options>
# home UUID=b8ad5c18-f445-495d-9095-c9ec4f9d2f37 /etc/mypassword1
# data1 /dev/sda3 /etc/mypassword2
# data2 /dev/sda5 /etc/cryptfs.key
# swap /dev/sdx4 /dev/urandom swap,cipher=aes-cbc-essiv:sha256,size=256
# vol /dev/sdb7 none
swap /dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B00_S677NF0W500229-part3 /dev/urandom swap,cipher=aes-cbc-essiv:sha256,size=256
Here is the log:
http://0x0.st/X8Ym.txt
It looks like zed runs very late and you might end up looking at a different FS than the starting system?
What does the system look like if you're only booting the rescue.target?
Thanks for the tip. I had the following findings after booting into rescue.target:
systemd had some initial setup questions -> answered
owner/group on /dev/tpm* are correct
/etc/fstab is empty
/etc/crypttab is empty
journalctl -b is available at http://0x0.st/XKD4.txt
no failed systemd units
/boot not mounted (that should be expected since /etc/fstab was empty, correct?)
After switching to graphical.target I get the old symptoms:
no /boot
wrong ownership of /dev/tpm* !?!
some units are not starting
...
I've rechecked mkinitcpio.conf and added things present in .pacnew. I've regenerated initramfs-linux.img and checked it:
root@lepobookng /tmp/t # lsinitcpio -x /boot/initramfs-linux.img
root@lepobookng /tmp/t # ll
total 52
-rw-r--r-- 1 root root 4 May 15 06:47 VERSION
lrwxrwxrwx 1 root root 7 May 15 06:47 bin -> usr/bin
-rw-r--r-- 1 root root 3306 May 15 06:47 buildconfig
-rw-r--r-- 1 root root 122 May 15 06:47 config
-rw-r--r-- 1 root root 10558 May 15 06:47 consolefont.psfu
drwxr-xr-x 2 root root 40 May 15 06:47 dev
-rw-r--r-- 1 root root 2 May 15 06:47 early_cpio
drwxr-xr-x 4 root root 220 May 15 06:47 etc
drwxr-xr-x 2 root root 120 May 15 06:47 hooks
-rwxr-xr-x 1 root root 3325 May 15 06:47 init
-rw-r--r-- 1 root root 15577 May 15 06:47 init_functions
drwxr-xr-x 3 root root 60 May 15 06:47 kernel
-rw-r--r-- 1 root root 2567 May 15 06:47 keymap.bin
-rw-r--r-- 1 root root 0 May 15 06:47 keymap.utf8
lrwxrwxrwx 1 root root 7 May 15 06:47 lib -> usr/lib
lrwxrwxrwx 1 root root 7 May 15 06:47 lib64 -> usr/lib
drwxr-xr-x 2 root root 40 May 15 06:47 new_root
drwxr-xr-x 2 root root 40 May 15 06:47 proc
drwxr-xr-x 2 root root 40 May 15 06:47 run
lrwxrwxrwx 1 root root 7 May 15 06:47 sbin -> usr/bin
drwxr-xr-x 2 root root 40 May 15 06:47 sys
drwxr-xr-x 2 root root 40 May 15 06:47 tmp
drwxr-xr-x 5 root root 140 May 15 06:47 usr
drwxr-xr-x 2 root root 60 May 15 06:47 var
root@lepobookng /tmp/t # cat etc/fstab
# Static information about the filesystems.
# See fstab(5) for details.
# <file system> <dir> <type> <options> <dump> <pass>
# ESP/BOOT partition
UUID=91C7-E83D /boot vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro 0 2
# SWAP space
/dev/mapper/swap none swap defaults 0 0
root@lepobookng /tmp/t #
So fstab seems to be correctly included. I've deleted my UKI files to be sure they got regenerated:
root@lepobookng /tmp/t # ll /boot/EFI/Linux/
total 150272
-rwxr-xr-x 1 root root 76936192 May 9 14:53 archlinux-debug.efi
-rwxr-xr-x 1 root root 76936192 May 9 14:53 archlinux-linux.efi
root@lepobookng /tmp/t #
However, I have no idea why the date is the 9th of May and not the current one.
I've ensured that the correct, current image is booted:
root@lepobookng /tmp/t # efibootmgr
BootCurrent: 0001
Timeout: 0 seconds
BootOrder: 0001,001A,001B,001C,001D,001E,001F,0020,0021,0022,0023,0024
Boot0001* Arch Linux (new disk) HD(1,GPT,8450fbdb-532a-4a6c-bd81-4bc03c18f71d,0x800,0x800800)/EFI\Linux\archlinux-linux.efi
Boot0010 Setup FvFile(721c8b66-426c-4e86-8e99-3457c46ab0b9)
Boot0011 Boot Menu FvFile(126a762d-5758-4fca-8531-201a7f57f850)
Boot0012 Diagnostic Splash Screen FvFile(a7d8d9a6-6ab0-4aeb-ad9d-163e59a7a380)
Boot0013 Lenovo Diagnostics FvFile(3f7e615b-0d45-4f80-88dc-26b234958560)
...
root@lepobookng /tmp/t # blkid
/dev/nvme0n1p3: PARTUUID="07e27fe0-1050-435d-9107-5ab644b645fc"
/dev/nvme0n1p1: UUID="9E19-6832" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="9b096c68-7885-498c-a5e7-1f364fad8c4c"
/dev/nvme0n1p2: LABEL="root_pool_2m4iog" UUID="9068665034321893512" UUID_SUB="7445512687706962544" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="41c47b83-6df2-40d0-983f-8fc35a1eca64"
/dev/nvme1n1p2: LABEL="root_pool_2m4iog" UUID="9068665034321893512" UUID_SUB="9685926068953788758" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="9932d973-bfee-4bce-b423-b57f6e60e3d4"
/dev/nvme1n1p3: PARTUUID="11d82a81-50b3-41e7-ab6c-d96706038fe2"
/dev/nvme1n1p1: UUID="91C7-E83D" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="8450fbdb-532a-4a6c-bd81-4bc03c18f71d"
I've even reformatted partition 1 on the old disk (nvme1n1p1) to be sure that no image/kernel from the old partition could be loaded.
Zpool is ok as well:
root@lepobookng /tmp/t # zpool status
  pool: root_pool_2m4iog
 state: ONLINE
  scan: resilvered 676G in 00:17:36 with 0 errors on Fri Mar 29 13:06:28 2024
config:
    NAME                STATE     READ WRITE CKSUM
    root_pool_2m4iog    ONLINE       0     0     0
      mirror-0          ONLINE       0     0     0
        nvme1n1p2       ONLINE       0     0     0
        nvme0n1p2       ONLINE       0     0     0
errors: No known data errors
It looks like zed runs very late and you might end up looking at a different FS than the starting system
I can't really explain https://wiki.archlinux.org/title/ZFS#Automatic_Start or what should™ be done here, but if you end up ignoring the fstab from the pre-zed environment you'd have to update the /boot mountpoint after loading the pool that changes the root FS *somehow*
I assume a similar issue will affect /dev/tpm/* because zed will just create/mount a new devfs?
Do you have the zfs hook in your mkinitcpio.conf?
Hi all,
Everything was configured correctly except for one thing: unfortunately, I had put /etc on a separate dataset on my new disk. That dataset was evidently not mounted during boot, so the system started with the wrong configuration.
Thanks, seth, for your ideas, which helped me get things sorted and pointed me in the right direction.
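For anyone landing here with similar symptoms, a rough sketch of how to spot this layout: the filter below reads the tab-separated output of zfs list -H -o name,mountpoint on stdin and prints every dataset whose mountpoint equals one of the given directories (only /etc by default, since that is the case from this thread). The function name flag_mountpoints and the dataset names in the comment are made up for illustration.

```shell
#!/bin/sh
# Hypothetical filter: read "name<TAB>mountpoint" lines and report datasets
# whose mountpoint is one of the directories passed as arguments.
flag_mountpoints() {
    local tab name mountpoint dir
    [ "$#" -gt 0 ] || set -- /etc      # default: only check /etc
    tab=$(printf '\t')
    while IFS="$tab" read -r name mountpoint; do
        for dir in "$@"; do
            if [ "$mountpoint" = "$dir" ]; then
                echo "dataset $name has mountpoint $mountpoint"
            fi
        done
    done
}

# Example on a ZFS system (dataset names will differ):
#   zfs list -H -o name,mountpoint | flag_mountpoints /etc
```

On a running system, findmnt /etc gives a quick second opinion: it prints a line only if something is mounted at /etc.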