Hi!
I currently have a slow 8TB mechanical HDD cached by a 500GB NVMe SSD.
Changing the cache mode of lvm-cache may or may not finish cleanly; it is like rolling dice. The user may be left with an unmountable root and have to detach and re-create the cache. I've had an error similar to today's once before, but back then, being a bit new to LVM, I assumed I had made a mistake and started over from the beginning (having learned from previous experiences, I have automatic backups in place, so this was quite easy).
Here is an example of the problem. Yesterday, I noticed that the cache had recently filled up completely (which is fine and intended, in principle). The cache had been in writethrough mode up to this point (for some weeks). Thinking that I did not want subsequent writes to go straight to the cache (possibly demoting data I actually read more often), and preferring to let its algorithms promote only frequently read data, I changed the mode to passthrough:
lvchange --cachemode passthrough root/root
Then I noticed that all reads were going straight to the origin. Re-reading the documentation, I quickly realized my false assumption: I thought passthrough was equivalent to writearound, i.e. data written to the LV goes only to the origin, never to the cache, and only data that is read may be promoted to the cache (depending on the policy). But this is false; it seems lvmcache lacks a writearound mode altogether. With passthrough, the cache is bypassed for all reads and writes (as per the documentation).
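For reference, the three modes can be summarized roughly as below. This is a small illustrative shell helper (the function name is made up), and the one-line summaries are simplified paraphrases of lvmcache(7) and the kernel's dm-cache documentation, not exact specifications:

```shell
#!/bin/sh
# Hypothetical helper: simplified one-line summaries of the lvmcache modes,
# paraphrased from lvmcache(7) and the kernel dm-cache docs.
# Note that there is no "writearound" mode among them.
describe_cachemode() {
    case "$1" in
        writethrough)
            echo "every write reaches the origin and cached copies are updated; read misses may be promoted" ;;
        writeback)
            echo "writes may stay on the cache only (dirty) until written back; read misses may be promoted" ;;
        passthrough)
            echo "cache bypassed: reads and writes go to origin; write hits invalidate cached blocks" ;;
        *)
            echo "unknown cache mode: $1" >&2
            return 1 ;;
    esac
}

describe_cachemode passthrough
```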
Having noticed my mistake, I tried to change back to writethrough (the cache was clean at this point, as reported by lvs/lvdisplay). This is where the problems began:
lvchange --cachemode writethrough root/root
Result: the command never finished (I waited nearly an hour; frustrated, I went to the grocery store in the meantime). The situation was recoverable with a SysRq reboot. After the reboot, I retried the command, with the same result.
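Two things would likely make a hang like this easier to diagnose: confirming right before the switch that the cache really has zero dirty blocks, and running `lvchange` with LVM's debug verbosity so that a log exists if it hangs again. A minimal sketch, assuming it is run as root; the function name and default log path are hypothetical, not anything LVM provides:

```shell
#!/bin/sh
# Hypothetical wrapper (not part of LVM): refuse to switch cache modes unless
# lvs reports zero dirty blocks, and keep a -vvvv debug log of the attempt.
change_cachemode() {
    lv=$1                                  # e.g. root/root
    mode=$2                                # writethrough | writeback | passthrough
    log=${3:-/root/lvchange-debug.log}     # assumed location for the debug log

    dirty=$(lvs --noheadings -o cache_dirty_blocks "$lv" | tr -d ' ')
    if [ "$dirty" != "0" ]; then
        echo "refusing: $lv reports ${dirty:-unknown} dirty blocks" >&2
        return 1
    fi

    # -vvvv is LVM's most verbose (debug) level; it is written to stderr,
    # which is redirected into the log file here.
    lvchange -vvvv --cachemode "$mode" "$lv" 2> "$log"
}
```

Usage would be `change_cachemode root/root writethrough`. If it still hangs, running `echo w > /proc/sysrq-trigger` as root (with SysRq enabled) before the SysRq reboot dumps the blocked tasks and their kernel stacks to the kernel log, which is probably the most useful thing to attach to a bug report.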
Thinking the cache was in some weird state, I tried to change the cache mode back to passthrough (even though lvdisplay/lvs still showed the mode as "passthrough" at that time). Result: lvchange still hung (recoverable with SysRq)! After yet another reboot, I again tried to set the mode to passthrough (despite it already being passthrough!), and this time lvchange reported flushing a huge number of dirty blocks, with the number never decreasing. There should have been 0 dirty blocks, since the state was clean and the transition to writethrough never finished; and even if it had, there should have been only a few. At a loss as to what to do, I made a final attempt at changing to writethrough (which is the end result I want).
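While `lvchange` is stuck it may be holding the VG lock, so `lvs` itself can block; reading the dirty-block counter directly from device-mapper may be more reliable. A sketch with hypothetical helper names, assuming the LV's device-mapper name is `root-root` (VG "root", LV "root") and that the `dmsetup status` line for the cache target follows the field order in the kernel's dm-cache documentation, which puts the dirty-block count in field 14 of the line:

```shell
#!/bin/sh
# Hypothetical helpers: read a dm-cache device's dirty-block count from
# `dmsetup status`, which does not need LVM's VG lock.
# Assumed status layout (kernel dm-cache docs): start length cache
# <md block size> <used/total md> <cache block size> <used/total cache>
# <read hits> <read misses> <write hits> <write misses> <demotions>
# <promotions> <dirty> ...
dirty_blocks() {
    awk '$3 == "cache" { print $14 }'
}

# Sample the counter a few times and warn when it is not going down.
watch_dirty() {
    dev=${1:-root-root}   # device-mapper name of root/root
    samples=${2:-6}
    interval=${3:-10}
    prev=""
    i=0
    while [ "$i" -lt "$samples" ]; do
        cur=$(dmsetup status "$dev" | dirty_blocks)
        if [ -z "$cur" ]; then
            echo "no dm-cache status for $dev" >&2
            return 1
        fi
        echo "$(date +%T) dirty=$cur"
        if [ -n "$prev" ] && [ "$cur" -ge "$prev" ]; then
            echo "warning: dirty count not decreasing ($prev -> $cur)"
        fi
        prev=$cur
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

A log of `watch_dirty root-root 6 10` taken during a hang would show whether the count is truly stuck, rather than relying on lvchange's own progress messages.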
After yet another reboot, the rootfs was not found and I was dropped into a BusyBox shell. From there, I could see that LVM refused (for some reason I could not determine) to activate root/root. I uncached the rootfs (from the BusyBox shell):
lvconvert --uncache root/root
and was able to boot normally afterwards. I re-created the cache:
$ lvcreate --type cache --cachemode writethrough -l 100%PV -n root_cache root/root /dev/nvme0n1
$ lvchange --cachepolicy cleaner root/root
But now I'm left with an empty cache! In the end, everything was recoverable except the time I lost to this and the time needed to refill the cache. Still, this is certainly not optimal!
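For anyone who lands in the same initramfs shell: there the LVM tools may only be reachable through the multi-call `lvm` binary (e.g. `lvm lvconvert ...`), depending on how the initramfs was built. A rough sketch of the uncache step described above, as a hypothetical function; the extra `vgscan`/`lvs`/`vgchange` calls are assumptions for inspection and activation, and setting `LVM=echo` turns it into a dry run:

```shell
#!/bin/sh
# Hypothetical rescue helper for an initramfs/BusyBox shell.
# LVM defaults to the multi-call "lvm" binary; LVM=echo gives a dry run.
LVM=${LVM:-lvm}

rescue_uncache() {
    vglv=$1                    # e.g. root/root
    vg=${vglv%%/*}             # volume group part, e.g. root
    $LVM vgscan
    $LVM lvs -a "$vg"          # show the LV and its hidden cache sub-LVs
    $LVM lvconvert --uncache "$vglv"
    $LVM vgchange -ay "$vg"    # activate what is left (the plain origin LV)
}
```

Once the system boots normally, the cache can be re-attached with the `lvcreate --type cache` command shown earlier.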
Current setup is this:
$ sudo pvdisplay -m /dev/nvme0n1 && sudo lvdisplay -am root/root && sudo vgdisplay -v root
--- Physical volume ---
PV Name /dev/nvme0n1
VG Name root
PV Size 465,76 GiB / not usable 4,02 MiB
Allocatable yes (but full)
PE Size 4,00 MiB
Total PE 119234
Free PE 0
Allocated PE 119234
PV UUID Kmy3bA-Z8st-jwrd-eoxL-qFxZ-N5L4-SlB9KZ
--- Physical Segments ---
Physical extent 0 to 11:
Logical volume /dev/root/lvol0_pmspare
Logical extents 0 to 11
Physical extent 12 to 23:
Logical volume /dev/root/root_cache_cmeta
Logical extents 0 to 11
Physical extent 24 to 119233:
Logical volume /dev/root/root_cache_cdata
Logical extents 0 to 119209
--- Logical volume ---
LV Path /dev/root/root
LV Name root
VG Name root
LV UUID c2aKsQ-LayB-1HL0-ew1p-RzvO-eIfX-Cs6x24
LV Write Access read/write
LV Creation host, time ArkkiVille, 2020-12-15 14:53:42 +0200
LV Cache pool name root_cache
LV Cache origin name root_corig
LV Status available
# open 1
LV Size <7,28 TiB
Cache used blocks 0,73%
Cache metadata blocks 23,36%
Cache dirty blocks 0,00%
Cache read hits/misses 9799 / 368872
Cache wrt hits/misses 31726 / 127572
Cache demotions 0
Cache promotions 0
Current LE 1907465
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:3
--- Segments ---
Logical extents 0 to 1907464:
Type cache
Chunk size 512,00 KiB
Metadata format 2
Mode writethrough
Policy cleaner
--- Volume group ---
VG Name root
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 107
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 1
Max PV 0
Cur PV 2
Act PV 2
VG Size 7,73 TiB
PE Size 4,00 MiB
Total PE 2026699
Alloc PE / Size 2026699 / 7,73 TiB
Free PE / Size 0 / 0
VG UUID OOoKiy-WJnu-pI0y-nBSW-n4kQ-sDaN-nrdQtw
--- Logical volume ---
LV Path /dev/root/root
LV Name root
VG Name root
LV UUID c2aKsQ-LayB-1HL0-ew1p-RzvO-eIfX-Cs6x24
LV Write Access read/write
LV Creation host, time ArkkiVille, 2020-12-15 14:53:42 +0200
LV Cache pool name root_cache
LV Cache origin name root_corig
LV Status available
# open 1
LV Size <7,28 TiB
Cache used blocks 0,73%
Cache metadata blocks 23,36%
Cache dirty blocks 0,00%
Cache read hits/misses 9800 / 368872
Cache wrt hits/misses 31726 / 127572
Cache demotions 0
Cache promotions 0
Current LE 1907465
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:3
--- Physical volumes ---
PV Name /dev/sdd3
PV UUID BSbjN3-4p5r-9I32-LK1r-EVvh-yxb8-rouvQJ
PV Status allocatable
Total PE / Free PE 1907465 / 0
PV Name /dev/nvme0n1
PV UUID Kmy3bA-Z8st-jwrd-eoxL-qFxZ-N5L4-SlB9KZ
PV Status allocatable
Total PE / Free PE 119234 / 0
This is identical to the starting point, except that I have now switched to the "cleaner" policy and the cache is still mostly empty (before the "crash", it was 100% full, in passthrough mode, with the "smq" policy).
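The counters in the output above look consistent with the policy change: `Cache promotions` and `Cache demotions` are both 0, and as far as I can tell from lvmcache(7) and the kernel's dm-cache policy documentation, `cleaner` only writes dirty blocks back to the origin and does not promote new ones. If that is right, the cache will not refill until the policy is set back to `smq`. A small sketch for checking this, with made-up helper names and assuming the `lvs` report fields `cache_policy` and `cache_promotions`:

```shell
#!/bin/sh
# Hypothetical helpers (names made up for illustration).

# Hit percentage from a "hits / misses" pair as printed by lvdisplay,
# e.g. "Cache read hits/misses 9799 / 368872".
hit_ratio() {
    awk -v h="$1" -v m="$2" 'BEGIN {
        if (h + m == 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * h / (h + m)
    }'
}

# Warn if the cached LV uses the "cleaner" policy, which (as far as I
# understand) only writes dirty blocks back and never promotes.
# Assumes lvs supports the cache_policy and cache_promotions fields.
check_cache_policy() {
    lv=${1:-root/root}
    set -- $(lvs --noheadings -o cache_policy,cache_promotions "$lv")
    echo "policy=$1 promotions=$2"
    if [ "$1" = "cleaner" ]; then
        echo "cleaner never promotes; try: lvchange --cachepolicy smq $lv" >&2
        return 1
    fi
}
```

With the numbers above, `hit_ratio 9799 368872` prints 2.59 and `hit_ratio 31726 127572` prints 19.92, so only about 2.6 % of reads currently hit the cache.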
Based on these experiences, I believe there are some stability issues with lvm-cache, and maybe I should file a bug report (with Arch, or upstream?). I cannot see anything pointing to hardware errors in my journal logs (which I can paste to a pastebin in their entirety if needed). It seems that lvm-cache works just fine as long as you leave it alone, but touch the cache settings and things may just break.
I'm a bit at a loss as to what information would be relevant for an upstream bug report in this situation, or how to confirm whether this is indeed a bug. Or have I done something stupid above, i.e. made a user error? Any other ideas?
What I can provide are the exact commands I ran (they are in my syslog, since sudo logs them), except for the commands from the BusyBox sessions (for those we have to rely on my memory). I could also post the contents of the /etc/lvm/archive/root_000* files.
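As a starting point for the bug report, a sketch of a collector for state that seems relevant: kernel and LVM versions, the LV layout with cache settings, the device-mapper tables and status, and the metadata archives mentioned above. The function name and output directory are hypothetical; run as root:

```shell
#!/bin/sh
# Hypothetical collector (not an LVM tool): snapshot LVM/device-mapper state
# into a directory that can be attached to a bug report.
collect_lvm_report() {
    out=${1:-lvm-report}
    mkdir -p "$out" || return 1
    uname -a > "$out/uname.txt"
    lvm version > "$out/lvm-version.txt" 2>&1
    lvs -a -o +devices,cache_mode,cache_policy,cache_settings \
        > "$out/lvs.txt" 2>&1
    dmsetup table > "$out/dmsetup-table.txt" 2>&1
    dmsetup status > "$out/dmsetup-status.txt" 2>&1
    # Metadata backups of VG "root"; missing files are not an error here.
    cp /etc/lvm/archive/root_* "$out/" 2>/dev/null || true
    ls "$out"
}
```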
I would appreciate any thoughts!
The commands I've used (mistyped ones included; speaking of which, lvchange hangs even with the wrong --cache-mode spelling):
$ LANG=C journalctl --since 'yesterday' | grep -i Arkkiville\ sudo | grep -i 'lvc'
Jan 14 14:43:37 ArkkiVille sudo[880884]: ville : TTY=pts/9 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cachemode passtrough root/root
Jan 14 14:43:44 ArkkiVille sudo[880898]: ville : TTY=pts/9 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cachemode passthrough root/root
Jan 15 15:08:29 ArkkiVille sudo[7703]: ville : TTY=pts/11 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cachemode writethrough
Jan 15 15:08:31 ArkkiVille sudo[7711]: ville : TTY=pts/11 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cachemode writethrough root/root
Jan 15 15:55:56 ArkkiVille sudo[3149]: ville : TTY=pts/10 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cache-mode writethrough root/root
Jan 15 15:59:04 ArkkiVille sudo[2766]: ville : TTY=pts/12 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cache-mode passtrough root/root
Jan 15 15:59:08 ArkkiVille sudo[2768]: ville : TTY=pts/12 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cache-mode passthrough root/root
Jan 15 16:05:05 ArkkiVille sudo[1090]: ville : TTY=tty1 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cache-mode writethrough
Jan 15 16:05:07 ArkkiVille sudo[1092]: ville : TTY=tty1 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cache-mode writethrough root/root
Jan 15 16:25:01 ArkkiVille sudo[1048]: ville : TTY=tty1 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --help
Jan 15 16:32:10 ArkkiVille sudo[1387]: ville : TTY=tty1 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvcreate --type cache --cachemode writethrough -l 100%PV -n root_cache root/root /dev/nvme0n1
Jan 15 16:43:33 ArkkiVille sudo[4676]: ville : TTY=pts/7 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --help
Jan 15 16:44:24 ArkkiVille sudo[4830]: ville : TTY=pts/7 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cachepolicy cleaner root/rdoot
Jan 15 16:44:26 ArkkiVille sudo[4837]: ville : TTY=pts/7 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cachepolicy cleaner root/root
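The list above comes from grepping the journal. For a report, a slightly tidier variant could keep only the timestamp and the command sudo ran, limited to the LVM tools. A sketch with a hypothetical function name that reads journal text on stdin:

```shell
#!/bin/sh
# Hypothetical filter: from sudo journal lines, print "<timestamp> <command>"
# for LVM commands only (lvchange, lvcreate, lvconvert).
lvm_sudo_commands() {
    sed -n 's/^\([A-Z][a-z][a-z] [ 0-9][0-9] [0-9:]\{8\}\) .*COMMAND=\(.*\)$/\1 \2/p' |
        grep -E ' /usr/bin/lv(change|create|convert)( |$)'
}
```

It could be fed with something like `LANG=C journalctl --since yesterday -t sudo | lvm_sudo_commands`, where `-t` selects entries by syslog identifier.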
Cheers!
EDIT: I'm fairly certain I can reproduce this, since I've already managed to trigger this error twice!
P.S. A bit of background: I mostly use my computer for regular desktop work and gaming. I would like the system to be snappy when starting applications and games, as I presume most people would. My 500GB (465GiB) SSD is not big enough to hold all the software I want to load fast (a single game can take 100GiB+), and even a 1TiB SSD could be filled quite easily. So the solution is still (IMHO) to use the SSD as a cache, with the aim of having a hot-spot cache (primarily a read cache!). Previously, I used bcache, but it stopped caching reads altogether (it still cached writes successfully, but only writes). I tried to find a solution via this forum and the mailing list, but didn't find one. After that I moved the rootFS and parts of my home directory onto the SSD, but because of space constraints I recently switched to having the rootFS on an lvm-cached LV. lvmcache works for reads... until it doesn't, as I've described here!
Last edited by Wild Penguin (2021-01-15 16:14:58)