
#1 2021-01-15 16:01:47

Wild Penguin
Member
Registered: 2015-03-19
Posts: 197

lvmcache: changing cache mode seems to not finish cleanly

Hi!

I currently have an 8 TB slow mechanical HDD cached by a 500 GB NVMe SSD.

Changing the cache mode of lvm-cache might or might not finish cleanly - it's like throwing dice. The user may be left with a non-mountable root and needs to detach and re-create the cache. I've had an error similar to today's once before - but, being a bit new to LVM and thinking I had made an error, I restarted from the beginning (learning from previous experiences, I have automatic backups in place, so this was trivial / quite easy).

Example of a problem: Yesterday, I noticed that the cache had recently filled up completely (this is fine and intended, in principle). The cache had been in writethrough mode up to this point (for some weeks). Thinking I did not want any subsequent writes to go straight into the cache (possibly demoting data I actually read more often - I'd prefer its algorithms to only promote data which is frequently read), I changed the mode to passthrough:

lvchange --cachemode passthrough root/root

Then I noticed that all reads were going straight to the origin. I spotted my false assumption quite quickly by re-reading the documentation: I thought passthrough was equivalent to writearound, i.e. data written to the LV goes to the origin only, never the cache, and only read data may be promoted to the cache (depending on the policy). But this is false - lvmcache seems to be missing a writearound cache mode altogether. With passthrough, the cache is skipped entirely for all write and read operations (as per the documentation).
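For reference, dm-cache (which lvmcache sits on top of) only knows three operating modes: writethrough, writeback and passthrough - there is no writearound. A trivial guard like the following (my own hypothetical helper, not an lvm command) would have caught both my wrong assumption and the typos before calling lvchange:

```shell
# Hypothetical helper: dm-cache/lvmcache supports only these three modes;
# anything else (e.g. "writearound", or a typo like "passtrough") is invalid.
valid_cachemode() {
    case "$1" in
        writethrough|writeback|passthrough) return 0 ;;
        *) return 1 ;;
    esac
}

valid_cachemode writearound || echo "writearound: not a supported cache mode"
valid_cachemode passtrough  || echo "passtrough: not a supported cache mode"
```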

Noticing my mistake, I tried to change back to writethrough (the cache was clean at this point, as reported by lvs/lvdisplay). Here is where the problems begin:

lvchange --cachemode writethrough root/root

Result: The command never finished (in nearly an hour - frustrated, I went to the grocery store in the meantime). The situation was recoverable with SysRq (reboot). After rebooting, I re-tried the command, with the same result.

Thinking the cache was in some weird state, I tried to change the cache mode back to passthrough (even though, at that time, lvdisplay/lvs still showed the cache mode as "passthrough"). Result: lvchange still hangs (recoverable with SysRq)! After yet another reboot and trying to set the mode to passthrough (despite it already being in passthrough!), lvchange now reported flushing a huge number of dirty blocks (although there should have been 0, as the state was clean and the transition to writethrough never finished - and even if it had, there should have been just a few) - with the number never decrementing. At a loss for what to do, I made my final attempt at changing to writethrough (which is the end result I want).
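As an aside for anyone hitting the same hang: while lvchange is stuck "flushing", the dirty-block count can be watched directly from the device-mapper cache target with `dmsetup status` (the dm name `root-root` is my guess for a root/root LV - check `dmsetup ls` first). Going by the kernel's dm-cache status format, the dirty count is the 14th field; a parsing sketch against a sample (illustrative, made-up) status line:

```shell
# Sample dm-cache status line (illustrative values; field order per the
# kernel dm-cache docs): start, length, target name, metadata block size,
# metadata used/total, cache block size, cache used/total, read hits,
# read misses, write hits, write misses, demotions, promotions, dirty, ...
status='0 15628053168 cache 8 1018/12288 1024 1751/953604 9799 368872 31726 127572 0 0 892 1 writethrough 2 migration_threshold 2048 cleaner 0 rw -'

# Field 14 is the number of dirty cache blocks; on a live system this
# would be: sudo dmsetup status root-root | awk '{print $14}'
dirty=$(echo "$status" | awk '{print $14}')
echo "dirty blocks: $dirty"
```

If that number really never decrements, that is strong evidence for an upstream bug report.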


After yet another reboot, the rootfs was not found and I found myself in a busybox shell. From there, I could see that LVM did not (for some reason I could not determine) want to activate root/root. I uncached the rootfs (from busybox):

lvconvert --uncache root/root

and was able to boot normally afterwards. I re-created the cache:

$ lvcreate --type cache --cachemode writethrough -l 100%PV -n root_cache root/root /dev/nvme0n1
$ lvchange --cachepolicy cleaner root/root

But now I'm left with no filled cache! In the end, everything was recoverable, except for the time I lost to this and the time needed to re-fill the cache. Still, this is certainly not optimal!
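For the next mode change I'll probably wrap lvchange in a small guard that refuses to proceed unless lvs reports zero dirty blocks first. A sketch only - `safe_cachemode_change` is my own hypothetical helper; the `cache_dirty_blocks` report field is listed in `lvs -o help`:

```shell
# Hypothetical wrapper: only change the cache mode when the cache reports
# zero dirty blocks, to avoid kicking off a flush mid-transition.
safe_cachemode_change() {
    lv="$1"; mode="$2"
    # Ask LVM for the raw dirty-block count and strip the padding.
    dirty=$(lvs --noheadings -o cache_dirty_blocks "$lv" | tr -d '[:space:]')
    if [ "${dirty:-0}" = "0" ]; then
        lvchange --cachemode "$mode" "$lv"
    else
        echo "refusing: $dirty dirty blocks still to be flushed" >&2
        return 1
    fi
}
```

Usage would be e.g. `safe_cachemode_change root/root writethrough` (as root). Of course this doesn't fix the hang itself, it only avoids starting a mode change at a bad moment.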

Current setup is this:

$ sudo pvdisplay -m /dev/nvme0n1 && sudo lvdisplay -am root/root &&  sudo vgdisplay -v root 
  --- Physical volume ---
  PV Name               /dev/nvme0n1
  VG Name               root
  PV Size               465,76 GiB / not usable 4,02 MiB
  Allocatable           yes (but full)
  PE Size               4,00 MiB
  Total PE              119234
  Free PE               0
  Allocated PE          119234
  PV UUID               Kmy3bA-Z8st-jwrd-eoxL-qFxZ-N5L4-SlB9KZ
   
  --- Physical Segments ---
  Physical extent 0 to 11:
    Logical volume      /dev/root/lvol0_pmspare
    Logical extents     0 to 11
  Physical extent 12 to 23:
    Logical volume      /dev/root/root_cache_cmeta
    Logical extents     0 to 11
  Physical extent 24 to 119233:
    Logical volume      /dev/root/root_cache_cdata
    Logical extents     0 to 119209
   
  --- Logical volume ---
  LV Path                /dev/root/root
  LV Name                root
  VG Name                root
  LV UUID                c2aKsQ-LayB-1HL0-ew1p-RzvO-eIfX-Cs6x24
  LV Write Access        read/write
  LV Creation host, time ArkkiVille, 2020-12-15 14:53:42 +0200
  LV Cache pool name     root_cache
  LV Cache origin name   root_corig
  LV Status              available
  # open                 1
  LV Size                <7,28 TiB
  Cache used blocks      0,73%
  Cache metadata blocks  23,36%
  Cache dirty blocks     0,00%
  Cache read hits/misses 9799 / 368872
  Cache wrt hits/misses  31726 / 127572
  Cache demotions        0
  Cache promotions       0
  Current LE             1907465
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:3
   
  --- Segments ---
  Logical extents 0 to 1907464:
    Type                cache
    Chunk size          512,00 KiB
    Metadata format     2
    Mode                writethrough
    Policy              cleaner
   
   
  --- Volume group ---
  VG Name               root
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  107
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               7,73 TiB
  PE Size               4,00 MiB
  Total PE              2026699
  Alloc PE / Size       2026699 / 7,73 TiB
  Free  PE / Size       0 / 0   
  VG UUID               OOoKiy-WJnu-pI0y-nBSW-n4kQ-sDaN-nrdQtw
   
  --- Logical volume ---
  LV Path                /dev/root/root
  LV Name                root
  VG Name                root
  LV UUID                c2aKsQ-LayB-1HL0-ew1p-RzvO-eIfX-Cs6x24
  LV Write Access        read/write
  LV Creation host, time ArkkiVille, 2020-12-15 14:53:42 +0200
  LV Cache pool name     root_cache
  LV Cache origin name   root_corig
  LV Status              available
  # open                 1
  LV Size                <7,28 TiB
  Cache used blocks      0,73%
  Cache metadata blocks  23,36%
  Cache dirty blocks     0,00%
  Cache read hits/misses 9800 / 368872
  Cache wrt hits/misses  31726 / 127572
  Cache demotions        0
  Cache promotions       0
  Current LE             1907465
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:3
   
  --- Physical volumes ---
  PV Name               /dev/sdd3     
  PV UUID               BSbjN3-4p5r-9I32-LK1r-EVvh-yxb8-rouvQJ
  PV Status             allocatable
  Total PE / Free PE    1907465 / 0
   
  PV Name               /dev/nvme0n1     
  PV UUID               Kmy3bA-Z8st-jwrd-eoxL-qFxZ-N5L4-SlB9KZ
  PV Status             allocatable
  Total PE / Free PE    119234 / 0

This is identical to the starting point, except that I have now changed to the "cleaner" policy and the cache is still mostly empty (before the "crash", it was 100% full, in passthrough, and with the "smq" policy).

Now, because of these experiences I believe there are certainly some stability issues with lvm-cache, and maybe I should file a bug report (in Arch, or upstream?). I cannot see anything pointing towards H/W errors in my journal logs (which I can paste to pastebin in their entirety, if needed). It seems that lvm-cache works just fine as long as it does - but touch the cache settings, and things might just break.

I'm a bit at a loss as to what information might be relevant for an upstream bug report - or how to confirm whether this is indeed a bug. Or have I done something stupid above / made a user error? Any other ideas?

What I can provide are the exact commands I've run (sudo logs them to my syslog), save for the commands from the busybox sessions (for those we have to rely on my memory). I could also post the contents of the /etc/lvm/archive/root_000* files.

I would appreciate any thoughts!

The commands I've used (including mistyped ones - speaking of which, even the misspelled --cache-mode option causes lvchange to hang):

$ LANG=C journalctl --since 'yesterday' | grep -i Arkkiville\ sudo | grep -i 'lvc'
Jan 14 14:43:37 ArkkiVille sudo[880884]:    ville : TTY=pts/9 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cachemode passtrough root/root
Jan 14 14:43:44 ArkkiVille sudo[880898]:    ville : TTY=pts/9 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cachemode passthrough root/root
Jan 15 15:08:29 ArkkiVille sudo[7703]:    ville : TTY=pts/11 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cachemode writethrough
Jan 15 15:08:31 ArkkiVille sudo[7711]:    ville : TTY=pts/11 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cachemode writethrough root/root
Jan 15 15:55:56 ArkkiVille sudo[3149]:    ville : TTY=pts/10 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cache-mode writethrough root/root
Jan 15 15:59:04 ArkkiVille sudo[2766]:    ville : TTY=pts/12 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cache-mode passtrough root/root
Jan 15 15:59:08 ArkkiVille sudo[2768]:    ville : TTY=pts/12 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cache-mode passthrough root/root
Jan 15 16:05:05 ArkkiVille sudo[1090]:    ville : TTY=tty1 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cache-mode writethrough
Jan 15 16:05:07 ArkkiVille sudo[1092]:    ville : TTY=tty1 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cache-mode writethrough root/root
Jan 15 16:25:01 ArkkiVille sudo[1048]:    ville : TTY=tty1 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --help
Jan 15 16:32:10 ArkkiVille sudo[1387]:    ville : TTY=tty1 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvcreate --type cache --cachemode writethrough -l 100%PV -n root_cache root/root /dev/nvme0n1
Jan 15 16:43:33 ArkkiVille sudo[4676]:    ville : TTY=pts/7 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --help
Jan 15 16:44:24 ArkkiVille sudo[4830]:    ville : TTY=pts/7 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cachepolicy cleaner root/rdoot
Jan 15 16:44:26 ArkkiVille sudo[4837]:    ville : TTY=pts/7 ; PWD=/home/ville ; USER=root ; COMMAND=/usr/bin/lvchange --cachepolicy cleaner root/root

Cheers!

EDIT: I'm fairly certain I can reproduce this, since I've already managed to trigger this error twice!

p.s. A bit of background: I'm mostly using my computer for regular desktop usage and gaming. I would like the system to be snappy when starting applications and games - as, I presume, most people would. My 500 GB (465 GiB) SSD is not big enough to hold all the S/W I want to load fast (even a single game takes 100 GiB+); even a 1 TiB SSD could be filled quite easily. The solution is still (IMHO) to use an SSD as a cache, with the aim of having it as a hot-spot cache (primarily a read cache!). Previously I used bcache, but it stopped caching reads altogether (still caching writes successfully, but only writes). I tried to find a solution via this forum and the mailing list, but didn't find any. Since then I've moved to having the rootfs and parts of my home dir on the SSD, but because of space constraints, I recently moved the rootfs onto an lvm-cached LV. lvmcache works for reads... until it doesn't, as I've described here!

Last edited by Wild Penguin (2021-01-15 16:14:58)
