#1 2019-11-06 16:00:15

Wild Penguin
Registered: 2015-03-19
Posts: 72

bcache and very poor (nonexistent) read caching performance


I seem to be having problems getting bcache working properly at the moment - seems like read caching performance is very bad.

The problem is that no matter what I try, read performance is the same as if the cache were not there. Things I've tried include:

  • align with -w 4k and --bucket 2M (partially guesswork as it is difficult to say what the EBS size is on the SSD),

  • with and without discard, and

  • decrease sequential_cutoff to 0

  • thresholds to 0 (in /sys/fs/bcache/...)

  • change write cache mode (writeback -> writethrough and IIRC also writearound, but well, it should work regardless)

  • recently deleted all partitions on the SSD and made the whole 500GB device a cache device

The last thing was done in the hope that this is an alignment issue. This time, I didn't touch the default bucket and block sizes (there should be a decent performance increase even with non-optimal alignment; alignment is mainly meant to reduce wear on the SSD in any case, unless I'm mistaken). However, performance is still atrociously bad: for reads, the SSD makes no difference whatsoever compared to it being absent (I don't care much about writes). Summary of the setup:

  • kernel-5.3.8 (and various older ones, but recent-ish kernels)

  • caching device: Samsung 960EVO, whole device as the single cache device (nvme0n1)

  • The NVME SSD has been on the Motherboard (Asus Maximus VII Gene) and on a separate PCIe card in a 3.0 PCIe slot (no effect which one is used)

  • backing device: many 5400RPM HDDs and partitions (the main one is shown below as /dev/sdc2). Formatted as ext4.

  • Current set (backing and cache) was created with default options, except that discard was enabled (make-bcache --discard -C ; make-bcache -B).

bcache-super-show listings:

$ sudo bcache-super-show /dev/sdc2
sb.magic                ok
sb.first_sector         8 [match]
sb.csum                 FC6434BAB97C1B37 [match]
sb.version              1 [backing device]

dev.label               (empty)
dev.uuid                063814f0-a14b-4db5-9cd5-b98ef658993f
dev.sectors_per_block   1
dev.sectors_per_bucket  1024
dev.data.first_sector   16
dev.data.cache_mode     1 [writeback]
dev.data.cache_state    2 [dirty]

cset.uuid               d6420bd9-a45f-4688-a9c0-217c88072449


$ sudo bcache-super-show /dev/nvme0n1 
sb.magic                ok
sb.first_sector         8 [match]
sb.csum                 70AE2DCA768AC61A [match]
sb.version              3 [cache device]

dev.label               (empty)
dev.uuid                7e099d6e-3426-49d8-bf55-1e79eacd59a4
dev.sectors_per_block   1
dev.sectors_per_bucket  1024
dev.cache.first_sector  1024
dev.cache.cache_sectors 976772096
dev.cache.total_sectors 976773120
dev.cache.ordered       yes
dev.cache.discard       yes
dev.cache.pos           0
dev.cache.replacement   0 [lru]

cset.uuid               d6420bd9-a45f-4688-a9c0-217c88072449

Curiously, there still seems to be a decent number of cache hits. Maybe the cache device is reading too slowly? Or some bug prevents the cached data from being used (and the HDD is read instead)? I can see from an LED (on the NVMe adapter card) that the SSD is trying to read (or write) data, but performance is just bad.

In stats_total:

$ grep -H . *
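
To see whether reads are actually served from the cache or bypass it, the counters in stats_total can be reduced to a single line. This is only a sketch: the helper name bcache_hit_summary and the default bcache0 path are assumptions; the counter file names are the ones bcache exposes in its stats directories.

```shell
# Sketch: summarize bcache read-cache counters from one stats directory.
# Assumption: the directory contains cache_hits, cache_misses,
# cache_bypass_hits and cache_bypass_misses (as in .../bcache/stats_total).
# The bcache0 default below is an assumption about the device name.
bcache_hit_summary() {
    dir=${1:-/sys/block/bcache0/bcache/stats_total}
    hits=$(cat "$dir/cache_hits")
    misses=$(cat "$dir/cache_misses")
    bypass_hits=$(cat "$dir/cache_bypass_hits")
    bypass_misses=$(cat "$dir/cache_bypass_misses")
    total=$((hits + misses))
    # Integer percentage of cached lookups that were hits.
    if [ "$total" -gt 0 ]; then
        ratio=$((100 * hits / total))
    else
        ratio=0
    fi
    echo "hits=$hits misses=$misses hit_ratio=${ratio}% bypass_hits=$bypass_hits bypass_misses=$bypass_misses"
}
```

Calling bcache_hit_summary with no argument reads the assumed default directory. Large bypass counters relative to hits and misses would suggest that most reads are being routed around the cache rather than through it.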

Rough speed test (to see the interfaces are working as expected):

$ sudo hdparm -Tt --direct /dev/nvme0n1

 Timing O_DIRECT cached reads:   4988 MB in  2.00 seconds = 2495.19 MB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing O_DIRECT disk reads: 5052 MB in  3.00 seconds = 1683.84 MB/sec

And for comparison:

$ sudo hdparm -Tt --direct /dev/sdc

 Timing O_DIRECT cached reads:   988 MB in  2.00 seconds = 493.55 MB/sec
 Timing O_DIRECT disk reads: 552 MB in  3.00 seconds = 183.97 MB/sec

Any ideas are welcome!


p.s. Some more (not so relevant) background: This is a desktop computer (mainly a toy, but also for occasional serious stuff). I used to have a SATA SSD (Samsung EVO 840 IIRC). It was replaced by an NVMe SSD, a Samsung 960 EVO (500GB). I know roughly what the performance should be with SSD read caching working: with the previous SATA SSD, boot from power off (after the Linux kernel has been loaded) took ~15 seconds until the desktop environment had settled, versus 60+ seconds with a bare mechanical 5400RPM HDD (with KDE Plasma and some applications autostarting, including Firefox and tvheadend in the background). I still have a few real-life benchmarks from the days the caching used to work (things such as loading StarCraft II, starting LibreOffice, starting Blender etc. with and without the data in bcache; I can provide these numbers in case someone is interested. Depending on the application / test, loading times were cut to between 1/4 and 1/6). The only change in this setup is that the cache moved from SATA -> NVMe, after which I've had these problems. IIRC I never got bcache to work properly for read caching with the NVMe SSD, although it should be better than the previous SATA SSD!

EDIT: Moved stuff from my setup -> "things I've tried". Also noted I'm using ext4.

Last edited by Wild Penguin (2019-11-06 18:44:57)


#2 2019-11-12 05:53:29

Registered: 2016-06-02
Posts: 2

Re: bcache and very poor (nonexistent) read caching performance


I have had similar problems and am still diagnosing, but, have you checked the latency of read/writes to your caching device?

I am experiencing quite high latencies to my NVMe device (50-200ms), and the congestion controls in bcache seem to be kicking in. I am experimenting with turning off congestion control in bcache, and this seems to be having positive results, for me at least.

Also, as a side note: I try to keep discard off, on both bcache and the filesystem. I then use fstrim (as a systemd timer) with '-m 1M' in a (maybe vain) attempt at keeping free space fragmentation and write amplification on the NVMe device down.


#3 2019-11-12 16:37:19

Wild Penguin
Registered: 2015-03-19
Posts: 72

Re: bcache and very poor (nonexistent) read caching performance

Hi digitus, thanks for your reply!

I'm not sure how to test for latency. Actually, I'm not aware of any good HDD benchmarks for Linux. I know of fio and hdparm -T and -t, but I'm not sure how to use fio (maybe there is a "fio for Dummies" guide somewhere? I'm trying to study it right now). The hdparm -t test is too simple to be useful for most situations (I guess it is only useful as a rough sequential read speed test, to check whether the link/HDD is working as it should at all).
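
For reference, a latency-oriented fio run could look roughly like the sketch below. It assumes fio is installed with the libaio engine available; the helper name fio_randread_cmd, the job name and the 30 second default are made up for illustration. The helper only prints the command so it can be inspected before running it against a real device.

```shell
# Sketch: print (not run) a fio command for a 4 KiB random-read latency test.
# --readonly makes fio refuse to issue writes, --direct=1 bypasses the page
# cache, and --iodepth=1 keeps a single request in flight so the reported
# completion latency reflects one I/O at a time.
# $1 = device or file to read (default taken from this thread's setup),
# $2 = runtime in seconds (the default of 30 is an arbitrary assumption).
fio_randread_cmd() {
    dev=${1:-/dev/nvme0n1}
    secs=${2:-30}
    echo "fio --name=randread-latency --filename=$dev --readonly --rw=randread --bs=4k --direct=1 --ioengine=libaio --iodepth=1 --time_based --runtime=$secs"
}
```

Running the printed command as root against /dev/nvme0n1, and again against the bcache device, should give latency figures for both that can be compared directly.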

Also, as stated in my OP, I have turned off congestion control (set congested_write_threshold_us and congested_read_threshold_us to 0), and I have also tried with discard off (for a prolonged time, actually for a few months). However, I just realized I'm not 100% sure I tested without "discard" in fstab for the ext4 volumes (only for the bcache cache device). AFAIK it should not make a big difference, since for example during normal (and repeated) boots there should be little deleting taking place. Also, since the ext4 volumes don't sit directly on an SSD with discard enabled, the option should not do anything.

None of these have any (discernible) effect.
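
For reference, the tunables named above can be written through sysfs roughly as follows. This is a sketch under assumptions: the helper name bcache_tune_for_reads is made up, and the two directories must be supplied by the caller (typically the per-device directory such as /sys/block/bcache0/bcache, where bcache0 is an assumed device name, and the cache-set directory /sys/fs/bcache/<cset.uuid>).

```shell
# Sketch: disable the sequential cutoff and the congestion thresholds.
# $1 = per-device bcache directory (e.g. /sys/block/bcache0/bcache; the
#      bcache0 name is an assumption)
# $2 = cache-set directory (e.g. /sys/fs/bcache/<cset.uuid>)
# Writing to the real sysfs files requires root.
bcache_tune_for_reads() {
    bdev=$1
    cset=$2
    # 0 disables the sequential-I/O bypass for this backing device.
    echo 0 > "$bdev/sequential_cutoff"
    # 0 disables latency-based congestion tracking for the cache set.
    echo 0 > "$cset/congested_read_threshold_us"
    echo 0 > "$cset/congested_write_threshold_us"
}
```

From a root shell this would be called with the two directories as arguments; for the set shown in the first post the cache-set directory would be /sys/fs/bcache/d6420bd9-a45f-4688-a9c0-217c88072449. The settings do not persist across reboots.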

After my post, I (hopefully temporarily) moved (well, copied) root to the SSD (on a thinly provisioned bcache volume) to actually get some benefit from having it in the computer in the first place. Now boot-up (to a functional desktop) takes <10 seconds. I'm hoping I can figure out why the hot data caching isn't working and move root back to a mechanical HDD - the alternatives being to make an actual ext4 partition on the SSD (not one sitting on top of bcache), or to use some other SSD caching method, such as LVM caching.

FWIW, I've also posted on bcache mailing list.

Please let me know in case you find something else!

(p.s. I've been under the assumption that with modern SSDs and kernels enabling discard should be fine, and actually preferable: it reduces the need for overprovisioning, the kernel and SSDs don't take a performance hit by enabling it, and all the space is subsequently available for writes without an erase, which should actually increase performance. But that's getting a bit off-topic, and disabling discard is something I've already tried. I'll see if disabling discard (for the ext4 fs) makes any difference.)

Last edited by Wild Penguin (2019-11-12 16:44:11)


#4 2019-11-12 17:38:42

Wild Penguin
Registered: 2015-03-19
Posts: 72

Re: bcache and very poor (nonexistent) read caching performance


I moved root back to the mechanical HDD, and I disabled discard in the ext4 options and, not surprisingly, there is no effect.

I also did some very crude tests with dd. With sequential_cutoff at 0, I created a 1GB file (as I'm not sure how to test non-sequential reads). I noticed this file will never be written to the cache unless cache mode is set to writethrough or writeback at creation time. More specifically:

  • cache_mode = none -> create file -> all reads (after flushing the kernel page cache with echo 3 > /proc/sys/vm/drop_caches) run at ~100-150MB/s, even if cache_mode is changed after the file has been created.

  • cache_mode = writethrough or writeback -> create file -> all reads (with the page cache flushed as above) run at ~600-700MB/s, again regardless of any cache_mode change after the file has been created.

So it seems that no data is put into the cache when it is (just) read (or sequential_cutoff is not working for reads). Data is only put into the cache when it is written.

This just does not seem right, but it does explain the (lack of) performance I'm observing. I definitely care much more about caching hot data than about write caching (the latter being mostly useless for desktop usage, as data is read much more often than it is written)!
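
The crude dd test described above can be sketched as three small helpers. The function names are made up for illustration, the use of /dev/zero as the data source is an assumption (the exact dd invocation is not given above), and the status=none and bs=1M options assume GNU dd.

```shell
# Sketch of the crude read test: write a file, drop the page cache, read back.

# $1 = path on the bcache-backed filesystem, $2 = size in MiB.
# conv=fsync makes dd flush the data to the device before exiting.
write_test_file() {
    dd if=/dev/zero of="$1" bs=1M count="$2" conv=fsync status=none
}

# Flush dirty pages, then drop the kernel page cache so the next read has
# to go to the block layer. Needs root; skipped if the file is not writable.
drop_page_cache() {
    sync
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches
    fi
}

# $1 = path. dd prints its throughput summary on stderr.
read_test_file() {
    dd if="$1" of=/dev/null bs=1M
}
```

A run would set cache_mode first, call write_test_file with a path and 1024 for a 1GB file, then repeat drop_page_cache followed by read_test_file and compare the throughput dd reports under each cache mode.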

