
#1 2020-11-17 20:22:11

Registered: 2020-01-14
Posts: 2

raid0.luks.xfs stack tuning and SSD page/block sizes

Hi. I hope you're all doing great.

Is it worthwhile to choose a filesystem's block size, and the chunk size of the underlying mdadm RAID0, based on the SSD geometry of the array's disks? By "SSD geometry" I mean the NAND page and erase unit (block) sizes. I'm asking because I'm configuring 2 DRAM-less SATA SSDs (Crucial BX500) to host a QCOW2 storage pool on an mdadm RAID0 setup.

It's worth mentioning that write speeds in the previous deployment of this system slowed to a crawl after barely a month of operation:

  • The storage pool was running on a RAID0.LUKS1.LVM2.EXT4 stack

  • There were a couple of daily swap (8 GB) dumps. I had to hibernate the machines "the old way" because saving machines with GPU passthrough seemed too tricky

  • No discard/TRIM/unmap was configured

  • I had accepted the SSD firmware's default overprovisioning of just 1/16 of physical capacity (it's a 256 GB drive that shows up as 240 GB)

  • MD chunk was 64 KiB, and ext4 block size was 4 KiB

I intend to move the swap disks of production VMs to a GPT.LUKS1.LVM2 partition on a single SSD (this one with DRAM), but I would still like to be able to regularly hibernate a few machines on the DRAM-less RAID0 array (a "VM scratch pad" of sorts).

With that said, here's what I'm planning to change so far:

  • Simplify the stack to RAID0.LUKS1.XFS

  • Enable continuous discard/TRIM/unmap support across the board (I know it weakens LUKS, but I accept the compromise)

  • Increase the SSD overprovisioning from 16 GB to 32 GB (the last 16 GB will not be partitioned)

  • Tweak QEMU cache strategies and QCOW2 image cluster_size

  • Take SSD geometry into account when setting the filesystem block and mdadm chunk sizes
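
To put the overprovisioning change in numbers, here is a minimal Python sketch. It takes the capacities from the post at face value (256 GB physical, 16 GB or 32 GB left unused) and ignores any GB/GiB distinction; the function name is just illustrative.

```python
def overprovisioning_ratio(physical_gb: float, reserved_gb: float) -> float:
    """Fraction of the physical NAND capacity that the host never uses."""
    return reserved_gb / physical_gb


PHYSICAL_GB = 256  # physical capacity of one drive, as stated in the post

current = overprovisioning_ratio(PHYSICAL_GB, 16)  # firmware default
planned = overprovisioning_ratio(PHYSICAL_GB, 32)  # default + 16 GB unpartitioned

print(f"current: {current:.2%}, planned: {planned:.2%}")
```

So the plan doubles the spare area from 1/16 to 1/8 of the physical capacity.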

So here we are. After experimenting a bit with flashbench-git (AUR), it seems the NAND erase unit is 8 MiB and the page size is 16 KiB.
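
As a quick sanity check on those numbers, the Python sketch below tests which sizes are whole multiples of the estimated page and erase unit. The geometry values are the flashbench estimates above (not datasheet figures), and the candidate sizes are the ones from the previous deployment plus the erase unit itself.

```python
NAND_PAGE_KIB = 16          # page size estimated with flashbench
NAND_ERASE_UNIT_KIB = 8192  # erase unit (8 MiB) estimated with flashbench


def is_aligned(size_kib: int, unit_kib: int) -> bool:
    """True when size_kib is a whole multiple of unit_kib."""
    return size_kib % unit_kib == 0


pages_per_erase_unit = NAND_ERASE_UNIT_KIB // NAND_PAGE_KIB

candidates = {
    "ext4 block, old setup": 4,
    "md chunk, old setup": 64,
    "md chunk = erase unit": 8192,
}

for label, size_kib in candidates.items():
    print(
        f"{label}: {size_kib} KiB, "
        f"page-aligned={is_aligned(size_kib, NAND_PAGE_KIB)}, "
        f"erase-unit-aligned={is_aligned(size_kib, NAND_ERASE_UNIT_KIB)}"
    )
```

With these estimates, an erase unit holds 512 pages, and the old 4 KiB filesystem block is smaller than a single NAND page.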

Here's the all-defaults configuration as a starting point:

mdadm_chunk_size_kib = mdadm_default_chunk_kib = 512
xfs_block_size_kib = xfs_default_block_size_kib = 4
stride = mdadm_chunk_size_kib / xfs_block_size_kib = 512 / 4 = 128
stripe_width = number_of_disks * stride = 2 * 128 = 256

From there, I set the XFS block size to the NAND page size and the MD chunk to the NAND erase unit:

mdadm_chunk_size_kib = nand_erase_unit_kib = 8192
xfs_block_size_kib = nand_page_size_kib = 16
stride = mdadm_chunk_size_kib / xfs_block_size_kib = 8192 / 16 = 512
stripe_width = number_of_disks * stride = 2 * 512 = 1024

Finally, if I maximize the XFS block size (64 KiB, according to man mkfs.xfs), I get:

mdadm_chunk_size_kib = nand_erase_unit_kib = 8192
xfs_block_size_kib = xfs_maximum_block_size_kib = 64
stride = mdadm_chunk_size_kib / xfs_block_size_kib = 8192 / 64 = 128
stripe_width = number_of_disks * stride = 2 * 128 = 256
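
The three calculations above can be reproduced with a short Python sketch. The `Layout` class and its method names are hypothetical helpers made up for illustration. The `mkfs.xfs` option string reflects my reading of `man mkfs.xfs` (`-b size=` in bytes, `-d su=` as the per-disk stripe unit, `sw=` as the number of data disks) and should be checked against the installed version before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """One candidate mdadm chunk / XFS block size combination."""
    name: str
    chunk_kib: int   # mdadm chunk size
    block_kib: int   # XFS block size
    disks: int = 2   # RAID0 members, all of them data disks

    @property
    def stride(self) -> int:
        # Filesystem blocks per mdadm chunk.
        return self.chunk_kib // self.block_kib

    @property
    def stripe_width(self) -> int:
        # Filesystem blocks per full stripe across all disks.
        return self.disks * self.stride

    def mkfs_xfs_args(self) -> str:
        # Assumed mapping: su = chunk size, sw = number of data disks.
        return (
            f"-b size={self.block_kib * 1024} "
            f"-d su={self.chunk_kib}k,sw={self.disks}"
        )


layouts = [
    Layout("defaults", chunk_kib=512, block_kib=4),
    Layout("page-sized block", chunk_kib=8192, block_kib=16),
    Layout("maximum block", chunk_kib=8192, block_kib=64),
]

for layout in layouts:
    print(layout.name, layout.stride, layout.stripe_width, layout.mkfs_xfs_args())
```

Note that XFS expresses the geometry as a stripe unit and a disk count rather than as stride and stripe width in blocks, so only the chunk size and the number of disks end up in the `-d` options. It is also worth checking in `man mkfs.xfs` whether block sizes above the kernel page size can be mounted on the target kernel.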

Any thoughts? I appreciate any input whatsoever (on any of the points I've mentioned).

Thank you all for your time.

