I have some read performance issues that are mostly exclusive to BTRFS. I've done some digging and am not sure how much can be done to improve performance here. I recently got a 980 Pro and, being lazy, just used Clonezilla to migrate my current Arch install from an SN750 over to that disk. Light benchmarking with KDiskMark showed only a small performance increase: about 1.5 GB/s compared to the advertised 7 GB/s. I don't expect 7 GB/s to ever really be seen except on strange, bursty reads, but I decided to dig further. My initial thought was that the issue was with LUKS2, whose default block size is 512 bytes. Though the Samsung SSDs report a block size of 512 (and do not allow an NVMe format to change it to 4096), this seems not to be the case: the disk reads and writes 4096-byte blocks under the hood.
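For anyone who wants to compare against their own drive, here is a minimal sketch for checking what block sizes the kernel reports per device. It only reads the standard sysfs attributes `logical_block_size` and `physical_block_size`; the function name `report_block_sizes` and its optional directory argument are my own naming for illustration, not part of any tool.

```shell
#!/bin/sh
# Print the logical and physical block sizes the kernel reports
# for every block device found under a sysfs-style directory.
# The optional first argument (default /sys/block) is a hypothetical
# convenience so the function can be pointed at another directory.
report_block_sizes() {
    root="${1:-/sys/block}"
    for dev in "$root"/*; do
        # Skip entries that do not expose the queue attributes.
        [ -r "$dev/queue/logical_block_size" ] || continue
        printf '%s logical=%s physical=%s\n' \
            "$(basename "$dev")" \
            "$(cat "$dev/queue/logical_block_size")" \
            "$(cat "$dev/queue/physical_block_size")"
    done
}

report_block_sizes
```

A drive that reports 512 for both values here can still use larger pages internally, which is the situation described above. For LUKS2 itself, the sector size is chosen at format time with the `--sector-size` option of `cryptsetup luksFormat`.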
Moving forward, I decided to go back to my old disk, formatting the 980 Pro to do some testing. I partitioned the 980 Pro to a single partition using gparted. For the performance testing, I used a couple of different tools. When using KDiskMark, SEQ1M Q8T1 with the default 1 GiB test size was used. I would delete and change the filesystem to btrfs or ext4 using the default mkfs.<fs> commands, only changing the blocksize as listed and adding the label "testing" to the partition. For mounting, I simply allowed gnome-disks to use the default mount options unless listed otherwise. It's also worth noting that I have the 980 Pro in the M.2 slot that is directly wired to the CPU.
I have received a few suggestions, and after the LUKS portion of the table I began to add fstrim or blkdiscards to ensure no performance issues. I also added some tests using fio itself. I'll include the fio scripts at the bottom of this post.
Crypto | Crypto blocksize | Filesystem | FS blocksize | Kernel | Tool | Read | Write | Options (cr) | Options (fs) | Other
- | - | ext4 | 4096 | linux-zen | KDiskMark | 6545.66 | 4480.99 | - | - | -
- | - | ext4 | 1024 | linux-zen | KDiskMark | 5937.62 | 4295.66 | - | - | -
- | - | btrfs | 4096 | linux-zen | KDiskMark | 1782.14 | 4459.77 | - | - | -
- | - | btrfs | 4096 | linux-zen | KDiskMark | 1710.47 | 4047.47 | - | rw,noatime,ssd,space_cache=v2 | -
- | - | btrfs | 4096 | linux-zen | KDiskMark | 1876.87 | 4375.13 | - | rw,noatime,ssd,space_cache=v2,nodatacow | -
- | - | btrfs | 4096 | linux-zen | KDiskMark | 1850.86 | 4208.55 | - | rw,noatime,ssd,space_cache=v2,nodatasum | -
luks2 | 512 | ext4 | 4096 | linux-zen | KDiskMark | 3050.50 | 2638.06 | - | - | -
luks2 | 512 | ext4 | 1024 | linux-zen | KDiskMark | 3114.68 | 2664.32 | - | - | -
luks2 | 512 | btrfs | 4096 | linux-zen | KDiskMark | 1465.74 | 2347.44 | - | - | -
luks2 | 4096 | ext4 | 4096 | linux-zen | KDiskMark | 6172.50 | 4176.87 | - | - | -
luks2 | 4096 | btrfs | 4096 | linux-zen | KDiskMark | 1221.27 | 2830.69 | - | - | -
luks2 | 4096 | ext4 | 4096 | linux-zen | KDiskMark | 2961.06 | 2138.33 | perf-no_read_workqueue, perf-no_write_workqueue | - | -
luks2 | 4096 | btrfs | 4096 | linux-zen | KDiskMark | 1178.82 | 3196.28 | - | - | scheduler: none
- | - | btrfs | 4096 | linux-zen | KDiskMark | 1720.20 | 4255.63 | - | - | nvme format, blkdiscard, scheduler: none
- | - | btrfs | 4096 | linux-zen | KDiskMark | 1727.14 | 4345.20 | - | rw,noatime,ssd,space_cache=v2,compress=zstd:1 | scheduler: none, fstrim
- | - | btrfs | 4096 | linux-zen | fio read_uring | 1978.00 | - | - | - | fstrim
- | - | btrfs | 4096 | linux-zen | fio read_libaio | 1756.00 | - | - | - | -
- | - | btrfs | 4096 | linux-zen | fio read_uring | 2010.00 | - | - | rw,noatime,ssd,space_cache=v2,nodatacow | deleted prior fio test file, fstrim
- | - | btrfs | 4096 | linux-zen | fio read_libaio | 1921.00 | - | - | rw,noatime,ssd,space_cache=v2,nodatacow | -
- | - | ext4 | 4096 | linux | KDiskMark | 6394.04 | 4308.89 | - | - | fstrim
- | - | btrfs | 4096 | linux | KDiskMark | 1637.08 | 4184.36 | - | - | -
- | - | btrfs | 4096 | linux | KDiskMark | 1651.09 | 4169.45 | - | rw,noatime,ssd,space_cache=v2 | fstrim
- | - | btrfs | 4096 | linux | KDiskMark | 1811.89 | 4513.29 | - | rw,noatime,ssd,space_cache=v2,nodatacow | fstrim
Based on the testing results, my assumption about the 512 vs. 4096 blocksize for LUKS seems to have been correct. With ext4, there is a huge boost in performance when using the 4096 blocksize. That said, the BTRFS results remain quite disappointing. At this point I am leaning towards switching to LVM+ext4, but I currently enjoy btrfs features like snapshotting, CoW, and bitrot protection. I'm pretty much grasping at straws here. I partially expected this to go away with turning off CoW or checksumming, but strangely that hasn't made a difference. The write performance is also befuddling to me. I use zfs raidz on other systems, so I'm used to seeing high write performance with the read performance of a single disk, but there is an explanation there that makes sense to me.
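To put the gaps in perspective, here is a small sketch that expresses one read result as a percentage of another, using two pairs of numbers taken straight from the table (linux-zen, KDiskMark). The helper name `pct` is just something I made up for this.

```shell
#!/bin/sh
# pct A B: print A as a percentage of B with one decimal place.
# awk does the floating-point arithmetic; plain sh only has integers.
pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}

# btrfs read relative to ext4 read, both unencrypted with 4096 blocksize.
echo "btrfs vs ext4 read: $(pct 1782.14 6545.66)%"

# ext4 read on LUKS2 with 512 blocksize relative to LUKS2 with 4096 blocksize.
echo "LUKS2 512 vs 4096 (ext4 read): $(pct 3050.50 6172.50)%"
```

So unencrypted btrfs reads come in at a bit over a quarter of the ext4 figure, and the 512 LUKS2 blocksize roughly halves ext4 reads compared with 4096.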
I also did some brief testing with dd, though the accuracy of dd compared to fio is somewhat disputed. For what it's worth, I got 2.4 GB/s without CoW (with and without sync) and 2.3 GB/s without (2.2 GB/s with sync), only looking at read performance. The command I used was
dd if=fio_test_file of=/tmp/random_test bs=1M count=1024
or
dd if=fio_test_file of=/tmp/random_test bs=1M count=1024 oflag=dsync
ext4 consistently clocked in at 2.0 GB/s, regardless of the sync flag.
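Since only the read side matters here, one possible variant (not what I ran above) is to send the output to /dev/null so the destination cannot influence the number. This is a rough sketch; the function name `read_throughput` is hypothetical, and extra dd operands such as `iflag=direct` (GNU dd, bypasses the page cache) are passed through untouched.

```shell
#!/bin/sh
# read_throughput FILE [dd operands...]
# Read FILE in 1 MiB blocks, discard the data, and print only
# dd's final summary line (bytes copied, time, rate).
read_throughput() {
    f="$1"
    shift
    # dd writes its statistics to stderr, so merge it into stdout.
    dd if="$f" of=/dev/null bs=1M "$@" 2>&1 | tail -n 1
}

# Example (assumes fio_test_file exists in the current directory):
#   read_throughput fio_test_file count=1024 iflag=direct
```

Without `iflag=direct`, a second run may be served from the page cache and report inflated numbers, so treat cached results with suspicion.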
I'm using AMD hardware. This may be relevant in this case but I do not have enough data points to be entirely sure. After asking on Reddit and talking to a couple friends, it seems that for the three people using AMD hardware there are slowdowns on reads. The one person who was using Intel hardware reported that they got the 3 GB/s reads and writes expected of their PCIe 3.0 NVMe SSD. I'm hoping that this isn't a strange issue regarding AMD vs Intel PCIe implementation or something horrific like that, but if anyone has the bandwidth to test, I would greatly appreciate it if they could.
Does anyone have any ideas for further troubleshooting or any ideas for solutions? Thanks in advance.
Kernel: 5.15.5 (various)
SSD: Samsung 980 Pro
CPU: AMD Ryzen 7 3800X
Host: X570 AORUS ELITE WIFI -CF
fio benchmarking script
[read_uring]
directory=/foo/bar/testing/
filename=fio_test_file
direct=1
buffered=0
size=1g
startdelay=3
ramp_time=3
runtime=5
time_based
ioengine=io_uring
force_async=4
rw=read
bs=1m
iodepth=8
For testing with libaio, I removed the force_async line and changed the ioengine to libaio.
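The libaio variant can be derived mechanically from the io_uring job with sed. This is only a sketch of the two changes described above; renaming the section to `[read_libaio]` matches the job name in the results table, while the function name `make_libaio_job` and the file names in the usage comment are assumptions.

```shell
#!/bin/sh
# make_libaio_job FILE: print a libaio version of an io_uring fio job.
#   - rename the job section to [read_libaio]
#   - drop the force_async line (io_uring-specific)
#   - switch the ioengine to libaio
make_libaio_job() {
    sed -e 's/^\[read_uring\]$/[read_libaio]/' \
        -e '/^force_async=/d' \
        -e 's/^ioengine=io_uring$/ioengine=libaio/' "$1"
}

# Usage (hypothetical file names):
#   make_libaio_job read_uring.fio > read_libaio.fio
#   fio read_libaio.fio
```

Every other setting (direct=1, bs=1m, iodepth=8, and so on) is left unchanged, so the two runs differ only in the I/O engine.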