Hi everyone,
I am seeing a big performance impact with LUKS2 on my system. I am not sure if this is normal so I thought I would ask here.
System:
Thinkpad T14s Gen3 AMD
CPU: Ryzen 7 6850u
RAM: 32GB RAM 6400MHz
NVME: Solidigm P44 Pro 2TB
Kernel: 6.3.1 with amd_pstate=active
Filesystem Linux: EXT4
Filesystem Windows: NTFS
Some benchmarks / speed tests on Windows 10:
- Copying a 50GB file: 18 seconds
- CrystalDiskMark benchmark: https://imgur.com/a/1okVrpY
Arch Linux:
- Copying a 50GB file: 38 seconds
- KDiskMark benchmark: https://imgur.com/a/8Tc6pWS
The performance impact is quite huge but based on the cryptsetup benchmark it should be a lot faster.
cryptsetup -v status lvm
/dev/mapper/lvm is active and is in use.
type: LUKS2
cipher: aes-xts-plain64
keysize: 512 bits
key location: keyring
device: /dev/nvme0n1p6
sector size: 512
offset: 32768 sectors
size: 2951163904 sectors
mode: read/write
flags: discards no_read_workqueue no_write_workqueue
cryptsetup luksDump /dev/nvme0n1p6
LUKS header information
Version: 2
Epoch: 6
Metadata area: 16384 [bytes]
Keyslots area: 16744448 [bytes]
UUID: x
Label: (no label)
Subsystem: (no subsystem)
Flags: no-read-workqueue no-write-workqueue
Data segments:
0: crypt
offset: 16777216 [bytes]
length: (whole device)
cipher: aes-xts-plain64
sector: 512 [bytes]
Keyslots:
0: luks2
Key: 512 bits
Priority: normal
Cipher: aes-xts-plain64
Cipher key: 512 bits
PBKDF: argon2id
Time cost: 9
Memory: 1048576
Threads: 4
AF stripes: 4000
AF hash: sha256
Area offset:290816 [bytes]
Area length:258048 [bytes]
Digest ID: 0
Tokens:
Digests:
0: pbkdf2
Hash: sha256
Iterations: 329740
fdisk -l
Disk /dev/nvme0n1: 1,86 TiB, 2048408248320 bytes, 4000797360 sectors
Disk model: SOLIDIGM SSDPFKKW020X7
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 58411B52-D1AC-4175-87AB-8D0F4645D891
Device Start End Sectors Size Type
/dev/nvme0n1p1 2048 206847 204800 100M EFI System
/dev/nvme0n1p2 206848 239615 32768 16M Microsoft reserved
/dev/nvme0n1p3 239616 1047532172 1047292557 499,4G Microsoft basic data
/dev/nvme0n1p4 1047533568 1048575999 1042432 509M Windows recovery environment
/dev/nvme0n1p5 1048576000 1049599999 1024000 500M Linux extended boot
/dev/nvme0n1p6 1049600000 4000796671 2951196672 1,4T Linux filesystem
Disk /dev/mapper/lvm: 1,37 TiB, 1510995918848 bytes, 2951163904 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/MyVolumeGroup: 1,37 TiB, 1510456950784 bytes, 2950111232 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/zram0: 15,06 GiB, 16173236224 bytes, 3948544 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1 2744963 iterations per second for 256-bit key
PBKDF2-sha256 5197402 iterations per second for 256-bit key
PBKDF2-sha512 2028193 iterations per second for 256-bit key
PBKDF2-ripemd160 1093405 iterations per second for 256-bit key
PBKDF2-whirlpool 846991 iterations per second for 256-bit key
argon2i 10 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id 10 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
# Algorithm | Key | Encryption | Decryption
aes-cbc 128b 1427,5 MiB/s 5925,7 MiB/s
serpent-cbc 128b 136,8 MiB/s 997,3 MiB/s
twofish-cbc 128b 271,9 MiB/s 515,2 MiB/s
aes-cbc 256b 1094,0 MiB/s 4888,9 MiB/s
serpent-cbc 256b 141,7 MiB/s 997,9 MiB/s
twofish-cbc 256b 281,1 MiB/s 514,7 MiB/s
aes-xts 256b 4782,6 MiB/s 4821,1 MiB/s
serpent-xts 256b 872,4 MiB/s 886,4 MiB/s
twofish-xts 256b 475,8 MiB/s 490,4 MiB/s
aes-xts 512b 4060,4 MiB/s 4112,0 MiB/s
serpent-xts 512b 898,6 MiB/s 883,8 MiB/s
twofish-xts 512b 480,9 MiB/s 489,3 MiB/s
cpupower frequency-info
analyzing CPU 5:
driver: amd_pstate_epp
CPUs which run at the same hardware frequency: 5
CPUs which need to have their frequency coordinated by software: 5
maximum transition latency: Cannot determine or is not supported.
hardware limits: 400 MHz - 4.77 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 400 MHz and 4.77 GHz.
The governor "powersave" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 2.63 GHz (asserted by call to kernel)
boost state support:
Supported: yes
Active: yes
Boost States: 0
Total States: 3
Pstate-P0: 2700MHz
Pstate-P1: 1800MHz
Pstate-P2: 1600MHz
So given the results of the benchmark, my speed should be at least twice as fast as it currently is on Linux?
I also noticed when copying the 50GB file that only one CPU thread hits 100% while I have a total of 16 threads available.
Did I configure something wrong, or is the impact I am seeing normal and can't be optimized?
Last edited by Utini (2023-05-07 16:58:25)
Setup 1: Thinkpad T14s G3, 14" FHD - R7 6850U - 32GB RAM - 2TB Solidigm P44 Pro NVME
Setup 2: Thinkpad X1E G1, 15.6" FHD - i7-8850H - 32GB RAM - NVIDIA GTX 1050Ti - 2x 1TB Samsung 970 Pro NVME
Accessories: Filco Majestouch TKL MX-Brown Mini Otaku, Benq XL2420T (144Hz), Lo(w)gitech G400, Puretrak Talent, Sennheiser HD800S + Meier Daccord FF + Meier Classic FF
Offline
It'll use more threads when you copy multiple files rather than one big one. Also check this info.
Offline
Hi, I already set those parameters. The "cryptsetup -v status lvm" shows it.
@Edit:
Just noticed that removing those two parameters doubles the speed in the benchmark.
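For reference, the two flags can be switched on an open mapping with cryptsetup refresh. A minimal sketch, assuming the mapping name lvm from the status output above and a cryptsetup new enough to know the --perf-no_*_workqueue options (2.3.4 or later, as far as I know). workqueue_cmd is a hypothetical helper that only prints the command for review rather than running it:

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the cryptsetup command that turns the
# dm-crypt workqueue bypass on or off for an already-open mapping.
# Assumption: the mapping is called "lvm", as in the status output above.
workqueue_cmd() {
    mode=$1            # "bypass" = skip the workqueues, "default" = use them
    name=${2:-lvm}
    case $mode in
        bypass)
            echo "cryptsetup refresh --perf-no_read_workqueue --perf-no_write_workqueue --persistent $name"
            ;;
        default)
            # With --persistent, flags that are not listed are dropped from the header.
            echo "cryptsetup refresh --persistent $name"
            ;;
        *)
            echo "usage: workqueue_cmd bypass|default [mapping]" >&2
            return 1
            ;;
    esac
}

workqueue_cmd bypass
workqueue_cmd default
```

The printed command has to be run as root. As far as I understand the man page, refresh applies only the flags given on the command line, so add --allow-discards if discards should stay enabled.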
Last edited by Utini (2023-05-08 08:22:32)
Offline
Ah, yes, you already set the parameters. I have no other suggestions, but note that the benchmark is memory only (not involving disc IO).
Offline
The benchmark is memory only?
Can you elaborate on that, please? It is a disk benchmark program after all?
Offline
No, it's a de/encryption benchmark.
# Tests are approximate using memory only (no storage IO).
Just noticed that removing those two parameters doubles the speed in the benchmark.
- Copying a 50GB file: 18 seconds / windows (on the LUKS drive?)
- Copying a 50GB file: 38 seconds / linux
(2*18)/38 == 94.74%
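Spelling that arithmetic out, as a rough sketch: the numbers are the copy times quoted above, and gb_per_s and ratio_pct are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helpers; the figures are the copy times quoted in this thread.
# LC_ALL=C makes awk print a decimal point regardless of the system locale.

# gb_per_s SIZE_GB SECONDS -> average throughput in GB/s
gb_per_s() {
    LC_ALL=C awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", size / secs }'
}

# ratio_pct A B -> A / B as a percentage
ratio_pct() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", 100 * a / b }'
}

echo "Windows copy: $(gb_per_s 50 18) GB/s"              # 2.78
echo "Linux copy:   $(gb_per_s 50 38) GB/s"              # 1.32
echo "(2*18)/38:    $(ratio_pct $((2 * 18)) 38) %"       # 94.74
```

In other words, if dropping the two flags really doubles the throughput, the 38 s copy would land at roughly the 18 s measured on Windows.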
Offline
The copying test wasn't a good way to compare speeds.
I copied using Dolphin, which seems to have a bug that causes slower speeds.
With the "cp" command I can actually transfer 50GB in 8 seconds.
So what can / should I use to compare my NVME speed on Arch (with LUKS / LVM) vs Windows?
Offline
Does windows access the LUKS2 drive at all?
What drive are you testing there?
Offline
It is one drive (Solidigm P44 Pro 2TB).
One of the partitions on it is NTFS Windows 10.
So I would like to test Windows 10 (with no encryption) vs Arch with LUKS2.
The reason behind this is that I would like to figure out the impact on performance that encryption has.
And of course to verify that everything is configured correctly in Arch regarding performance.
Offline
So I would like to test Windows 10 (with no encryption) vs Arch with LUKS2.
What's the point of that?
The reason behind this is that I would like to figure out the impact on performance that encryption has.
Then you should measure the performance against linux on the same FS on the same drive and not introduce a bunch of irrelevant variables (windows + ntfs)
Offline
The point is that Windows is currently my only reference for how fast the NVME drive can perform.
So my only way to find out if my NVME drive is configured / working correctly on Linux, is by comparing it with Windows?
The fact that I have seen such high differences in the benchmark (7.000MB/S vs 2.000MB/S) made me worry that something is wrong with my Linux configuration.
Last edited by Utini (2023-05-08 13:49:50)
Offline
If you want to benchmark the filesystem you need to eliminate all variables. A GUI benchmark, probably in a KDE session while perhaps baloo bangs the drive and some plasma gadgets hog the CPU, is hardly relevant (as you already figured, cp outperformed your windows copy by a mile. It'll however probably also not have synced the copy)
Boot the rescue.target and measure the performance w/ dd, https://wiki.archlinux.org/title/Benchmarking#dd
No idea how to get a clean benchmark on windows, though.
You could test dd on the ntfs partition (which carries some overhead, and apparently there's a write-preventing bug in ntfs3, while ntfs-3g is userspace) or some vfat partition (w/ expectably less overhead for the FS) to ***somewhat*** ballpark the condition.
Offline
Hmm I would argue that the running session is part of what I want to verify in my test.
But I guess this should be done step by step.
Benchmark "naked" system, next benchmark user session.
Out of interest I just tried the dd benchmark you linked on my currently logged in session.
dd if=/dev/zero of=/home/sneida/Downloads/tempfile bs=1M count=1024 conv=fdatasync,notrunc status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 0,599712 s, 1,8 GB/s
1,8GB/s seems to be pretty slow now again on a drive that should do 7.000MB/s.
Additionally I encountered this error:
sudo echo 3 > /proc/sys/vm/drop_caches
zsh: permission denied: /proc/sys/vm/drop_caches
Offline
No, it's a de/encryption benchmark.
# Tests are approximate using memory only (no storage IO).
Wait, we are talking about different benchmark tools.
I wasn't talking about the cryptsetup benchmark but CrystalDiskMark vs KDiskMark:
CrystalDiskMark benchmark: https://imgur.com/a/1okVrpY
KDiskMark benchmark: https://imgur.com/a/x64VtqZ
Those two should be comparable?!
Offline
Those two should be comparable?!
No?
Different benchmarks, different read/write patterns, different FS, different OS, different co-processes.
That's phoronix-grade "benchmarking". Fun but useless.
zsh: permission denied: /proc/sys/vm/drop_caches
Which process was sudo'd and which process tries to write the file?
Why does
echo 3 | sudo tee /proc/sys/vm/drop_caches
work?
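The hint, spelled out: the shell sets up the ">" redirection itself, with the calling user's privileges, before sudo is even started, so only echo would run as root. A small sketch that shows the ordering without needing root (redirect_demo and the deliberately nonexistent command name are made up for this demonstration):

```shell
#!/bin/sh
# Demonstration without root: the shell opens the redirection target before it
# tries to run the command, so the file exists even though the command never ran.
redirect_demo() {
    dir=$(mktemp -d) || return 1
    # This command does not exist and therefore never runs ...
    no_such_command_for_this_demo > "$dir/out" 2>/dev/null || true
    # ... yet the shell has already created the redirection target.
    if [ -e "$dir/out" ]; then
        echo "created"
    else
        echo "missing"
    fi
    rm -rf "$dir"
}

redirect_demo    # prints "created"

# Hence "sudo echo 3 > /proc/sys/vm/drop_caches" fails: the unprivileged shell
# opens the file. Either form below performs the write as root (not run here):
#   echo 3 | sudo tee /proc/sys/vm/drop_caches
#   sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
```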
Offline
In this comparison it will always be different FS, different OS and different co-processes.
Forgive me, but this is part of what I am trying to compare (the performance impact with those differences present)?
I also set the same benchmark test profile in both programs (Type, Block Size, Queues, Threads).
Anyway, back to dd:
dd if=/dev/zero of=/home/user/Downloads/tempfile bs=1M count=1024 conv=fdatasync,notrunc status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 0,636466 s, 1,7 GB/s
echo 3 | sudo tee /proc/sys/vm/drop_caches
3
dd if=tempfile of=/dev/null bs=1M count=1024 status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 0,541827 s, 2,0 GB/s
dd if=tempfile of=/dev/null bs=1M count=1024 status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 0,125088 s, 8,6 GB/s
That is terribly slow or am I not understanding the results correctly?
Offline
Forgive me, but this is part of what I am trying to compare
The reason behind this is that I would like to figure out the impact on performance that encryption has.
Random uncontrolled benchmarks are *completely* pointless, just try to run two in parallel and O!M!G! the results were just cut in half.
But have fun, it's your time.
Offline
Well then my main question still remains.
How to run a "controlled" benchmark on my Arch setup vs my Windows setup.
This would measure the performance impact with all the differences provided between those two setups (OS, Filesystem, encryption,...).
But it needs a tool that runs the same on both OS.
Does such a tool even exist?
Offline
You're missing the point.
measure the performance impact with all the differences provided between those two setups (OS, Filesystem, encryption,...)
contradicts "controlled benchmark"
You're benchmarking phoronix-style.
You run something with some parameters on some system and in some situation. That gets you some result. And a different one next time. And then some more.
If you want to know what impact the encryption has you isolate that.
If you want to know what impact the filesystem has, you isolate that (spoiler: ext4 is gonna perform better on linux, ntfs is gonna perform better on windows)
If you want to know what impact the sideload from some fat desktop environment's file indexer has, you isolate that.
If you want to know what impact the IO scheduler has, you isolate that.
If you want to know "what is the IO throughput on a tuesday afternoon in spring 2023 while I'm running plasma and maybe the file indexer is running for some part of the test or maybe not" you test that.
But that is *NEVER* gonna be comparable to "what is the IO throughput on a wednesday evening in summer 2023 while I'm running windows and maybe the drive is defragmented at the same time or maybe not" in any way shape or form.
These "benchmarks" are utterly pointless.
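To make "isolate that" concrete for the encryption layer, one possible approach (a sketch, not a rigorous benchmark) is to read the same partition twice on the same booted system: once through the raw block device and once through the dm-crypt mapping, so that decryption is the only difference. The device names are the ones from the first post, seq_read is a hypothetical helper, and iflag=direct assumes GNU dd.

```shell
#!/bin/sh
# Hypothetical helper: read COUNT MiB sequentially from SRC, discard the data,
# and print dd's own summary line. Extra dd operands (e.g. iflag=direct to
# bypass the page cache) can be appended after COUNT.
seq_read() {
    src=$1
    count=$2
    shift 2
    # dd writes its statistics to stderr; LC_ALL=C keeps the output format stable.
    LC_ALL=C dd if="$src" of=/dev/null bs=1M count="$count" "$@" 2>&1 | tail -n 1
}

# Intended use, as root and on an otherwise idle system (not run here).
# Both commands only read; nothing is written to either device.
#   seq_read /dev/nvme0n1p6  4096 iflag=direct   # raw partition: ciphertext, no decryption
#   seq_read /dev/mapper/lvm 4096 iflag=direct   # same data through dm-crypt
```

Running each line several times and comparing the typical values gives a rough figure for what the dm-crypt layer alone costs on sequential reads. Filesystem, desktop session and OS differences stay out of the picture.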
Offline