
#1 2023-05-07 16:57:59

Utini
Member
Registered: 2015-09-28
Posts: 476
Website

LUKS2 Performance impact - This seems wrong?

Hi everyone,
I am seeing a big performance impact with LUKS2 on my system. I am not sure if this is normal so I thought I would ask here.

System:

Thinkpad T14s Gen3 AMD
CPU: Ryzen 7 6850u
RAM: 32GB RAM 6400MHz
NVME: Solidigm P44 Pro 2TB
Kernel: 6.3.1 with amd_pstate=active
Filesystem Linux: EXT4
Filesystem Windows: NTFS

Some benchmarks / speed tests on Windows 10:

- Copying a 50GB file: 18 seconds
- CrystalDiskMark benchmark: https://imgur.com/a/1okVrpY

Arch Linux:

- Copying a 50GB file: 38 seconds
- KDiskMark benchmark: https://imgur.com/a/8Tc6pWS

The performance impact is huge, but based on the cryptsetup benchmark it should be a lot faster.

cryptsetup -v status lvm       

  
/dev/mapper/lvm is active and is in use.
  type:    LUKS2
  cipher:  aes-xts-plain64
  keysize: 512 bits
  key location: keyring
  device:  /dev/nvme0n1p6
  sector size:  512
  offset:  32768 sectors
  size:    2951163904 sectors
  mode:    read/write
  flags:   discards no_read_workqueue no_write_workqueue

cryptsetup luksDump /dev/nvme0n1p6

LUKS header information
Version:        2
Epoch:          6
Metadata area:  16384 [bytes]
Keyslots area:  16744448 [bytes]
UUID:          x
Label:          (no label)
Subsystem:      (no subsystem)
Flags:          no-read-workqueue no-write-workqueue 

Data segments:
  0: crypt
        offset: 16777216 [bytes]
        length: (whole device)
        cipher: aes-xts-plain64
        sector: 512 [bytes]

Keyslots:
  0: luks2
        Key:        512 bits
        Priority:   normal
        Cipher:     aes-xts-plain64
        Cipher key: 512 bits
        PBKDF:      argon2id
        Time cost:  9
        Memory:     1048576
        Threads:    4

        AF stripes: 4000
        AF hash:    sha256
        Area offset:290816 [bytes]
        Area length:258048 [bytes]
        Digest ID:  0
Tokens:
Digests:
  0: pbkdf2
        Hash:       sha256
        Iterations: 329740

fdisk -l

Disk /dev/nvme0n1: 1,86 TiB, 2048408248320 bytes, 4000797360 sectors
Disk model: SOLIDIGM SSDPFKKW020X7                  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 58411B52-D1AC-4175-87AB-8D0F4645D891

Device              Start        End    Sectors   Size Type
/dev/nvme0n1p1       2048     206847     204800   100M EFI System
/dev/nvme0n1p2     206848     239615      32768    16M Microsoft reserved
/dev/nvme0n1p3     239616 1047532172 1047292557 499,4G Microsoft basic data
/dev/nvme0n1p4 1047533568 1048575999    1042432   509M Windows recovery environment
/dev/nvme0n1p5 1048576000 1049599999    1024000   500M Linux extended boot
/dev/nvme0n1p6 1049600000 4000796671 2951196672   1,4T Linux filesystem


Disk /dev/mapper/lvm: 1,37 TiB, 1510995918848 bytes, 2951163904 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/MyVolumeGroup: 1,37 TiB, 1510456950784 bytes, 2950111232 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/zram0: 15,06 GiB, 16173236224 bytes, 3948544 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
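The numbers in the cryptsetup and fdisk output above can be cross-checked against each other. The sketch below only redoes that arithmetic with the copied figures. The "two copies of the metadata area" layout is my reading of the LUKS2 on-disk format, so treat it as an assumption. The check shows three things. First, the 16 MiB data offset equals two 16 KiB metadata copies plus the keyslots area. Second, the partition starts on a 1 MiB boundary. Third, /dev/mapper/lvm is exactly the partition minus that offset. So the mapping itself looks consistent.

```python
SECTOR = 512  # logical sector size reported by fdisk

# From `cryptsetup luksDump` and `cryptsetup -v status`
metadata_area = 16384        # bytes, "Metadata area"
keyslots_area = 16744448     # bytes, "Keyslots area"
data_offset = 16777216       # bytes, data segment "offset"

# From `fdisk -l`
partition_start = 1049600000    # sectors, /dev/nvme0n1p6 "Start"
partition_sectors = 2951196672  # sectors, /dev/nvme0n1p6
mapper_sectors = 2951163904     # sectors, /dev/mapper/lvm

# Assumption: LUKS2 stores a primary and a secondary copy of the metadata
# area, followed by the keyslots area; encrypted data starts after that.
assert 2 * metadata_area + keyslots_area == data_offset

print(data_offset // SECTOR)                  # offset in sectors, as in `cryptsetup status`
print(partition_start * SECTOR % 2**20 == 0)  # is the partition 1 MiB aligned?
print(partition_sectors - data_offset // SECTOR == mapper_sectors)  # mapper = partition - header?
```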

cryptsetup benchmark

# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1      2744963 iterations per second for 256-bit key
PBKDF2-sha256    5197402 iterations per second for 256-bit key
PBKDF2-sha512    2028193 iterations per second for 256-bit key
PBKDF2-ripemd160 1093405 iterations per second for 256-bit key
PBKDF2-whirlpool  846991 iterations per second for 256-bit key
argon2i      10 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id     10 iterations, 1048576 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b      1427,5 MiB/s      5925,7 MiB/s
    serpent-cbc        128b       136,8 MiB/s       997,3 MiB/s
    twofish-cbc        128b       271,9 MiB/s       515,2 MiB/s
        aes-cbc        256b      1094,0 MiB/s      4888,9 MiB/s
    serpent-cbc        256b       141,7 MiB/s       997,9 MiB/s
    twofish-cbc        256b       281,1 MiB/s       514,7 MiB/s
        aes-xts        256b      4782,6 MiB/s      4821,1 MiB/s
    serpent-xts        256b       872,4 MiB/s       886,4 MiB/s
    twofish-xts        256b       475,8 MiB/s       490,4 MiB/s
        aes-xts        512b      4060,4 MiB/s      4112,0 MiB/s
    serpent-xts        512b       898,6 MiB/s       883,8 MiB/s
    twofish-xts        512b       480,9 MiB/s       489,3 MiB/s

cpupower frequency-info

analyzing CPU 5:
  driver: amd_pstate_epp
  CPUs which run at the same hardware frequency: 5
  CPUs which need to have their frequency coordinated by software: 5
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 400 MHz - 4.77 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 400 MHz and 4.77 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.63 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
    Boost States: 0
    Total States: 3
    Pstate-P0:  2700MHz
    Pstate-P1:  1800MHz
    Pstate-P2:  1600MHz

So, given the results of the benchmark, shouldn't my speed on Linux be at least twice as fast as it currently is?
I also noticed when copying the 50GB file that only one CPU thread hits 100%, even though I have a total of 16 threads available.

Did I configure something wrong, or is the impact I am seeing normal and impossible to optimize?
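The numbers in this post can be compared directly. This is a rough sketch with two assumptions. First, "50GB" is taken as 50 * 10^9 bytes. Second, the copy is assumed to have read from and written to the LUKS volume, which the post does not say. That means every byte is decrypted once and encrypted once. Suppose both steps ran back to back on one core. Then the aes-xts 512b figures would cap the copy at about 2,043 MiB/s, and the cipher work alone would take roughly 23 of the 38 seconds. The observed copy, about 1,255 MiB/s, stays below that ceiling. This is only a bound. cryptsetup benchmark measures memory only, and disk IO, the filesystem and the copy tool all add to it.

```python
MIB = 2**20

def mib_per_s(size_bytes: float, seconds: float) -> float:
    """Average throughput in MiB/s, the unit `cryptsetup benchmark` prints."""
    return size_bytes / seconds / MIB

FILE_BYTES = 50e9  # assumption: "50GB" means 50 * 10**9 bytes

linux_copy = mib_per_s(FILE_BYTES, 38)    # Arch Linux, LUKS2
windows_copy = mib_per_s(FILE_BYTES, 18)  # Windows 10, unencrypted

# aes-xts 512b row of `cryptsetup benchmark` (memory only, no storage IO)
ENC_MIB_S, DEC_MIB_S = 4060.4, 4112.0

# Assumption: each byte is decrypted (read) and then encrypted (write) on a
# single core, one after the other, so the per-MiB costs add up.
cipher_ceiling = 1 / (1 / ENC_MIB_S + 1 / DEC_MIB_S)
cipher_seconds = FILE_BYTES / MIB / cipher_ceiling

print(f"Linux copy:     {linux_copy:6.0f} MiB/s")
print(f"Windows copy:   {windows_copy:6.0f} MiB/s")
print(f"cipher ceiling: {cipher_ceiling:6.0f} MiB/s")
print(f"cipher time:    {cipher_seconds:6.1f} s of 38 s")
```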

Last edited by Utini (2023-05-07 16:58:25)


Setup 1: Thinkpad T14s G3, 14" FHD - R7 6850U - 32GB RAM - 2TB Solidigm P44 Pro NVME
Setup 2: Thinkpad X1E G1, 15.6" FHD - i7-8850H - 32GB RAM - NVIDIA GTX 1050Ti - 2x 1TB Samsung 970 Pro NVME
Accessories: Filco Majestouch TKL MX-Brown Mini Otaku, Benq XL2420T (144Hz), Lo(w)gitech G400, Puretrak Talent, Sennheiser HD800S + Meier Daccord FF + Meier Classic FF


#2 2023-05-08 01:10:59

Strike0
Member
From: Germany
Registered: 2011-09-05
Posts: 1,474

Re: LUKS2 Performance impact - This seems wrong?

It'll use more threads when you copy multiple files rather than one big one. Also check this info.


#3 2023-05-08 06:07:52

Utini
Member
Registered: 2015-09-28
Posts: 476
Website

Re: LUKS2 Performance impact - This seems wrong?

Hi, I have already set those parameters; the "cryptsetup -v status lvm" output shows them (flags: no_read_workqueue no_write_workqueue).

@Edit:
Just noticed that removing those two parameters doubles the speed in the benchmark.

Last edited by Utini (2023-05-08 08:22:32)



#4 2023-05-08 10:28:56

Strike0
Member
From: Germany
Registered: 2011-09-05
Posts: 1,474

Re: LUKS2 Performance impact - This seems wrong?

Ah, yes, you already set the parameters. I have no other suggestions, but note that the benchmark is memory only (it does not involve disk IO).


#5 2023-05-08 11:49:52

Utini
Member
Registered: 2015-09-28
Posts: 476
Website

Re: LUKS2 Performance impact - This seems wrong?

The benchmark is memory only?
Can you elaborate on that, please? Isn't it a disk benchmark program?



#6 2023-05-08 12:21:22

seth
Member
Registered: 2012-09-03
Posts: 58,240

Re: LUKS2 Performance impact - This seems wrong?

No, it's an encryption/decryption benchmark.

# Tests are approximate using memory only (no storage IO).

Just noticed that removing those two parameters doubles the speed in the benchmark.

- Copying a 50GB file: 18 seconds / windows (on the LUKS drive?)
- Copying a 50GB file: 38 seconds / linux

(2*18)/38 == 94.74%
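Here is seth's estimate spelled out. It assumes, as seth implicitly does, that the doubling seen in the benchmark would carry over to the file copy. Nobody re-timed the copy without the two flags. Halving 38 s gives 19 s, and 18 / 19 is the same 94.74%.

```python
windows_s = 18.0   # 50GB copy on Windows (which drive it was on is an open question)
linux_s = 38.0     # 50GB copy on Arch with LUKS2
speedup = 2.0      # assumption: the benchmark's doubling applies to the copy as well

linux_projected_s = linux_s / speedup            # projected Linux copy time
relative_speed = windows_s / linux_projected_s   # Linux speed as a fraction of Windows'

# Same quantity as seth's (2*18)/38
assert relative_speed == (2 * 18) / 38
print(f"{relative_speed:.2%}")
```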


#7 2023-05-08 12:30:51

Utini
Member
Registered: 2015-09-28
Posts: 476
Website

Re: LUKS2 Performance impact - This seems wrong?

The copy test wasn't a good way to compare speeds.
I copied using Dolphin, which seems to have a bug that causes slower speeds.
With the "cp" command I can actually transfer 50GB in 8 seconds.

So what can / should I use to compare my NVME speed on Arch (with LUKS / LVM) vs Windows?



#8 2023-05-08 12:35:54

seth
Member
Registered: 2012-09-03
Posts: 58,240

Re: LUKS2 Performance impact - This seems wrong?

Does windows access the LUKS2 drive at all?
What drive are you testing there?


#9 2023-05-08 13:16:01

Utini
Member
Registered: 2015-09-28
Posts: 476
Website

Re: LUKS2 Performance impact - This seems wrong?

It is one drive (Solidigm P44 Pro 2TB).
One of the partitions on it is NTFS Windows 10.

So I would like to test Windows 10 (with no encryption) vs Arch with LUKS2.
The reason behind this is that I would like to figure out the impact on performance that encryption has.
And of course to verify that everything is configured correctly in Arch regarding performance.



#10 2023-05-08 13:41:43

seth
Member
Registered: 2012-09-03
Posts: 58,240

Re: LUKS2 Performance impact - This seems wrong?

So I would like to test Windows 10 (with no encryption) vs Arch with LUKS2.

What's the point of that?

The reason behind this is that I would like to figure out the impact on performance that encryption has.

Then you should measure the performance against linux on the same FS on the same drive, and not introduce a bunch of irrelevant variables (windows + ntfs).


#11 2023-05-08 13:49:10

Utini
Member
Registered: 2015-09-28
Posts: 476
Website

Re: LUKS2 Performance impact - This seems wrong?

The point is that Windows is currently my only reference for how fast the NVME drive can perform.
So my only way to find out whether my NVME drive is configured / working correctly on Linux is by comparing it with Windows?

The fact that I have seen such large differences in the benchmark (7.000 MB/s vs 2.000 MB/s) made me worry that something is wrong with my Linux configuration.

Last edited by Utini (2023-05-08 13:49:50)



#12 2023-05-08 14:06:18

seth
Member
Registered: 2012-09-03
Posts: 58,240

Re: LUKS2 Performance impact - This seems wrong?

If you want to benchmark the filesystem, you need to eliminate all variables. A GUI benchmark, probably in a KDE session while perhaps baloo bangs the drive and some plasma gadgets hog the CPU, is hardly relevant (as you already figured, cp outperformed your windows copy by a mile; it probably also hadn't synced the copy, though).
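The cp result can be sanity-checked the same way. This rough sketch assumes "50GB" means 50 * 10^9 bytes and that source and destination both sit on this drive. Finishing in 8 s would mean reading 6,250 MB/s and writing 6,250 MB/s at the same time. That is 12,500 MB/s of combined traffic, well beyond the 7.000 MB/s quoted for the drive in this thread. So the 8 s cannot reflect 50 GB actually read from and written to the disk. An unsynced copy, as suggested here, is one way to get such a number.

```python
FILE_BYTES = 50e9     # assumption: "50GB" means 50 * 10**9 bytes
CP_SECONDS = 8.0      # reported time for `cp`
DRIVE_MB_S = 7000.0   # drive figure quoted in the thread (decimal MB/s)

per_direction = FILE_BYTES / CP_SECONDS / 1e6  # MB/s read, and the same MB/s written
combined = 2 * per_direction                   # one drive has to serve both streams

print(f"read {per_direction:.0f} MB/s + write {per_direction:.0f} MB/s "
      f"= {combined:.0f} MB/s, {combined / DRIVE_MB_S:.1f}x the quoted figure")
```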


Boot the rescue.target and measure the performance w/ dd, https://wiki.archlinux.org/title/Benchmarking#dd
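A dd write test of the kind linked here, for example dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc, boils down to the loop below. This is a minimal Python sketch for illustration, not a replacement for dd. It writes zero-filled 1 MiB blocks without truncating the file. It calls fdatasync before stopping the clock, so the figure includes flushing to the device rather than just page-cache speed. It reports decimal MB/s like dd does. The function name write_benchmark is hypothetical, and os.fdatasync is Unix-only.

```python
import os
import time

def write_benchmark(path: str, block_size: int = 1 << 20, count: int = 1024) -> float:
    """Write `count` zero-filled blocks to `path` and return decimal MB/s.

    Roughly mirrors `dd if=/dev/zero of=path bs=1M count=1024 conv=fdatasync,notrunc`:
    the file is opened without truncation, and timing stops only after
    os.fdatasync(), so buffered-but-unwritten data is not counted as done.
    """
    block = bytes(block_size)  # zero-filled, like reading from /dev/zero
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)  # no O_TRUNC, like notrunc
    try:
        start = time.perf_counter()
        for _ in range(count):
            view = memoryview(block)
            while view:  # os.write() may accept fewer bytes than offered
                view = view[os.write(fd, view):]
        os.fdatasync(fd)  # flush file data to the device before stopping the clock
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return block_size * count / elapsed / 1e6
```

Called as write_benchmark("/home/user/Downloads/tempfile"), it performs the same 1 GiB write as the dd command used later in this thread. Run it from rescue.target, as suggested here, if you want numbers with fewer background variables.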

No idea how to get a clean benchmark on windows, though.
You could test dd on the ntfs partition (which carries some overhead, and apparently there's a write-preventing bug in ntfs3, while ntfs-3g is userspace) or on some vfat partition (with expectably less overhead for the FS) to ***somewhat*** ballpark the condition.


#13 2023-05-08 14:15:32

Utini
Member
Registered: 2015-09-28
Posts: 476
Website

Re: LUKS2 Performance impact - This seems wrong?

Hmm I would argue that the running session is part of what I want to verify in my test.
But I guess this should be done step by step.
Benchmark "naked" system, next benchmark user session.

Out of interest I just tried the dd benchmark you linked on my currently logged in session.

dd if=/dev/zero of=/home/sneida/Downloads/tempfile bs=1M count=1024 conv=fdatasync,notrunc status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 0,599712 s, 1,8 GB/s

1,8 GB/s again seems pretty slow for a drive that should do 7.000 MB/s.
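Part of the confusion in this thread comes from units. dd prints decimal GB/s, and cryptsetup benchmark prints binary MiB/s. The 7.000 MB/s drive figure is presumably decimal, as drive specifications usually are. The sketch below converts the dd run above using only the numbers dd printed. It gives about 1.8 GB/s, or roughly 1,707 MiB/s. For comparison, the quoted drive figure is about 6,676 MiB/s. The in-memory aes-xts 512b encryption figure of 4060.4 MiB/s is about 4.26 GB/s.

```python
BYTES = 1073741824   # "1073741824 bytes (1,1 GB, 1,0 GiB) copied"
SECONDS = 0.599712   # "... copied, 0,599712 s"

rate = BYTES / SECONDS  # bytes per second

print(f"{rate / 1e9:.1f} GB/s")            # decimal, as dd prints it
print(f"{rate / 2**20:.0f} MiB/s")         # binary, as cryptsetup benchmark prints it
print(f"{7000e6 / 2**20:.0f} MiB/s")       # the quoted 7.000 MB/s drive figure
print(f"{4060.4 * 2**20 / 1e9:.2f} GB/s")  # aes-xts 512b encryption, memory only
```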

Additionally I encountered this error:

sudo echo 3 > /proc/sys/vm/drop_caches
zsh: permission denied: /proc/sys/vm/drop_caches


#14 2023-05-08 14:19:11

Utini
Member
Registered: 2015-09-28
Posts: 476
Website

Re: LUKS2 Performance impact - This seems wrong?

seth wrote:

No, it's an d/encryption benchmark.

# Tests are approximate using memory only (no storage IO).

Wait, we are talking about different benchmark tools.
I wasn't talking about the cryptsetup benchmark but CrystalDiskMark vs KDiskMark:

CrystalDiskMark benchmark: https://imgur.com/a/1okVrpY
KDiskMark benchmark: https://imgur.com/a/x64VtqZ

Those two should be comparable?!



#15 2023-05-08 14:22:51

seth
Member
Registered: 2012-09-03
Posts: 58,240

Re: LUKS2 Performance impact - This seems wrong?

Those two should be comparable?!

No?
Different benchmarks, different read/write patterns, different FS, different OS, different co-processes.
That's phoronix-grade "benchmarking". Fun but useless.

zsh: permission denied: /proc/sys/vm/drop_caches

Which process was sudo'd and which process tries to write the file?
Why does

echo 3 | sudo tee /proc/sys/vm/drop_caches

work?


#16 2023-05-08 14:32:16

Utini
Member
Registered: 2015-09-28
Posts: 476
Website

Re: LUKS2 Performance impact - This seems wrong?

In this comparison it will always be different FS, different OS and different co-processes.
Forgive me, but this is part of what I am trying to compare (the performance impact of those differences).
I also set the same benchmark test profile in both programs (Type, Block Size, Queues, Threads).

Anyway, back to dd:

dd if=/dev/zero of=/home/user/Downloads/tempfile bs=1M count=1024 conv=fdatasync,notrunc status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 0,636466 s, 1,7 GB/s

echo 3 | sudo tee /proc/sys/vm/drop_caches                  
3

dd if=tempfile of=/dev/null bs=1M count=1024 status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 0,541827 s, 2,0 GB/s

dd if=tempfile of=/dev/null bs=1M count=1024 status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 0,125088 s, 8,6 GB/s

That is terribly slow or am I not understanding the results correctly?
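The two read results differ mainly because caches were dropped only before the first read. This sketch uses the dd figures above. The repeated read comes out at about 8,186 MiB/s. That is roughly double the 4,112 MiB/s in-memory aes-xts 512b decryption figure from cryptsetup benchmark, which fits data served from the page cache rather than decrypted from disk. The first read, about 1,890 MiB/s, is the meaningful one. Treating the cryptsetup figure as a ceiling is an approximation, since it is an approximate, memory-only number.

```python
BYTES = 1073741824              # 1 GiB read by each dd run
AES_XTS_512_DEC_MIB_S = 4112.0  # `cryptsetup benchmark`, memory only

def read_rates(size_bytes: int, timings: dict) -> dict:
    """Convert dd's elapsed seconds into MiB/s for each labelled run."""
    return {label: size_bytes / seconds / 2**20 for label, seconds in timings.items()}

rates = read_rates(BYTES, {
    "after drop_caches": 0.541827,  # first dd read
    "repeated read": 0.125088,      # second dd read, caches not dropped again
})

for label, mib_s in rates.items():
    if mib_s > AES_XTS_512_DEC_MIB_S:
        note = "above the in-memory decryption figure (likely page cache)"
    else:
        note = "within what dm-crypt could plausibly decrypt"
    print(f"{label:18} {mib_s:7.0f} MiB/s  {note}")
```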



#17 2023-05-08 14:36:55

seth
Member
Registered: 2012-09-03
Posts: 58,240

Re: LUKS2 Performance impact - This seems wrong?

You wrote:

Forgive me, but this is part of what I am trying to compare

You wrote earlier:

The reason behind this is that I would like to figure out the impact on performance that encryption has.

Random uncontrolled benchmarks are *completely* pointless: just try to run two in parallel and O!M!G! the results were just cut in half.
But have fun, it's your time.


#18 2023-05-08 14:45:58

Utini
Member
Registered: 2015-09-28
Posts: 476
Website

Re: LUKS2 Performance impact - This seems wrong?

Well then my main question still remains.

How to run a "controlled" benchmark on my Arch setup vs my Windows setup.
This would measure the performance impact with all the differences provided between those two setups (OS, Filesystem, encryption,...).
But it needs a tool that runs the same on both OS.

Does such a tool even exist?



#19 2023-05-09 13:39:22

seth
Member
Registered: 2012-09-03
Posts: 58,240

Re: LUKS2 Performance impact - This seems wrong?

You're missing the point.

measure the performance impact with all the differences provided between those two setups (OS, Filesystem, encryption,...)

contradicts "controlled benchmark"

You're benchmarking phoronix-style.
You run something with some parameters on some system and in some situation. That gets you some result. And a different one next time. And then some more.

If you want to know what impact the encryption has you isolate that.
If you want to know what impact the filesystem has, you isolate that (spoiler: ext4 is gonna perform better on linux, ntfs is gonna perform better on windows)
If you want to know what impact the sideload from some fat desktop environment's file indexer has, you isolate that.
If you want to know what impact the IO scheduler has, you isolate that.
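For the encryption case, isolating the variable could look like the following sketch. It is an illustration, not seth's procedure. Run the same benchmark several times on a plain and a LUKS-encrypted ext4 volume on the same drive, booted into rescue.target for both, and compare the results. The function name encryption_overhead and the choice of medians are assumptions. Any consistent statistic works as long as only the encryption differs between the two sets of runs.

```python
from statistics import median

def encryption_overhead(plain_mb_s: list[float], encrypted_mb_s: list[float]) -> float:
    """Fractional slowdown of the encrypted volume versus the plain one.

    Both lists should hold repeated runs of the *same* benchmark, with the
    same filesystem, kernel and (ideally idle) session, so that the dm-crypt
    layer is the only difference. Medians damp one-off outliers such as a
    file indexer waking up during a single run.
    """
    if not plain_mb_s or not encrypted_mb_s:
        raise ValueError("need at least one run for each configuration")
    return 1.0 - median(encrypted_mb_s) / median(plain_mb_s)
```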

If you want to know "what is the IO throughput on a tuesday afternoon in spring 2023 while I'm running plasma and maybe the file indexer is running for some part of the test or maybe not", you test that.
But that is *NEVER* gonna be comparable to "what is the IO throughput on a wednesday evening in summer 2023 while I'm running windows and maybe the drive is defragmented at the same time or maybe not" in any way, shape or form.

These "benchmarks" are utterly pointless.
