#1 2020-01-04 09:24:33

gardotd426
Member
Registered: 2019-10-19
Posts: 48

Trying to figure out NVME sector size/performance

I built my first PC a few months ago. I installed Manjaro on a 240GB SATA SSD, which also holds a couple of games, and used a few HDDs for the rest of my games and /home. However, I generally prefer to run more than one Linux installation at a time, and Arch is my other main distro of choice. So when I got a 120GB HP EX900 NVME SSD last month, I installed it, followed the formatting instructions on the Arch Wiki and the other sources I could find, and installed Arch on it. However, I've noticed that I/O performance seems to be rather poor. Synthetic benchmarks such as

# hdparm -Tt --direct /dev/nvme0n1

show 1-1.5GB/s cached reads and disk reads, but when I actually try and launch anything, or do any real-world tasks that depend on read/write speed, it runs quite a bit slower than my budget-level Crucial BX500 SATA SSD. Programs take multiple seconds to launch, some even 8-10 seconds. Even a terminal can take up to 10 seconds to launch. And this is not down to CPU or RAM, as I have a Ryzen 5 2600X and 16GB of 3000MHz Trident Vulcan-Z memory (2x8 in dual channel w/XMP). And my CPU actually seems to be one of the better binned 2600x's, at least according to many phoronix and geekbench runs.

From what little I know about NVME drives, I know that poor performance is usually due to either throttling from high temperatures or misaligned sector-size. From what I've read on the Arch Wiki (the Wiki's pages on NVME and Advanced Format are actually shockingly short on actual information relative to the greatness the Wiki usually provides), and from everything else I've been able to suss out, the actual PHYSICAL block size for NVMEs is supposed to be 4096, which usually gets emulated as 512 logical size for legacy compatibility. However, EVERY single tool I've used to report on block size has reported both a logical AND physical block size of 512, except for one. Every post I've read from people having similar issues is usually full of people using parted or smartctl or any myriad of commands to get the true size, and it always seems to work for them, however for me they all still report 512 for logical AND physical. The only outlier is stat:

 # stat -f /dev/nvme0n1
File: "/dev/nvme0n1"
    ID: 0        Namelen: 255     Type: tmpfs
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 2038644    Free: 2038644    Available: 2038644
Inodes: Total: 2038644    Free: 2037915

People in these aforementioned forums have said that when they encounter similar issues, parted still reports the correct size for them:

# parted -l 
Model: HP SSD EX900 120GB (nvme)
Disk /dev/nvme0n1: 120GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name        Flags
 1      1049kB  962MB   960MB   fat32                    boot, esp
 2      962MB   42.4GB  41.4GB  ext4         Arch Linux
 3      42.4GB  120GB   77.6GB  ext4         NVMEGAMES

cat:

 # cat /sys/block/nv*/queue/hw_sector_size
512
 # cat /sys/block/nv*/queue/physical_block_size
512
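
The two sysfs attributes above can be read per device in one go. This is a minimal sketch, assuming the usual `queue/logical_block_size` and `queue/physical_block_size` files; the function name and its optional second argument (an alternative sysfs root) are made up for illustration.

```shell
# Print logical and physical block size for one block device.
# $1 = device name (e.g. nvme0n1)
# $2 = sysfs root, hypothetical override so the function can be pointed
#      at a fake directory tree (default: /sys/block)
block_sizes() {
    q="${2:-/sys/block}/$1/queue"
    if [ ! -r "$q/logical_block_size" ] || [ ! -r "$q/physical_block_size" ]; then
        echo "$1: no queue attributes under $q" >&2
        return 1
    fi
    printf '%s logical=%s physical=%s\n' "$1" \
        "$(cat "$q/logical_block_size")" \
        "$(cat "$q/physical_block_size")"
}
```

Calling `block_sizes nvme0n1` only repeats what the kernel was told by the drive, so it cannot show more than parted does. If nvme-cli is installed, `nvme id-ns -H /dev/nvme0n1` should additionally list the LBA formats the namespace offers and which one is in use.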

nvme-cli

 # nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev  
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     HBSE19301500618      HP SSD EX900 120GB                       1         120.03  GB / 120.03  GB    512   B +  0 B   S0614B0 

I followed formatting instructions when I first installed the drive, but it still seems like I'm actually on 512 instead of 4096. Every tool linux has except stat (which I doubt the accuracy of) says both logical and physical are 512 (and tools like parted report 512/4096 for a couple of my HDDs, so it's not like it can't tell), and like I said I can't think of anything else that would explain this absolutely awful performance. Even with the cheapest nvme in the world (the HP EX900 actually has moderately okay reviews, it's supposedly a perfectly fine drive), it should NEVER take 10 seconds just to open a terminal. And this is a regularly occurring thing, not isolated. And like I said, it's not the RAM or the CPU, especially considering the fact that my Manjaro partition on a much worse-spec'ed SSD doesn't have the same issue. I'm not using some tank of a DE either, I mainly use i3 but also use Plasma when I need to run games that have issues with i3. Manjaro I use GNOME and i3, so again, everything else is equal except the actual OS drives, and the Arch one should absolutely outperform the Manjaro one, but it doesn't. I use the same kernels for both, 5.4.7-arch and 5.4.7-MANJARO, along with the same 5.5-tkg and 5.4.3-fsync kernels. Temperatures are 30 C below any warning level, so definitely not that.

$ inxi -FDxxz
System:    Host: archlinux Kernel: 5.4.3-arch1-1-fsync x86_64 bits: 64 compiler: gcc v: 9.2.0 
           Desktop: KDE Plasma 5.17.4 tk: Qt 5.14.0 wm: kwin_x11 dm: LightDM, SDDM 
           Distro: Arch Linux 
Machine:   Type: Desktop Mobo: ASRock model: B450M/ac serial: <filter> UEFI: American Megatrends 
           v: P1.20 date: 08/01/2019 
CPU:       Topology: 6-Core model: AMD Ryzen 5 2600X bits: 64 type: MT MCP arch: Zen+ rev: 2 
           L2 cache: 3072 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 96029 
           Speed: 3942 MHz min/max: 2200/4000 MHz Core speeds (MHz): 1: 3865 2: 1981 3: 2111 
           4: 2193 5: 2018 6: 2290 7: 2080 8: 2198 9: 2986 10: 1990 11: 1964 12: 2211 
Graphics:  Device-1: AMD Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] vendor: XFX Pine 
           driver: amdgpu v: kernel bus ID: 07:00.0 chip ID: 1002:67df 
           Display: x11 server: X.Org 1.20.6 driver: amdgpu compositor: kwin_x11 
           resolution: 1366x768~60Hz, 1920x1080~60Hz 
           OpenGL: 
           renderer: Radeon RX 580 Series (POLARIS10 DRM 3.35.0 5.4.3-arch1-1-fsync LLVM 9.0.0) 
           v: 4.5 Mesa 20.0.0-devel (git-3409c06e26) direct render: Yes 
Audio:     Device-1: AMD Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] vendor: XFX Pine 
           driver: snd_hda_intel v: kernel bus ID: 07:00.1 chip ID: 1002:aaf0 
           Device-2: AMD Family 17h HD Audio vendor: ASRock driver: snd_hda_intel v: kernel 
           bus ID: 09:00.3 chip ID: 1022:1457 
           Sound Server: ALSA v: k5.4.3-arch1-1-fsync 
Network:   Device-1: Intel Dual Band Wireless-AC 3168NGW [Stone Peak] driver: iwlwifi v: kernel 
           bus ID: 04:00.0 chip ID: 8086:24fb 
           IF: wlp4s0 state: up mac: <filter> 
           Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: ASRock 
           driver: r8169 v: kernel port: f000 bus ID: 05:00.0 chip ID: 10ec:8168 
           IF: enp5s0 state: down mac: <filter> 
Drives:    Local Storage: total: 1.91 TiB used: 752.03 GiB (38.5%) 
           ID-1: /dev/nvme0n1 vendor: HP model: SSD EX900 120GB size: 111.79 GiB speed: 31.6 Gb/s 
           lanes: 4 serial: <filter> temp: 45 C 
           ID-2: /dev/sda vendor: Seagate model: ST500LT012-1DG142 size: 465.76 GiB speed: 6.0 Gb/s 
           serial: <filter> temp: 43 C 
           ID-3: /dev/sdb type: USB model: SABRENT SABRENT size: 74.53 GiB serial: <filter> 
           ID-4: /dev/sdc vendor: Toshiba model: MK6475GSX size: 596.17 GiB speed: 3.0 Gb/s 
           serial: <filter> temp: 42 C 
           ID-5: /dev/sdd vendor: HGST (Hitachi) model: HTS545050A7E380 size: 465.76 GiB 
           speed: 3.0 Gb/s serial: <filter> temp: 39 C 
           ID-6: /dev/sde type: USB model: General USB Flash Disk size: 14.55 GiB serial: <filter> 
           ID-7: /dev/sdf vendor: Crucial model: CT240BX500SSD1 size: 223.57 GiB speed: 6.0 Gb/s 
           serial: <filter> temp: 49 C 
Partition: ID-1: / size: 37.73 GiB used: 30.55 GiB (81.0%) fs: ext4 dev: /dev/nvme0n1p2 
           ID-2: /home size: 96.69 GiB used: 64.62 GiB (66.8%) fs: ext4 dev: /dev/sda1 
           ID-3: swap-1 size: 43.87 GiB used: 107.3 MiB (0.2%) fs: swap dev: /dev/sdc5 
Sensors:   System Temperatures: cpu: 62.1 C mobo: 34.0 C gpu: amdgpu temp: 44 C 
           Fan Speeds (RPM): fan-1: 1773 fan-2: 2683 fan-3: 0 fan-4: 0 fan-5: 0 fan-6: 0 fan-7: 0 
           gpu: amdgpu fan: 3187 
Info:      Processes: 361 Uptime: 5h 12m Memory: 15.57 GiB used: 5.04 GiB (32.4%) Init: systemd 
           v: 244 Compilers: gcc: 9.2.0 clang: 9.0.1 Shell: zsh v: 5.7.1 running in: gnome-terminal 
           inxi: 3.0.37 

#2 2020-01-04 14:03:43

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 24,812

strace a "slow terminal". There are various other factors that could lead to a slow down, why do you think that this is related to your disk I/O? Things that often lead to a delay are a broken session, and/or socket issues. Also since manjaro does a whole lot of "behind the scene" settings, double check that stuff is really on the same setting, e.g. does Manjaro define a non-standard IO scheduler? Does it change CPU scheduler/governor settings?

#3 2020-01-04 16:11:28

gardotd426
Member
Registered: 2019-10-19
Posts: 48

V1del wrote:

strace a "slow terminal". There are various other factors that could lead to a slow down, why do you think that this is related to your disk I/O? Things that often lead to a delay are a broken session, and/or socket issues. Also since manjaro does a whole lot of "behind the scene" settings, double check that stuff is really on the same setting, e.g. does Manjaro define a non-standard IO scheduler? Does it change CPU scheduler/governor settings?

Stracing seems pretty useless here, and even if Manjaro did define a non-standard scheduler, I'm not sure why that would have much to do with why Arch is SO slow. Maybe I explained it wrong? It's not Manjaro that has the issue, it's Arch. Arch is on the NVME, and it's insanely slow to open any program. But cpu and RAM benchmarks do extremely well (Like, as in the best 2600X score on Geekbench and in smallpt on Phoronix). So that's why I know it's not memory or processor-related, but either way, I have both Manjaro and Arch set up to always use the performance CPU governor. And it's not terminals getting all slow once they're open (which is the only thing strace could track, from my limited understanding). It's terminals taking 10 seconds to launch. Everything takes forever to launch on Arch, it seems like anything that has to call /usr or anything else on the root partition (which is the NVME) takes an inordinately long period of time.

So if the CPU governor/settings are the exact same, and the RAM settings are the same, and the slowness exists on the NVME but not the SATA SSD, that's what leads me to believe it's something to do with the NVME formatting, since I've seen myriad sources say that aligning NVMEs to 512 can cause them to be extremely slow. And I can't seem to figure out how to get it to actually align to 4096, or find out definitively if that alignment actually took/succeeded.

As far as the I/O scheduler, this actually seems like it may be part of my problem. I looked into it, and ran:

sudo cat /sys/block/nvme0n1/queue/scheduler

and for some weird reason, the system has literally no scheduler set for the nvme, but it has mq-deadline set for every other drive (including the SATA SSD, which I also have mounted in Arch. Not the Manjaro partition, but my Steam Library partition). But for the nvme, the scheduler file in /sys shows the available schedulers, but the one in brackets (and therefore the "selected" one) is literally "[none]". This confuses the hell out of me, since I've never seen anything in the Wiki, installation guide, or getting started guide that says you need to define an I/O scheduler for your root partition in Arch. I know I can select a different scheduler by simply editing the file, but why is it currently set to none? Also, I would like to know what's going on with the sector alignment from the OP, even if that's not why everything is so slow. And this isn't really a "slow down," it's a constant. Even though by all indications the CPU and RAM are totally kickin'. I get solid fps in demanding games, I get outstanding cpu and ram benchmarks, etc.
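
For comparing drives side by side, the bracketed entry can be pulled out of each scheduler file. A small sketch; the helper name `active_scheduler` is invented here, and the loop simply skips anything it cannot read.

```shell
# Extract the active scheduler (the entry in square brackets) from the
# contents of a queue/scheduler file supplied on stdin.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Print "device: scheduler" for every block device that exposes the file.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(active_scheduler < "$f")"
done

# Switching until the next reboot is done by writing a name into the
# file as root, for example:
#   echo bfq > /sys/block/nvme0n1/queue/scheduler
```

No root is needed just to read the file, so the `sudo` in front of `cat` is optional.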

#4 2020-01-04 16:37:14

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 24,812

Again, why are you sure it's I/O ? You mention that benchmarks give you the intended speed, are there file system differences? Have you tried that same Arch install on that SATA SSD drive (and vice versa for Manjaro) ?

Stracing a program would give us information on where it actually hangs, also double check what actually happens I/O wise with something like iotop instead of wild guesses.
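
If iotop is not at hand, a rough alternative is to compare the kernel's per-device counters before and after a launch. This is only a sketch, not what iotop does internally: it assumes the standard `/proc/diskstats` layout (field 3 is the device name, field 6 sectors read, field 10 sectors written), and both function names are invented for illustration.

```shell
# Print "sectors_read sectors_written" for one device.
# $1 = device name, $2 = diskstats-formatted file (default /proc/diskstats)
disk_sectors() {
    awk -v dev="$1" '$3 == dev { print $6, $10 }' "${2:-/proc/diskstats}"
}

# Run a command and print how much was read from / written to a device
# while it ran. Sectors in /proc/diskstats are always 512 bytes, so
# sectors / 2 = KiB. This counts I/O from ALL processes on the device.
io_during() {
    dev=$1; shift
    before=$(disk_sectors "$dev")
    [ -n "$before" ] || { echo "unknown device: $dev" >&2; return 1; }
    "$@"
    after=$(disk_sectors "$dev")
    set -- $before $after
    echo "read_KiB=$(( ($3 - $1) / 2 )) written_KiB=$(( ($4 - $2) / 2 ))"
}
```

Something like `io_during nvme0n1 some-slow-program` (placeholder name) would then show whether a slow start actually coincides with disk reads. The command has to stay in the foreground until it finishes for the numbers to mean anything, and a second run may read almost nothing because the files are already in the page cache.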

You should not be confused, nvme drives do indeed by default not use a kernel level scheduler (because nvme drives are designed to be able to handle a couple million concurrent requests and often have better firmware internal schedulers). But all of this assumes the defaults, and I do not know enough of Manjaro to know in what way they deviate from the upstream defaults.

Your stat command doesn't concern itself with the actual drive and is completely irrelevant here. As executed, it shows you the "block size" of the /dev file system, which is a virtual, non-disk, kernel-populated filesystem. That filesystems generally work with a 4k block size is normal and independent of your partition sector size.
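
The difference can be seen by asking stat which filesystem a path lives on. A short sketch assuming GNU stat; `containing_fs` is a made-up helper name.

```shell
# Print the type of the filesystem that holds a path. For a device node
# this is the filesystem /dev itself lives on, not anything on the disk.
containing_fs() {
    stat -f -c '%T' "$1"
}

# Any node under /dev gives the same answer as /dev/nvme0n1 would
# (the stat output in the first post shows tmpfs).
containing_fs /dev/null

# To query the block device itself (needs root, blockdev is part of
# util-linux):
#   blockdev --getss   /dev/nvme0n1   # logical sector size
#   blockdev --getpbsz /dev/nvme0n1   # physical block size
```

The blockdev values come from the same kernel data as the sysfs files, so on this drive they would be expected to agree with the 512 reported by the other tools.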

That the actual partitioning tools show you 512 bytes is also generally normal.

You're making a lot of assumptions here that would be better verified with actual facts and actionable data. So check iotop, check/post an strace, and think about/check other system-level differences between the two systems.

FWIW I also run an nvme drive, and at least on my end everything seems to run well within my expectations.

Last edited by V1del (2020-01-04 16:38:32)

#5 2020-01-04 18:57:03

gardotd426
Member
Registered: 2019-10-19
Posts: 48

Like I said, I don't know what else it could be. It seems you want me to track down any differences between the Manjaro and Arch installs, so here are the side-by-sides:

Filesystem: Ext4 for both.

Kernel: Manjaro: 5.4.7-MANJARO, 5.5-tkg-pds, 5.4.3-fsync    Arch: 5.4.7-arch, 5.5-tkg-pds, 5.4.3-fsync

I/O scheduler: Manjaro: bfq    Arch: None (but as you said, this is to be expected)

CPU governor: Performance for both.

I don't know what other system-level things I could check for differences, but as you can see they are pretty much exactly the same. I generally don't even run the 5.4.7-MANJARO kernel, I usually stick to fsync and tkg, which is the literal exact same kernel I'm using on Arch, so there are no differences whatsoever there.

My confusion over strace is how I could even use it here. From my understanding, strace REQUIRES the program to already be open. So how could I possibly strace something before it opens? If there's a way, I can't find it, but then again I've never used strace before. But it seems like a pid is required. And like I said, once the programs open, they're not entirely slow, it's mainly when launching anything. Right now I'm in the zen kernel, which has set bfq as my scheduler, and it actually seems like that's largely sped things up. I have timeshift snapshots of both systems, so I suppose I could swap Manjaro and Arch and see if the problems persist, since I can easily go back to where I'm at now when done. I'll reboot after I make dinner and boot into the vanilla kernel and run iotop and see what the reads are like. But that reminds me of another thing: I was rsync-ing about 19GB from one of my HDDs to the NVME earlier and it took about 30 minutes, which is outrageously longer than it should take. I've restored snapshots of the same size from the same HDD onto the SSD multiple times and it takes less than 5 minutes. But that could potentially be the HDD's read speed's fault I suppose, and either way I imagine that has nothing to do with the slow launch issue since that's read vs write.

But like I said, I'll run iotop and do some tests here in a few, but if there's any other comparative aspects I could track down between Manjaro and Arch aside from what I've already given please let me know and I'll get them. The thing is, I actually prefer Arch to Manjaro most of the time, and now that I've got Arch fully configured and set up with everything I set up an install with, I'd prefer to have Arch as the main driver and Manjaro as my backup (which is why I put Arch on the NVME), but this 10 seconds-2 minutes to launch programs is way too much. If I have to, I'll just use the zen kernel, but I also want to actually learn as much as I can, instead of finding fixes that don't expose the actual problem OR educate me more on how my system works on a deeper level.

#6 2020-01-04 19:38:01

Head_on_a_Stick
Member
From: The Wirral
Registered: 2014-02-20
Posts: 8,999

Have you checked the journal for any relevant error messages?

FWIW parted reports my M.2 NVMe drive as having 512/512 sector size but the subjective performance is nothing short of stellar (Debian stable).


Jin, Jîyan, Azadî

#7 2020-01-04 20:07:17

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 24,812

You can use strace to invoke a program (i.e. no it doesn't have to be running, strace will attach to it immediately on start up), you will get all the calls that happen (in the majority of cases you can literally just

strace $PROGRAM

). If you stall on a file, that should become pretty apparent; if you stall on something else, hopefully that will too. It does sound much more like some dbus timeout or so to me.

Another thing to try is disabling or limiting the nvme power saving that kicks in: https://wiki.archlinux.org/index.php/So … Linux_4.10 (ignore the Samsung part, there exist other buggy nvme implementations)
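
Assuming the power saving meant here is NVMe APST (an assumption, since the link above is truncated), the current limit can be read from the `nvme_core` module parameter. A minimal sketch; the function name and its optional file argument are invented for illustration.

```shell
# Print the current APST latency limit in microseconds, or "unknown" if
# the parameter file is absent (for example when nvme_core is not loaded).
# $1 = parameter file, hypothetical override so the function can be
#      pointed at a different file.
apst_max_latency() {
    f=${1:-/sys/module/nvme_core/parameters/default_ps_max_latency_us}
    if [ -r "$f" ]; then cat "$f"; else echo unknown; fi
}

apst_max_latency

# To switch APST off for a test boot, the kernel command line would get:
#   nvme_core.default_ps_max_latency_us=0
```

If launches are no faster with the parameter set to 0, power-state transitions are probably not the cause and the parameter can be removed again.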

As for other differences, look through your sysctl directories, e.g. /etc/sysctl.d and/or /usr/lib/sysctl.d
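
One way to make that comparison less tedious is to flatten the drop-in files into a sorted list on each system and diff the two results. A sketch only: `sysctl_settings` is a made-up name, and it just lists what the files say without resolving which file wins when a key is set twice.

```shell
# Print every active "key=value" line from the *.conf files in the given
# directories, followed by a tab and the file it came from, sorted by key.
sysctl_settings() {
    for d in "$@"; do
        for f in "$d"/*.conf; do
            [ -r "$f" ] || continue
            awk -v file="$f" '
                /^[ \t]*([#;]|$)/ { next }          # comments, blank lines
                {
                    gsub(/^[ \t]+|[ \t]+$/, "")     # trim the line
                    sub(/[ \t]*=[ \t]*/, "=")       # normalise "key = value"
                    print $0 "\t" file
                }' "$f"
        done
    done | LC_ALL=C sort
}

sysctl_settings /etc/sysctl.d /usr/lib/sysctl.d
```

Redirecting the output to a file on Manjaro and on Arch and running `diff` on the two files should show any setting that exists on only one side.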

Last edited by V1del (2020-01-04 20:10:36)

#8 2020-01-04 20:51:48

merlock
Member
Registered: 2018-10-30
Posts: 262

gardotd426 wrote:

System:    Host: archlinux Kernel: 5.4.3-arch1-1-fsync

Maybe your problem lies in the AUR kernel you're using (flagged out-of-date on 2019-12-19).

Oh, and what I've gone by for a few years now is that 'none' is the correct scheduler for NVME drives.  I seem to remember reading somewhere that you don't really want to put some kind of I/O scheduler on the PCIe bus.

Last edited by merlock (2020-01-04 20:52:48)


Eenie meenie, chili beanie, the spirits are about to speak -- Bullwinkle J. Moose
It's a big club...and you ain't in it -- George Carlin
Registered Linux user #149839
perl -e 'print$i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10); '

#9 2020-01-04 21:16:52

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 69,415

My confusion over strace is how I could even use it here. From my understanding, strace REQUIRES the program to already be open.

Programs™ do not "open". The elf loader loads the binary, that's probably already in RAM anyway (certainly on the second run) and then program™ may or may not perform multiple disk reads/writes before it doesstuff™

If you "strace -t program™" it'll also print the wallclock time with each system call, and "strace -r program™" will print a relative timestamp, telling you how long any system call takes.
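
Once a trace is on disk, the slow spots can be pulled out mechanically. The sketch below uses `-T`, a different strace flag from the two mentioned above: it appends the time spent inside each call as `<seconds>` at the end of the line. The function name and the log path are invented for illustration.

```shell
# Read an strace log produced with -T and print the N slowest system
# calls (default 10), slowest first. Each output line is the duration,
# a tab, and the original log line. Lines without a trailing
# "<seconds>" (unfinished calls, exit lines) are ignored.
slowest_calls() {
    awk 'match($0, /<[0-9]+\.[0-9]+>$/) {
             print substr($0, RSTART + 1, RLENGTH - 2) "\t" $0
         }' "$1" | LC_ALL=C sort -rn | head -n "${2:-10}"
}

# Typical use (PROGRAM is a placeholder for whatever launches slowly):
#   strace -f -T -o /tmp/trace.log PROGRAM
#   slowest_calls /tmp/trace.log 20
```

Here `-f` follows child processes and `-o` writes the trace to a file instead of the terminal. A call that blocks for seconds on a socket or a poll would point away from the disk, while slow `openat` or `read` calls on files under the root partition would point towards it.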

Sidebar: if by "open program" you mean "some window pops up on my screen", there can be a whole host of other stuff impeding this. Starting with the display server (wayland/X11) over dbus and ending with a compositor being slow to update the output.

#10 2022-08-15 20:29:47

kristianlm
Member
Registered: 2013-04-24
Posts: 5

I know I'm late for this, and this may not be relevant, but I believe I experienced a similar problem a while back when I did a dd if=/dev/sda of=/dev/sdb. My Arch OS was very slow on /dev/sdb afterwards, even though /dev/sda ran fine. Any disk write would be very slow.

It turns out HDDs and SSDs don't work the same way, and I wasn't aware of this. An SSD does a lot of work behind the scenes and needs to keep a list of "unused" blocks. I finally stumbled upon a solution and ran `fstrim /` or something similar. This will inform the block driver which blocks are not in use by the file system, and this speeds writes up significantly. Since I used dd, no blocks were readily available. At least that's my vague intuition on how this works.
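
Whether trimming is possible at all can be checked first. A minimal sketch, assuming the `queue/discard_max_bytes` sysfs attribute, where 0 means the kernel will not send discards to the device; the function name and its optional sysfs-root argument are made up for illustration.

```shell
# Succeed if a block device advertises discard (TRIM) support.
# $1 = device name, $2 = sysfs root (hypothetical override, default /sys/block)
supports_discard() {
    f="${2:-/sys/block}/$1/queue/discard_max_bytes"
    [ -r "$f" ] && [ "$(cat "$f")" -gt 0 ]
}

if supports_discard nvme0n1; then
    echo "nvme0n1 supports discard; 'fstrim -v /' (as root) would trim the root filesystem"
else
    echo "nvme0n1: no discard support reported (or no such device)"
fi
```

On systemd-based installs, util-linux also ships an `fstrim.timer` unit that runs the trim periodically, which avoids having to remember it after a dd or a large restore.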

Maybe this tip can be of use to someone.
