
#1 2024-08-12 19:17:53

kelloco2
Member
Registered: 2012-02-13
Posts: 129

Advanced NVMe disks with 4k sectors and partitions alignment

Hi, I've been using Arch Linux for years on my machines, both laptops and servers, and I'm quite satisfied with it. However, going through the installation process each time is a bit stressful, so I'm writing my own advanced installer that sets up everything I need. I have most of it done already: the installer handles a lot of cool stuff like encrypted disks with GRUB2 (LVM on LUKS), auto updates + snapshot backups, zram, XFCE4, etc. But I'm not sure whether my installer correctly partitions drives that support what's known as "Advanced Format."

To start with the basics, older hard drives that don't support such features typically report a 512B physical sector size. Linux partitioning tools default to starting the first partition at sector 2048. So, the output of the command "LANG=en_EN parted /dev/sda unit s print" would look something like this:

LANG=en_EN parted /dev/sda unit s print
Model: ATA KINGSTON RBUSNS8 (scsi)
Disk /dev/sda: 250069680s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start     End         Size        File system  Name  Flags
 1      2048s     616447s     614400s     fat32        efi   boot, esp

And that’s perfectly fine. There's nothing more to worry about. To check that the partition is aligned to 1 MiB:
    - For the first sector: 2048 sectors x 512 B = 1048576 bytes = 1 MiB
    - For the last sector: (616447 + 1) sectors x 512 B = 315621376 bytes, and then 315621376 bytes % 1048576 bytes = 0
    - For the size: 614400 sectors x 512 B = 314572800 bytes, and then 314572800 bytes % 1048576 bytes = 0
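
The three checks above can be scripted. Here is a minimal Python sketch, assuming 1 MiB alignment is the target. The function name `check_alignment` is made up for illustration and is not part of parted or any other tool. The end boundary is taken as the partition's last sector plus one, i.e. 616447 + 1 from the parted table.

```python
MIB = 1024 * 1024  # 1 MiB alignment target, as in the manual checks


def check_alignment(start, end, sector_size, align=MIB):
    """Return byte offsets and an alignment flag for one partition.

    start and end are inclusive sector numbers as printed by parted.
    """
    start_bytes = start * sector_size
    end_bytes = (end + 1) * sector_size  # first byte after the partition
    size_bytes = end_bytes - start_bytes
    aligned = all(v % align == 0 for v in (start_bytes, end_bytes, size_bytes))
    return {
        "start_bytes": start_bytes,
        "end_bytes": end_bytes,
        "size_bytes": size_bytes,
        "aligned": aligned,
    }


# Values from the parted output above (512-byte logical sectors).
print(check_alignment(2048, 616447, 512))
```

A partition starting at the old DOS-era sector 63 would fail this check, since 63 x 512 bytes is not a multiple of 1 MiB.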

Notice what the disk reports: "Sector size (logical/physical): 512B/512B."

Now, moving on to newer and more advanced hard drives. These have a physical sector size of 4K and, by default, operate in "512-byte emulation" mode for backward compatibility, but they allow you to switch to a more efficient setting. Here’s what an expert said about this on another forum:

Most client-oriented storage operates by default in "512-bytes emulation" mode, where although the logical sector size is 512 bytes/sector, internally the firmware uses 4096 bytes/sector. Storage with a 4096 byte size for both logical and physical sectors operates in what is commonly called "4K native" mode or "4Kn". Due to possible software compatibility issues that have still not been completely solved (for instance, cloning partitions from a 512B drive to a 4096B drive is not directly possible), these drives tend to be quite rare in the client space and it is mostly enterprise class drives that employ it.
(...)
Why change this setting? In theory, the 4K native LBA mode would do away with the "translation" the firmware has to do with 512-bytes logical sectors to map them to the underlying 4K "physical" arrangement (if a physical/logical distinction makes sense for SSDs) and may offer somewhat higher performance in this way.
(...)
This is possibly true for fast NVMe SSDs and high-performance (non-Windows) file systems in high-I/O environments,
(...)
Furthermore, changing the logical sector size requires deleting everything on the SSD and basically reinstalling the OS from scratch, which makes it even more unlikely for users to attempt it and see if differences arise. This is better tested with brand-new, empty drives.
(...)
I did it after following this blogpost where a WD SN850 user on Linux reportedly measured 10% higher performance on EXT4 partitions with basic benchmarks

The first of these drives appeared years ago, like the WD HDDs with what's called "Advanced Format" (https://wiki.archlinux.org/title/Advanced_Format). Recently, I bought a cheap but decent NVMe drive for one of my machines—the WD Blue SN580, which supports this feature. You can check your drive with this command:

nvme id-ns -H /dev/nvme0n1

If you see something like this at the end:

LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good (in use)
LBA Format  1 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0x1 Better

This means your drive also supports 4K sectors, and you can switch it to this mode (but be careful, you will lose all data, so do this before partitioning). After switching, the disk starts reporting a 4K logical sector size, and the kernel will use 4K too. In Linux, all layers, such as LVM, LUKS, and EXT4, support 4K sectors perfectly. Here’s my question—after switching to 4K and partitioning the disk, for example, using sgdisk, what should the first sector be? Is it still 2048? Because when I run fdisk -l, it looks like this:

LC_ALL=C fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 931.51 GiB, 1000204886016 bytes, 244190646 sectors
Disk model: WD Blue SN580 1TB                       
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 841C3EDF-DDBE-44BF-8C85-DC1C0CD7D521

Device          Start       End   Sectors   Size Type
/dev/nvme0n1p1    256     77055     76800   300M EFI System

As someone who has partitioned a lot of disks, not seeing 2048 as the first sector immediately concerned me. But I hadn't partitioned a 4K disk like this before, so I started calculating:
- For the first sector: 256 sectors x 4096 B = 1048576 bytes = 1 MiB
- For the last sector: (77055 + 1) sectors x 4096 B = 315621376 bytes, and then 315621376 bytes % 1048576 bytes = 0
- For the size: 76800 sectors x 4096 B = 314572800 bytes, and then 314572800 bytes % 1048576 bytes = 0
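
To compare this layout with the 512-byte one shown earlier, both can be converted to byte ranges. This is a minimal sketch. `byte_range` is a hypothetical helper, and the sector numbers are copied from the parted and fdisk outputs in this post.

```python
def byte_range(start, end, sector_size):
    """Convert an inclusive sector range to a half-open byte range."""
    return start * sector_size, (end + 1) * sector_size


# /dev/sda1 on the 512-byte disk and /dev/nvme0n1p1 on the 4K disk.
old_layout = byte_range(2048, 616447, 512)
new_layout = byte_range(256, 77055, 4096)

print(old_layout)
print(new_layout)
print("same byte offsets:", old_layout == new_layout)
```

If the two tuples match, the partitions sit at identical byte offsets and only the unit used to express them differs.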

Everything seems to add up, but what do you think? I want to be 100% sure that my installer performs correct partitioning in this case. Is there anyone familiar with Advanced Format and 4K disks who could provide some insight? And if you run a drive with a 4K logical sector size, could you share the output of your fdisk -l command? I’m curious if it looks the same for you.


#2 2024-08-12 19:30:14

frostschutz
Member
Registered: 2013-11-15
Posts: 1,464

Re: Advanced NVMe disks with 4k sectors and partitions alignment

It's the same 1 MiB offset, so you're fine. (You could use `parted /dev/disk unit b print` to have it shown in byte offsets directly).

>>> 256 * 4096
1048576
>>> 2048 * 512
1048576
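
Generalizing the two products: the first sector for a 1 MiB offset is 1048576 divided by the logical sector size. Below is a minimal sketch of that rule, with an optional read of the logical sector size from sysfs. The helper names are hypothetical, and the sysfs path assumes a Linux system where the device appears under /sys/block.

```python
from pathlib import Path

MIB = 1024 * 1024


def logical_sector_size(device, sysfs_root="/sys/block"):
    """Read the logical sector size in bytes, e.g. for 'nvme0n1' or 'sda'."""
    path = Path(sysfs_root) / device / "queue" / "logical_block_size"
    return int(path.read_text().strip())


def first_aligned_sector(sector_size, align=MIB):
    """Sector number at which a partition starts exactly `align` bytes in."""
    if align % sector_size != 0:
        raise ValueError("alignment must be a multiple of the sector size")
    return align // sector_size


for size in (512, 4096):
    print(size, first_aligned_sector(size))
```

On a real machine, `first_aligned_sector(logical_sector_size("nvme0n1"))` would give the expected start sector for that drive's current LBA format.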

The GPT partition table (and its implementation in the Linux kernel) unfortunately still relies on the logical sector size. We need a new standard partition table format that uses MiB instead of sectors, so that creating an unaligned partition is not possible.

Operating systems could support both sector sizes of their own accord, but no one implements that, so after a sector-size change it's up to the user to rewrite the partition table to make the partitions visible again, assuming the data was not actually deleted.
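
Rewriting the table after a sector-size change comes down to re-expressing each partition's byte range in the new sector size. Here is a rough sketch of that arithmetic only, assuming the data itself is still in place. `convert_lba` is a hypothetical helper, and it does not read or write a partition table.

```python
def convert_lba(start, end, old_size, new_size):
    """Re-express an inclusive sector range in a different logical sector size.

    Raises ValueError if the byte boundaries do not fall on new-size sectors.
    """
    start_bytes = start * old_size
    end_bytes = (end + 1) * old_size  # first byte after the partition
    if start_bytes % new_size or end_bytes % new_size:
        raise ValueError("partition boundaries are not multiples of the new sector size")
    return start_bytes // new_size, end_bytes // new_size - 1


# The 512-byte entry from the first post, expressed in 4096-byte sectors.
print(convert_lba(2048, 616447, 512, 4096))
```

A partition that is 1 MiB aligned always converts cleanly between 512 and 4096 byte sectors, which is one practical reason to keep that alignment.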

Whether it actually helps with performance for your use-case… did you benchmark it?

Last edited by frostschutz (2024-08-12 19:31:35)


#3 2024-08-13 18:00:46

kelloco2
Member
Registered: 2012-02-13
Posts: 129

Re: Advanced NVMe disks with 4k sectors and partitions alignment

frostschutz wrote:

The GPT partition table (and its implementation in the Linux kernel) unfortunately still relies on the logical sector size.

Whether it actually helps with performance for your use-case… did you benchmark it?

Thank you for the detailed explanation. I agree with you, it could be made a bit simpler. By the way, my installer relies on established Linux partitioning tools such as sgdisk, which generally do a good job of aligning partitions, and I use gigabytes and megabytes everywhere. I use bytes only when necessary (e.g. to deal with sector size). Only right after partitioning, when I check partition alignment using parted align-check, do I also verify it with my own code, which goes down to the sector level; this part is harder to understand. Unfortunately, there's a lot of information online about 2048 being the correct first sector, but for true 4K disks (4K physical and logical sectors) there isn't much saying that sector 256 is correct.

Regarding performance, I haven't benchmarked it. I just wanted to set up the disk this way right out of the box, and honestly, this disk is too fast for my needs. You know, my main laptop is a 5-year-old budget machine with a Core i7 that probably can't even fully utilize this disk's potential, and I have 32 GB of RAM, so Linux caches almost all of my files in RAM anyway.

However, I did run one test on this disk: I copied a really large number of files from one partition to another (it's hard to fool Linux using file system magic in this case), and it reached a speed of 2.5 GB/s (I immediately used sync to be sure). I've never had such a fast disk or seen such high transfer rates, so I'm satisfied. Of course, the manufacturer claims even higher speeds, but I'm using strong encryption on this drive, and I've read that encryption can easily halve the maximum transfer speed. So considering that I'm getting 2.5 GB/s and the manufacturer claims just under 5 GB/s, I'm happy with it. I know it might seem strange that I didn't check the transfer speeds earlier, but I just like experimenting.


#4 2024-08-14 08:39:05

nl6720
The Evil Wiki Admin
Registered: 2016-07-02
Posts: 644

Re: Advanced NVMe disks with 4k sectors and partitions alignment

If your drive supports OPAL, then you can use LUKS with OPAL instead of software encryption and retain the drive's full speed. See https://wiki.archlinux.org/title/Self-e … cryptsetup. Mind that the preparation requires wiping the whole drive.

