Pages: 1
Hello,
I recently built a nice headless server from scratch: Node 804 enclosure, space for six 3.5" hard drives, the whole shebang. Arch is installed on it and running fine.
As for the hard drives, I have three fresh Seagate IronWolf 4 TB drives, plus a fourth one that is almost full of data.
I would like to set up a RAID 5 on this server. The minimum number of disks for this level is 3, so I should be good. But this is my first time setting up a RAID, so I would like to ask a few questions to make sure I understand the full process. I have read the wiki page here: https://wiki.archlinux.org/index.php/RA … filesystem
The server is intended to act as a NAS (among other things). I already have Samba up and running.
1) I would like to first set up the RAID on the three fresh drives, then plug in the 4th drive (the one that contains data), copy the data to the RAID, wipe the 4th drive, and add it to the array. Would that work?
2) I would like to format the drives and create several partitions: one for movies/series, publicly accessible through Samba, and another one for backups, not accessible through Samba. Can I create several partitions on a RAID? I would say so, after reading the wiki.
3) When, if at all, should I calculate the stride and stripe width? What is the impact of setting/not setting these two parameters?
4) Should I change the block size? I read the article here: https://www.zdnet.com/article/chunks-th … rformance/, which basically says that the block size should be set according to the I/O load. Can it make a huge difference if not set up properly? I read that the default is 4KB. My typical I/O load would be reading/writing videos from the RAID, or uploading files through ssh.
Thanks for your help, and Merry Christmas!
Last edited by djipey (2020-12-25 21:34:53)
If you don't have a dedicated RAID card, software RAID is better than fake RAID (motherboard RAID).
Before building the RAID, I suggest you back up your data if it's important!
You can google how to do it (LVM + software RAID); there are many tutorials available.
Good luck and happy holidays!
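A minimal sketch of the LVM-on-software-RAID layout mentioned above, which would also give question 2 its two separate volumes (media and backups). It assumes the array already exists as /dev/md0; the volume group name `nas`, the volume names and the 2T size are hypothetical placeholders, and the script only prints the commands unless RUN=1 is set.

```shell
#!/bin/bash
# Dry-run sketch: LVM on top of an existing md RAID array.
# All names and sizes are hypothetical placeholders.
# Set RUN=1 to actually execute the commands (destructive!).

run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# lvm_on_md MD_DEVICE VOLUME_GROUP
lvm_on_md() {
    local md=$1 vg=$2
    run pvcreate "$md"                        # mark the array as an LVM physical volume
    run vgcreate "$vg" "$md"                  # one volume group spanning the whole array
    run lvcreate -L 2T -n media "$vg"         # movies/series, to be shared via Samba
    run lvcreate -l 100%FREE -n backup "$vg"  # the remaining space, not shared
    run mkfs.ext4 "/dev/$vg/media"
    run mkfs.ext4 "/dev/$vg/backup"
}

lvm_on_md /dev/md0 nas
```

Logical volumes can be resized later, which is the usual reason to prefer them over fixed partitions on the array.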
One SSD plus a backup is better than a bunch of RAID.
I would advise against pure RAID. Neither a hardware RAID card nor mdadm can tell which data is correct when one of the drives decides to flip a bit. This will lead to unnoticed bit rot. This guy explains these dangers in detail [1].
I would recommend using btrfs [2], as it can create a RAID array with non-uniform drives, and it can detect bit rot. If it has more than one copy of the data, it can also correct bit rot. Don't use the raid5 and raid6 modes; use the raid1, raid1c3 and raid1c4 modes (2, 3 or 4 copies of the data) instead. [3]
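A dry-run sketch of creating such a btrfs filesystem. The partition names and the label `storage` are hypothetical; the minimum device counts (2 for raid1, 3 for raid1c3, 4 for raid1c4) follow from the number of copies, and raid1c3/raid1c4 need Linux 5.5 or later. Nothing runs unless RUN=1 is set.

```shell
#!/bin/bash
# Dry-run sketch: create a multi-device btrfs filesystem with mirrored
# data and metadata. Device names are hypothetical. RUN=1 executes.

run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# make_btrfs_mirror PROFILE LABEL DEVICE...
make_btrfs_mirror() {
    local profile=$1 label=$2 need
    shift 2
    case $profile in
        raid1)   need=2 ;;
        raid1c3) need=3 ;;
        raid1c4) need=4 ;;
        *) echo "unsupported profile: $profile" >&2; return 1 ;;
    esac
    if [ "$#" -lt "$need" ]; then
        echo "$profile needs at least $need devices, got $#" >&2
        return 1
    fi
    # -d sets the data profile, -m the metadata profile
    run mkfs.btrfs -L "$label" -d "$profile" -m "$profile" "$@"
}

make_btrfs_mirror raid1 storage /dev/sdb1 /dev/sdc1
```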
If you want extra speed, you could use bcache [4]. Be advised that you cannot convert existing data on a drive to use bcache: both the caching SSD partition and the hard drive partition need to be bcache-formatted. Don't use write caching unless you have multiple SSDs, like this stack on my NAS:
+-------------------------------------------------+
| btrfs raid 1 /Storage                           |
+-------------------------+-----------------------+
| /dev/bcache0            | /dev/bcache1          |
+------------+------------+-----------+-----------+
| Cache      | Data       | Cache     | Data      |
| /dev/sda2  | /dev/sdb1  | /dev/sdc2 | /dev/sdd1 |
+------------+------------+-----------+-----------+
Finally: a RAID is not a backup; it's only a high-availability solution. If, for instance, the power supply of your box decides to do something stupid, all drives may die at the same time.
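The stack in the diagram could be built roughly like this. It is a sketch under assumptions: the partition names are taken from the diagram, the bcache0/bcache1 numbering depends on the order in which the devices are registered, and nothing runs unless RUN=1 is set.

```shell
#!/bin/bash
# Dry-run sketch of a two-pair bcache + btrfs raid1 stack.
# Partition names are hypothetical. Set RUN=1 to execute (destroys data!).

run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# bcache_pair CACHE_PARTITION BACKING_PARTITION
bcache_pair() {
    # Formatting the cache and backing device in one call lets make-bcache
    # attach the backing device to the new cache set automatically.
    run make-bcache -C "$1" -B "$2"
}

bcache_pair /dev/sda2 /dev/sdb1   # assumed to appear as /dev/bcache0
bcache_pair /dev/sdc2 /dev/sdd1   # assumed to appear as /dev/bcache1
# The cache mode defaults to writethrough, i.e. no write caching.
run mkfs.btrfs -d raid1 -m raid1 /dev/bcache0 /dev/bcache1
```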
While configuring your NAS, don't keep your (non-backed-up) data anywhere near it. A single typo in a command can destroy all data on the wrong drive in seconds.
[1] Level1Enterprise RAID: Obsolete? New Tech BTRFS/ZFS and "traditional" RAID
https://www.youtube.com/watch?v=yAuEgepZG_8
Level1Enterprise RAID Obsolete? Part 2: Failure Testing Linux's RAID: md, h/w & BTRFS
https://www.youtube.com/watch?v=pv9smNQ5fG0
[2] https://wiki.archlinux.org/index.php/Btrfs
After building the NAS, it may be useful to test it. I've made this script to generate data and an MD5 checksum for each file:
$ cat ~/mkfiles_and_md5.sh
#!/bin/bash
# Write up to 6000 files of 100 MiB of random data (about 586 GiB in total)
# and append each new file's MD5 checksum to md5sums.txt.
# Files that already exist are skipped, so the script can be re-run.
for n in {1..6000}; do
    f=file$(printf %03d "$n").bin
    if [ ! -f "$f" ]; then
        dd if=/dev/urandom of="$f" bs=1M count=100
        md5sum "$f" >> md5sums.txt
    fi
done
Afterwards, the data can be compared to the md5sums using this command:
$ md5sum -c md5sums.txt
file001.bin: OK
file002.bin: OK
file003.bin: OK
file004.bin: OK
file005.bin: OK
file006.bin: OK
file007.bin: OK
Keep a terminal open with dmesg -w, so you can see errors in the system before they cause problems with real data.
Thanks all for your answers. I will definitely take some of your suggestions (the script to test the raid, and the advice to not have my non-backed-up data anywhere near the raid being built). Also, I know raid isn't a backup. I have that covered, my vital data (not the videos) is backed up off-site already.
For the rest, I have already made my decision. I will use mdadm with an ext4 filesystem. No btrfs or ZFS. We could argue endlessly about this choice, but I simply don't need the fancy features of ZFS/btrfs. I want something simple to set up, and I want to be able to expand the array one drive at a time. mdadm has been around forever, and it would work well in my setup (I think). That's what Synology is running on its NAS anyway.
Regarding my 4 initial questions, would you mind providing answers to them specifically?
1) I would like to first set up the RAID on the three fresh drives, then plug in the 4th drive (the one that contains data), copy the data to the RAID, wipe the 4th drive, and add it to the array. Would that work?
https://wiki.archlinux.org/index.php/RA … o_an_array
2) I would like to format the drives and create several partitions: one for movies/series, publicly accessible through Samba, and another one for backups, not accessible through Samba. Can I create several partitions on a RAID? I would say so, after reading the wiki.
https://wiki.archlinux.org/index.php/RA … filesystem
3) When, if at all, should I calculate the stride and stripe width? What is the impact of setting/not setting these two parameters?
Covered by previous link.
4) Should I change the block size? I read the article here: https://www.zdnet.com/article/chunks-th … rformance/, which basically says that the block size should be set according to the I/O load. Can it make a huge difference if not set up properly? I read that the default is 4KB. My typical I/O load would be reading/writing videos from the RAID, or uploading files through ssh.
I have always used the block size of the underlying devices.
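For question 1, a minimal dry-run sketch of the create-then-grow procedure with mdadm. It assumes ext4 sits directly on /dev/md0 (no partitions or LVM in between) and uses hypothetical device names; copying and verifying the data from the 4th drive happens between the two steps. Nothing runs unless RUN=1 is set.

```shell
#!/bin/bash
# Dry-run sketch: build a 3-disk RAID 5, then grow it to 4 disks.
# Device names are hypothetical; check yours with lsblk first.
# Set RUN=1 to actually execute the commands (destructive!).

run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# create_array MD MEMBER... : the member count is taken from the argument list
create_array() {
    local md=$1
    shift
    run mdadm --create "$md" --level=5 --raid-devices="$#" "$@"
}

# grow_array MD NEW_DEVICE TOTAL_DEVICES : add a disk and reshape onto it
grow_array() {
    local md=$1 dev=$2 total=$3
    run wipefs -a "$dev"                            # drop the old filesystem signature
    run mdadm --add "$md" "$dev"                    # the disk joins as a spare
    run mdadm --grow "$md" --raid-devices="$total"  # reshape the array onto it
    run mdadm --wait "$md"                          # block until the reshape finishes
    run resize2fs "$md"                             # let ext4 use the new space
}

create_array /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
# ... mkfs.ext4, mount, copy the data off /dev/sde1 and verify it ...
grow_array /dev/md0 /dev/sde1 4
```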
It's your system, so do whatever you like.
Still, I would like to point out that I like btrfs better than ZFS because:
- My system with btrfs + bcache uses 140 MB of RAM right after boot. I have not tried ZFS, but I guess it uses a lot more.
Tasks: 178 total, 1 running, 177 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.2 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 1973.0 total, 1660.8 free, 139.4 used, 172.8 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1694.7 avail Mem
- A btrfs RAID can be built from drives of arbitrary sizes, and you can change the number of copies of the data while the filesystem is mounted. All new files will honor the new rule. This means you are quite flexible in case of an emergency, if you change your mind, or if you want to upgrade your array with second-hand drives that are bigger/faster/better than the ones currently in it. The rebalance that brings all files to the same RAID level can be done whenever it's most convenient.
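Changing the number of copies on a mounted filesystem, as described in the last point, is done with a balance. A minimal dry-run sketch; the mount point /data and the chosen profiles are examples only, and nothing runs unless RUN=1 is set.

```shell
#!/bin/bash
# Dry-run sketch: convert a mounted btrfs filesystem to another redundancy
# profile. Mount point and profiles are examples only. RUN=1 executes.

run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# convert_profile MOUNTPOINT DATA_PROFILE METADATA_PROFILE
convert_profile() {
    # The "soft" filter skips chunks that already have the target profile,
    # so an interrupted conversion can be restarted without redoing work.
    run btrfs balance start -dconvert="$2",soft -mconvert="$3",soft "$1"
}

convert_profile /data raid1c3 raid1c3   # e.g. go from 2 to 3 copies
run btrfs filesystem usage /data        # shows which profiles are in use
```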
As for your 4 questions: I don't know. I had never set up a RAID before I built my bcache + btrfs NAS.
To answer your 4 questions in case you are using btrfs:
1 - Yes. This command will do that: sudo btrfs device add /dev/sdx1 /data
2 - Kind of. Btrfs can be divided into subvolumes. See https://wiki.archlinux.org/index.php/Btrfs#Subvolumes
3 - I don't know. On my system bcache makes sure data is read and written in optimal sized chunks.
4 - I don't know.
Thanks. 1) and 2) were sanity checks.
Regarding 3) and the stride and stripe width, I was just wondering whether it is *necessary* to calculate them when formatting as ext4. For example, running this command:
# mkfs.ext4 -v -L myarray -b 4096 -E stride=128,stripe-width=256 /dev/md0
-E is optional. What would be the impact of using the defaults (i.e. not doing the calculations)?
4) Thanks. I can get that with "fdisk -l", right?
Disk /dev/sdc: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: ST2000LM007-1R81
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x9b2f8737
In this case, the physical block size of my disk is 4096 bytes, right?
I believe so. When partitioning, make sure your partitions line up with the 4096 byte blocks, to avoid misaligned writes.
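The alignment rule can be checked with simple arithmetic: the partition's start offset in bytes must be a multiple of the physical sector size. A small sketch assuming 512-byte logical and 4096-byte physical sectors, as on the disk above; parted can also check this with `parted /dev/sdc align-check optimal 1`.

```shell
#!/bin/bash
# is_aligned START_SECTOR [LOGICAL_BYTES] [PHYSICAL_BYTES]
# Succeeds if a partition starting at START_SECTOR begins on a
# physical-sector boundary. fdisk reports the start in logical sectors.
is_aligned() {
    local start=$1 lss=${2:-512} pbs=${3:-4096}
    (( start * lss % pbs == 0 ))
}

is_aligned 2048 && echo "2048: aligned"    # 1 MiB, fdisk's usual default
is_aligned 63 || echo "63: misaligned"     # old DOS-style start
```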
The drive is 512e (512-byte emulated logical sectors on top of 4K native physical sectors), so yes, a 4K block size.
The worst case would be misalignment, which causes filesystem writes to be split across multiple disk writes, on top of the extra disk write needed to update parity.
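The same sector sizes fdisk shows can also be read straight from sysfs. A minimal sketch; the SYSFS_ROOT override is a hypothetical convenience for trying the function against a fake directory tree and defaults to /sys.

```shell
#!/bin/bash
# block_sizes DEVICE : print the logical and physical sector sizes the
# kernel reports for a disk (what fdisk shows as "Sector size").
# SYSFS_ROOT is a hypothetical testing override; it defaults to /sys.
block_sizes() {
    local dev=${1#/dev/} root=${SYSFS_ROOT:-/sys}
    local q="$root/block/$dev/queue" lss pbs
    read -r lss < "$q/logical_block_size" || return 1
    read -r pbs < "$q/physical_block_size" || return 1
    echo "$dev logical=$lss physical=$pbs"
}

# Example: block_sizes /dev/sdc
# lsblk -o NAME,LOG-SEC,PHY-SEC lists the same values for every disk.
```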
Roger that, thanks.
One last question regarding the calculation of the *stripe* width. In my setup, I will first have 3 disks in my array (2 data disks and one parity disk). According to the wiki, the maths should be:
stride = chunk size / block size
stride = 512 / 4 # assuming defaults
stride = 128
stripe width = # of physical data disks * stride
stripe width = 2 * 128
stripe width = 256
But what happens to these values when I add a 4th disk to my array? The stripe width should be 384 at this point. Would this be handled automatically by "mdadm --grow"?
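The wiki formulas above as a small shell function, assuming sizes in KiB and one disk's worth of parity by default (RAID 5). It prints the value in the form mkfs.ext4 -E accepts.

```shell
#!/bin/bash
# ext4_raid_geometry CHUNK_KIB BLOCK_KIB TOTAL_DISKS [PARITY_DISKS]
#   stride       = chunk size / block size
#   stripe width = number of data disks * stride
ext4_raid_geometry() {
    local chunk=$1 block=$2 disks=$3 parity=${4:-1}  # RAID 5: one disk of parity
    local stride=$(( chunk / block ))
    local width=$(( (disks - parity) * stride ))
    echo "stride=$stride,stripe-width=$width"
}

ext4_raid_geometry 512 4 3   # stride=128,stripe-width=256
ext4_raid_geometry 512 4 4   # stride=128,stripe-width=384
```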
No. mdadm does not care about the filesystem. It's a generic/filesystem-agnostic block storage device.
For ext4 you can adjust them with tune2fs. For other filesystems, it depends on their tools.
For a NAS (for media storage) it does not matter either way. It might be interesting for servers where every tiny bit of performance counts, but otherwise you just won't notice any difference.
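A sketch of that tune2fs adjustment after a grow, deriving the values from the chunk size and member count that md exposes under /sys/block/md0/md/. It assumes RAID 5 with ext4 directly on the array and a 4 KiB filesystem block size; SYSFS_ROOT is a hypothetical testing override, and nothing runs unless RUN=1 is set.

```shell
#!/bin/bash
# Dry-run sketch: recompute ext4 stride/stripe_width from what the kernel
# reports for an md RAID 5 array and print the matching tune2fs call.
# Run it after the reshape has finished. RUN=1 executes tune2fs.

run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# retune_ext4 MD_DEVICE [FS_BLOCK_BYTES]
retune_ext4() {
    local md=${1#/dev/} block=${2:-4096} root=${SYSFS_ROOT:-/sys}
    local dir="$root/block/$md/md" chunk disks stride width
    if [ ! -r "$dir/chunk_size" ] || [ ! -r "$dir/raid_disks" ]; then
        echo "$1: no md attributes under $dir" >&2
        return 1
    fi
    read -r chunk _ < "$dir/chunk_size"   # chunk size in bytes
    read -r disks _ < "$dir/raid_disks"   # first field; a reshape shows "new (old)"
    stride=$(( chunk / block ))
    width=$(( (disks - 1) * stride ))     # RAID 5: one disk's worth of parity
    run tune2fs -E "stride=$stride,stripe_width=$width" "/dev/$md"
}

# Example: retune_ext4 /dev/md0
```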
Last edited by frostschutz (2020-12-25 11:44:13)
Perfect, thanks. That's what I was asking at the beginning. I'll just format my partitions with ext4, with the defaults.
Please capitalize [SOLVED] and move it to the start of the title for clarity.