It's my first time setting up a RAID configuration and I'm a bit confused by the following warning in the wiki:
Note: It is also possible to create a RAID directly on the raw disks (without partitions), but not recommended because it can cause problems when swapping a failed disk.
I don't know if I have to create one RAID1 with 6 partitions inside or 3 RAID1 arrays with 2 partitions each.
I have 2 disks, /dev/sda and /dev/sdc; one is completely new and the other is currently being used for storage.
I have 3 partitions on sdc: /dev/sdc1, /dev/sdc2 and /dev/sdc3, and I want to use /dev/sda to mirror those partitions with RAID1.
I have found that I should make the array with the empty disk first, then migrate the content to the array and then grow the array with my used disk. I'm not really sure about the relation between the array, the partitions, the disks and the --raid-devices parameter of mdadm. All examples I found talk about one partition only.
Last edited by blinkingbit (2024-11-20 16:16:23)
Offline
The note in the wiki really ought to be a warning. Using drives without a partition table is dangerous. Plenty of software out there does not expect a partition-less setup, does not recognize the RAID metadata, and will just see an empty drive and offer it as the default location to install, partition or format (or do so directly without even asking), resulting in data loss. There's just no point in not using a partition table. Too much risk, and the only reward is one more megabyte of disk space.
So, you should partition each drive, no matter what you use it for. Then put RAID on partitions. If you have three partitions, that could be done as three RAIDs (e.g. md1, md2, md3). Alternatively you could stick to one partition / one RAID and then use LVM to create three logical volumes. Or put it all in one filesystem, if you don't need them separated. It depends on your personal preference and use case.
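Roughly, the two layouts would look something like this (array, volume group and size names are just placeholders, and it assumes both drives are already partitioned - so don't run it against your in-use /dev/sdc yet, see the migration path below):

# three separate mirrors, one per partition pair
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdc1
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda2 /dev/sdc2
mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sda3 /dev/sdc3

# or: one big mirror with LVM carving out the three "partitions"
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdc1
pvcreate /dev/md1
vgcreate storage /dev/md1
lvcreate -L 100G -n data1 storage   # repeat for data2, data3

--raid-devices=2 just tells mdadm how many members each array has; the members are the partitions, not the whole disks.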
If you only have one drive available (because the other is still in use), then yes, creating RAID on one drive (either as single drive RAID1 using --force, or two drive RAID1 using missing in place of one drive) is a standard method. Copy your files over (rsync -a), verify the copy is good and once you're sure it's all good to go, zap the original drive and add it as the missing drive to your RAID.
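A rough sketch of that workflow for the first partition only (mount points are placeholders, and it assumes /dev/sda has already been partitioned to match /dev/sdc):

# create the mirror with the second member missing
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 missing
mkfs.ext4 /dev/md1
mkdir -p /mnt/new && mount /dev/md1 /mnt/new

# copy the data from the old partition (assumed mounted at /mnt/old) and verify it
rsync -a /mnt/old/ /mnt/new/

# only once you're sure the copy is good: add the old partition as the second member
mdadm /dev/md1 --add /dev/sdc1
cat /proc/mdstat   # watch the resync progress

Repeat per partition/array if you go with three arrays.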
Just note you don't have any redundancy until that resync is finished. An additional backup would add some peace of mind and you need backups anyway.
Offline
the advice to use partitions rather than whole devices is to deal with different-sized drives
as an example: you have three drives, one each from seagate, WD and toshiba - let's use a nominal size of 1 TB for the sake of easy maths
first of all there's a difference between si prefixes, base 10 (kilo = 1,000, mega = 1,000 kilo, ...), and binary prefixes, base 2 (kibi = 1,024, mebi = 1,048,576, ...)
usually drives are marketed using si prefixes - so 1 TB is 1,000,000,000,000 bytes (about 931 GiB)
and although several de-facto standards came up over the years, there's no actual standard defining the exact capacity - so the three drives in our example could all be exactly the same size, but could also differ slightly
for a raid to work properly only the size of the smallest drive can be used - so unless the drives are all exactly equal you're better off creating equal-sized partitions - otherwise, if you have matching drives, you could go full drive without partitions
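for example with explicit sizes a bit below the nominal capacity (the 930GiB figure is just an illustration for a "1 TB" drive):

parted -s /dev/sda mklabel gpt
parted -s /dev/sda mkpart primary 1MiB 930GiB
parted -s /dev/sda set 1 raid on

do the same on the other drives and the partitions will match even if the raw drives don't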
some solutions bring their own style: zfs always creates partitions even if you give it the whole drive (which is recommended for zfs anyway, for several reasons)
then you have to plan whether you even want to use partitions or volumes or datasets or whatnot
you can mirror your existing drive - but do you want to? a 1-to-1 full mirror is, depending on drive size and speed, a quite time-consuming operation
going the smart way and only replicating the data that's actually in use can speed things up several times over
or maybe a restructure from partitions to datasets might fit your needs better
I hate to give this reply - but multi-drive arrays depend on the use case
just mirroring a drive because you can isn't a meaningful use case (although it's the least harmful one when something goes bad, as each drive should work on its own) - if you don't plan to take advantage of the increased speed and availability but just mirror for the sake of mirroring, that can bite you unexpectedly with stuff like boot files not being kept properly in sync
as for antifreeze's reply:
I guess that's the proper explanation - not just to prevent mis-aligned sizes but to prevent damage from software that cannot detect the raid metadata
using gpt with a protective mbr keeps even drives larger than 2 TB safe from old mbr-only tools: they just see an mbr drive with one partition spanning the entire drive, while gpt-aware tools can read the layout correctly
personal opinion:
as I searched for quite some time and have been using it for a while now, I recommend ZFS
why?
- it abstracts the management of the physical drives - you just give it a target drive and let it do its magic
- it only copies the blocks actually in use - no additional stress from copying tons of zeroes from empty sectors
- has better resiliency against bit rot than what can be achieved with md or lvm
- has native encryption
- has native snapshots
- is cross-platform compatible between linux, mac and windows
the only drawback is that zfs is quite memory hungry as it uses main ram as cache - but this can be somewhat mitigated by a fast ssd
for your use case I recommend backing up the data to a third, temporary storage - wiping both drives - setting up a fresh pool with both drives in a mirror vdev - and restoring from the backup
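if you go that route the pool setup itself is short - something like this (pool and dataset names are just examples, and /dev/disk/by-id/... paths are generally preferred over sda/sdc):

zpool create -o ashift=12 tank mirror /dev/sda /dev/sdc
zfs create -o compression=lz4 tank/data
# then restore the backup into /tank/data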
Offline
Thank you both. The LVM versus multiple-RAID configuration was the confusing part for me; seeing both as viable solutions, each with its own quirks, leaves me more relaxed.
Just for the sake of clarity: I'm not using RAID1 as a backup of any kind. I just wanted to increase the availability of a set of files, because even if I could end up restoring the data, I want to keep access to it in the event of a drive failure. If both drives fail at the same time I have a backup in the cloud, but if only one fails I can just remove the bad one and keep moving until I can get a replacement. Redundancy is just a convenience thing; I'm not looking for security (I have a backup) or speed (I have some NVMes too), just a smoother transition between a drive that starts failing and the arrival of the replacement.
I have decided to keep 3 RAIDs with 2 partitions each, because LVM is too much abstraction for me and growing each RAID later seems easy enough with 3 separate arrays.
As for the filesystem, I just use ext4 because it's the default. I guess abstraction of the physical drives is a good thing for some but I like to know what is contained in each drive.
Offline
so your use case goes towards availability
for that, raid1 is a good start as it comes with what you're aiming for, plus some additional pros like read speed
but it also has a con: you only have access to half of the raw bulk storage as the other half is the mirror
ext4 on top of simple md mirror can work - but be aware of bit rot - it can cause damage!
why: ext4 itself has no data checksums, so its fault resiliency relies on the drives telling the underlying md that an error occurred - but what happens when one drive returns bad data without knowing it?
in that case md receives two different data streams, both of which claim to be correct
in the best case the system somehow correctly identifies the good and the bad data - discards the bad data and restores it from the good data
in the worst case the system misidentifies the actual good data as bad and the actual bad data as good - discards the actual good data and replaces it with bad data
the raid silently corrupts itself
the issue: there's no way to tell which drive returned bad data unless the bad drive reports itself as such
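you can trigger a check on an md array and it will count such mismatches - but it can only count them, not tell you which copy is the good one (md1 is just an example name):

echo check > /sys/block/md1/md/sync_action
cat /sys/block/md1/md/mismatch_cnt   # read after the check has finished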
to get around this, use a filesystem that is designed for raid and comes with additional checksums to verify each block it reads
there are also special raid drives which have 520-byte sectors instead of the regular 512 bytes per sector - the additional 8 bytes hold a checksum of the block, with a special algorithm able to detect if the block has gone bad - some are even sophisticated enough to correct some errors
according to wikipedia there are only a few filesystems which provide this resiliency with checksums - along with zfs and btrfs the new bcachefs is mentioned - maybe that could be an option?
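btrfs for example can do the mirroring itself with checksums on both data and metadata, so on a read error or scrub it knows which copy is good and repairs from it - a minimal sketch with the device names from this thread (careful, it wipes them):

mkfs.btrfs -m raid1 -d raid1 /dev/sda1 /dev/sdc1
mkdir -p /mnt/data && mount /dev/sda1 /mnt/data   # mounting either member device brings up the whole filesystem
btrfs scrub start /mnt/data                       # verifies checksums and repairs from the good copy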
Offline