
#1 2013-10-14 13:08:50

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,595

RAID1 confusion when a disc goes bad

I have about the simplest RAID1 array there is: 2x 4TB discs (/dev/sdb1 and /dev/sdc1). Everything is fine now, but let's assume that one of the discs starts going bad some day.

1) How does mdadm know which copy of the data to expose? In other words, suppose /dev/sdb is going bad and has bad data on it. /mnt/md0/foo technically resides on both /dev/sdb1 and /dev/sdc1, so how does mdadm know which copy of /mnt/md0/foo is good?

2) If I scrub regularly by running `echo check > /sys/block/md0/md/sync_action`, how can I tell which files have differing checksums if I actually do have corruption on the array?
2b) When I do find out that /mnt/md0/foo has a different checksum from its copy, how do I know which of the two is the "good" copy?


CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs


#2 2013-10-14 13:15:14

Andreaskem
Member
Registered: 2013-10-13
Posts: 67

Re: RAID1 confusion when a disc goes bad


#3 2013-10-14 14:30:43

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: RAID1 confusion when a disc goes bad

As explained in the link Andreaskem posted, as long as the disks are not returning any errors the data is assumed to be good. If one of the disks returns an error on a read, the data will be fetched from the other disk and should hopefully be the correct data.

If both disks are returning data but one of them is returning corrupt data, your only options are to use a filesystem that creates and verifies checksums automatically, or to separately verify manually created checksums.
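
A minimal sketch of the second option (manually created checksums), assuming GNU coreutils `md5sum` and `find`. The function names and the example paths are made up for illustration, and the manifest has to be given as an absolute path outside the tree being checked:

```shell
# Write an MD5 manifest covering every regular file under a directory.
# $1 = directory to scan, $2 = absolute path of the manifest to write.
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + ) > "$2"
}

# Re-read every file listed in the manifest and compare digests.
# Prints only the files that fail; returns non-zero if any do.
# $1 = directory that was scanned, $2 = absolute path of the manifest.
check_manifest() {
    ( cd "$1" && md5sum --quiet -c "$2" )
}
```

For example, `make_manifest /mnt/md0 /root/md0.md5` after copying files in, and `check_manifest /mnt/md0 /root/md0.md5` later on. This only tells you that a file no longer matches its recorded digest, not which disk handed back the bad data.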

I'm currently using option 2: I have a RAID1 array (2 USB external drives) formatted with ext4, and every file that is copied into the array already carries an md5sum in its extended attributes. If one of the drives ever starts returning bad data, I can force the array to start in degraded mode (one drive at a time) and hopefully I'll be able to tell which copy is good by checking against the stored md5sum.
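
A rough sketch of how an md5sum can be kept in an extended attribute, assuming the `attr` package (`setfattr`/`getfattr`) and a filesystem with user xattrs enabled. The attribute name `user.md5sum` and the function names are assumptions for illustration, not necessarily what is used on the array described above:

```shell
# Print only the MD5 hex digest of a file.
md5_of() {
    md5sum < "$1" | cut -d' ' -f1
}

# Store the digest of a file in a user extended attribute.
# "user.md5sum" is a hypothetical attribute name.
store_md5() {
    setfattr -n user.md5sum -v "$(md5_of "$1")" "$1"
}

# Compare the file's current digest with the stored one.
# Returns 0 on match, 1 on mismatch, 2 if no stored digest could be read.
verify_md5() {
    stored=$(getfattr --only-values -n user.md5sum "$1" 2>/dev/null) || return 2
    [ "$stored" = "$(md5_of "$1")" ]
}
```

Run `store_md5 file` before copying a file into the array (with a copy tool that preserves xattrs), and `verify_md5 file` against each half of the mirror when it is started degraded. A file that was modified on purpose needs its attribute refreshed, otherwise it will show up as a mismatch.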

As a final note, I might be mistaken, but `echo check > /sys/block/md0/md/sync_action` may find discrepancies if you don't stop the array properly; that doesn't necessarily mean your data has any discrepancies.


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K


#4 2013-10-14 14:39:04

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,595

Re: RAID1 confusion when a disc goes bad

OK... so mdadm is not file-level RAID, it is block-level RAID and when I scrub, I am not looking at the checksums of the files, but rather am looking for corrupt blocks on the device.  Do I have this right?

Last edited by graysky (2013-10-14 14:55:45)



#5 2013-10-14 16:50:40

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: RAID1 confusion when a disc goes bad

graysky wrote:

OK... so mdadm is not file-level RAID, it is block-level RAID and when I scrub, I am not looking at the checksums of the files, but rather am looking for corrupt blocks on the device.  Do I have this right?

You will see the number of blocks where there is a mismatch; however, I would not call those blocks corrupt right away. In the example I mentioned, where the disks can be disconnected without stopping the array first, it might just mean some metadata has not been updated consistently across all disks.
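
The count can be read back from sysfs once the check has finished. Below is a small sketch, assuming the usual md sysfs files `sync_action` and `mismatch_cnt`; the function name `scrub_status` is made up, and as far as I know the kernel reports the count in sectors rather than in files or filesystem blocks:

```shell
# Summarise the scrub state of an md array.
# $1 = the array's md sysfs directory, e.g. /sys/block/md0/md
scrub_status() {
    md_dir=$1
    action=$(cat "$md_dir/sync_action")
    count=$(cat "$md_dir/mismatch_cnt")
    if [ "$action" != "idle" ]; then
        # A check, repair or resync is still running; the count is not final.
        echo "busy: $action"
    elif [ "$count" -eq 0 ]; then
        echo "clean"
    else
        echo "mismatches: $count"
    fi
}
```

Called as `scrub_status /sys/block/md0/md`, this prints `busy: check` while the scrub runs and then `clean` or `mismatches: N`. A non-zero number only says the two halves of the mirror differ somewhere; it does not name a file or say which half is right.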

I have seen this before in my array after doing `echo check > /sys/block/md0/md/sync_action`, but I didn't find any corrupted files (checked against the stored md5sums), fsck didn't seem to find any problems on the ext4 filesystem, and everything pointed to the array being just fine.

People who deal with this on a daily basis might have a different opinion though, as I don't have a lot of experience with this (I haven't had to solve any crisis yet).

Edit:
Typo

Last edited by R00KIE (2013-10-14 16:51:19)


