RAID scrubbing how to fix an error on a check [not solved]

maggie · 2015-12-03 20:36:05

I have read on the wiki that scrubbing should be done with

echo check > /sys/block/md0/md/sync_action

But I don't understand what to do when errors are found that can't be fixed automatically, for example when good sectors contain bad data. The article says that users can inspect the data in the sector and the data that would be produced by rebuilding the sector from redundant information, then pick the correct data to keep, but it doesn't show how to do this. How should I handle it if a scrub reports errors?

Last edited by maggie (2015-12-04 20:23:55)

frostschutz · 2015-12-03 21:51:29

There is no straightforward way to do this. Basically, the "good sectors contain bad data" case should never occur. If the disks do not report read errors, the RAID layer has no way to tell which data is correct and which isn't.

alphaniner · 2015-12-03 22:01:15

I remember when I first read that, it seemed somewhat reasonable. At least for a mirrored RAID. Not so much for a parity RAID, but I don't do those so that didn't concern me. But after thinking about it, I've come to the conclusion that it's bunk. Hell, the check doesn't even report which sectors were found to be bad!

The only way I could see a manual repair being practical is if the tools put the array into a read-only state and presented virtual RAIDs indicating the repaired and unrepaired states. This is theoretically possible I think, but the tools aren't capable of any such thing AFAIK.

Last edited by alphaniner (2015-12-03 22:01:49)

graysky · 2015-12-03 23:34:44

Maybe I am mistaken, but this is why I chose ZFS. As I understand it, the scrub operation on ZFS somehow knows which data is correct. Correct me if I am wrong.

alphaniner · 2015-12-04 14:07:46

The main thing that puts me off ZFS is the recommended practice of <80% pool usage.

TheChickenMan · 2015-12-04 15:20:34

This is something that has always worried me too. I have never heard an explanation of it that I found satisfying either. I do like to hope that when using more than a single parity (e.g. RAID 6) the scrubbing operation would choose the data said to be correct by the greater number of sources. If both parities agree, it would correct the data; if the data and one parity agree, it would correct the second parity, and so on. I have never seen definitive proof that it is able to do this, but it would be nice if it were.

maggie · 2015-12-04 20:23:23

I will just scrub with check, not repair.

frostschutz · 2015-12-04 21:12:58

TheChickenMan wrote:

I do like to hope that when using more than a single parity (e.g. RAID 6) the scrubbing operation would choose the data said to be correct by the greater number of sources.

That's not done, and it could be the wrong thing to do, too. If you accidentally zero out two drives (or --assemble --force two outdated drives), what you ought to do is kick both drives out of your RAID6 (and have the good data survive) rather than "repair" and have the zeroes / bad data on those two drives outvote all the others.

The default behaviour is to recalculate parity. That's the mode of operation that does not change data (what you read before repair = what you read after repair). That's an important property to have since some filesystems may be confused and cause kernel panics if the data of their device changes by itself while in use.
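To make the "recalculate parity" point concrete, here is a toy model of a single-parity stripe in plain sh. It is only a sketch: the block values and the parity helper are made up for illustration, each "block" is just a small integer, and this is not how md is implemented internally.

```shell
#!/bin/sh
# Toy model of one single-parity stripe; NOT md's actual code.
# Each "block" is a small integer and the parity is the XOR of all blocks.

parity() {
    p=0
    for block in "$@"; do
        p=$(( p ^ block ))
    done
    echo "$p"
}

# Hypothetical stripe contents, parity written together with the data.
d1=17 d2=42 d3=99
stored=$(parity "$d1" "$d2" "$d3")

# Silent corruption: one block changes, but no disk reports a read error.
d2=43

# "check": stored parity no longer matches, yet nothing says which block is wrong.
current=$(parity "$d1" "$d2" "$d3")
[ "$current" -ne "$stored" ] && echo "mismatch detected"

# "repair" by recalculating parity: only the parity is rewritten,
# so a reader sees the same (possibly bad) data before and after.
before="$d1 $d2 $d3"
stored=$(parity "$d1" "$d2" "$d3")
after="$d1 $d2 $d3"
[ "$before" = "$after" ] && echo "data unchanged by repair"
```

The mismatch goes away after the repair, but only because the parity now agrees with whatever the data blocks contain, right or wrong.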

maggie wrote:

I will just scrub with check not repair

Yes, do checks, and also check the mismatch_cnt afterwards, and if it ever turns out to be != 0 you can still hit the brakes and ponder it some more. That's what I will do, but so far, no disk decided to corrupt its data on me (not w/o reporting proper read errors that is).
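Roughly what I mean, as a sketch: a small sh function that reads sync_action and mismatch_cnt from the array's sysfs directory and tells you whether to stop and think. The function name scrub_report and its return codes are made up for this example; the md0 path is just the one from the first post, so adjust it for your array.

```shell
#!/bin/sh
# Sketch: report the result of an md scrub from sysfs.
# scrub_report is a hypothetical helper name, not part of mdadm.
#
# Start a check first (as root), e.g. for md0:
#   echo check > /sys/block/md0/md/sync_action
# then, once it has finished:
#   scrub_report /sys/block/md0/md

scrub_report() {
    dir="$1"
    action=$(cat "$dir/sync_action")
    count=$(cat "$dir/mismatch_cnt")

    # sync_action reads "idle" when no check/repair/resync is running.
    if [ "$action" != "idle" ]; then
        echo "scrub still running ($action)"
        return 2
    fi

    if [ "$count" -ne 0 ]; then
        echo "mismatch_cnt=$count: stop and investigate before any repair"
        return 1
    fi

    echo "mismatch_cnt=0: no mismatches found"
}
```

The function takes the directory as an argument so you can point it at any array, not just md0.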

alphaniner wrote:

Hell, the check doesn't even report which sectors were found to be bad!

That would be nice to have as well. You can determine it manually, or take the crude approach of assembling n different RAID views (each with missing parity disks) and then comparing files.
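For the simplest case, a two-way mirror, the manual route could look something like the sketch below. It assumes you have two equal-sized images (or read-only devices) holding only the mirrored data area, already aligned with each other; real md members also carry a superblock and a data offset that you would have to account for yourself. The name diff_sectors and the 512-byte sector size are assumptions for the example.

```shell
#!/bin/sh
# Sketch: list the 512-byte sectors in which two mirror copies differ.
# diff_sectors is a hypothetical helper; both arguments are assumed to be
# aligned copies of the same data area (no superblock, no data offset).

diff_sectors() {
    # cmp -l prints one line per differing byte, starting with the
    # 1-based byte offset; convert that to a 0-based sector number
    # and collapse consecutive repeats of the same sector.
    cmp -l "$1" "$2" | awk '{ print int(($1 - 1) / 512) }' | uniq
}

# Example (hypothetical image files):
#   diff_sectors member_a.img member_b.img
```

This only tells you where the copies disagree, not which copy is right; that part you still have to judge from the data itself.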

graysky wrote:

this is why I chose ZFS

Hope it works out for you.

Last edited by frostschutz (2015-12-04 21:13:45)

Arch Linux
