
#1 2015-12-03 20:36:05

maggie
Member
Registered: 2011-02-12
Posts: 255

RAID scrubbing how to fix an error on a check [not solved]

I have read on the wiki that scrubbing should be done with

echo check > /sys/block/md0/md/sync_action

But I don't understand what to do when errors are found that can't be fixed automatically, for example when good sectors contain bad data. The article says that users can inspect both the data in the sector and the data that would be produced by rebuilding the sector from redundant information, then pick the correct data to keep, but it doesn't show how to do this. How should I handle it if a scrub reports errors?

Last edited by maggie (2015-12-04 20:23:55)

Offline

#2 2015-12-03 21:51:29

frostschutz
Member
Registered: 2013-11-15
Posts: 1,499

Re: RAID scrubbing how to fix an error on a check [not solved]

There is no straightforward way to do this. Basically the "good sectors contain bad data" case should not occur, ever. If the disks do not report read errors, the RAID layer has no idea which data is correct and which isn't.

Online

#3 2015-12-03 22:01:15

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Re: RAID scrubbing how to fix an error on a check [not solved]

I remember when I first read that, it seemed somewhat reasonable. At least for a mirrored RAID. Not so much for a parity RAID, but I don't do those so that didn't concern me. But after thinking about it, I've come to the conclusion that it's bunk. Hell, the check doesn't even report which sectors were found to be bad!

The only way I could see a manual repair being practical is if the tools put the array into a read-only state and presented virtual RAIDs indicating the repaired and unrepaired states. This is theoretically possible I think, but the tools aren't capable of any such thing AFAIK.

Last edited by alphaniner (2015-12-03 22:01:49)


But whether the Constitution really be one thing, or another, this much is certain - that it has either authorized such a government as we have had, or has been powerless to prevent it. In either case, it is unfit to exist.
-Lysander Spooner

Offline

#4 2015-12-03 23:34:44

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,696
Website

Re: RAID scrubbing how to fix an error on a check [not solved]

Maybe I am mistaken, but this is why I chose ZFS. As I understand it, a ZFS scrub can tell which copy is correct because every block is checksummed. Correct me if I am wrong.



Offline

#5 2015-12-04 14:07:46

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Re: RAID scrubbing how to fix an error on a check [not solved]

The main thing that puts me off ZFS is the recommended practice of <80% pool usage.



Offline

#6 2015-12-04 15:20:34

TheChickenMan
Member
From: United States
Registered: 2015-07-25
Posts: 354

Re: RAID scrubbing how to fix an error on a check [not solved]

This is something that has always worried me too. I have never heard a satisfying explanation of it either. I do like to hope that when using more than a single parity (e.g. RAID 6) the scrubbing operation would choose the data said to be correct by the greater number of sources. If both parities agree, it would correct the data; if the data and one parity agree, it would correct the second parity; and so on. I have never seen definitive proof that it is able to do this, but it would be nice if it were.


If quantum mechanics hasn't profoundly shocked you, you haven't understood it yet.
Niels Bohr

Offline

#7 2015-12-04 20:23:23

maggie
Member
Registered: 2011-02-12
Posts: 255

Re: RAID scrubbing how to fix an error on a check [not solved]

I will just scrub with check, not repair.

Offline

#8 2015-12-04 21:12:58

frostschutz
Member
Registered: 2013-11-15
Posts: 1,499

Re: RAID scrubbing how to fix an error on a check [not solved]

TheChickenMan wrote:

I do like to hope that when using more than a single parity (e.g. RAID 6) the scrubbing operation would choose the data said to be correct by the greater number of sources.

That's not done, and it could be the wrong thing to do, too. If you accidentally zero out two drives (or --assemble --force two outdated drives), what you ought to do is kick both drives out of your RAID6 (and have the good data survive) rather than "repair" and have the zeroes / bad data on those two drives outvote all the others.

The default behaviour is to recalculate parity. That's the mode of operation that does not change data (what you read before repair = what you read after repair). That's an important property to have since some filesystems may be confused and cause kernel panics if the data of their device changes by itself while in use.

maggie wrote:

I will just scrub with check not repair

Yes, do checks, and also check the mismatch_cnt afterwards, and if it ever turns out to be != 0 you can still hit the brakes and ponder it some more. That's what I will do, but so far, no disk decided to corrupt its data on me (not w/o reporting proper read errors that is).
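For reference, a minimal sketch of that check-then-inspect routine, assuming the array is md0 as in the first post. The function names and the MD_DIR and POLL_SECONDS variables are made up for illustration; sync_action and mismatch_cnt are the md sysfs files already mentioned in this thread.

```shell
#!/bin/sh
# Sketch only: start a check, wait for it to finish, then read mismatch_cnt.
# MD_DIR is a hypothetical variable; the default assumes the array is md0.
MD_DIR="${MD_DIR:-/sys/block/md0/md}"

# Start a check (needs root). "check" is the action from the first post.
start_check() {
    echo check > "$MD_DIR/sync_action"
}

# Poll until the array reports "idle" again.
# POLL_SECONDS is a hypothetical knob; the default is 60 seconds.
wait_idle() {
    while [ "$(cat "$MD_DIR/sync_action")" != "idle" ]; do
        sleep "${POLL_SECONDS:-60}"
    done
}

# Print the mismatch count; return non-zero if it is anything but 0.
report_mismatches() {
    count=$(cat "$MD_DIR/mismatch_cnt")
    echo "mismatch_cnt=$count"
    [ "$count" -eq 0 ]
}

# Usage (as root):
#   start_check && wait_idle && report_mismatches || echo "stop and investigate"
```

Nothing here writes "repair", so a non-zero count leaves the array as it was and leaves the decision to you.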

alphaniner wrote:

Hell, the check doesn't even report which sectors were found to be bad!

That would be nice to have as well. You can determine it manually or do the stupid approach with n different raid views (each with missing parity disks) and then compare files.
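One possible way to do the manual comparison for a plain two-disk mirror is sketched below. It is an assumption about what "manually" could look like, not an established procedure: diff_sectors is a hypothetical helper, and it assumes both members use the same data offset and that the array is stopped or read-only, so nothing changes mid-comparison. Differences inside the md superblock area are expected and mean nothing. Parity RAID cannot be compared this way.

```shell
#!/bin/sh
# Sketch only: list the 512-byte sector numbers at which two files or
# block devices differ. diff_sectors is a hypothetical helper name.
diff_sectors() {
    # cmp -l prints one line per differing byte: the 1-based byte number
    # followed by the two byte values. awk turns byte numbers into
    # 0-based sector numbers and prints each sector once.
    cmp -l "$1" "$2" | awk '
        BEGIN { last = -1 }
        {
            sector = int(($1 - 1) / 512)
            if (sector != last) { print sector; last = sector }
        }'
}

# Usage (as root, with your own member devices in place of the placeholders):
#   diff_sectors /dev/sdX1 /dev/sdY1
```

The sector numbers are relative to the start of the member devices, not to the start of the array, so the data offset still has to be subtracted before mapping them to anything in the filesystem.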

graysky wrote:

this is why I chose ZFS

Hope it works out for you.

Last edited by frostschutz (2015-12-04 21:13:45)

Online
