#1 2011-07-01 04:07:32

thetrivialstuff

what does mdadm's sync_action check do?

Obviously, it checks the RAID array you run it on, but I'd like to know more specifically what it does and does not do. For instance, from some testing and reading of the docs, I've deduced that it does *not* compare the data on the two mirrors (this is a RAID-1 array). This is by design:

RAID (be it hardware or software), assumes that if a write to a disk doesn't return an error, then the write was successful. [...] RAID cannot, and is not supposed to, guard against data corruption on the media. Therefore, it doesn't make any sense either, to purposely corrupt data (using dd for example) on a disk to see how the RAID system will handle that.

(from https://raid.wiki.kernel.org/index.php/ … nd_testing )

(I kind of wish that it *would* do a binary comparison, just to make me feel better... but anyway.)

So, what *does* it do? Does it just do the equivalent of

dd if=/dev/sdXA of=/dev/null
dd if=/dev/sdYA of=/dev/null

just to ensure that the drives can read those sectors? And what does it do in case the drive returns a read error? I would expect that to result in a "check failed" message somewhere, but apparently that's not always the case.

During a check on one of my servers, dmesg showed a bunch of read errors from the first drive, and the SMART log confirms them... but the check succeeded. Strangely, the drive now reads successfully from start to finish, and using dd and diff, I was able to confirm that the "bad" regions on that drive are now readable and binary identical to the good mirror.
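
For anyone wanting to repeat that comparison, here is a rough sketch of one way to do it. The helper name compare_region and its arguments are made up for illustration, and it assumes 512-byte sectors. It compares CRC checksums from cksum rather than doing a strict byte-for-byte diff, so a match is strong evidence but not proof.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): compare the same sector range
# on two RAID-1 members by checksumming what dd reads from each.
# Usage: compare_region DEV_A DEV_B START_SECTOR SECTOR_COUNT
# Assumes 512-byte sectors; adjust bs if your drives differ.
compare_region() {
    # Read the region from each device; cksum on stdin prints "CRC SIZE".
    sum_a=$(dd if="$1" bs=512 skip="$3" count="$4" 2>/dev/null | cksum)
    sum_b=$(dd if="$2" bs=512 skip="$3" count="$4" 2>/dev/null | cksum)

    if [ "$sum_a" = "$sum_b" ]; then
        echo "identical"
    else
        echo "differs"
    fi
}
```

It would be called with the two member partitions and the sector range that dmesg complained about, e.g. compare_region /dev/sdXA /dev/sdYA followed by a start sector and a count. Comparing only a region, rather than the whole partition, avoids tripping over the md superblock, which can legitimately differ between members.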

I'm guessing that when mdadm hits a bad sector while reading, it assumes the drive will swap in one of its spare sectors, copies the data from the good drive back onto the bad sector, and then retries the read of that sector. If the drive returns a successful read, the bad sector was fixed and mdadm doesn't have to treat it as a problem any more. Is that right? I can't find that in the docs anywhere, and I'd like to know for sure. (Still gonna replace the drive in the meantime.)
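
For reference, the check itself is driven through the array's sysfs directory. The sketch below assumes that directory is passed in as an argument (on a real system something like /sys/block/md0/md, where md0 is only an example name). The function names start_check and check_status are made up for illustration. As far as I can tell from the kernel's md documentation, mismatch_cnt reports the number of sectors md found to be inconsistent during the last check.

```shell
#!/bin/sh
# Hypothetical wrappers around the md sysfs files sync_action and
# mismatch_cnt. $1 is the array's md sysfs directory, for example
# /sys/block/md0/md (the array name md0 is an assumption).

# Ask md to start a read-only consistency check of the array.
start_check() {
    echo check > "$1/sync_action"
}

# Print the current sync action and the mismatch counter on one line.
check_status() {
    printf 'sync_action=%s mismatch_cnt=%s\n' \
        "$(cat "$1/sync_action")" "$(cat "$1/mismatch_cnt")"
}
```

On a live array this needs root, and sync_action should drop back to idle once the check finishes, so check_status is only meaningful for the final mismatch_cnt after that point.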

~Felix.
