btrfs : how to be warned of error in checksum

seal20 · 2014-08-07 10:04:41

Hi everyone,

I had a dying drive, so I decided to replace it and, at the same time, move from ext4 to btrfs. It is a single-drive setup, so I thought that raid1, even if possible (http://askubuntu.com/questions/406096/h … id-1-array), does not bring much benefit for the cost (losing half the space). Such a raid would only protect from bitrot that happens while the filesystem is unmounted, which has a very low probability. Am I correct?

I ended up with a single btrfs partition with several subvolumes. I plan to take weekly snapshots of the "Now Working On" subvolume, because that is where most of the changes happen.

Here is my setup:

sylvain@sylvain-pc ~ % sudo btrfs filesystem show /mnt/Research
Label: 'Research'  uuid: 4af42047-285e-4f13-a778-30a7517ff33c
        Total devices 1 FS bytes used 89.99GiB
        devid    1 size 1.82TiB used 92.04GiB path /dev/sdc1

Btrfs v3.14.2-dirty
sylvain@sylvain-pc ~ % sudo btrfs subvolume list /mnt/Research
ID 257 gen 84 top level 5 path Symposium and Presentations
ID 258 gen 128 top level 5 path Software
ID 259 gen 20 top level 5 path Classes
ID 260 gen 79 top level 5 path Oceano
ID 261 gen 112 top level 5 path Now Working On

But to take full advantage of btrfs, I thought it would be nice to run a scrub once a week and be warned when corruption is detected, so I can restore the file from a backup. This raises two challenges.

1. How can I be warned if corruption has happened? I don't want to check the system logs every week, so I need a log containing only the errors. I found this script, which could be a good starting point: http://marc.merlins.org/linux/scripts/btrfs-scrub but it is for Debian and assumes scrub messages are logged in syslog. Where do they get logged on a systemd Arch Linux system?

2. I need to map a corrupted inode to a real filename so I can copy the file back from my backups (which are taken daily and copied to a NAS). For this I found http://www.commandlinefu.com/commands/v … sum-errors but I don't know if it works. Can someone confirm it?

I ended up drafting a script that may do the job, but I need some help finalizing it. Could a script/btrfs guru have a look? Or could somebody point me to something that would do what I want?

#! /bin/bash

#!!!This script is not yet finished/working do not use it!!!

# By Marc MERLIN <marc_soft@merlins.org> 2014/03/20
# License: Apache-2.0
#Modified by Sylvain 2014/08/08

test -x /usr/bin/btrfs || exit 0

# bash shortcut for `basename $0`
PROG=${0##*/}
lock=/var/run/$PROG

# shlock (from inn) does the right thing and grabs a lock for a dead process
# (it checks the PID in the lock file and if it's not there, it
# updates the PID with the value given to -p)
if ! shlock -p $$ -f $lock; then
    echo "$lock held, quitting" >&2
    exit
fi

if which on_ac_power >/dev/null 2>&1; then
    ON_BATTERY=0
    on_ac_power >/dev/null 2>&1 || ON_BATTERY=$?
    if [ "$ON_BATTERY" -eq 1 ]; then
        exit 0
    fi
fi

for btrfs in $(grep btrfs /proc/mounts | awk '{ print $1 }' | sort -u)
do
    # Create a tmp log file with all the btrfs-related info logged in syslog.
    # Needs adjusting because I cannot see anything in syslog.
    NOW=$(date +"%F")
    # The device path contains slashes, so turn them into underscores for the file names
    name=$(echo "$btrfs" | tr '/' '_')
    tmp_filename="/tmp/${name}_${NOW}_full"
    # Start logging, then run the scrub
    echo "Starting scrub of $btrfs" > "$tmp_filename"
    tail -n 0 -f /var/log/syslog | grep --line-buffered "BTRFS: " | grep --line-buffered -Ev '(disk space caching is enabled|unlinked .* orphans|turning on discard|device label .* devid .* transid|enabling SSD mode|BTRFS: has skinny extents|BTRFS: device label)' >> "$tmp_filename" &
    /usr/bin/btrfs scrub start -Bd "$btrfs"
    pkill -f 'tail -n 0 -f /var/log/syslog'
    echo "Ended scrub of $btrfs" >> "$tmp_filename"
    # Create a file listing the files that had checksum errors
    err_filename="$HOME/${name}_${NOW}_errors"
    grep -Po 'csum failed ino\S* \d+' "$tmp_filename" | awk '{print $4}' | sort -u | xargs -r -n 1 find / -inum > "$err_filename"
done

rm $lock

What do you think? Any suggestions?

Spider.007 · 2014-08-07 11:13:17

Your script looks fine, but I would reduce it for readability, to simply:

while read d m t x
do
    [[ $t != "btrfs" ]] && continue
    btrfs scrub start -Bd "$m"
    dmesg | grep -o 'csum failed ino [0-9]*' | awk '{print $4}' | sort -u | xargs -n 1 find / -inum
done </proc/mounts
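A minimal sketch of the inode-extraction step from the loop above, pulled out into a function so it can be tried on saved log text without running a scrub. The function name `csum_inodes` and the sample log lines are made up for illustration; real kernel messages may carry extra fields, and only the `csum failed ino N` part is relied on.

```shell
#!/bin/bash
# csum_inodes: hypothetical helper. Reads kernel log text on stdin and prints
# each inode number found in a "csum failed ino N" message, once, in numeric order.
csum_inodes() {
    grep -o 'csum failed ino [0-9]*' | awk '{print $4}' | sort -un
}

# Illustrative sample lines (not real output from any machine).
sample_log='kernel: BTRFS: csum failed ino 257 off 4096
kernel: BTRFS: csum failed ino 257 off 8192
kernel: BTRFS: csum failed ino 1042 off 0
kernel: BTRFS: bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0'

printf '%s\n' "$sample_log" | csum_inodes
```

On a systemd machine the kernel log can be read with `dmesg` or with `journalctl -k`, for example `journalctl -k -b | csum_inodes`. Keep in mind that btrfs inode numbers are, as far as I know, only unique within a subvolume, so `find / -inum` may return more than one candidate file.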

Also, if possible, I would e-mail the results or find another way to notify yourself.
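One possible way to do the notification, as a sketch only: build a small mail message from the list of affected files and hand it to a sendmail-compatible command. The helper name `build_report`, the address and the file name are all hypothetical, and the example only prints the message instead of sending it.

```shell
#!/bin/bash
# build_report: hypothetical helper. Prints a mail message (headers + body)
# addressed to $1, listing the file names read from stdin.
# Returns 1 and prints nothing when stdin is empty (nothing to report).
build_report() {
    local to=$1 files
    files=$(cat)
    if [ -z "$files" ]; then
        return 1
    fi
    printf 'To: %s\n' "$to"
    printf 'Subject: btrfs scrub: checksum errors on %s\n\n' "$(uname -n)"
    printf 'Files with checksum errors:\n%s\n' "$files"
}

# Made-up address and file name; the message is printed, not sent.
printf '%s\n' '/mnt/Research/Classes/notes.txt' | build_report 'me@example.org'
```

To actually send it, the output could be piped into `sendmail -t`, assuming an MTA that provides a sendmail-compatible command is installed and configured. Because the function returns non-zero on empty input, a wrapper script can skip sending when the scrub found nothing.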

Last edited by Spider.007 (2014-08-07 11:17:49)

seal20 · 2014-08-08 01:58:47

Thanks Spider.007

From your script I understand that scrub errors will show up in dmesg. But when I run a scrub manually, nothing appears in dmesg. Is it because I have no errors?

I am a shell script beginner and I don't understand how your script works. Could you explain it a bit? In my script I caught the btrfs devices from /proc/mounts using grep and awk, but it seems you only use "read d m t x". How is that possible? I think I don't understand the "injection" of </proc/mounts into the loop...

I need to do more research.

As for the mail, that would be great. I will investigate how to do it.

Spider.007 · 2014-08-08 09:21:16

Sending email is simple; have a look at https://wiki.archlinux.org/index.php/SSMTP for example. A btrfs scrub does indeed log its errors to dmesg, and obviously they are only logged if an error occurs. You could also have a look at the boot messages (from when the drive is initially mounted). On my machine, a summary is printed per device:

kernel: BTRFS: bdev /dev/sdc2 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
kernel: BTRFS: bdev /dev/sdd2 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
kernel: BTRFS: bdev /dev/sda2 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
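As a rough sketch, summary lines like the three above can be filtered so that only devices with at least one non-zero counter are reported. The helper name `devices_with_errors` is made up, and the fourth input line (a clean `/dev/sdb2`) is an invented example added to show a device being filtered out.

```shell
#!/bin/bash
# devices_with_errors: hypothetical helper. Reads kernel log text on stdin and
# prints "<device> <counters>" for every "bdev ... errs:" line in which at
# least one counter is non-zero.
devices_with_errors() {
    sed -n 's/.*BTRFS: bdev \([^ ]*\) errs: \(.*\)$/\1 \2/p' |
        grep -E '(wr|rd|flush|corrupt|gen) [1-9]'
}

# The three summary lines quoted above, plus one made-up clean device.
devices_with_errors <<'EOF'
kernel: BTRFS: bdev /dev/sdc2 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
kernel: BTRFS: bdev /dev/sdd2 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
kernel: BTRFS: bdev /dev/sda2 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
kernel: BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 0, gen 0
EOF
```

The same counters can also be queried at any time with `btrfs device stats` on the mount point, which avoids depending on the exact wording of the boot message.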

Regarding the bash-magic; from `man read`:

The terminating <newline> (if any) shall be removed from the input and the results shall be split into fields as in the shell for the results of parameter expansion (see Section 2.6.5, Field Splitting); the first field shall be assigned to the first variable var, the second field to the second variable var, and so on. If there are fewer fields than there are var operands, the remaining vars shall be set to empty strings.
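To make the field splitting concrete, here is a small sketch of the same loop with longer variable names, fed from sample text instead of the real file. The function name `btrfs_mount_points` and the sample lines are made up for illustration.

```shell
#!/bin/bash
# btrfs_mount_points: hypothetical helper. Reads /proc/mounts-style lines on
# stdin and prints the mount point of every btrfs entry.
btrfs_mount_points() {
    # read splits each line on whitespace: first field -> device, second ->
    # mount_point, third -> fs_type; everything left over lands in "rest".
    while read -r device mount_point fs_type rest
    do
        [[ $fs_type != "btrfs" ]] && continue
        printf '%s\n' "$mount_point"
    done
}

# Made-up sample in /proc/mounts format.
btrfs_mount_points <<'EOF'
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdc1 /mnt/Research btrfs rw,relatime,space_cache 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
EOF
```

The redirection after `done` (here a here-document, `</proc/mounts` in the real script) connects the input to the standard input of the whole loop, so each call to `read` consumes the next line until the input runs out. Running `btrfs_mount_points </proc/mounts` would list the real btrfs mount points.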

If you are serious about learning shell scripting this is a great place to start: http://tldp.org/LDP/abs/html/

Last edited by Spider.007 (2014-08-08 09:21:30)

Lockheed · 2015-02-27 17:38:21

seal20 wrote:

It is a single-drive setup, so I thought that raid1, even if possible (http://askubuntu.com/questions/406096/h … id-1-array), does not bring much benefit for the cost (losing half the space). Such a raid would only protect from bitrot that happens while the filesystem is unmounted, which has a very low probability. Am I correct?

For several months I have had just such a setup (a 2TB HDD split into 1TB+1TB in btrfs RAID1), which I did precisely because of the info in this link. And, by the way, apart from losing half the space, you also lose half the write speed (read speed remains unaffected, or is actually slightly better).

What I am curious about is where you got the idea that such a setup protects from data corruption only when the filesystem is unmounted. Maybe I misunderstand something, but it makes no sense to me. Scrub runs on mounted partitions and fixes errors while the filesystem is "online".

Last edited by Lockheed (2015-02-27 17:42:40)
