Hi everyone,
I had a dying drive, so I decided to replace it and at the same time move from ext4 to btrfs. It is a single-drive setup, so I thought RAID1, even if possible (http://askubuntu.com/questions/406096/h … id-1-array), does not bring much benefit for the cost (losing half the space). Such a RAID would only protect from bitrot when the filesystem is unmounted, which has a very low probability. Am I correct?
I ended up with a single btrfs partition with different subvolumes. I plan to take weekly snapshots of the "Now Working On" subvolume because it is the one where most of the changes happen.
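A weekly read-only snapshot could be taken with `btrfs subvolume snapshot -r <source> <dest>`. This is only an illustrative sketch: it assumes the top-level subvolume is mounted at /mnt/Research and stores snapshots in a hypothetical /mnt/Research/snapshots directory; `snapshot_dest` and `take_weekly_snapshot` are made-up names. The quoting matters because the subvolume name contains spaces.

```shell
# Assumption: the top-level btrfs subvolume is mounted at /mnt/Research.
MNT=${MNT:-/mnt/Research}
SNAPDIR=${SNAPDIR:-$MNT/snapshots}   # hypothetical snapshot directory

# Build the destination path "<snapdir>/<subvolume>@<YYYY-MM-DD>".
snapshot_dest() {
    local subvol=$1 day=$2
    printf '%s/%s@%s\n' "$SNAPDIR" "$subvol" "$day"
}

# Take a read-only (-r) snapshot of "$MNT/<subvolume>" (needs root).
take_weekly_snapshot() {
    local subvol=$1 dest
    dest=$(snapshot_dest "$subvol" "$(date +%F)")
    mkdir -p "$SNAPDIR"
    btrfs subvolume snapshot -r "$MNT/$subvol" "$dest"
}
```

Run as root from a weekly cron job or systemd timer, e.g. `take_weekly_snapshot "Now Working On"`.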
Here is my setup:
sylvain@sylvain-pc ~ % sudo btrfs filesystem show /mnt/Research
Label: 'Research' uuid: 4af42047-285e-4f13-a778-30a7517ff33c
Total devices 1 FS bytes used 89.99GiB
devid 1 size 1.82TiB used 92.04GiB path /dev/sdc1
Btrfs v3.14.2-dirty
sylvain@sylvain-pc ~ % sudo btrfs subvolume list /mnt/Research
ID 257 gen 84 top level 5 path Symposium and Presentations
ID 258 gen 128 top level 5 path Software
ID 259 gen 20 top level 5 path Classes
ID 260 gen 79 top level 5 path Oceano
ID 261 gen 112 top level 5 path Now Working On
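Several of these subvolume paths contain spaces, so a script that loops over them should read them one line at a time instead of relying on word splitting. A sketch assuming the `ID N gen N top level N path NAME` line format shown above; `subvol_paths` and `list_subvolumes` are hypothetical helpers.

```shell
# Print one subvolume path per line from "btrfs subvolume list" output
# (format: "ID N gen N top level N path NAME", NAME may contain spaces).
subvol_paths() {
    sed -n 's/^ID [0-9]* gen [0-9]* top level [0-9]* path //p'
}

# Loop over the subvolumes of a mount point, keeping spaces intact (needs root).
list_subvolumes() {
    local mnt=$1 path
    btrfs subvolume list "$mnt" | subvol_paths | while IFS= read -r path; do
        printf 'subvolume: %s/%s\n' "$mnt" "$path"
    done
}
```

For example, `list_subvolumes /mnt/Research` would print one `subvolume:` line per entry of the listing.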
But to take full advantage of btrfs, I thought it would be nice to run a scrub once a week and get warned when corruption is detected, so I can restore the file from a backup. Two challenges came up:
1. How do I get warned if corruption happens? I don't want to check the system logs every week, so I need a log containing only the errors. I found this script that could be a good starting point: http://marc.merlins.org/linux/scripts/btrfs-scrub but it is for Debian and it assumes scrub messages are logged to syslog. On a systemd Arch Linux system, where are they logged?
2. I need to map a corrupted inode to its real filename so I can restore the file from my backups (which are taken daily and copied to a NAS). For this I found http://www.commandlinefu.com/commands/v … sum-errors but I don't know if it works. Can someone confirm?
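On a systemd-based Arch Linux install there is typically no /var/log/syslog unless a syslog daemon is added; kernel messages, including those from btrfs, go to the journal and can be read with `journalctl -k` (or `dmesg`). For item 2, btrfs-progs can do the inode-to-path lookup itself with `btrfs inspect-internal inode-resolve <inode> <path>`, instead of scanning the whole system with `find / -inum`. A minimal sketch of both steps: all function names are hypothetical, the noise patterns are borrowed from the btrfs-scrub script, the `csum failed ino N` pattern is the one the draft script greps for, and it assumes the top-level subvolume is mounted at /mnt/Research so each subvolume is reachable as /mnt/Research/NAME. Because btrfs inode numbers are only unique within a subvolume, every subvolume passed in is tried, and a number that exists in several of them prints several candidates.

```shell
# Keep btrfs kernel lines, dropping routine mount-time notices
# (patterns borrowed from the btrfs-scrub script linked above).
btrfs_errors_only() {
    grep -i 'btrfs' |
        grep -Ev 'disk space caching is enabled|unlinked .* orphans|turning on discard|device label|enabling SSD mode|has skinny extents'
}

# Kernel messages from the systemd journal since a timestamp such as
# "2014-08-08 10:00:00" (needs root or the systemd-journal group).
btrfs_kernel_log_since() {
    journalctl -k --no-pager --since "$1" | btrfs_errors_only
}

# Unique inode numbers from "csum failed ino N" messages on stdin.
csum_failed_inodes() {
    grep -o 'csum failed ino [0-9]*' | awk '{print $4}' | sort -un
}

# Resolve each inode number read on stdin in every subvolume path given
# as an argument (needs root). inode-resolve fails where the inode does
# not exist; those errors are discarded.
resolve_inodes() {
    local ino sub
    while read -r ino; do
        for sub in "$@"; do
            btrfs inspect-internal inode-resolve "$ino" "$sub" 2>/dev/null </dev/null
        done
    done
}
```

For example, as root: `btrfs_kernel_log_since "2014-08-08 10:00:00" | csum_failed_inodes | resolve_inodes "/mnt/Research/Now Working On" /mnt/Research/Software`.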
I ended up drafting a script that may do the job, but I need some help finalizing it. Could a script/btrfs guru have a look? Or could somebody point me to something that already does what I want?
#! /bin/bash
#!!!This script is not yet finished/working do not use it!!!
# By Marc MERLIN <marc_soft@merlins.org> 2014/03/20
# License: Apache-2.0
# Modified by Sylvain 2014/08/08
test -x /usr/bin/btrfs || exit 0
# bash shortcut for `basename $0`
PROG=${0##*/}
lock=/var/run/$PROG
# shlock (from inn) does the right thing and grabs a lock for a dead process
# (it checks the PID in the lock file and if it's not there, it
# updates the PID with the value given to -p)
if ! shlock -p $$ -f "$lock"; then
    echo "$lock held, quitting" >&2
    exit 1
fi
# Do not scrub on battery: on_ac_power exits with 1 when on battery
if which on_ac_power >/dev/null 2>&1; then
    ON_BATTERY=0
    on_ac_power >/dev/null 2>&1 || ON_BATTERY=$?
    if [ "$ON_BATTERY" -eq 1 ]; then
        rm -f "$lock"
        exit 0
    fi
fi
# No spaces around '=' in a shell assignment, and no '$' on the left side
NOW=$(date +"%F")
# Field 3 of /proc/mounts is the filesystem type, field 1 the device;
# sort -u scrubs each device once even if several subvolumes are mounted
for btrfs in $(awk '$3 == "btrfs" { print $1 }' /proc/mounts | sort -u)
do
    # /dev/sdc1 -> sdc1 so it can be used in a file name. Braces are needed:
    # $btrfs_$NOW_full would expand the unset variables btrfs_ and NOW_full
    dev=${btrfs##*/}
    tmp_file=/tmp/${dev}_${NOW}_full
    err_file=$HOME/${dev}_${NOW}_errors
    # Remember the start time so only the kernel messages of this scrub are read
    start=$(date +"%F %T")
    echo "Starting scrub of $btrfs" > "$tmp_file"
    /usr/bin/btrfs scrub start -Bd "$btrfs" >> "$tmp_file" 2>&1
    # Arch with systemd has no /var/log/syslog by default: kernel messages are
    # in the journal, and "journalctl -k" shows the kernel messages of this boot
    journalctl -k --no-pager --since "$start" | grep -i "btrfs" \
        | grep -Ev '(disk space caching is enabled|unlinked .* orphans|turning on discard|device label|enabling SSD mode|has skinny extents)' >> "$tmp_file"
    # '>>' appends; '>' would overwrite everything logged so far
    echo "Ended scrub of $btrfs" >> "$tmp_file"
    # Look up the files behind "csum failed ino N" messages. '>' (not '2>')
    # sends find's results to the error file; its warnings go to /dev/null.
    # btrfs inode numbers are only unique per subvolume, so find / can also
    # list unrelated files with the same inode number.
    grep -Po 'csum failed ino\S* \d+' "$tmp_file" | awk '{print $4}' | sort -u \
        | xargs -r -n 1 find / -inum > "$err_file" 2>/dev/null
done
rm -f "$lock"
What do you think? Any suggestions?
Your script looks fine, but for readability I would simplify it to:
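while read -r d m t x
do
    [[ $t != "btrfs" ]] && continue
    # -B: do not background, print statistics when finished; -d: per device
    btrfs scrub start -Bd "$m"
    # -r: do not run find at all when no inode number was found
    dmesg | grep -o 'csum failed ino [0-9]*' | awk '{print $4}' | sort -u | xargs -r -n 1 find / -inum
done </proc/mounts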
Also, if possible, I would e-mail the results to yourself or find another way to get notified.
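A minimal sketch, assuming a sendmail-compatible mailer is installed (ssmtp, for instance, provides one): mail the error file only when it is non-empty, so a clean weekly scrub stays silent. `mail_if_errors`, the `SENDMAIL` variable and the recipient address are hypothetical.

```shell
# Mail command; can be overridden for testing, e.g. SENDMAIL=cat.
# Left unquoted below on purpose so "/usr/sbin/sendmail -t" splits into words.
SENDMAIL=${SENDMAIL:-"/usr/sbin/sendmail -t"}

# Mail the contents of an error file, but only if it is non-empty.
mail_if_errors() {
    local errfile=$1 to=$2
    [ -s "$errfile" ] || return 0       # nothing to report
    {
        printf 'To: %s\n' "$to"
        printf 'Subject: btrfs scrub errors on %s\n' "$(uname -n)"
        printf '\n'                     # blank line separates headers and body
        cat "$errfile"
    } | $SENDMAIL
}
```

In the loop of the script above this could be called as `mail_if_errors "$err_file" you@example.com`.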
Last edited by Spider.007 (2014-08-07 11:17:49)
Thanks Spider.007
From your script I understand that scrub errors will show up in dmesg. But when I run a scrub manually, nothing appears in dmesg. Is it because I have no errors?
I am a shell scripting beginner and I don't understand how your script works; could you explain it a bit? In my script I extracted the btrfs devices from /proc/mounts using grep and awk, but it seems you only use "read d m t x". How is that possible? I think I don't understand the redirection of </proc/mounts into the loop...
I need to do more research.
E-mailing the results would be great; I will investigate how to do it.
Sending email is simple; have a look at https://wiki.archlinux.org/index.php/SSMTP for example. Btrfs scrub does indeed log its errors to dmesg, and obviously they are only logged if an error occurs. You could also have a look at the boot messages (when the drive is first mounted). On my machine it outputs a summary per device:
kernel: BTRFS: bdev /dev/sdc2 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
kernel: BTRFS: bdev /dev/sdd2 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
kernel: BTRFS: bdev /dev/sda2 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
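These are btrfs's persistent per-device error counters (write, read, flush, corruption, generation), and `btrfs device stats <mountpoint>` prints the same counters at any time without waiting for the next mount. A sketch that reports only devices with at least one non-zero counter, assuming the `bdev DEVICE errs: wr N, rd N, flush N, corrupt N, gen N` line format shown above; `nonzero_bdev_errors` is a hypothetical name.

```shell
# Print "DEVICE: counters" for every "bdev DEVICE errs: wr N, rd N, ..."
# line on stdin that has at least one non-zero counter.
nonzero_bdev_errors() {
    awk '/bdev .* errs:/ {
        line = $0
        sub(/.*bdev /, "", line)                        # "DEV errs: wr 0, ..."
        dev = line; sub(/ .*/, "", dev)                 # "DEV"
        counters = line; sub(/^[^:]*: /, "", counters)  # "wr 0, rd 0, ..."
        n = split(counters, f, /[ ,]+/)                 # name, value, name, value...
        bad = 0
        for (i = 2; i <= n; i += 2) if (f[i] + 0 != 0) bad = 1
        if (bad) print dev ": " counters
    }'
}
```

For example `journalctl -k | nonzero_bdev_errors`; on the three lines above it would report all three devices (corrupt 5 on sdc2, wr 2 on sdd2 and sda2).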
Regarding the bash magic, from `man read`:
The terminating <newline> (if any) shall be removed from the input and the results shall be split into fields as in the shell for the results of parameter expansion (see Section 2.6.5, Field Splitting); the first field shall be assigned to the first variable var, the second field to the second variable var, and so on. If there are fewer fields than there are var operands, the remaining vars shall be set to empty strings.
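Applied to /proc/mounts, whose lines look like `device mountpoint type options 0 0`, `read d m t x` puts the device in `d`, the mount point in `m`, the filesystem type in `t`, and the rest of the line (options and the two numbers) in `x`, because the last variable receives all remaining fields. The `</proc/mounts` after `done` redirects the file into the whole loop, so each `read` consumes the next line, and the loop stops when `read` reaches end of file and returns non-zero. A small sketch of the same pattern as a function reading stdin; `btrfs_mounts` is a hypothetical name.

```shell
# Print "mountpoint (device)" for each btrfs line read from stdin,
# where stdin uses the /proc/mounts format.
btrfs_mounts() {
    local d m t x
    while read -r d m t x; do
        # d=device, m=mount point, t=type, x=rest of the line
        [[ $t != "btrfs" ]] && continue
        printf '%s (%s)\n' "$m" "$d"
    done
}
```

Usage: `btrfs_mounts </proc/mounts`.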
If you are serious about learning shell scripting this is a great place to start: http://tldp.org/LDP/abs/html/
Last edited by Spider.007 (2014-08-08 09:21:30)
> It is a single-drive setup, so I thought RAID1, even if possible (http://askubuntu.com/questions/406096/h … id-1-array), does not bring much benefit for the cost (losing half the space). Such a RAID would only protect from bitrot when the filesystem is unmounted, which has a very low probability. Am I correct?
For several months I have had exactly such a setup (a 2TB HDD split into 1TB+1TB in btrfs RAID1), which I set up precisely because of the info in this link. And, by the way, apart from losing half the space, you also lose half the write speed (read speed remains unaffected, or is actually slightly better).
What I am curious about is where you got the idea that such a setup protects from data corruption only when the filesystem is unmounted. Maybe I misunderstand something, but it makes no sense to me. Scrub runs on mounted filesystems and fixes errors while the filesystem is online.
Last edited by Lockheed (2015-02-27 17:42:40)