#1 2012-10-20 17:15:39

fooacad
Member
Registered: 2012-10-20
Posts: 5

rdupfind - a randomized fast duplicate-file finder

Hello all,

This is my first post here. Greetings!

I have posted a small utility here: https://bitbucket.org/jyothisv/rdupfind

This is a duplicate-file finder that first checks file sizes, so that only files of the same size are ever compared. When the file under consideration has the same size as a file we have already seen, we select a sequence of random blocks from each file, compute one hashsum per file over its sampled blocks, and compare the two. We repeat this until we either find a difference or reach the predefined number of trials (the --ntrials option). If the hashsums match in every trial, we have a potential hit, which we verify by computing and comparing the full-file hashes (unless the user specifies --noverify, which disables this last check), and then we output the duplicates.
If we find a difference in hash values along the way, we record the differing hash and the file at the appropriate place and move on to the next file.
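
The procedure described above can be sketched roughly as follows. This is only an illustration and not rdupfind's actual code: the function names, the 4096-byte block size, the use of SHA-1, the number of blocks per trial, and the choice to sample both files at the same offsets are all assumptions made for the example.

```python
import hashlib
import os
import random
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size; not taken from rdupfind


def sample_hash(path, offsets, block_size=BLOCK_SIZE):
    """Return one hash over the blocks of `path` starting at `offsets`."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            h.update(f.read(block_size))
    return h.hexdigest()


def full_hash(path, chunk_size=1 << 20):
    """Return the hash of the whole file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def probably_equal(a, b, size, ntrials=3, blocks_per_trial=4, rng=random):
    """Compare random blocks of two same-sized files for `ntrials` rounds."""
    for _ in range(ntrials):
        # Assumption: both files are sampled at the same random offsets.
        offsets = sorted(rng.randrange(size) for _ in range(blocks_per_trial))
        if sample_hash(a, offsets) != sample_hash(b, offsets):
            return False  # a difference was found; stop early
    return True


def find_duplicates(paths, ntrials=3, verify=True):
    """Return lists of paths that are believed to be duplicates."""
    by_size = defaultdict(list)
    for p in paths:
        size = os.path.getsize(p)
        if size > 0:  # skip empty files; randrange(0) would also fail
            by_size[size].append(p)

    groups = []
    for size, files in by_size.items():
        classes = []  # each entry: files considered identical so far
        for f in files:
            for cls in classes:
                rep = cls[0]
                if probably_equal(rep, f, size, ntrials) and (
                    not verify or full_hash(rep) == full_hash(f)
                ):
                    cls.append(f)
                    break
            else:
                classes.append([f])  # differs from every known class
        groups.extend(c for c in classes if len(c) > 1)
    return groups
```

With verify=False the result is only probabilistic: two files that differ in a few bytes can pass every trial if none of the sampled blocks happens to cover the difference, which is why the full-hash check is on by default in this sketch.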

I hope my explanation above is clear enough. Questions, comments and criticisms are all welcome.

Some usage examples are given on the project page.

You can get the AUR package here: https://aur.archlinux.org/packages.php?ID=63793

Last edited by fooacad (2012-10-21 06:06:47)

Offline

#2 2012-10-20 17:20:21

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: rdupfind - a randomized fast duplicate-file finder

The link you posted gives me a warning "You're about to log in with the username jyothisv but the website doesn't require authentication. This may be an attempt to trick you."
https://bitbucket.org/jyothisv/rdupfind works better :-)

Last edited by karol (2012-10-20 17:20:38)

Offline

#3 2012-10-20 17:24:59

fooacad
Member
Registered: 2012-10-20
Posts: 5

Re: rdupfind - a randomized fast duplicate-file finder

karol wrote:

The link you posted gives me a warning "You're about to log in with the username jyothisv but the website doesn't require authentication. This may be an attempt to trick you."
https://bitbucket.org/jyothisv/rdupfind works better :-)

Sorry!
Thanks karol! I have changed it in the original post.

Offline

#4 2012-10-20 20:02:46

dolik.rce
Member
From: Czech republic
Registered: 2011-05-04
Posts: 43

Re: rdupfind - a randomized fast duplicate-file finder

Nice tool, I think I will start using it regularly... I never noticed how many duplicate files I have in my /home :)

One nice feature to add would be an option to ignore empty files. They often serve as locks or timestamps and pollute the output of the script because they are always identical ;)

Are you planning to write a PKGBUILD and make rdupfind available through AUR?

Offline

#5 2012-10-21 02:08:53

fooacad
Member
Registered: 2012-10-20
Posts: 5

Re: rdupfind - a randomized fast duplicate-file finder

dolik.rce wrote:

Nice tool, I think I will start using it regularly... I never noticed how many duplicate files I have in my /home :)

One nice feature to add would be an option to ignore empty files. They often serve as locks or timestamps and pollute the output of the script because they are always identical ;)

Are you planning to write a PKGBUILD and make rdupfind available through AUR?

Thanks for the comment!
Now it ignores empty files. I'll add command line options to restrict the size to a specific range as soon as possible.
As for the AUR, I might do it sometime later in the week.

Offline

#6 2012-10-21 06:03:12

fooacad
Member
Registered: 2012-10-20
Posts: 5

Re: rdupfind - a randomized fast duplicate-file finder

Now it supports the option -z (or --size) for restricting the filesize range.
For example,

 
rdupfind -z +100M  # consider only files >= 100MB
rdupfind -z 100K   # consider only files which are exactly 100K (= 100*1024 bytes) in size
rdupfind -z _1G    # consider only files which are at most 1GB in size (note the _ rather than -)

The suffixes can also be lower case.
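
To make the three forms concrete, here is a small sketch of how such a size argument could be interpreted. It is not taken from rdupfind: the function name parse_size_spec is hypothetical, and the sketch assumes that M and G are powers of 1024 like K, and that a bare number without a suffix means bytes.

```python
# Assumed multipliers: only K = 1024 is stated in the post above.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size_spec(spec):
    """Return (min_bytes, max_bytes) for a -z style argument.

    None means "no bound on this side".
    """
    mode = "exact"
    if spec.startswith("+"):
        mode, spec = "min", spec[1:]
    elif spec.startswith("_"):
        mode, spec = "max", spec[1:]

    suffix = spec[-1:].upper()  # suffixes are case-insensitive
    if suffix in UNITS:
        nbytes = int(spec[:-1]) * UNITS[suffix]
    else:
        nbytes = int(spec)  # assumption: bare numbers are bytes

    if mode == "min":
        return (nbytes, None)
    if mode == "max":
        return (None, nbytes)
    return (nbytes, nbytes)


print(parse_size_spec("+100M"))  # at least 100 * 1024**2 bytes
print(parse_size_spec("100k"))   # exactly 100 * 1024 bytes
print(parse_size_spec("_1G"))    # at most 1024**3 bytes
```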

I have uploaded the package to AUR here: https://aur.archlinux.org/packages.php?ID=63793

Offline
