#1 2010-01-13 18:53:50

IgnorantGuru
Member
Registered: 2009-11-09
Posts: 640

rmdupe script - removes duplicates

This script has been working well for me, so I thought I'd share it.  It is based on the rm interface and uses only standard Linux commands.  It includes a simulation mode, reference-only folders, a trash mode, size limits, and support for a custom rm command.  You can use it to remove duplicates from a group of folders or just to search for them.  It also does a full byte-for-byte comparison rather than comparing checksums, to avoid false matches.
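To illustrate what a byte-for-byte comparison looks like with standard commands, here is a minimal sketch.  The function name same_content is made up for this example and is not taken from rmdupe, and the size pre-check is my own assumption about a sensible shortcut, not a description of what the script does.

```shell
# Hypothetical helper (not rmdupe's code): succeeds only if the two
# files named by $1 and $2 have identical contents.
same_content() {
    # Cheap pre-check: files of different sizes cannot be duplicates.
    [ "$(wc -c < "$1")" -eq "$(wc -c < "$2")" ] || return 1
    # cmp -s compares byte by byte, prints nothing, and reports the
    # result through its exit status (0 means identical).
    cmp -s -- "$1" "$2"
}
```

Something like `same_content a.jpg b.jpg && echo duplicate` would then report a match only when every byte agrees, which a checksum comparison cannot strictly guarantee.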

I realize there are other solutions for this.  I just wanted to write my own that was command-line only and had the features I wanted.  You can read about it and download it here...  comments welcome.

For programmers: you may notice that it unnecessarily compares the same two files twice, so it's not as efficient as it could be.  It just needs a little code to make it smarter in that department.  The good news is that if files change while it's running, it works on the new files, not a cached version.  At some point I may improve the code, but it seems stable enough to share.  Use the simulation mode first if you're concerned.
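One way the double comparison could be avoided is to compare each file only with the files that come after it in the list, so every pair is visited once.  This is only a sketch of that idea, not rmdupe's actual code; list_duplicate_pairs and its variable names are hypothetical.

```shell
# Hypothetical sketch (not rmdupe's code): print each pair of identical
# files exactly once.  Takes the candidate files as arguments.
list_duplicate_pairs() {
    while [ "$#" -gt 1 ]; do
        first=$1
        shift                      # remaining arguments are the later files
        for other in "$@"; do
            # Files are read fresh on every comparison, so nothing is cached.
            if cmp -s -- "$first" "$other"; then
                printf '%s = %s\n' "$first" "$other"
            fi
        done
    done
}
```

For n files this does n(n-1)/2 comparisons instead of roughly n squared, and since cmp reads the files from disk each time, changes made while it runs are still picked up.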

Usage: rmdupe [OPTIONS] FOLDER [...]
Removes duplicate files in specified folders.  By default, newest duplicates
 are removed.
Options:
-R, -r              search specified folders recursively
--ref FOLDER        also search FOLDER recursively for copies but don't
                    remove any files from here (multiple --ref allowed)
                    Note: files may be removed from a ref folder if that
                    folder is also a specified folder
--trash FOLDER      copy duplicate files to FOLDER instead of removing
--sim               simulate and report duplicates only - no removal
--quiet             minimize output (disabled if used with --sim)
--verbose           detailed output
--old               remove oldest duplicates instead of newest
--minsize SIZE      limit search to duplicate files SIZE MB and larger
--maxsize SIZE      limit search to duplicate files SIZE MB and smaller
--rmcmd "RMCMD"     execute RMCMD instead of rm to remove copies
                    (may contain arguments, eg: "srm -ll")
--xdev              don't descend to other filesystems when recursing
                    specified or ref folders
Notes: do not use wildcards; symlinks are not followed except on the
       command line; zero-length files are ignored
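The default rule from the usage text (the newest duplicate is removed, or the oldest with --old) could be sketched with the shell's -nt file test.  The helper pick_victim and its "new"/"old" mode argument are invented for illustration, and how rmdupe itself breaks ties between files with the same timestamp is not stated above, so the tie behavior here is an assumption.

```shell
# Hypothetical helper (not rmdupe's code): given two duplicate files,
# print the one that would be removed.  $3 is "new" (default) or "old".
pick_victim() {
    mode=${3:-new}
    # -nt is true when $1 has a later modification time than $2.
    # Assumption: on equal timestamps, $2 is treated as the newer file.
    if [ "$1" -nt "$2" ]; then
        newer=$1; older=$2
    else
        newer=$2; older=$1
    fi
    if [ "$mode" = old ]; then
        printf '%s\n' "$older"
    else
        printf '%s\n' "$newer"
    fi
}
```

Printing the candidate instead of deleting it mirrors the idea behind --sim: check what would go before anything is removed.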
