
#1 2013-04-05 16:46:24

deepsoul
Member
From: Earth
Registered: 2012-12-23
Posts: 67

diffn - Efficient comparison of multiple large files

This is a program I wrote to keep track of the versions and copies of large binary files I deal with at work.  It compares its argument files for pairwise equality.  It reads every file at most once, and only up to the point where it first differs from all other files.  Before opening a file at all, it compares directory entries for differences in size (a shortcut for inequality) and inode number (a shortcut for equality).  This makes it more efficient than the obvious alternatives: a double shell loop with diff -q, or md5sum'ing all files and comparing the digests.  Lists of equal files are output with configurable separators, which lets you adapt the format for reading by a script or program.  Two quiet modes report only through the exit status: one tells whether all of the argument files are equal, the other whether any of them are.
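To make the strategy above concrete, here is a minimal Python sketch of it. It is not diffn's actual source; its implementation language and internals aren't given here. Files sharing a (device, inode) pair are treated as equal without reading them, files of different size are never compared, and the remaining same-size candidates are read in lockstep, chunk by chunk, with each group split whenever chunks differ. A file stops being read as soon as no other candidate matches it. The chunk size, the function names (equal_groups, all_equal, any_equal, format_groups) and the separator parameters are all hypothetical, chosen for illustration only.

```python
import os
from collections import defaultdict

CHUNK_SIZE = 64 * 1024  # assumed read size; diffn's real buffer size is not stated


def equal_groups(paths):
    """Partition paths into lists of names with identical contents (sketch)."""
    # Equality shortcut: names sharing (device, inode) are the same file,
    # so they count as equal without reading anything.
    by_inode = defaultdict(list)
    size_of = {}
    for p in paths:
        st = os.stat(p)
        key = (st.st_dev, st.st_ino)
        by_inode[key].append(p)
        size_of[key] = st.st_size

    # Inequality shortcut: files of different size can never be equal.
    by_size = defaultdict(list)
    for key, names in by_inode.items():
        by_size[size_of[key]].append(names)

    groups = []
    for candidates in by_size.values():
        groups.extend(_split_by_content(candidates))
    return groups


def _split_by_content(candidates):
    """Read same-size files in lockstep, splitting them as chunks differ."""
    if len(candidates) == 1:
        # Unique size: no need to open the file at all.
        return [candidates[0]]
    handles = [open(names[0], "rb") for names in candidates]
    try:
        pending = [list(zip(candidates, handles))]
        finished = []
        while pending:
            still_reading = []
            for group in pending:
                if len(group) == 1:
                    # Differs from every other file: stop reading it early.
                    finished.append(group[0][0])
                    continue
                by_chunk = defaultdict(list)
                for names, fh in group:
                    by_chunk[fh.read(CHUNK_SIZE)].append((names, fh))
                for chunk, members in by_chunk.items():
                    if chunk:
                        still_reading.append(members)
                    else:
                        # All members hit EOF together after identical data.
                        finished.append([n for names, _ in members for n in names])
            pending = still_reading
        return finished
    finally:
        for fh in handles:
            fh.close()


def all_equal(groups):
    """Quiet mode 'all' (assumed semantics): every file in one group."""
    return len(groups) <= 1


def any_equal(groups):
    """Quiet mode 'any' (assumed semantics): at least two files match."""
    return any(len(g) > 1 for g in groups)


def format_groups(groups, name_sep=" ", group_sep="\n"):
    """Hypothetical output: names in a group joined by name_sep."""
    return group_sep.join(name_sep.join(g) for g in groups if len(g) > 1)
```

Usage would be something like `print(format_groups(equal_groups(paths), ","))`, with a script mapping all_equal or any_equal to exit status 0 or 1. That mapping is an assumption; check the manual page for diffn's real exit codes and option names. Reading in lockstep rather than hashing is what gives the "at most once, only up to the first difference" property: a digest-based approach must read every file to the end even when they differ in the first block.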

Its main use cases are:

  • Finding unnecessary duplicates of multimedia or other sizeable binary files

  • Making sure you have a backup of each such file somewhere

  • Finding modified source files when you have resorted to manual version control

diffn is in the AUR now, and you can read its manual page online.


Officer, I had to drive home - I was way too drunk to teleport!


#2 2013-04-05 17:54:29

Inxsible
Forum Fellow
From: Chicago
Registered: 2008-06-09
Posts: 9,183

Re: diffn - Efficient comparison of multiple large files

Interesting... I'll take a look over the weekend. It just so happens that I need to compare my music folders across 3 different HDDs.


Forum Rules

There's no such thing as a stupid question, but there sure are a lot of inquisitive idiots !

