
#1 2009-11-14 05:18:01

simongh
Member
Registered: 2009-10-01
Posts: 4

Help! Recursive directory comparison.

Not really a programming question, I suppose, but anyway:

I want to recursively compare two directory trees (which differ in structure) and list any duplicate files. I've been playing around with diff but can't make it handle differing tree structures. I'm kind of a newbie at this, though, and it's five o'clock in the morning.

For further clarification: the missus has an HDD with files (pictures, movies, music) sorted by some scheme not even she herself can fathom (though she'd never admit it).
I've made it my mission to back up and sort all her stuff. Now, some of the files on her drive are already on mine, but sorted (in a sane fashion), which is why I want to be able to find duplicates of any given filetype or pattern.

I hope that was clear enough.


#2 2009-11-14 05:20:04

Peasantoid
Member
Registered: 2009-04-26
Posts: 928
Website

Re: Help! Recursive directory comparison.

I'm too tired to think about this too hard, but I can definitely see find(1) in your future.


#3 2009-11-14 05:39:35

mikesd
Member
From: Australia
Registered: 2008-02-01
Posts: 788
Website

Re: Help! Recursive directory comparison.

By duplicate files, do you mean files that have the same name, or files that have identical contents and may or may not have the same name?

From your post it sounds like you just want to compare based on filenames. find will help in both cases; however, if you need to find duplicates based on contents, it will help to generate a hash of each file and then compare the hashes for duplicates.
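
A minimal sketch of the hash-and-compare idea, assuming GNU coreutils (sha256sum, and uniq with -w and --all-repeated) are available. The function name content_dupes is made up for illustration, and file names containing newlines or backslashes are not handled cleanly.

```shell
#!/bin/bash
# Sketch: list groups of files with identical contents in one or more trees.
# content_dupes is a hypothetical helper name, not a standard tool.
content_dupes() {
  # sha256sum prints "<64 hex chars>  <path>" for each file.
  # sort puts equal hashes next to each other; uniq -w64 compares only the
  # hash column and prints every member of each repeated group, with a
  # blank line between groups.
  find "$@" -type f -exec sha256sum {} + |
    sort |
    uniq -w64 --all-repeated=separate
}
```

Called as `content_dupes /mnt/mine /mnt/yours`, it should print one block of paths per set of identical files, regardless of their names.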

Last edited by mikesd (2009-11-14 05:40:42)


#4 2009-11-14 05:45:03

simongh
Member
Registered: 2009-10-01
Posts: 4

Re: Help! Recursive directory comparison.

Yeah, comparison based on filename is what I'm after. find is exactly what I needed. Maybe I should have tackled this task when fully awake... might not have had to bother you with it.

Thanks for pointing me in the right direction, though!


#5 2009-11-14 05:48:26

skottish
Forum Fellow
From: Here
Registered: 2006-06-16
Posts: 7,942

Re: Help! Recursive directory comparison.

Welcome to the forums.

If you feel like it, report back to this thread with your final solution. It may help someone with a similar problem. By the way, I believe that diff can recursively compare trees. I've never tried it, so I can't elaborate.
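
For reference, a small sketch of the recursive diff mentioned above. diff -r pairs entries by their relative path, so a file that sits in a different subdirectory on each side shows up as "Only in ..." rather than as a match, which is why it is of limited use when the two trees are organised differently. The wrapper name tree_diff is hypothetical.

```shell
#!/bin/bash
# Sketch: report which entries differ between two trees, by relative path.
# tree_diff is a hypothetical wrapper name.
tree_diff() {
  # -r recurses into subdirectories; -q only says whether files differ.
  # LC_ALL=C keeps the message wording predictable.
  # diff exits with status 1 when it finds differences; that is not an error.
  LC_ALL=C diff -rq "$1" "$2"
}
```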


#6 2009-11-14 05:51:03

brisbin33
Member
From: boston, ma
Registered: 2008-07-24
Posts: 1,796
Website

Re: Help! Recursive directory comparison.

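#!/bin/bash

his_directory="/mnt/mine"
her_directory="/mnt/yours"

# Take the basename of every regular file in his tree and look for it in hers.
# Caveat: -name treats "$file" as a glob, so names containing * ? [ may mismatch.
find "$his_directory" -type f -exec basename {} \; | while IFS= read -r file; do
  result="$(find "$her_directory" -type f -name "$file")"

  [ -n "$result" ] && echo "$result found in his and hers"

done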

note: untested


#7 2009-11-14 05:51:05

Pox
Member
From: Melbourne, AU
Registered: 2007-10-04
Posts: 66

Re: Help! Recursive directory comparison.

Looks like you've already worked it out, but here's a solution in ruby (doing set intersection in bash is a bit of a pain):

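#!/usr/bin/env ruby

def filenames(dir) # return basenames of all regular files in tree
    # Dir[] also returns directories, so keep only regular files
    Dir["#{dir}/**/*"].select { |f| File.file?(f) }.map { |f| File.basename(f) }.sort
end

dir1, dir2 = ARGV[0..1]

# Array#& is set intersection: names present in both trees
puts((filenames(dir1) & filenames(dir2)).join("\n"))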
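
For comparison, the same set intersection is possible in the shell with comm, which is one way around the pain mentioned above. This is only a sketch: it assumes GNU find (for -printf) and mktemp, the function name common_names is invented here, and file names containing newlines will confuse it.

```shell
#!/bin/bash
# Sketch: print basenames that occur in both trees (intersection by name).
# common_names is a hypothetical helper; needs GNU find for -printf.
common_names() {
  local list_a list_b
  list_a=$(mktemp)
  list_b=$(mktemp)
  # %f is the basename; sort -u gives comm the sorted, de-duplicated
  # input it expects. LC_ALL=C keeps sort and comm in agreement on order.
  find "$1" -type f -printf '%f\n' | LC_ALL=C sort -u > "$list_a"
  find "$2" -type f -printf '%f\n' | LC_ALL=C sort -u > "$list_b"
  # -12 suppresses lines unique to either list, leaving the shared names.
  LC_ALL=C comm -12 "$list_a" "$list_b"
  rm -f "$list_a" "$list_b"
}
```

Something like `common_names /mnt/mine /mnt/yours > shared.txt` would then leave a list of names that exist somewhere in both trees.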


#8 2009-11-14 05:58:19

brisbin33
Member
From: boston, ma
Registered: 2008-07-24
Posts: 1,796
Website

Re: Help! Recursive directory comparison.

I do believe we just wrote the same thing in two languages.


#9 2009-11-14 06:06:15

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965
Website

Re: Help! Recursive directory comparison.

If the purpose of this is to avoid backing up duplicate files, why do you only want to compare by file name? It seems that "fdupes" might be more appropriate for this.

*avoids temptation to rewrite previous script in Perl and Python*


My Arch Linux Stuff | Forum Etiquette | Community Ethos - Arch is not for everyone


#10 2009-11-14 06:22:33

simongh
Member
Registered: 2009-10-01
Posts: 4

Re: Help! Recursive directory comparison.

fdupes looks interesting... I'll probably end up using it as part of the solution. Thanks for the heads up. Filename comparison should be sufficient though, as none of the file names have been altered, to my knowledge.
Thank you brisbin and pox for the examples, they might come in handy.
As for diff, it doesn't seem to go deeper in the trees unless the structure of both entries is identical... 'least that's what it seemed like to me.

I'll post what I came up with later. As for now, back to bed again... working the nightshift seriously messes up my sleep.

Last edited by simongh (2009-11-14 06:23:02)


#11 2009-11-14 07:27:18

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965
Website

Re: Help! Recursive directory comparison.

simongh wrote:

working the nightshift seriously messes up my sleep

What's sleep and how much does it cost?




#12 2009-11-14 10:49:26

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: Help! Recursive directory comparison.

brisbin33 wrote:

find "$his_directory" -type f -exec basename {} \;

find has -printf, which is useful here if you also want to check something like file size. Print name and size together with -printf '%f\t%s\n', then, when looking in dir 2, split each line again with IFS=$'\t' ... while read file size ... and add the size test to the second find: find ... -size ${size}c
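
Spelled out, that suggestion might look roughly like the sketch below. It assumes GNU find, the function name same_name_and_size is invented for illustration, and the size is printed before the name (a small deviation from the format string above) so the name is the last field read.

```shell
#!/bin/bash
# Sketch of the name-plus-size check; same_name_and_size is hypothetical.
# Needs GNU find for -printf. -name treats its argument as a glob, so
# names containing * ? or [ may match more or less than intended.
same_name_and_size() {
  local tab
  tab=$(printf '\t')
  # %s = size in bytes, %f = basename, separated by a tab.
  find "$1" -type f -printf '%s\t%f\n' |
    while IFS=$tab read -r size name; do
      # -size Nc matches files of exactly N bytes.
      find "$2" -type f -name "$name" -size "${size}c"
    done
}
```

It prints the paths in the second tree that share both name and byte size with some file in the first tree, which is a cheap extra check before trusting a name match.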


#13 2009-11-14 23:19:41

simongh
Member
Registered: 2009-10-01
Posts: 4

Re: Help! Recursive directory comparison.

In the end, a combination of `find . -name '*.ext' -exec basename {} \;`, `fdupes -rd` and `diff` did the job, coupled with a pipe or two. Basically, I used find and diff to determine which files were on both drives, put that list in a file, and then looped through the file to remove (find -exec rm) all the dupes. Finally, `fdupes` was used to clear the dupes the missus already had of her own files on her extremely well sorted drive. I was planning on writing a bash script to automate the process a bit, but it seemed like a waste of time.

Thanks, all of you, for the tips and hints.
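
For anyone finding this later, the removal loop described above could look something like this sketch. It is not the exact set of commands that was run: the function name remove_listed and the dry-run/delete switch are made up for illustration, and the list file is assumed to hold one file name per line.

```shell
#!/bin/bash
# Sketch of the "loop over a list and remove dupes" step.
# remove_listed is a hypothetical helper. It prints matches by default and
# only deletes when the third argument is "delete".
remove_listed() {
  local list="$1" tree="$2" mode="${3:-dry-run}"
  while IFS= read -r name; do
    if [ "$mode" = "delete" ]; then
      # -name treats "$name" as a glob pattern, so check the dry run first.
      find "$tree" -type f -name "$name" -exec rm -- {} +
    else
      find "$tree" -type f -name "$name" -print
    fi
  done < "$list"
}
```

Running `remove_listed shared.txt /mnt/yours` first shows what would go; adding `delete` as a third argument actually removes the files.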
