
#1 2015-07-14 06:45:19

broken pipe
Member
Registered: 2010-12-10
Posts: 238

Remove duplicates ignoring the file type

Hi all!

I would like to remove some duplicates from my music library. There are lots of .ogg and .mp3 files with the same name ("song.ogg" and "song.mp3").
Is there any easy way to run some duplicate checks only on the file names?


Best regards!


#2 2015-07-14 13:55:33

parchd
Member
Registered: 2014-03-08
Posts: 421

Re: Remove duplicates ignoring the file type

What about the following:

find|grep -Po ".*(?=ogg$|mp3$)"|sort|uniq -d

Run it in the top directory of your music library and it will list the duplicates with the ogg/mp3 extension stripped (the trailing dot stays). A pair is only listed if both files are in the same directory, because the full path is compared.

find without any arguments lists files recursively.
In this instance, grep matches the ogg and mp3 files and prints them without the extension.
sort, well, sorts them. This bit is needed, because uniq only compares adjacent lines and find does not guarantee any particular order.
uniq -d prints only the lines that occur more than once (i.e. there is both an mp3 and an ogg of the same file).

It is then up to you what to do with the output ;)

Note: I haven't really tested this, but it doesn't do anything dangerous.
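
As one possible way to act on that output, here is a minimal sketch that only lists each .mp3 with a same-named .ogg next to it, so the list can be reviewed before anything is removed. The function name list_mp3_with_ogg_twin is made up for this example, and picking the .mp3 as the copy to list is an assumption, not something decided in this thread.

```shell
#!/usr/bin/env bash
# Sketch: list each .mp3 that has a same-named .ogg in the same directory.
# Dry run only: nothing is deleted. The function name is hypothetical.
list_mp3_with_ogg_twin() {
    local root=${1:-.} mp3 ogg
    # -print0 with read -d '' keeps paths containing spaces or newlines intact
    find "$root" -type f -name '*.mp3' -print0 |
    while IFS= read -r -d '' mp3; do
        ogg="${mp3%.mp3}.ogg"
        # print the mp3 only when the matching ogg exists beside it
        if [ -f "$ogg" ]; then
            printf '%s\n' "$mp3"
        fi
    done
}
```

Calling list_mp3_with_ogg_twin ~/Music (the path is only an example) prints one candidate per line. Whether to delete the .mp3 or the .ogg is left to the reader.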


#3 2015-07-14 14:12:20

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,554

Re: Remove duplicates ignoring the file type

No need for find and grep:

find . -regex '.*\.\(ogg\|mp3\)' -exec bash -c 'echo "${1%.*}"' _ {} \; | sort | uniq -d

On second thought, the find and grep is probably better.  That finds all files and invokes grep once to filter them.  My version finds the right subset of files, but invokes a shell for each one - this would be a waste.  Another approach would be comm:

comm -12 <(find . -name '*.ogg' | sed 's/\.ogg$//' | sort) <(find . -name '*.mp3' | sed 's/\.mp3$//' | sort)
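
The commands in this thread compare full paths, so a pair is only reported when both files sit in the same directory. If the check should really be on the file name alone, a rough sketch could group files by base name instead. The function name find_same_name is invented for this example, -printf is a GNU find extension, and the sketch assumes file names contain no tabs or newlines. It also reports two files with the same name and the same extension in different directories.

```shell
#!/usr/bin/env bash
# Sketch: group .ogg/.mp3 files by base name, ignoring directory and extension.
# find_same_name is a hypothetical name; requires GNU find for -printf.
find_same_name() {
    local root=${1:-.}
    # emit "<file name><TAB><full path>" for every candidate file
    find "$root" -type f \( -name '*.ogg' -o -name '*.mp3' \) -printf '%f\t%p\n' |
    awk -F '\t' '{
        stem = $1
        sub(/\.(ogg|mp3)$/, "", stem)      # drop the extension from the name
        count[stem]++
        paths[stem] = paths[stem] "  " $2 "\n"
    }
    END {
        # print each name seen more than once, followed by its indented paths
        for (s in count)
            if (count[s] > 1)
                printf "%s\n%s", s, paths[s]
    }'
}
```

Running find_same_name in the top directory of the library prints each shared name followed by the indented paths that carry it. The order of the groups is not defined.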

"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

