Remove duplicates ignoring the file type

broken pipe · 2015-07-14 06:45:19

Hi all!

I would like to remove some duplicates from my music library. There are lots of .ogg and .mp3 files with the same name ("song.ogg" and "song.mp3").
Is there any easy way to run some duplicate checks only on the file names?

Best regards!

parchd · 2015-07-14 13:55:33

What about the following:

find . -type f | grep -Po '.*(?=\.(ogg|mp3)$)' | sort | uniq -d

Run it in the top directory of your music library and it will list the duplicates without their file extension. A pair is only listed if both files are in the same directory.

find . -type f lists all regular files recursively.
In this instance, grep keeps only the .ogg and .mp3 files and prints each path without the extension (-P enables the Perl-style lookahead, -o prints only the matched part).
sort, well, sorts them. This step is needed because uniq only compares adjacent lines.
uniq -d prints only the lines that occur more than once (i.e. there is both an mp3 and an ogg of the same file).

It is then up to you what to do with the output

Note: I haven't really tested this, but it doesn't do anything dangerous.
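For anyone who wants to review the result before deleting anything, here is a minimal sketch of the same pipeline wrapped in two shell functions. The names list_dupes and dupe_mp3s are made up for illustration, sed -E stands in for grep -P, and the sketch assumes no file name contains a newline.

```shell
# Hypothetical helper: print path stems that exist as both .ogg and .mp3
# in the same directory. Takes the library root as $1 (default: current dir).
list_dupes() {
    find "${1:-.}" -type f \( -name '*.ogg' -o -name '*.mp3' \) |
        sed -E 's/\.(ogg|mp3)$//' |   # strip the extension, dot included
        sort |                        # uniq only compares adjacent lines
        uniq -d                       # keep stems that occur more than once
}

# Hypothetical helper: print the .mp3 half of each pair found above.
# It only prints paths; nothing is deleted.
dupe_mp3s() {
    list_dupes "$1" | sed 's/$/.mp3/'
}
```

Running dupe_mp3s ~/Music (the path is just an example) gives a list of candidate files to check by hand; which format to keep is still up to you.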

Trilby · 2015-07-14 14:12:20

No need for find and grep:

find . -regex '.*\.\(ogg\|mp3\)' -exec bash -c 'printf "%s\n" "${1%.*}"' _ {} \; | sort | uniq -d

On second thought, the find and grep version is probably better. It finds all files and invokes grep once to filter them. My version finds the right subset of files, but invokes a shell for each one, which is wasteful. Another approach would be comm, which needs both inputs sorted:

comm -12 <(find . -name '*.ogg' | sed 's/\.ogg$//' | sort) <(find . -name '*.mp3' | sed 's/\.mp3$//' | sort)
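
All of the commands in this thread compare full paths, so an .ogg and an .mp3 of the same song in different directories will not be matched. If the check should really be on the file name only, a rough sketch could strip the directory part first. The function name dupes_by_name is made up for illustration, and the sketch again assumes no newlines in file names.

```shell
# Hypothetical helper: print base names (no directory, no extension) that
# occur as both .ogg and .mp3 anywhere under $1 (default: current dir).
dupes_by_name() {
    find "${1:-.}" -type f \( -name '*.ogg' -o -name '*.mp3' \) |
        sed 's|.*/||' |               # drop the directory part
        sort -u |                     # same file name in two dirs counts once
        sed -E 's/\.(ogg|mp3)$//' |   # drop the extension
        sort |                        # re-sort so equal names are adjacent
        uniq -d                       # names present in both formats
}
```

The output is only the bare name, so you would still need something like find -name 'song.*' to locate the actual files.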

Arch Linux
