Hi,
Is there any tool for detecting the largest repeated block of content within a single file?
For example:
line 1
line 2
line 1
line 3
line 1
line 2
Here, lines 1-2 are duplicated at lines 5-6, and line 1 is also repeated at line 3.
uniq and sort assume you have a structured file, so they don't work for me.
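If the positions of the repeats matter (sorting throws away the original line order), one common approach is an awk one-liner that remembers every line it has seen and reports repeats by line number. This is a sketch added for illustration, not something suggested in the thread; it uses the sample file above fed in through a here-document:

```shell
# seen[$0]++ counts occurrences of each line's content; it is 0 (false)
# the first time a line appears, so only repeats are printed, tagged
# with NR, the current line number.
awk 'seen[$0]++ { print NR ": " $0 }' <<'EOF'
line 1
line 2
line 1
line 3
line 1
line 2
EOF
# prints:
# 3: line 1
# 5: line 1
# 6: line 2
```

Unlike `sort | uniq`, this preserves the original order and needs no sorted input.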
Thanks in advance,
Xan
Last edited by xanb (2013-10-22 10:25:23)
Owning one OpenRC (artoo way) and other three systemd machines
Offline
Are you saying that
sort <file> | uniq -cd
is not for you? Why?
Last edited by karol (2013-10-21 14:13:37)
Offline
karol wrote: Are you saying that
sort <file> | uniq -cd
is not for you? Why?
Can't the same thing be achieved with
sort -u <file>
?
Offline
karol wrote: Are you saying that
sort <file> | uniq -cd
is not for you? Why?
Can't the same thing be achieved with
sort -u <file>
?
Ummm, not necessarily.
$ sort -u <file>
line 1
line 2
line 3
$ sort <file> | uniq -cd
      3 line 1
      2 line 2
The latter tells you which lines are duplicated, triplicated, and so on: it prints the contents of each repeated line together with its count, not the line numbers where the repeats occur.
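As a side note on why the sort step is needed: uniq only collapses adjacent identical lines, so feeding the unsorted sample file straight into `uniq -cd` reports nothing. A minimal sketch (the `printf` just recreates the sample file from the first post):

```shell
# Without sorting, no two identical lines are adjacent,
# so uniq -cd prints nothing.
printf 'line 1\nline 2\nline 1\nline 3\nline 1\nline 2\n' | uniq -cd

# After sorting, identical lines become adjacent and are counted:
# 3x "line 1" and 2x "line 2".
printf 'line 1\nline 2\nline 1\nline 3\nline 1\nline 2\n' | sort | uniq -cd
```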
Offline
Thanks a lot, both of you. This is exactly what I wanted.
Owning one OpenRC (artoo way) and other three systemd machines
Offline