
#1 2013-10-21 14:03:15

xanb
Member
Registered: 2012-07-24
Posts: 418

[Solved] Tool for detecting duplicated content

Hi,

Is there any tool for detecting the largest amount of common (duplicated) content within a single file?
For example:

line 1
line 2

line 1
line 3

line 1
line 2

Here lines 1-2 are duplicated by lines 7-8, and line 1 on its own is also duplicated at lines 4 and 7.

uniq and sort assume you have a structured file, so they don't work for me.

Thanks in advance,
Xan

Last edited by xanb (2013-10-22 10:25:23)


Owning one OpenRC (artoo way) and three other systemd machines


#2 2013-10-21 14:13:10

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: [Solved] Tool for detecting duplicated content

Are you saying that

sort <file> | uniq -cd

is not for you? Why?
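
A quick note on the flags: -c prefixes each output line with its occurrence count, and -d restricts the output to lines that appear more than once. If you want the most frequently duplicated lines listed first, the counts can also be sorted numerically, something like:

sort <file> | uniq -cd | sort -rn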

Last edited by karol (2013-10-21 14:13:37)


#3 2013-10-21 16:35:40

WonderWoofy
Member
From: Los Gatos, CA
Registered: 2012-05-19
Posts: 8,414

Re: [Solved] Tool for detecting duplicated content

karol wrote:

Are you saying that

sort <file> | uniq -cd

is not for you? Why?

Can't the same thing be achieved with

sort -u <file>

?


#4 2013-10-21 16:55:22

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: [Solved] Tool for detecting duplicated content

WonderWoofy wrote:
karol wrote:

Are you saying that

sort <file> | uniq -cd

is not for you? Why?

Can't the same thing be achieved with

sort -u <file>

?

Ummm, not necessarily.

$ sort -u <file>

line 1
line 2
line 3
$ sort  <file> | uniq -cd
      2 
      3 line 1
      2 line 2

The latter tells you which lines are duplicated, triplicated, and so on (it prints the contents of the line in question, not its line number).
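
And if the actual line numbers of the duplicates are wanted as well (uniq only reports the contents plus a count), a rough awk sketch along these lines should do it, with <file> again standing in for the real filename:

awk '{ pos[$0] = pos[$0] " " NR } END { for (l in pos) if (split(pos[l], a, " ") > 1) print l ":" pos[l] }' <file>

On the example file that would print each duplicated line followed by the line numbers where it occurs, e.g. "line 1: 1 4 7".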


#5 2013-10-22 10:25:44

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Solved] Tool for detecting duplicated content

Thanks a lot, both of you. This is really what I want.


Owning one OpenRC (artoo way) and three other systemd machines

