[Gllug] Finding similarities among text files

Richard Jones rich at annexia.org
Tue Jul 17 11:37:25 UTC 2007


On Tue, Jul 17, 2007 at 10:57:30AM +0100, Ziya Suzen wrote:
> I am looking for a utility that does the opposite of what diff does
> and finds similarities between two files (or, even better, finds
> similarities among a bunch of files).
> 
> I have a little project on my hands in which a lot of copying and
> pasting has been done. I am going to pull the common pieces of code
> into common files. I would like to get a kick start from a quick
> analysis that shows me (at least with some heuristics) the common
> pieces of code (or text; it does not have to be clever enough to do
> a semantic comparison).
> 
> Please let me know if you have heard of anything like this.

Two things spring to mind.  Firstly, git contains code to do this, in
its rename and copy detection.  Secondly, there are various tools out
there for detecting unauthorized copying of code and essay plagiarism.
Some of them work by measuring the distribution of words, so they
won't be very useful here, but this tool works differently:
http://www.catb.org/~esr/comparator/comparator.html
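
For a rough feel of how a line-oriented similarity finder can work,
here is a minimal Python sketch.  It hashes every run of three
consecutive non-blank lines (after collapsing whitespace) and reports
the runs that turn up in more than one file.  This is only an
illustration of the general idea, not how git or comparator is
actually implemented; the function names and the window size of 3 are
made up for the example.

```python
import hashlib
from collections import defaultdict


def normalize(line):
    # Collapse all whitespace so re-indented copies still match.
    return " ".join(line.split())


def shingles(lines, window=3):
    """Yield (first_line_number, digest) for each run of `window`
    consecutive non-blank lines. Line numbers are 1-based."""
    cleaned = [(i + 1, normalize(l)) for i, l in enumerate(lines)]
    cleaned = [(n, l) for n, l in cleaned if l]  # drop blank lines
    for k in range(len(cleaned) - window + 1):
        chunk = cleaned[k:k + window]
        text = "\n".join(l for _, l in chunk)
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        yield chunk[0][0], digest


def common_chunks(files, window=3):
    """`files` maps a name to its list of lines (hypothetical input
    shape). Returns {digest: [(name, line_number), ...]} for every
    chunk that occurs in at least two different files."""
    seen = defaultdict(list)
    for name, lines in files.items():
        for line_no, digest in shingles(lines, window):
            seen[digest].append((name, line_no))
    return {d: locs for d, locs in seen.items()
            if len({name for name, _ in locs}) > 1}
```

To try it on real files, build the `files` dict by reading each file
with `open(path).read().splitlines()`.  A larger window gives fewer,
longer matches; a smaller one finds more but noisier overlaps.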

Rich.

-- 
Richard Jones
Red Hat
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



