[Gllug] Finding similarities among text files
Richard Jones
rich at annexia.org
Tue Jul 17 11:37:25 UTC 2007
On Tue, Jul 17, 2007 at 10:57:30AM +0100, Ziya Suzen wrote:
> I am looking for a utility that does the opposite of what diff does and
> finds similarities between two files (or, even better, finds similarities
> among a bunch of files).
>
> I have a little project on my hands in which a lot of copy-and-paste has
> been done. I am going to pull the common pieces of code into shared files.
> I would like to kick-start this with a quick analysis that shows me (at
> least heuristically) the common pieces of code (or text; it does not have
> to be clever enough to do a semantic comparison).
>
> Please let me know if you have heard of anything like this.
Two things spring to mind. Firstly, git contains code to do this in its
rename and copy detection (the -M and -C options to git diff and git
blame). Secondly, there are various tools out there for detecting
unauthorized copying of code and plagiarism in essays. Some of them work by
measuring the distribution of words, so they won't be very useful for
finding duplicated code, but this tool works differently:
http://www.catb.org/~esr/comparator/comparator.html
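One way to find shared code rather than shared vocabulary is to hash every run of N consecutive lines in each file and report any hash that turns up more than once. Below is a minimal Python sketch of that idea. It is not comparator's actual code. The names WINDOW, normalize, shingles, find_common and read_lines are hypothetical, and so is the choice of a three-line window. The sketch assumes that collapsing whitespace and skipping blank lines is enough normalization for your files.

```python
import hashlib
import sys
from collections import defaultdict

WINDOW = 3  # hypothetical choice: number of consecutive non-blank lines per chunk

def normalize(line):
    # Collapse all runs of whitespace so re-indented copies still match
    return " ".join(line.split())

def shingles(lines, window=WINDOW):
    """Yield (digest, first_line_number) for every run of `window` non-blank lines."""
    numbered = [(n, normalize(l)) for n, l in enumerate(lines, 1)]
    numbered = [(n, l) for n, l in numbered if l]  # drop blank lines, keep original numbers
    for i in range(len(numbered) - window + 1):
        chunk = "\n".join(l for _, l in numbered[i:i + window])
        yield hashlib.sha1(chunk.encode("utf-8")).hexdigest(), numbered[i][0]

def find_common(named_lines, window=WINDOW):
    """Map each chunk digest seen more than once (in any file) to its (name, line) locations."""
    seen = defaultdict(list)
    for name, lines in named_lines.items():
        for digest, lineno in shingles(lines, window):
            seen[digest].append((name, lineno))
    return {d: locs for d, locs in seen.items() if len(locs) > 1}

def read_lines(path):
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read().splitlines()

if __name__ == "__main__":
    files = {p: read_lines(p) for p in sys.argv[1:]}
    for locs in find_common(files).values():
        print(", ".join(f"{name}:{n}" for name, n in locs))
```

You would run it as something like "python common_chunks.py *.c", where the script name is also hypothetical. A copied block of k lines produces k - 2 overlapping hits, one per window. A real tool would merge adjacent hits into a single reported range.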
Rich.
--
Richard Jones
Red Hat
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug