[sclug] Getting rid of duplicate files
Tim Sutton
tim at linfiniti.com
Wed Sep 27 11:29:59 UTC 2006
Hi All
I'm trying to free up space on my hard disk. In particular, I'm trying
to get rid of duplicate images that don't have matching file names. I'm
using md5 in a simple script (the examples below were done in bash on a
Mac, but I need to do the same on Linux). I created a list of all files
under my Pictures dir:
find Pictures/ -type f >> /tmp/pictures.txt
Then I ran this little script to build md5 checksums:
PICFILE=/tmp/pictures$$.txt
MD5FILE=/tmp/md5$$.txt
find ~/Pictures/ -type f >> ${PICFILE}
while read -r LINE
do
    md5sum "${LINE}" >> ${MD5FILE}
done < ${PICFILE}
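(As an aside, I think the list + checksum steps could collapse into one
pipeline -- an untested sketch, assuming GNU md5sum; the Mac's `md5 -r`
prints a similar "checksum path" line. /tmp/md5pics here is a made-up
stand-in for ~/Pictures so the example is self-contained:)

```shell
# -exec ... + batches the files into few md5sum invocations, so paths
# with spaces need no read loop or quoting tricks.
# /tmp/md5pics is a placeholder directory standing in for ~/Pictures.
mkdir -p '/tmp/md5pics/New Folder'
printf 'same bytes' >  /tmp/md5pics/P1010020.JPG
printf 'same bytes' > '/tmp/md5pics/New Folder/p1010020.jpg'
find /tmp/md5pics -type f -exec md5sum {} + | sort > /tmp/md5sorted.txt
```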
I ran the resulting file through sort, which results in a very big
file like this:
001ada07d60c6c3fd10cd5b17d3bdd69 /Users/timsutton/Pictures//iPhoto Library/Originals/2006/100OLYMP_10/P1010020.JPG
001dec8a7571a375b659b5da8f292409 /Users/timsutton/Pictures//iPhoto Library/Originals/2006/NEW_FOLDER/P1010020.JPG
001dec8a7571a375b659b5da8f292409 /Users/timsutton/Pictures//iPhoto Library/Originals/2006/New Folder/p1010020.jpg
0024cdaf90e82c89df0e74089cada586 /Users/timsutton/Pictures//iPhoto Library/Originals/2006/2004_03_07/114_1415.JPG
etc.
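(One thing I noticed while poking at this: GNU uniq can group the sorted
list by checksum alone -- `-w 32` compares only the first 32 characters,
and `-D` prints every line that belongs to a duplicate group. An
untested sketch, with sample data standing in for the real list:)

```shell
# -w 32 limits the comparison to the md5 at the start of each line;
# -D (--all-repeated) prints every member of each duplicate group.
# The sample lines below stand in for the real sorted checksum file.
printf '%s\n' \
  '001dec8a7571a375b659b5da8f292409  /tmp/pics/NEW_FOLDER/P1010020.JPG' \
  '001dec8a7571a375b659b5da8f292409  /tmp/pics/New Folder/p1010020.jpg' \
  '0024cdaf90e82c89df0e74089cada586  /tmp/pics/2004_03_07/114_1415.JPG' \
  > /tmp/md5dupes.txt
uniq -w 32 -D /tmp/md5dupes.txt
# prints the two 001dec8a... lines
```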
Now I'm trying to think of a neat way to get rid of the duplicates. I
want to keep at least one file for any given md5. Can anyone offer a
tasty bit of bash / awk / sed / grep that will do that? Or should I
just revert to a loop with a buffer holding the last md5, deleting the
current file whenever its checksum matches the buffered one?
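(To show what I mean, here's roughly the loop idea as a one-pass awk
filter over the sorted file -- an untested sketch; the sample data
stands in for the real list, and I'd only swap the print for a delete
once the output looks right:)

```shell
# seen[$1]++ is 0 (false) the first time a checksum appears and
# non-zero afterwards, so only the 2nd-and-later file for each md5
# gets through. sub() strips the leading checksum and spaces, leaving
# the path intact even when it contains spaces.
printf '%s\n' \
  '001dec8a7571a375b659b5da8f292409  /tmp/pics/NEW_FOLDER/P1010020.JPG' \
  '001dec8a7571a375b659b5da8f292409  /tmp/pics/New Folder/p1010020.jpg' \
  '0024cdaf90e82c89df0e74089cada586  /tmp/pics/2004_03_07/114_1415.JPG' \
  > /tmp/md5sorted_demo.txt
awk 'seen[$1]++ { sub(/^[^ ]+ +/, ""); print }' /tmp/md5sorted_demo.txt
# prints: /tmp/pics/New Folder/p1010020.jpg
```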
Thanks
Regards
--
Tim Sutton
Visit http://qgis.org for a great Open Source GIS
Home Page: http://linfiniti.com
Skype: timlinux
MSN: tim_bdworld at msn.com
Yahoo: tim_bdworld at yahoo.com
Jabber: timlinux
Irc: timlinux on #qgis at freenode.net