[sclug] Getting rid of duplicate files
Sean Furey
sean-lists-sclug at furey.me.uk
Wed Sep 27 12:10:48 UTC 2006
This prints every duplicate except the first in each group (GNU uniq's
-w 32 compares only the 32-character hash, so paths containing spaces
don't get mangled the way an awk field swap would mangle them):

( sort md5.txt | uniq -w 32 -d ;
  sort md5.txt | uniq -w 32 -D ) |
sort | uniq -u
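An equivalent one-pass version, as a sketch: it assumes md5sum's
"hash  path" output format, and the demo file below uses hypothetical
paths in place of the real md5.txt. Because it only splits off the
leading hash, paths with spaces survive intact.

```shell
# hypothetical sample data mimicking md5sum output (two entries collide)
cat > /tmp/md5demo.txt <<'EOF'
001dec8a7571a375b659b5da8f292409  /tmp/a/NEW_FOLDER/P1010020.JPG
001dec8a7571a375b659b5da8f292409  /tmp/a/New Folder/p1010020.jpg
001ada07d60c6c3fd10cd5b17d3bdd69  /tmp/a/P1010020.JPG
EOF

# print every path whose hash has already been seen once;
# sub() strips the hash and separator, leaving the full path
sort /tmp/md5demo.txt | awk 'seen[$1]++ { sub(/^[^ ]+ +/, ""); print }'
```

The awk hash table keeps the first occurrence and emits the rest, so no
second pass or uniq trickery is needed.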
Shoot me now...
Sean
On Wed, Sep 27, 2006 at 08:29:48AM -0300, Tim Sutton wrote:
> Hi All
>
> I'm trying to free up space on my hard disk. In particular I'm trying
> to get rid of duplicate images that don't have matching file names.
> I'm using md5 in a simple script (the examples below were done in bash
> on a Mac, but I need to do the same on Linux). I created a list of all
> files under my Pictures dir:
>
> find Pictures/ -type f >> /tmp/pictures.txt
>
> Then I ran this little script to build md5 checksums:
>
> PICFILE=/tmp/pictures$$.txt
> MD5FILE=/tmp/md5$$.txt
> find ~/Pictures/ -type f >> ${PICFILE}
> while IFS= read -r LINE   # -r and empty IFS keep odd filenames intact
> do
>     md5sum "${LINE}" >> ${MD5FILE}
> done < ${PICFILE}
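The temp file and read loop can be skipped entirely: find can hand the
file list straight to md5sum. A sketch, assuming GNU findutils and
coreutils; the /tmp/picdemo tree here is a hypothetical stand-in for
~/Pictures.

```shell
# hypothetical demo tree (substitute ~/Pictures in real use)
mkdir -p "/tmp/picdemo/New Folder"
echo hello > "/tmp/picdemo/New Folder/p1.jpg"
echo hello > /tmp/picdemo/p2.jpg

# checksum every file in one go; -exec ... + batches arguments,
# so paths with spaces need no quoting games at all
find /tmp/picdemo -type f -exec md5sum {} + > /tmp/picdemo.md5
```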
>
> I ran the resulting file through sort, which results in a very big
> file like this:
>
> 001ada07d60c6c3fd10cd5b17d3bdd69  /Users/timsutton/Pictures//iPhoto Library/Originals/2006/100OLYMP_10/P1010020.JPG
> 001dec8a7571a375b659b5da8f292409  /Users/timsutton/Pictures//iPhoto Library/Originals/2006/NEW_FOLDER/P1010020.JPG
> 001dec8a7571a375b659b5da8f292409  /Users/timsutton/Pictures//iPhoto Library/Originals/2006/New Folder/p1010020.jpg
> 0024cdaf90e82c89df0e74089cada586  /Users/timsutton/Pictures//iPhoto Library/Originals/2006/2004_03_07/114_1415.JPG
> etc.
> Now I'm trying to think of a neat way to get rid of the duplicates. I
> want to keep at least one file for any given md5. Can anyone offer a
> tasty bit of bash / awk / sed / grep etc. that will do that? Or should
> I just revert to a for loop with a buffer holding the last md5,
> checking whether the current value matches the buffered one and
> deleting the file if it does?
>
> Thanks
>
> Regards
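The buffered-last-md5 loop Tim describes is perfectly workable too. A
minimal sketch, assuming md5sum's two-space "hash  path" format and
paths without embedded newlines; the sample list is hypothetical, and
the loop echoes the rm commands instead of running them until you're
sure the output is right:

```shell
# hypothetical sample checksum list (first two entries collide)
cat > /tmp/md5demo2.txt <<'EOF'
001dec8a7571a375b659b5da8f292409  /tmp/pics/NEW_FOLDER/P1010020.JPG
001dec8a7571a375b659b5da8f292409  /tmp/pics/New Folder/p1010020.jpg
001ada07d60c6c3fd10cd5b17d3bdd69  /tmp/pics/P1010020.JPG
EOF

# walk the sorted list; a hash matching the previous line's hash
# means the file duplicates one we already decided to keep
prev=""
sort /tmp/md5demo2.txt | while IFS= read -r line; do
    hash=${line%% *}        # first field: the md5
    file=${line#*  }        # everything after md5sum's two-space separator
    if [ "$hash" = "$prev" ]; then
        echo rm -- "$file"  # drop the "echo" once the list looks right
    fi
    prev=$hash
done
```

Sorting first means each duplicate group is contiguous, so one buffered
value is all the state the loop needs.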