[sclug] Getting rid of duplicate files

Sean Furey sean-lists-sclug at furey.me.uk
Wed Sep 27 12:10:48 UTC 2006


# every copy except the first in each group of identical md5s (GNU uniq)
( sort md5.txt | uniq -w 32 -d ;
  sort md5.txt | uniq -w 32 -D ) |
sort | uniq -u
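The pipeline works by taking "all lines in a duplicate group" minus "the first line of each group". The same result can be had in one pass with a different technique, an awk associative array keyed on the hash. This is a minimal sketch that assumes md5sum-style lines starting with a 32-character hex hash; the function name `dupes_after_first` is made up for illustration.

```shell
# Hypothetical helper: print every line of an md5sum listing except the
# first one seen for each checksum, i.e. the candidate copies to delete.
# Keys on the first 32 characters (the hash), so paths containing spaces
# are passed through untouched. Reads the named file, or stdin if none.
dupes_after_first() {
  # seen[key]++ is 0 (false) the first time a hash appears and non-zero
  # afterwards, so only the second and later copies are printed.
  awk '{ key = substr($0, 1, 32) } seen[key]++' "${1:--}"
}
```

Feeding it a sorted listing, e.g. `sort md5.txt | dupes_after_first`, should keep the first line of each hash group in sort order and print the rest, which is the same split the `uniq` set difference makes.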

Shoot me now...

Sean

On Wed, Sep 27, 2006 at 08:29:48AM -0300, Tim Sutton wrote:
> Hi All
> 
> I'm trying to free up space on my hard disk. In particular I'm trying
> to get rid of duplicate images that don't have matching file names. I'm
> using md5 in a simple script (the next examples were done in bash on a
> Mac, but I need to do the same on Linux). I created a list of all files
> under my Pictures dir:
> 
> find Pictures/ -type f >> /tmp/pictures.txt
> 
> Then I ran this little script to build md5 checksums:
> 
>       PICFILE=/tmp/pictures$$.txt
>       MD5FILE=/tmp/md5$$.txt
>       find ~/Pictures/ -type f >> "${PICFILE}"
>       # IFS= and -r keep leading spaces and backslashes in file names
>       while IFS= read -r LINE
>       do
>         md5sum "${LINE}" >> "${MD5FILE}"
>       done < "${PICFILE}"
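The file list and the read loop can also be collapsed into a single `find` call. This is a minimal sketch assuming GNU coreutils `md5sum` (stock macOS ships `md5` instead, so it would need adapting there); the function name `checksum_tree` is hypothetical.

```shell
# Hypothetical helper: checksum every regular file under DIR and print
# the md5sum lines sorted by hash, so identical files end up adjacent.
checksum_tree() {
  local dir=${1:?usage: checksum_tree DIR}
  # "-exec ... {} +" hands many paths to each md5sum invocation as
  # separate arguments, so names with spaces need no extra quoting.
  find "$dir" -type f -exec md5sum {} + | sort
}
```

Something like `checksum_tree ~/Pictures > /tmp/md5.txt` would then give a hash-sorted listing in one step, without starting one `md5sum` process per file.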
> 
> I ran the resulting file through sort, which results in a very big  
> file like this:
> 
>      001ada07d60c6c3fd10cd5b17d3bdd69 /Users/timsutton/Pictures//iPhoto Library/Originals/2006/100OLYMP_10/P1010020.JPG
>      001dec8a7571a375b659b5da8f292409 /Users/timsutton/Pictures//iPhoto Library/Originals/2006/NEW_FOLDER/P1010020.JPG
>      001dec8a7571a375b659b5da8f292409 /Users/timsutton/Pictures//iPhoto Library/Originals/2006/New Folder/p1010020.jpg
>      0024cdaf90e82c89df0e74089cada586 /Users/timsutton/Pictures//iPhoto Library/Originals/2006/2004_03_07/114_1415.JPG
>      etc.
>
> Now I'm trying to think of a neat way to get rid of the duplicates. I
> want to keep at least one file for any given md5. Can anyone offer a
> tasty bit of bash / awk / sed / grep etc. that will do that? Or should
> I just revert to a loop that buffers the last md5, checks whether the
> current value is the same as the buffered one, and deletes the file if so?
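The buffered loop described here is short in bash. This is a sketch, assuming md5sum's default `HASH  PATH` lines (32 hex characters, then two separator characters); `remove_dupes` is a hypothetical name, and it only prints the paths unless explicitly told to delete.

```shell
# Hypothetical helper: walk a sorted md5sum listing, remembering the
# previous hash, and act on every file whose hash equals the one before
# it. The first file of each group is kept.
# Usage: remove_dupes LISTING [delete]   (default is a dry run)
remove_dupes() {
  local listing=$1 mode=${2:-dry-run}
  local prev="" line hash path
  while IFS= read -r line; do
    hash=${line:0:32}   # the md5 itself
    path=${line:34}     # skip hash plus the two separator characters
    if [ "$hash" = "$prev" ]; then
      if [ "$mode" = delete ]; then
        rm -- "$path"
      else
        printf '%s\n' "$path"
      fi
    fi
    prev=$hash
  done < <(sort -- "$listing")
}
```

Running `remove_dupes /tmp/md5.txt` first shows what would go; `remove_dupes /tmp/md5.txt delete` then removes it. Since the listing is sorted, the copy that survives in each group is simply the one whose path sorts first.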
> 
> Thanks
> 
> Regards

