[sclug] Getting rid of duplicate files
Roland Turner SCLUG
raz.fpyht.bet.hx at raz.cx
Wed Sep 27 13:02:13 UTC 2006
On Wed, 2006-09-27 at 13:10 +0100, Sean Furey wrote:
> ( cat md5.txt | sort | awk '{ print $2" "$1 }' | uniq -f 1 -d ;
> cat md5.txt | sort | awk '{ print $2" "$1 }' | uniq -f 1 -D ) |
> sort | uniq -u
That uniq -d/-D/-u trick is revolting; I love it!
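(For anyone who hasn't met the trick before: -d prints one line per
group of duplicates while -D prints every line in those groups, so the
first file of each group turns up twice in the combined output and the
final "uniq -u" throws it away, leaving only the spare copies. A toy
run, with made-up hashes:

$ cat md5.txt
aaaa  one.txt
aaaa  two.txt
bbbb  three.txt
$ sort md5.txt | awk '{ print $2" "$1 }' | uniq -f 1 -d
one.txt aaaa
$ sort md5.txt | awk '{ print $2" "$1 }' | uniq -f 1 -D
one.txt aaaa
two.txt aaaa

Merge those two lists, run them through "sort | uniq -u", and only
"two.txt aaaa" survives, i.e. the copy you can safely delete.)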
There's some ugliness ("cat foo | ...") and some redundancy (you need
not run awk, or the big sort over the whole list, twice) so, assuming
bash:
awk '{ print $2" "$1 }' md5.txt |
    sort | tee >(uniq -f 1 -d) >(uniq -f 1 -D) >/dev/null |
    sort | uniq -u
(The particular sequence of operations in pipeline construction means
that the two instances of uniq will both write to the pipe which the
final sort is listening on, while it is tee's unprocessed output stream
that gets dumped to /dev/null.)
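All of this assumes, by the way, that md5.txt is in the usual md5sum
layout (32-hex-digit hash, two spaces, then the path), produced by
something along these lines:

# hash every regular file under the current directory
find . -type f -exec md5sum {} + > md5.txt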
Unfortunately that awk command won't cope correctly with paths that have
spaces in them, e.g. "iPhoto Library". Also, the uniq trick is just a
little too revolting for my taste. If having a list of files to keep
(move them to safety and delete what remains) is OK, then how about the
somewhat simpler:
uniq -w 32 md5.txt
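That assumes md5.txt is already sorted by hash and a GNU uniq; if it
isn't sorted, a rough sketch of the same idea, with cut stripping the
hashes (the path starts at column 35 in md5sum's output), would be:

sort md5.txt | uniq -w 32 | cut -c35- > keep.txt

i.e. one surviving path per distinct hash, written to keep.txt.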
- Raz