[Gllug] Stories of using filters

Steve Cobrin steve.cobrin at highbury.net
Fri Mar 18 11:46:35 UTC 2005


Here's a script I use frequently, snarfed off the net, and tweaked a little.
Of course it would run far faster if recoded in Perl, but it's quite a nice (sic)
example of lots of piping!
 [best viewed with fixed pitch font.]

 -- Steve

#!/bin/sh
# September 2000 * Padraig at Brady001.iol.ie
#
# Customised 9 Dec 2001 Steve Cobrin <cobrin at highbury.net>
#
# BUGS
#	Doesn't handle files with funny characters in filename, e.g. backspace
#
CMD=`basename "$0" .sh`
USAGE="usage: $CMD [-s size] [path...]"
#
################################################################################
args=
size=1024c		# 1k bytes
while [ $# -gt 0 ]
do
	case $1 in
		-s | --size )
			if [ $# -gt 1 ]
			then
				shift
				size=$1
			else
				echo "$CMD: missing parameter" 1>&2
				echo "$USAGE" 1>&2
				exit 1
			fi
			;;
		-*)	echo "$CMD: unknown option \"$1\"" 1>&2
			echo "$USAGE" 1>&2
			exit 1
			;;
		*)	break
			;;
	esac
	shift
done
if [ $# -gt 0 ]
then
	args=$*
else
	args="."
fi
################################################################################
# find -- find all files bigger than $size, outputting "filename<nul>inode<nul>size"
# tr   -- protect embedded tabs and spaces, then replace nulls with spaces so other commands can process output
# sort -- sort on size (largest first) then inode, dropping hard links (same inode and size)
# uniq -- keep only files whose size is shared with at least one other file
# cut  -- cut out all but filename part
# sort on filename
# tr   -- put back spaces and tabs, and replace newline with null
# generate md5sums
# sort on checksum
# protect spaces and tabs
# swap checksum and filename round
# only show duplicate entries of checksum
# switch back checksum and filename
# swap back spaces and tabs
find $args -xdev -size +$size -type f ! -type l -printf "%p\0%i\0%s\n"	\
  | tr ' \t\0' '\0\1 '                                          \
  | sort -k3nr -k2 -u                                           \
  | uniq -f2 -D                                                 \
  | cut -f1 -d' '                                               \
  | sort                                                        \
  | tr '\0\1\n' ' \t\0'                                         \
  | xargs -0 md5sum                                             \
  | sort -k1,1                                                  \
  | tr ' \t' '\1\2'                                             \
  | sed -e 's/\(^.\{32\}\)..\(.*\)/\2 \1/'                      \
  | uniq -D -f1                                                 \
  | sed -e 's/\(^.*\) \(.*\)/\2 \1/'                            \
  | tr '\1\2' ' \t'                                             \
  | (
      psum='no match'
      line=''
      while read sum file; do
        if [ "$sum" != "$psum" ]; then
          if [ ! -z "$line" ]; then
             echo -e "$line"
          fi
          #line="`du -b "$file"`"
          line="`cat "$file" | wc -c`"
          psum="$sum"
        fi
        line="$line `echo $file | sed -e 's/ /\\\\ /g'`"
      done

      if [ ! -z "$line" ]; then
        echo -e "$line"
      fi
    )                                                           \
  | sed -e 's/^  *//'                                           \
  | sort -k1,1 -brn                                             \
  | cut -d" " -f2-                                              \
  | sed -e 's/\([^\\]\) /\1\
/g' \
  | while read files
    do : files=\"$files\"
      ls -l "$files"
    done

: END of script
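
The middle of the pipeline (protect spaces and tabs, swap checksum and
filename, keep only repeated checksums, swap back, restore) can be tried on
its own. What follows is only a minimal sketch of that one stage, not a
replacement for the script: the function names protect, restore and dup_sums
are made up for illustration, the input is assumed to be in md5sum's
"checksum, two characters, filename" format, and uniq -D is a GNU extension,
as it is in the script itself.

```shell
#!/bin/sh
# Sketch only: protect, restore and dup_sums are illustrative names,
# they do not appear in the script above.

# Hide spaces and tabs as the control bytes \001 and \002 so that
# field-oriented tools (sort, uniq, cut) see each filename as one field.
protect() { tr ' \t' '\001\002'; }

# Undo protect(): turn \001 and \002 back into space and tab.
restore() { tr '\001\002' ' \t'; }

# Read md5sum-style lines on stdin and print only the lines whose
# checksum occurs more than once, as "checksum filename".
dup_sums() {
    sort -k1,1 \
      | protect \
      | sed -e 's/^\(.\{32\}\)..\(.*\)/\2 \1/' \
      | uniq -D -f1 \
      | sed -e 's/^\(.*\) \(.*\)$/\2 \1/' \
      | restore
}

# Possible use, assuming GNU md5sum is installed:
#   md5sum -- * | dup_sums
```

Swapping the columns is what lets uniq skip the filename with -f1 and compare
only the checksum; once the names are protected, the single space added by sed
is the only blank left on the line, so the second sed can swap the columns
back without guessing where the filename ends.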