[SWLUG] regexp and sed

Dave Cridland [Home] dave at cridland.net
Tue Dec 24 14:40:07 UTC 2002


On Tue, 24 Dec 2002 09:41:56 +0000
bascule <asura at theexcession.co.uk> wrote:
> .*\(\..*\)
> could be described as:
> a sequence of any characters followed by one period followed by a sequence of 
> any characters - some of which may be periods,
> it looks to me like each filename can be split in three different ways and 
> still match this description, so - at last the question - where in the 
I think that means "A sequence of between zero and infinity characters, followed by a grouping of a period followed by a sequence of between zero and infinity characters."

Which is not the same.

> regexp, or in sed, is the logic that determines that the part:
> \(\..*\)
> only matches the last period and what follows and not any of the other periods 
> and what follows?

The initial .* is greedy, and will "eat" as much of the string as possible, leaving the minimal amount for the group, which should then match only the extension and the preceding period.
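
A quick way to see the greediness is to run the expression over a name with more than one period. This is only a sketch: the filename is made up, and the second expression is a contrast I have added for illustration, not something from the original question.

```shell
# Hypothetical filename with more than one period.
name='archive.tar.gz'

# The greedy leading .* takes everything up to the LAST period,
# so the group captures only the final period and what follows it.
greedy=$(printf '%s\n' "$name" | sed -e 's/.*\(\..*\)/\1/')

# For contrast: forbid periods in the leading part, so it stops at the
# FIRST period and the group captures everything from there on.
first=$(printf '%s\n' "$name" | sed -e 's/^[^.]*\(\..*\)/\1/')

printf '%s\n' "$greedy"   # .gz
printf '%s\n' "$first"    # .tar.gz
```

Both expressions match the same string; the only difference is how much the leading part is allowed to consume before the group gets its turn.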

FWIW, I would do:

for DWDI in *; do
	DWDMT=$(stat -c '%y' -- "$DWDI")
	# Pull out the modtime in whatever format stat likes.
	DWDFMT=$(date --date "$DWDMT" +'%Y-%m-%d_%H.%M.%S')
	# Turn it into our format, using date to do the hard work.
	DWDEXT=$(printf '%s\n' "$DWDI" | sed -e 's/^.*\.\([^.]*\)$/\1/')
	# Extract extension
	#  - use entire filename if there's no extension.
	mv -- "$DWDI" "$DWDFMT.$DWDEXT"
	# Actually do the rename. Variables are quoted so that names
	# containing spaces survive.
done

Not because it's any better, but because it strikes me that I might understand it if I looked at it after 6 months.

The sed there is effectively looking for a string which ends with a period followed by some stuff which isn't periods. We then swap the string for the stuff which isn't periods, which is hopefully the extension. A file called, say "viruses", though, ends up being treated as if it had an extension of "viruses".
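
To make both cases concrete, here is a small sketch using the same sed expression. The helper names ext_of and ext_or_empty and the sample filenames are hypothetical, and the guarded variant is just one possible way of avoiding the "viruses" behaviour, not part of the script above.

```shell
# ext_of: hypothetical helper wrapping the sed expression from the loop.
ext_of() {
	printf '%s\n' "$1" | sed -e 's/^.*\.\([^.]*\)$/\1/'
}

# ext_or_empty: assumed variant that yields nothing when the name
# contains no period at all, instead of echoing the whole name back.
ext_or_empty() {
	case "$1" in
		*.*) ext_of "$1" ;;
		*)   printf '\n' ;;
	esac
}

with_ext=$(ext_of 'photo.holiday.jpg')   # only the part after the last period
no_ext=$(ext_of 'viruses')               # no period, so sed leaves it untouched
guarded=$(ext_or_empty 'viruses')        # empty string

printf '%s\n' "$with_ext"   # jpg
printf '%s\n' "$no_ext"     # viruses
```

When there is no period the substitution simply fails to match, and sed prints its input unchanged, which is why the whole name comes back.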

Dave.



