[Gllug] Apache log files

- Tethys tethys at gmail.com
Tue Apr 7 10:57:05 UTC 2009


On Tue, Apr 7, 2009 at 11:11 AM, william pink <will.pink at gmail.com> wrote:

> I have the rather horrible task of splitting up lots (40 GB's worth) of
> Apache log files by date. The last time I did this, I found the line number,
> then tailed the file and output it into a new file, which was a long,
> arduous task. I imagine this can be done in a few minutes with some
> regex/sed/awk/Bash trickery, but I wouldn't know where to start. Can anyone
> give me any pointers to get started?

	#!/bin/bash

	# Date format as it appears in Apache's timestamp field (e.g. 25/Mar/2009)
	indatefmt="+%d/%b/%Y"
	# ISO-style date format for the output filenames (e.g. 2009-03-25)
	outdatefmt="+%Y-%m-%d"

	start_date="mar 25"
	end_date=$(date "$indatefmt")

	count=0
	while true
	do
		# GNU date does the day arithmetic for us
		indate=$(date "$indatefmt" -d "$start_date + $count days")
		outdate=$(date "$outdatefmt" -d "$start_date + $count days")

		# Fixed-string search: one full pass over the log per day
		grep -F "$indate" big_logfile > "small_logfile.$outdate"

		[ "$indate" = "$end_date" ] && break
		((count++))
	done

It's a bit inefficient, as it scans the log file once per day,
but for comparatively small log files like yours, that shouldn't
be too arduous. It'll also wrongly pick up any entries that happen
to contain the date string elsewhere in the line (in the request
URL, for example). To work around either of those, using a scripting
language like Python or Perl to read and examine each line in turn
is probably the right solution. But the quick and dirty approach
above will probably be fine for you.
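For reference, the single-pass approach can be sketched in a few
lines of awk: split each line on the square brackets around the
timestamp, take the date portion as the key, and write the line to
a per-date file. This assumes the standard common/combined log
format; the sample log and filenames below are just illustrative.

```shell
# Two sample lines standing in for the real 40 GB log
printf '%s\n' \
  '1.2.3.4 - - [25/Mar/2009:10:00:00 +0000] "GET / HTTP/1.1" 200 123' \
  '1.2.3.4 - - [26/Mar/2009:11:00:00 +0000] "GET /x HTTP/1.1" 200 45' \
  > big_logfile

# -F'[][]' makes the [ and ] around the timestamp the field separators,
# so $2 is "25/Mar/2009:10:00:00 +0000". awk keeps one output stream
# open per distinct filename, so each line is read and written once.
awk -F'[][]' '{
    split($2, t, ":")        # t[1] is the date part, e.g. 25/Mar/2009
    gsub("/", "-", t[1])     # slashes would be directory separators
    print $0 > ("small_logfile." t[1])
}' big_logfile
```

One pass instead of one pass per day, at the cost of holding one
open file descriptor per distinct date in the log.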

Then fix your setup so it logs to per-date files to start with...
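For example, Apache's bundled rotatelogs can do this via piped
logging (a sketch; the rotatelogs path and log directory will vary
by distribution):

```apache
# Rotate every 86400 seconds (daily); the strftime escapes in the
# filename give one file per date, e.g. access.2009-04-07.log
CustomLog "|/usr/sbin/rotatelogs /var/log/apache2/access.%Y-%m-%d.log 86400" combined
```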

Tet

-- 
The greatest shortcoming of the human race is our inability to
understand the exponential function -- Albert Bartlett
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug
