[Gllug] Slightly large httpd request log requires splitting up
Matt Blissett
matt at blissett.me.uk
Tue Jul 15 17:26:10 UTC 2008
william pink wrote:
> Hello,
>
> I have had a bit of a Google but nothing came up as relevant. I have an
> Apache HTTP request log file that is a whopping 17GB because it has not
> been rotated and compressed since its creation. What I need to do is
> split it up into smaller chunks by date and compress them. Of course some
> sort of shell script would be the ideal solution, but with only my basic
> knowledge of shell scripting this would take some considerable time to
> write. Does anyone know of any scripts or apps out there which could do
> this for me?
Lines in my log file look like this:
24.182.65.204 - - [01/Jun/2008:00:57:35 +0100] "GET / ....
So this would be sufficient:
for m in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec; do
    grep "\[...$m.2008:" big_logfile | gzip -9 > ${m}_log.gz
done
I don't think it's a problem, but you might need to adjust the pattern to
make sure you don't match URLs.
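For instance, a stricter pattern that anchors on the whole "[dd/Mon/yyyy:" timestamp cannot be fooled by a month name turning up inside a URL. A small sketch against a one-line sample file (the filenames here are just placeholders, not part of the original script):

```shell
# A sample log line (hypothetical) containing "Jun2008" in the URL as well
# as in the timestamp, to show the anchored pattern only matches the latter.
printf '%s\n' '24.182.65.204 - - [01/Jun/2008:00:57:35 +0100] "GET /Jun2008/page HTTP/1.1" 200 512' > sample.log

# Anchor on "[dd/Mon/2008:" so only the timestamp can match.
m=Jun
grep "\[[0-3][0-9]/$m/2008:" sample.log | gzip -9 > ${m}_log.gz

# Prints the matched sample line.
gunzip -c ${m}_log.gz
```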
To split by day as well:
for m in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec; do
    grep "\[...$m.2008:" big_logfile > ${m}-temp
    for d in `seq -w 31`; do
        grep "\[$d.$m.2008:" ${m}-temp | gzip -9 > ${m}-${d}_log.gz
    done
    rm -f ${m}-temp
done
(Which leaves you with some pointless files, like Feb-31_log.gz)
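If re-reading the 17GB file once per month is too slow, a single awk pass can route each line to a per-day file by parsing the timestamp instead. This is only a sketch, assuming the timestamp is always the fourth whitespace-separated field (as in the common log format); sample.log here is a tiny stand-in for the real big_logfile:

```shell
# Build a tiny two-day stand-in for the real 17GB log file.
cat > sample.log <<'EOF'
24.182.65.204 - - [01/Jun/2008:00:57:35 +0100] "GET / HTTP/1.1" 200 512
10.0.0.7 - - [02/Jun/2008:10:11:12 +0100] "GET /x HTTP/1.1" 200 99
EOF

# Single pass: split field 4 ("[01/Jun/2008:...") on "[", "/" and ":",
# then append the line to a file named Mon-dd_log (e.g. Jun-01_log).
awk '{ split($4, t, /[\[\/:]/); print > (t[3] "-" t[2] "_log") }' sample.log

# Compress the per-day files afterwards.
gzip -9 -f ./*_log
```

One caveat: a full year means up to 366 output files, and some awk implementations hit the per-process open-file limit; GNU awk juggles descriptors for you, otherwise call close() on each file after writing.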
--
Matt
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug