[Gllug] Apache log files

Peter Corlett abuse at cabal.org.uk
Wed Apr 8 14:02:52 UTC 2009


On Wed, Apr 08, 2009 at 02:49:15PM +0100, John Hearns wrote:
[...]
> That's more like it. But I can still make out access.log - so I can figure
> out it does something with access logs. More compression! More confusion!

Nah, that's a terrible idea. The original, non-obfuscated version I slapped
together in a few minutes is this one, which should be more obvious:

#!/usr/bin/env perl
use warnings;
use strict;

my($fh, $curpath);

while(<>) {
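  my($day, $month, $year) = (m~\[(..)/(...)/(....):~);
  # regex sanity check: test the assigned variable, not $1, so the check
  # does not depend on how capture variables are scoped
  die "Regex match failed" unless defined $day;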
  my $path = "$year-$month-$day.access.log";

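  unless($fh && $curpath eq $path) {
    $curpath = $path;
    # Append rather than truncate: a date can reappear later in the input
    # (e.g. a day split across two rotated files), and '>' would wipe what
    # was already written. Delete old output files before re-running.
    open $fh, '>>', $path or die "Can't open $path: $!";
  }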
  print $fh $_;

  # uncomment if you want progress reports
  #print STDERR "[$.] $curpath\r" unless $. % 1e4;
}

__END__

(And this is a proper, solid, efficient script, unlike some of the crap that
has been posted so far.)

Change the $path assignment and maybe the date-extracting regex to taste. I
ran it like so to test:

zcat -f apache2/*access* | ./logsplit.pl

... and it ripped through 80MB of test logs in three seconds. Scaled linearly,
that is about 25 minutes for 40GB, so allow something of the order of an hour.
(The main constraint is going to be disk performance.)
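
As an illustration of changing the $path assignment to taste, here is a
minimal sketch that uses a numeric month so the output files sort
chronologically in a directory listing. The names %month_num and
path_for_line are hypothetical, not part of the script above, and the
stricter regex assumes the usual [dd/Mon/yyyy:...] Apache timestamp.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Hypothetical lookup table: three-letter month abbreviation => "01".."12".
my %month_num;
@month_num{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} =
  map { sprintf '%02d', $_ } 1 .. 12;

# Hypothetical helper: return a path such as "2009-04-08.access.log" for a
# log line, or undef if the line has no recognisable timestamp.
sub path_for_line {
  my($line) = @_;
  my($day, $month, $year) = ($line =~ m~\[(\d\d)/(\w{3})/(\d{4}):~)
    or return;
  my $mm = $month_num{$month} or return; # unknown month abbreviation
  return "$year-$mm-$day.access.log";
}

# Example with a made-up log line.
my $sample = '127.0.0.1 - - [08/Apr/2009:14:02:52 +0000] "GET / HTTP/1.1" 200 1234';
print path_for_line($sample), "\n";
```

In the main loop this would stand in for the regex and $path lines, with the
die triggered when path_for_line returns undef.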

-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



