[Gllug] Apache log files
general_email at technicalbloke.com
Wed Apr 8 03:53:48 UTC 2009
John Hearns wrote:
> 2009/4/7 william pink <will.pink at gmail.com>:
>
>> Hi,
>>
>> I have the rather horrible task of splitting up lots (40 GB's worth) of
>> Apache log files by date. The last time I did this I found the line
>> number, then tailed the file and output it into a new file, which was a
>> long, arduous task. I imagine this can be done in a few minutes with
>> some regex/sed/awk/Bash trickery, but I wouldn't know where to start.
>> Can anyone give me any pointers to get started?
>>
>
> I would think of Perl for this task - that's what the language is good at.
>
> However, I do sometimes have a problem of a certain application
> producing huge output files.
> I deal with this using the 'csplit' utility.
> Man csplit, and think hard about the regexp PATTERN which will match
> your date ranges.
>
This Python regex should match a line from an Apache log file and split
it into its constituents...
import re
import sys

# Group 1 is the client IP; groups 2-4 are the date, time and timezone.
rexp = re.compile(r'(\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3}) .{0,20} '
                  r'.{0,20} \[(\d{2}/.../\d{4}):(\d{2}:\d{2}:\d{2}) '
                  r'(.....)\] \"')

for line in open(sys.argv[1], 'r'):
    m = rexp.match(line.rstrip())
    if m:
        print m.group(2), m.group(3), m.group(4)
Group 2 is the date, 3 the time, and 4 the timezone.
Maybe this could serve as a useful base for your filtering/splitting.
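As a rough sketch of the splitting step itself (the access-DD-Mon-YYYY.log
naming and the append-per-day approach are just my assumptions, and I haven't
run this against a real 40 GB of logs), the same regex can route each line to
a per-date output file in a single pass:

```python
import re
import sys

# Same pattern as above: group 1 is the IP; groups 2-4 are the date,
# time and timezone offset.
LOG_RE = re.compile(r'(\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3}) .{0,20} '
                    r'.{0,20} \[(\d{2}/.../\d{4}):(\d{2}:\d{2}:\d{2}) '
                    r'(.....)\] \"')

def split_by_date(log_path):
    """Append each parseable line of log_path to access-<DD-Mon-YYYY>.log."""
    handles = {}  # one open output file per date seen so far
    try:
        for line in open(log_path, 'r'):
            m = LOG_RE.match(line)
            if not m:
                continue  # silently skip lines that don't parse
            date = m.group(2).replace('/', '-')  # e.g. 08-Apr-2009
            if date not in handles:
                handles[date] = open('access-%s.log' % date, 'a')
            handles[date].write(line)
    finally:
        for f in handles.values():
            f.close()

if __name__ == '__main__' and len(sys.argv) > 1:
    split_by_date(sys.argv[1])
```

Keeping one file handle per date, rather than reopening a file for every
line, should leave a single pass over the 40 GB I/O-bound rather than
open()-bound.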
Regards,
Roger.
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug