[Gllug] Apache log files

Tim Schofield tim at scoffer.net
Wed Apr 8 13:05:38 UTC 2009


On Tue, 2009-04-07 at 11:11 +0100, william pink wrote:
> Hi,
> 
> I have the rather horrible task of splitting up lots (40GB's worth) of
> Apache log files by date. The last time I did this I found the line
> number, then tailed the file and output it into a new file, which was
> a long, arduous task. I imagine this can be done in a few minutes with
> some Regex/Sed/Awk/Bash trickery, but I wouldn't know where to start.
> Can anyone give me any pointers to get started?
> 
> Thanks,
> Will

Hi

time for i in {jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec}; do
    zcat access.*.gz | grep -i "$i/2008" \
        | sort -k 4.9b,4.12bn -k 4.5b,4.7bM -k 4.2b,4.3bn \
               -k 4.14b,4.15bn -k 4.17b,4.18bn -k 4.20b,4.21bn \
        > "${i}_split.txt"
done

This bash one-liner works for me; please note the hardcoded 2008 in the
grep statement. It splits the data by month, producing one file per
month named after that month, e.g. jan_split.txt. Also note the zcat
statement, which concatenates all of your compressed input files before
the filter and sort. I'm not sure how well it will handle 40GB of data,
though, since it re-reads the whole set once per month.

Tim  

-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



