[Sussex] Parsing a Logfile with Perl....
Steve Dobson
steve at dobson.org
Wed Mar 12 11:30:13 UTC 2008
Richie
On Wed, 2008-03-12 at 10:32 +0000, Richie Jarvis wrote:
> I have written a little perl script to read a logfile, and parse certain
> values for matching lines into a csv file. It works great - until I
> tried it on one of our systems and discovered that a colleague had put a
> '-' character into one of the usernames I am parsing. After lots of
> cursing, I am stuck on this one, and wonder if anyone can see how to
> adjust my regex so that it handles usernames both with and without
> funny characters?
>
> Here is an example of a well-formatted line:
>
> 2007-05-31 15:21:13 Sent SMS [SMSC:mbloxpsmsca] [SVC:fusion] [ACT:]
> [BINF:] [from:62569] [to:16474075000] [flags:-1:1:-1:-1:-1]
> [msg:100:01062F1F2DB69181923945413141363634383631323734414246333536363635343442423438464444353732303745433300030B6A00C54601C60001550187360603773700018707060354454D502D7B31363437343037353030307D0001873806034375]
> [udh:12:0B05040B8423F00003210401]
>
> Here is a badly-formatted one:
>
> 2008-03-07 05:09:54 Sent SMS [SMSC:mbloxpsmsca] [SVC:hpit-ems] [ACT:]
> [BINF:] [from:62569] [to:+16475882516] [flags:-1:0:-1:-1:-1]
> [msg:143://SS Please download mProveDM
> https://fusiondm-itg.houston.hp.com:443/fusiondl/EMA.cab?D=a5619ITGITG13A0E4B216685DBD31C50B0C9E6F91F2N8AB384A870]
> [udh:0:]
>
> My script spits out the following output for these:
>
> Good: 2007-05-31,15:21:12,fusion,62569,216.154.251.59,16474075000
> Bad: 2008-03-07,05:09:53,hpit,ems,62569> (15.243.169,to
>
> Currently, I have the rather ungainly regex as follows:
>
> $_ =~
> /^(\d+-\d+-\d+)\D+(\d+\D+\d+\D+\d+)\D+\w+\W+\w+\W+\w+\W+\w+\W+\w+\W+(\w+)\W+(\w+)\W+(\w+\W+\w+\W+\w+\W+\w+)\W+\w+\W+(\w+)\W+/;
>
> I am sure there is a better way to do this - i.e. search for the string
> [SVC: and gobble everything up to the ], but being a bit of a newbie to
> regex, I am googling wildly, and not getting much inspiration.
>
> Does anyone have any pointers?
I'm not a Perl coder by any means, so be warned that my knowledge of Perl
regexps is zero.
Having said that, I've used regexps for years and know a trick or two.
For the lines you've shown, the easy way to "gobble up" all characters until
you hit a *unique* pattern is simply to prefix that pattern with the
match-anything wildcard '.*'. It has to be unique because '.*' is greedy and
runs on to the *last* occurrence. For example, to match up to the string "to:":
.*to:
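Here is a minimal sketch of that wildcard trick, using sed (as in my test below) rather than Perl. The helper name strip_through_to is made up for illustration, and the sample line is your badly-formatted one with the msg and udh fields trimmed off. Deleting everything up to and including "to:" leaves the number and the rest of the line:

```shell
#!/bin/sh
# Hypothetical helper: delete everything up to and including "to:".
# '.*' is greedy, so if "to:" appeared twice this would run to the last one;
# that is why the pattern needs to be unique in the line.
strip_through_to() {
    printf '%s\n' "$1" | sed -e 's/.*to://'
}

line='2008-03-07 05:09:54 Sent SMS [SMSC:mbloxpsmsca] [SVC:hpit-ems] [ACT:] [BINF:] [from:62569] [to:+16475882516] [flags:-1:0:-1:-1:-1]'
strip_through_to "$line"
# prints: +16475882516] [flags:-1:0:-1:-1:-1]
```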
Then use an inverted range to match up to the delimiter. A range is given
inside square brackets, and if the first character is a '^' the range is
inverted. So matching every character up to the closing ']' would be
(the trailing '*' repeats the range any number of times):
[^]]*
Note: to match a ']' in a range, the ']' must be the first character
in the range (straight after the '^' in an inverted range).
(IIRC. It works in sed, so I must have.)
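Putting the wildcard and the inverted range together gives the "find [SVC: and gobble everything up to the ]" behaviour you asked for. This is a minimal sed sketch, assuming each field looks like [NAME:value] as in your samples; the helper name field is hypothetical. Because [^]]* accepts anything except ']', the hyphen in hpit-ems no longer breaks the match:

```shell
#!/bin/sh
# Hypothetical helper: print the value of a "[NAME:value]" field from a line.
# '[^]]*' matches any run of characters other than ']', so hyphens and plus
# signs end up in the captured value. With -n and the 'p' flag, nothing is
# printed if the field is missing. NAME is pasted into the regex, so keep it
# to plain letters.
field() {
    printf '%s\n' "$2" | sed -n -e "s/.*\[$1:\([^]]*\)].*/\1/p"
}

line='2008-03-07 05:09:54 Sent SMS [SMSC:mbloxpsmsca] [SVC:hpit-ems] [ACT:] [BINF:] [from:62569] [to:+16475882516]'
field SVC "$line"   # prints: hpit-ems
field to "$line"    # prints: +16475882516
```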
Using sed rather than Perl, I tested your data thus:
$ echo "2007-05-31 15:21:13 Sent SMS [SMSC:mbloxpsmsca] "\
> "[SVC:hpit-ems] [ACT:] [BINF:] [from:62569] "\
> "[to:+16474075000] [flags:-1:1:-1:-1:-1] "\
> "[msg:100:01062F1F2DB691819239] "\
> "[udh:12:0B05040B8423F00003210401]" | \
> sed -e 's/\([0-9\-]*\) \([0-9:]*\).*SVC:'\
> '\([^]]*\).*from:\([^]]*\).*to:'\
> '\([^]]*\).*/\1,\2,\3,\4,\5/'
2007-05-31,15:21:13,hpit-ems,62569,+16474075000
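To run a substitution like the one above over a whole log rather than a single echoed line, something like the following should work. It is a sketch, not tested against your real logs: it assumes every line you want contains "Sent SMS" as in your samples, anchors the date at the start of the line, and uses sed -n with the p flag so that any other lines are skipped. The names log_to_csv, logfile and sms.csv are made up.

```shell
#!/bin/sh
# Hypothetical helper: turn each "Sent SMS" line on stdin into
# date,time,service,from,to CSV. Each [NAME:value] field is captured
# with '[^]]*', so values containing '-' or '+' come through intact.
# Lines that don't match the pattern are not printed (-n plus 'p').
log_to_csv() {
    sed -n -e 's/^\([0-9-]*\) \([0-9:]*\) Sent SMS.*\[SVC:\([^]]*\)].*\[from:\([^]]*\)].*\[to:\([^]]*\)].*/\1,\2,\3,\4,\5/p'
}

printf '%s\n' \
  '2007-05-31 15:21:13 Sent SMS [SMSC:mbloxpsmsca] [SVC:fusion] [ACT:] [BINF:] [from:62569] [to:16474075000] [flags:-1:1:-1:-1:-1]' \
  '2008-03-07 05:09:54 Sent SMS [SMSC:mbloxpsmsca] [SVC:hpit-ems] [ACT:] [BINF:] [from:62569] [to:+16475882516] [flags:-1:0:-1:-1:-1]' \
  | log_to_csv
# prints:
# 2007-05-31,15:21:13,fusion,62569,16474075000
# 2008-03-07,05:09:54,hpit-ems,62569,+16475882516
```

For a real file it would be run as log_to_csv < logfile > sms.csv.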
I assume that the IP address is embedded in the data somewhere and that you
work it out in a part of your Perl script that you didn't post.
Note: in sed, the sequence '\(' <pattern> '\)' stores the text matched by
<pattern> in a numbered buffer, which can be recalled with the sequence
'\' <buf-no>, where <buf-no> is 1 to 9, counting the '\(' groups from the
left.
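A small illustration of the numbered buffers: the two \(...\) groups below capture the date and the time, and \2 \1 writes them back in swapped order (swap_date_time is a hypothetical name). In Perl the grouping parentheses are written without backslashes and the captured text comes back in $1, $2 and so on.

```shell
#!/bin/sh
# Buffer 1 holds the date, buffer 2 the time; the replacement
# "\2 \1" reuses them in the opposite order. Text after the match
# is left untouched.
swap_date_time() {
    printf '%s\n' "$1" | sed -e 's/^\([0-9-]*\) \([0-9:]*\)/\2 \1/'
}

swap_date_time '2008-03-07 05:09:54 Sent SMS'
# prints: 05:09:54 2008-03-07 Sent SMS
```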
Hope this helps. If not, you know my number, so call.
Steve
--
Steve Dobson
Wait ... is this a FUN THING or the END of LIFE in Petticoat Junction??