[Sussex] Parsing a Logfile with Perl....

Richie Jarvis richie at helkit.com
Wed Mar 12 10:32:19 UTC 2008


Hi All,

I have written a little perl script to read a logfile and write certain 
values from matching lines into a csv file.  It worked great - until I 
tried it on one of our systems and discovered that a colleague had put a 
'-' character into one of the usernames I am parsing.  After lots of 
cursing, I am stuck on this one, and wonder if anyone can see how to 
adjust my regex so that it copes with usernames both with and without 
funny characters?

Here is an example of a well-formatted line:

2007-05-31 15:21:13 Sent SMS [SMSC:mbloxpsmsca] [SVC:fusion] [ACT:] 
[BINF:] [from:62569] [to:16474075000] [flags:-1:1:-1:-1:-1] 
[msg:100:01062F1F2DB69181923945413141363634383631323734414246333536363635343442423438464444353732303745433300030B6A00C54601C60001550187360603773700018707060354454D502D7B31363437343037353030307D0001873806034375] 
[udh:12:0B05040B8423F00003210401]

Here is an example of a badly-formatted line:

2008-03-07 05:09:54 Sent SMS [SMSC:mbloxpsmsca] [SVC:hpit-ems] [ACT:] 
[BINF:] [from:62569] [to:+16475882516] [flags:-1:0:-1:-1:-1] 
[msg:143://SS Please download mProveDM 
https://fusiondm-itg.houston.hp.com:443/fusiondl/EMA.cab?D=a5619ITGITG13A0E4B216685DBD31C50B0C9E6F91F2N8AB384A870] 
[udh:0:]

My script spits out the following output for these:

Good: 2007-05-31,15:21:12,fusion,62569,216.154.251.59,16474075000
Bad: 2008-03-07,05:09:53,hpit,ems,62569> (15.243.169,to

Currently, I have the following rather ungainly regex:

$_ =~ 
/^(\d+-\d+-\d+)\D+(\d+\D+\d+\D+\d+)\D+\w+\W+\w+\W+\w+\W+\w+\W+\w+\W+(\w+)\W+(\w+)\W+(\w+\W+\w+\W+\w+\W+\w+)\W+\w+\W+(\w+)\W+/;

I am sure there is a better way to do this - e.g. search for the string 
[SVC: and gobble everything up to the ] - but being a bit of a newbie to 
regex, I am googling wildly and not getting much inspiration.
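
The "search for [SVC: and gobble everything up to the ]" idea could be sketched roughly as below. This is only a sketch under assumptions: it assumes each log entry sits on a single line in the real file (the samples above are wrapped by the mail client), the helper name parse_line is made up for illustration, and it only emits fields that are visible in the sample lines (date, time, SVC, from, to). The character class [^\]]* matches anything except a closing bracket, so a '-' or '+' inside a value no longer matters.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): returns a CSV string with
# date, time, SVC, from and to, or nothing if the line does not match.
sub parse_line {
    my ($line) = @_;

    # Leading timestamp, e.g. "2008-03-07 05:09:54"
    my ($date, $time) =
        $line =~ /^(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})/
        or return;

    # Collect every [KEY:value] pair; the value is everything up to the
    # next ']', so characters such as '-' or '+' are accepted.
    my %field;
    while ($line =~ /\[(\w+):([^\]]*)\]/g) {
        $field{$1} = $2;
    }
    return unless exists $field{SVC};

    # Missing fields become empty strings rather than undef.
    return join ',', $date, $time,
        map { defined $field{$_} ? $field{$_} : '' } qw(SVC from to);
}

# Demo on the badly-formatted sample, with the msg body abbreviated.
my $demo = '2008-03-07 05:09:54 Sent SMS [SMSC:mbloxpsmsca] '
         . '[SVC:hpit-ems] [ACT:] [BINF:] [from:62569] '
         . '[to:+16475882516] [flags:-1:0:-1:-1:-1] [msg:2:hi] [udh:0:]';
print parse_line($demo), "\n";
# prints: 2008-03-07,05:09:54,hpit-ems,62569,+16475882516
```

Keying on the field names rather than counting \w+ and \W+ runs means the result does not depend on how many words or punctuation marks appear in the earlier fields.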

Does anyone have any pointers?

Thanks in advance,

Richie



