[Phpwm] Cleaning up addresses

Phil Beynon phil at infolinkelectronics.co.uk
Mon Mar 19 15:10:14 GMT 2007


> We are getting addresses in from field users - in theory up to 3
> discrete address lines, followed by Town, County, Post Code etc. The
> address fields contain an amazing amount of rubbish - we had one with
> about 5 additional carriage returns between each line.
> Does anyone have a favourite thorough approach to cleaning this sort of
> stuff up?
> alan dunn
> --

I've had this one before:
Explode it into an array using carriage returns as the delimiter, drop any
empty array elements, then run each remaining element through a regular
expression to strip out any non-alphanumeric characters.
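
A minimal sketch of those steps, written in Python here; in PHP the same
logic maps onto explode(), array_filter() and preg_replace(). It departs
from a strict letters-and-digits rule in one assumed way: spaces, hyphens
and apostrophes are kept, so "Stratford-upon-Avon" and "St John's" survive,
and \w keeps accented letters such as the Welsh ŷ. The function name
clean_address_lines is hypothetical.

```python
import re

# Matches anything that is not a word character, space, apostrophe or hyphen,
# plus the underscore (which \w would otherwise let through). Keeping spaces,
# hyphens and apostrophes is an assumption, not part of the original advice.
_JUNK = re.compile(r"[^\w '-]|_")


def clean_address_lines(raw: str) -> list[str]:
    """Hypothetical helper: split a raw address block into tidy lines."""
    cleaned = []
    # splitlines() breaks on \r\n, \r and \n, so mixed line endings work.
    for line in raw.splitlines():
        line = _JUNK.sub("", line)
        # Collapse runs of whitespace left behind and trim both ends.
        line = " ".join(line.split())
        # Checking after stripping also drops lines that held only junk.
        if line:
            cleaned.append(line)
    return cleaned
```

For example, "12, High Street\r\n\r\n\r\n  St. John's Close!!\n\nLondon"
comes back as three lines: "12 High Street", "St John's Close", "London".
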
Depending on how nationwide your data is, you might also need to run it
against an abbreviation lookup table (Saint / St, that sort of thing); I do
have a table of these available.
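
A rough sketch of such a lookup, assuming the table maps lower-case full
words to a preferred short form. Only Saint / St comes from the post; the
other entries and the names ABBREVIATIONS and normalise_abbreviations are
hypothetical placeholders, not the table mentioned above.

```python
# Hypothetical sample table: keys are lower-case full words, values are the
# preferred form. Only "saint" -> "St" comes from the original post.
ABBREVIATIONS = {
    "saint": "St",
    "road": "Rd",
    "avenue": "Ave",
}


def normalise_abbreviations(line: str, table: dict[str, str] = ABBREVIATIONS) -> str:
    """Replace whole words found in `table`, matching case-insensitively."""
    return " ".join(table.get(word.lower(), word) for word in line.split())
```

Mapping towards the short form is a deliberate choice in this sketch:
expanding "St" back out is ambiguous (Saint or Street) without looking at
where it sits in the line.
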
The other thing to watch for is your list of counties: the problem is not so
much abbreviations as obsolete counties. Even Royal Mail can't give a
definitive list, because a lot of people still use old ones, and for that
reason Royal Mail doesn't make much use of the county in an address.
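
Given that, one cautious option (an assumption, not something the post
prescribes) is to treat the county as optional: keep it when it matches a
list you maintain yourself, and blank it otherwise instead of rejecting the
whole address. KNOWN_COUNTIES and accept_county are hypothetical names, and
the entries are only samples.

```python
from typing import Optional

# Hypothetical, self-maintained list: lower-case key -> preferred spelling.
KNOWN_COUNTIES = {
    "west midlands": "West Midlands",
    "worcestershire": "Worcestershire",
    "tyne and wear": "Tyne and Wear",
}


def accept_county(county: str, known: dict[str, str] = KNOWN_COUNTIES) -> Optional[str]:
    """Return the preferred spelling if the county is recognised, else None.

    An unrecognised county is not an error: the caller keeps the rest of
    the address and simply leaves the county blank.
    """
    tidy = " ".join(county.split()).lower()
    return known.get(tidy)
```
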

Welsh addresses are by far the worst! :-)

P
