[clug-progsig] RegEx

John Jardine john_e_jardine at spamcop.net
Wed Aug 12 15:21:57 PDT 2009


Hi,

The theoretician in me likes this:
http://en.wikipedia.org/wiki/Telephone_number which states: 
> Most telephone networks today are interconnected in the international
> telephone network, where the format of telephone numbers is
> standardized by ITU-T in the recommendation E.164, which specifies
> that the entire number should be 15 digits or shorter, and begin with
> a country prefix. For most countries, this is followed by an area code
> or city code and the subscriber number, which might consist of the
> code for a particular telephone switch. ITU-T recommendation E.123
> describes how to represent an international telephone number in
> writing or print, starting with a plus sign ("+") and the country
> code. When calling an international number from a fixed line phone,
> the + must be replaced with the international call prefix chosen by
> the country the call is being made from. Some mobile phones allow the
> + to be entered directly.
That doesn't actually solve your problem, but it will prevent your data
from becoming someone else's problem :)
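
As a rough illustration of the E.164 rule quoted above (a leading "+",
then at most 15 digits beginning with the country code), a shape check
might look like the sketch below. The function name is made up for this
example, and it only checks the overall shape; it does not verify that
the country code is actually assigned.

```php
<?php
// Hypothetical helper: checks only the overall E.164 shape,
// not whether the country code exists.
function looks_like_e164($number)
{
    // Drop the separators commonly used in written numbers.
    $compact = preg_replace('/[\s().-]/', '', $number);

    // "+", a non-zero first digit (country codes do not start with 0),
    // then 1 to 14 more digits, for 15 digits at most.
    return preg_match('/^\+[1-9]\d{1,14}$/', $compact) === 1;
}

var_dump(looks_like_e164('+1 403 123 4567')); // bool(true)
var_dump(looks_like_e164('4031234567'));      // bool(false), no "+"
```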

The really fun bit is that properly formatted phone numbers come in
different lengths.

To solve your problem you can take a couple of tacks:
1) Single uber regex for any input format
2) nested 'if' or 'case' structure that simplifies your regex but only
ever matches once.
3) progressive set of 'if' conditions that evolve the phone number into
a common representation.

I favour (3) because you solve the problem in parts. For instance, check
for a country code first; if you find one, you can work out which
numbering plan you're in and refine from there.  If there is no country
code you *may* be able to reasonably assume one; if not, you have to
report it as an error.
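
Approach (3) might be sketched roughly as below. Everything here is an
assumption for illustration: the function name, the output format, and
the guess that a bare 10-digit number is North American. Only country
code "1" is refined; anything else is handed back untouched, as in your
migration routine.

```php
<?php
// Hypothetical sketch of approach (3): evolve the input step by step.
// Assumes a 10-digit number with no country code is North American.
function normalize_phone($raw)
{
    $text = trim($raw);

    // Step 1: split off a trailing extension such as "x89" or "ext. 89".
    $ext = '';
    if (preg_match('/^(.*?)\s*(?:x|ext\.?)\s*(\d+)$/i', $text, $m)) {
        $text = $m[1];
        $ext  = $m[2];
    }

    // Step 2: remember an explicit "+", then reduce to bare digits.
    $hasPlus = (strpos($text, '+') === 0);
    $digits  = preg_replace('/\D/', '', $text);

    // Step 3: decide whether a country code is present.
    if ($hasPlus) {
        if (substr($digits, 0, 1) !== '1') {
            return $raw; // some other numbering plan, not refined here
        }
        $national = substr($digits, 1);
    } elseif (strlen($digits) === 11 && $digits[0] === '1') {
        $national = substr($digits, 1); // leading "1" is the country code
    } elseif (strlen($digits) === 10) {
        $national = $digits;            // no country code, assume "1"
    } else {
        return $raw;                    // cannot tell, leave it alone
    }

    // Step 4: refine within the plan: area code, exchange, line number.
    if (!preg_match('/^(\d{3})(\d{3})(\d{4})$/', $national, $p)) {
        return $raw;
    }
    $out = "+1 ($p[1]) $p[2]-$p[3]";
    return $ext === '' ? $out : "$out x$ext";
}

echo normalize_phone('4031234567'), "\n";     // +1 (403) 123-4567
echo normalize_phone('1-403-111-2222'), "\n"; // +1 (403) 111-2222
```

Because the country code is stripped before the area code is looked at,
"14031234567" no longer gets carved up as [140] [312] [3456] [7].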

I googled around a bit and found this site which looked pretty
promising:
http://regexlib.com/DisplayPatterns.aspx?cattabindex=6&categoryId=7

Google Search:
http://www.google.ca/search?q=itu+phone+number+format&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a

Cheers,
J.J.

On Wed, 2009-08-12 at 14:42 -0600, Shawn wrote:
> that would probably work in my specific case.  But there are 2 and 3 
> digit country codes.  I was hoping to make my routine as generic as 
> possible to cover the rare chance my customers may do work overseas...
> 
> For my immediate problem, I can take the "easy" way out where needed - 
> the phone numbers are just a text field...  But, I'd rather get 
> everything into a similar format wherever possible... :)
> 
> Shawn
> 
> Goran Poprzen wrote:
> > Did you try counting digits before applying your routine?
> > If there are 9 or 10 digits, chances are that there is no country code
> > at the beginning.
> > If there are 11 digits and the first digit is 1, you have a North American
> > phone number.
> > 
> > just my little contribution.
> > Cheers,
> > Goran
> > 
> > On Wed, Aug 12, 2009 at 1:41 PM, Shawn<sgrover at open2space.com> wrote:
> >> Thanks John.  I'll try this out later today, but suspect it will show the
> >> same problem my other attempts have run into.
> >>
> >> Consider two phone numbers:
> >> 4031234567
> >> and 14031234567
> >>
> >> The core of the routine I found would split these up like this:
> >>
> >> - for the first number: [403] [123] [4567]
> >> - for the second number: [140] [312] [3456] [7]
> >>
> >> I've tried various things and end up with the second breakdown whenever a
> >> country code is included.  What is needed is to identify IF the country code
> >> exists and, if so, extract it; otherwise ignore the country code and identify the
> >> remaining elements.
> >>
> >> Shawn
> >>
> >>
> >> John Jardine wrote:
> >>> I can give you a hand with this this afternoon, my morning is pretty
> >>> busy.  If you need it before then, try prefixing the original expression
> >>> with:
> >>> (\+?\d+[-\s]*)*
> >>> English version:
> >>> Optional country code, composed of:
> >>> * optional plus sign (common in GSM type phone numbers)
> >>> * required digits
> >>> * optional white-space and hyphen(s)
> >>>
> >>> The above expression will tend to eat the entire phone number though.
> >>> More details on how this all hangs together here:
> >>> http://en.wikipedia.org/wiki/List_of_country_calling_codes
> >>>
> >>> The problem is that DB schemas have treated phone numbers (and addresses
> >>> too) in an ad-hoc manner.  There are actually telco (and postal)
> >>> recommendations for data formats that make this problem go away.
> >>>
> >>> The last thing to consider is if you're storing the target phone number
> >>> or the dial-string you need to get to the target phone.  These differ
> >>> based on where the call originates (within the target NPA or outside).
> >>>
> >>> Cheers,
> >>> J.J.
> >>>
> >>>
> >>> On Wed, 2009-08-12 at 03:31 -0600, Shawn wrote:
> >>>> I'm working on a PHP migration routine where I need to take a list of
> >>>> phone numbers, which may or may not be properly entered, then format them
> >>>> into a common format.
> >>>>
> >>>> To do this I have found a routine
> >>>> (http://benramsey.com/archives/making-it-valid-telephone-numbers/) that uses
> >>>> a regular expression to find the elements of the number.  The regular
> >>>> expression is:
> >>>>
> >>>> $pattern =
> >>>> '/^[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';
> >>>>
> >>>> In my case, if the number doesn't fit the usual patterns, I just return
> >>>> the original value.  (cleaning up the data while migrating, where possible)
> >>>>
> >>>> This works well thus far, unless the number starts with the country code
> >>>> (i.e. "1-403-111-2222").  In these cases I end up with a number formatted as
> >>>> "(140) 311-1222 x.2".  So I need to modify the regex to handle the country
> >>>> code.
> >>>>
> >>>> I've tried this:
> >>>> $pattern =
> >>>> '/^(1?)[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';
> >>>>
> >>>> (assuming I only ever see the country code of "1")  But this doesn't work
> >>>> well.
> >>>>
> >>>> Anyone more familiar with regular expressions have any suggestions?
> >>>>
> >>>> Thanks bunches.  I'll happily post my full format function once I get it
> >>>> working properly, should anyone need it...
> >>>>
> >>>> Shawn
> >>>>
> >>>> _______________________________________________
> >>>> clug-progsig mailing list
> >>>> clug-progsig at clug.ca
> >>>> http://clug.ca/mailman/listinfo/clug-progsig_clug.ca



