[clug-progsig] RegEx
Shawn
sgrover at open2space.com
Wed Aug 12 20:05:40 PDT 2009
The problem is I'm dealing with an existing database that is being
migrated into a new system. So, I need to take the old data as is -
both good and bad - and clean it up as much as possible.
One of the regexes on the site John linked to (regexlib.com) looks
promising; I'll be trying it out later tonight.
Thanks for the thoughts/efforts though. :)
Shawn
Goran Poprzen wrote:
> I'll try another shot/approach.
> Did you think about having two input fields for the phone number?
> First field for a country code (optional) and an area code, and second
> field for the phone number. That way the user is forced to do some work
> for you.
>
> Goran
>
> On Wed, Aug 12, 2009 at 2:42 PM, Shawn<sgrover at open2space.com> wrote:
>> That would probably work in my specific case. But there are 2 and 3 digit
>> country codes. I was hoping to make my routine as generic as possible to
>> cover the rare chance my customers may do work overseas...
>>
>> For my immediate problem, I can take the "easy" way out where needed - the
>> phone numbers are just a text field... But, I'd rather get everything into
>> a similar format wherever possible... :)
>>
>> Shawn
>>
>> Goran Poprzen wrote:
>>> Did you try counting digits before applying your routine?
>>> If there are 9 or 10 digits, chances are that there is no country code
>>> at the beginning.
>>> If there are 11 digits and the first digit is 1, you have a North
>>> American phone number.
>>>
>>> just my little contribution.
>>> Cheers,
>>> Goran
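
A minimal sketch of the digit-counting idea above, in PHP. The function name and the returned array keys are made up for illustration, and it assumes the number carries no extension (extra digits would throw the count off).

```php
<?php
// Hypothetical helper: split a raw phone string into a country code and a
// national number purely by counting digits, as suggested above.
function split_by_digit_count(string $raw): ?array
{
    $digits = preg_replace('/\D+/', '', $raw); // keep digits only
    $count  = strlen($digits);

    if ($count === 11 && $digits[0] === '1') {
        // 11 digits with a leading 1: treat as a North American number.
        return ['country' => '1', 'national' => substr($digits, 1)];
    }
    if ($count === 9 || $count === 10) {
        // 9 or 10 digits: assume there is no country code.
        return ['country' => '', 'national' => $digits];
    }
    return null; // anything else: leave it for manual review
}
```

Anything that returns null here would simply keep its original value during the migration.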
>>>
>>> On Wed, Aug 12, 2009 at 1:41 PM, Shawn<sgrover at open2space.com> wrote:
>>>> Thanks John. I'll try this out later today, but suspect it will show the
>>>> same problem my other attempts have run into.
>>>>
>>>> Consider two phone numbers:
>>>> 4031234567
>>>> and 14031234567
>>>>
>>>> The core of the routine I found would split these up like this:
>>>>
>>>> - for the first number: [403] [123] [4567]
>>>> - for the second number: [140] [312] [3456] [7]
>>>>
>>>> I've tried various things and end up with the second breakdown whenever
>>>> a country code is included. What is needed is identifying IF the country
>>>> code exists and, if so, extracting it; otherwise ignore the country code
>>>> and identify the remaining elements.
>>>>
>>>> Shawn
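
One way to get the "extract it if it exists" behaviour, sketched for North American numbers only. It leans on the fact that NANP area codes start with 2-9, so a leading 1 can only be the country code. The constant name, function name and array keys are hypothetical, and unlike the original pattern this one requires a 3-digit area code.

```php
<?php
// Sketch: optional "+1"/"1" country code, then area code, exchange, line
// number, and an optional "x" extension. NANP-only by assumption.
const NANP_PATTERN = '/^(?:\+?(1)[\s\-]?)?\(?([2-9]\d{2})\)?[\s\-]?(\d{3})[\s\-]?(\d{4})(?:\s?x\.?\s?(\d+))?$/';

function split_nanp(string $number): ?array
{
    if (!preg_match(NANP_PATTERN, trim($number), $m)) {
        return null; // not recognisable; the caller keeps the original value
    }
    return [
        'country'   => $m[1],       // '' when no country code was present
        'area'      => $m[2],
        'exchange'  => $m[3],
        'line'      => $m[4],
        'extension' => $m[5] ?? '', // a trailing unmatched group is absent
    ];
}
```

With this, both 4031234567 and 14031234567 should come back with area 403, exchange 123 and line 4567; only the country element differs.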
>>>>
>>>>
>>>> John Jardine wrote:
>>>>> I can give you a hand with this this afternoon, my morning is pretty
>>>>> busy. If you need it before then, try prefixing the original expression
>>>>> with:
>>>>> (\+*\d+[-\s]*)*
>>>>> English version:
>>>>> Optional country code, composed of:
>>>>> * optional plus sign (common in GSM type phone numbers)
>>>>> * required digits
>>>>> * optional white-space and hyphen(s)
>>>>>
>>>>> The above expression will tend to eat the entire phone number though.
>>>>> More details on how this all hangs together here:
>>>>> http://en.wikipedia.org/wiki/List_of_country_calling_codes
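
To see the "eats the phone number" effect, here is a small sketch that bolts the suggested prefix onto the original pattern. It assumes the plus sign is meant literally (written as \+) and that the character class is meant to be hyphen or white-space (written as [-\s]); both are readings of the English description above rather than anything confirmed.

```php
<?php
// Original pattern from the thread, minus the leading "/^".
$rest   = '[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';
// Suggested prefix, with the plus escaped and \s for white-space
// (assumptions about the intended expression).
$prefix = '/^(\+*\d+[-\s]*)*';

foreach (['4031234567', '14031234567'] as $number) {
    if (preg_match($prefix . $rest, $number, $m)) {
        printf("%s => prefix [%s] area [%s] [%s] [%s]\n",
            $number, $m[1], $m[2], $m[3], $m[4]);
    }
}
// The greedy \d+ in the prefix leaves only the last seven digits for the
// rest of the pattern, so the area code group comes back empty:
// 4031234567 => prefix [403] area [] [123] [4567]
// 14031234567 => prefix [1403] area [] [123] [4567]
```

So the prefix needs an upper bound on its digits (or a fixed list of known country codes) before it is useful.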
>>>>>
>>>>> The problem is that DB schemas have treated phone numbers (and addresses
>>>>> too) in an ad-hoc manner. There are actually telco (and postal)
>>>>> recommendations for data formats that make this problem go away.
>>>>>
>>>>> The last thing to consider is if you're storing the target phone number
>>>>> or the dial-string you need to get to the target phone. These differ
>>>>> based on where the call originates (within the target NPA or outside).
>>>>>
>>>>> Cheers,
>>>>> J.J.
>>>>>
>>>>>
>>>>> On Wed, 2009-08-12 at 03:31 -0600, Shawn wrote:
>>>>>> I'm working on a PHP migration routine where I need to take a list of
>>>>>> phone numbers, which may or may not be properly entered, and format
>>>>>> them into a common format.
>>>>>>
>>>>>> To do this I have found a routine
>>>>>> (http://benramsey.com/archives/making-it-valid-telephone-numbers/) that
>>>>>> uses a regular expression to find the elements of the number. The
>>>>>> regular expression is:
>>>>>>
>>>>>> $pattern = '/^[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';
>>>>>>
>>>>>> In my case, if the number doesn't fit the usual patterns, I just return
>>>>>> the original value. (cleaning up the data while migrating, where
>>>>>> possible)
>>>>>>
>>>>>> This works well thus far, unless the number starts with the country
>>>>>> code (e.g. "1-403-111-2222"). In these cases I end up with a number
>>>>>> formatted as "(140) 311-1222 x.2". So I need to modify the regex to
>>>>>> handle the country code.
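
A small sketch that reproduces the bad output. Two assumptions: the separators are stripped before matching (the pattern does not match the raw "1-403-111-2222" string at all), and the output layout is inferred from the sample above. The helper name format_with is made up.

```php
<?php
$pattern = '/^[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';

// Hypothetical formatter; the "(AAA) EEE-LLLL x.EXT" layout is inferred
// from the sample output quoted above.
function format_with(string $pattern, string $number): string
{
    if (!preg_match($pattern, $number, $m)) {
        return $number; // no match: hand back the original value
    }
    $ext = $m[4] ?? '';
    $out = ($m[1] !== '' ? "($m[1]) " : '') . "$m[2]-$m[3]";
    return $ext !== '' ? "$out x.$ext" : $out;
}

$raw    = '1-403-111-2222';
$digits = preg_replace('/\D+/', '', $raw); // assumption: separators stripped first

echo format_with($pattern, $raw), "\n";    // 1-403-111-2222 (pattern does not match)
echo format_with($pattern, $digits), "\n"; // (140) 311-1222 x.2
```

The first three digits are always taken as the area code, so the leading 1 shifts everything over and the final digit spills into the extension.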
>>>>>>
>>>>>> I've tried this:
>>>>>>
>>>>>> $pattern = '/^(1?)[\(]?(\d{0,3})[\)]?[\s]?[\-]?(\d{3})[\s]?[\-]?(\d{4})[\s]?[x]?(\d*)$/';
>>>>>>
>>>>>> (assuming I only ever see the country code of "1"). But this doesn't
>>>>>> work well.
>>>>>>
>>>>>> Anyone more familiar with regular expressions have any suggestions?
>>>>>>
>>>>>> Thanks bunches. I'll happily post my full format function once I get
>>>>>> it working properly, should anyone need it...
>>>>>>
>>>>>> Shawn
>>>>>>
>>>>>> _______________________________________________
>>>>>> clug-progsig mailing list
>>>>>> clug-progsig at clug.ca
>>>>>> http://clug.ca/mailman/listinfo/clug-progsig_clug.ca