[vox-tech] ARE (Tcl / Postgresql) REGEX question

Bryan Richter bryan.richter at gmail.com
Mon Dec 1 23:27:13 PST 2008


On Mon, Dec 1, 2008 at 9:13 PM, Alex Mandel <tech_dev at wildintellect.com> wrote:
> Dylan Beaudette wrote:
>> Hi,
>>
>> I have a rather complex (for me) regular expression that I am trying to figure
>> out.
>>
>> Here is an example that works just fine:
>>
>> -- I am trying to extract the two colors:
>> -- 10YR 6/4 and 7.5YR 4/4 from the following block of text
>> SELECT regexp_matches('B11t Light yellowish brown (10YR 6/4) gravelly clay
>> loam, brown to dark brown (7.5YR 4/4) moist; weak coarse subangular blocky;
>> hard, friable, sticky and plastic; few very fine and many fine and medium
>> roots; many very fine and fine interstital and tubular pores; few thin clay
>> films lining pores; pH 5.4; clear smooth boundary.' , E'([0-9]?[\\.]?[0-9][Y|
>> y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])') ;
>>
>>       regexp_matches
>> --------------------------
>>  {"10YR 6/4","7.5YR 4/4"}
>>
>>
>>
>> However, this pattern does not work when there is only one color:
>>
>> SELECT regexp_matches('B11t Light yellowish brown (10YR 6/4) gravelly clay
>> loam; weak coarse subangular blocky; hard, friable, sticky and plastic; few
>> very fine and many fine and medium roots; many very fine and fine interstital
>> and tubular pores; few thin clay films lining pores; pH 5.4; clear smooth
>> boundary.' , E'([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?
>> [0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])') ;
>>
>>
>> I have tried making the second capturing clause optional by appending the '?'
>> operator. This causes the single color example to be parsed correctly, but
>> now the double color example does not work:
>>
>> SELECT regexp_matches('B11t Light yellowish brown (10YR 6/4) gravelly clay
>> loam, brown to dark brown (7.5YR 4/4) moist; weak coarse subangular blocky;
>> hard, friable, sticky and plastic; few very fine and many fine and medium
>> roots; many very fine and fine interstital and tubular pores; few thin clay
>> films lining pores; pH 5.4; clear smooth boundary.' , E'([0-9]?[\\.]?[0-9][Y|
>> y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])?') ;
>>
>>   regexp_matches
>> -------------------
>>  {"10YR 6/4",NULL}
>>
>>
>> Any ideas on how to improve this regex?
>>
>> Thanks!
>>
>> Dylan
>>
>>
>
> Not sure if it helps but I ran into a similar problem running some regex
> in python and the only solution was to find another function.
> In my case findall on the regex object, do you have another function
> that specifies to find all matches and not just the first one, then you
> would only run the 1st 1/2 of your regex and iterate over your text
> until you find all matches.
>

I agree: what you want is a global search for *just* the single color
pattern, i.e.

([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])


I can't tell what system you're using, but every regex library I've seen has a
separate function for iterating over all matches, like Alex mentioned.
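
For illustration, here is a minimal sketch of that approach in Python, using
re.findall as Alex described. It is only a sketch under a few assumptions: the
sample texts are shortened versions of Dylan's examples, find_colors is a
hypothetical helper name, and the pattern is Dylan's single color pattern with
the doubled SQL backslash (\\.) reduced to a single one.

```python
import re

# Dylan's single-color pattern, unchanged except that the SQL string
# escaping (\\.) is written as \. for a Python raw string.
COLOR = re.compile(r'([0-9]?[\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])')

# Shortened versions of the two sample descriptions (assumption: the
# omitted text contains no further color codes).
two_colors = ('B11t Light yellowish brown (10YR 6/4) gravelly clay loam, '
              'brown to dark brown (7.5YR 4/4) moist; pH 5.4; '
              'clear smooth boundary.')
one_color = ('B11t Light yellowish brown (10YR 6/4) gravelly clay loam; '
             'pH 5.4; clear smooth boundary.')


def find_colors(text):
    """Return every color match in text, in order of appearance."""
    # With exactly one capturing group, findall returns a list of strings.
    return COLOR.findall(text)


print(find_colors(two_colors))  # ['10YR 6/4', '7.5YR 4/4']
print(find_colors(one_color))   # ['10YR 6/4']
```

The same pattern handles one color or two, because the number of matches is
no longer baked into the regex. If the system turns out to be PostgreSQL,
regexp_matches accepts an optional third flags argument, and passing 'g'
should return one row per match instead of only the first.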

By the way, this site might be handy:

http://osteele.com/tools/rework/

I used it to test out your regex. (Which could still be cleaned up: inside
brackets, '|' is a literal character, so [Y|y|R|r] can be just [YyRr]. But
regexes never actually get pretty. :) )

-- 
Bryan Richter

