[vox-tech] ARE (Tcl / Postgresql) REGEX question

Alex Mandel tech_dev at wildintellect.com
Mon Dec 1 21:13:31 PST 2008


Dylan Beaudette wrote:
> Hi,
> 
> I have a rather complex (for me) regular expression that I am trying to figure 
> out.
> 
> Here is an example that works just fine:
> 
> -- I am trying to extract the two colors:
> -- 10YR 6/4 and 7.5YR 4/4 from the following block of text
> SELECT regexp_matches('B11t Light yellowish brown (10YR 6/4) gravelly clay 
> loam, brown to dark brown (7.5YR 4/4) moist; weak coarse subangular blocky; 
> hard, friable, sticky and plastic; few very fine and many fine and medium 
> roots; many very fine and fine interstital and tubular pores; few thin clay 
> films lining pores; pH 5.4; clear smooth boundary.' , E'([0-9]?[\\.]?[0-9][Y|
> y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])') ;
> 
>       regexp_matches      
> --------------------------
>  {"10YR 6/4","7.5YR 4/4"}
> 
> 
> 
> However, this pattern does not work when there is only one color:
> 
> SELECT regexp_matches('B11t Light yellowish brown (10YR 6/4) gravelly clay 
> loam; weak coarse subangular blocky; hard, friable, sticky and plastic; few 
> very fine and many fine and medium roots; many very fine and fine interstital 
> and tubular pores; few thin clay films lining pores; pH 5.4; clear smooth 
> boundary.' , E'([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?
> [0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])') ;
> 
> 
> I have tried making the second capturing clause optional by appending the '?' 
> operator. This causes the single color example to be parsed correctly, but 
> now the double color example does not work:
> 
> SELECT regexp_matches('B11t Light yellowish brown (10YR 6/4) gravelly clay 
> loam, brown to dark brown (7.5YR 4/4) moist; weak coarse subangular blocky; 
> hard, friable, sticky and plastic; few very fine and many fine and medium 
> roots; many very fine and fine interstital and tubular pores; few thin clay 
> films lining pores; pH 5.4; clear smooth boundary.' , E'([0-9]?[\\.]?[0-9][Y|
> y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])?') ;
> 
>   regexp_matches   
> -------------------
>  {"10YR 6/4",NULL}
> 
> 
> Any ideas on how to improve this regex?
> 
> Thanks!
> 
> Dylan
> 
> 

Not sure if this helps, but I ran into a similar problem with a regex in
Python, and the only solution was to use a different function, in my case
findall on the regex object. Do you have another function that finds all
matches rather than just the first one? If so, you would only need the
first half of your regex, and you could iterate over your text until you
have found every match.
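
For what it's worth, here is a rough sketch of what I mean in Python. It
uses only the first capturing group of your pattern and lets findall
collect every occurrence, so one color or two makes no difference. The
names COLOR_RE and find_colors are just ones I made up for illustration,
and the sample strings are shortened versions of your two examples.

```python
import re

# First half of the original pattern only: a single color such as
# "10YR 6/4" or "7.5YR 4/4". (Hypothetical name, for illustration.)
COLOR_RE = re.compile(r'([0-9]?[\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])')


def find_colors(text):
    """Return every color found in text, in order of appearance."""
    # With exactly one capturing group, findall returns a list of the
    # strings matched by that group, one entry per match.
    return COLOR_RE.findall(text)


# Shortened versions of the two horizon descriptions from the question.
two_colors = ('B11t Light yellowish brown (10YR 6/4) gravelly clay loam, '
              'brown to dark brown (7.5YR 4/4) moist; pH 5.4; '
              'clear smooth boundary.')
one_color = ('B11t Light yellowish brown (10YR 6/4) gravelly clay loam; '
             'pH 5.4; clear smooth boundary.')

print(find_colors(two_colors))  # ['10YR 6/4', '7.5YR 4/4']
print(find_colors(one_color))   # ['10YR 6/4']
```

On the PostgreSQL side, regexp_matches takes an optional flags argument,
and I believe passing 'g' makes it return one row per match, which may be
the equivalent of findall here. I have not tried it against your data.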

Alex

