[vox-tech] ARE (Tcl / Postgresql) REGEX question

Dylan Beaudette debeaudette at ucdavis.edu
Mon Dec 1 18:45:44 PST 2008


Hi,

I have a rather complex (for me) regular expression that I am trying to figure 
out.

Here is an example that works just fine:

-- I am trying to extract the two colors:
-- 10YR 6/4 and 7.5YR 4/4 from the following block of text
SELECT regexp_matches(
  'B11t Light yellowish brown (10YR 6/4) gravelly clay loam, brown to dark '
  'brown (7.5YR 4/4) moist; weak coarse subangular blocky; hard, friable, '
  'sticky and plastic; few very fine and many fine and medium roots; many '
  'very fine and fine interstitial and tubular pores; few thin clay films '
  'lining pores; pH 5.4; clear smooth boundary.',
  E'([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])');

      regexp_matches      
--------------------------
 {"10YR 6/4","7.5YR 4/4"}
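A side note on the letter class in this pattern: inside a bracket expression, '|' is an ordinary character, not alternation (true of POSIX-style brackets generally, PostgreSQL's ARE flavor included). So '[Y|y|R|r]' is the set {Y, y, R, r, |}, and '[YyRr]' says what was probably intended. The quick check below uses Python's re module only as a convenient stand-in; 'accepts' is a hypothetical helper.

```python
import re

# The letter class from the post. Inside [...] the '|' is an ordinary
# character, not alternation, so this class also accepts a literal pipe.
ORIGINAL_CLASS = re.compile(r"[Y|y|R|r]")
# The same four letters without the stray pipe.
SIMPLER_CLASS = re.compile(r"[YyRr]")


def accepts(pattern: re.Pattern, ch: str) -> bool:
    """Hypothetical helper: True if the one-character class matches ch exactly."""
    return pattern.fullmatch(ch) is not None


for ch in "YyRr|":
    print(repr(ch), accepts(ORIGINAL_CLASS, ch), accepts(SIMPLER_CLASS, ch))
```

Only the last line differs: the original class accepts '|', the simpler one does not.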



However, this pattern does not match at all (the query returns no rows) when 
there is only one color:

SELECT regexp_matches(
  'B11t Light yellowish brown (10YR 6/4) gravelly clay loam; weak coarse '
  'subangular blocky; hard, friable, sticky and plastic; few very fine and '
  'many fine and medium roots; many very fine and fine interstitial and '
  'tubular pores; few thin clay films lining pores; pH 5.4; clear smooth '
  'boundary.',
  E'([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])');
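This query fails because the pattern demands two colors, and the text holds only one. One common way around a variable number of colors is to match a single color and collect every occurrence; PostgreSQL's regexp_matches accepts a 'g' flag for this (one result row per match). The sketch below shows the same idea with Python's re.findall as a stand-in. COLOR is adapted from the post's pattern (stray '|' dropped from the letter class), and find_colors is a hypothetical helper.

```python
import re
from typing import List

# One Munsell-style color such as "10YR 6/4" or "7.5YR 4/4", adapted from
# the post's pattern with the stray '|' removed from the letter class.
COLOR = re.compile(r"[0-9]?\.?[0-9][YyRr]+ +[0-9]/[0-9]")


def find_colors(description: str) -> List[str]:
    """Hypothetical helper: every color in the text, in order (zero or more)."""
    return COLOR.findall(description)


one = "B11t Light yellowish brown (10YR 6/4) gravelly clay loam; weak coarse"
two = ("B11t Light yellowish brown (10YR 6/4) gravelly clay loam, "
       "brown to dark brown (7.5YR 4/4) moist; weak coarse")
print(find_colors(one))  # ['10YR 6/4']
print(find_colors(two))  # ['10YR 6/4', '7.5YR 4/4']
```

The trade-off is that the colors come back as a list (or as separate rows with 'g') rather than as two fixed capture slots.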


I have tried making the second capturing group optional by appending the '?' 
quantifier. This makes the single-color example parse correctly, but now the 
two-color example returns only the first color:

SELECT regexp_matches(
  'B11t Light yellowish brown (10YR 6/4) gravelly clay loam, brown to dark '
  'brown (7.5YR 4/4) moist; weak coarse subangular blocky; hard, friable, '
  'sticky and plastic; few very fine and many fine and medium roots; many '
  'very fine and fine interstitial and tubular pores; few thin clay films '
  'lining pores; pH 5.4; clear smooth boundary.',
  E'([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])?');

  regexp_matches   
-------------------
 {"10YR 6/4",NULL}
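Why the '?' version stops early: the lazy '.*?' may match nothing, and once it does, the optional second group may also match nothing, so the overall match succeeds right after the first color. One common restructuring moves the gap inside the optional group, '(color)(?:.*?(color))?', so that group either covers "gap plus second color" or is skipped entirely. The sketch below compares both forms using Python's re module as a stand-in; COLOR, OPTIONAL_AFTER_GAP, GAP_INSIDE_GROUP and two_colors are hypothetical names. PostgreSQL's ARE engine has its own rules for how greedy and non-greedy quantifiers in one pattern interact, so the rewritten pattern would need to be re-checked with regexp_matches before relying on it.

```python
import re
from typing import Optional, Tuple

# One color, as in the post (stray '|' dropped from the letter class).
COLOR = r"[0-9]?\.?[0-9][YyRr]+ +[0-9]/[0-9]"

# The post's approach: lazy gap, then an optional second color. The lazy
# '.*?' can match nothing, after which the optional group matches nothing.
OPTIONAL_AFTER_GAP = re.compile(f"({COLOR}).*?({COLOR})?")

# Alternative: the gap sits inside the optional group, so the group either
# consumes "gap + second color" or is skipped entirely.
GAP_INSIDE_GROUP = re.compile(f"({COLOR})(?:.*?({COLOR}))?")


def two_colors(pattern: re.Pattern, text: str) -> Optional[Tuple]:
    """Hypothetical helper: (first, second) captures, or None if no match."""
    m = pattern.search(text)
    return m.groups() if m else None


one = "B11t Light yellowish brown (10YR 6/4) gravelly clay loam; weak coarse"
two = ("B11t Light yellowish brown (10YR 6/4) gravelly clay loam, "
       "brown to dark brown (7.5YR 4/4) moist; weak coarse")
print(two_colors(OPTIONAL_AFTER_GAP, two))  # ('10YR 6/4', None)
print(two_colors(GAP_INSIDE_GROUP, two))    # ('10YR 6/4', '7.5YR 4/4')
print(two_colors(GAP_INSIDE_GROUP, one))    # ('10YR 6/4', None)
```

The first line reproduces the behaviour reported above; the restructured pattern keeps both captures when a second color exists and leaves the second one empty otherwise.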


Any ideas on how to improve this regex?

Thanks!

Dylan


-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

