[vox-tech] Matching Contents of Lists

Bruce Wolk bawolk at ucdavis.edu
Wed Jul 6 21:16:19 PDT 2005



Lango, Trevor M. said the following:
> I have two lists, not necessarily of the same length.  List #1 has two
> columns.  List #2 has one column.  I would like to do the following:
> 
> Scan list #1 line by line.  If a match for column #1 in list #1 is found
> in list #2, extract the matching lines and put them in a new list (#3).
> Otherwise, leave the contents of lists #1 and #2 as they are.
> 
> If I expected the contents of the first column of each list to match
> exactly (character for character) - this would be a simple task with C++
> or the like.  However, the contents will not necessarily be perfectly
> identical.  I do believe they are nearly identical enough though to use
> pattern matching via Perl or the like.  Personally this is difficult for
> me (as a Perl noob), I know how to scan through a file for a
> pre-determined pattern - I don't understand how to scan through a file
> for a pattern that is essentially given by a line in another file...?  I
> have not found anything in my reading of Perl documentation that
> explains how to read a file and use its contents as an argument for the
> pattern to search for in another file (suggestions on excellent Perl doc
> sources appreciated also!).
> 
> This is what the contents of the lists may look like:
> 
> TALL0047A
> TAL0047A
> TAL047A
> TAL47A
> TA0047A
> TA047A
> TA47A
> T0047A
> T047A
> T47A
> T0047
> T047
> T47
> 
> Examples of matching:
> 
> TALL0047A    TALL047A    match
> TALL0047A    TAL0047A    not a match
> TALL0047A    TAL0470A    not a match
> 
> 
> The contents will always be one to four alpha characters followed by one
> to four numeric characters possibly followed by one or two alpha
> characters.
> 
> A match would be defined as the following criteria being met:
> 
> - The last one to four digits being identical (excluding leading zeroes)
> - The first one to four letters being identical
> 

I never learned Perl, but Python handles this kind of pattern matching
just as well. The re module is your friend.

import re
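# group 1: the leading letters; 0* skips leading zeros;
# group 2: the remaining digits; \w* swallows any trailing suffix
m = re.compile(r'^([a-zA-Z]+)0*(\d+)\w*$')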

Now use m to pull out the two pieces that need to be equal:

m.findall("TALL0047A") returns
[('TALL', '47')]

m.findall("TALL047A") also returns
[('TALL', '47')]

But m.findall("TAL0047A") returns
[('TAL', '47')]

The letter part differs, so it does not match the first two.
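
For what it's worth, here is a rough sketch of the same idea wrapped in two
small functions. The names match_key and same_item are made up for
illustration. It uses match instead of findall so that a string that does
not fit the pattern gives None rather than an empty list, and it treats
such strings as never matching (that last part is my assumption about what
is wanted).

```python
import re

# Same pattern as above: letters, optional leading zeros, digits, optional suffix.
PATTERN = re.compile(r'^([a-zA-Z]+)0*(\d+)\w*$')

def match_key(code):
    """Return (letters, digits without leading zeros), or None if code does not fit."""
    found = PATTERN.match(code)
    return found.groups() if found else None

def same_item(a, b):
    """True only when both codes fit the pattern and their keys are equal."""
    key_a, key_b = match_key(a), match_key(b)
    return key_a is not None and key_a == key_b

# The three pairs from the original question.
for a, b in [("TALL0047A", "TALL047A"),
             ("TALL0047A", "TAL0047A"),
             ("TALL0047A", "TAL0470A")]:
    print(a, b, "match" if same_item(a, b) else "not a match")
```

Only the first pair should come out as a match, which agrees with the
examples in the question.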

Basically, just run through the lists, using

m.findall(a)==m.findall(b)

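as the test. Obviously you have to break out column 1 in the first list,
but that is easy with the split method of the string object. One caveat:
findall returns an empty list for any string that does not fit the
pattern, so two malformed entries would compare equal; check for an empty
result before comparing.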
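
Putting it together, a minimal sketch of the whole scan might look like
this. It assumes both lists are already read into Python lists of lines,
that list #1 is whitespace-separated, and that list #3 should hold pairs
of matching lines while lists #1 and #2 stay untouched. The function
names and the sample data are invented for the example.

```python
import re

PATTERN = re.compile(r'^([a-zA-Z]+)0*(\d+)\w*$')

def match_key(code):
    """Return (letters, digits without leading zeros), or None if code does not fit."""
    found = PATTERN.match(code.strip())
    return found.groups() if found else None

def find_matches(list1_lines, list2_lines):
    """Return list #3: (line from list #1, line from list #2) for every match."""
    # Index list #2 by key so each list #1 line needs only one lookup.
    by_key = {}
    for line in list2_lines:
        key = match_key(line)
        if key is not None:
            by_key.setdefault(key, []).append(line.strip())

    list3 = []
    for line in list1_lines:
        columns = line.split()      # column 1 is columns[0]
        if not columns:
            continue                # skip blank lines
        key = match_key(columns[0])
        if key is None:
            continue                # skip entries that do not fit the pattern
        for other in by_key.get(key, []):
            list3.append((line.strip(), other))
    return list3

# Invented sample data; the second column of list #1 is a placeholder.
list1 = ["TALL0047A  first", "TAL0047A  second", "T47  third"]
list2 = ["TALL047A", "T0047", "TA47A"]
for pair in find_matches(list1, list2):
    print(pair)
```

Building the dictionary first avoids comparing every line of list #1
against every line of list #2, which matters only if the lists are long.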

Bruce Wolk






