[vox-tech] Matching Contents of Lists

Jay Strauss me at heyjay.com
Fri Jul 8 11:52:38 PDT 2005



Lango, Trevor M. wrote:
> First I apologize for the lame "reply" format - I am forced to use Microsoft Outlook Web Access (shudder) at work and wouldn't you know - it doesn't offer any options for mail format...?  
>  
>  
> 
>>Based on your rules above, TALL0047A and TAL0047A do in fact match
> 
> 
> No, actually - however many characters are present in each have to match.  If the number of alpha characters in the first set of each field in the two lists differ - no match.
> 
> 
>>Are you really saying:
> 
> 
>>From both items
>>remove trailing alphas
> 
>  
> No.  If trailing alphas are present they must also match.
>  
> 
> 
>>take the last 4 digits
>>remove any leading zeros
> 
>  
> Yes.
> 
> 
>>Do the strings always start with alphas? Or are there sometimes numerics
>>within the first 1-4 characters?
> 
>  
> Yes - always start with alphas.
> 
> 
> 
>>Is there stuff between the leading and ending portions, such that the
>>entries may be more than 10 characters long?
> 
> 
> There will never be more than 4 leading alphas, 5 numerics, and 2 trailing alphas.
>  

So if, the string always starts with alphas, followed by digits, 
followed (optionally) by alphas, and the digits must match when leading 
zeros are removed then you could:

# here is one method (as always there are many ways to do it)
# read each file, parse the sections of each str, put those in a hash
# then compare the hashs, deleting the keys when you get a match

use strict;

my (@new, %file1, %file2);

open (FILE, "file1");
while (<FILE>) {
	my $key = join("",parse($_));
	$file1{$key} = $_;
}
close (FILE);

open (FILE, "file2");
while (<FILE>) {
	my $key = join("",parse($_));
	$file2{$key} = $_;
}
close (FILE);

my %tmp = %file1;
while (my ($key,$value)	= each %tmp) {

	if (defined $file2{$key}) {
		delete $file1{$key};
		delete $file2{$key};

		push @new, $value;
	}
}

print "matching\n";
print join("\n", @new),"\n";

print "in file1 but not file2\n";
print join("\n", sort values %file1),"\n";

print "in file2 but not file1\n";
print join("\n", sort values %file2),"\n";

sub parse {
	my $str = $_[0];

	# Capture the parts, leading alpha, followed by n digits,
	# followed optionally by alphas
	$str =~ /([a-zA-Z]+)(\d+)([a-zA-Z]+)?/;
	
	my @str = ($1,$2,$3);	# put the matches back into an array
	$str[1] =~ s/^0+//;     # strip leading 0s from digit portion

	return @str;
}


More information about the vox-tech mailing list