[vox-tech] Matching Contents of Lists
Jay Strauss
me at heyjay.com
Fri Jul 8 11:52:38 PDT 2005
Lango, Trevor M. wrote:
> First I apologize for the lame "reply" format - I am forced to use Microsoft Outlook Web Access (shudder) at work and wouldn't you know - it doesn't offer any options for mail format...?
>
>
>
>>Based on your rules above, TALL0047A and TAL0047A do in fact match
>
>
> No, actually - however many characters are present in each have to match. If the number of alpha characters in the first set of each field in the two lists differ - no match.
>
>
>>Are you really saying:
>
>
>>From both items
>>remove trailing alphas
>
>
> No. If trailing alphas are present they must also match.
>
>
>
>>take the last 4 digits
>>remove any leading zeros
>
>
> Yes.
>
>
>>Do the strings always start with alphas? Or are there sometimes numerics
>>within the first 1-4 characters?
>
>
> Yes - always start with alphas.
>
>
>
>>Is there stuff between the leading and ending portions, such that the
>>entries may be more than 10 characters long?
>
>
> There will never be more than 4 leading alphas, 5 numerics, and 2 trailing alphas.
>
So if, the string always starts with alphas, followed by digits,
followed (optionally) by alphas, and the digits must match when leading
zeros are removed then you could:
# here is one method (as always there are many ways to do it)
# read each file, parse the sections of each str, put those in a hash
# then compare the hashs, deleting the keys when you get a match
use strict;
my (@new, %file1, %file2);
open (FILE, "file1");
while (<FILE>) {
my $key = join("",parse($_));
$file1{$key} = $_;
}
close (FILE);
open (FILE, "file2");
while (<FILE>) {
my $key = join("",parse($_));
$file2{$key} = $_;
}
close (FILE);
my %tmp = %file1;
while (my ($key,$value) = each %tmp) {
if (defined $file2{$key}) {
delete $file1{$key};
delete $file2{$key};
push @new, $value;
}
}
print "matching\n";
print join("\n", @new),"\n";
print "in file1 but not file2\n";
print join("\n", sort values %file1),"\n";
print "in file2 but not file1\n";
print join("\n", sort values %file2),"\n";
sub parse {
my $str = $_[0];
# Capture the parts, leading alpha, followed by n digits,
# followed optionally by alphas
$str =~ /([a-zA-Z]+)(\d+)([a-zA-Z]+)?/;
my @str = ($1,$2,$3); # put the matches back into an array
$str[1] =~ s/^0+//; # strip leading 0s from digit portion
return @str;
}
More information about the vox-tech
mailing list