[vox-tech] shell script challenge - Now MD5sum erratia

Charles Polisher vox-tech@lists.lugod.org
Thu, 8 Aug 2002 12:35:06 -0700


Micah Cowan wrote:
> GNU Linux writes:
>  > Found a very interesting page on md5sum. It's:
>  > 
>  > http://hills.ccsf.org/~jharri01/project.html
>  > 
>  > "So why does MD5 seem so secure? Because 128 bits allows you to have
>  > 2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 different
>  > possible MD5 codes"
>  > 
>  > Lots of good reading for insomniacs.
> 
> It still shouldn't be relied upon, however, that two identical MD5
> checksums are sufficient evidence that the corresponding files are
> identical; I've heard more than one person claim to have encountered
> identical MD5 sums for different files, and it's certainly not
> impossible, just improbable.

  I'm dubious ;^)

  <H. Lecter voice>
    The voices tell you they've seen MD5s collide; do they
    tell you other things, Micah? 
  </H. Lecter voice>
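
  To put a rough number on "improbable", here is a back-of-envelope
  sketch in Python. It assumes MD5 output behaves like a uniformly
  random 128-bit value, and it only covers accidental collisions
  among unrelated files, not deliberately constructed ones. The
  function name is mine, purely for illustration.

```python
from fractions import Fraction


def collision_probability(num_files: int, bits: int = 128) -> Fraction:
    """Upper bound on the chance that any two of num_files share a hash.

    Assumes an idealized hash with uniformly random output of the given
    width. The bound is (number of pairs) / 2**bits, which is close to
    the true probability whenever the result is small.
    """
    pairs = num_files * (num_files - 1) // 2
    return Fraction(pairs, 2 ** bits)


if __name__ == "__main__":
    # A billion files is far more than most of us will ever checksum.
    print(float(collision_probability(10 ** 9)))
```

  Under those assumptions, even a billion files gives a chance well
  below one in 10^20, which is why I'm dubious about the anecdotes.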


> But it's a helluva lot better than running diff from one file to every
> other file - a factorial-time operation! :)

  
  And now, a quibble:
  Actually, comparing the entire files takes the same number
  of comparisons as comparing their md5sums. Either way,
  checking every file against every other is n(n-1)/2
  comparisons, which is quadratic, not factorial. From a
  Big O standpoint, the extra cost of reading whole files
  is just a constant factor. If comparing md5s isn't
  factorial, neither is a full diff. If there were a ton of
  files and the lengths were spread out, the comparisons
  could be further reduced by sorting the list by file
  size, then comparing only among groups of the same size.
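
  The size-grouping idea could be sketched roughly like this in
  Python. This is only an illustration, not anyone's actual script:
  the function names are made up, and it bins files by size first,
  then by MD5 within each size bin, so files with a unique size are
  never read at all.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_candidates(paths):
    """Group paths that share both a size and an MD5 digest."""
    # First pass: bin by size, which needs no file reads.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot match anything else
        # Second pass: hash only files that share a size with another.
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[md5_of(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return sorted(groups)
```

  Binning with a dictionary also sidesteps the pairwise comparisons
  entirely. Anyone who shares Micah's worry about collisions could
  follow up with a byte-for-byte check of each reported group, for
  instance with filecmp.cmp(a, b, shallow=False).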


-- 
Fscking Pedants. I mean that in the nicest possible way of course.