[vox-tech] shell script challenge

Shawn P. Neugebauer vox-tech@lists.lugod.org
Wed, 7 Aug 2002 21:22:55 -0700


On Wednesday 07 August 2002 08:35 pm, Micah Cowan wrote:
> GNU Linux writes:
>  > On Wed, Aug 07, 2002 at 12:20:14PM -0700, Chris McKenzie wrote:
>  > > I'm using cygwin and I was given the request by my boss to remove all
>  > > duplicate files from the server
>  > > the server is on the x: drive of the windows machine which means that
>  > > cygwin saw it as /cygwin/x
>  > > I forget exactly what command I ran to get checksums.txt
>  > > but it is in the format
>  > >
>  > > <checksum> *x:<filename>
>  > >
>  > > The challenge is to find the duplicate checksums and print the file
>  > > name of those checksums.  This is tricky because the directories
>  > > contain spaces
>  >
>  > md5 is a better way to go than checksums.
>
> Er... no. MD5 *is* a checksum.

his point is valid, even if the terminology is confusing:
* md5 is a cryptographic hash function.  it was designed so that two inputs
  would collide with vanishingly small probability *and* so that such
  collisions would be very difficult to find on purpose.  md5sum uses it to
  produce a 128-bit value that can act as a "checksum" for a file or other
  input data.  for most purposes, two inputs that produce the same md5
  "checksum" can be assumed to be identical.
* sum also produces a value that can act as a "checksum," but it is far
  from cryptographically secure.  its output is only 16 bits wide, so it has
  a much higher probability of collision.
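
to put a rough number on the difference, here is a small sketch.  it
assumes checksums behave like uniform random values over 2^BITS possibilities
(an idealization; neither sum nor md5 is exactly that), and collision_prob is
just a name made up for this example.  it computes the chance that at least
two of N distinct files end up with the same checksum.

```shell
#!/bin/sh
# collision_prob N BITS (hypothetical helper): probability that at least two
# of N distinct files share a checksum, assuming checksums are uniformly
# distributed over 2^BITS values.
collision_prob() {
    awk -v n="$1" -v bits="$2" 'BEGIN {
        space = 2 ^ bits
        p_none = 1                      # chance that no two files collide
        for (i = 1; i < n; i++)
            p_none *= 1 - i / space     # file i+1 must miss i taken values
        printf "%.3f\n", 1 - p_none
    }'
}

# usage:
#   collision_prob 300 16     (16 bits, the width of a sum checksum)
#   collision_prob 300 128    (128 bits, the width of an md5 checksum)
```

with a 16-bit checksum, a few hundred distinct files are already enough for
roughly even odds of a false match.  with 128 bits the result is too small to
show up in ordinary floating-point arithmetic.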

the implication is that two inputs producing the same sum "checksum" are
more likely to actually be different than two inputs producing the same
md5 "checksum".
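
as for the original challenge, here is a minimal sketch, not what anyone in
this thread actually ran.  it assumes checksums.txt has one entry per line in
the form "<checksum> *<filename>", with no leading whitespace and no newlines
inside file names.  print_dupes is a made-up name.  the file name is taken as
everything after the two-character separator, so spaces in directory names
come through intact.

```shell
#!/bin/sh
# print_dupes FILE (hypothetical helper): print every file name whose
# checksum appears more than once in FILE.
# Assumed line format: "<checksum> *<filename>"
print_dupes() {
    awk '
    {
        sum = $1
        # Skip the checksum plus the 2-character separator (" *"); the rest
        # of the line is the file name, embedded spaces included.
        name = substr($0, length(sum) + 3)
        count[sum]++
        names[sum] = names[sum] name "\n"
    }
    END {
        # Groups come out in no particular order; names within a group
        # keep their input order.
        for (sum in count)
            if (count[sum] > 1)
                printf "%s", names[sum]
    }' "$1"
}

# usage:
#   print_dupes checksums.txt
```

a matching checksum only marks candidates.  comparing each candidate pair byte
for byte (with cmp, for example) before deleting anything covers the collision
case described above.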

shawn.