[vox-tech] shell script challenge
Shawn P. Neugebauer
vox-tech@lists.lugod.org
Wed, 7 Aug 2002 21:22:55 -0700
On Wednesday 07 August 2002 08:35 pm, Micah Cowan wrote:
> GNU Linux writes:
> > On Wed, Aug 07, 2002 at 12:20:14PM -0700, Chris McKenzie wrote:
> > > I'm using cygwin and I was given the request by my boss to remove all
> > > duplicate files from the server
> > > the server is on the x: drive of the windows machine which means that
> > > cygwin saw it as /cygwin/x
> > > I forget exactly what command I ran to get checksums.txt
> > > but it is in the format
> > >
> > > <checksum> *x:<filename>
> > >
> > > The challenge is to find the duplicate checksums and print the file
> > > name of those checksums. This is tricky because the directories
> > > contain spaces
> >
> > md5 is a better way to go than checksums.
>
> Er... no. MD5 *is* a checksum.
his point is valid, even if the terminology is confusing:
* md5 is a cryptographic hash function. it was designed so that two different
inputs would collide with incredibly small probability *and* so that such
collisions would be very difficult to find deliberately. md5sum uses it to
produce a value that can act as a "checksum" for a file or other input data.
for most purposes, it can be assumed that two inputs producing the same md5
"checksum" are identical.
* sum also produces a value that can act as a "checksum," but it is far
from cryptographically secure. its result is only 16 bits wide, so there are
just 65,536 possible values and the probability of collision is much higher.
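to make the difference concrete, here is a minimal python sketch. note that additive_sum16 is a made-up toy checksum (add the bytes, keep 16 bits), assumed here only for illustration; it is not the exact algorithm used by sum. it shows two different inputs that a weak checksum cannot tell apart while md5 can:

```python
import hashlib


def additive_sum16(data: bytes) -> int:
    """Toy 16-bit checksum: add up all bytes and keep the low 16 bits.

    Hypothetical stand-in for a weak checksum; NOT the real sum(1) algorithm.
    """
    return sum(data) & 0xFFFF


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


a = b"duplicate files"
b = b"files duplicate"  # same bytes as a, in a different order

# Reordering bytes does not change an additive checksum, so the toy
# checksum collides even though the inputs differ.
same_weak = additive_sum16(a) == additive_sum16(b)
same_md5 = md5_hex(a) == md5_hex(b)
print(same_weak, same_md5)  # prints: True False
```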
the implication is that two inputs producing the same sum "checksum" are
much more likely to actually be different than two inputs producing the
same md5 "checksum".
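tying this back to the original challenge, here is a rough python sketch of the grouping step. it assumes each line of checksums.txt really looks like "<checksum> *x:<filename>" as quoted above; the function name find_duplicates and the sample lines are made up for illustration. splitting on the first " *" only keeps spaces in directory names intact:

```python
from collections import defaultdict


def find_duplicates(lines):
    """Group file names by checksum; return only groups with 2+ names."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        # Split on the FIRST " *" only, so spaces later in the name survive.
        checksum, sep, name = line.partition(" *")
        if not sep or not checksum:
            continue  # skip blank or malformed lines
        groups[checksum].append(name)
    return {c: names for c, names in groups.items() if len(names) > 1}


# Made-up sample data in the "<checksum> *x:<filename>" layout.
sample = [
    "aaaa1111 *x:/My Documents/report.doc",
    "bbbb2222 *x:/tools/readme.txt",
    "aaaa1111 *x:/Backup Copy/report.doc",
]

for checksum, names in find_duplicates(sample).items():
    print(checksum)
    for name in names:
        print("   ", name)
```

for the real file, something like find_duplicates(open("checksums.txt")) should work the same way. if the checksums came from a weak algorithm, it would be prudent to confirm each group with a byte-for-byte comparison (for example filecmp.cmp(a, b, shallow=False)) before deleting anything.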
shawn.