[vox-tech] getting woody ISO's -- my experience with jigdo

vox-tech@lists.lugod.org
Wed, 24 Apr 2002 22:03:14 -0400


On Wed, Apr 24, 2002 at 11:24:57AM -0700, Peter Jay Salzman wrote:
> http://www.dirac.org/linux/debian/jigdo/

#   The only Woody ISO's I know about are in Hungary or somewhere else far
#   away. The bandwidth to these places, to put it bluntly, stink. 

  Possibly a better phrase is "bandwidth between my house and that
site in Hungary (a place practically on the other side of the planet)
stinks".  ;)

#   It took me almost 2 days to download a single ISO image. I suspect
#   these sites intentionally limit bandwidth for downloading ISO's
#   to discourage

  It's more likely that everyone else on the planet realized that
hosting ISO images of huge, rapidly changing targets is a waste of
bandwidth and left that role up to people who didn't figure it out.  ;)

  ISO images of any changing system are impractical, because even
tools like rsync (which can resume transfers and send only the
chunks of large files that changed) use quite a lot of CPU on the
server compared to FTP/HTTP, and still transfer somewhat more data
than what actually changed.
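  To see where that server CPU goes, here is a toy Python sketch of the
rsync-style delta idea: the receiver sends weak and strong checksums of
fixed blocks of its old copy, and the sender slides a cheap rolling
checksum across the new file byte by byte, confirming candidates with
a strong hash. The 4-byte BLOCK and the use of MD5 as the strong hash
are assumptions for illustration; real rsync uses much larger blocks
and its own strong checksum. The unmatched literals plus the checksum
list are the "somewhat more data" mentioned above.

```python
import hashlib

BLOCK = 4  # toy block size (assumption); real rsync uses far larger blocks

def weak_sum(data):
    """rsync-style weak checksum (a, b), each kept mod 2**16."""
    a = sum(data) % 65536
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % 65536
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % 65536
    b = (b - n * out_byte + a) % 65536
    return a, b

def signatures(old):
    """Receiver: weak + strong (MD5 here) checksum of each full block of its old copy."""
    table = {}
    for off in range(0, len(old) - BLOCK + 1, BLOCK):
        blk = old[off:off + BLOCK]
        table.setdefault(weak_sum(blk), {})[hashlib.md5(blk).digest()] = off // BLOCK
    return table

def delta(new, table):
    """Sender: emit ('copy', block_index) or ('literal', bytes) operations."""
    ops, lit, i = [], bytearray(), 0
    ws = weak_sum(new[:BLOCK]) if len(new) >= BLOCK else None
    while i + BLOCK <= len(new):
        cands = table.get(ws)  # cheap weak check first
        match = cands.get(hashlib.md5(new[i:i + BLOCK]).digest()) if cands else None
        if match is not None:
            if lit:
                ops.append(("literal", bytes(lit)))
                lit = bytearray()
            ops.append(("copy", match))
            i += BLOCK
            if i + BLOCK <= len(new):
                ws = weak_sum(new[i:i + BLOCK])
        else:
            lit.append(new[i])  # this byte must be sent as-is
            if i + BLOCK < len(new):
                ws = roll(*ws, new[i], new[i + BLOCK], BLOCK)
            i += 1
    lit.extend(new[i:])
    if lit:
        ops.append(("literal", bytes(lit)))
    return ops

def patch(old, ops):
    """Receiver: rebuild the new file from its old blocks plus the literals."""
    out = bytearray()
    for kind, val in ops:
        out += old[val * BLOCK:(val + 1) * BLOCK] if kind == "copy" else val
    return bytes(out)
```

  For example, inserting one byte into b"aaaabbbbccccdddd" yields four
block copies and a single one-byte literal; the per-byte rolling and
hashing in delta() is the work a plain FTP/HTTP server never does.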

# 0Dr8g8VIfJz0AkV1sg57NQ=
#   Debian:dists/potato/main/binary-all/doc/jargon_4.0.0-4.deb
# nbVOdS9wBCXz6X-VGCZS7Q=
#   Debian:dists/potato/main/binary-all/net/mason_0.13.0.92-2.deb
#
#   and so on. I imagine these are md5sums along with package names and
#   locations within the Debian CD. 
# QUESTION: why are there references to potato here?

  Those packages have not changed since potato was released (and so
not since package pools were implemented either), so they still live
at their original location under dists/potato.
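  As for the checksum strings quoted from the jigdo page: they look
like 16-byte MD5 sums written in base64 with "-" and "_" in place of
"+" and "/" and the padding dropped (22 characters), followed by "="
and a "Label:path" location that the quoting split onto the next line.
The sketch below assumes exactly that encoding and line shape; check
the jigdo docs before relying on it.

```python
import base64
import hashlib

def jigdo_md5(data):
    """MD5 of data in the assumed jigdo style: URL-safe base64, '=' padding stripped."""
    digest = hashlib.md5(data).digest()  # 16 raw bytes
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")

def parse_entry(line):
    """Split an assumed 'checksum=Label:path' line into (checksum, label, path)."""
    checksum, location = line.split("=", 1)
    label, path = location.split(":", 1)
    return checksum, label, path
```

  Under these assumptions, parse_entry() on the first quoted entry
gives "0Dr8g8VIfJz0AkV1sg57NQ", "Debian" and the dists/potato path.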

  I'm still reading through the docs for jigdo, after I try it out I'll
post my experience.

    Later,
      Mike

  I don't have firsthand knowledge of these events... what is below is
my understanding and may have no basis in actual fact.
========================

  The root ftp maintainers bend over backwards to make sure that the
disk space and bandwidth used keeping the mirrors up to date stay as
low and constant as possible.

  Prior to package pools, all of the packages for a release were
directly under "dists/foo" on the ftp server.  So even if slink,
potato, and sid had the very same .deb file, it had to be stored in
multiple places.

  Each time a new release was cut, a new release tree had to be made
via something like "cp -r sid woody"... that wastes a bunch of disk
space, *and* bandwidth when the mirrors sync, because every file has
to be transferred a second time.
  
  At some point someone decided to switch to symbolic links so that
each file really lives in one place, so "cp -sr sid woody" was used
instead, which creates a "tree" of symbolic links.  That creates a few
tricky issues for partial mirrors... if your ftp mirror program is
good, this fixes the disk space and bandwidth problem; if it handles
symlinks by omitting them, you have a mirror with a bunch of missing
packages; and if it handles them by fetching the whole file, you are
back to the waste of disk and bandwidth described above...
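  A rough Python analogue of the link farm and of the three ways a
mirror program might treat the links. link_tree() approximates
"cp -sr" (real directories, symlinked files, absolute link targets);
mirror_file() and its "skip"/"keep"/"follow" policies are hypothetical
names for illustration, not any real mirror tool's options. It assumes
a POSIX system where os.symlink works without special privileges.

```python
import os
import shutil

def link_tree(src, dst):
    """Roughly 'cp -sr src dst': recreate directories, symlink every file."""
    src = os.path.abspath(src)  # absolute targets, so links work from anywhere
    for root, dirs, files in os.walk(src):
        target_dir = os.path.join(dst, os.path.relpath(root, src))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            os.symlink(os.path.join(root, name), os.path.join(target_dir, name))

def mirror_file(path, dest, policy):
    """Hypothetical mirror program: its symlink policy decides the cost."""
    if os.path.islink(path):
        if policy == "skip":   # package silently missing on the mirror
            return
        if policy == "keep":   # cheap: recreate the link (target must exist locally)
            os.symlink(os.readlink(path), dest)
            return
    shutil.copyfile(path, dest)  # "follow": a full copy, disk and bandwidth wasted again
```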
  
  Later on, the concept of package pools was created: files stopped
being stored inside each distribution and were stored inside the
"pool" directory instead... and originally the distributions were
just symbolic links into the pool from the main disk area (if you
look at potato, you will see a mix of direct files and links into
the pool).  At some point the link farm was dropped.
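  For reference, my understanding of where a file lands inside the
pool: pool/<component>/<prefix>/<source package>/<file>, where the
prefix is the first letter of the source package name, or the first
four letters for names starting with "lib" (so that directory does
not get huge). A small sketch under that assumption; the package
names in the usage are hypothetical.

```python
def pool_path(component, source, filename):
    """Assumed Debian pool location for a file built from `source`."""
    prefix = source[:4] if source.startswith("lib") else source[0]
    return f"pool/{component}/{prefix}/{source}/{filename}"
```

  Under that assumption, pool_path("main", "foo", "foo_1.0-1_all.deb")
gives "pool/main/f/foo/foo_1.0-1_all.deb", and a hypothetical "libfoo"
source goes under "pool/main/libf/libfoo/".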

  Now if there are 5 releases that share the same package, there is
only one copy in the pool.  Package pools were what led to the
"testing" branch practically everyone uses...
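  The saving is just "one copy per unique file" versus "one copy per
release tree". A tiny sketch that counts both; the release contents
are made-up example data, not the real archive.

```python
def storage(releases):
    """releases: release name -> set of .deb filenames (hypothetical data).
    Returns (copies with one tree per release, copies with a shared pool)."""
    per_tree = sum(len(files) for files in releases.values())  # old dists/ layout
    pooled = len(set().union(*releases.values()))              # one copy in pool/
    return per_tree, pooled
```

  With three releases that all ship the same foo_1.0-1_all.deb plus
one other file each (two of them shared), the old layout stores 6
copies and the pool stores 3.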

- with a pool it is very easy to build your own virtual release
  based on any criteria you want, like file age, bug tracking results,
  dependencies, day of week uploaded... all you need to do is generate
  a Packages file.
- a pool allows you to keep many different versions of the same
  package around in one place, even ones that are not used by any
  current release.
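  Both points can be sketched together: the pool holds several
versions of a package, and a "virtual release" is just a Packages
file pointing at the ones that pass your criteria. Here the only
criterion is file age (newest upload at least min_age_days old); the
PoolEntry type, the package data, and the choice of criterion are all
hypothetical. A real Packages stanza carries more fields (Size,
MD5sum, Depends, ...) than the three emitted here.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class PoolEntry:          # hypothetical record for one file in the pool
    package: str
    version: str
    filename: str         # path relative to the archive root
    uploaded: date

def virtual_release(pool, today, min_age_days):
    """Per package, pick the newest upload at least min_age_days old
    and render minimal Packages stanzas, separated by blank lines."""
    chosen = {}
    for e in pool:
        if (today - e.uploaded).days < min_age_days:
            continue                          # too fresh for this release
        best = chosen.get(e.package)
        if best is None or e.uploaded > best.uploaded:
            chosen[e.package] = e
    stanzas = [
        f"Package: {e.package}\nVersion: {e.version}\nFilename: {e.filename}\n"
        for e in sorted(chosen.values(), key=lambda e: e.package)
    ]
    return "\n".join(stanzas)
```

  Both versions of a package can sit in the pool at once; only the
generated Packages file decides which one a given release "contains".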