[vox-tech] Perl Net::HTTP Content-encoding/gzip support broken...

Mike Simons vox-tech@lists.lugod.org
Tue, 25 Mar 2003 02:24:55 -0500


On Mon, Mar 24, 2003 at 01:41:02AM -0500, Mike Simons wrote:
>   Trying to use Net::HTTP to pull compressed web content but it appears
> to be broken.
[...]
> - Know of any perl HTTP modules that really handles compressed content?

  Bleh... I patched up the Net::HTTP module to request and handle
content-encoded data streams.  I sent patches to the libwww-perl list,
but they're not really clean yet.

... Everything is a mess.

1) "mod_gzip" requires configuration to actually compress anything,
   despite what the Debian package maintainer says in his README.  In
   that configuration you basically need to list every file type/URI/
   path/MIME type that you want to be compressed, or not compressed...
   so you end up with a huge list of (don't compress my jpgs, mp3s,
   pngs, mpegs, etc).

2) "mod_gzip" only sends 'gzip'-style Content-encoding, even though
   gzip and deflate use exactly the same compression algorithm and
   differ only in the wrapper around it: gzip (RFC 1952) puts a header
   in front and a CRC-32 plus length trailer at the end, while HTTP's
   "deflate" is the zlib format (RFC 1950), with a short header and an
   Adler-32 checksum.  The gzip format does record the *length* of the
   uncompressed data, but in that trailer after the compressed data,
   not in the header, so it doesn't stop you from *streaming* through it.

3) Compress::Zlib's published high-level API doesn't provide hooks for
   block-by-block inflation of data... you have to have the whole data
   stream to use it.  Although it's poorly documented, there is a way to
   do block-by-block decompression through its lower-level inflate
   stream interface...

4) I wasn't paying enough attention to details and wasted two hours 
   trying to *see* that the gzip header had to be removed before I could
   feed the data to the inflate function call.  Then I wasted more time
   figuring out that I had to keep track of "bytes pending on the socket"
   and "bytes already decompressed in the local buffer".

5) The libwww-perl authors started work on a revised API which would
   handle all of this much better and add HTTP/1.1 support, but the most
   "recent" snapshot is code from 1998.

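For what it's worth, the approach from items 3 and 4 looks roughly like the following.  This is a minimal sketch in Python rather than Perl (Python's zlib module exposes the same streaming inflate that zlib itself does), not the patch sent to libwww-perl; gzip_header_length and gunzip_stream are hypothetical names.  It assumes a single-member gzip body, skips the header by hand following the RFC 1952 layout, then feeds raw deflate data to the inflater one chunk at a time, keeping bytes read but not yet inflated ("pending") apart from the decompressed output, and checks the CRC-32/length trailer at the end.

```python
import zlib

# Hypothetical helper names; a sketch, not the libwww-perl patch.
# Bits of the gzip FLG byte that add optional header fields (RFC 1952, 2.3.1).
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def gzip_header_length(buf: bytes) -> int:
    """Size of the gzip header at the start of buf, or -1 if buf does not
    yet contain the whole header (more bytes must be read first)."""
    if len(buf) < 10:
        return -1
    if buf[:3] != b"\x1f\x8b\x08":  # magic bytes plus CM=8 (deflate)
        raise ValueError("not a gzip/deflate stream")
    flags, pos = buf[3], 10
    if flags & FEXTRA:
        if len(buf) < pos + 2:
            return -1
        pos += 2 + int.from_bytes(buf[pos:pos + 2], "little")
    for bit in (FNAME, FCOMMENT):  # zero-terminated strings, in this order
        if flags & bit:
            end = buf.find(b"\x00", pos)
            if end < 0:
                return -1
            pos = end + 1
    if flags & FHCRC:
        pos += 2
    return pos if len(buf) >= pos else -1


def gunzip_stream(chunks):
    """Yield decompressed data block by block as gzip-encoded chunks arrive.

    Assumes a single-member gzip body.
    """
    pending = b""      # bytes read but not yet handed to the inflater
    inflater = None    # created once the gzip header has been skipped
    extra = b""        # bytes that arrive after the deflate data ends
    crc, size = 0, 0
    for chunk in chunks:
        if inflater is None:
            pending += chunk
            hdr = gzip_header_length(pending)
            if hdr < 0:
                continue  # header incomplete: wait for more input
            # Negative wbits = raw deflate: the inflater sees no wrapper at all.
            inflater = zlib.decompressobj(-zlib.MAX_WBITS)
            chunk, pending = pending[hdr:], b""
        if inflater.eof:
            extra += chunk  # belongs to the 8-byte trailer
            continue
        out = inflater.decompress(chunk)
        if out:
            crc, size = zlib.crc32(out, crc), size + len(out)
            yield out
    if inflater is None:
        raise ValueError("truncated gzip header")
    out = inflater.flush()
    if out:
        crc, size = zlib.crc32(out, crc), size + len(out)
        yield out
    if not inflater.eof:
        raise ValueError("truncated deflate data")
    # The trailer, not the header, carries CRC-32 and the uncompressed size.
    tail = inflater.unused_data + extra
    if len(tail) < 8:
        raise ValueError("truncated gzip trailer")
    if int.from_bytes(tail[:4], "little") != crc:
        raise ValueError("CRC-32 mismatch")
    if int.from_bytes(tail[4:8], "little") != size & 0xFFFFFFFF:
        raise ValueError("length mismatch")
```

In Python itself you'd normally let zlib parse the wrapper by passing 16 + zlib.MAX_WBITS to zlib.decompressobj(); the manual header skip is there to mirror what had to be done by hand with a raw inflate call.
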
> - Know of a good documentation source detailing 'Content-encoding' 
>   data flow over HTTP?

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

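  This is a really handy RFC: section 14 defines the Accept-Encoding
  (14.3) and Content-Encoding (14.11) headers, and section 3.5 defines
  the gzip and deflate content codings themselves.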
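On the client side the flow is: send Accept-Encoding with the codings you can handle, then undo whatever the response's Content-Encoding says was applied.  A minimal Python sketch, assuming one content coding per response (no stacked codings); decode_body, WBITS_FOR_CODING and REQUEST_HEADERS are illustrative names, not part of any HTTP library.  The only difference between handling gzip and deflate is which wrapper zlib is told to expect.

```python
import zlib

# Illustrative names only; not from any HTTP library.
# wbits values understood by zlib.decompressobj():
#   16 + MAX_WBITS -> expect a gzip wrapper (RFC 1952)
#   MAX_WBITS      -> expect a zlib wrapper (RFC 1950), which is what
#                     RFC 2616 section 3.5 calls "deflate"
WBITS_FOR_CODING = {
    "gzip": 16 + zlib.MAX_WBITS,
    "x-gzip": 16 + zlib.MAX_WBITS,  # RFC 2616 treats x-gzip as gzip
    "deflate": zlib.MAX_WBITS,
}

# Headers a client would add to its request to ask for compressed content.
REQUEST_HEADERS = {"Accept-Encoding": "gzip, deflate"}


def decode_body(content_encoding, chunks):
    """Yield the decoded entity body for a response whose Content-Encoding
    header is content_encoding (None if the header is absent)."""
    coding = (content_encoding or "identity").strip().lower()
    if coding == "identity":
        yield from chunks  # nothing to undo
        return
    if coding not in WBITS_FOR_CODING:
        raise ValueError("unsupported Content-Encoding: " + coding)
    d = zlib.decompressobj(WBITS_FOR_CODING[coding])
    for chunk in chunks:  # inflate block by block as data arrives
        out = d.decompress(chunk)
        if out:
            yield out
    tail = d.flush()
    if tail:
        yield tail
    if not d.eof:
        raise ValueError("truncated " + coding + " body")
```

Corrupt compressed data surfaces as zlib.error from decompress(); with the gzip wrapper, zlib also verifies the CRC-32 and length trailer itself.
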

    Later,
      Mike