[vox-tech] Perl Net::HTTP Content-encoding/gzip support broken...
Mike Simons
vox-tech@lists.lugod.org
Tue, 25 Mar 2003 02:24:55 -0500
On Mon, Mar 24, 2003 at 01:41:02AM -0500, Mike Simons wrote:
> Trying to use Net::HTTP to pull compressed web content but it appears
> to be broken.
[...]
> - Know of any perl HTTP modules that really handle compressed content?
Bleh... I patched up the HTTP module to request and handle
content-encoded data streams. Sent patches to the libwww-perl list,
but it's not really clean yet.
... Everything is a mess.
1) "mod_gzip" requires configuration to actually compress things, despite
what the Debian package maintainer says in his readme. In that
configuration you basically need to list every
filetype/uri/path/mimetype that you want to be compressed, or not
compressed... so you end up with a huge list of (don't compress my jpgs,
mp3s, pngs, mpegs, etc).
2) "mod_gzip" only sends 'gzip' style Content-encoding, even though
gzip and deflate are the exact same compression algorithm... each
just has a slightly different wrapper around it (*: I think this is the
case but haven't verified it yet)... what's really bizarre is the gzip
format records the *length* of the uncompressed data... though that
sits in the trailer at the end of the stream, not in the header, so it
doesn't stop you from *streaming* through it.
3) Compress::Zlib doesn't appear to provide hooks to do block by block
inflation of data... the obvious parts of the published API want the
whole data stream. Although it's poorly documented, there is a way
to do block by block decompression of data...
4) I wasn't paying enough attention to details and wasted two hours
before I could *see* that the gzip header had to be removed before
feeding the data to the inflate function call. Then more time again
figuring out that I had to keep track of "bytes pending on the socket"
and "bytes already decompressed in the local buffer".
5) the libwww-perl authors started work on a revised API which would
handle all of this much better and add HTTP/1.1 support, but the most
"recent" snapshot is code from 1998.
> - Know of good documentation source detailing 'Content-encoding'
> data flow over HTTP?
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
This is a really handy RFC.
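As a rough illustration of the data flow that RFC describes (sections
14.3 and 14.11, with the codings themselves defined in section 3.5):
the client lists what it can decode in Accept-Encoding, the server
names what it applied in Content-Encoding, and the client undoes the
codings in reverse order. The sketch below is Python, works on a body
that has already been read in full, and uses a made-up helper name,
decode_body. The fallback to raw deflate is an assumption about
servers that send 'deflate' without the zlib wrapper, not something
the RFC asks for.

```python
import gzip
import zlib

# What a client would send to ask for compressed content.
REQUEST_HEADERS = {"Accept-Encoding": "gzip, deflate"}


def decode_body(content_encoding, body):
    """Undo the codings named in a Content-Encoding header value."""
    codings = [c.strip().lower()
               for c in (content_encoding or "").split(",")
               if c.strip()]
    # Codings are listed in the order they were applied,
    # so they have to be removed last to first.
    for coding in reversed(codings):
        if coding in ("gzip", "x-gzip"):
            # 16 + MAX_WBITS makes zlib expect a gzip wrapper.
            body = zlib.decompress(body, 16 + zlib.MAX_WBITS)
        elif coding == "deflate":
            try:
                body = zlib.decompress(body)  # zlib wrapper (RFC 1950)
            except zlib.error:
                # Assumed fallback: raw deflate with no wrapper.
                body = zlib.decompress(body, -zlib.MAX_WBITS)
        elif coding == "identity":
            pass
        else:
            raise ValueError("unsupported content coding: " + coding)
    return body


# Usage with bodies built locally instead of fetched from a server.
text = b"some web page " * 50
from_gzip = decode_body("gzip", gzip.compress(text))
from_deflate = decode_body("deflate", zlib.compress(text))
```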
Later,
Mike