[vox-tech] reading a .gz .Z after offset

Eric Engelhard vox-tech@lists.lugod.org
Thu, 07 Mar 2002 23:21:42 -0800


"Mark K. Kim" wrote:
> 
> On Thu, 7 Mar 2002, Jeff Newmiller wrote:
> 
> > I don't think you can do seeks in a compressed file... you have to read it
> > sequentially.
> >
> > If you have a plan for dividing up the uncompressed data, perhaps you
> > should do that first and store the split data as separate files
> > (recompressed or not) for purposes of computation.
> 
> The zlib library offers a seek function in its utility function API,
> "gzseek(gzFile, z_off_t, int)".  Since zlib compression uses the
> deflate algorithm, which compresses data in blocks of a known size, it can
> find the block you're seeking, inflate just that block, and return the
> data (I'm not sure if that's how gzseek works, but I'm just sayin' it can
> be done.)  I'm sure with all of Perl's ingenuity, it can be done in Perl, too.
> 
> Go Eric!  Keep looking! :)
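
For anyone who wants to try the seek approach, here is a minimal sketch
using Python's standard gzip module rather than zlib's C API or Perl (that
swap is my choice, not anything from the thread). The helper name
read_at_offset and the demo data are made up for illustration. The offset
is a position in the uncompressed data. As far as I understand it, both
gzseek and the gzip module emulate a seek on a file opened for reading by
decompressing and discarding everything before the target, so this is
convenient but not a true random access.

```python
import gzip
import os
import tempfile


def read_at_offset(path, offset, length):
    """Return `length` bytes starting at uncompressed `offset` in a .gz file.

    Hypothetical helper for illustration only.
    """
    with gzip.open(path, "rb") as fh:
        # The offset refers to the uncompressed stream. On a file opened
        # for reading, the seek is emulated by decompressing from the
        # start and throwing the data away, so cost grows with offset.
        fh.seek(offset)
        return fh.read(length)


# Demo data (made up): 10000 fixed-width records of 14 bytes each.
data = b"".join(b"record %06d\n" % i for i in range(10000))

tmp = tempfile.NamedTemporaryFile(suffix=".gz", delete=False)
tmp.close()
try:
    with gzip.open(tmp.name, "wb") as fh:
        fh.write(data)
    # Record 500 starts at uncompressed byte 14 * 500.
    chunk = read_at_offset(tmp.name, 14 * 500, 14)
finally:
    os.unlink(tmp.name)

print(chunk)
```

Because the cost of each seek grows with the offset, this is fine for an
occasional lookup but probably not for many reads deep into a large file.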

Thanks Mark. I was poking around with zlib when I got a letter from the
biocluster list saying that NCBI released an unintentional solution in
December. I can now quickly create indexed volumes of specified sizes on
the fly and on a single box.

... but I'm still going to learn something about compression. :-)
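
In that spirit, here is a rough sketch of the split-first idea Jeff
suggested: read the compressed file once, write the uncompressed data back
out as separately compressed volumes, and keep an index of which
uncompressed byte range each volume holds. This is not the NCBI tool, which
I have not described here; the names split_gz and read_range, the volume
file naming, and the sizes are all assumptions for illustration, written in
Python with the standard gzip module.

```python
import gzip
import os
import tempfile


def split_gz(src, out_dir, volume_bytes):
    """Split a .gz file into separately compressed volumes.

    Each volume holds up to `volume_bytes` of uncompressed data. Returns
    an index: a list of (start_offset, length, path) tuples, where the
    offsets refer to the original uncompressed stream.
    """
    index = []
    offset = 0
    with gzip.open(src, "rb") as fh:
        while True:
            chunk = fh.read(volume_bytes)
            if not chunk:
                break
            path = os.path.join(out_dir, "vol%04d.gz" % len(index))
            with gzip.open(path, "wb") as out:
                out.write(chunk)
            index.append((offset, len(chunk), path))
            offset += len(chunk)
    return index


def read_range(index, offset, length):
    """Read `length` bytes at uncompressed `offset`, opening only the
    volumes that overlap the requested range."""
    end = offset + length
    parts = []
    for start, size, path in index:
        if start + size <= offset or start >= end:
            continue  # this volume does not overlap the request
        lo = max(offset, start)
        hi = min(end, start + size)
        with gzip.open(path, "rb") as fh:
            fh.seek(lo - start)  # cheap: at most one volume to inflate
            parts.append(fh.read(hi - lo))
    return b"".join(parts)


# Demo data (made up): 10000 fixed-width records of 14 bytes each.
data = b"".join(b"record %06d\n" % i for i in range(10000))

with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "all.gz")
    with gzip.open(src, "wb") as fh:
        fh.write(data)
    index = split_gz(src, work, 50000)
    one_record = read_range(index, 14 * 500, 14)
    # This request straddles the boundary between the first two volumes.
    crossing = read_range(index, 49990, 28)

print(len(index), one_record)
```

Note that this splits on byte counts, so a record can straddle two volumes;
read_range stitches those back together, but a real tool would more likely
split on record boundaries.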

--
Eric Engelhard - www.cvbig.org - www.sagresdiscovery.com