[vox-tech] reading a .gz .Z after offset
Eric Engelhard
vox-tech@lists.lugod.org
Thu, 07 Mar 2002 23:21:42 -0800
"Mark K. Kim" wrote:
>
> On Thu, 7 Mar 2002, Jeff Newmiller wrote:
>
> > I don't think you can do seeks in a compressed file... you have to read it
> > sequentially.
> >
> > If you have a plan for dividing up the uncompressed data, perhaps you
> > should do that first and store the split data as separate files
> > (recompressed or not) for purposes of computation.
>
> The zlib library offers a seek function in its utility function API,
> "gzseek(gzFile, z_off_t, int)". Since the zlib compression uses the
> deflation algorithm that compresses data in blocks of a known size, it can
> find the block you're seeking, inflate just that block, and return the
> data (I'm not sure if that's how gzseek works, but I'm just sayin' it can
> be done.) I'm sure in all Perl's ingenuity, it can be done in Perl, too.
>
> Go Eric! Keep looking! :)
Thanks, Mark. I was poking around with zlib when I got a letter from the
biocluster list saying that NCBI released an unintentional solution in
December. I can now quickly create indexed volumes of specified sizes on
the fly and on a single box.
... but I'm still going to learn something about compression. :-)
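
For the archive, here is a rough sketch of the kind of seek Mark describes.
It uses Python's standard gzip module rather than the C gzseek() call or
Perl, and the helper name read_at_offset and the sample records are made up
for illustration. Two caveats: the offset counts uncompressed bytes, and
when reading, the seek is emulated by decompressing forward from the start
of the stream (zlib's gzseek behaves the same way), so it is convenient but
not true random access, which is roughly Jeff's point.

```python
import gzip
import os
import tempfile


def read_at_offset(path, offset, length):
    """Return `length` bytes starting at uncompressed `offset` in a gzip file.

    Hypothetical helper: the seek works, but the library has to inflate
    everything before `offset` to get there.
    """
    with gzip.open(path, "rb") as fh:
        fh.seek(offset)          # offset is in uncompressed bytes
        return fh.read(length)


def demo():
    # Made-up data: 10000 fixed-width records of 14 bytes each.
    data = b"".join(b"record %06d\n" % i for i in range(10000))
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    try:
        with gzip.open(path, "wb") as fh:
            fh.write(data)
        # Fetch record number 500 without reading the file by hand.
        chunk = read_at_offset(path, 14 * 500, 14)
        return chunk, data
    finally:
        os.remove(path)


if __name__ == "__main__":
    chunk, _ = demo()
    print(chunk)
```

If the same file is going to be hit at many offsets, splitting it into
separately compressed pieces, as Jeff suggested, avoids re-inflating the
leading part of the stream on every lookup.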
--
Eric Engelhard - www.cvbig.org - www.sagresdiscovery.com