[vox-tech] OCR on the fly

Karsten M. Self kmself at ix.netcom.com
Wed Feb 23 02:15:46 PST 2005


on Tue, Feb 22, 2005 at 04:45:38PM -0800, Dave Margolis (margolisdm at earthlink.net) wrote:
> Does anyone know of a program that I could run a few thousand GIF images 
> through, perform an OCR-like operation on each, and get some kind of 
> text back for putting into a database for searching purposes?
> 
> I'm looking into making my collection of daily comics searchable.  I 
> know the fonts in most comics don't lend themselves to very good OCR, 
> but I'm thinking a certain margin of error would be acceptable.
> 
> And in case you're wondering, no, I'm not planning to make this 
> public...just for me.

No specific pointers, mostly bad news...

I've looked at a few free GNU/Linux-based OCR solutions and found *very*
mixed results.  Output is *highly* dependent on inputs: poor-quality,
dirty, or misaligned images dramatically degrade the results.

I'm not sure what the commercial options are.  One alternative that
works very well for Groklaw is the IGM (Internet group mind) method:
parcel out the material to be OCR'd, have different people type it up,
and assemble the results.  For dealing with legal faxes, it's great (I
can testify to this, having typed out a few myself).
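
For what it's worth, the bookkeeping side of that approach is small.
Here is a rough sketch, not anything Groklaw actually runs: the function
names (assign_pages, assemble, search) and the file names are made up
for illustration, and it assumes each volunteer hands back a simple
mapping of image file name to typed text.

```python
from itertools import cycle


def assign_pages(pages, volunteers):
    """Deal pages out round-robin so each volunteer gets a similar share."""
    assignments = {name: [] for name in volunteers}
    for page, name in zip(pages, cycle(volunteers)):
        assignments[name].append(page)
    return assignments


def assemble(transcripts):
    """Merge per-volunteer {page: text} dicts into one dict sorted by page."""
    merged = {}
    for result in transcripts:
        merged.update(result)
    return dict(sorted(merged.items()))


def search(merged, word):
    """Return pages whose transcribed text contains word, ignoring case."""
    needle = word.lower()
    return [page for page, text in merged.items() if needle in text.lower()]


if __name__ == "__main__":
    # Hypothetical file names and volunteers, purely for illustration.
    pages = ["strip-001.gif", "strip-002.gif", "strip-003.gif"]
    print(assign_pages(pages, ["alice", "bob"]))
```

For a few thousand images the merged text could just as well go into a
database table; the dict here only stands in for that.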


Peace.

-- 
Karsten M. Self <kmself at ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
    I call bullshit on that one, sorry, no man pages no docs.  Come on
    now, what are they supposed do?  Call up the Psychic Hotline?
    - tek, describing GNOME documentation, on linux-elitists