[texhax] search for text in a pdf file
Tom Schneider
toms at ncifcrf.gov
Fri Aug 6 17:45:21 CEST 2004
> so now i'm back where i started, only just a bit smarter. so what else do
> y'all use to pull text out of a pdf such as this one?
If I understand you, the PDF holds only an image of the page and you want
the text. That requires optical character recognition (OCR), which is a
hard problem.
Fortunately there is at least one open source project, called GOCR (or JOCR):
http://jocr.sourceforge.net/
It's still in development but might do the trick. I'm sure they would
appreciate the attention ...
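
For what it's worth, here is a rough sketch of how one might drive it from
Python. I am assuming pdftoppm (from xpdf or poppler) is installed to turn
each page into a grayscale PGM image, and that gocr is on the PATH; the
function names and the "page" prefix are made up for illustration. Recognition
quality will depend heavily on the scan, so treat this as a starting point
rather than a tested recipe.

```python
import glob
import subprocess
import sys


def rasterize_command(pdf_path, prefix, dpi=300):
    """Build the pdftoppm argv that renders each page to a grayscale PGM."""
    # Assumption: pdftoppm is available; -r sets the resolution in DPI.
    return ["pdftoppm", "-r", str(dpi), "-gray", pdf_path, prefix]


def ocr_command(image_path):
    """Build the gocr argv; gocr writes recognized text to stdout."""
    return ["gocr", "-i", image_path]


def pdf_to_text(pdf_path, prefix="page"):
    """Rasterize every page, OCR each image, and return the joined text."""
    subprocess.run(rasterize_command(pdf_path, prefix), check=True)
    pages = []
    # pdftoppm writes prefix-N.pgm; a lexical sort matches page order
    # as long as the page numbers are zero-padded.
    for image in sorted(glob.glob(prefix + "-*.pgm")):
        result = subprocess.run(ocr_command(image), check=True,
                                capture_output=True, text=True)
        pages.append(result.stdout)
    return "\n".join(pages)


if __name__ == "__main__":
    print(pdf_to_text(sys.argv[1]))
```

Saved as, say, pdfocr.py (a hypothetical name), it would be run as
"python pdfocr.py scanned.pdf" and print whatever text gocr recognizes.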
Tom
Dr. Thomas D. Schneider
National Cancer Institute
Laboratory of Experimental and Computational Biology
Frederick, Maryland 21702-1201
toms at ncifcrf.gov
permanent email: toms at alum.mit.edu (use only if first address fails)
http://www.lecb.ncifcrf.gov/~toms/