OCR for Scripty Fonts?

gdzyn's picture

I think I already know the answer to this, but here it goes ...

I understand the basic technology behind OCR and its limitations. What I'm trying to do is somewhat out of the ordinary ... I need to capture text off of printed pieces of stationery. Fonts, colors and background colors vary across the board. For example, there could be a scripty, blue font on a green background.

Does anyone know of ANY OCR software out there that would serve my purposes?

Any help is much appreciated ...

Monoecus's picture

Why don't you put a small barcode at the bottom of each piece?

gdzyn's picture

I need to pull the text from each card into some kind of text editor to refer back to at a later date.

James Scriven's picture

Not sure what your purposes are... An easier route: post the fonts here, some typophile will ID them at light speed, then either purchase or comp the fonts from their respective sources?

gdzyn's picture

I need to clarify:

I don't need to identify the font. All I need to do is capture the actual text on the cards into a text editor to archive. The amount of text on each card will vary from 1 line up to 30 lines of text.

timd's picture

I haven't recently used OCR*, but when I did it was, at best, hit and miss on anything but the most basic text. Scripts, I would guess, are the least likely to translate successfully. Given the time spent scanning, cleaning up, etc., and then proof-reading and correcting, it would probably be better to employ a copy typist to retype the items.

Tim

* things might have (must have) improved.

AzizMostafa's picture

Or use a speech-to-text converter if you are English-speaking, not Hinglish-speaking like me.

aluminum's picture

It's often cheaper to get an intern/temp to input the text manually rather than OCR it all and have to go back and fix all the typos.

Gus Winterbottom's picture

If you go to

http://en.wikipedia.org/wiki/Optical_character_recognition

and scroll down to near the bottom to "proprietary software," there are a number of packages that claim to OCR handwriting and non-Latin fonts (for example, Arabic, Hebrew, and Asian). One of these might work on script fonts. Also, IIRC, software like OmniPage can be "trained" to recognize specific symbols, so you might be able to train it on your script fonts, although that would be mighty tedious.

BTW, if you scan in black and white or grey scale instead of full color, color won't matter too much unless something drops out.
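To illustrate the greyscale point, here is a minimal sketch of what happens to two colours when they are reduced to grey levels. It assumes the common ITU-R BT.601 luma weights; a given scanner or image editor may use a different conversion, and the function names and sample colours are made up for the example.

```python
# Sketch: reduce RGB colours to grey levels with the ITU-R BT.601 luma weights.
# Assumption: 8-bit RGB values (0..255); real scanners may convert differently.

def luma(rgb):
    """Approximate grey level (0 = black, 255 = white) of an (r, g, b) colour."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def contrast(text_rgb, background_rgb):
    """Difference in grey level between text and background."""
    return abs(luma(text_rgb) - luma(background_rgb))

# Pure blue text on a pure green card: very different grey levels,
# so the text is still readable after conversion.
print(contrast((0, 0, 255), (0, 255, 0)))    # 121

# Pure red text on a teal card: nearly identical grey levels,
# so the text "drops out" even though the colours look different.
print(contrast((255, 0, 0), (0, 100, 150)))  # 0
```

The second case is the kind of drop-out to watch for: two colours that are easy to tell apart in colour can collapse to the same grey.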

Lex Kominek's picture

As someone with a lot of experience with OCR (it used to be my job to scan and OCR full books) I would recommend typing the text out manually, depending on the number of cards you have.

OCR generally only works on 1-bit images (some OCR programs can accept other formats, but they're converted to 1-bit internally). For best results, you'd have to photoshop each card to make sure the text comes through properly in a 1-bit setting. If you're doing a book, you can just run a batch on the pages since they will all be quite similar, but it sounds as if your cards each have a different design with different colours.
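As a rough sketch of the 1-bit step described above, the code below picks a black/white cut-off for one card and applies it. It uses Otsu's method, which is one common way to choose a threshold automatically; that is an assumption for illustration, not a claim about what any particular OCR program does internally. The input is assumed to be a flat list of grey levels (0..255), and the function names are hypothetical.

```python
# Sketch: choose a per-card threshold with Otsu's method, then make a 1-bit image.
# Assumptions: pixels are integer grey levels 0..255; text is darker than the card.

def otsu_threshold(gray_pixels):
    """Return the threshold t that maximises between-class variance.

    Pixels <= t are treated as dark (ink), pixels > t as light (paper).
    """
    hist = [0] * 256
    for p in gray_pixels:
        hist[p] += 1

    total = len(gray_pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue          # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break             # every pixel is already on the dark side
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def to_one_bit(gray_pixels, threshold):
    """Map grey levels to 1 (ink) or 0 (paper)."""
    return [1 if p <= threshold else 0 for p in gray_pixels]

# Toy "card": 20 dark text pixels on 80 lighter background pixels.
card = [29] * 20 + [150] * 80
t = otsu_threshold(card)
bits = to_one_bit(card, t)
print(t, sum(bits))
```

Because the threshold is computed per image, cards with different colour schemes each get their own cut-off. For light text on a dark card the result would need inverting, and textured or patterned backgrounds would likely still need manual clean-up.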

Also, script fonts are generally harder to OCR, depending on a number of factors including how connected they are, etc. A good OCR program should be able to recognize script fonts, but that can get expensive.

Finally, OCR will always produce typos. Sometimes it's just an extra period or semicolon if there is dust on the page, sometimes 1 becomes l, but I have yet to see a perfect OCR, especially from a colour image. If accuracy is important to you, you'll have to proofread all of your text.
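If you do go the OCR route, a small script can at least point a proofreader at the likeliest trouble spots. The sketch below flags two of the error types mentioned above: words that mix letters and digits (a 1/l swap often produces these) and stray punctuation standing on its own. The function name and the rules are assumptions for illustration; it will miss errors that happen to form plausible words, and it will flag some legitimate tokens such as "4pm".

```python
# Sketch: flag tokens in OCR output that deserve a second look.
# Assumption: plain-text OCR output, whitespace-separated tokens.

def suspicious_tokens(text):
    """Return (line_number, token) pairs that look like possible OCR errors."""
    flagged = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            has_letter = any(c.isalpha() for c in token)
            has_digit = any(c.isdigit() for c in token)
            only_punct = not any(c.isalnum() for c in token)
            # Mixed letters and digits, e.g. "Wi1liam", or a lone ";" or "."
            if (has_letter and has_digit) or only_punct:
                flagged.append((line_no, token))
    return flagged

sample = "Wi1liam and Mary ; request\nJune 12, 2008"
for line_no, token in suspicious_tokens(sample):
    print(f"line {line_no}: check {token!r}")
```

This is only an aid for the proofreading pass, not a substitute for it.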

So, if you still want to go with OCR, go for it, but I'd recommend a fast typist instead.

- Lex
