OCR enhancement through Neighbor Embedding and fast approximate nearest neighbors

D. C. Smith

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Generic optical character recognition (OCR) engines often perform very poorly when transcribing scanned low-resolution (LR) text documents. To improve OCR performance, we apply the Neighbor Embedding (NE) single-image super-resolution (SISR) technique to LR scanned text documents to obtain high-resolution (HR) versions, which we subsequently process with OCR. For comparison, we repeat this procedure using bicubic interpolation (BI). We demonstrate that mean-square errors (MSE) in NE HR estimates do not increase substantially when NE is trained in one Latin font style and tested in another, provided both styles belong to the same font category (serif or sans serif). This is very important in practice, since for each font size, the number of training sets required for each category may be reduced from dozens to just one. We also incorporate randomized k-d trees into our NE implementation to perform approximate nearest neighbor search, and obtain a 1000x speedup over our original NE implementation, with negligible MSE degradation. This acceleration also made it practical to combine all of our size-specific NE Latin models into a single Universal Latin Model (ULM). The ULM eliminates the need to determine the unknown font category and size of an input LR text document and match it to an appropriate model, a very challenging task, since the dpi (dots per inch) of the input LR image is generally unknown. Our experiments show that OCR character error rates (CER) were over 90% when we applied the Tesseract OCR engine to LR text documents (scanned at 75 dpi and 100 dpi) in the 6-10 pt range. By contrast, using k-d trees and the ULM, CER after NE preprocessing averaged less than 7% at 3x (100 dpi LR scanning) and 4x (75 dpi LR scanning) magnification, over an order of magnitude improvement. Moreover, CER after NE preprocessing was more than 6 times lower on average than after BI preprocessing.
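The abstract reports results as character error rates but does not spell out the formula. As an illustration only, the sketch below assumes the common definition: the Levenshtein edit distance between the ground-truth text and the OCR output, divided by the length of the ground truth. The function names `edit_distance` and `character_error_rate` are hypothetical and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference chars into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute r with h, or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, one wrong character in a four-character reference gives a CER of 0.25. The value can exceed 1.0 when the OCR output contains many spurious characters.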

Original language: English
Title of host publication: Applications of Digital Image Processing XXXV
State: Published - 2012
Externally published: Yes
Event: Applications of Digital Image Processing XXXV - San Diego, CA, United States
Duration: Aug 13 2012 → Aug 16 2012

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 8499
ISSN (Print): 0277-786X

Conference

Conference: Applications of Digital Image Processing XXXV
Country/Territory: United States
City: San Diego, CA
Period: 08/13/12 → 08/16/12

Keywords

  • Computer vision
  • Image enhancement
  • Neighbor Embedding
  • OCR character error rate
  • Optical character recognition
