Enhancement of optical character recognition through neighbor embedding

David C. Smith

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Optical character recognition (OCR) is the electronic translation of scanned images of text documents into machine-readable text. Frequently, text documents are scanned at low resolution (LR) to conserve memory, and OCR engines have difficulty translating such LR images. In this article we apply the neighbor embedding (NE) single-image super-resolution (SISR) technique to LR images of text documents and obtain high resolution (HR) versions, which we subsequently process with OCR. We repeat this experimental procedure using bicubic interpolation (BI) in the preprocessing step. We report our experimental findings comparing the character error rates (CER) of OCR translations before and after NE and BI preprocessing. Our experiments with Latin fonts in the 6pt-10pt range show that at 3x (LR scanning at 100 dpi) and 4x (LR scanning at 75 dpi) magnification, CER after NE preprocessing was nearly an order of magnitude lower than CER after BI preprocessing. We also observed that in this point range, OCR applied to LR images scanned at 75 dpi completely failed, and CER was at least 94% for OCR applied to LR images scanned at 100 dpi. By contrast, at 3x and 4x magnification, CER after NE preprocessing was under 10% at 6pt and under 3% at 8pt and 10pt.
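The abstract does not state how CER is computed. A minimal sketch, assuming the common definition (Levenshtein edit distance between the OCR output and the ground-truth text, divided by the ground-truth length), is given below. The function names `edit_distance`, `character_error_rate`, and `effective_dpi` are hypothetical and are not taken from the paper. The `effective_dpi` helper further assumes that magnification scales scan resolution linearly, under which both reported settings would correspond to a nominal 300 dpi.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Assumed CER definition: edit distance over reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def effective_dpi(scan_dpi: int, magnification: int) -> int:
    """Nominal resolution after upscaling, assuming linear scaling."""
    return scan_dpi * magnification


if __name__ == "__main__":
    # Illustrative strings only; these are not data from the paper.
    print(character_error_rate("neighbor", "neighbar"))
    print(effective_dpi(100, 3), effective_dpi(75, 4))
```

Under this definition CER can exceed 100% when the OCR output contains many spurious insertions, so a reported figure such as "at least 94%" should be read against whatever normalization the paper itself uses.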

Original language: English
Title of host publication: Proceedings of the IASTED International Conference on Signal and Image Processing, SIP 2012
Pages: 1-8
Number of pages: 8
State: Published - 2012
Externally published: Yes
Event: 14th IASTED International Conference on Signal and Image Processing, SIP 2012 - Honolulu, HI, United States
Duration: Aug 20 2012 to Aug 22 2012

Publication series

Name: Proceedings of the IASTED International Conference on Signal and Image Processing, SIP 2012

Conference

Conference: 14th IASTED International Conference on Signal and Image Processing, SIP 2012
Country/Territory: United States
City: Honolulu, HI
Period: 08/20/12 to 08/22/12

Keywords

  • Bicubic interpolation
  • Image enhancement
  • Neighbor embedding
  • Optical character recognition
  • Single image super-resolution
