Medical Intelligence and Language Engineering Lab

Processing of Camera-Captured Images

The camera provides a new alternative for document acquisition in less-constrained imaging environments. In addition to imaging hard-copy documents, cameras can capture non-paper document images, such as text printed on 3-D real-world objects. However, because both the imaging conditions and the target document types vary, traditional scanner-based OCR systems cannot be applied directly to camera-captured images, and additional processing is required.

Complex documents containing both graphics and text, in which the text varies in color and size, call for specialized binarization techniques. We propose a novel method for binarizing color documents in which the foreground text is output as black and the background as white, regardless of whether the text is darker or lighter than its background.
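The polarity-independent output convention can be illustrated with a minimal sketch. It is not the method proposed here: it swaps in a single global Otsu threshold on a luminance image and assumes, as a hypothetical heuristic, that pixels on the image border are mostly background, so whichever class dominates the border is mapped to white. A global threshold will not cope with text whose color varies across the page, which is exactly what a specialized method must handle; the sketch only shows how output polarity can be normalized. The function names are hypothetical.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the 8-bit threshold t maximizing between-class variance (class 0: gray <= t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # probability of class 0 for each t
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean of class 0 for each t
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu[-1] * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def binarize_polarity_free(rgb: np.ndarray) -> np.ndarray:
    """Binarize an RGB uint8 image so text is black (0) and background white (255).

    Assumption (hypothetical heuristic): border pixels are mostly background.
    """
    lum = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    gray = np.clip(np.round(lum), 0, 255).astype(np.uint8)
    bright = gray > otsu_threshold(gray)
    border = np.concatenate([bright[0, :], bright[-1, :], bright[:, 0], bright[:, -1]])
    background_is_bright = border.mean() >= 0.5
    text = ~bright if background_is_bright else bright
    return np.where(text, 0, 255).astype(np.uint8)
```

Calling it on an image and on its inverse (`255 - img`) should give the same black-on-white result, since only the border heuristic decides which class is treated as text.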

Camera-captured images inherently exhibit perspective distortion, which can impair the performance of conventional OCR systems. We have used the horizontal and vertical vanishing points to obtain a fronto-parallel view of the document.
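In homogeneous coordinates, both the line through two points and the intersection of two lines are cross products, which gives a compact way to sketch the vanishing-point idea. The sketch below assumes two pairs of converging document edges (for example text baselines and margins) have already been detected; it intersects each pair to get the horizontal and vertical vanishing points, joins them into the vanishing line, and builds the standard homography that maps that line back to infinity. This removes only the projective part of the distortion, leaving the page correct up to an affine transform; a full fronto-parallel view additionally needs a metric step, such as enforcing a right angle between the two directions, which is not shown. The function names are hypothetical, and this is not necessarily the exact procedure used here.

```python
import numpy as np


def line_through(p, q):
    """Homogeneous line through image points p = (x, y) and q = (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])


def vanishing_point(edge_a, edge_b):
    """Intersection of two imaged edges, each given as a pair of points."""
    return np.cross(line_through(*edge_a), line_through(*edge_b))


def affine_rectifying_homography(v_h, v_v):
    """Homography mapping the vanishing line through v_h and v_v to the line at infinity."""
    l = np.cross(v_h, v_v)
    l = l / l[2]  # assumes the vanishing line does not pass through the image origin
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [l[0], l[1], l[2]]])


def apply_homography(H, pts):
    """Map an (N, 2) array of points through H and dehomogenize."""
    pts_h = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]
```

As a check, warping the corners of a rectangle with a purely projective homography of the form `[[1, 0, 0], [0, 1, 0], [a, b, 1]]` and feeding the warped edges back in recovers the original corners, because in that special case the rectifying homography is exactly the inverse warp.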

Document image analysis often requires mosaicing since it is not possible to capture a large document at a reasonable resolution in a single exposure. Such a document is captured in parts and a mosaicing technique is used to stitch them into a single image. We investigate a feature-based approach for automated mosaicing of camera-captured document images.
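Feature-based mosaicing typically estimates a planar homography between overlapping captures from matched keypoints, using a robust estimator to reject wrong matches. The sketch below assumes the matches are already available as two (N, 2) arrays (keypoint detection and descriptor matching, for example with SIFT or ORB, are not shown) and fits the homography with the direct linear transform (DLT) inside a basic RANSAC loop. The iteration count, inlier threshold, and function names are hypothetical choices; coordinate normalization is omitted for brevity, and warping and blending the parts into one image are left out. It is not presented as the approach investigated here.

```python
import numpy as np


def dlt_homography(src, dst):
    """Estimate H (up to scale) with dst ~ H @ src from >= 4 correspondences via the DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value


def project(H, pts):
    """Map (N, 2) points through H and dehomogenize."""
    pts_h = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]


def ransac_homography(src, dst, iters=500, thresh=2.0, seed=0):
    """Return (H normalized so H[2, 2] == 1, boolean inlier mask)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    with np.errstate(divide="ignore", invalid="ignore"):  # degenerate samples may give inf/nan
        for _ in range(iters):
            idx = rng.choice(len(src), size=4, replace=False)
            H = dlt_homography(src[idx], dst[idx])
            err = np.linalg.norm(project(H, src) - dst, axis=1)
            inliers = err < thresh  # nan errors compare as False
            if inliers.sum() > best.sum():
                best = inliers
    if best.sum() < 4:
        raise ValueError("too few inliers to fit a homography")
    H = dlt_homography(src[best], dst[best])  # refit on all inliers
    return H / H[2, 2], best
```

The returned `H` maps points in one capture into the frame of the other, after which one part can be warped onto the other's canvas and the overlap blended.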


Camera-captured images can also come from non-paper sources: text on 3-D real-world objects such as buildings, billboards, road signs, license plates, or even T-shirts, which would normally be inaccessible to a scanner. Unlike conventional document image processing, scene text understanding normally involves a preprocessing step that locates and extracts text regions before the image is passed on for character recognition. Recognition is then performed only on the detected text regions, which mitigates the effects of background complexity. Because such scene text is the primary target of camera-based document analysis systems, there is a significant need for methods that extract and recognize it.
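A common first step for locating candidate text regions is to take a binary mask (for instance the output of a binarization step, with text pixels set to True) and keep only the connected components whose size and shape are plausible for characters. The sketch below implements 4-connected component labelling with a breadth-first search and then applies geometric filters. The thresholds (`min_h`, `max_h_frac`, `max_aspect`, `min_fill`) and function names are hypothetical values for illustration, not parameters from this work, and practical scene-text detectors combine such rules with color, stroke-width, or learned features.

```python
from collections import deque

import numpy as np


def connected_components(mask):
    """4-connected components of a boolean mask as ((top, left, bottom, right), area)."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    comps = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            queue = deque([(r, c)])
            seen[r, c] = True
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            comps.append(((top, left, bottom, right), area))
    return comps


def text_candidates(mask, min_h=5, max_h_frac=0.5, max_aspect=5.0, min_fill=0.1):
    """Keep components whose size and shape are plausible for characters (hypothetical thresholds)."""
    img_h = mask.shape[0]
    keep = []
    for (top, left, bottom, right), area in connected_components(mask):
        bh, bw = bottom - top + 1, right - left + 1
        aspect = max(bh, bw) / min(bh, bw)   # elongation of the bounding box
        fill = area / (bh * bw)              # fraction of the box covered by the component
        if min_h <= bh <= max_h_frac * img_h and aspect <= max_aspect and fill >= min_fill:
            keep.append((top, left, bottom, right))
    return keep
```

The surviving boxes could then be grouped into words or lines and passed, on their own, to the recognizer.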


Conventional OCR systems cannot handle document images that contain multi-oriented text lines. Just as skew detection and correction are standard steps in conventional OCR, aligning curvilinear text into rectilinear form is an indispensable preprocessing step in the analysis of newer document types that contain multi-oriented text.
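For a single straight but skewed line, the dominant orientation can be estimated from the principal axis of its foreground pixel coordinates, and the line can then be brought to horizontal by rotating by the negative of that angle. The sketch below assumes the pixels of one text line are already isolated in a boolean mask; a curvilinear line would need this applied piecewise along its length, which is not shown. It is a generic PCA-based illustration, not the alignment method used here, and the function names are hypothetical.

```python
import numpy as np


def _foreground_points(mask):
    """(x, y) coordinates of True pixels, with y flipped so angles are counter-clockwise."""
    ys, xs = np.nonzero(mask)
    return np.column_stack([xs, -ys]).astype(float)


def line_orientation(mask):
    """Dominant orientation of the foreground, in degrees within (-90, 90]."""
    pts = _foreground_points(mask)
    pts -= pts.mean(axis=0)
    cov = pts.T @ pts / len(pts)
    evals, evecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    major = evecs[:, np.argmax(evals)]            # principal (long) axis of the line
    angle = np.degrees(np.arctan2(major[1], major[0]))
    if angle <= -90.0:                            # an axis is only defined modulo 180 degrees
        angle += 180.0
    elif angle > 90.0:
        angle -= 180.0
    return float(angle)


def deskew_coordinates(mask, angle_deg):
    """Rotate foreground coordinates by -angle_deg so the line becomes horizontal."""
    theta = np.radians(-angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    return _foreground_points(mask) @ rot.T
```

The sketch returns rotated coordinates rather than a resampled image; producing a deskewed image would apply the same rotation with interpolation.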

2010 Medical Intelligence and Language Engineering Lab - IISc Campus, Bangalore.