Medical Intelligence and Language Engineering Lab

Optical Character Recognition

Optical Character Recognition (OCR) is the machine recognition of characters in a document image obtained by scanning printed text on paper. Our lab has developed a complete working OCR system for monolingual Tamil and Kannada script and for bilingual (Tamil + Roman) script.

To obtain a copy of the software, contact:
Prof. A.G.Ramakrishnan,
MILE Lab, Dept. of EE
IISc, Bangalore, India - 560 012
Ph: +91-80-22932556


User instructions and FAQs for Thamizh Padi are here. The product is designed to run on the Windows platform. The current overall recognition rate is around 95%.
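As an illustration of what a figure such as 95% can mean, the sketch below computes a character-level recognition rate by comparing OCR output against a ground-truth transcription. This page does not state how the reported rate was measured, so the metric (one minus the edit distance divided by the ground-truth length) and the function names `edit_distance` and `recognition_rate` are assumptions made for this example only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def recognition_rate(truth: str, ocr: str) -> float:
    """Hypothetical metric: fraction of ground-truth characters recovered."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return max(0.0, 1.0 - edit_distance(truth, ocr) / len(truth))


if __name__ == "__main__":
    truth = "a" * 20
    ocr = "a" * 19 + "b"  # one substituted character out of twenty
    print(f"{recognition_rate(truth, ocr):.2%}")
```

Note that this sketch counts Unicode code points. For Tamil or Kannada text, where one visible character may span several code points, an evaluation would more likely compare whole character clusters.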

Features of our System

  • Precise skew correction method
  • Noise-tolerant binarization scheme
  • Hierarchical tree-structured classifier
  • Recognition rate of approximately 98% on good quality documents
  • Output in RTF file format for easy viewing
  • Bold and italicized characters in the original document are represented appropriately in the RTF output
  • Support for multiple font sizes and styles
  • Built-in scanner interface for acquiring the document image directly from the scanner
  • Uses TAB code for output
  • Evaluated by Standardisation Testing and Quality Certification (STQC), New Delhi
  • FIFO Software Technologies, Salem, is using the software for digitizing books
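To illustrate the RTF output feature listed above, here is a minimal sketch of how recognized text runs carrying bold and italic flags could be serialized to RTF. It is not the lab's implementation: the `Run` type, the function names, and the default font are hypothetical, and the sketch writes non-ASCII characters as RTF Unicode escapes (`\uN?`) rather than in TAB code, because the TAB mapping is not given on this page.

```python
from typing import Iterable, NamedTuple


class Run(NamedTuple):
    """A stretch of recognized text with uniform styling (hypothetical type)."""
    text: str
    bold: bool = False
    italic: bool = False


def rtf_escape(text: str) -> str:
    """Escape RTF special characters and encode non-ASCII as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\par ")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # RTF expects signed 16-bit UTF-16 code units, each followed
            # by a fallback character for readers without Unicode support.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


def runs_to_rtf(runs: Iterable[Run], font: str = "Arial", half_points: int = 24) -> str:
    """Build a small RTF document; the font name is an assumed placeholder."""
    parts = ["{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 " + font + ";}}",
             f"\\f0\\fs{half_points} "]
    for run in runs:
        body = rtf_escape(run.text)
        if run.bold:
            body = "\\b " + body + "\\b0 "
        if run.italic:
            body = "\\i " + body + "\\i0 "
        parts.append(body)
    parts.append("}")
    return "".join(parts)


if __name__ == "__main__":
    doc = runs_to_rtf([Run("Tamil: "), Run("அ", bold=True), Run(" {x}", italic=True)])
    print(doc)
```

Keeping style as a per-run flag means the recognizer only has to label each word or character group as bold or italic, and the writer decides how to express that in the output format.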

© 2010 Medical Intelligence and Language Engineering Lab - IISc Campus, Bangalore.