Optical Character Recognition
Optical Character Recognition (OCR) is the machine recognition of characters in a document image obtained by scanning printed text on paper. A complete working Optical Character Recognizer for monolingual Tamil and Kannada scripts and for bilingual (Tamil+Roman) script has been developed at our lab.
To obtain a copy of the software, contact:
MILE Lab, Dept. of EE
IISc, Bangalore, India - 560 012
User Instructions & FAQs for Thamizh Padi are here.
The product is designed to run on the Windows platform. The current overall recognition rate is around 95%.
Features of our System
- Precise skew correction method
- Noise-tolerant binarization scheme
- Hierarchical tree-structured classifier
- Recognition rate of approximately 98% on good-quality documents
- Output in RTF file format for easy viewing
- Bold and italicized characters in the original document are represented appropriately in the RTF output
- Support for multiple font sizes and styles
- Built-in scanner interface for acquiring the document image directly from the scanner
- Uses TAB code for output
- Evaluated by Standardisation Testing and Quality Certification (STQC), New Delhi
- FIFO Software Technologies, Salem is using the software for digitizing books
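
To make the binarization step in the feature list concrete, here is a minimal sketch of one common way to separate ink from paper in a scanned grey-level image. It uses Otsu's global threshold, which is a standard textbook technique and an assumption made only for illustration; the page does not say which binarization method the system uses, and Otsu's method alone is not the noise-tolerant scheme mentioned above. The function names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return Otsu's threshold for an iterable of 8-bit grey levels (0-255).

    Illustrative only: the actual binarization scheme is not described
    in the source, so Otsu's method is an assumed stand-in.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate threshold
    sum_dark = 0  # sum of grey levels at or below the candidate threshold
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue              # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break                 # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor of 1/total^2).
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold=None):
    """Map a list of rows of grey levels to rows of 0 (ink) and 1 (paper)."""
    if threshold is None:
        threshold = otsu_threshold(p for row in image for p in row)
    return [[0 if p <= threshold else 1 for p in row] for row in image]


# Tiny synthetic "scan": dark strokes (20-30) on bright paper (200-220).
page = [[20, 30, 200],
        [210, 25, 220]]
print(binarize(page))
```

A global threshold like this works when the page is evenly lit; degraded or noisy scans generally call for locally adaptive thresholds or noise filtering before thresholding.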