Document Analysis and Recognition
Traditional document image analysis systems use flatbed scanners to image hard-copy paper manuscripts. In recent years, however, the term "document" has become hard to define, because the distinction between documents and user interfaces has blurred. A document is no longer confined to scanned pages. Camera-captured images can contain text from non-paper sources, such as text on 3-D real-world objects. Pen-based and touch-screen devices such as Tablet PCs can also be used to acquire handwritten data: a sensor picks up the movement of the pen tip across the writing surface, and the resulting curves are interpreted and translated into digital text.
A complete working model of printed-text OCR for monolingual Tamil and Kannada documents and for bilingual (Tamil + Roman) script has been developed at our lab.
We are also developing an online handwriting recognition system for regional Indian languages such as Tamil and Kannada.
Specialized pre-processing techniques have also been developed for handling camera-captured images with emphasis on multi-script documents.