Online Handwriting Recognition
On-line handwriting recognition involves the automatic conversion of text as it is written on a special digitizer or PDA, where a sensor picks up the pen-tip movements X(t),Y(t) as well as pen-up/pen-down switching. That kind of data is known as digital ink and can be regarded as a dynamic representation of handwriting. The obtained signal is converted into letter codes which are usable within computer and text-processing applications.
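As a rough illustration of what digital ink looks like in software, the sketch below stores each sensor reading as a timestamped point with a pen-down flag and groups consecutive pen-down readings into strokes. The names `PenSample` and `split_into_strokes` and the sample values are hypothetical, chosen only for this example; they are not taken from any particular digitizer API.

```python
from typing import List, NamedTuple, Tuple


class PenSample(NamedTuple):
    """One sensor reading (hypothetical layout, for illustration only)."""
    t: float         # time of the reading, in seconds
    x: float         # pen-tip X(t)
    y: float         # pen-tip Y(t)
    pen_down: bool   # True while the pen touches the surface


def split_into_strokes(samples: List[PenSample]) -> List[List[Tuple[float, float]]]:
    """Group consecutive pen-down samples into strokes of (x, y) points."""
    strokes: List[List[Tuple[float, float]]] = []
    current: List[Tuple[float, float]] = []
    for s in samples:
        if s.pen_down:
            current.append((s.x, s.y))
        elif current:
            # A pen-up reading closes the stroke in progress.
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes


# Made-up readings: two strokes separated by one pen-up sample.
samples = [
    PenSample(0.00, 1.0, 1.0, True),
    PenSample(0.01, 1.5, 2.0, True),
    PenSample(0.02, 2.0, 2.0, False),
    PenSample(0.03, 3.0, 1.0, True),
    PenSample(0.04, 3.5, 1.5, True),
]
strokes = split_into_strokes(samples)
print(len(strokes), "strokes")
```

A recognizer would typically work on such strokes rather than on the raw sample stream, since pen-up events mark natural boundaries in the writing.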
The elements of an on-line handwriting recognition interface typically include:
- A pen or stylus for the user to write with.
- A touch-sensitive surface, which may be integrated with, or adjacent to, an output display.
- A software application which interprets the movements of the stylus across the writing surface, translating the resulting curves into digital text.
A modern handwriting recognition system can be seen in Windows XP Tablet PC Edition, Microsoft's version of the Windows XP operating system for Tablet PCs.
Our lab is developing an online handwriting recognition system for regional Indian languages such as Tamil and Kannada.
The OHWR Consortium, funded by TDIL, DIT, is a collaborative research project on online handwritten character recognition in multiple Indian languages. The consortium comprises 4 academic institutes (IISc, IIT Madras, IIIT Hyderabad, ISI Kolkata) and 3 application partners (CDAC Pune, Learnfun Systems, CK Technologies). The academic partners investigate the various issues involved in recognizing online handwritten text. The project has completed its first phase, and the engines for 6 languages have been integrated into 3 form-filling applications developed by the application partners: Census data collection, Report of Motor Vehicle Inspector on a Vehicle Involved in an Accident, and Application for Encumbrance Certificate.
MILE Laboratory develops state-of-the-art technologies for recognizing text written in Tamil and Kannada. Our contributions to the first phase are summarized below:
- A comprehensive set of symbols has been proposed to represent all possible aksharas (simple and compound characters) in Tamil and Kannada. A minimal set of words has been generated in each language to cover these symbols.
- 100,000 word samples have been collected from over 1000 native writers in Tamil Nadu and Karnataka. As many as 10 educational institutions participated in the data collection process.
- A common XML standard has been proposed for annotation. The quality of the handwritten data has been graded into 5 classes (A, B, C, D and R), with A the highest and R the lowest. Figure 1 below illustrates the annotation tool being used to annotate a Kannada handwritten word.
- The Statistical Dynamic Time Warping classifier has been implemented to recognize the symbols in a given word.
- The recognition engines currently achieve an average accuracy of 88% on good data (class A).
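To give a feel for the time-warping idea behind the classifier mentioned in the list above, here is a minimal sketch of plain dynamic time warping (DTW) with nearest-template classification. It is not the Statistical DTW classifier used in the project: the statistical variant is not reproduced here, and the function names, symbol labels and template points are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Plain DTW cost between two pen trajectories (Euclidean local cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays on the same point
                cost[i][j - 1],      # b advances, a stays on the same point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(sample: Sequence[Point], templates: Dict[str, List[Point]]) -> str:
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))


# Hypothetical templates; real ones would come from annotated training data.
templates = {
    "line": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    "hook": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
}
sample = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (2.0, 2.0)]
print(classify(sample, templates))
```

Because the warping path may repeat points of either sequence, the same symbol written faster or slower, and so sampled with more or fewer points, can still align closely with its template.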
Our goals in the second phase of the consortium project include:
- Improving the symbol-level recognition accuracy to 95% for class A data by using state-of-the-art techniques such as verification/feedback strategies, classifier fusion/combination, post-processing (disambiguation of confused symbols) and statistical language modeling.
- Feeding the output of the word recognizer to our text-to-speech engine and testing the usefulness of the device for patients who have undergone laryngectomy.
- Working on writer adaptation and recognition at the sentence level.
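Classifier fusion, listed among the second-phase goals above, can take many forms. The sketch below shows one of the simplest, score averaging (the sum rule), purely as an illustration. The function names, symbol labels and scores are hypothetical, and nothing here should be read as the fusion scheme the project actually adopts.

```python
from typing import Dict, List


def fuse_scores(score_sets: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-symbol scores of several classifiers (sum rule).

    A symbol that a classifier does not score is treated as 0.0 for it.
    """
    labels = set().union(*score_sets)
    return {
        label: sum(scores.get(label, 0.0) for scores in score_sets) / len(score_sets)
        for label in labels
    }


def best_label(fused: Dict[str, float]) -> str:
    """Pick the symbol with the highest fused score."""
    return max(fused, key=fused.get)


# Made-up scores from two hypothetical classifiers for one written symbol.
classifier_a = {"ka": 0.6, "ga": 0.4}
classifier_b = {"ka": 0.3, "ga": 0.7}

fused = fuse_scores([classifier_a, classifier_b])
print(best_label(fused))
```

In this toy case the first classifier alone would choose "ka", but the second one's stronger preference for "ga" wins after averaging, which is the kind of disagreement that fusion is meant to resolve.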