Medical Intelligence and Language Engineering Lab


MAST: Multi-Script Annotation toolkit for Scene Text

MAST is a semi-automatic tool for annotating multi-script text in camera-captured scene images. The toolkit can generate ground truth in most Indian scripts for any generic image, at the word level or at the character/symbol level, depending on the user's requirement. The ground-truth text regions are represented at the pixel level. Presently, the tool can tag 10 Indic scripts, namely Bangla, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Manipuri, Oriya, Tamil and Telugu, besides English. The Devanagari virtual keyboard can be used to tag words in languages such as Hindi, Marathi, Konkani and Sanskrit, since all of them use the same script. Likewise, the Bangla virtual keyboard can also be used to tag Assamese and Manipuri text. Interfaces for other scripts can be created easily.
We hope that researchers worldwide will find it useful for creating ground truth for any generic document image database. By downloading and using the toolkit, you agree to acknowledge its source and to cite the paper given below in related publications. The MAST toolkit can be downloaded here.
For citation, please refer to the paper: T. Kasar, D. Kumar, M.N. Anil Prasad, D. Girish and A.G. Ramakrishnan, “MAST: Multi-Script Annotation toolkit for Scenic Text,” Proc. Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data (J-MOCR-AND), Beijing, China, Sept. 17, 2011.


USAGE: The MAST folder contains a set of MATLAB m-files and a folder named 'script_image', which holds all the virtual keyboards required for annotating specific scripts. In the MATLAB command window, change to the MAST directory and run the command 'mile_mast_gt'. A GUI will appear as shown. It has a set of menu items, and at the top it displays 'WORD LEVEL', indicating that the UI is used for word annotation.

Use the 'LOAD' button to load the image to be tagged. The MAST folder contains an example image named 'DSC02617.JPG', which is also used here to explain the use of the tool. Once the image is loaded, the GUI appears as shown in the adjacent figure. Loading may take some time, because the RGB image is converted to the LAB color space.
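The RGB-to-LAB conversion done at load time can be sketched as follows. This is a minimal illustration, not MAST's code: it assumes 8-bit sRGB input and the CIE L*a*b* definition with a D65 white point; the function names are hypothetical.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (assumed D65 white point)."""
    def linearize(c):
        # Undo the sRGB gamma curve; c is in [0, 1].
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = (linearize(v / 255.0) for v in (r, g, b))
    # Linear sRGB -> CIE XYZ (D65 primaries).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        # Cube root above the small-value threshold, linear segment below it.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def image_to_lab(rgb_image):
    """Convert a 2-D list of (r, g, b) tuples into a 2-D list of (L, a, b) tuples."""
    return [[srgb_to_lab(*px) for px in row] for row in rgb_image]
```

Every pixel goes through two nonlinear steps, which is why this conversion is noticeable on a full-size camera image. A LAB representation is a common choice for color-distance tests because Euclidean distances in it roughly track perceived color differences.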

Zoom into any word using the 'Zoom' button in the horizontal menu bar at the top of the GUI. This lets the user zoom onto a particular word in the image. The example image contains six words, two from each of three scripts. The zoomed image of one of the words is shown. Use the 'SEGMENT WORD' button to place seeds within the text strokes using the mouse. Each left click adds a seed for region growing. A right click stops seeding and starts the region-growing process around the seeds. The result of region growing is displayed at the same size as the zoomed image. A thin gray boundary showing the Canny edges is also overlaid to help the user judge the quality of the segmented word.

If the output of region growing is not satisfactory, use the slider bar to vary the threshold. The default threshold value is 25. Changing the threshold does not change the seeds. Use the 'RESEGMENT' button after modifying the threshold value. If the word is still not segmented properly after varying the threshold, use the 'RELOAD' button to reload the original image and repeat the seeding with the 'SEGMENT WORD' button; the old seeds are removed and new seeds are selected.
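The seeded region growing and threshold steps above can be sketched as below. This is a minimal illustration under stated assumptions, not MAST's implementation: the growing criterion (Euclidean color distance to the mean seed color), 4-connectivity, and the name `region_grow` are all hypothetical; only the default threshold of 25 comes from the text.

```python
from collections import deque
from math import dist


def region_grow(image, seeds, threshold=25.0):
    """Grow a binary mask from seed pixels (hypothetical helper).

    image: 2-D list of color tuples (e.g. L*a*b* triples), indexed image[y][x].
    seeds: list of (y, x) positions placed inside the text strokes.
    threshold: maximum color distance from the mean seed color (assumed rule).
    """
    h, w = len(image), len(image[0])
    channels = len(image[0][0])
    # Reference color: mean of the seed pixels.
    ref = tuple(sum(image[y][x][c] for y, x in seeds) / len(seeds)
                for c in range(channels))
    mask = [[False] * w for _ in range(h)]
    queue = deque()
    for y, x in seeds:
        if not mask[y][x]:
            mask[y][x] = True
            queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        # Examine the 4-connected neighbours of the current pixel.
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]:
                if dist(image[ny][nx], ref) <= threshold:
                    mask[ny][nx] = True
                    queue.append((ny, nx))
    return mask


# Toy 3x4 image: dark "text" pixels (L* near 20) on a light background (L* = 90).
BG = (90, 0, 0)
img = [
    [BG, (20, 0, 0), (22, 0, 0), BG],
    [BG, (21, 0, 0), BG, BG],
    [BG, BG, BG, (24, 0, 0)],
]
mask = region_grow(img, [(0, 1)])                  # default threshold 25
loose = region_grow(img, [(0, 1)], threshold=100)  # same seeds, new threshold
```

Calling the function again with the same seeds and a different threshold mirrors what 'RESEGMENT' is described as doing. Note that the dark pixel at the bottom right is not picked up at the default threshold, because region growing only reaches pixels connected to a seed.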
Use the 'BINARIZE' button if the strokes of the word are too thin for seeds to be placed properly within them. An 'INVERT' option is also provided to invert the result in the case of reversed text polarity. Binarization is carried out using Otsu's method and should be used only when the background is clear and distinct. Note: binarization may yield stray pixels that go unnoticed and corrupt the word annotation. Such stray pixels can be deleted using the 'DELETE PATCH' button.
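Otsu's method picks the gray level that maximizes the between-class variance of the two resulting pixel classes. The sketch below illustrates it on 8-bit gray levels. The function names, and the assumption that text is darker than the background unless `invert` is set, are illustrative and not taken from MAST.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold t for 8-bit gray levels; classes are <= t and > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    sum_bg, weight_bg = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]            # number of pixels with level <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg   # number of pixels with level > t
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, invert=False):
    """Binarize a 2-D list of gray levels; True marks assumed text pixels."""
    t = otsu_threshold(p for row in gray for p in row)
    # Default assumes dark text on a light background (text <= t);
    # invert=True handles reversed polarity (light text on a dark background).
    return [[(p > t) if invert else (p <= t) for p in row] for row in gray]


gray = [[10, 10, 200], [200, 10, 200]]
text_mask = binarize(gray)
inverted = binarize(gray, invert=True)
```

Because the threshold is global, it works well only when text and background form two clearly separated intensity groups. This matches the advice above to binarize only when the background is clear and distinct.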

After the word is properly segmented, use the 'SELECT SCRIPT' drop-down menu to choose the script in which the word is to be annotated. Presently, the drop-down list has 11 scripts. For 'ENGLISH', the annotation is entered through the keyboard. For the other ten scripts (Bangla, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Manipuri, Oriya, Tamil and Telugu), annotation is done with virtual keyboards and the mouse. If any of these ten scripts is chosen, a virtual keyboard pops up in front of the main GUI. The user can maximize the virtual keyboard and select symbols by left-clicking them. A right click ends the entry and returns the annotated word to the main GUI. The figure shows one such virtual keyboard for tagging Kannada script.

The tagged characters/symbols are displayed after the right click. This image shows the selected symbols for the example word. Click the 'OK' button to return. The selected symbols are converted into Unicode before being saved in the text file (MATLAB supports Unicode in strings). If there is any mistake in tagging the word, the 'UNDO' button can be used.
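The following sketch shows how a sequence of virtual-keyboard clicks can become a Unicode string. The key IDs and the `KANNADA_KEYS` table are hypothetical, since MAST's real keyboard layouts live in its 'script_image' folder. Writing the result as UTF-8 is also an assumption. The code points themselves are standard Unicode Kannada characters.

```python
import unicodedata

# Hypothetical mapping from virtual-keyboard key IDs to Unicode code points.
KANNADA_KEYS = {
    "ka": "\u0C95",      # KANNADA LETTER KA
    "ta": "\u0CA4",      # KANNADA LETTER TA
    "vs_i": "\u0CBF",    # KANNADA VOWEL SIGN I
    "virama": "\u0CCD",  # KANNADA SIGN VIRAMA
}


def symbols_to_unicode(selected, keymap):
    """Concatenate the code points of the clicked keys, in click order."""
    return "".join(keymap[key] for key in selected)


clicks = ["ka", "vs_i", "ta"]
clicks.pop()                       # an 'UNDO'-style removal of the last symbol
word = symbols_to_unicode(clicks, KANNADA_KEYS)
print(word, [unicodedata.name(ch) for ch in word])
print(word.encode("utf-8"))        # bytes as they would appear in a UTF-8 text file
```

A consonant followed by a dependent vowel sign is stored as two code points, which the renderer combines into a single glyph. This is why the annotation is kept as a code-point sequence rather than as one "character" per glyph.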

The annotated word is displayed at the corresponding position in the figure to indicate that it has been tagged. Annotation of another word, in Devanagari script, is shown below. For 'ENGLISH', a keyboard input box is displayed in front of the main GUI.

The adjacent figure shows all the annotated words. Now use the 'SAVE' command to save the annotation. Annotation produces the following three outputs:
  • A pixel-accurate segmented image named 'imagename_seg_full.png'
  • A text file named 'imagename.txt'
  • A folder named 'imagename', containing the individual segmented word images.
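The naming convention above can be reproduced when scripting over MAST outputs, as in the following sketch. It assumes the outputs are written next to the input image; that location and the helper name `output_paths` are illustrative assumptions.

```python
from pathlib import Path


def output_paths(image_path):
    """Derive the three MAST output names from an input image path (hypothetical helper)."""
    img = Path(image_path)
    stem = img.stem  # 'DSC02617' for 'DSC02617.JPG'
    return {
        "segmented": img.with_name(stem + "_seg_full.png"),
        "text": img.with_name(stem + ".txt"),
        "word_folder": img.with_name(stem),
    }


paths = output_paths("data/DSC02617.JPG")
```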


For the example image 'DSC02617.JPG', a pixel-accurate segmented image named 'DSC02617_seg_full.png' is created. A folder is shown containing the word images, listed in the order in which they were tagged. A text file named 'DSC02617.txt' is also created with the following information: the first row gives the number of words annotated in the image, and each subsequent row gives the word image name, position y, position x, height h, width w, script name and ground-truth Unicode text.
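A reader for the text-file layout above might look like the sketch below. The field order follows the description, but several details are assumptions: fields are whitespace-separated, positions and sizes are integers, and the ground-truth text is the last field. The sample rows and word image names are made up for illustration. They are not the actual contents of 'DSC02617.txt'.

```python
from dataclasses import dataclass


@dataclass
class WordRecord:
    image_name: str
    y: int
    x: int
    h: int
    w: int
    script: str
    text: str


def parse_annotation(lines):
    """Parse MAST-style word ground truth (assumed whitespace-separated fields)."""
    rows = [ln for ln in lines if ln.strip()]
    count = int(rows[0])  # first row: number of annotated words
    records = []
    for ln in rows[1:1 + count]:
        # maxsplit=6 keeps the ground-truth text intact as the last field.
        name, y, x, h, w, script, text = ln.split(None, 6)
        records.append(WordRecord(name, int(y), int(x), int(h), int(w),
                                  script, text.strip()))
    if len(records) != count:
        raise ValueError(f"header says {count} words, found {len(records)}")
    return records


# Made-up sample in the assumed layout.
sample = ("2\n"
          "w1.png 120 45 30 110 Kannada \u0C95\u0CBF\n"
          "w2.png 160 50 28 90 English MILE\n")
records = parse_annotation(sample.splitlines())
```

For a real file, an open handle such as `open(path, encoding="utf-8")` can be passed directly, since the parser accepts any iterable of lines. The UTF-8 encoding is itself an assumption and should be checked against a file produced by the tool.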

Use the 'WORD BOUNDARY' button to cross-check whether the word boundaries have been tagged properly. The adjacent figure shows the result for an example image. Zoomed versions of a few words are displayed below.
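One way to relate a segmented word mask to the y, x, h, w values in the text file is to compute the mask's tight bounding box. The sketch below is written under that assumption. It also assumes (y, x) is the top-left corner, reported 1-based as MATLAB indexes pixels. The exact convention MAST uses is not stated, so both assumptions should be checked against real output.

```python
def bounding_box(mask, one_based=True):
    """Return (y, x, h, w) of the smallest box enclosing all True pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    ys = [r for r, _ in coords]
    xs = [c for _, c in coords]
    y0, x0 = min(ys), min(xs)
    h = max(ys) - y0 + 1
    w = max(xs) - x0 + 1
    offset = 1 if one_based else 0  # MATLAB-style 1-based pixel indices
    return (y0 + offset, x0 + offset, h, w)


word_mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, False, True, False],
]
box = bounding_box(word_mask)
box0 = bounding_box(word_mask, one_based=False)
```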

Use the 'SEGMENT CHAR' button to segment the word into its constituent characters/symbols. In this example, we describe symbol-level annotation. When 'SEGMENT CHAR' is clicked, a new symbol-level annotation user interface (UI) pops up as shown. The symbol-level annotation MATLAB script can also be called separately from the MATLAB command window as 'mile_mast_chgt'. When it is called from the main GUI, it chooses the current directory containing the word images and displays the first segmented word image. The symbol-level tagged image is saved in '.bmp' format.
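Splitting a word into symbols is commonly done by connected-component labeling. The text does not state which method 'SEGMENT CHAR' uses, but its mention of merged or broken characters is typical of this approach. The sketch below labels 8-connected foreground regions; the function name and the connectivity choice are assumptions.

```python
from collections import deque


def label_components(mask):
    """Label 8-connected foreground regions of a binary mask; returns (labels, count)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and labels[sy][sx] == 0:
                count += 1                       # start a new symbol
                labels[sy][sx] = count
                queue = deque([(sy, sx)])
                while queue:                     # flood-fill this component
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny][nx] and labels[ny][nx] == 0):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


# An 'i'-like shape (dot and stem separated by a gap) next to a vertical bar.
word = [
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
]
labels, count = label_components(word)
```

The dot and the stem of the 'i' come out as two separate components. This is exactly the kind of broken character that needs manual merging.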

A display shows the total number of words and the index of the word currently on the screen. The 'PREV WORD' and 'NEXT WORD' buttons move back and forth through the word image folder. Use the 'SAVE' button to save the symbol-level tagged word image in '.bmp' format. Another display indicates whether the word image has been tagged.
Some characters may be merged or broken. After enabling the 'POLY MASK' button, the user can specify the vertices of a polygon with the mouse to merge broken characters or split merged ones. A polygon can be drawn around one or more symbols using the left mouse button. For example, the letters 'i' and 'j', whose dots are separate from their stems, require merging, while cursive text may require splitting.
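A polygon-based merge/split can be modeled as "every labeled pixel inside the polygon becomes one new symbol". If the polygon encloses several pieces, they are merged. If it encloses only part of a glyph, that part is split off. This unified rule, the even-odd ray-casting inside test, and the helper names are assumptions for illustration, not MAST's documented behavior.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through py?
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def apply_poly_mask(labels, polygon):
    """Give every labeled pixel inside the polygon one fresh label (hypothetical)."""
    new_label = max(max(row) for row in labels) + 1
    out = [row[:] for row in labels]
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab and point_in_polygon(x, y, polygon):
                out[y][x] = new_label
    return out


# Labels 1 (dot) and 3 (stem) form an 'i'; label 2 is a separate symbol.
labels = [
    [0, 1, 0, 0],
    [0, 0, 0, 2],
    [0, 3, 0, 2],
    [0, 3, 0, 2],
]
# Rectangle around column 1, vertices given as (x, y) pixel coordinates.
merged = apply_poly_mask(labels, [(0.5, -0.5), (1.5, -0.5), (1.5, 3.5), (0.5, 3.5)])
```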

After every merge/split operation, the symbols are relabeled from left to right. However, this automatic ordering fails for vertically aligned or curved text. Using the 'ORDER SYMBOLS' button, the order in which the symbols are to be extracted for training can be assigned manually.
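Left-to-right relabeling, and a manual override in the spirit of 'ORDER SYMBOLS', can be sketched as follows. The sort key (each symbol's leftmost column, ties broken by the old label) is an assumption. It also shows why the automatic order is ambiguous for vertically stacked symbols, which share the same leftmost column. The function names are hypothetical.

```python
def relabel_left_to_right(labels):
    """Renumber symbols 1..n by the leftmost column each one occupies."""
    leftmost = {}
    for row in labels:
        for x, lab in enumerate(row):
            if lab and (lab not in leftmost or x < leftmost[lab]):
                leftmost[lab] = x
    # Ties (e.g. vertically stacked symbols) fall back to the old label number.
    order = sorted(leftmost, key=lambda lab: (leftmost[lab], lab))
    mapping = {old: new for new, old in enumerate(order, start=1)}
    return [[mapping.get(lab, 0) for lab in row] for row in labels]


def reorder(labels, desired):
    """Manual ordering: desired[i] is the current label that should become i + 1."""
    present = {lab for row in labels for lab in row if lab}
    if set(desired) != present:
        raise ValueError("desired order must list every symbol exactly once")
    mapping = {old: new for new, old in enumerate(desired, start=1)}
    return [[mapping.get(lab, 0) for lab in row] for row in labels]


symbols = [[2, 0, 1], [2, 0, 1]]
ordered = relabel_left_to_right(symbols)
manual = reorder(ordered, [2, 1])
```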

Use the 'EXIT' button to exit the GUI.
© 2010 Medical Intelligence and Language Engineering Lab - IISc Campus, Bangalore.