TTS stands for text-to-speech synthesis. A TTS engine takes text as input and produces speech in a particular language. The quality of a TTS engine is judged by the quality of its synthetic speech. Broadly, there are two types of synthesis: formant-based (rule-based) synthesis and concatenative synthesis. The TTS engine developed in our department is based on the concatenative method and uses an efficient unit-selection algorithm. Earlier versions of the engine used phonemes, diphones and syllables as units, while the latest version uses polyphones.
A TTS engine for the Tamil language has been developed, and the synthesized speech is intelligible and natural. The database consists of 1027 phonetically rich Tamil sentences. Below are some Tamil sentences synthesized using this database. These sentences are not available in full in the database; instead, subunits are selected from the database using an efficient unit-selection algorithm, then concatenated, coupled and smoothed to produce natural-sounding synthetic speech.
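The select-concatenate-smooth pipeline described above can be illustrated with a minimal sketch. It is not the engine's actual algorithm, whose costs and units are not given here. It assumes a textbook formulation: each position in the utterance has several candidate units, a hypothetical `join_cost` measures the amplitude mismatch at a boundary, a Viterbi-style search picks the cheapest sequence, and a linear crossfade stands in for the smoothing step. All names and the toy sample values are illustrative assumptions.

```python
from typing import List, Sequence

Unit = List[float]  # a unit is a short list of waveform samples (toy data)


def join_cost(left: Unit, right: Unit) -> float:
    """Hypothetical join cost: amplitude mismatch at the unit boundary."""
    return abs(left[-1] - right[0])


def select_units(candidates: Sequence[Sequence[Unit]]) -> List[Unit]:
    """Pick one candidate per position so the summed join cost is minimal."""
    best = [0.0] * len(candidates[0])  # best cost of a path ending at each candidate
    back = []                          # back[k][i]: best predecessor of candidate i at position k + 1
    for pos in range(1, len(candidates)):
        new_best, pointers = [], []
        for cand in candidates[pos]:
            costs = [best[j] + join_cost(prev, cand)
                     for j, prev in enumerate(candidates[pos - 1])]
            j_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[j_min])
            pointers.append(j_min)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path back from the last position to the first.
    idx = min(range(len(best)), key=best.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [candidates[pos][i] for pos, i in enumerate(path)]


def crossfade_concat(units: Sequence[Unit], overlap: int) -> Unit:
    """Concatenate units, blending `overlap` samples linearly at each join."""
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises across the overlap
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])
    return out


# Toy candidates: three positions with two candidate units each.
candidates = [
    [[0.0, 0.2], [0.0, 0.9]],
    [[0.8, 0.5], [0.1, 0.4]],
    [[0.5, 0.0], [-0.5, 0.0]],
]
chosen = select_units(candidates)
waveform = crossfade_concat(chosen, overlap=1)
```

A real system would add a target cost (how well a candidate matches the required phonetic and prosodic context) and would compare spectral features rather than raw sample values at the joins.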
Visit http://mile.ee.iisc.ernet.in/tts to test our high-quality Kannada/Tamil TTS.
Natural Language Processing (NLP)
Estimation of pauses in TTS
The quality of a TTS system is measured by the intelligibility and naturalness of the synthesized speech. The right amount of pause inserted at the right place contributes significantly to the naturalness of speech, whereas a sentence without pauses, or with equal pauses between successive words, sounds robotic.
At MILE Lab, Department of Electrical Engineering, IISc, we have developed a pause model for the TTS system based on syntactic information. In the current version, part-of-speech (POS) information is used to find the right amount of pause in Tamil sentences. We have developed a simple rule-based POS tagger for this purpose, which does not use a root word dictionary. Nine different levels of pause are considered. The predicted pause levels are used by the DSP module during synthesis of the speech waveform. An evaluation based on the mean opinion score of 10 native speakers on 10 sentences gives encouraging results. The POS tagger achieves 78% accuracy. More context-sensitive rules are to be added to improve its accuracy.
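The idea of a dictionary-free POS tagger feeding a pause predictor can be sketched roughly as follows. This is only an illustration under stated assumptions, not the lab's rule set: the romanized suffix rules, the tag names, and the mapping from tag pairs to pause levels are all hypothetical placeholders. The only detail taken from the description above is that there are nine pause levels, represented here as 0 to 8.

```python
from typing import Dict, List, Tuple

# Hypothetical suffix rules on romanized words; checked in order, first match wins.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("kiraan", "VERB"),
    ("aan", "VERB"),
    ("ai", "NOUN"),
    ("il", "NOUN"),
    ("um", "CONJ"),
]

# Hypothetical pause levels (0 = no pause, 8 = longest) for (tag, next_tag) pairs.
PAUSE_LEVELS: Dict[Tuple[str, str], int] = {
    ("NOUN", "VERB"): 1,
    ("CONJ", "NOUN"): 4,
    ("VERB", "NOUN"): 5,
}
DEFAULT_LEVEL = 2        # assumed level for tag pairs without a rule
SENTENCE_END_LEVEL = 8   # assumed level after the final word


def tag_word(word: str) -> str:
    """Tag a word from its suffix alone, with no root word dictionary."""
    for suffix, tag in SUFFIX_RULES:
        if word.endswith(suffix):
            return tag
    return "UNK"


def predict_pauses(words: List[str]) -> List[Tuple[str, str, int]]:
    """Return (word, tag, pause level after the word) for each word."""
    tags = [tag_word(w) for w in words]
    result = []
    for i, (word, tag) in enumerate(zip(words, tags)):
        if i == len(words) - 1:
            level = SENTENCE_END_LEVEL
        else:
            level = PAUSE_LEVELS.get((tag, tags[i + 1]), DEFAULT_LEVEL)
        result.append((word, tag, level))
    return result


pauses = predict_pauses(["avanum", "pusthakathai", "padithaan"])
```

In a full system, the predicted levels would be converted to pause durations by the DSP module. Context-sensitive rules, such as the ones mentioned above as future work, would replace the simple first-match suffix lookup.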