Improvement of Tone Intelligibility for Average-Voice-Based Thai Speech Synthesis
- 1 Department of Electrical Engineering, Faculty of Engineering at Si Racha, Kasetsart University, 199 M.6, Tungsukhla, Si Racha, Chonburi, 20230, Thailand
- 2 School of Social and Environmental Development, National Institute of Development Administration, 118 M.3, Serithai Road, Klong-Chan, Bangkapi, Bangkok, 10240, Thailand
Abstract
Problem statement: Tone intelligibility in speech synthesis is an important attribute that should be taken into account. The tone correctness of the synthetic speech is degraded considerably in the average-voice-based HMM-based Thai speech synthesis. The tying mechanism in the decision tree based context clustering without appropriate criterion causes unexpected tone neutralization. Incorporation of the phrase intonation to the context clustering process in the training stage was proposed early. However, the tone correctness is not satisfied. Approach: This study proposes a number of tonal features including tone-geometrical features and phrase intonation features to be exploited in the context clustering process of HMM training stage. Results: In the experiments, subjective evaluations of both average voice and adapted voice in terms of the intelligibility of tone are conducted. Effects on decision trees of the extracted features are also evaluated. By considering gender in training speech, two core experiments were conducted. The first experiment shows that the proposed tonal features can improve the tone intelligibility for female speech model above that of male speech model, while the second experiment shows that the proposed tonal features improve the tone intelligibility for gender dependent model than for gender independent model. Conclusion: All of the experimental results confirm that the tone correctness of the synthesized speech from the average-voice-based HMM-based Thai speech synthesis is significantly improved when using most of the extracted features.
DOI: https://doi.org/10.3844/ajassp.2012.358.364
Copyright: © 2012 Suphattharachai Chomphan and Chutarat Chompunth. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- 3,415 Views
- 2,853 Downloads
- 0 Citations
Download
Keywords
- Thai speech
- speech synthesis
- tone intelligibility
- tone correctness
- generative model
- context clustering
- average voice
- hidden Markov models