Comparative Evaluation of Phone Duration Models for Greek Emotional Speech
Abstract
Problem statement: This study addresses phone duration modeling for Greek emotional speech synthesis. Approach: Several well-established machine learning techniques were applied to a Modern Greek emotional speech database covering five archetypal emotional categories: anger, fear, joy, neutral and sadness. The phone duration prediction models were built on phonetic, morphosyntactic and prosodic features that can be extracted from text alone. The techniques evaluated were model trees and regression trees, linear regression, lazy learning algorithms and meta-learning algorithms using regression trees as base classifiers. Results: Model trees based on the M5’ algorithm, and meta-learning algorithms using M5’-based regression trees as base classifiers, outperformed the other techniques. Conclusion: The emotional categories whose phone durations were most uniformly distributed yielded the most accurate models.
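As an illustration of the kind of comparison the abstract describes, the following minimal sketch trains a lazy learner (k-nearest neighbours) on text-derived features and scores it against a mean-duration baseline. Everything in it is an assumption made for illustration: the data are synthetic, the three features (`is_vowel`, `is_stressed`, `position`) are hypothetical stand-ins for the phonetic and prosodic features mentioned above, and RMSE is an assumed error metric. It is not the authors' implementation and does not reproduce their results.

```python
import math
import random

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def knn_predict(train_x, train_y, query, k=3):
    """Lazy learner: average the durations of the k nearest training phones."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k

def make_toy_data(n, seed):
    """Synthetic (features, duration in ms) pairs; NOT data from the study."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        is_vowel = rng.randint(0, 1)     # hypothetical phonetic feature
        is_stressed = rng.randint(0, 1)  # hypothetical prosodic feature
        position = rng.random()          # hypothetical relative position in the word
        duration = 60 + 30 * is_vowel + 20 * is_stressed + 10 * position + rng.gauss(0, 5)
        xs.append((is_vowel, is_stressed, position))
        ys.append(duration)
    return xs, ys

train_x, train_y = make_toy_data(200, seed=0)
test_x, test_y = make_toy_data(50, seed=1)

# Baseline: always predict the mean training duration.
mean_duration = sum(train_y) / len(train_y)
baseline_rmse = rmse([mean_duration] * len(test_y), test_y)

# Lazy learning: no model is fitted; each prediction searches the training set.
knn_rmse = rmse([knn_predict(train_x, train_y, x) for x in test_x], test_y)

print(f"baseline RMSE: {baseline_rmse:.2f} ms")
print(f"k-NN RMSE:     {knn_rmse:.2f} ms")
```

The same train-and-score loop could be repeated per emotional category to compare how accurately each category can be modeled, which is the kind of comparison the conclusion refers to.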
DOI: https://doi.org/10.3844/jcssp.2010.341.349
Copyright: © 2010 Alexandros Lazaridis, Vasiliki Bourna and Nikos Fakotakis. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Phone duration modeling
- Statistical modeling
- Emotional speech
- Text-to-speech synthesis