Xiaojun Zou, Xiao Bao and Lidong Luo, Key Laboratory of Machine Perception, Peking University
Present study in speech synthesis places more and more emphasis on the spectral continuities and diverse prosodic effects. The trainable HMM-based speech synthesis method tends to generate more continuous spectral structures than the traditional unit selection method. However, the F0 trajectory generated by HMM-based speech synthesis is often excessively smoothed and lacks prosodic variance. This paper proposed an approach to improve the effect of F0 trajectory prediction in mandarin speech synthesis in the framework of multi-space probability distribution HMMs (MSD-HMMs). In the proposed approach, the intonation, which is predicted by context-dependent decision trees, is integrated to the F0 trajectory generated by the MSD-HMMs as a weighted bias term. The experiments indicate that it has an encouraging improvement in the prosodic effectiveness of Mandarin speech synthesis.