Realization of Prosodic Focuses in Corpus-based Generation of Fundamental Frequency Contours of Japanese Based on the Generation Process Model

Keiko Ochi, University of Tokyo
Keikichi Hirose, University of Tokyo
Nobuaki Minematsu, University of Tokyo

A method was developed for generating sentence F0 contours of Japanese when a focus is placed on one of the “bunsetsu” of an utterance. The method controls F0 through the commands of the generation process model (F0 model) rather than predicting F0 frame by frame, as in HMM-based speech synthesis. It first predicts the differences in the F0 model commands between utterances with and without focus, and then applies those differences to the F0 model commands predicted beforehand by a baseline method that does not assign focus. The baseline method is trained on a large corpus, whereas the corpus used to train the command differences can be small and need not be uttered by the same speaker as the large corpus. The validity of the method was confirmed through experiments on F0 contour generation and speech synthesis, including interpolation and extrapolation of the F0 model commands to control the focus level.
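
The two-step idea of adding predicted command differences to baseline commands, with a scale factor for focus level control, can be illustrated by a minimal Python sketch. It is not the authors' implementation: the names `AccentCommand`, `apply_focus`, and `level` are hypothetical, the sketch assumes that only accent command amplitudes are modified, and clamping amplitudes at zero is an added assumption.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class AccentCommand:
    """Hypothetical container for one accent command of the F0 model."""
    onset: float      # command onset time in seconds
    offset: float     # command offset time in seconds
    amplitude: float  # command amplitude


def apply_focus(baseline: List[AccentCommand],
                amplitude_diffs: List[float],
                level: float = 1.0) -> List[AccentCommand]:
    """Add predicted amplitude differences, scaled by `level`, to baseline commands.

    level = 0 keeps the baseline, level = 1 applies the full predicted
    difference, values in between interpolate, and values above 1 extrapolate.
    """
    if len(baseline) != len(amplitude_diffs):
        raise ValueError("one predicted difference is needed per baseline command")
    focused = []
    for cmd, diff in zip(baseline, amplitude_diffs):
        # Assumption: a negative amplitude is not meaningful here, so clamp at zero.
        new_amplitude = max(0.0, cmd.amplitude + level * diff)
        focused.append(replace(cmd, amplitude=new_amplitude))
    return focused


if __name__ == "__main__":
    # Illustrative values only, not taken from any corpus.
    baseline = [AccentCommand(0.10, 0.40, 0.40), AccentCommand(0.50, 0.90, 0.30)]
    diffs = [0.20, -0.10]  # raise the focused phrase, lower the following one
    for level in (0.0, 0.5, 1.0, 1.5):
        amplitudes = [round(c.amplitude, 3) for c in apply_focus(baseline, diffs, level)]
        print(level, amplitudes)
```

In this sketch the difference predictor can be trained separately from the baseline predictor, which mirrors the abstract's point that the focus corpus may be small and come from a different speaker.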