Incorporation of Excitation Source and Duration Variations in Speech Synthesized at Different Speaking Rates

Sri Harish Reddy Mallidi, International Institute of Information Technology, Hyderabad
B Yegnanarayana, International Institute of Information Technology, Hyderabad

The effect of speaking rate on the excitation source is examined using instantaneous fundamental frequency ($F_{0}$) and perceived loudness ($\eta$). The instantaneous $F_{0}$ and $\eta$ seem to increase when going from normal to fast speech, whereas the changes are speaker-specific when going from normal to slow speech. A study of the durations of voiced, unvoiced and silence segments shows that the duration changes are not uniform when the speaking rate is varied. These observed variations in the excitation source and in durations are incorporated into the epoch-based duration modification method. Perceptual studies show that these variations are significant for the perception of speaking rate.