K Sreenivasa Rao, Ramu Reddy, Sudhamay Maity, Shashidhar Koolagudi, IIT
In this paper the dynamics of prosodic parameters are explored for recognizing the emotions from speech. The dynamics of prosodic parameters refer to local or fine variations in prosodic parameters with respect to time. The proposed dynamic features of prosody are represented by : (1) sequence of durations of syllables in the utterance (duration contour), (2) sequence of fundamental frequency values (pitch contour) and (3) sequence of frame energy values (energy contour). Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC) is used for analyzing the proposed prosodic features for recognizing the emotions . The emotions considered in this work are anger, disgust, fear, happiness neutral and sadness. Support vector machines (SVM) are explored to discriminate the emotions using the proposed prosodic features. Emotion recognition performance is analyzed separately, using duration patterns of the sequence of syllables, pitch contours and energy contours, and their recognition performance is observed to be 64\%, 67\% and 53\% respectively. Fusion techniques are explored at feature and score levels. The performance of the fusion-based emotion recognition systems is observed to be 69\% and 74\% for feature and score level fusions,respectively.