Helena Moniz, Fernando Batista, Hugo Meinedo, Alberto Abad, Isabel Trancoso, Ana Isabel Mata, Nuno Mamede, FLUL/CLUL
This work explores prosodic/acoustic cues for improving a baseline phone segmentation module. The baseline segmentation is provided by a large vocabulary continuous speech recognition system. An analysis of the baseline results revealed problems in word boundary detection, which we tried to solve with post-processing rules based on prosodic features (pitch, energy and duration). These rules improved the detection of inter-word pauses, the durations of previously detected silent pauses, and the durations of phones in the initial and final positions of sentence-like units. These improvements may be relevant not only for retraining acoustic models, but also for the automatic punctuation task. Both tasks were evaluated, and the results obtained with the more reliable boundaries are promising. This work allows us to tackle more challenging problems, combining prosodic and lexical features for the identification of sentence-like units.