Grégory Beller, IRCAM
This article presents Expresso, a complete system that applies a chosen expression, at a chosen degree of intensity, to a synthesized or recorded sentence through high-quality transformation of the speech signal. The transformation parameters depend on the context and are generated by a Bayesian network trained on a corpus of expressive speech examples. The article describes the overall system, the recorded expressive corpus, a new hierarchical prosodic model that includes the degree of articulation and the voice quality, the Bayesian network used to generate the transformation parameters, the speech processing algorithms, and an evaluation. The system is operational for sentences in French. It was created to meet the artistic needs of music composers, dubbing studios, and video production studios.