Interspeech 2012 Paper: Perceptual Foundations for Naturalistic Variability in the Prosody of Synthetic Speech

Nanette Veilleux, Jonathan Barnes, Alejna Brugos & Stefanie Shattuck-Hufnagel (2012) “Perceptual Foundations for Naturalistic Variability in the Prosody of Synthetic Speech,” Poster to be presented at Interspeech 2012, September 9-13, Portland, Oregon. [pdf]

This poster will be included in a session entitled “Speech Synthesis: Selected Topics,” on Thursday, September 13th.

Recent studies have shown that the Tonal Center of Gravity is a better classifier than F0 Turning Points for at least two contrastively timed pitch accents in American English intonation contours. Within this framework, a binary F0 weighting function derived from the F0 contour can be used instead of the natural F0 contour without a degradation in discrimination performance. This success has important implications for speech synthesis. Just as we can capture the functional equivalence of a multitude of auditorily distinct F0 contour shapes in terms of their mapping to a single parameter (the TCoG) via a set of binary weighting functions, this same mapping could be run in reverse as a source to generate natural-sounding variability in speech synthesis.
Index Terms: Tonal Center of Gravity, F0 alignment, pitch accent classification, prosody, speech synthesis
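To make the abstract's central idea concrete, here is a minimal sketch of a TCoG computation. It assumes the TCoG is the weighted mean of the sample times, with the F0 values (or a binary function derived from them) as weights; the abstract does not give the formula, so that definition is an assumption. The threshold rule in `binary_weights`, both function names, and the sample values are hypothetical illustrations, not the authors' method or data.

```python
def tonal_center_of_gravity(times, weights):
    """Weighted mean of time: sum(w_i * t_i) / sum(w_i).

    Assumed definition of the TCoG; the weights are either raw F0
    values or a binary function derived from them.
    """
    if len(times) != len(weights):
        raise ValueError("times and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero; TCoG is undefined")
    return sum(t * w for t, w in zip(times, weights)) / total


def binary_weights(f0, threshold):
    """Hypothetical binarization: 1.0 where F0 reaches the threshold, else 0.0."""
    return [1.0 if f >= threshold else 0.0 for f in f0]


# Made-up contour: time in seconds, F0 in Hz.
times = [0.0, 0.1, 0.2, 0.3, 0.4]
f0 = [100.0, 120.0, 180.0, 200.0, 110.0]

tcog_natural = tonal_center_of_gravity(times, f0)
tcog_binary = tonal_center_of_gravity(times, binary_weights(f0, threshold=150.0))
print(tcog_natural, tcog_binary)
```

In this toy contour both weightings place the center of gravity between the second and fourth samples. Many differently shaped contours can share one TCoG value, which is the many-to-one mapping the abstract proposes running in reverse to generate variability.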
