Collaboration on PoLaR annotation of intonation

Nanette Veilleux, Stefanie Shattuck-Hufnagel and Alejna Brugos are collaborating with Byron Ahn at Princeton on the ongoing development of PoLaR, a framework for systematically annotating prosody and its suprasegmental characteristics. “PoLaR” stands for Points, Levels, and Ranges: three of the core suprasegmental characteristics of intonation that are transcribed in the system.

A chief characteristic of PoLaR is that it aims to decompose prosodic labels into individual prosodic characteristics and cues to prosodic categories, rather than transcribing bundles of features (the prosodic analogue of phones) or cognitive categories (the analogue of phonemes). Each characteristic or cue is labelled on its own tier, unbundled from the others, and the labels are kept maximally transparent. To learn more about using PoLaR to label intonation, please visit polarlabels.com.
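To make the "one characteristic per tier" idea concrete, here is a minimal Python sketch of a tiered annotation store. The tier names simply echo the PoLaR acronym. The class names, the placeholder label strings, and the simplification that every label carries a single time stamp are all hypothetical; they are not PoLaR's actual labelling conventions or tools (see polarlabels.com for those).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Label:
    time_s: float  # time stamp in seconds (simplification: one point per label)
    text: str      # label content; the values used below are placeholders


@dataclass
class TieredAnnotation:
    """Hypothetical store: each prosodic characteristic lives on its own tier."""
    tiers: dict[str, list[Label]] = field(default_factory=dict)

    def add(self, tier: str, time_s: float, text: str) -> None:
        # Create the tier on first use and keep its labels in time order.
        self.tiers.setdefault(tier, []).append(Label(time_s, text))
        self.tiers[tier].sort(key=lambda lab: lab.time_s)

    def window(self, start_s: float, end_s: float) -> dict[str, list[Label]]:
        """Return, tier by tier, the labels whose time falls in [start_s, end_s]."""
        return {
            name: [lab for lab in labels if start_s <= lab.time_s <= end_s]
            for name, labels in self.tiers.items()
        }


ann = TieredAnnotation()
ann.add("Points", 0.42, "pt-A")     # placeholder label text
ann.add("Levels", 0.42, "lvl-A")    # placeholder label text
ann.add("Ranges", 0.10, "range-A")  # placeholder label text
print(ann.window(0.3, 0.5))
```

Because the characteristics are stored separately, a query can return the Points label at 0.42 s without dragging along information from the other tiers, which is the unbundling the paragraph describes.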

Speech Prosody 2016 Call for Papers

Have you seen the Speech Prosody 2016 Call for Papers? The deadline for submitting 4-page papers is November 15th, and the submission page is up and running. We hope you'll join us in Boston next May!

Speech Prosody 2016 to be hosted in Boston

We are pleased to share that we will be hosting Speech Prosody 2016 at Boston University from Tuesday, May 31 through Friday, June 3, 2016. Please visit the Speech Prosody 2016 website for details on upcoming deadlines and other updates.

Talk at Speech Prosody 7: Segmental Influences on the Perception of Pitch Accent Scaling in English

Jonathan Barnes, Alejna Brugos, Nanette Veilleux & Stefanie Shattuck-Hufnagel. (2014) “Segmental Influences on the Perception of Pitch Accent Scaling in English.” In Proceedings of Speech Prosody 7, Campbell, Gibbon, and Hirst (eds.), pp. 1125-1129.

Full paper: [pdf (4.8 MB)]

Abstract:
In both tone and intonation systems, segmental context is known to influence production and perception of target F0 contours in various ways. Many languages, for example, prefer to realize critical F0 events during maximally sonorous intervals, either by varying the timing of pitch movements, or by virtue of distributional limitations on certain contour types. Current analytic practice, by contrast, routinely ignores segmental backdrop when estimating the perceptual efficacy of putative cues, such as F0 turning points, to tone scaling and timing patterns. Results of the perception study presented here argue that pitch accent scaling is best modeled using a weighted average of F0 sampled over a defined region of interest, and that individual sample weights are determined in part by the sonority of the segments from which they are taken. That is, samples from lower sonority segments contribute less to integrated scaling percepts than those from higher sonority segments. This model, called TCoG-F(requency), accounts for crosslinguistic tonal timing and distribution patterns in the literature, and underscores the danger of analyzing tonal phenomena completely apart from the segments that express them.
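The weighted-average idea behind TCoG-F can be sketched in a few lines of Python: each F0 sample is multiplied by a weight tied to the sonority of its segment, and the sum is divided by the total weight. The segment classes and the specific weights (1.0 for vowels, 0.6 for nasals, 0.2 for fricatives) are made-up illustrative values, not the weights fitted in the paper, and the F0 samples are invented.

```python
from typing import Sequence

# Hypothetical sonority weights per segment class. The abstract says weights
# depend in part on sonority; these particular numbers are illustrative only.
SONORITY_WEIGHT = {"vowel": 1.0, "nasal": 0.6, "fricative": 0.2}


def tcog_f(f0_hz: Sequence[float], segment_classes: Sequence[str]) -> float:
    """Sonority-weighted average F0 over a region of interest.

    Each F0 sample is weighted by the (assumed) sonority weight of the
    segment it was taken from, so low-sonority samples count for less.
    """
    if len(f0_hz) != len(segment_classes):
        raise ValueError("need one segment class per F0 sample")
    weights = [SONORITY_WEIGHT[c] for c in segment_classes]
    total = sum(weights)
    if total == 0:
        raise ValueError("all samples have zero weight")
    return sum(w * f for w, f in zip(weights, f0_hz)) / total


# The same made-up rising F0 samples; the last two fall on a vowel,
# a nasal coda, or a fricative coda.
f0 = [200.0, 210.0, 220.0, 230.0]
for word, coda in [("day", "vowel"), ("dane", "nasal"), ("dave", "fricative")]:
    classes = ["vowel", "vowel", coda, coda]
    print(word, round(tcog_f(f0, classes), 1))
# day 215.0, dane 212.5, dave 208.3
```

With these invented numbers, down-weighting the higher F0 samples in the coda lowers the integrated estimate as coda sonority drops. A plain unweighted mean would give the same value (215 Hz) for all three words.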

Poster at Speech Prosody 7: Effects of dynamic pitch and relative scaling on the perception of duration and prosodic grouping in American English

Alejna Brugos & Jonathan Barnes. (2014) “Effects of dynamic pitch and relative scaling on the perception of duration and prosodic grouping in American English.” In Proceedings of Speech Prosody 7, Campbell, Gibbon, and Hirst (eds.), pp. 388-392.

Full paper: [pdf (4.5 MB)]

Abstract:
Results of two perception experiments suggest that using timing measures alone to compute prosodic structure misses valuable information from pitch. Previous research showed that pitch can distort perceived duration: tokens with dynamic or higher f0 are perceived as longer than comparable level-f0 or lower-f0 tokens, and silent intervals bounded by tokens of widely differing pitch are heard as longer than those bounded by tokens closer in pitch (the kappa effect). Phrase edges (signalled by increased duration, pause, phrase tones, and f0 reset) set the scene for pitch to modulate perceived duration. Two new experiments used the same duration and f0 manipulations (level vs. varying-slope rises, at varying pitch ranges) of segmentally-identical base files, in two separate tasks: 1) a linguistic grouping task using an ambiguously-structured phrase and 2) a psychoacoustic study on perceived duration. Results show that effects on perceived duration due to dynamic pitch can be either strengthened or nullified depending on relative scaling of compared tokens. These same manipulations push grouping judgments beyond what would be expected from distortions of perceived duration. This suggests that listeners integrate pitch and timing cues when judging linguistic structure, supporting measures of relative boundary size that combine duration and pitch measures.

Poster: [pdf (14.2 MB)]

Poster at ASA: Individual differences in the perception of fundamental frequency scaling in American English speech

Nanette Veilleux, Jonathan Barnes, Alejna Brugos, & Stefanie Shattuck-Hufnagel. (2014) “Individual differences in the perception of fundamental frequency scaling in American English speech.” Poster presented at the 167th Meeting of the Acoustical Society of America, Providence, May 2014.

Abstract:
Although most participants (N = 62) in an F0 scaling experiment judged open syllables (day) as higher in pitch than closed syllable tokens (dane, dave) with the same F0 contour, a subset did not. Results indicate that, in general, listeners perceptually discount F0 over coda regions when judging overall F0 level, and the degree of discount is related to the (lack of) sonority in the coda: day tokens are judged significantly higher than dane tokens which are judged significantly higher than dave tokens with the same F0 contour (dane-dave p < 0.001, dane-day p < 0.01). However, individual differences are observed: ten listeners showed no significant differences in the perception of F0 levels between the three types of tokens. On the other hand, a contrasting subset of ten subjects demonstrated highly significant differences (p < 0.001). The remaining 42 subjects behaved similarly to the entire subject pool with only slightly less significant differences between dane and day F0 level judgments (p < 0.05). Therefore, for about 16% of subjects, the F0 over the coda is not discounted in judging F0 level. These individual responses in F0 scaling perception mirror differences found in the Frequency Following Response (e.g., [1]) and could indicate individual differences in F0 processing.
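The "about 16%" figure follows directly from the subgroup counts given in the abstract. This short Python check reproduces the arithmetic; the subgroup names are descriptive labels added here, not terms from the study.

```python
# Listener subgroup sizes as reported in the abstract (names are descriptive only).
subgroups = {
    "no significant differences": 10,
    "highly significant differences": 10,
    "similar to whole pool": 42,
}

total = sum(subgroups.values())
share_no_discount = subgroups["no significant differences"] / total

print(total)                       # 62
print(f"{share_no_discount:.1%}")  # 16.1%
```

The three subgroups account for all 62 participants, and 10/62 rounds to 16.1%, matching the rounded "about 16%" in the text.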

Poster at ASA: Dynamic pitch and pitch range interact in distortions of perceived duration of American English speech tokens

Alejna Brugos & Jonathan Barnes. (2014) “Dynamic pitch and pitch range interact in distortions of perceived duration of American English speech tokens.” Poster presented at the 167th Meeting of the Acoustical Society of America, Providence, May 2014.

Abstract:

Previous research showed that pitch factors can distort perceived duration: tokens with dynamic or higher f0 tend to be perceived as longer than comparable level-f0 or lower-f0 tokens, and silent intervals bounded by tokens of widely differing pitch are heard as longer than those bounded by tokens closer in pitch (the kappa effect). Fourteen subjects were asked to judge which of two exemplars of a spoken word sounded longer. All tokens were created from the same base file with manipulations of objective duration, f0 contour (plateaux vs. rises of different slopes) and pitch range. Results show that pitch range relation between the two exemplars was a stronger predictor of perceived duration distortion than f0 contour. In addition to previously demonstrated effects of f0 height (Yu, 2010), greater f0 discontinuity between tokens increases the likelihood that the first token of a pair will be judged as longer, suggesting that some previous findings showing the effects of dynamic pitch on perceived duration may actually be magnified by the kappa effect. Listeners may be responding to perceived prosodic distance that integrates information from timing (filled and silent intervals) and pitch (pitch slope and pitch jumps across silent intervals).
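One common, register-independent way to express an "f0 discontinuity" between two tokens is the interval in semitones, 12 · log2(f2 / f1). The abstract does not say how discontinuity was quantified, so both the semitone measure and the choice to compare the end of the first token with the start of the second are assumptions in this Python sketch, and the Hz values are invented.

```python
import math


def semitone_interval(f0_end_first_hz: float, f0_start_second_hz: float) -> float:
    """Signed pitch jump in semitones from the first F0 value to the second."""
    if f0_end_first_hz <= 0 or f0_start_second_hz <= 0:
        raise ValueError("F0 values must be positive")
    return 12.0 * math.log2(f0_start_second_hz / f0_end_first_hz)


# Hypothetical token pairs: (end F0 of token 1, start F0 of token 2), in Hz.
pairs = {"close in pitch": (200.0, 210.0), "widely differing": (200.0, 300.0)}
for name, (a, b) in pairs.items():
    print(name, round(semitone_interval(a, b), 2))
```

A semitone scale makes a 200 to 300 Hz jump (roughly 7 semitones) comparable with the same ratio in a different register, which a raw Hz difference would not. A negative value indicates a downward jump across the silent interval.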

Poster: [pdf (5.1 MB)]

Poster: A proposal for labelling prosodic disfluencies in ToBI

Alejna Brugos and Stefanie Shattuck-Hufnagel (2012). "A proposal for labelling prosodic disfluencies in ToBI." Poster presented at Advancing Prosodic Transcription for Spoken Language Science and Technology, July 31, 2012, Stuttgart, Germany. [pdf of poster (large size)] [pdf of abstract]

Interspeech 2012 Paper: Perceptual Foundations for Naturalistic Variability in the Prosody of Synthetic Speech

Nanette Veilleux, Jonathan Barnes, Alejna Brugos & Stefanie Shattuck-Hufnagel (2012) "Perceptual Foundations for Naturalistic Variability in the Prosody of Synthetic Speech," Poster to be presented at Interspeech 2012, September 9-13, Portland, Oregon. [pdf]

This poster will be included in a session entitled "Speech Synthesis: Selected Topics," on Thursday, September 13th.

Abstract:
Recent studies have shown that the Tonal Center of Gravity is a better classifier than F0 Turning Points for at least two contrastively timed pitch accents in American English intonation contours. Within this framework, a binary F0 weighting function derived from the F0 contour can be used instead of the natural F0 contour without a degradation in discrimination performance. This success has important implications for speech synthesis. Just as we can capture the functional equivalence of a multitude of auditorily distinct F0 contour shapes in terms of their mapping to a single parameter (the TCoG) via a set of binary weighting functions, this same mapping could be run in reverse as a source to generate natural-sounding variability in speech synthesis.
Index Terms: Tonal Center of Gravity, F0 alignment, pitch accent classification, prosody, speech synthesis
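The contrast between natural-F0 and binary weighting can be illustrated with a small Python sketch. It assumes the Tonal Center of Gravity in time is computed as a weighted mean time, TCoG = Σ wᵢtᵢ / Σ wᵢ, with the F0 values serving as weights in the natural case. The thresholding rule used here to derive 0/1 weights (1 where F0 is at or above a fixed threshold) is a hypothetical stand-in, since the abstract does not specify how the binary function is derived, and the contour is made up.

```python
from typing import Sequence


def tcog_time(times_s: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean time: sum(w_i * t_i) / sum(w_i)."""
    if len(times_s) != len(weights):
        raise ValueError("need one weight per time sample")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(w * t for w, t in zip(weights, times_s)) / total


def binary_weights(f0_hz: Sequence[float], threshold_hz: float) -> list[float]:
    """Hypothetical binary weighting: 1.0 where F0 >= threshold, else 0.0."""
    return [1.0 if f >= threshold_hz else 0.0 for f in f0_hz]


times = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]     # seconds
f0 = [180.0, 190.0, 220.0, 240.0, 230.0, 200.0]  # Hz, a made-up peak

natural = tcog_time(times, f0)                        # F0 values as weights
binary = tcog_time(times, binary_weights(f0, 215.0))  # 0/1 weights
print(f"natural-F0 TCoG: {natural:.4f} s, binary TCoG: {binary:.4f} s")
```

Run in reverse, as the abstract suggests for synthesis, many different 0/1 patterns share the same weighted mean time. A generator could therefore pick among them to vary contour shape while holding the TCoG fixed.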

Slides from talk presented at Speech Prosody 2012

Here are the presentation slides used for the talk "The Auditory Kappa Effect in a Speech Context," by Alejna Brugos and Jonathan Barnes, presented at Speech Prosody 6 on May 22, 2012 in Shanghai, China. [ppt]

First author Alejna Brugos was presented with a Best Student Paper Award for this paper, awarded "on the basis of both the written papers and the oral/poster presentations." The written paper is also available for download. [pdf]

Brugos, Alejna & Barnes, Jonathan. (2012). “The auditory kappa effect in a speech context.” Speech Prosody 6, Shanghai, China.