A Kinetic Analysis of Visual Prosody: Head Movements in Habitual and Loud Speech - Q&A with Dr. Doris Mücke, Lena Pagel, Dr. Simon Roessig and Dr. Márton Sóskuthy

June 2, 2023

Prosodic prominence manifests itself in intonation, timing and magnitude of supra-laryngeal articulation, as well as speech-accompanying gestures. The interplay of prosody and gesture has been described as ‘visual prosody’ and is known to play an important role in communication. However, few studies have investigated visual prosody across different speaking styles.

In the study 'A Kinetic Analysis of Visual Prosody: Head Movements in Habitual and Loud Speech', researchers Dr. Doris Mücke (University of Cologne), Lena Pagel (University of Cologne), Dr. Simon Roessig (Cornell University) and Dr. Márton Sóskuthy (University of British Columbia) examine co-speech head motion related to prosodic prominence in habitual and loud speech. They discuss the work further in this Q&A with Language Sciences.

What is visual prosody, and what are some expressions of visual prosody that people use regularly?

In spoken communication, prosody refers to speech melody (also known as intonation), the duration of speech sounds, loudness and a variety of other things. When we communicate, prosody is often used to package information and highlight the most important parts of what we are saying. For example, we would use prosody to emphasize the word "Beatrice" in the utterance "It was Beatrice that I called, not Juan." But communication is more than what is spoken: not only the auditory but also the visual channel is involved, making it a multimodal phenomenon. During speech, we continuously produce body movements that are closely timed with our sentences and serve important communicative functions. It has been found that these body movements often occur together with certain words in an utterance, namely those that are highlighted because they contain important information (i.e. prosodic prominence) or those that stand at rhythmic breaks in the stream of speech, such as the end of a sentence (i.e. prosodic boundaries). Because prosody can not only be heard in speech, but also seen in these movements, we call them visual prosody. Our study focuses on prosodic highlighting, like in the "Beatrice, not Juan" example above. For prosodic highlighting, the most common expressions of visual prosody include hand gestures, head nods or tilts and eyebrow raises; however, the range of possible movements is vast.

How might head motion act as a function of speaking style?

Of course, speakers do not always interact in quiet and undisturbed communicative settings. Depending on their speaking partners and surroundings, they might have to adapt their speaking style: speak more clearly, whisper, or raise their voice to speak over background noise. Given the tight connection between speech and the movements that accompany it, it is not surprising that the choice of speaking style also affects the co-speech gestures that speakers produce. For example, previous studies on both whispered speech and speech in noise have shown that head movements are larger and sometimes have a longer duration than in a regular speaking style. But, as far as we know, there haven’t been any studies looking at whether larger movements also accompany loud speech in the absence of background noise. This is an interesting question as it gets to the mechanism behind changes in movement size: if larger movements are simply a way to reinforce the signal when spoken communication is hampered by noise, we would not expect to see the same increases in loud speech in a noiseless environment. If we do see larger movements in these conditions, that suggests that loud speech is inherently multimodal in ways that go beyond compensation for communicative barriers such as noise.

What differences did you find in head movement between habitual and loud speech?

In our study, we recorded 20 German speakers interacting with a virtual avatar, using 3D Electromagnetic Articulography to capture head kinematics. This involves gluing multiple sensors to each speaker’s head and tracking their movements in three dimensions. We looked at continuous head movement during specific target words, which were produced in regular and loud speaking styles. The results show several robust differences between these styles: in a loud speaking style, 1) the head is in a different position overall, namely higher and/or tilted upwards, 2) head movements are larger and 3) they are also faster. We were therefore able to show that the increased production effort of a loud speaking style in the vocal tract is mirrored by enhanced co-speech movements such as visible head movements.
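To make the three measures mentioned above (overall position, movement size and movement speed) more concrete, here is a minimal sketch of how they could be computed from a single sensor's 3D trajectory. It is an illustration only and not the analysis used in the study: the function name `head_kinematics`, the choice of z as the vertical axis, millimetres as the unit, the 100 Hz sampling rate and the two toy trajectories are all assumptions made for this example.

```python
import math


def head_kinematics(positions, sample_rate_hz):
    """Summarize one sensor trajectory.

    positions: list of (x, y, z) tuples, assumed to be in millimetres,
               with z assumed to be the vertical axis.
    sample_rate_hz: samples per second (hypothetical value in the example).
    """
    if len(positions) < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / sample_rate_hz

    # Overall position: mean height of the sensor during the word.
    mean_height = sum(p[2] for p in positions) / len(positions)

    # Movement size: largest straight-line distance from the first sample.
    start = positions[0]
    max_displacement = max(math.dist(start, p) for p in positions)

    # Movement speed: distance between consecutive samples per unit time.
    speeds = [math.dist(a, b) / dt for a, b in zip(positions, positions[1:])]

    return {
        "mean_height_mm": mean_height,
        "max_displacement_mm": max_displacement,
        "peak_speed_mm_s": max(speeds),
    }


# Invented toy trajectories (not data from the study): a small nod and a
# larger, faster nod that also starts from a higher head position.
habitual = [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 1), (0, 0, 0)]
loud = [(0, 0, 5), (0, 0, 8), (0, 0, 11), (0, 0, 8), (0, 0, 5)]

habitual_summary = head_kinematics(habitual, sample_rate_hz=100)
loud_summary = head_kinematics(loud, sample_rate_hz=100)
```

In this toy example the second trajectory comes out higher, larger and faster on all three measures, which is the pattern the answer describes for loud speech. A real analysis would work with continuous trajectories across many speakers and words rather than single summary numbers.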

Can you explain why between-focus differences are stronger in loud speech than in habitual speech?

Our analysis found differences in head motion not only between speaking styles but also between so-called focus conditions. Focus is about what part of the utterance is highlighted (like in the "Beatrice, not Juan" example above), and how strongly it is highlighted. We looked at words that are highlighted moderately (in broad focus) vs. words that are highlighted strongly (in corrective focus). In our study, speakers produced the specific target words with both degrees of highlighting in regular and loud speaking styles. It has been shown previously that corrective focus involves more production effort in the vocal tract than broad focus. Our data showed that these two focus types are also differentiated in head motion, that is, in visual prosody. Specifically, head movements were faster in corrective focus than in broad focus in both speaking styles. Additionally, movements were also larger in corrective than in broad focus, though only in loud and not in regular speech. We hypothesize that the fact that visual prosody is more apparent in loud speech may partly be a listener-directed strategy: in order to get the message across, speakers seem to rely more on the visual modality in cases where the auditory channel is thought to be disturbed. Additionally, it is possible that the high effort required for loud speech may enhance all coupled systems, including gross motor control. In other words, if the effort is 'scaled up' in speech production, the production of body movements may involuntarily become more effortful as well.

What’s next for this research?

We used 3D Electromagnetic Articulography to capture head kinematics. With this technique, we also recorded the simultaneous movements of the jaw, lips, tongue tip and tongue body. As a next step, it would be really interesting to relate the head motion data to these movements of the vocal tract as well as acoustically recorded intonational parameters. This will allow for a comprehensive analysis of prosodic highlighting in regular and loud speech across multiple modalities. We will be able to investigate how speech gestures and co-speech gestures are temporally coordinated. After that, we want to study two speakers in a dialogue, to find out if these multimodal speech patterns change (and maybe become more alike) when they interact.

Click here to read the full study.

Written by Kelsea Franzke
