The effect of healthy aging on within-speaker rhythmic variability: A case study on Noam Chomsky

Elisa Pellegrino (1, 2)

(1) URPP Language and Space, University of Zurich, Switzerland

(2) Department of Computational Linguistics, University of Zurich, Switzerland

elisa.pellegrino@uzh.ch ORCID: https://orcid.org/0000-0001-5332-8411

 

ABSTRACT

Speech rhythm varies noticeably from language to language, and also within a single language as a function of numerous linguistic, prosodic and speaker-dependent factors, including the speaker’s age.

Cross-sectional studies comparing the acoustic characteristics of young and old voices have documented that healthy aging affects speech rhythm variability. Studies of this kind, however, share one fundamental limitation: they group together people with different life experiences, health conditions and aging rates. This makes it very difficult to disentangle the effect of aging from that of other factors when interpreting the rhythmic differences between younger and older adults.

In the present paper, we addressed this difficulty by tracing rhythmic variability longitudinally within a single individual. We examined 5 public talks given by Noam Chomsky between the ages of 40 and 89. Within-speaker rhythmic variability was quantified through a variety of rate measures (segment, consonant and vowel rate) and rhythm metrics (%V, %Vn, nPVI-V, nPVI-C). The results showed that physiological aging affected the speech rate measures, but not the durational characteristics of vocalic and consonantal intervals. More longitudinal data from numerous speakers of the same language are needed to identify generalizable patterns in age-related rhythmic variability.

 

RESUMEN

The effect of healthy aging on within-speaker rhythmic variability: A case study on Noam Chomsky.– The rhythmic properties of a language vary greatly between varieties, as well as within a single variety as a function of numerous linguistic and prosodic factors and speaker-dependent factors, among which is age. Cross-sectional studies comparing the acoustic characteristics of young and older voices have likewise documented the effect of speaker age on speech rhythm variability. Studies of this kind, however, have a fundamental limitation: they group together people with different life experiences, health conditions and rates of aging. This makes it considerably more difficult to separate the effect of aging from that of other factors when interpreting the rhythmic differences between younger and older adults. In the present article, we address this difficulty by tracking the rhythmic variability of a single individual’s voice longitudinally. We examined 5 public talks by Noam Chomsky, given between the ages of 40 and 89. Within-speaker rhythmic variability was quantified by means of a variety of speech rate measures (segment / consonant and vowel rates) and rhythm metrics (%V, %Vn, nPVI-V, nPVI-C). The results show that physiological aging affects the speech rate measures, but not the durational characteristics of vocalic and consonantal intervals. More longitudinal data from numerous speakers of the same language are needed to identify general patterns of age-related rhythmic variability.

 

Submitted: 07/03/2019; Accepted: 29/05/2019; Published online: 04/07/2019

Citation: Pellegrino, E. (2019). The effect of healthy aging on within-speaker rhythmic variability: A case study on Noam Chomsky. Loquens, 6(1), e060. https://doi.org/10.3989/loquens.2019.060

Keywords: vocal aging; speech rhythm; rate measures; within-speaker rhythmic variability.

Keywords (Spanish version): vocal aging; speech rhythm; speech rate measures; within-speaker rhythmic variability.

Copyright: © 2019 CSIC. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license.


 

CONTENTS

ABSTRACT

RESUMEN

INTRODUCTION

THE STUDY

CONCLUSIONS

ACKNOWLEDGEMENTS

NOTES

REFERENCES

APPENDIX A

APPENDIX B

1. INTRODUCTION

Over the last decades, language-specific rhythmic characteristics have been related either to the durational properties of consonantal, vocalic or voicing intervals (Dellwo, 2006; Dellwo & Fourcin, 2013; Grabe & Low, 2002; Ramus, Nespor, & Mehler, 1999; White & Mattys, 2007) or to characteristics of the speech amplitude envelope (ENV) (He & Dellwo, 2014, 2016, 2017; Tilsen & Arvaniti, 2013), and a wide range of acoustic measures has been designed to quantify between-language rhythmic variability. Appendix A lists and describes the rhythm metrics mentioned in this paper.

Numerous studies, mostly using duration-based measurements, have documented that speech rhythm varies not only across languages but also within the same language as a function of several distinct factors. Indeed, there is evidence that different data collection methods (Arvaniti, 2012), differences in sentence phonotactic complexity (Arvaniti, 2012; Prieto, Vanrell, Astruc, Payne, & Post, 2012; Wiget et al., 2010), different stress and intonation patterns (Prieto et al., 2012), interlocutors’ age (Payne, Post, Astruc, Prieto, & Vanrell, 2009) and speakers’ health conditions (Liss, LeGendre, & Lotto, 2010; Liss et al., 2009) may give rise to substantial within-language rhythmic variability.

Cross-linguistic research comparing children and adults has further shown that the durational characteristics of consonantal and vocalic intervals (henceforth CV intervals) vary greatly with the speaker’s age. Payne, Astruc, Prieto, and Vanrell (2011), for example, analysed the durational variability of CV intervals in 2-, 4- and 6-year-old English, Catalan and Spanish children and compared it with that of their mothers’ speech. The researchers found that, compared to adult speech rhythm: a) the speech rhythm of young English-learning children is more syllable-timed, that is, it has a higher proportion of vocalic duration (%V) and lower variability in vocalic intervals (e.g., deltaV); b) over time, child speech rhythm approaches the adult stress-timed profile; c) no drastic developmental changes occur in the speech rhythm of children speaking syllable-timed languages. Analysing the development of speech rhythm in monolingual British English children aged 4 to 11 years, Polyanskaya and Ordin (2015) found that consonantal interval durational variability reaches adult monolingual values at the age of 8, syllable variability is mastered by the age of 10–11, and vocalic interval variability develops gradually with increasing age. A reverse pattern was found when rhythmic characteristics were derived from syllable intensity variability. Using the same corpus as Polyanskaya and Ordin (2015), He (2018) found that the scores of metrics based on mean syllable intensity variability (stdev-I and PVI-I) are higher in children than in adults and that syllable intensity variability decreases from intermediate-aged to older children.
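As a rough illustration of the interval-based variability measures mentioned above, the sketch below computes deltaV and deltaC, i.e. the standard deviations of vocalic and consonantal interval durations within an utterance (Ramus, Nespor, & Mehler, 1999). It assumes the intervals are already available as hypothetical (label, duration) pairs labelled "V" or "C"; the function name and input format are illustrative, and the use of the population standard deviation is an assumption, since tools may differ on this point.

```python
from statistics import pstdev


def delta_measures(intervals):
    """Return (deltaV, deltaC): standard deviations of vocalic and
    consonantal interval durations, in the unit of the input.

    intervals: list of (label, duration) tuples, label "V" or "C".
    The population SD is used here; this choice is an assumption.
    """
    vocalic = [d for label, d in intervals if label == "V"]
    consonantal = [d for label, d in intervals if label == "C"]
    return pstdev(vocalic), pstdev(consonantal)


# Hypothetical interval durations (ms) for one utterance
utterance = [("C", 80), ("V", 60), ("C", 120), ("V", 100), ("C", 70), ("V", 140)]
delta_v, delta_c = delta_measures(utterance)
print(f"deltaV = {delta_v:.1f} ms, deltaC = {delta_c:.1f} ms")
```

Because deltaV and deltaC are raw standard deviations, they grow when the same utterance is spoken more slowly, which is why rate-normalized variants are often reported alongside them.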

Although speech rhythm research has addressed many of the factors contributing to rhythmic variability, relatively little is known about the effect of healthy aging on speech rhythm. The present paper aims to fill this gap. We investigated the extent to which healthy aging contributes to rhythmic variability, using a longitudinal research design. In view of the worldwide trend towards an increasingly aging population, such research may improve our understanding of how the acoustic characteristics of the world’s languages will change in the near future.

What is currently known about the effect of healthy aging on speech rhythm? Rhythmic variability in later adulthood has been explored in pilot cross-sectional and longitudinal studies of Italian, as well as in research comparing the rhythmic characteristics of younger and older Zurich German speakers. Pettorino, Pellegrino, and Maffia (2014), for example, analyzed utterances with identical lexical content produced by one Italian journalist (Piero Angela) at the ages of 40 and 79. The sentences spoken at the age of 79, besides showing the variations typically associated with an aged voice (wider tonal range and register, longer and more frequent silences, decreased articulation and speech rates), also presented a remarkable increase in %V and in the mean duration of the interval between two consecutive vowel onset points (VtoV) compared to the corresponding sentences produced 39 years earlier. Based on these findings, Pettorino and Pellegrino (2014) conducted a follow-up cross-sectional study to test whether the rhythmic variations found in the longitudinal study were speaker- or age-dependent. In this study, 4 younger adults (aged 20 to 25) and 4 older adults (aged 75 to 80) were asked to read 4 sentences at 4 different articulation rates. The results confirmed the increase in %V in the aged voices found in the longitudinal study. A similar pattern emerged in a cross-sectional study comparing 16 younger adults (aged 18 to 31) and 10 older adults (aged 65 to 81) reading aloud 20 sentences in Zurich German with equivalent lexical content (Pellegrino, He, & Dellwo, 2018). Older adults typically spoke more slowly and showed higher %V, deltaV and deltaC than younger adults, but the two groups were comparable in terms of the durational characteristics of CV intervals when these were normalized for speech rate.
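Two of the measures mentioned above can be sketched as follows. VtoV is taken here as the mean time between consecutive vowel onsets. For rate normalization, the sketch assumes the Varco approach of Dellwo (2006), i.e. the standard deviation of interval durations divided by their mean and multiplied by 100; the paragraph above does not say which normalization Pellegrino, He, and Dellwo (2018) used, so this is an illustrative choice. Function names and all input values are hypothetical.

```python
from statistics import mean, pstdev


def varco(durations):
    """Rate-normalized variability: 100 * SD / mean (Varco, after Dellwo, 2006).

    The population SD is used here; this is an assumption of the sketch.
    """
    return 100 * pstdev(durations) / mean(durations)


def mean_vtov(vowel_onsets):
    """Mean interval (s) between consecutive vowel onset times (s)."""
    gaps = [later - earlier for earlier, later in zip(vowel_onsets, vowel_onsets[1:])]
    return mean(gaps)


# Hypothetical vocalic interval durations (s): the same pattern at two tempi
fast = [0.06, 0.10, 0.14]
slow = [0.09, 0.15, 0.21]  # every interval 1.5 times longer
print(round(varco(fast), 2), round(varco(slow), 2))  # equal: the tempo change cancels out

onsets = [0.12, 0.31, 0.55, 0.74]  # hypothetical vowel onset times (s)
print(round(mean_vtov(onsets), 3))
```

Dividing by the mean makes the measure insensitive to a uniform change in tempo, so a slower speaker with proportionally longer intervals obtains the same Varco value, while a raw standard deviation such as deltaV would increase.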

Although cross-sectional studies have provided intriguing findings regarding the effect of aging on speech rhythm variability, this kind of research has at least two fundamental limitations (Mann, 2003; Schaie & Hofer, 2001). Such studies are based on a single-occasion observation, and they group similarly aged subjects together, although these subjects most likely have different life experiences, health conditions and aging rates. In view of these considerations, it is highly questionable whether: a) rhythmic differences across age groups, observed on a single occasion, can be generalized to other time periods; and b) physiological aging alone can account for such differences, or whether they result from a complex interplay between aging and other speaker-specific factors.

An alternative approach to assessing the impact of healthy aging on rhythmic variation is to investigate this variability longitudinally. In the present study, we examine age-related rhythmic variability using a corpus of public speeches given by Noam Chomsky over a time span of nearly 50 years (1968–2017). Based on preliminary evidence from Italian and Swiss German, we expected that, with advancing age, Noam Chomsky’s speech would show a slower speech rate, increased %V and constant durational variability of adjacent CV intervals.

One might object, however, that analyses of data from a single speaker are idiosyncratic by nature. Although this is an important concern, a single-speaker design has an added benefit in the analysis of age-related rhythmic variability. There is evidence that a) the rhythmic characteristics of a language are also highly speaker-specific (for varieties of American English, e.g., Yoon, 2010; for Standard British English, e.g., Arvaniti, 2012; Wiget et al., 2010; for Swiss German, e.g., Dellwo, Leemann, & Kolly, 2015; He & Dellwo, 2014; Leemann, Kolly, & Dellwo, 2014); and b) articulatory factors related to individual differences in the anatomy and movements of the speech organs are among the most plausible explanations for between-speaker rhythmic variability (Dellwo, Leemann, & Kolly, 2015). Given this, a single-speaker study makes it possible to exclude the influence of interspeaker variation on within-language rhythmic variability and to track more precisely the progression of rhythmic characteristics due to aging.

The next question that may arise is why we chose Noam Chomsky to study age-related rhythmic variability. Two main reasons motivated this choice: (a) a practical-methodological reason and (b) a data-comparability reason. With regard to (a), we selected Chomsky because his speech recordings across the life span are relatively easy to access. Whereas longitudinal speech databases are scarce and mostly accessible only upon request, for Chomsky there exists a sizable, systematic collection of audio(-visual) recordings of public talks, debates and interviews that can be downloaded directly from the internet (cf. Noam Chomsky WebPage, via YouTube, Google Video or Vimeo). Moreover, orthographic transcriptions or subtitles are available for most of his audio-visual recordings, which is not the case for every existing longitudinal database (e.g., Trinity College Dublin Speaker Database – Kelly, Drygajlo, & Harte, 2012). This is a major advantage for the experimenters: with recordings and orthographic transcriptions at hand, they can skip the laborious and time-consuming step of transcribing the audio. After the necessary manual corrections to the orthographic transcriptions, they can proceed to automatic speech annotation with less effort.

With regard to (b), as pointed out above, one of the main disadvantages of cross-sectional research is that it groups together people with different life experiences, socio-cultural backgrounds and health conditions. In research on vocal aging, comparisons are nonetheless possible provided the speakers share some fundamental similarities. In this regard, the advantage of collecting and analyzing Noam Chomsky’s public speech longitudinally lies in the comparability of a) his socio-cultural and professional background and b) the speaking style of his speech performances with those of other American English public speakers for whom longitudinal speech corpora have already been collected (e.g., Trinity College Dublin Speaker Database).

Last but not least, in future investigations, comparing the developmental pattern of rhythmic features across speakers of the same language will provide a more precise picture of which measures (e.g., only rate, or both rate and rhythm?) are more susceptible to variation and which are more robust in relation to the aging process.

2. THE STUDY

2.1. Speech material and annotation for rhythmic analysis

To study the contribution of the aging process to speech rhythm variability, we analyzed a sample of 5 public talks given by Noam Chomsky between 1968, when he was 40, and 2017, when he was 89. The selected recordings (1968, 1988, 1998, 2008 and 2017) are approximately 10 years apart, except for the first two samples (1968 and 1988), which are 20 years apart. This is because it was difficult to find online speech samples of comparable speaking style (i.e., public lectures) for 1978 (cf. Appendix B for the titles of the public lectures from which the utterances were extracted). Chomsky’s age at each recording was 40, 60, 70, 80 and 89 years, respectively.

From each recording, we extracted 50 utterances fulfilling the following criteria:

•  at least five words produced without silent or filled pauses;
•  absence of non-vocal phenomena (e.g., applause);
•  absence of vocal non-lexical phenomena (e.g., cough, laugh);
•  absence of verbal non-lexical phenomena (e.g., filled pauses).
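The selection criteria listed above can be expressed as a simple filter. This is a minimal sketch assuming a hypothetical annotation record per candidate stretch of speech, with flags that an annotator would set by hand; the record type, field names and example texts are illustrative and do not come from the original study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical annotation record for one candidate stretch of speech
    words: list
    silent_pause: bool = False
    filled_pause: bool = False        # verbal non-lexical, e.g. "uh", "um"
    non_vocal_event: bool = False     # e.g. applause
    vocal_non_lexical: bool = False   # e.g. cough, laugh


def is_eligible(c: Candidate, min_words: int = 5) -> bool:
    """Apply the four selection criteria to one candidate."""
    return (
        len(c.words) >= min_words
        and not c.silent_pause
        and not c.filled_pause
        and not c.non_vocal_event
        and not c.vocal_non_lexical
    )


candidates = [
    Candidate("this is a hypothetical example".split()),
    Candidate("too short".split()),
    Candidate("another hypothetical example with a laugh".split(), vocal_non_lexical=True),
]
selected = [c for c in candidates if is_eligible(c)]
print(len(selected))  # 1
```

Keeping the flags explicit makes it easy to report how many candidates each criterion excluded before the 50 utterances per recording were retained.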

The resulting corpus thus comprised 250 utterances (50 utterances × 5 recording years). All utterances were phonologically labelled and segmented using the Munich AUtomatic Segmentation System (MAUS; see Schiel, 1999; Kisler, Reichel, & Schiel, 2017). To ensure maximum accuracy, the automatic segmentation was manually corrected in Praat (version 6.0.39; Boersma & Weenink, 2018). From the tier containing the manually corrected speech segments (Tier 1), three further tiers containing consonantal and vocalic interval information were automatically derived with the Praat plug-in CV Tier Creator.[1]

•  Tier 2 (‘CV segments’) indicated whether each segment was consonantal or vocalic.
•  Tiers 3 (‘CV seg int’) and 4 (‘CV intervals’) segmented the signal into consecutive consonantal or vocalic intervals; the only difference was that each interval in tier 3 also contained the number of consonantal or vocalic segments it comprised.
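The derivation of tiers 2 to 4 from the segment tier can be sketched as follows: each segment is classified as consonantal or vocalic, and adjacent segments of the same class are merged into one interval that also records how many segments it contains (the information carried by tier 3). This is not the code of the CV Tier Creator plug-in. The (label, start, end) input format and the small vowel inventory (a few SAMPA-style symbols) are assumptions for illustration; a real analysis would need the full label set of the segmentation output, and the merge assumes contiguous segments, as in pause-free utterances.

```python
# Hypothetical subset of vowel symbols; extend to the actual label set.
VOWELS = {"i:", "I", "E", "{", "A:", "O", "O:", "U", "u:", "V", "@", "eI", "aI", "OI", "@U", "aU"}


def cv_segments(segments):
    """Tier 2: label each (phone, start, end) segment as "C" or "V"."""
    return [("V" if phone in VOWELS else "C", start, end) for phone, start, end in segments]


def cv_intervals(segments):
    """Tiers 3/4: merge runs of same-class segments into intervals.

    Returns (label, start, end, n_segments) tuples; dropping the last
    field gives the plain interval tier (tier 4).
    """
    intervals = []
    for label, start, end in cv_segments(segments):
        if intervals and intervals[-1][0] == label:
            prev_label, prev_start, _, n = intervals[-1]
            intervals[-1] = (prev_label, prev_start, end, n + 1)
        else:
            intervals.append((label, start, end, 1))
    return intervals


# Hypothetical segment tier for a word such as "strong" (s t r O N)
word = [("s", 0.00, 0.08), ("t", 0.08, 0.12), ("r", 0.12, 0.17),
        ("O", 0.17, 0.30), ("N", 0.30, 0.38)]
result = cv_intervals(word)
for interval in result:
    print(interval)
```

Here the consonant cluster /str/ becomes a single consonantal interval containing 3 segments, which is exactly the kind of information that distinguishes tier 3 from tier 4.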

Figure 1 shows an example of annotation for rhythmic analysis.

Figure 1: Levels of annotation for rhythmic analysis in Praat.

2.2. Data analysis and statistics

In line with previous cross-sectional studies on age-related rhythmic variability in Italian and Swiss German, we calculated several rate and rhythmic measures using the Praat Plug-in Duration Analyzer.[2]

From tier 1, we calculated:

•  segment rate: actual number of segments per second (seg/s).

From tier 2 we computed:

•  consonant rate: actual number of consonants per second (cons/s);
•  vowel rate: actual number of vowels per second (vow/s).
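Rates of this kind divide a segment count by utterance duration. A minimal sketch follows, assuming that duration runs from the start of the first segment to the end of the last one (reasonable here because the selected utterances contain no pauses) and using hypothetical (phone, start, end) segments with a hypothetical vowel set; the Duration Analyzer plug-in may define duration differently.

```python
# Hypothetical subset of vowel symbols used to classify segments
VOWELS = {"i:", "I", "E", "{", "A:", "O", "O:", "U", "u:", "V", "@"}


def rates(segments, vowels=VOWELS):
    """Return segment, consonant and vowel rates (units per second).

    segments: list of (phone, start_s, end_s) tuples for one utterance.
    Duration is taken as end of last segment minus start of first one.
    """
    duration = segments[-1][2] - segments[0][1]
    n_seg = len(segments)
    n_vow = sum(1 for phone, _, _ in segments if phone in vowels)
    return {
        "seg/s": n_seg / duration,
        "cons/s": (n_seg - n_vow) / duration,
        "vow/s": n_vow / duration,
    }


# Hypothetical 0.5-s utterance with 6 segments, 2 of them vowels
utt = [("D", 0.00, 0.05), ("I", 0.05, 0.12), ("s", 0.12, 0.22),
       ("I", 0.22, 0.30), ("z", 0.30, 0.40), ("t", 0.40, 0.50)]
print(rates(utt))
```

Consonant and vowel rates always sum to the segment rate here, so a general slowing down should appear in all three measures unless it is concentrated in one segment class.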

From tier 4 we calculated:

•  %V: Percent of utterance duration composed of vocalic intervals (Ramus, Nespor, & Mehler, 1999);
•  %Vn: Percent of utterance duration composed of vocalic intervals, normalized for the number of vocalic intervals. This measure was included to ensure that any %V variability over the years was not biased by differences in sentence phonotactics, since the utterances in this corpus did not have identical lexical content;
•  nPVI_V: Mean absolute durational difference between successive vocalic intervals, each divided by the mean duration of the pair (× 100) (Grabe & Low, 2002);
•  nPVI_C: Mean absolute durational difference between successive consonantal intervals, each divided by the mean duration of the pair (× 100) (adapted from Grabe & Low, 2002).
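A minimal sketch of %V, nPVI_V and nPVI_C computed from a list of hypothetical (label, duration) intervals. The nPVI follows the usual Grabe and Low (2002) formulation: each absolute difference between successive intervals is divided by the mean of the pair, and the resulting terms are averaged and multiplied by 100. %Vn is left out because its exact normalization is not fully specified above. Input format and function names are assumptions for illustration.

```python
def percent_v(intervals):
    """%V: share of total utterance duration that is vocalic (x 100)."""
    total = sum(d for _, d in intervals)
    vocalic = sum(d for label, d in intervals if label == "V")
    return 100 * vocalic / total


def npvi(durations):
    """Normalized Pairwise Variability Index over successive durations."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("nPVI needs at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)


def rhythm_metrics(intervals):
    vocalic = [d for label, d in intervals if label == "V"]
    consonantal = [d for label, d in intervals if label == "C"]
    return {"%V": percent_v(intervals), "nPVI_V": npvi(vocalic), "nPVI_C": npvi(consonantal)}


# Hypothetical interval durations (ms) for one utterance
utterance = [("C", 80), ("V", 60), ("C", 120), ("V", 100), ("C", 70), ("V", 140)]
print(rhythm_metrics(utterance))
```

Normalizing each difference by the local mean of the pair is what makes the nPVI less sensitive to speech rate than a raw pairwise difference.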

To analyse the effect of healthy aging on the rate and rhythmic features of Noam Chomsky’s speech, we ran univariate General Linear Models in IBM SPSS Statistics 22, one for each measure. Each rate or rhythm measure was entered as the dependent variable, and year of recording (1968, 1988, 1998, 2008, 2017) as a fixed factor. Pairwise comparisons were performed using Bonferroni-corrected post hoc tests.
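With a single fixed factor, each univariate GLM described above amounts to a one-way ANOVA. A Bonferroni post hoc test of the SPSS kind should correspond to pairwise t tests that use the pooled within-group error term of the ANOVA, with each p value multiplied by the number of comparisons and capped at 1. The sketch below reproduces that logic with NumPy and SciPy on simulated data; the simulated values and group centres are placeholders, not the study's measurements, and small numerical differences from SPSS output are possible.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def anova_with_bonferroni(groups):
    """One-way ANOVA plus Bonferroni-adjusted pairwise t tests.

    groups: dict mapping factor level (e.g. recording year) -> 1-D values.
    Pairwise tests use the pooled within-group mean square (MSE).
    """
    data = {level: np.asarray(values, dtype=float) for level, values in groups.items()}
    pooled = np.concatenate(list(data.values()))
    n_total, k = pooled.size, len(data)
    grand_mean = pooled.mean()

    ss_between = sum(v.size * (v.mean() - grand_mean) ** 2 for v in data.values())
    ss_within = sum(((v - v.mean()) ** 2).sum() for v in data.values())
    df_between, df_within = k - 1, n_total - k
    mse = ss_within / df_within
    f_value = (ss_between / df_between) / mse
    p_value = stats.f.sf(f_value, df_between, df_within)

    pairs = list(combinations(data, 2))
    posthoc = {}
    for a, b in pairs:
        va, vb = data[a], data[b]
        t = (va.mean() - vb.mean()) / np.sqrt(mse * (1 / va.size + 1 / vb.size))
        p_raw = 2 * stats.t.sf(abs(t), df_within)
        posthoc[(a, b)] = min(1.0, p_raw * len(pairs))  # Bonferroni adjustment
    return (df_between, df_within), f_value, p_value, posthoc


# Simulated segment rates (seg/s), 50 utterances per year: placeholders only
rng = np.random.default_rng(0)
years = [1968, 1988, 1998, 2008, 2017]
centres = [14.0, 13.8, 13.3, 13.0, 12.0]  # hypothetical group means
groups = {y: rng.normal(c, 1.5, 50) for y, c in zip(years, centres)}

df, f_value, p_value, posthoc = anova_with_bonferroni(groups)
print(df, round(f_value, 3), p_value)
for (a, b), p in posthoc.items():
    if p < 0.05:
        print(f"{b} vs {a}: p = {p:.3f}")
```

With 5 recording years and 50 utterances each, the degrees of freedom come out as (4, 245), matching those reported for every measure, and there are 10 pairwise comparisons to correct for.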

2.3. Results and discussion

A significant main effect of year of recording was found for all rate measures and for nPVI_V, but not for %V, %Vn or nPVI_C (Table 1).

Table 1: Results of the statistical analysis for rate and rhythm measures.

Measure          df        F         p
segment rate     4, 245    24.951    < 0.0001
vowel rate       4, 245    16.058    < 0.0001
consonant rate   4, 245    13.270    < 0.0001
nPVI_V           4, 245    2.524     0.042
%V               4, 245    1.545     0.19
%Vn              4, 245    2.042     0.089
nPVI_C           4, 245    2.272     0.062
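Each p value in Table 1 is the upper-tail probability of the F distribution with (4, 245) degrees of freedom at the observed F. The short check below recomputes these probabilities with SciPy from the F values as printed in the table; because the published F values are rounded, the recomputed p values may differ slightly in the last digit from those reported.

```python
from scipy import stats

# F values as printed in Table 1; df = (4, 245) for every measure
reported_f = {
    "segment rate": 24.951,
    "vowel rate": 16.058,
    "consonant rate": 13.270,
    "nPVI_V": 2.524,
    "%V": 1.545,
    "%Vn": 2.042,
    "nPVI_C": 2.272,
}

p_values = {measure: stats.f.sf(f_value, 4, 245) for measure, f_value in reported_f.items()}
for measure, p in p_values.items():
    print(f"{measure:15s} F = {reported_f[measure]:6.3f}  p = {p:.4g}")
```

The check makes the decision boundary visible: with 4 and 245 degrees of freedom, the critical F at the 0.05 level lies between the F values reported for nPVI_C and nPVI_V.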

In line with previous findings on age-related speech variation in Italian and Swiss German, Chomsky’s speech rate decreases with advancing age (Figure 2), and this slowing down affects both consonants and vowels (Figures 3 and 4). Post hoc comparisons confirm the visual impression that a) the slowing down begins gradually from 1998 onwards; and b) the 2017 utterances were produced more slowly than those of all other years (Tables 2a–c).

Figure 2: Segment rate across years at recording.

Figure 3: Consonant rate across years at recording.

Figure 4: Vowel rate across years at recording.

Tables 2a–c: Post hoc comparisons with Bonferroni correction for rate measures.

a. Segment rate

Comparison      p
1998 vs 1968    0.023
2008 vs 1968    0.001
2008 vs 1988    0.009
2017 vs 1968    0.001
2017 vs 1988    0.001
2017 vs 1998    0.001
2017 vs 2008    0.001

b. Consonant rate

Comparison      p
1998 vs 1968    0.018
2008 vs 1968    0.003
2017 vs 1968    0.001
2017 vs 1988    0.001
2017 vs 1998    0.001
2017 vs 2008    0.007

c. Vowel rate

Comparison      p
1998 vs 1988    0.006
2017 vs 1968    0.003
2017 vs 1988    0.001
2017 vs 1998    0.001
2017 vs 2008    0.001

What might have caused the observed age-related rate changes in Chomsky’s speech? The decrease in segment, vowel and consonant rate might relate to the general decline of motor functions in both the peripheral and the central nervous system, as well as to degenerative changes in the laryngeal and supralaryngeal systems (e.g., atrophy of the laryngeal, pharyngeal, masticatory and facial muscles, degeneration of the temporomandibular joints; see, e.g., Amerman & Parnell, 1992; Linville, 2004; Ramig & Ringel, 1983). Nevertheless, in view of findings on articulatory and acoustic variability across age, the slowing down of Chomsky’s speech rate might have causes other than diminished orofacial strength or neuromuscular degeneration. From an articulatory viewpoint, there is evidence that slower speech in older adults is not attributable to constrained lip or jaw movements (Bilodeau-Mercure & Tremblay, 2016; Mefferd & Corder, 2014). Indeed, according to Bilodeau-Mercure and Tremblay (2016), older adults reduce their speech rate to compensate for the decrease in lip and jaw stiffness when asked to speak at fast and very fast rates. From an acoustic point of view, Fletcher and McAuliffe (2015) found that New Zealand English speakers aged over 65 tended to produce longer average vowel durations as well as more acoustically distinct vowels. On this basis, we might tentatively interpret the reduced segment, vowel and consonant rates in Chomsky’s speech as reflecting an attempt to keep his pronunciation clear while giving public speeches.

While the general decrease in speech rate with advancing age is in line with previous research on vocal aging and age-related rhythmic variability (Pellegrino, He, & Dellwo, 2018; Pettorino & Pellegrino, 2014; Pettorino, Pellegrino, & Maffia, 2014), the findings related to %V diverge from earlier work. Whereas the above-mentioned cross-sectional and longitudinal studies in Italian and Swiss German showed that %V increased significantly with advancing age while the durational characteristics of CV intervals remained comparable, Chomsky’s corpus shows the opposite trend: no effect of recording year for %V and %Vn, but a main effect for nPVI-V, although none of the post hoc comparisons reaches significance (Figure 5).

Figure 5: nPVI-V across years at recording.

How can we explain the difference in %V variability between Chomsky’s speech and the earlier cross-sectional and longitudinal studies? Given the limitations of cross-sectional studies, one might argue that the %V differences between younger and older adults in Italian and Swiss German may not be attributable to age differences alone: it cannot be excluded that age-related changes in %V result from the interplay between aging and interspeaker variability. Indeed, studies on interspeaker rhythmic variability have shown that %V is among the rhythmic metrics most prone to vary across individual speakers (Dellwo, Leemann, & Kolly, 2015). The findings on Chomsky are consistent with this view, as they suggest that this metric is robust against variability induced by the aging process.

Another plausible explanation for the comparatively small effect of aging on Chomsky’s rhythmic variability is that Chomsky, unlike the older adults in the above-mentioned studies, is an experienced public speaker whose speech organs, muscles and ligaments have undergone intensive training, which may have slowed down the degenerative effects of aging. Additionally, as a charismatic public speaker, he may exercise conscious control over his speech in ways that preserve a relatively high durational variability of vocalic intervals, in line with previous evidence on charismatic speech (Niebuhr, Voße, & Brem, 2016).

While the “public speaker argument” seems a convincing account of Chomsky’s age-related rhythmic invariance, it leaves unanswered why %V increased noticeably with age for the Italian journalist Piero Angela. Both are public speakers and share a similar socio-educational level and professional background. One might thus argue that the differences between the two speakers are likely attributable to differences in data collection methods. In the study on Noam Chomsky, the speech material consisted of 5 actual public speeches given in front of an audience. In the case of Piero Angela, by contrast, the analyses were conducted on two speech samples, of which only the first was extracted from an actual TV news broadcast. The second performance was elicited: the journalist was instructed to read the same 1968 script as if he were hosting a real TV news broadcast. Although the lexical content of the two recordings was controlled, it cannot be excluded that the %V differences arose from a combined effect of aging and speaking style. More importantly, the study on Piero Angela provided no statistical evidence to confirm the visual impression that %V increased significantly from 1968 to 2007. More longitudinal data are thus required to investigate in depth the role of aging in speech rhythm variability.

What can linguists and speech scientists learn from research on age-related changes in rhythm? And why is a closer examination of age-related rhythmic variability across the world’s languages a worthwhile pursuit at present? There are several compelling reasons why this topic should be among the top priorities in linguistic research. As is widely acknowledged, aging is a global issue of increasing significance that transcends national borders:

Compared to 2017, the number of persons aged 60 or above is expected to more than double by 2050 and more than triple by 2100, rising from 962 million in 2017 to 2.1 billion in 2050 and 3.1 billion in 2100. For this age range, 65 per cent of the global increase between 2017 and 2050 will occur in Asia, 14 per cent in Africa, 11 per cent in Latin America and the Caribbean, and the remaining 10 per cent in other areas (United Nations, 2017, p. 13).

Since aging is a worldwide phenomenon, collecting longitudinal data and examining age-related rhythmic variability will not only advance knowledge about vocal aging but also help predict how the acoustic characteristics of the world’s languages will change in the near future.

This area of research also has crucial implications for speech pathology. As mentioned above, the rhythmic characteristics of American English speakers are strongly influenced by the onset of age-related pathological conditions such as dysarthria (Liss, LeGendre, & Lotto, 2010; Liss et al., 2009). Knowing more about the effect of healthy aging on rhythmic variability will contribute to the creation of normative data, which will help clinicians distinguish changes related to healthy aging from those attributable to pathological aging.

Another research area that will benefit from studies on vocal aging is speech technology. Given that speech recognition accuracy decreases drastically for aging voices (Vipperla, Renals, & Frankel, 2010), an in-depth theoretical understanding of age-related changes in speech and voice can be used to create more robust speaker and speech recognition algorithms.

Understanding the relationship between aging and speech rhythm variability will also provide valuable information for forensic speaker comparison. Given that a) in forensic casework the time lag between the recording of a reference speaker and that of a suspect voice can sometimes amount to a few years, and b) rhythmic metrics based on the durational characteristics of CV intervals and on syllable intensity variability are highly speaker-specific, knowing the effect of aging on within-speaker rhythmic variability may contribute to a more precise attribution of suspect utterances to the reference speaker.

3. CONCLUSIONS

Based on a single speaker, Noam Chomsky, the results of this study show an effect of physiological aging on all rate measures: the older the speaker, the lower the rate. Unlike in previous cross-sectional and longitudinal studies in Italian and Swiss German, age-related degenerative changes cannot account for %V variability in Chomsky’s speech across several decades.

The relatively small effect that aging appears to have on Chomsky’s American English speech could also be related to the intensive training his vocal apparatus undergoes on a daily basis and to the control that he, as an experienced speaker, is able to exercise over his acoustic features. Given that individuals may vary tremendously in their speech behaviour over time, more longitudinal data are necessary to identify generalizable patterns in age-related rhythmic variability and invariance.

4. ACKNOWLEDGEMENTS

The author would like to thank Volker Dellwo, Sandra Schwab and He Lei for their extremely valuable feedback on a first version of this manuscript. Further thanks go to Sarah Lim and Iuliia Nigmatulina for their contribution to the manual correction of automatic segmentation. Thanks to Carolina Baslino for the Spanish translation of the abstract and keywords.


REFERENCES

Amerman, J. D., & Parnell, M. M. (1992). Speech timing strategies in elderly adults. Journal of Phonetics, 20, 65–76.

Arvaniti, A. (2012). The usefulness of metrics in the quantification of speech rhythm. Journal of Phonetics, 40, 351–373. https://doi.org/10.1016/j.wocn.2012.02.003

Bilodeau-Mercure, M., & Tremblay, P. (2016). Age differences in sequential speech production: Articulatory and physiological factors. Journal of the American Geriatrics Society, 64, e177–e182. https://doi.org/10.1111/jgs.14491

Boersma, P., & Weenink, D. (2018). Praat: Doing phonetics by computer [Computer Program]. Version 6.0.39. www.praat.org

Dellwo, V. (2006). Rhythm and speech rate: A variation coefficient for deltaC. In P. Karnowski & I. Szigeti (Eds.), Language and language-processing (pp. 231–241). Frankfurt am Main: Peter Lang.

Dellwo, V., & Fourcin, A. (2013). Rhythmic characteristics of voice between and within languages. TRANEL - Travaux neuchâtelois de linguistique, 59, 87–107.

Dellwo, V., Leemann, A., & Kolly, M.-J. (2015). Rhythmic variability between speakers: Articulatory, prosodic, and lexical factors. The Journal of the Acoustical Society of America, 137, 1513–1528. https://doi.org/10.1121/1.4906837

Fletcher, A. R., & McAuliffe, M. J. (2015). The relationship between speech segment duration and vowel centralization in a group of older speakers. The Journal of the Acoustical Society of America, 138, 2–32. https://doi.org/10.1121/1.4930563

Grabe, E., & Low, E. L. (2002). Durational variability in speech and the rhythm class hypothesis. In C. Gussenhoven & N. Warner (Eds.), Laboratory Phonology 7 (pp. 515–546). Berlin: De Gruyter Mouton.

He, L. (2018). Development of speech rhythm in first language: The role of syllable intensity variability. The Journal of the Acoustical Society of America, 143, EL463–EL467. https://doi.org/10.1121/1.5042083

He, L., & Dellwo, V. (2014). Speaker idiosyncratic variability of intensity across syllables. In Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH-2014), pp. 233–237. Retrieved from https://www.isca-speech.org/archive/archive_papers/interspeech_2014/i14_0233.pdf

He, L., & Dellwo, V. (2016). The role of syllable intensity in between-speaker rhythmic variability. International Journal of Speech, Language and the Law, 23, 243–273. https://doi.org/10.1558/ijsll.v23i2.30345

He, L., & Dellwo, V. (2017). Between-speaker variability in temporal organizations of intensity contours. The Journal of the Acoustical Society of America, 141, EL488–EL494. https://doi.org/10.1121/1.4983398

Kelly, F., Drygajlo, A., & Harte, N. (2012). Speaker verification with long-term ageing data. In Proceedings of the 5th IAPR International Conference on Biometrics (ICB), pp. 478–483. https://doi.org/10.1109/ICB.2012.6199796

Kisler, T., Reichel, U. D., & Schiel, F. (2017). Multilingual processing of speech via web services. Computer Speech & Language, 45(C), 326–347. https://doi.org/10.1016/j.csl.2017.01.005

Leemann, A., Kolly, M.-J., & Dellwo, V. (2014). Speaker-individuality in suprasegmental temporal features: Implications for forensic voice comparison. Forensic Science International, 238, 59–67. https://doi.org/10.1016/j.forsciint.2014.02.019

Linville, S. E. (2004). The aging voice. The American Speech-Language-Hearing Association (ASHA) Leader, pp. 12–21.

Liss, J. M., LeGendre, S., & Lotto, A. J. (2010). Discriminating dysarthria type from envelope modulation spectra. Journal of Speech, Language, and Hearing Research, 53, 1246–1255. https://doi.org/10.1044/1092-4388(2010/09-0121)

Liss, J. M., White, L., Mattys, S. L., Lansford, K., Lotto, A. J., Spitzer, S. M., & Caviness, J. N. (2009). Quantifying speech rhythm abnormalities in the dysarthrias. Journal of Speech Language and Hearing Research, 52, 1334–1352. https://doi.org/10.1044/1092-4388(2009/08-0208)

Mann, C. J. (2003). Observational research methods. Research design II: Cohort, cross sectional, and case-control studies. Emergency Medicine Journal, 20, 54–60. https://doi.org/10.1136/emj.20.1.54

Mefferd, A. S., & Corder, E. E. (2014). Assessing articulatory speed performance as a potential factor of slowed speech in older adults. Journal of Speech, Language, and Hearing Research, 57, 347–360. https://doi.org/10.1044/2014_JSLHR-S-12-0261

Niebuhr, O., Voße, J., & Brem, A. (2016). What makes a charismatic speaker? A Computer-based acoustic-prosodic analysis of Steve Jobs tone of voice. Computers in Human Behavior, 64, 366–382. https://doi.org/10.1016/j.chb.2016.06.059

Payne, E., Post, B., Astruc, L., Prieto, P., & Vanrell, M. d. M. (2009). Rhythmic modification in child directed speech. Oxford University Working Papers in Linguistics, Philology & Phonetics, 12, 123–144.

Payne, E., Post, B., Astruc, L., Prieto, P., & Vanrell, M. d. M. (2011). Measuring child rhythm. Language and Speech, 55, 1–27.

Pellegrino, E., He, L., & Dellwo, V. (2018). The effect of ageing on speech rhythm: A study on Zurich German. In Proceedings of the 9th International Conference on Speech Prosody, pp. 133–137. https://doi.org/10.21437/SpeechProsody.2018-27

Pettorino, M., & Pellegrino, E. (2014). Age and rhythmic variations: A study on Italian. In Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH-2014), pp. 1234–1237. Retrieved from https://www.isca-speech.org/archive/archive_papers/interspeech_2014/i14_1234.pdf

Pettorino, M., Pellegrino, E., & Maffia, M. (2014). ‘Young’ and ‘old’ voices: The prosodic auto-transplantation technique for speaker’s age recognition. In Proceedings of the 7th International Conference on Speech Prosody, pp. 135–139.

Polyanskaya, L., & Ordin, M. (2015). Acquisition of speech rhythm in first language. The Journal of the Acoustical Society of America, 138, EL199–EL204. https://doi.org/10.1121/1.4929616

Prieto, P., Vanrell, M. d. M., Astruc, L., Payne, E., & Post, B. (2012). Phonotactic and phrasal properties of speech rhythm. Evidence from Catalan, English, and Spanish. Speech Communication, 54, 681–702. https://doi.org/10.1016/j.specom.2011.12.001

Ramig, L. A., & Ringel, R. L. (1983). Effects of physiological aging on selected acoustic characteristics of voice. Journal of Speech and Hearing Research, 26, 22–30. https://doi.org/10.1044/jshr.2601.22

Ramus, F., Nespor, M., & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73, 265–292. https://doi.org/10.1016/S0010-0277(99)00058-X

Schaie, K. W., & Hofer, S. M. (2001). Longitudinal studies in aging research. In J. E. Birren & K. W. Schaie (Eds.), Handbook of the psychology of aging (5th ed., pp. 53–77). San Diego: Academic Press.

Schiel, F. (1999). Automatic phonetic transcription of non-prompted speech. In Proceedings of the 14th International Congress of Phonetic Sciences (ICPhS-14), pp. 607–610. Retrieved from https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS1999/papers/p14_0607.pdf

Tilsen, S., & Arvaniti, A. (2013). Speech rhythm analysis with decomposition of the amplitude envelope: Characterizing rhythmic patterns within and across languages. The Journal of the Acoustical Society of America, 134, 628–639. https://doi.org/10.1121/1.4807565

United Nations (2017). The World Population Prospects: the 2017 Revision (Report ESA/P/WP/248). New York: United Nations. Retrieved from https://esa.un.org/unpd/wpp/publications/files/wpp2017_keyfindings.pdf

Vipperla, R., Renals, S., & Frankel, J. (2010). Ageing voices: The effect of changes in voice parameters on ASR performance. EURASIP Journal on Audio, Speech, and Music Processing, 525783.

White, L., & Mattys, S. L. (2007). Calibrating rhythm: First language and second language studies. Journal of Phonetics, 35, 501–522. https://doi.org/10.1016/j.wocn.2007.02.003

Wiget, L., White, L., Schuppler, B., Grenon, I., Rauch, O., & Mattys, S. L. (2010). How stable are acoustic metrics of contrastive speech rhythm? The Journal of the Acoustical Society of America, 127, 1559–1569. https://doi.org/10.1121/1.3293004

Yoon, T.-J. (2010). Capturing inter-speaker invariance using statistical measures of speech rhythm. In Proceedings of the 5th International Conference on Speech Prosody, pp. 1–4.

APPENDIX A

List and description of the rhythmic metrics based on the durational properties of consonantal (C) and vocalic (V) intervals and on syllable intensity variability.

Metrics Description
%V Percent of utterance duration composed of vocalic intervals
%Vn Percent of utterance duration composed of vocalic intervals, normalized for number of vocalic intervals
deltaV Standard deviation of the duration of vocalic intervals
deltaC Standard deviation of the duration of consonantal intervals
nPVI_V Normalized pairwise variability index for vocalic intervals. Mean of the absolute durational differences between successive vocalic intervals, each divided by the mean duration of the two intervals (× 100)
nPVI_C Normalized pairwise variability index for consonantal intervals. Mean of the absolute durational differences between successive consonantal intervals, each divided by the mean duration of the two intervals (× 100)
stdev-I Standard deviation of syllable mean intensity
PVI-I Pairwise variability index for mean syllable intensity. Average intensity differences between successive syllables divided by their sum (× 100)
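As an illustration of how the durational metrics in the table can be computed, the following is a minimal Python sketch, not the procedure used in this study. It assumes (these are not specified in the paper) that a speech sample has already been segmented into consonantal and vocalic intervals, given as hypothetical (label, duration in seconds) pairs with pauses removed, that deltaV, deltaC and stdev-I use sample standard deviations, and that nPVI is computed over the sequence of V (or C) intervals alone, following Grabe and Low (2002). %Vn and PVI-I are left out because their exact normalization is not fully specified here.

```python
# Minimal sketch of some Appendix A metrics. Assumptions (not from the paper):
# intervals are (label, duration_s) pairs labelled "V" or "C", pauses are
# already removed, and sample standard deviations are used.
from statistics import mean, stdev


def durations(intervals, label):
    """Durations of all intervals carrying the given label, in order."""
    return [d for lab, d in intervals if lab == label]


def percent_v(intervals):
    """%V: vocalic duration as a percentage of total (V + C) duration."""
    total = sum(d for _, d in intervals)
    return 100.0 * sum(durations(intervals, "V")) / total


def delta(intervals, label):
    """deltaV / deltaC: standard deviation of V or C interval durations."""
    return stdev(durations(intervals, label))


def npvi(values):
    """nPVI (Grabe & Low, 2002): mean absolute difference between successive
    values, each divided by the mean of the pair, multiplied by 100."""
    ratios = [abs(a - b) / ((a + b) / 2) for a, b in zip(values, values[1:])]
    return 100.0 * mean(ratios)


def stdev_i(syllable_intensities_db):
    """stdev-I: standard deviation of per-syllable mean intensity (dB)."""
    return stdev(syllable_intensities_db)


if __name__ == "__main__":
    # Toy C/V segmentation (hypothetical values, not data from the talks).
    intervals = [("C", 0.08), ("V", 0.12), ("C", 0.10),
                 ("V", 0.06), ("C", 0.05), ("V", 0.10)]
    print(f"%V     = {percent_v(intervals):.2f}")
    print(f"deltaV = {delta(intervals, 'V'):.4f}")
    print(f"nPVI-V = {npvi(durations(intervals, 'V')):.2f}")
    print(f"nPVI-C = {npvi(durations(intervals, 'C')):.2f}")
```

Dividing each pairwise difference by the pair's mean is what makes the nPVI largely insensitive to overall speech rate, whereas deltaV and deltaC are expressed in seconds and therefore scale with interval durations.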

APPENDIX B

Title and year of public lectures. The recordings are available at Chomsky’s home web page (https://chomsky.info/) and YouTube.

1968: U.S. Interest in Vietnam. Speech delivered at a draft resistance meeting in New York.

1988: Media Lecture. Montclair State University.

1998: On the World Economy. Democracy Now.

2008: Lectures on Modern-Day American Imperialism: Middle East and Beyond. Boston University School of Law and the Boston University Anti-War Coalition.

2017: Starr Forum: Racing to the Precipice: Global Climate, Political Climate. MIT Center for International Studies.