Characterizing speech rhythm using spectral coherence between jaw displacement and speech temporal envelope




speech rhythm, spectral coherence, temporal envelope, jaw displacement


Lower modulation rates in the temporal envelope (ENV) of the acoustic signal are believed to form the rhythmic backbone of speech, facilitating speech comprehension through neuronal entrainment at δ- and θ-rates (these rates are comparable to the foot and syllable rates in phonetic terms). The jaw acts as a carrier articulator that regulates mouth opening in a quasi-cyclical way, which corresponds to the low-frequency modulations as a physical consequence. This paper describes a method for examining the joint roles of jaw oscillation and ENV in realizing speech rhythm using spectral coherence. Relative powers in the frequency bands of the coherence corresponding to the δ- and θ-oscillations (notated as %δ and %θ, respectively) were quantified as one possible way of revealing the amount of concomitant foot- and syllable-level rhythmicity carried by both the acoustic and articulatory domains. Two English corpora (mngu0 and MOCHA-TIMIT) were used as a proof of concept. For an initial analysis, %δ and %θ were regressed on utterance duration. Results showed that the degrees of foot- and syllable-sized rhythmicity differ and are contingent upon utterance length.
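The quantification described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the band edges (DELTA, THETA), the Welch segment length, and the function names coherence_spectrum and relative_band_power are all hypothetical choices made for the example, and the synthetic signals stand in for real jaw-displacement and envelope tracks, which would first have to be brought to a common sampling rate.

```python
import numpy as np

# Placeholder band edges in Hz. These are assumptions for illustration only,
# not the values used in the paper.
DELTA = (0.5, 4.0)
THETA = (4.0, 10.0)

def coherence_spectrum(x, y, fs, nperseg):
    """Welch-style magnitude-squared coherence with 50% overlapping Hann segments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    step = nperseg // 2
    n = min(len(x), len(y))
    starts = list(range(0, n - nperseg + 1, step))
    if len(starts) < 2:
        # With a single segment the estimate is trivially 1 at every frequency.
        raise ValueError("need at least two segments to estimate coherence")
    window = np.hanning(nperseg)
    n_bins = nperseg // 2 + 1
    pxx = np.zeros(n_bins)
    pyy = np.zeros(n_bins)
    pxy = np.zeros(n_bins, dtype=complex)
    for s in starts:
        xs = x[s:s + nperseg]
        ys = y[s:s + nperseg]
        # Remove each segment's mean, then taper and transform.
        fx = np.fft.rfft(window * (xs - xs.mean()))
        fy = np.fft.rfft(window * (ys - ys.mean()))
        pxx += np.abs(fx) ** 2
        pyy += np.abs(fy) ** 2
        pxy += np.conj(fx) * fy
    denom = pxx * pyy
    coh = np.divide(np.abs(pxy) ** 2, denom, out=np.zeros(n_bins), where=denom > 0)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, coh

def relative_band_power(freqs, coh, band, total):
    """Percentage of summed coherence in `total` that falls inside `band`."""
    in_band = (freqs >= band[0]) & (freqs < band[1])
    in_total = (freqs >= total[0]) & (freqs < total[1])
    return 100.0 * coh[in_band].sum() / coh[in_total].sum()

# Synthetic stand-ins: both signals share a 2 Hz rhythm plus independent noise.
fs = 100.0
t = np.arange(0, 60, 1.0 / fs)
rng = np.random.default_rng(0)
jaw = np.sin(2 * np.pi * 2.0 * t) + rng.standard_normal(t.size)
env = np.sin(2 * np.pi * 2.0 * t + 0.5) + rng.standard_normal(t.size)

freqs, coh = coherence_spectrum(jaw, env, fs, nperseg=400)
total = (DELTA[0], THETA[1])
pct_delta = relative_band_power(freqs, coh, DELTA, total)
pct_theta = relative_band_power(freqs, coh, THETA, total)
print(f"%delta = {pct_delta:.1f}, %theta = {pct_theta:.1f}")
```

Because the shared rhythm in this toy input lies in the assumed δ band, most of the summed coherence falls there. With real data, the per-utterance %δ and %θ values would be the quantities regressed on utterance duration. SciPy's scipy.signal.coherence provides a comparable Welch-based estimate and could replace the hand-written function.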




Abercrombie, D. (1967). Elements of General Phonetics. Edinburgh: Edinburgh University Press.

Barbosa, P. A. (2002). Explaining cross-linguistic rhythmic variability via a coupled-oscillator model of rhythm production. In B. Bel & I. Marlien (Eds.), Proceedings of Speech Prosody 2002 (pp. 163-166). Aix-en-Provence, France: Laboratoire Parole et Langage, SProSIG.

Bates, D., Mächler, M., Bolker, B. M., & Walker, S. C. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1-48.

Bertrán, A. P. (1999). Prosodic typology: On the dichotomy between stress-timed and syllable-timed languages. Language Design, 2, 103-131.

Boucher, V. J., Gilbert, A. C., & Jemel, B. (2019). The role of low-frequency neural oscillations in speech processing: Revising delta entrainment. Journal of Cognitive Neuroscience, 31(8), 1205-1215. PMid:30990387

Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A., & Ghazanfar, A. A. (2009). The natural statistics of audiovisual speech. PLoS Computational Biology, 5(7), e1000436. PMid:19609344 PMCid:PMC2700967

Cichocki, W., Selouani, S.-A., & Perreault, Y. (2014). Measuring rhythm in dialects of New Brunswick French: Is there a role for intensity? Canadian Acoustics · Acoustique Canadienne, 42(3), 90-91.

Cohen, M. X. (2017). Matlab for Brain and Cognitive Scientists. Cambridge, MA: MIT Press.

Cummins, F., & Port, R. (1998). Rhythmic constraints on stress timing in English. Journal of Phonetics, 26(2), 145-171.

Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11, 51-62.

Dellwo, V. (2006). Rhythm and speech rate: A variation coefficient for ΔC. In P. Karnowski & I. Szigeti (Eds.), Sprache und Sprachverarbeitung - Language and language-processing (Linguistik International 15, pp. 231-241). Frankfurt a/M: Peter Lang.

Dellwo, V. (2009). Choosing the right rate normalization method for measurements of speech rhythm. In S. Schmid, M. Schwarzenbach & D. Studer (Eds.), La dimensione temporale del parlato: Atti del 5° Convegno Nazionale AISV 2009 (pp. 13-32). Torriana: EDK Editore.

Doelling, K. B., Arnal, L. H., Ghitza, O., & Poeppel, D. (2014). Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing. NeuroImage, 85(2), 761-768. PMid:23791839 PMCid:PMC3839250

Erickson, D., & Kawahara, S. (2016). Articulatory correlates of metrical structure: Studying jaw displacement patterns. Linguistics Vanguard, 2(1), 20150025.

Erickson, D., Suemitsu, A., Shibuya, Y., & Tiede, M. (2012). Metrical structure and production of English rhythm. Phonetica, 69(3), 180-190. PMid:23258465

Eriksson, A. (1991). Aspects of Swedish Speech Rhythm (Gothenburg monographs in linguistics 9). Gothenburg: University of Gothenburg Dissertation.

Fuchs, R. (2016). Speech Rhythm in Varieties of English. Singapore: Springer.

Ghazanfar, A. A., Chandrasekaran, C., & Morrill, R. J. (2010). Dynamic, rhythmic facial expressions and the superior temporal sulcus of macaque monkeys: Implications for the evolution of audiovisual speech. European Journal of Neuroscience, 31(10), 1807-1817. PMid:20584185 PMCid:PMC2898901

Ghitza, O. (2017). Acoustic-driven delta rhythms as prosodic markers. Language, Cognition and Neuroscience, 32(5), 545-561.

Giraud, A.-L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience, 15(4), 511-517. PMid:22426255 PMCid:PMC4461038

Grabe, E., & Low, E. L. (2002). Durational variability in speech and the rhythm class hypothesis. In C. Gussenhoven & N. Warner (Eds.), Laboratory Phonology 7 (pp. 515-543). Berlin & New York: Mouton de Gruyter.

He, L. (2012). Syllabic intensity variations as quantification of speech rhythm: Evidence from both L1 and L2. In Q. Ma, H. Ding & D. Hirst (Eds.), Proceedings of Speech Prosody 2012 (pp. 466-469). Shanghai, China.

He, L. (2018). Development of speech rhythm in first language: The role of syllable intensity variability. Journal of the Acoustical Society of America, 143(6), EL463-EL467. PMid:29960429

He, L., & Dellwo, V. (2016). The role of syllable intensity in between-speaker rhythmic variability. International Journal of Speech, Language and the Law, 23(2), 243-273.

Huang, T., & Erickson, D. (2019). Articulation of English "prominence" by L1 (English) and L2 (French) speaker. In S. Calhoun, P. Escudero, M. Tabain & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS-19), paper 134, Melbourne, Australia.

Jones, D. (1922). An Outline of English Phonetics. New York: G. E. Stechert & Co.

Lancia, L., Krasovitsky, G., & Stuntebeck, F. (2019). Coordinative patterns underlying cross-linguistic rhythmic differences. Journal of Phonetics, 72, 66-80.

Lee, C. S., & Todd, N. P. M. (2004). Towards an auditory account of speech rhythm: Application of a model of the auditory "primal sketch" to two multi-language corpora. Cognition, 93(3), 225-254. PMid:15178378

Leong, V., Stone, M. A., Turner, R. E., & Goswami, U. (2014). A role for amplitude modulation phase relationships in speech rhythm perception. Journal of the Acoustical Society of America, 136(1), 366-381. PMid:24993221

Liberman, M., & Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8(2), 249-336.

Lloyd James, A. (1940). Speech Signals in Telephony. London: Sir I. Pitman.

Lykartsis, A., & Lerch, A. (2015). Beat histogram features for rhythm-based musical genre classification using multiple novelty functions. In P. Svensson & U. Kristiansen (Eds.), Proceedings of the 18th International Conference on Digital Audio Effects (DAFx), paper 42. Trondheim, Norway: Department of Music and Department of Electronics and Telecommunications. Norwegian University of Science and Technology.

MacNeilage, P. F. (1998). The frame/content theory of evolution of speech production. Behavioral and Brain Sciences, 21(4), 499-511. PMid:10097020

Morrill, R. J., Paukner, A., Ferrari, P. F., & Ghazanfar, A. A. (2012). Monkey lipsmacking develops like the human speech rhythm. Developmental Science, 15(4), 557-568. PMid:22709404 PMCid:PMC3383808

Nespor, M., & Vogel, I. (1986). Prosodic Phonology. Dordrecht: Foris.

O'Dell, M. L., & Nieminen, T. (1999). Coupled oscillator model of speech rhythm. In J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. C. Bailey (Eds.), Proceedings of the 14th International Congress of Phonetic Sciences (ICPhS-14), pp. 1075-1078. San Francisco, California: University of California.

Park, H., Kayser, C., Thut, G., & Gross, J. (2016). Lip movements entrain the observers' low-frequency brain oscillations to facilitate speech intelligibility. eLife, 5, e14521.

Pike, K. L. (1945). The Intonation of American English. Ann Arbor: University of Michigan Press.

Poeppel, D., & Assaneo, M. F. (2020). Speech rhythms and their neural foundations. Nature Reviews Neuroscience, 21(6), 322-334. PMid:32376899

Ramus, F., Nespor, M., & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73(3), 265-292.

Richmond, K. (2009). Preliminary inversion mapping results with a new EMA corpus. In Proceedings of the 10th Annual Conference of the International Speech Communication Association - INTERSPEECH 2009 (pp. 2835-2838). Brighton, UK: ISCA Archive.

Richmond, K., Hoole, P., & King, S. (2011). Announcing the electromagnetic articulography (Day 1) subset of the mngu0 articulatory corpus. In P. Cosi, R. De Mori, G. Di Fabbrizio, & R. Pieraccini (Eds.), Proceedings of the 12th Annual Conference of the International Speech Communication Association - INTERSPEECH 2011 (pp. 1505-1508). Florence, Italy: ISCA Archive.

Roach, P. (1982). On the distinction between "stress-timed" and "syllable-timed" languages. In D. Crystal (Ed.), Linguistic Controversies: Essays in Linguistic Theory and Practice in Honour of F. R. Palmer (pp. 73-79). London: Edward Arnold.

Selkirk, E. O. (1980). The role of prosodic categories in English word stress. Linguistic Inquiry, 11(3), 563-605.

Strauß, A., & Schwartz, J.-L. (2017). The syllable in the light of motor skills and neural oscillations. Language, Cognition and Neuroscience, 32(5), 562-569.

Tilsen, S., & Arvaniti, A. (2013). Speech rhythm analysis with decomposition of the amplitude envelope: Characterizing rhythmic patterns within and across languages. Journal of the Acoustical Society of America, 134(1), 628-639. PMid:23862837

Tilsen, S., & Johnson, K. (2008). Low-frequency Fourier analysis of speech rhythm. Journal of the Acoustical Society of America, 124(2), EL34-EL39. PMid:18681499 PMCid:PMC5570052

Wenk, B. J., & Wioland, F. (1982). Is French really syllable-timed? Journal of Phonetics, 10(2), 193-216.

Wrench, A. (1999). MOCHA MultiCHannel Articulatory database: English (MOCHA-TIMIT). (accessed 25 December 2018).



How to Cite

He, L., & Zhang, Y. (2020). Characterizing speech rhythm using spectral coherence between jaw displacement and speech temporal envelope. Loquens, 7(2), e074.




Funding data

Universität Zürich
Grant numbers FK-19-069; FK-20-078

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
Grant numbers P2ZHP1_178109