Characterizing speech rhythm using spectral coherence between jaw displacement and speech temporal envelope
DOI: https://doi.org/10.3989/loquens.2020.074
Keywords: speech rhythm, spectral coherence, temporal envelope, jaw displacement
Abstract
Low modulation rates in the temporal envelope (ENV) of the acoustic signal are believed to form the rhythmic backbone of speech, facilitating comprehension through neuronal entrainment at δ and θ rates (phonetically comparable to foot and syllable rates, respectively). The jaw acts as a carrier articulator that regulates mouth opening in a quasi-cyclical way, and these low-frequency modulations are a physical consequence of its oscillation. This paper describes a method that uses spectral coherence to examine the joint roles of jaw oscillation and ENV in realizing speech rhythm. Relative powers in the coherence frequency bands corresponding to the δ and θ oscillations (notated %δ and %θ, respectively) were quantified as one possible way of revealing the amount of concomitant foot- and syllable-level rhythmicity carried by both the acoustic and articulatory domains. Two English corpora (mngu0 and MOCHA-TIMIT) were used as a proof of concept. For an initial analysis, %δ and %θ were regressed on utterance duration. Results showed that the degrees of foot- and syllable-sized rhythmicity differ and are contingent on utterance length.
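To make the pipeline concrete, the sketch below shows one possible way to compute the spectral coherence between the temporal envelope and vertical jaw displacement and to summarize it as band fractions analogous to %δ and %θ. This is not the authors' code: the sampling rates, the 10 Hz envelope low-pass cutoff, the band edges (0.5–4 Hz for δ, 4–10 Hz for θ), and the use of SciPy's Welch-based coherence estimator are all assumptions made for illustration.

```python
"""Illustrative sketch (assumed parameters, not the authors' implementation):
coherence between the speech temporal envelope (ENV) and jaw displacement,
summarized as relative coherence power in assumed delta and theta bands."""
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, resample_poly, coherence

FS_AUDIO = 16_000   # assumed audio sampling rate (Hz)
FS_EMA = 200        # assumed EMA (jaw) sampling rate (Hz)

def temporal_envelope(audio, fs=FS_AUDIO, cutoff=10.0):
    """Wide-band envelope: Hilbert magnitude, low-pass filtered below `cutoff` Hz."""
    env = np.abs(hilbert(audio))
    b, a = butter(4, cutoff, btype="low", fs=fs)
    return filtfilt(b, a, env)

def band_fraction(f, cxy, lo, hi):
    """Share of total coherence power falling inside [lo, hi) Hz."""
    band = (f >= lo) & (f < hi)
    return cxy[band].sum() / cxy.sum()

def delta_theta_coherence(audio, jaw_y):
    """Return (%delta, %theta)-like fractions for one utterance.

    `audio` is the acoustic waveform, `jaw_y` the vertical jaw displacement
    trace; band edges (0.5-4 Hz, 4-10 Hz) are placeholders."""
    env = temporal_envelope(audio)
    # Downsample the envelope to the EMA rate so both signals share a timebase.
    env_ds = resample_poly(env, up=FS_EMA, down=FS_AUDIO)
    n = min(len(env_ds), len(jaw_y))
    f, cxy = coherence(env_ds[:n], jaw_y[:n], fs=FS_EMA,
                       nperseg=min(n, 2 * FS_EMA))
    return band_fraction(f, cxy, 0.5, 4.0), band_fraction(f, cxy, 4.0, 10.0)
```

The per-utterance %δ and %θ values obtained this way could then be regressed on utterance duration (e.g., with a mixed-effects model such as lme4, as cited below), which is the kind of initial analysis the abstract describes.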
References
Abercrombie, D. (1967). Elements of General Phonetics. Edinburgh: Edinburgh University Press.
Barbosa, P. A. (2002). Explaining cross-linguistic rhythmic variability via a coupled-oscillator model of rhythm production. In B. Bel & I. Marlien (Eds.), Proceedings of Speech Prosody 2002 (pp. 163-166). Aix-en-Provence, France: Laboratoire Parole et Langage, SProSIG.
Bates, D., Mächler, M., Bolker, B. M., & Walker, S. C. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1-48. https://doi.org/10.18637/jss.v067.i01
Bertrán, A. P. (1999). Prosodic typology: On the dichotomy between stress-timed and syllable-timed languages. Language Design, 2, 103-131.
Boucher, V. J., Gilbert, A. C., & Jemel, B. (2019). The role of low-frequency neural oscillations in speech processing: Revising delta entrainment. Journal of Cognitive Neuroscience, 31(8), 1205-1215. https://doi.org/10.1162/jocn_a_01410 PMid:30990387
Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A., & Ghazanfar, A. A. (2009). The natural statistics of audiovisual speech. PLoS Computational Biology, 5(7), e1000436. https://doi.org/10.1371/journal.pcbi.1000436 PMid:19609344 PMCid:PMC2700967
Cichocki, W., Selouani, S.-A., & Perreault, Y. (2014). Measuring rhythm in dialects of New Brunswick French: Is there a role for intensity? Canadian Acoustics · Acoustique Canadienne, 42(3), 90-91.
Cohen, M. X. (2017). Matlab for Brain and Cognitive Scientists. Cambridge, MA: MIT Press.
Cummins, F., & Port, R. (1998). Rhythmic constraints on stress timing in English. Journal of Phonetics, 26(2), 145-171. https://doi.org/10.1006/jpho.1998.0070
Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11, 51-62. https://doi.org/10.1016/S0095-4470(19)30776-4
Dellwo, V. (2006). Rhythm and speech rate: A variation coefficient for ΔC. In P. Karnowski & I. Szigeti (Eds.), Sprache und Sprachverarbeitung - Language and language-processing (Linguistik International 15, pp. 231-241). Frankfurt a/M: Peter Lang.
Dellwo, V. (2009). Choosing the right rate normalization method for measurements of speech rhythm. In S. Schmid, M. Schwarzenbach & D. Studer (Eds.), La dimensione temporale del parlato: Atti del 5° Convegno Nazionale AISV 2009 (pp. 13-32). Torriana: EDK Editore.
Doelling, K. B., Arnal, L. H., Ghitza, O., & Poeppel, D. (2014). Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing. NeuroImage, 85(2), 761-768. https://doi.org/10.1016/j.neuroimage.2013.06.035 PMid:23791839 PMCid:PMC3839250
Erickson, D., & Kawahara, S. (2016). Articulatory correlates of metrical structure: Studying jaw displacement patterns. Linguistics Vanguard, 2(1), 20150025. https://doi.org/10.1515/lingvan-2015-0025
Erickson, D., Suemitsu, A., Shibuya, Y., & Tiede, M. (2012). Metrical structure and production of English rhythm. Phonetica, 69(3), 180-190. https://doi.org/10.1159/000342417 PMid:23258465
Eriksson, A. (1991). Aspects of Swedish Speech Rhythm (Gothenburg monographs in linguistics 9). Gothenburg: University of Gothenburg Dissertation.
Fuchs, R. (2016). Speech Rhythm in Varieties of English. Singapore: Springer. https://doi.org/10.1007/978-3-662-47818-9
Ghazanfar, A. A., Chandrasekaran, C., & Morrill, R. J. (2010). Dynamic, rhythmic facial expressions and the superior temporal sulcus of macaque monkeys: Implications for the evolution of audiovisual speech. European Journal of Neuroscience, 31(10), 1807-1817. https://doi.org/10.1111/j.1460-9568.2010.07209.x PMid:20584185 PMCid:PMC2898901
Ghitza, O. (2017). Acoustic-driven delta rhythms as prosodic markers. Language, Cognition and Neuroscience, 32(5), 545-561. https://doi.org/10.1080/23273798.2016.1232419
Giraud, A.-L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience, 15(4), 511-517. https://doi.org/10.1038/nn.3063 PMid:22426255 PMCid:PMC4461038
Grabe, E., & Low, E. L. (2002). Durational variability in speech and the rhythm class hypothesis. In C. Gussenhoven & N. Warner (Eds.), Laboratory Phonology 7 (pp. 515-543). Berlin & New York: Mouton de Gruyter. https://doi.org/10.1515/9783110197105.515
He, L. (2012). Syllabic intensity variations as quantification of speech rhythm: Evidence from both L1 and L2. In Q. Ma, H. Ding & D. Hirst (Eds.), Proceedings of Speech Prosody 2012 (pp. 466-469). Shanghai, China.
He, L. (2018). Development of speech rhythm in first language: The role of syllable intensity variability. Journal of the Acoustical Society of America, 143(6), EL463-EL467. https://doi.org/10.1121/1.5042083 PMid:29960429
He, L., & Dellwo, V. (2016). The role of syllable intensity in between-speaker rhythmic variability. International Journal of Speech, Language and the Law, 23(2), 243-273. https://doi.org/10.1558/ijsll.v23i2.30345
Huang, T., & Erickson, D. (2019). Articulation of English "prominence" by L1 (English) and L2 (French) speakers. In S. Calhoun, P. Escudero, M. Tabain & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS-19), paper 134, Melbourne, Australia.
Jones, D. (1922). An Outline of English Phonetics. New York: G. E. Stechert & Co.
Lancia, L., Krasovitsky, G., & Stuntebeck, F. (2019). Coordinative patterns underlying cross-linguistic rhythmic differences. Journal of Phonetics, 72, 66-80. https://doi.org/10.1016/j.wocn.2018.08.004
Lee, C. S., & Todd, N. P. M. (2004). Towards an auditory account of speech rhythm: Application of a model of the auditory "primal sketch" to two multi-language corpora. Cognition, 93(3), 225-254. https://doi.org/10.1016/j.cognition.2003.10.012 PMid:15178378
Leong, V., Stone, M. A., Turner, R. E., & Goswami, U. (2014). A role for amplitude modulation phase relationships in speech rhythm perception. Journal of the Acoustical Society of America, 136(1), 366-381. https://doi.org/10.1121/1.4883366 PMid:24993221
Liberman, M., & Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8(2), 249-336.
Lloyd James, A. (1940). Speech Signals in Telephony. London: Sir I. Pitman.
Lykartsis, A., & Lerch, A. (2015). Beat histogram features for rhythm-based musical genre classification using multiple novelty functions. In P. Svensson & U. Kristiansen (Eds.), Proceedings of the 18th International Conference on Digital Audio Effects (DAFx), paper 42. Trondheim, Norway: Department of Music and Department of Electronics and Telecommunications. Norwegian University of Science and Technology.
MacNeilage, P. F. (1998). The frame/content theory of evolution of speech production. Behavioral and Brain Sciences, 21(4), 499-511. https://doi.org/10.1017/S0140525X98001265 PMid:10097020
Morrill, R. J., Paukner, A., Ferrari, P. F., & Ghazanfar, A. A. (2012). Monkey lipsmacking develops like the human speech rhythm. Developmental Science, 15(4), 557-568. https://doi.org/10.1111/j.1467-7687.2012.01149.x PMid:22709404 PMCid:PMC3383808
Nespor, M., & Vogel, I. (1986). Prosodic Phonology. Dordrecht: Foris.
O'Dell, M. L., & Nieminen, T. (1999). Coupled oscillator model of speech rhythm. In J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. C. Bailey (Eds.), Proceedings of the 14th International Congress of Phonetic Sciences (ICPhS-14), pp. 1075-1078. San Francisco, California: University of California.
Park, H., Kayser, C., Thut, G., & Gross, J. (2016). Lip movements entrain the observers' low-frequency brain oscillations to facilitate speech intelligibility. eLife, 5, e14521. https://doi.org/10.7554/eLife.14521.018
Pike, K. L. (1945). The Intonation of American English. Ann Arbor: University of Michigan Press.
Poeppel, D., & Assaneo, M. F. (2020). Speech rhythms and their neural foundations. Nature Reviews Neuroscience, 21(6), 322-334. https://doi.org/10.1038/s41583-020-0304-4 PMid:32376899
Ramus, F., Nespor, M., & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73(3), 265-292. https://doi.org/10.1016/S0010-0277(99)00058-X
Richmond, K. (2009). Preliminary inversion mapping results with a new EMA corpus. In Proceedings of the 10th Annual Conference of the International Speech Communication Association - INTERSPEECH 2009 (pp. 2835-2838). Brighton, UK: ISCA Archive, http://www.isca-speech.org/archive/interspeech_2009. https://doi.org/10.21437/Interspeech.2009-724
Richmond, K., Hoole, P., & King, S. (2011). Announcing the electromagnetic articulography (Day 1) subset of the mngu0 articulatory corpus. In P. Cosi, R. De Mori, G. Di Fabbrizio, & R. Pieraccini (Eds.), Proceedings of the 12th Annual Conference of the International Speech Communication Association - INTERSPEECH 2011 (pp. 1505-1508). Florence, Italy: ISCA Archive, http://www.isca-speech.org/archive/interspeech_2011 https://doi.org/10.21437/Interspeech.2011-316
Roach, P. (1982). On the distinction between "stress-timed" and "syllable-timed" languages. In D. Crystal (Ed.), Linguistic Controversies: Essays in Linguistic Theory and Practice in Honour of F. R. Palmer (pp. 73-79). London: Edwards Arnold.
Selkirk, E. O. (1980). The role of prosodic categories in English word stress. Linguistic Inquiry, 11(3), 563-605.
Strauß, A., & Schwartz, J-L. (2017). The syllable in the light of motor skills and neural oscillations. Language, Cognition and Neuroscience, 32(5), 562-569. https://doi.org/10.1080/23273798.2016.1253852
Tilsen, S., & Arvaniti, A. (2013). Speech rhythm analysis with decomposition of the amplitude envelope: Characterizing rhythmic patterns within and across languages. Journal of the Acoustical Society of America, 134(1), 628-639. https://doi.org/10.1121/1.4807565 PMid:23862837
Tilsen, S., & Johnson, K. (2008). Low-frequency Fourier analysis of speech rhythm. Journal of the Acoustical Society of America, 124(2), EL34-EL39. https://doi.org/10.1121/1.2947626 PMid:18681499 PMCid:PMC5570052
Wenk, B. J., & Wioland, F. (1982). Is French really syllable-timed? Journal of Phonetics, 10(2), 193-216. https://doi.org/10.1016/S0095-4470(19)30957-X
Wrench, A. (1999). MOCHA MultiCHannel Articulatory database: English (MOCHA-TIMIT). http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html (accessed 25 December 2018).
License
Copyright (c) 2021 Consejo Superior de Investigaciones Científicas (CSIC)

This work is licensed under a Creative Commons Attribution 4.0 International License.
© CSIC. Manuscripts published in both the print and online versions of this journal are the property of the Consejo Superior de Investigaciones Científicas, and quoting this source is a requirement for any partial or full reproduction.
All contents of this electronic edition, except where otherwise noted, are distributed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. You may read the basic information and the legal text of the licence. The indication of the CC BY 4.0 licence must be expressly stated in this way when necessary.
Self-archiving in repositories, personal webpages or similar, of any version other than the final version of the work produced by the publisher, is not allowed.
Funding data
Universität Zürich
Grant numbers FK-19-069; FK-20-078
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
Grant numbers P2ZHP1_178109