Characterizing speech rhythm using spectral coherence between jaw displacement and speech temporal envelope
DOI: https://doi.org/10.3989/loquens.2020.074
Keywords: speech rhythm, spectral coherence, temporal envelope, jaw displacement
Abstract
Low modulation rates in the temporal envelope (ENV) of the acoustic signal are believed to form the rhythmic backbone of speech, facilitating comprehension through neuronal entrainment at δ and θ rates (phonetically comparable to foot and syllable rates, respectively). The jaw acts as a carrier articulator that regulates mouth opening in a quasi-cyclical way, and these low-frequency modulations are a physical consequence of its oscillation. This paper describes a method that uses spectral coherence to examine the joint roles of jaw oscillation and ENV in realizing speech rhythm. Relative powers in the coherence frequency bands corresponding to the δ and θ oscillations (notated %δ and %θ, respectively) were quantified as one possible way of revealing the amount of concomitant foot- and syllable-level rhythmicity carried by both the acoustic and articulatory domains. Two English corpora (mngu0 and MOCHA-TIMIT) were used as a proof of concept. For an initial analysis, %δ and %θ were regressed on utterance duration. Results showed that the degrees of foot- and syllable-sized rhythmicity differ and are contingent on utterance length.
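To make the pipeline concrete, the sketch below shows one possible way to compute the spectral coherence between the temporal envelope and vertical jaw displacement and to summarize it as band fractions analogous to %δ and %θ. This is not the authors' code: the sampling rates, the 10 Hz envelope low-pass cutoff, the band edges (0.5–4 Hz for δ, 4–10 Hz for θ), and the use of SciPy's Welch-based coherence estimator are all assumptions made for illustration.

```python
"""Illustrative sketch (assumed parameters, not the authors' implementation):
coherence between the speech temporal envelope (ENV) and jaw displacement,
summarized as relative coherence power in assumed delta and theta bands."""
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, resample_poly, coherence

FS_AUDIO = 16_000   # assumed audio sampling rate (Hz)
FS_EMA = 200        # assumed EMA (jaw) sampling rate (Hz)

def temporal_envelope(audio, fs=FS_AUDIO, cutoff=10.0):
    """Wide-band envelope: Hilbert magnitude, low-pass filtered below `cutoff` Hz."""
    env = np.abs(hilbert(audio))
    b, a = butter(4, cutoff, btype="low", fs=fs)
    return filtfilt(b, a, env)

def band_fraction(f, cxy, lo, hi):
    """Share of total coherence power falling inside [lo, hi) Hz."""
    band = (f >= lo) & (f < hi)
    return cxy[band].sum() / cxy.sum()

def delta_theta_coherence(audio, jaw_y):
    """Return (%delta, %theta)-like fractions for one utterance.

    `audio` is the acoustic waveform, `jaw_y` the vertical jaw displacement
    trace; band edges (0.5-4 Hz, 4-10 Hz) are placeholders."""
    env = temporal_envelope(audio)
    # Downsample the envelope to the EMA rate so both signals share a timebase.
    env_ds = resample_poly(env, up=FS_EMA, down=FS_AUDIO)
    n = min(len(env_ds), len(jaw_y))
    f, cxy = coherence(env_ds[:n], jaw_y[:n], fs=FS_EMA,
                       nperseg=min(n, 2 * FS_EMA))
    return band_fraction(f, cxy, 0.5, 4.0), band_fraction(f, cxy, 4.0, 10.0)
```

The per-utterance %δ and %θ values obtained this way could then be regressed on utterance duration (e.g., with a mixed-effects model such as lme4, as cited below), which is the kind of initial analysis the abstract describes.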
References
Abercrombie, D. (1967). Elements of General Phonetics. Edinburgh: Edinburgh University Press.
Barbosa, P. A. (2002). Explaining cross-linguistic rhythmic variability via a coupled-oscillator model of rhythm production. In B. Bel & I. Marlien (Eds.), Proceedings of Speech Prosody 2002 (pp. 163-166). Aix-en-Provence, France: Laboratoire Parole et Langage, SProSIG.
Bates, D., Mächler, M., Bolker, B. M., & Walker, S. C. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1-48. https://doi.org/10.18637/jss.v067.i01
Bertrán, A. P. (1999). Prosodic typology: On the dichotomy between stress-timed and syllable-timed languages. Language Design, 2, 103-131.
Boucher, V. J., Gilbert, A. C., & Jemel, B. (2019). The role of low-frequency neural oscillations in speech processing: Revising delta entrainment. Journal of Cognitive Neuroscience, 31(8), 1205-1215. https://doi.org/10.1162/jocn_a_01410 PMid:30990387
Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A., & Ghazanfar, A. A. (2009). The natural statistics of audiovisual speech. PLoS Computational Biology, 5(7), e1000436. https://doi.org/10.1371/journal.pcbi.1000436 PMid:19609344 PMCid:PMC2700967
Cichocki, W., Selouani, S.-A., & Perreault, Y. (2014). Measuring rhythm in dialects of New Brunswick French: Is there a role for intensity? Canadian Acoustics · Acoustique Canadienne, 42(3), 90-91.
Cohen, M. X. (2017). Matlab for Brain and Cognitive Scientists. Cambridge, MA: MIT Press.
Cummins, F., & Port, R. (1998). Rhythmic constraints on stress timing in English. Journal of Phonetics, 26(2), 145-171. https://doi.org/10.1006/jpho.1998.0070
Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11, 51-62. https://doi.org/10.1016/S0095-4470(19)30776-4
Dellwo, V. (2006). Rhythm and speech rate: A variation coefficient for ΔC. In P. Karnowski & I. Szigeti (Eds.), Sprache und Sprachverarbeitung - Language and language-processing (Linguistik International 15, pp. 231-241). Frankfurt a/M: Peter Lang.
Dellwo, V. (2009). Choosing the right rate normalization method for measurements of speech rhythm. In S. Schmid, M. Schwarzenbach & D. Studer (Eds.), La dimensione temporale del parlato: Atti del 5° Convegno Nazionale AISV 2009 (pp. 13-32). Torriana: EDK Editore.
Doelling, K. B., Arnal, L. H., Ghitza, O., & Poeppel, D. (2014). Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing. NeuroImage, 85(2), 761-768. https://doi.org/10.1016/j.neuroimage.2013.06.035 PMid:23791839 PMCid:PMC3839250
Erickson, D., & Kawahara, S. (2016). Articulatory correlates of metrical structure: Studying jaw displacement patterns. Linguistics Vanguard, 2(1), 20150025. https://doi.org/10.1515/lingvan-2015-0025
Erickson, D., Suemitsu, A., Shibuya, Y., & Tiede, M. (2012). Metrical structure and production of English rhythm. Phonetica, 69(3), 180-190. https://doi.org/10.1159/000342417 PMid:23258465
Eriksson, A. (1991). Aspects of Swedish Speech Rhythm (Gothenburg monographs in linguistics 9). Gothenburg: University of Gothenburg Dissertation.
Fuchs, R. (2016). Speech Rhythm in Varieties of English. Singapore: Springer. https://doi.org/10.1007/978-3-662-47818-9
Ghazanfar, A. A., Chandrasekaran, C., & Morrill, R. J. (2010). Dynamic, rhythmic facial expressions and the superior temporal sulcus of macaque monkeys: Implications for the evolution of audiovisual speech. European Journal of Neuroscience, 31(10), 1807-1817. https://doi.org/10.1111/j.1460-9568.2010.07209.x PMid:20584185 PMCid:PMC2898901
Ghitza, O. (2017). Acoustic-driven delta rhythms as prosodic markers. Language, Cognition and Neuroscience, 32(5), 545-561. https://doi.org/10.1080/23273798.2016.1232419
Giraud, A.-L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience, 15(4), 511-517. https://doi.org/10.1038/nn.3063 PMid:22426255 PMCid:PMC4461038
Grabe, E., & Low, E. L. (2002). Durational variability in speech and the rhythm class hypothesis. In C. Gussenhoven & N. Warner (Eds.), Laboratory Phonology 7 (pp. 515-543). Berlin & New York: Mouton de Gruyter. https://doi.org/10.1515/9783110197105.515
He, L. (2012). Syllabic intensity variations as quantification of speech rhythm: Evidence from both L1 and L2. In Q. Ma, H. Ding & D. Hirst (Eds.), Proceedings of Speech Prosody 2012 (pp. 466-469). Shanghai, China.
He, L. (2018). Development of speech rhythm in first language: The role of syllable intensity variability. Journal of the Acoustical Society of America, 143(6), EL463-EL467. https://doi.org/10.1121/1.5042083 PMid:29960429
He, L., & Dellwo, V. (2016). The role of syllable intensity in between-speaker rhythmic variability. International Journal of Speech, Language and the Law, 23(2), 243-273. https://doi.org/10.1558/ijsll.v23i2.30345
Huang, T., & Erickson, D. (2019). Articulation of English "prominence" by L1 (English) and L2 (French) speakers. In S. Calhoun, P. Escudero, M. Tabain & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS-19), paper 134, Melbourne, Australia.
Jones, D. (1922). An Outline of English Phonetics. New York: G. E. Stechert & Co.
Lancia, L., Krasovitsky, G., & Stuntebeck, F. (2019). Coordinative patterns underlying cross-linguistic rhythmic differences. Journal of Phonetics, 72, 66-80. https://doi.org/10.1016/j.wocn.2018.08.004
Lee, C. S., & Todd, N. P. M. (2004). Towards an auditory account of speech rhythm: Application of a model of the auditory "primal sketch" to two multi-language corpora. Cognition, 93(3), 225-254. https://doi.org/10.1016/j.cognition.2003.10.012 PMid:15178378
Leong, V., Stone, M. A., Turner, R. E., & Goswami, U. (2014). A role for amplitude modulation phase relationships in speech rhythm perception. Journal of the Acoustical Society of America, 136(1), 366-381. https://doi.org/10.1121/1.4883366 PMid:24993221
Liberman, M., & Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8(2), 249-336.
Lloyd James, A. (1940). Speech Signals in Telephony. London: Sir I. Pitman.
Lykartsis, A., & Lerch, A. (2015). Beat histogram features for rhythm-based musical genre classification using multiple novelty functions. In P. Svensson & U. Kristiansen (Eds.), Proceedings of the 18th International Conference on Digital Audio Effects (DAFx), paper 42. Trondheim, Norway: Department of Music and Department of Electronics and Telecommunications. Norwegian University of Science and Technology.
MacNeilage, P. F. (1998). The frame/content theory of evolution of speech production. Behavioral and Brain Sciences, 21(4), 499-511. https://doi.org/10.1017/S0140525X98001265 PMid:10097020
Morrill, R. J., Paukner, A., Ferrari, P. F., & Ghazanfar, A. A. (2012). Monkey lipsmacking develops like the human speech rhythm. Developmental Science, 15(4), 557-568. https://doi.org/10.1111/j.1467-7687.2012.01149.x PMid:22709404 PMCid:PMC3383808
Nespor, M., & Vogel, I. (1986). Prosodic Phonology. Dordrecht: Foris.
O'Dell, M. L., & Nieminen, T. (1999). Coupled oscillator model of speech rhythm. In J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. C. Bailey (Eds.), Proceedings of the 14th International Congress of Phonetic Sciences (ICPhS-14), pp. 1075-1078. San Francisco, California: University of California.
Park, H., Kayser, C., Thut, G., & Gross, J. (2016). Lip movements entrain the observers' low-frequency brain oscillations to facilitate speech intelligibility. eLife, 5, e14521. https://doi.org/10.7554/eLife.14521.018
Pike, K. L. (1945). The Intonation of American English. Ann Arbor: University of Michigan Press.
Poeppel, D., & Assaneo, M. F. (2020). Speech rhythms and their neural foundations. Nature Reviews Neuroscience, 21(6), 322-334. https://doi.org/10.1038/s41583-020-0304-4 PMid:32376899
Ramus, F., Nespor, M., & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73(3), 265-292. https://doi.org/10.1016/S0010-0277(99)00058-X
Richmond, K. (2009). Preliminary inversion mapping results with a new EMA corpus. In Proceedings of the 10th Annual Conference of the International Speech Communication Association - INTERSPEECH 2009 (pp. 2835-2838). Brighton, UK: ISCA Archive, http://www.isca-speech.org/archive/interspeech_2009. https://doi.org/10.21437/Interspeech.2009-724
Richmond, K., Hoole, P., & King, S. (2011). Announcing the electromagnetic articulography (Day 1) subset of the mngu0 articulatory corpus. In P. Cosi, R. De Mori, G. Di Fabbrizio, & R. Pieraccini (Eds.), Proceedings of the 12th Annual Conference of the International Speech Communication Association - INTERSPEECH 2011 (pp. 1505-1508). Florence, Italy: ISCA Archive, http://www.isca-speech.org/archive/interspeech_2011 https://doi.org/10.21437/Interspeech.2011-316
Roach, P. (1982). On the distinction between "stress-timed" and "syllable-timed" languages. In D. Crystal (Ed.), Linguistic Controversies: Essays in Linguistic Theory and Practice in Honour of F. R. Palmer (pp. 73-79). London: Edwards Arnold.
Selkirk, E. O. (1980). The role of prosodic categories in English word stress. Linguistic Inquiry, 11(3), 563-605.
Strauß, A., & Schwartz, J-L. (2017). The syllable in the light of motor skills and neural oscillations. Language, Cognition and Neuroscience, 32(5), 562-569. https://doi.org/10.1080/23273798.2016.1253852
Tilsen, S., & Arvaniti, A. (2013). Speech rhythm analysis with decomposition of the amplitude envelope: Characterizing rhythmic patterns within and across languages. Journal of the Acoustical Society of America, 134(1), 628-639. https://doi.org/10.1121/1.4807565 PMid:23862837
Tilsen, S., & Johnson, K. (2008). Low-frequency Fourier analysis of speech rhythm. Journal of the Acoustical Society of America, 124(2), EL34-EL39. https://doi.org/10.1121/1.2947626 PMid:18681499 PMCid:PMC5570052
Wenk, B. J., & Wioland, F. (1982). Is French really syllable-timed? Journal of Phonetics, 10(2), 193-216. https://doi.org/10.1016/S0095-4470(19)30957-X
Wrench, A. (1999). MOCHA MultiCHannel Articulatory database: English (MOCHA-TIMIT). http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html (accessed 25 December 2018).
License
Copyright (c) 2021 Consejo Superior de Investigaciones Científicas (CSIC)

This work is licensed under a Creative Commons Attribution 4.0 International License.
© CSIC. Manuscripts published in both the print and online versions of this journal are the property of the Consejo Superior de Investigaciones Científicas, and quoting this source is a requirement for any partial or full reproduction.
All contents of this electronic edition, except where otherwise noted, are distributed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. You may read the basic information and the legal text of the licence. The indication of the CC BY 4.0 licence must be expressly stated in this way when necessary.
Self-archiving in repositories, personal webpages or similar, of any version other than the final version of the work produced by the publisher, is not allowed.
Funding data
Universität Zürich
Grant numbers FK-19-069; FK-20-078
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
Grant numbers P2ZHP1_178109