Characterizing speech rhythm using spectral coherence between jaw displacement and speech temporal envelope
DOI:
https://doi.org/10.3989/loquens.2020.074
Keywords:
speech rhythm, spectral coherence, temporal envelope, jaw displacement
Abstract
The lower modulation frequencies in the temporal envelope (ENV) of the acoustic signal are thought to form the rhythmic backbone of speech, facilitating comprehension through neural entrainment in the δ and θ ranges (phonetically, these ranges are comparable to the foot and syllable rates). The jaw functions as an articulator that regulates mouth opening in a quasi-cyclic manner, which corresponds, as a physical consequence, to the low-frequency modulations. This paper describes a method for examining the joint role of jaw oscillation and ENV in the production of speech rhythm using spectral coherence. The relative powers in the frequency bands of the coherence corresponding to the δ and θ oscillations (denoted %δ and %θ, respectively) were quantified as a possible way of revealing how much concomitant foot-level and syllable-level rhythmicity the acoustic and articulatory domains carry. To test this idea, two English corpora (mngu0 and MOCHA-TIMIT) were analyzed. As a first analysis, %δ and %θ were regressed on utterance duration. The results showed that the degrees of foot-level and syllable-level rhythmicity differ and depend on utterance length.
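The quantities %δ and %θ described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the band edges (1-4 Hz for δ, 4-8 Hz for θ), the normalisation range (above 0 Hz, up to 10 Hz), the Welch-style coherence estimate, and the helper names `welch_coherence` and `relative_band_power` are all assumptions introduced here, since the abstract specifies none of them. Synthetic signals stand in for the jaw trajectory and for ENV; extracting the envelope from real audio is outside the scope of the sketch.

```python
import numpy as np

# Assumed band edges in Hz (conventional values, not taken from the abstract).
DELTA_BAND = (1.0, 4.0)
THETA_BAND = (4.0, 8.0)
TOTAL_BAND = (0.0, 10.0)  # assumed normalisation range; the DC bin is excluded


def welch_coherence(x, y, fs, nperseg):
    """Magnitude-squared coherence from Hann-windowed, 50%-overlapping segments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    window = np.hanning(nperseg)
    step = nperseg // 2
    n_bins = nperseg // 2 + 1
    pxx = np.zeros(n_bins)
    pyy = np.zeros(n_bins)
    pxy = np.zeros(n_bins, dtype=complex)
    for start in range(0, len(x) - nperseg + 1, step):
        xs = x[start:start + nperseg]
        ys = y[start:start + nperseg]
        # Remove each segment's mean, apply the window, take the one-sided FFT.
        fx = np.fft.rfft((xs - xs.mean()) * window)
        fy = np.fft.rfft((ys - ys.mean()) * window)
        pxx += np.abs(fx) ** 2
        pyy += np.abs(fy) ** 2
        pxy += fx * np.conj(fy)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    coh = np.abs(pxy) ** 2 / np.maximum(pxx * pyy, np.finfo(float).tiny)
    return freqs, coh


def relative_band_power(freqs, coh, band, total=TOTAL_BAND):
    """Percentage of summed coherence inside `band`, relative to `total`."""
    in_band = (freqs >= band[0]) & (freqs < band[1])
    in_total = (freqs > total[0]) & (freqs <= total[1])
    return 100.0 * coh[in_band].sum() / coh[in_total].sum()


# Synthetic stand-ins: both signals share a 5 Hz (syllable-rate) component
# with a phase lag, plus independent noise.
rng = np.random.default_rng(0)
fs = 100.0
t = np.arange(0.0, 60.0, 1.0 / fs)
jaw = np.sin(2 * np.pi * 5.0 * t) + 0.5 * rng.standard_normal(t.size)
env = np.sin(2 * np.pi * 5.0 * t + 0.8) + 0.5 * rng.standard_normal(t.size)

freqs, coh = welch_coherence(jaw, env, fs, nperseg=400)
pct_delta = relative_band_power(freqs, coh, DELTA_BAND)
pct_theta = relative_band_power(freqs, coh, THETA_BAND)
print(f"%delta = {pct_delta:.1f}, %theta = {pct_theta:.1f}")
```

Because the shared component sits at 5 Hz, most of the coherence falls in the assumed θ band. With real data, the segment length limits the frequency resolution, so short utterances leave few bins in the δ range, which is one reason such measures may depend on utterance duration.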
License
Copyright 2021 Consejo Superior de Investigaciones Científicas (CSIC)

This work is licensed under a Creative Commons Attribution 4.0 International License.
© CSIC. Manuscripts published in the print and online editions of this journal are the property of the Consejo Superior de Investigaciones Científicas, and the source must be cited in any partial or full reproduction.
Unless otherwise indicated, all contents of the electronic edition are distributed under a "Creative Commons Attribution 4.0 International" (CC BY 4.0) license. See the informative version and the legal text of the license. This must be expressly stated in this way when necessary.
Depositing any version other than the one published by the publisher in repositories, personal web pages, or similar sites is not permitted.
Funding data
Universität Zürich
Grant numbers FK-19-069; FK-20-078
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
Grant number P2ZHP1_178109