Toward a unified theory of voice production and perception
DOI:
https://doi.org/10.3989/loquens.2014.009
Keywords:
voice quality, voice production, modeling, synthesis, acoustics
Abstract
At present, two important questions about voice remain unanswered: When voice quality changes, what physiological alteration caused this change, and if a change to the voice production system occurs, what change in perceived quality can be expected? We argue that these questions can only be answered by an integrated model of voice linking production and perception, and we describe steps towards the development of such a model. Preliminary evidence in support of this approach is also presented. We conclude that development of such a model should be a priority for scientists interested in voice, to explain what physical condition(s) might underlie a given voice quality, or what voice quality might result from a specific physical configuration.
License
Copyright (c) 2014 Consejo Superior de Investigaciones Científicas (CSIC)

This work is licensed under a Creative Commons Attribution 4.0 International License.
© CSIC. Manuscripts published in both the print and online versions of this journal are the property of the Consejo Superior de Investigaciones Científicas, and quoting this source is a requirement for any partial or full reproduction.
All contents of this electronic edition, except where otherwise noted, are distributed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence. You may read the basic information and the legal text of the licence. The indication of the CC BY 4.0 licence must be expressly stated in this way when necessary.
Self-archiving in repositories, personal webpages or similar, of any version other than the final version of the work produced by the publisher, is not allowed.