Toward a unified theory of voice production and perception

Jody Kreiman; Bruce R. Gerratt; Marc Garellek; Robin Samlan; Zhaoyan Zhang

doi:10.3989/loquens.2014.009

Authors

Jody Kreiman Bureau of Glottal Affairs, Department of Head and Neck Surgery, UCLA School of Medicine, Los Angeles, CA USA
Bruce R. Gerratt Bureau of Glottal Affairs, Department of Head and Neck Surgery, UCLA School of Medicine, Los Angeles, CA USA
Marc Garellek Department of Linguistics, UC San Diego, San Diego, CA USA
Robin Samlan Department of Speech, Language, & Hearing Sciences, University of Arizona, Tucson, AZ USA
Zhaoyan Zhang Bureau of Glottal Affairs, Department of Head and Neck Surgery, UCLA School of Medicine, Los Angeles, CA USA

Keywords:

voice quality, voice production, modeling, synthesis, acoustics

Abstract

At present, two important questions about voice remain unanswered: When voice quality changes, what physiological alteration caused this change, and if a change to the voice production system occurs, what change in perceived quality can be expected? We argue that these questions can only be answered by an integrated model of voice linking production and perception, and we describe steps towards the development of such a model. Preliminary evidence in support of this approach is also presented. We conclude that development of such a model should be a priority for scientists interested in voice, to explain what physical condition(s) might underlie a given voice quality, or what voice quality might result from a specific physical configuration.
