Automatic speaker recognition of Spanish siblings: (monozygotic and dizygotic) twins and non-twin brothers

Eugenia San Segundo; Hermann Künzel

doi:10.3989/loquens.2015.021

Authors

Eugenia San Segundo Department of Linguistics and Language Science, University of York
Hermann Künzel Department of Phonetics, University of Marburg

Keywords:

forensic phonetics, twins, siblings, automatic speaker recognition, Spanish

Abstract

The performance of the automatic speaker recognition (ASR) system Batvox™ (Version 4.1) was tested with a male population of 24 monozygotic (MZ) twins, 10 dizygotic (DZ) twins, 8 non-twin siblings and 12 unrelated speakers (aged 18–52, with Standard Peninsular Spanish as their mother tongue). Since the cepstral features on which this ASR system is based depend largely on anatomical–physiological foundations, we hypothesized that such features ought to be gene-dependent. Therefore, higher similarity values should be found in MZ twins (100% shared genes) than in DZ twins, in brothers (B) or in a reference population of unrelated speakers (US). Results corroborated the expected decreasing scale MZ > DZ > B > US: the similarity coefficients yielded by the automatic system decreased in the same direction as the degree of kinship across the four speaker groups. This suggests that the system features are to a great extent genetically conditioned and that they are hence useful and robust for comparing speech samples of known and unknown origin, as found in legal cases. Furthermore, the 9.9% EER (Equal Error Rate) obtained when testing MZ pairs is close to the 11% EER found in Künzel (2010) with German twins.
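For readers unfamiliar with the EER figure quoted above, the following minimal sketch illustrates how an equal error rate can be estimated from two lists of similarity scores. It is not the procedure implemented in Batvox, which is not described here. The function name `equal_error_rate` and the example scores are hypothetical and chosen only for illustration. It also assumes that higher scores mean greater similarity.

```python
def equal_error_rate(same_speaker_scores, different_speaker_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    Assumption: a comparison is accepted as "same speaker" when its score
    is greater than or equal to the threshold.
    Returns (eer, threshold) at the point where the false acceptance rate
    and the false rejection rate are closest to each other.
    """
    thresholds = sorted(set(same_speaker_scores) | set(different_speaker_scores))
    best = None
    for t in thresholds:
        # False rejection: a same-speaker comparison scoring below the threshold
        frr = sum(s < t for s in same_speaker_scores) / len(same_speaker_scores)
        # False acceptance: a different-speaker comparison scoring at or above it
        far = sum(s >= t for s in different_speaker_scores) / len(different_speaker_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, invented for illustration (not data from the study)
same = [0.9, 0.8, 0.7, 0.4]   # comparisons of a speaker with himself
diff = [0.6, 0.3, 0.2, 0.1]   # comparisons between different speakers (e.g. twin pairs)
eer, threshold = equal_error_rate(same, diff)
print(f"EER = {eer:.0%} at threshold {threshold}")  # EER = 25% at threshold 0.6
```

In a twin study of this kind, the different-speaker list would hold the scores obtained when one twin is compared with his co-twin, so a higher EER for MZ pairs than for unrelated speakers reflects how often the system confuses the two brothers.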

References

Agnitio Voice Biometrics (2013). Batvox 4.1 Basic User Manual [Computer software].

Ariyaeeinia, A., Morrison, C., Malegaonkar, A., & Black, S. (2008). A test of the effectiveness of speaker verification for differentiating between identical twins. Science & Justice, 48(4), 182–186. http://dx.doi.org/10.1016/j.scijus.2008.02.002 PMid:19192680

Bimbot, F., Bonastre, J.-F., Fredouille, C., Gravier, G., Magrin-Chagnolleau, I., Meignier, S., . . . & Reynolds, D. A. (2004). A tutorial on text-independent speaker verification. EURASIP Journal on Advances in Signal Processing, 4, 1–22. http://dx.doi.org/10.1155/s1110865704310024

Brümmer, N., & du Preez, J. (2006). Application-independent evaluation of speaker detection. Computer Speech & Language, 20(2–3), 230–275. http://dx.doi.org/10.1016/j.csl.2005.08.001

Campbell, W. M., Campbell, J. P., Reynolds, D. A., Singer, E., & Torres-Carrasquillo, P. A. (2006). Support vector machines for speaker and language recognition. Computer Speech & Language, 20(2–3), 210–229. http://dx.doi.org/10.1016/j.csl.2005.06.003

Charlet, D., & Lecha, V. P. (2007). Voice biometrics within the family: Trust, privacy and personalisation. In J. Filipe, H. Coelhas, & M. Saramago (Eds.), E-business and telecommunication networks: Second International Conference, ICETE 2005, Vol. 3 (pp. 93–100). Berlin: Springer. http://dx.doi.org/10.1007/978-3-540-75993-5_8

Debruyne, F., Decoster, W., Van Gijsel, A., & Vercammen, J. (2002). Speaking fundamental frequency in monozygotic and dizygotic twins. Journal of Voice, 16(4), 466–471. http://dx.doi.org/10.1016/S0892-1997(02)00121-2

Del Abril Alonso, Á., Ambrosio Flores, E., de Blas Calleja, M. d. R., Caminero Gómez, Á., García Lecumberri, C., & de Pablo González, J. M. (2009). Fundamentos de psicobiología. Madrid: Sanz y Torres.

Doddington, G., Liggett, W., Martin, A., Przybocki, M., & Reynolds, D. (1998). SHEEP, GOATS, LAMBS and WOLVES: A statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation. Proceedings of the International Conference on Spoken Language (ICSLP '98), paper 0608.

Drygajlo, A. (2007). Forensic automatic speaker recognition [Exploratory DSP]. IEEE Signal Processing Magazine, 24(2), 132–135. http://dx.doi.org/10.1109/MSP.2007.323278

Feiser, H. S. (2009). Acoustic similarities and differences in the voices of same-sex siblings. Paper presented at the 18th Annual Conference of the International Association for Forensic Phonetics and Acoustics (IAFPA), Cambridge, UK. PMid:19633830

Felson, J. (2014). What can we learn from twin studies? A comprehensive evaluation of the equal environments assumption. Social Science Research, 43, 184–199. http://dx.doi.org/10.1016/j.ssresearch.2013.10.004 PMid:24267761

Forrai, G., & Gordos, G. (1983). A new acoustic method for the discrimination of monozygotic and dizygotic twins. Acta paediatrica Academiae Scientiarum Hungarica, 24(4), 315–322.

Foulkes, P., & French, J. P. (2012). Forensic speaker comparison: A linguistic–acoustic perspective. In P. Tiersma & L. M. Solan (Eds.), Oxford handbook of language and law (pp. 557–572). Oxford: Oxford University Press. http://dx.doi.org/10.1093/oxfordhb/9780199572120.013.0041

Galton, F. (1875). The history of twins, as a criterion of the relative powers of nature and nurture (Rev. ed.). Journal of the Anthropological Institute of Great Britain and Ireland, 5, 391–406.

Giles, H., Coupland, J., & Coupland, N. (1991). Contexts of accommodation: Developments in applied sociolinguistics. Cambridge: Cambridge University Press. http://dx.doi.org/10.1017/CBO9780511663673

Gómez-Vilda, P., Fernández-Baillo, R., Nieto, A., Díaz, F., Fernández-Camacho, F. J., Rodellar, V., . . . & Martínez, R. (2007). Evaluation of voice pathology based on the estimation of vocal fold biomechanical parameters. Journal of Voice, 21(4), 450–476. http://dx.doi.org/10.1016/j.jvoice.2006.01.008 PMid:16549321

Gonzalez-Rodriguez, J., Fierrez-Aguilar, J., & Ortega-Garcia, J. (2003). Forensic identification reporting using automatic speaker recognition systems. Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), 2, 93–96. http://dx.doi.org/10.1109/icassp.2003.1202302

Gonzalez-Rodriguez, J., Rose, P., Ramos, D., Toledano, D. T., & Ortega-Garcia, J. (2007). Emulating DNA: Rigorous quantification of evidential weight in transparent and testable forensic speaker recognition. IEEE Transactions on Audio, Speech, and Language Processing, 15(7), 2104–2115. http://dx.doi.org/10.1109/TASL.2007.902747

Homayounpour, M. M., & Chollet, G. (1995). Discrimination of voices of twins and siblings for speaker verification. In Proceedings of the 4th European Conference on Speech Communication and Technology (EUROSPEECH 1995), 345–348.

Jain, A. K., Prabhakar, S., & Pankanti, S. (2002). On the similarity of identical twin fingerprints. Pattern Recognition, 35(11), 2653–2663. http://dx.doi.org/10.1016/S0031-3203(01)00218-7

Jessen, M. (2008). Forensic phonetics. Language and Linguistics Compass, 2(4), 671–711. http://dx.doi.org/10.1111/j.1749-818X.2008.00066.x

Kenny, P., Boulianne, G., Ouellet, P., & Dumouchel, P. (2005). Factor analysis simplified. Proceedings of the 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 1, 637–640. http://dx.doi.org/10.1109/ICASSP.2005.1415194

Kim, K. (2010). Automatic speaker identification of Korean male twins. Paper presented at the 19th Annual Conference of the International Association for Forensic Phonetics and Acoustics (IAFPA), Trier.

Kinnunen, T., & Li, H. (2010). An overview of text-independent speaker recognition: From features to supervectors. Speech Communication, 52(1), 12–40. http://dx.doi.org/10.1016/j.specom.2009.08.009

Kong, A. W. K., Zhang, D., & Lu, G. (2006). A study of identical twins' palmprints for personal verification. Pattern Recognition, 39(11), 2149–2156. http://dx.doi.org/10.1016/j.patcog.2006.04.035

Künzel, H. J. (1994). Current approaches to forensic speaker recognition. In Proceedings of the ESCA Workshop on Automatic Speaker Recognition, Identification, and Verification, 135–141.

Künzel, H. J. (2010). Automatic speaker recognition of identical twins. International Journal of Speech, Language and the Law, 17(2), 251–277.

Künzel, H. J., & Alexander, P. (2014). Forensic automatic speaker recognition with degraded and enhanced speech. Journal of the Audio Engineering Society, 62(4), 244–253. http://dx.doi.org/10.17743/jaes.2014.0014

Labov, W. (1972). The transformation of experience in the narrative syntax. In W. Labov, Language in the inner city: Studies in the Black English Vernacular (pp. 354–396). Philadelphia, PA: University of Pennsylvania Press.

Loakes, D. (2006). A forensic phonetic investigation into the speech patterns of identical and non-identical twins (Doctoral dissertation). University of Melbourne.

Martino, D., Loke, Y. J., Gordon, L., Ollikainen, M., Cruickshank, M. N., Saffery, R., & Craig, J. M. (2013). Longitudinal, genome-scale analysis of DNA methylation in twins from birth to 18 months of age reveals rapid epigenetic change in early life and pair-specific effects of discordance. Genome Biology, 14(5), R42. http://dx.doi.org/10.1186/gb-2013-14-5-r42 PMid:23697701 PMCid:PMC4054827

Meuwly, D. (2001). Reconnaissance de locuteurs en sciences forensiques: l'apport d'une approche automatique (PhD dissertation). University of Lausanne.

Morrison, G. S. (2010). Forensic voice comparison. In I. Freckelton & H. Selby (Eds.), Expert evidence (Chapter 99). Sydney: Thomson Reuters.

Morrison, G. S., & Kinoshita, Y. (2008). Automatic-type calibration of traditionally derived likelihood ratios: Forensic analysis of Australian English /o/ formant trajectories. Proceedings of the 9th INTERSPEECH Conference, 1501–1504.

Nolan, F. (1983). The phonetic bases of speaker recognition. Cambridge: Cambridge University Press.

Nolan, F. (1997). Speaker recognition and forensic phonetics. In W. J. Hardcastle & J. Laver (Eds.), The handbook of phonetic sciences (pp. 744–767). Oxford: Blackwell.

Nolan, F., & Oh, T. (1996). Identical twins, different voices. International Journal of Speech Language and the Law, 3(1), 39–49. http://dx.doi.org/10.1558/ijsll.v3i1.39

Pardo, J. S. (2006). On phonetic convergence during conversational interaction. The Journal of the Acoustical Society of America, 119(4), 2382–2393. http://dx.doi.org/10.1121/1.2178720 PMid:16642851

Philips, T. (2008). The role of methylation in gene expression. Nature Education, 1(1), 116.

Pickering, M. J., & Garrod, S. (2004). Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences, 27(2), 169–190. http://dx.doi.org/10.1017/S0140525X04000056 PMid:15595235

Przybocki, M. A., Martin, A. F., & Le, A. N. (2007). NIST speaker recognition evaluations utilizing the Mixer corpora—2004, 2005, 2006. IEEE Transactions on Audio, Speech, and Language Processing, 15(7), 1951–1959. http://dx.doi.org/10.1109/TASL.2007.902489

Przybyla, B. D., Horii, Y., & Crawford, M. H. (1992). Vocal fundamental frequency in a twin sample: Looking for a genetic effect. Journal of Voice, 6(3), 261–266. http://dx.doi.org/10.1016/S0892-1997(05)80151-1

Ramos, D. (2007). Forensic evaluation of the evidence using automatic speaker recognition systems (Doctoral dissertation). Universidad Autónoma de Madrid. Retrieved from http://hdl.handle.net/10486/1774.

Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10(1–3), 19–41. http://dx.doi.org/10.1006/dspr.1999.0361

Reynolds, D. A., & Rose, R. C. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3(1), 72–83. http://dx.doi.org/10.1109/89.365379

Rose, P. (2002). Forensic speaker identification. London: Taylor & Francis. http://dx.doi.org/10.1201/9780203166369

Rose, P. (2006). Technical forensic speaker recognition: Evaluation, types and testing of evidence. Computer Speech & Language, 20(2–3), 159–191. http://dx.doi.org/10.1016/j.csl.2005.07.003

San Segundo, E. (2010a). Parametric representations of the formant trajectories of Spanish vocalic sequences for likelihood-ratio-based forensic voice comparison. The Journal of the Acoustical Society of America, 128(4), 2394. http://dx.doi.org/10.1121/1.3508586

San Segundo, E. (2010b). Variación inter e intralocutor: Parámetros acústicos segmentales que caracterizan fonéticamente a tres hermanos. Interlingüística, 21, 352–363.

San Segundo, E. (2012). Glottal source parameters for forensic voice comparison: An approach to voice quality in twins' voices. Paper presented at the 21st Annual Conference of the International Association for Forensic Phonetics and Acoustics (IAFPA), Santander, Spain.

San Segundo, E. (2013a). Guess who is laughing: A perceptual experiment on twin and non-twin siblings' identification. Paper presented at the 31st International Conference AESLA (Asociación Española de Lingüística Aplicada). San Cristóbal de La Laguna: Universidad de La Laguna.

San Segundo, E. (2013b). A phonetic corpus of Spanish male twins and siblings: Corpus design and forensic application. Procedia - Social and Behavioral Sciences, 95, 59–67. http://dx.doi.org/10.1016/j.sbspro.2013.10.622

San Segundo, E. (2014). Forensic speaker comparison of Spanish twins and non-twin siblings: A phonetic-acoustic analysis of formant trajectories in vocalic sequences, glottal source parameters and cepstral characteristics (PhD thesis). Consejo Superior de Investigaciones Científicas-Universidad Internacional Menéndez Pelayo, Spain.

San Segundo, E. (2015). Forensic speaker comparison of Spanish twins and non-twin siblings: A phonetic-acoustic analysis of formant trajectories in vocalic sequences, glottal source parameters and cepstral characteristics (Thesis abstract). International Journal of Speech Language and the Law, 22(2), 249–253. http://dx.doi.org/10.1558/ijsll.v22i2.28821

San Segundo, E., & Gómez-Vilda, P. (2013). Voice biometrical match of twin and non-twin siblings. In C. Manfredi (Ed.), Models and analysis of vocal emissions for biomedical applications: 8th International Workshop, Firenze, Italy, 2013 (pp. 253–256). Retrieved from http://digital.casalini.it/9788866554707.

San Segundo, E., & Gómez-Vilda, P. (2015). Evaluating the forensic importance of glottal source features through the voice analysis of twins and non-twin siblings. Language and Law/Linguagem e Direito, 1(2), 22–41.

Sataloff, R. T. (1995). Genetics of the voice. Journal of Voice, 9(1), 16–19. http://dx.doi.org/10.1016/S0892-1997(05)80218-8

Scheffer, N., Bonastre, J.-F., Ghio, A., & Teston, B. (2004). Gémellité et reconnaissance automatique du locuteur. Actes des XXV Journées d'Étude sur la Parole (JEP), 445–448.

Segal, N. L. (1993). Implications of twin research for legal issues involving young twins. Law and Human Behavior, 17(1), 43–58. http://dx.doi.org/10.1007/BF01044536

Srihari, S., Huang, C., & Srinivasan, H. (2008). On the discriminability of the handwriting of twins. Journal of Forensic Sciences, 53(2), 430–446. http://dx.doi.org/10.1111/j.1556-4029.2008.00682.x PMid:18366576

Stromswold, K. (2006). Why aren't identical twins linguistically identical? Genetic, prenatal and postnatal factors. Cognition, 101(2), 333–384. http://dx.doi.org/10.1016/j.cognition.2006.04.007 PMid:16797523

Tomblin, J. B., & Buckwalter, P. P. (1998). Heritability of poor language achievement among twins. Journal of Speech, Language, and Hearing Research, 41, 188–189. http://dx.doi.org/10.1044/jslhr.4101.188

Trouvain, J., & Truong, K. P. (2012). Convergence of laughter in conversational speech: Effects of quantity, temporal alignment and imitation. Paper presented at the International Symposium on Imitation and Convergence in Speech, Aix-en-Provence, France. PMCid:PMC3382493

van Leeuwen, D. A., & Brümmer, N. (2007). An introduction to application-independent evaluation of speaker recognition systems. In C. Müller (Ed.), Speaker classification I: Fundamentals, features, and methods (pp. 330–353). Heidelberg: Springer-Verlag. http://dx.doi.org/10.1007/978-3-540-74200-5_19 PMid:17498209

Van Lierde, K. M., Vinck, B., De Ley, S., Clement, G., & Van Cauwenberge, P. (2005). Genetics of vocal quality characteristics in monozygotic twins: A multiparameter approach. Journal of Voice, 19(4), 511–518. http://dx.doi.org/10.1016/j.jvoice.2004.10.005 PMid:16301097

Weirich, M., & Lancia, L. (2011). Perceived auditory similarity and its acoustic correlates in twins and unrelated speakers. In Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS 17-Hong Kong), 2118–2121.

Wolf, J. J. (1972). Efficient acoustic parameters for speaker recognition. The Journal of the Acoustical Society of America, 51(6B), 2044–2056. http://dx.doi.org/10.1121/1.1913065