Loquens 9(1-2)
December 2022, e089
ISSN-L: 2386-2637, eISSN: 2386-2637
https://doi.org/10.3989/loquens.2022.e089

The role of the input frequency in L1 Spanish phonological acquisition. A corpus-based study

El papel de la frecuencia del input en la adquisición de la fonología del español como L1. Estudio basado en corpus

Marta Garrote Salazar

Universidad Autónoma de Madrid

https://orcid.org/0000-0002-5566-9073

ABSTRACT

This study presents the phonological system exhibited by children (n=59) aged 3;0 to 6;0 and focuses on the role of input frequency. Data relating to the phonological level were retrieved from a corpus of spontaneous child speech in Spanish (CHIEDE) using computational processing techniques, including an automatic phonological transcriber. This resulted in a phonological inventory of Spanish-speaking children, ordered by frequency of use, which may serve as a model for research on typical and atypical child language development. Additionally, the stability of the participants’ phonological systems was studied by calculating the variability displayed by the different age groups, and outcomes were compared with other similar corpora. Results obtained from the comparison of the phonological inventories of children and adults show that there is a relationship between frequency of use in adult speech and the order of acquisition of phonemes.

Key Words: 
L1 acquisition; phonological development; input frequency; corpus.
RESUMEN

Este estudio presenta el sistema fonológico que muestran 59 participantes de 3;0 a 6;0 años y el papel que juega la frecuencia del input. Usando como fuente un corpus de habla espontánea (CHIEDE) y técnicas de procesamiento computacional -que incluyen un transcriptor fonológico automático- se extrajeron los datos relativos al nivel fonológico, dando como resultado un inventario fonológico de niños hablantes de español. Este inventario, ordenado por frecuencia de uso, puede servir de modelo para la investigación en desarrollo infantil típico y atípico. Además, se realizó un estudio sobre la estabilidad del sistema fonológico de los participantes, calculando la variabilidad entre los diferentes grupos etarios y comparando resultados con otros corpus similares. Los resultados obtenidos de la comparación del inventario infantil con el adulto muestran una clara relación entre la frecuencia de uso del habla adulta y el orden de adquisición de los fonemas.

Palabras clave: 
Adquisición de primera lengua; desarrollo fonológico; frecuencia del input; corpus.

Received: 01/07/2022; Accepted: 26/10/2022; Published online: 14/06/2023

Citation: Marta Garrote Salazar (2022). The role of the input frequency in L1 Spanish phonological acquisition. A corpus-based study. Loquens, 9(1-2), e089. https://doi.org/10.3989/loquens.2022.e089.

1. INTRODUCTION

 

First language (L1 henceforth) acquisition and development have drawn the attention of researchers for centuries. However, technological developments over the last few decades have brought about a qualitative change in research (Dolgova & Tyler, 2019; Ellis, 2017; Kern et al., 2014; MacWhinney, 1996). The gradual introduction of new technological tools and the adoption of common methodologies and procedures made the design of the first corpora of child language possible. In those corpora, hundreds of speech recordings from children of different ages were transcribed, providing researchers with an invaluable database for the study of child language (Ellis, 2017). Currently, the international corpus of reference is CHILDES (https://childes.talkbank.org/; MacWhinney & Snow, 1985), a multilingual child language corpus that includes samples of Spanish, some of which were used to corroborate the results of this study. With regard to the phonological treatment of corpora in particular, the development of the software PHON (https://www.phon.ca/phon-manual/index.html; Hedlund & Rose, 2020) marked a milestone in the study of child language.

Within the field of L1 acquisition research, the description of phonological development involves four basic concerns (Grunwell, 1981): the great variation from one individual to another; the extension and gradual regularisation of the child’s pronunciation system, which is characterised by unsystematicity; the difficulty of determining the starting point of phonological development; and the need to consider both input and output in the process of description. Grunwell (1981, p. 167) disapproved of the fact that “studies are to discover when children achieve the correct pronunciation of the sounds of their language”. She considered that the question of when speech sounds are learnt was ill posed, owing to factors such as the wide range of individual variation and the fact that a child does not acquire each phoneme separately. Therefore, research on phonological acquisition should focus less on the precise moment at which a child acquires a certain phoneme and more on the search for patterns through the description of large speech samples. “We need models of usage and its effects upon acquisition” (Ellis, 2017, p. 48).

Research on phonological acquisition has largely been aimed at improving research on language disorders. A detailed study of children’s typical linguistic development and the establishment of patterns in language behaviour make it possible to detect atypical phenomena in the development of an individual. According to Ingram (1976), knowledge about patterns of typical language development provides the clues for the treatment of pathologies. Corpus linguistics plays a pertinent role in this regard, since corpora are a rich source of natural language data, both for compiling what Acosta and Ramos (1998) called for, namely a phonological inventory, and for studying the role of input in the acquisition process by examining child-directed speech (CDS) in natural contexts.

The present study is based on CHIEDE (Garrote, 2010), a cross-sectional corpus in which 59 children aged 3;0-6;0 participated. The corpus was recorded, transcribed and subsequently tagged by means of automatic processing techniques (phonological and morphosyntactic tagging software), and then manually checked to correct possible tagging errors. This methodology facilitates the retrieval of linguistically annotated data (parts of speech, morphological and phonological information) in order to quantify linguistic features. It is a descriptive work that follows an observational method based on performance, that is, on external empirical data, and not on competence or experimentation.
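As an illustration of how phonologically annotated data can be quantified, the following minimal Python sketch builds a frequency-ordered phoneme inventory from transcribed words. It is not the tooling used for CHIEDE: the function name `phoneme_inventory`, the representation of each word as a list of phoneme symbols, and the toy data are assumptions made only for this example.

```python
from collections import Counter


def phoneme_inventory(transcriptions):
    """Return (phoneme, count, relative_frequency) tuples, most frequent first.

    `transcriptions` is assumed to be an iterable of words, each word being
    a list of phoneme symbols (a hypothetical input format).
    """
    counts = Counter()
    for word in transcriptions:
        counts.update(word)  # add one occurrence per phoneme symbol
    total = sum(counts.values())
    # Sort by descending count; ties are broken alphabetically so that the
    # ordering is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(phoneme, n, n / total) for phoneme, n in ranked]


# Invented toy data standing in for "casa", "mesa" and "rana".
sample = [
    ["k", "a", "s", "a"],
    ["m", "e", "s", "a"],
    ["r", "a", "n", "a"],
]
inventory = phoneme_inventory(sample)
for phoneme, count, share in inventory:
    print(f"/{phoneme}/\t{count}\t{share:.3f}")
```

Computing one such inventory per age group would make the resulting rankings directly comparable across groups, or with an adult inventory.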

This paper presents a phonological study of L1 Spanish children with the aim of showing the phonological development displayed by the participants. Given the participants’ age, our purpose was not to establish the order of acquisition of phonemes, but to describe the typical phonological development of Spanish-speaking children from 3;0 to 6;0 years old, based on the frequency of occurrence of phonemes (providing a phonological inventory), and to highlight the role of input frequency as a facilitator in the acquisition of phonemes (even those traditionally considered more complex). Three questions are considered: (1) Is the phonological system completely acquired at 3;0? (2) Is 4;0 a turning point in the acquisition process, as many linguistic studies claim (Bosch, 1983; Díez-Itza & Martínez López, 2004; Maratsos, 1974)? And finally, and most importantly, (3) To what extent is input frequency relevant in this process? The goal is to clarify these questions through a review of some of the most significant theories and research, and through the analysis of data from different corpora.

2. PREVIOUS RESEARCH

 

Morphology and syntax are the linguistic levels that research on L1 acquisition has addressed most extensively. Studies on child language have mainly focused on the acquisition of lexical and grammatical structure, to the detriment of phonology, semantics and pragmatics. According to Vihman et al. (2009, p. 164), “The role of phonology in the development of linguistic knowledge is often given short shrift by researchers interested in word learning”. Consequently, phonological studies on acquisition are less frequent (Polo, 2016). Moreover, the vast majority focus on English. Although research has gradually been extended to other languages, it is “heavily biased toward Indo-European languages of Western Europe with the bulk of research still concentrated on English” (Stoll, 2009, p. 89).

One of the pioneering works on phonological development was that of Stampe (1969), for whom the language acquisition process is based upon innate mechanisms that children use to simplify adult words. By means of these mechanisms or processes (unstressed syllable deletion, cluster reduction, merging vowels into /a/), the child goes from what Stampe called a “language-innocent state” to adult production.

Later, Ingram (1976) adopted Stampe’s theory for clinical phonology research. Following the Piagetian stages of cognitive development (Piaget, 1926) and their corresponding linguistic periods, Ingram established a parallelism with the phonological level, thus locating the evolution of the different phonemes and phonological skills at distinct stages, from the sensorimotor stage (0;0-1;6) to the formal operational stage (12;0-16;0).

However, crosslinguistic studies on acquisition beyond the early period (around one year of age) have shown that it is not possible to establish clear stages of development applicable to every language. For instance, Durgunoğlu and Öney (1999, p. 283) examined the “effects of language-specific influences on the development of phonological awareness” and explained how structural phonetic differences among languages entail differences in the child’s development of phonology. Along similar lines, Bleses, Basbøll, Lum and Vach (2010) ranked seven languages by the complexity of their phonetic systems (vowel/consonant ratio) and concluded that the most complex was the Danish phonemic system, followed by Swedish, Dutch, French, (American) English, Galician and Croatian. Bernhardt and Stemberger (2017), comparing typical development with protracted phonological development, showed that in four languages (Mandarin, Arabic, Slovene and European Spanish) the WWM scores for 4-year-old children were 80-85% (85.4% for European Spanish). WWM stands for “whole word match”, that is, the child’s pronunciation equals the adult’s.
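The whole word match measure mentioned above can be sketched as a simple percentage. The snippet below is only an illustration under stated assumptions: the function name `whole_word_match`, the use of exact string equality between target and produced forms, and the example pairs are all hypothetical, and the scoring procedure actually used by Bernhardt and Stemberger (2017) may differ, for instance in how acceptable variants are treated.

```python
def whole_word_match(pairs):
    """Percentage of (target, produced) pairs in which both forms are identical.

    Exact equality of transcription strings is an assumption of this sketch.
    """
    if not pairs:
        raise ValueError("no word pairs supplied")
    matches = sum(1 for target, produced in pairs if target == produced)
    return 100.0 * matches / len(pairs)


# Invented (adult target, child production) pairs in a broad transcription.
pairs = [
    ("kasa", "kasa"),  # match
    ("pero", "pelo"),  # substitution, no match
    ("mesa", "mesa"),  # match
    ("tres", "tes"),   # cluster reduction, no match
]
score = whole_word_match(pairs)
print(f"WWM = {score:.1f}%")
```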

Despite differences across languages, McLeod and Crowe (2018), after reviewing 64 studies involving more than 26,000 children and 27 languages, concluded that 93% of consonants were correctly produced by the age of 5. Similarly, Stoel-Gammon (2006, p. 646) stated that “By the age of 3 years, the level of intelligibility increases to 75%, and by age 4, it is 100%”, meaning that, though not yet adult-like, the child’s phonological system is sufficiently developed to be intelligible.

In Spain, the theories set out first by Stampe and then by Ingram were later introduced by authors such as Bosch (1983) and Díez-Itza (1995). For both researchers, the phonological acquisition period extends from approximately one and a half years old to six or seven years old, with an intermediate division at around four years old (Bosch, 1983). This means that one cannot speak of full control of the complete phonological system until the age of six or seven, when the child masters certain complicated phonemes and their combination in more complex syllables. In spite of that, as mentioned above (Bernhardt & Stemberger, 2017; Stoel-Gammon, 2006), by the age of 4 years intelligibility is complete.

Spanish studies have mostly focused on what Díez-Itza and Martínez López (2004) call periodo temprano ‘early period’, that is, until about three years old. These authors consider it necessary to increase research on the periodo tardío ‘late period’, i.e., from three to six years old. They identified three stages in phonological acquisition: expansión ‘expansion’, the stage until 3;0, characterised by a progressive reduction of phonological processes (such as unstressed syllable deletion, cluster reduction, etc.), after which there would be a standstill; estabilización ‘stabilisation’, from three to four years old, initially defined by a considerable decrease in processes, which increase again at around four years old (showing a U-shaped developmental pattern); and resolución ‘resolution’, from the age of five years onwards, when phonological processes are residual. Díez-Itza and Martínez López’s (2004) intention was to confirm whether the age of four is indeed a universal milestone of transition towards subsequent periods, as has been repeatedly assumed by descriptive studies. In fact, at the age of four years children’s language is characterised, from the standpoint of phonology, by an increased speech rate, which means more coarticulation and the lengthening of utterances and conversational turns (Díez-Itza & Martínez López, 2004). Many scholars agree on a transition point at four years old (Bosch, 1983; Díez-Itza & Martínez López, 2004), not only in phonological acquisition but also at other linguistic levels. For example, Maratsos (1974), analysing the acquisition of the passive structure, concluded that children show a U-shaped developmental pattern at around four years old, as the rate of passive comprehension decreased in comparison with younger children. Garrote (2010) also found that it was at around 4;0 that children produced more non-targeted speech as a consequence of rule overgeneralisation errors.

Bosch (1983), based on studies by Serra (1983) and Melgar de González (1976), summarised the most problematic phonemes during the acquisition of Spanish: the trill /r/, fricatives such as /s/, /θ/ and /x/, and the voiced plosive /d/. She concluded that the most difficult place of articulation is the dento-alveolar area, where a great number of sounds are differentiated only by manner of articulation (Bosch, 1983). López Valero et al. (1989) supported Bosch’s findings, concluding that the sounds acquired late in Spanish are /x/, /f/, /r/ and /θ/.

Other authors, such as Serra (1983), established the following order of acquisition: nasals, plosives, fricatives and, finally, liquids and the alveolar trill.

Two studies related to the present one are worth mentioning here because of their age range (3 to almost 6 years old) and language (Spanish, though the Mexican variety). First, Jiménez (1987) found that, by the age of 5, the 120 children in the sample showed production problems with only two consonants: /s/ and /r/. Second, Acevedo (1993, p. 11) also tested 120 Mexican children. Results showed that sound “mastery occurred by the 4;0-4;5 age group”, with the following consonants remaining problematic: /ɲ/, /g/, /f/, /s/ and /x/. Both studies were based on elicitation tasks, not on spontaneous speech.

Unlike the present study, the most significant works on Spanish phonological acquisition focus on the early period and are crosslinguistic (Bosch & Sebastián-Gallés, 2001, 2003; Bunta & Ingram, 2007; Goldstein & Cintrón, 2001; Kehoe & Lleó, 2003, 2005; Kehoe, Lleó & Rakow, 2005; Lleó, 2002, 2003, 2006). However, the interest here lies in how, once the Spanish phonemes are acquired (late period), the children’s phonological system becomes as stable as the adults’, as observed through frequency of use.

Taking into account previous research and the above-mentioned claims (Acosta & Ramos, 1998; Grunwell, 1981; MacWhinney, 1996, among others), there is a need for a frequency-based phonological analysis of the linguistic performance of children aged 3;0 to 6;0 (late period), using a spontaneous speech corpus as a data source.

2.1. The role of input and frequency

 

Although advocates of Chomskyan nativist theories consider input irrelevant, citing the Poverty of the Stimulus Argument (Chomsky, 1980), later approaches such as connectionist models (Menn & Stoel-Gammon, 1996) give the input a key role in the learning process, considering it the source of empirical knowledge from which children, through statistical processing, acquire language. Indeed, “a number of linguists have recently proposed statistical explanations for patterns of phonological productions” (Rose, 2009, p. 329).

In recent years, the cognitive-functional or usage-based model (Tomasello, 2003) has posited that language emerges as a result of use, from which linguistic patterns arise and grammatical constructions are then consolidated. From a usage-based approach to language acquisition, “children learn linguistic constructions from the conspiracy of experienced exemplars, with abstract syntactic constructions and their associated meanings emerging from the statistical distribution of form-function correspondences in usage” (Ellis, 2017, p. 46).

Zamuner, Gerken and Hammond (2004, p. 1406) based their research on the Specific Language Grammar Hypothesis (SLGH), which states that “language acquisition is best described with respect to the patterns in the input or ambient language”. Thus, children will first acquire those phonemes which are more frequent in their language.
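One way to check the prediction that more frequent phonemes are acquired earlier is a rank correlation between input frequency and order of acquisition. The sketch below computes Spearman's rho using only the standard library. It is an illustrative assumption and not the analysis reported in this paper; the helper names `average_ranks` and `spearman` and all numeric values are invented for the example.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented numbers for five unnamed phonemes: relative frequency in adult
# speech (%) and position in a hypothetical order of acquisition (1 = first).
input_frequency = [12.0, 8.5, 6.0, 2.1, 0.7]
acquisition_order = [1, 2, 3, 4, 5]
rho = spearman(input_frequency, acquisition_order)
print(f"rho = {rho:.2f}")
```

With this coding, a strongly negative rho would be consistent with the hypothesis, since a higher input frequency goes together with an earlier position in the order of acquisition.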

Studies based on frequency and likelihood of occurrence have shed some light on the process of language acquisition (Ellis, 2017; Polo, 2016; Rose, 2009; Zamuner et al., 2004). For example, Lleó (2003), in a crosslinguistic study of German and Spanish, found that coda consonants are acquired earlier in languages where codas and coda clusters are common. The same author concluded some years later that “We now know that babbling results from a combination of unmarked sounds and the most frequent sounds produced around the baby” (Lleó, 2012, p. 693). Likewise, Demuth (2009), after analysing the fact that /t/ (and not voiced /d/) is the first coda consonant acquired by English-speaking children, determined that “although frequency and markedness typically pattern together, children may show a preference for frequency over markedness effects in their early productions” (Demuth, 2009, p. 189). Roark and Demuth (2000) carried out a corpus-based study on the prosodic properties of language. Results showed that “young language learners are sensitive to statistical properties of the input, and this influences the course of language development.” (Roark & Demuth, 2000, p. 599). For a more complete view of the role of input and frequency in child language acquisition, see Kern et al. (2014), who, in a special issue, crosslinguistically analyse the essential function of these two factors in the process of L1 acquisition, covering distinct linguistic levels.

The present research is framed within usage-based phonology (Polo, 2016) and the SLHG (Zamuner et al., 2004), following Ellis’s (2008, p. 95) statement: “language processing is intimately tuned to input frequency and probabilities of mappings at all levels of grain: phonology and phonotactics, reading, spelling, lexis, morphosyntax, formulaic language, language comprehension, grammaticality, sentence production, and syntax. It relies on this prior statistical knowledge”.

Notwithstanding, following Rose (2009, p. 346), “while statistics of the input seem to play a central role in infant speech perception, such statistics appear to be only one of the many factors underlying patterns observed in speech production”. Therefore, a single approach is not enough to account for language acquisition; it is rather one contribution to the general research scenario.

2.2. Contribution of Corpus Linguistics

 

Investigation of language acquisition has traditionally been based on experiments or tests of a logopedic kind rather than on spontaneous speech (see Acevedo, 1993, or Jiménez, 1987, as examples of research describing the phonological development of Mexican Spanish-speaking children ranging in age from 3 to more than 5 years). This may be due to two reasons. On the one hand, such studies tend to focus on speech and language disorders and, therefore, the samples in many cases belong to subjects who show atypical language development. These samples are collected in assessment situations where the context tends to be artificially created. On the other hand, large samples of spontaneous speech are difficult to obtain, which poses a major disadvantage to any investigation: recordings must be arranged and then transcribed. This difficulty is compounded by the challenges of working with children, since it is not only necessary to obtain the permission of parents or guardians, but we must also be particularly respectful of the children’s right to privacy.

Ellis (2017) states that usage-based linguistics is supported by findings from Corpus Linguistics, Cognitive Linguistics, and Psycholinguistics. Along the same lines, Dolgova and Tyler (2019, p. 914) claim that Corpus Linguistics studies are an example of the different existing usage-based models, which “reveals frequency patterns and meanings in natural usage contexts”. These authors call for the use of corpus linguistics in research from a usage-based perspective: “The usage-based research program necessitates extensive analysis both of the usage from which learners learn and of learner usage as it develops” (Ellis, 2017, p. 41), by means of corpora and computational techniques. Nonetheless, Ellis (2017, p. 46) warns about the need for complementary sources of information: “Learner language corpora show what learners say; they do not show what they know. Experimental techniques are needed to probe aspects of knowledge and understanding”.

The use of corpora for assessing phonological development has been extensively promoted by researchers (Demuth, 2009; Dolgova & Tyler, 2019; Ellis, 2017; MacWhinney, 1996; Stoll, 2009, among others) as a complement to tests carried out in artificial contexts in order to observe the production of selected words. The acquisition of a sound is gradual, and its production is maintained for a certain period, fluctuating between the correct form and non-target alternatives until its fossilisation. However, experimental tasks typically use isolated words as a model of production of a certain sound; during tests, which consist of the child repeating a word or group of words after the adult, immediate imitation can lead to a better pronunciation than the child would achieve outside those contexts. Acosta and Ramos (1998) criticised the historically used assessment procedure that focused on isolated words as opposed to the analysis of spontaneous speech samples.

In addition, corpora can be easily managed to retrieve data using automatic or semi-automatic computational tools, which facilitate work and save time. Therefore, corpus linguistics can be either a method in itself or a complement to the traditional approach, especially for describing the most unconscious and spontaneous facet of language.

The main contribution of naturalistic language corpora to the study of language acquisition is providing samples of authentic language in real context, an invaluable source for the study of child language. Spontaneous language corpora are preferable for studying the real use of language in children, on occasion combined with corpora made up of texts obtained by means of elicitation tasks or tests as a supplement to evoke those phenomena that are difficult to find in spontaneous speech, due to low frequency of occurrence or even to avoidance strategies, that is, words children systematically avoid due to pronunciation difficulties (Stoll, 2009).

3. METHODOLOGY

 

3.1. The CHIEDE corpus

 

CHIEDE, a spontaneous child language corpus of Spanish, is made up of approximately 60,000 words. About a third of the corpus consists of child language and the remainder is CDS. The main feature of CHIEDE is the spontaneity of interactions. The corpus is made up of transcribed recordings of communicative situations in their natural context. The recordings were carried out in a medium-sized town in central Spain, where the linguistic variety is Peninsular Spanish. The speakers are monolingual and of middle socioeconomic status, judging by their families’ income and occupation.

The corpus presents two types of interactions: spontaneous collective interactions, recorded during a daily classroom activity in which the whole group of children and the teacher chatted informally; and dialogues, in which an adult talks with a single child. Figure 1 shows the corpus design (Note 4: for further details, consult the web site http://www.lllf.uam.es/ESP/Chiede.html#:~:text=El%20Corpus%20de%20Habla%20Infantil,comunicativas%20en%20su%20contexto%20natural.). Children were grouped according to their year of birth.

Figure 1.  Corpus design.
medium/medium-LOQUENS-9-1-2-e089-gf1.png

CHIEDE contains 58,616 word tokens in 30 text files for a total of 7 hours and 53 minutes of recordings in 30 audio files from n=59 child participants. Table 1 presents figures regarding word tokens, number of utterances, word types and the token/type ratio by age group.

Table 1.  Corpus data: Word tokens, utterances and types by age group
Age group Word tokens Utterances Types Token /type ratio
3;0-3;12 5,628 4,909 985 5.7
4;0-4;12 6,787 5,092 1,155 5.8
5;0-5;12 9,004 5,443 1,450 6.2
Adults 37,197 20,876 2,910 12.7
Total 58,616 36,320
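The token/type ratio in Table 1 can be recomputed from the token and type counts. The following is a minimal sketch, not the procedure used by the author: the function names are hypothetical, and the use of truncation (rather than rounding) to one decimal is only an observation that happens to reproduce the published figures.

```python
import math

# Word tokens and word types per group, copied from Table 1.
TABLE_1 = {
    "3;0-3;12": (5628, 985),
    "4;0-4;12": (6787, 1155),
    "5;0-5;12": (9004, 1450),
    "Adults": (37197, 2910),
}


def token_type_ratio(tokens: int, types: int) -> float:
    """Return the number of word tokens per word type."""
    if types <= 0:
        raise ValueError("types must be positive")
    return tokens / types


def truncate(value: float, decimals: int = 1) -> float:
    """Cut a value to a fixed number of decimals without rounding up.

    Assumption: truncation is used here only because it matches Table 1.
    """
    factor = 10 ** decimals
    return math.floor(value * factor) / factor


ratios = {group: truncate(token_type_ratio(*counts))
          for group, counts in TABLE_1.items()}
total_tokens = sum(tokens for tokens, _ in TABLE_1.values())

for group, ratio in ratios.items():
    print(group, ratio)
print("Total word tokens:", total_tokens)
```

A higher ratio means that each word type is repeated more often, so the ratio also grows with sample size; this should be kept in mind when comparing the adult figure with those of the child groups.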

Since the corpus was going to be published, strict compliance with the current legal framework was required. Consequently, before recording, parents, teachers and participants were duly informed and asked to sign an informed consent form agreeing to participate in the research. Regarding ethical concerns, all names were anonymised and, on occasion, parts of the recordings were cut and discarded due to sensitive information the children gave about their private lives.

The device used to record the corpus was a Sony DAT (Digital Audio Tape), which allows for digital recording of professional quality, with a Sony stereo microphone placed in the most adequate spot to capture the sound. Even so, when recording ambient sound, a certain level of background noise is inevitable; it is impossible to obtain studio sound quality. For this reason, sound editing software (Wavelab, https://www.steinberg.net/es/wavelab/) was used to improve the quality of the recordings.

The topics of conversation were varied, but all of them related to the children’s everyday lives: what they did yesterday or the previous weekend, describing their family, talking about their friends, their pets, or the things they like to do, etc.

Each recording is aligned with its corresponding orthographic transcription, including a header with metadata or sociolinguistic and contextual information. In addition to the audio and the text files, two other kinds of files are included: those with the sound-text alignment by utterances and those in XML format with morphosyntactic annotation. The files are identified with a name in which the age of the child participant is specified.

3.2. Procedure

 

This work was conducted from the perspectives of computational linguistics and corpus linguistics, to assist other disciplines such as phonology and psycholinguistics. The main advantage of working with corpora is to improve and facilitate the empirical work through computational tools that make tasks such as labelling, counting of items and calculation of frequencies faster and more reliable. Undoubtedly, the phonological transcription of a text is a task which requires many working hours. If orthographic transcription already consumes most of the time devoted to the creation of a corpus, phonological transcription would at least double that time. Nowadays, software such as PHON (Hedlund & Rose, 2020) facilitates this task. The present study, however, used the software developed by Moreno Sandoval et al. (2008), which, to simplify, transforms “the orthographical representation of a word to its phonemic transcription based on context-dependent rules” (Moreno Sandoval et al., 2008, p. 1098). The reliability of the automatic phonological transcription was high: only 4% of the words transcribed automatically were found to have a transcription error (either phonemic or syllabic). Even so, a group of linguists had to carry out a second stage of the task (peer review), listening to the audio files, manually correcting the mistakes, and completing those features and nuances absent in an orthographic representation. It must also be clarified that the phonological transcription was a broad one, not a narrow annotation, which would have considerably increased the work. As the children were not too young regarding the language acquisition period, most of them exhibited adult-like speech in phonological terms, and just three children from the 3;0 group had typical (not due to any pathology) pronunciation difficulties, which were carefully annotated (files ADR3.wav, BRU3.wav, and NAT3.wav, and their corresponding ADR3.txt, BRU3.txt, and NAT3.txt files, which can be consulted in the website mentioned in Note 4).
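To illustrate what “context-dependent rules” may look like, the following toy sketch maps Spanish orthography to broad phoneme symbols. It is not the transcriber of Moreno Sandoval et al. (2008): the rule set, the function name and the treatment of consonantal ⟨y⟩ as /ʎ/ are assumptions made only for illustration, and many cases (stress, syllabification, loanwords) are ignored.

```python
# Hypothetical toy rules; NOT the transcriber of Moreno Sandoval et al. (2008).
FRONT = {"e", "i"}                       # vowels that soften <c> and <g>
PLAIN = str.maketrans("áéíóú", "aeiou")  # stress marks are dropped
DIGRAPHS = {"ch": "ʧ", "ll": "ʎ", "rr": "r", "qu": "k"}
SINGLE = {"z": "θ", "j": "x", "ñ": "ɲ", "v": "b", "h": "", "ü": "u"}


def transcribe(word: str) -> list[str]:
    """Map one orthographic word to a list of broad phoneme symbols."""
    w = word.lower().translate(PLAIN)
    out: list[str] = []
    i = 0
    while i < len(w):
        ch, nxt, pair = w[i], w[i + 1:i + 2], w[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
            continue
        if pair == "gu" and w[i + 2:i + 3] in FRONT:
            out.append("g")  # silent <u> in <gue>, <gui>
            i += 2
            continue
        if ch == "c":
            out.append("θ" if nxt in FRONT else "k")
        elif ch == "g":
            out.append("x" if nxt in FRONT else "g")
        elif ch == "r":
            # Trill word-initially and after n, l, s; tap elsewhere.
            out.append("r" if i == 0 or w[i - 1] in "nls" else "ɾ")
        elif ch == "y":
            # Assumption: vowel at word end, otherwise merged with /ʎ/.
            out.append("i" if nxt == "" else "ʎ")
        elif ch == "x":
            out.extend(["k", "s"])
        elif ch in SINGLE:
            if SINGLE[ch]:  # <h> maps to nothing
                out.append(SINGLE[ch])
        else:
            out.append(ch)
        i += 1
    return out


utterance = "la pono encima"
transcription = [transcribe(word) for word in utterance.split()]
print(transcription)
```

The rules only look at the neighbouring letters, which is why a manual revision stage remains necessary for anything the orthography does not encode.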

Finally, to be faithful to the children’s production, the phonological transcription was carried out over the actual orthographic transcription, that is, a second orthographic line (introduced by %pho) in which the real production of the child (including errors) was represented, as shown in example (1).

  • (1) *BRU: la pongo encima ///

  • %pho: la pono encima ///

In this way, figures regarding frequencies are real, and not based on target forms.
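The layout of example (1) suggests how counts can be based on actual productions: whenever a %pho line follows a speaker line, it replaces the orthographic target. The sketch below assumes that layout; the speaker code ADU, the treatment of “///” as an utterance boundary marker and the function names are hypothetical.

```python
def clean(text: str) -> str:
    """Drop the utterance boundary marker (assumed to be '///')."""
    return text.replace("///", "").strip()


def production_lines(transcript: str) -> list[tuple[str, str]]:
    """Return (speaker, text) pairs, preferring the %pho line when present."""
    results: list[tuple[str, str]] = []
    for raw in transcript.splitlines():
        line = raw.strip()
        if line.startswith("*"):
            speaker, _, text = line[1:].partition(":")
            results.append((speaker.strip(), clean(text)))
        elif line.startswith("%pho:") and results:
            # Overwrite the target form with the child's real production.
            speaker, _ = results[-1]
            results[-1] = (speaker, clean(line[len("%pho:"):]))
    return results


sample = """*BRU: la pongo encima ///
%pho: la pono encima ///
*ADU: muy bien ///"""

utterances = production_lines(sample)
print(utterances)
```

With this choice, the utterance by BRU contributes “pono” rather than “pongo” to any later frequency count.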

4. RESULTS

 

Results are presented in four separate sections (Note 5: statistical analysis was carried out using the software IBM SPSS Statistics). In the first one, a frequency-based phonological inventory is provided to address research questions 1 and 3. The next two sections offer data regarding variability between the three age groups. Finally, data from CHIEDE are corroborated by comparing results with three corpora from the CHILDES database.

4.1. Data retrieved from the phonological transcription

 

According to the data collected, Table 2 presents the relative frequency of the total number of phoneme tokens in the three child groups that make up the corpus.

Table 2.  Relative frequency of phoneme tokens by age group.
Phoneme 3;0-3;12 4;0-4;12 5;0-5;12 CDS (adults)
e 12.9 13.44 13.58 15.12
a 13.29 13.09 13.13 12.27
o 11.68 10.46 10.81 10.38
s 7.34 7.55 7.94 8.11
i 8.41 8.1 7.94 7.22
n 7.35 7.2 6.72 7.05
ɾ 4.41 4.76 4.84 5.12
t 4.03 3.73 3.78 4.52
l 4.49 4.8 4.74 4.51
k 3.72 4.39 4.58 4.49
d 2.66 3.11 3.53 4.36
m 3.46 3.85 3.92 3.15
u 3.89 3.91 3.53 3.14
p 3.18 2.99 2.71 2.74
b 2.42 2.71 2.42 2.5
θ 0.84 0.88 1.03 1.52
g 1.5 1.3 1.06 0.91
ʎ 1.75 1.15 1.12 0.83
x 0.84 0.93 0.67 0.62
f 0.35 0.4 0.39 0.5
r 0.56 0.43 0.62 0.42
ʧ 0.64 0.45 0.62 0.3
ɲ 0.29 0.37 0.31 0.19
Note 6 (on /ʎ/): The phoneme /ʎ/ is the default output of the automatic phonological transcriber. However, it must be clarified that the language variety studied (central Spain) presents yeísmo. Thus, the actual phonetic representation of /ʎ/ is /ʝ/.
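Relative frequencies such as those in Table 2 are percentages of phoneme tokens over the total number of tokens in a group. A minimal sketch follows, using the twelve phonemes of the utterance in example (1) as hypothetical input; the function name is illustrative and the two-decimal rounding is an assumption.

```python
from collections import Counter


def relative_frequencies(phonemes: list[str]) -> dict[str, float]:
    """Return each phoneme's share of all tokens, as a percentage."""
    counts = Counter(phonemes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no phoneme tokens to count")
    # most_common() orders the inventory from most to least frequent.
    return {p: round(100 * n / total, 2) for p, n in counts.most_common()}


# Broad transcription of "la pono encima" (12 phoneme tokens).
sample = list("laponoenθima")
freqs = relative_frequencies(sample)
print(freqs)
```

Because the values are percentages of each group's own total, groups of different sizes can be compared directly, which absolute counts would not allow.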

The total number of phoneme tokens is 75,535, and the Phonological Mean Length of Utterance (PMLU) (Ingram, 2002) is 12.72 phonemes. In this case, the table does not present the order of acquisition of phonemes (already acquired due to the children’s age), but their usage frequency, as data were not longitudinally collected. It can be observed that the phonemes occupying the final rows of the table are less frequent in Spanish and therefore their frequency decreases in relation to the most common ones; nevertheless, the figures increase as the children grow older. This shows that from three to five years old the process of language acquisition is still ongoing, and therefore studies on the acquisition of language should not stop at 36 months. However, according to these data, all children show a complete (intelligible) acquisition, even of those phonemes considered to be acquired later.

In addition to the phonological data extracted from the children’s speech, a fourth column that includes the frequencies of phonemes in the child-directed speech (adults’) has been added. Although data are similar for both children and adults, greater similarity can be noticed, especially at the top of the table, between the oldest group (5;0-6;0) and the group of adults than between the 3;0-4;12-year-olds’ and the adults’ speech.

Analysing absolute frequency means for the three child groups, the asymptotic significance is p = 0.000 (Note 7: if adults’ absolute frequency means are compared with the children’s, the asymptotic significance is p = .000 in all cases), which denotes notably different distributions across the three groups. If we observe the sample in detail (Table 3), the results are as follows:

Table 3.  Friedman’s Two-Way Analysis of Variance by Ranks (frequency means).
Pairwise Comparisons (df 2)
Sample 1/ Sample 2 Test statistic Std. Error Std. Test Statistic Sig.
3/4-year-olds -.870 .295 -2.949 .003
3/5-year-olds -1.413 .295 -4.792 .000
4/5-year-olds -.543 .295 -1.843 .065
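The Std. Error and Std. Test Statistic columns of Table 3 are consistent with the usual large-sample comparison of mean ranks after a Friedman test, where the standard error is sqrt(k(k+1)/(6n)). The sketch below assumes k = 3 groups and n = 23 phonemes acting as blocks, and takes the mean-rank differences to be 20/23, 32.5/23 and 12.5/23; these values are inferred from the published figures, not stated in the text, and the p-values are unadjusted.

```python
import math


def pairwise_rank_test(mean_rank_diff: float, k: int, n: int):
    """Return (standard error, z, two-sided p) for a mean-rank difference.

    k is the number of related samples and n the number of blocks.
    """
    se = math.sqrt(k * (k + 1) / (6 * n))
    z = mean_rank_diff / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


K_GROUPS, N_BLOCKS = 3, 23  # assumption: 3 age groups, 23 phonemes
comparisons = {
    "3/4-year-olds": -20 / 23,
    "3/5-year-olds": -32.5 / 23,
    "4/5-year-olds": -12.5 / 23,
}

for label, diff in comparisons.items():
    se, z, p = pairwise_rank_test(diff, K_GROUPS, N_BLOCKS)
    print(f"{label}: diff={diff:.3f} se={se:.3f} z={z:.3f} p={p:.3f}")
```

Under these assumptions the standard error is the same for every pair, so the significance of each comparison depends only on how far apart the mean ranks of the two groups are.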

By comparing the distribution of data from the three groups, we can observe a significant difference between the youngest group (3;0-3;12) and the two older groups (4;0-6;0).

Another feature of the automatic phonological transcriber is the segmentation of words into syllables. In this way, it is possible to quickly and reliably know the total number of syllables that make up our corpus, and their frequency of use. The total number of syllable tokens is 35,086 and the Syllable Mean Length of Utterance (SMLU) is 5.91.

The 25 most frequent syllables are made up of no more than two phonemes, and most of them follow the pattern CV, supporting previous research (Carreira, 1991; Goldstein & Cintrón, 2001; Kehoe & Lleó, 2003). Closed syllables (CVC) or consonant clusters like CCV involve a higher articulatory difficulty and therefore their frequency of use is lower compared to open syllables consisting of no more than two phonemes. The four groups coincide (80%): 20 out of the 25 most frequent syllables are the same for children as for adults. From these data, it is possible to easily and accurately calculate PMLU and SMLU for each age group. In Table 4 we observe how figures appreciably increase from three to six years old.

Table 4.  PMLU and SMLU by age group.
PMLU SMLU
3;0-3;12 10.29 4.88
4;0-4;12 13.57 6.26
5;0-6;0 14.11 6.49
Adults 28.35 13.03

Statistics show that there is a significant difference between groups’ means, being p = 0.026 for PMLU and p = 0.025 for SMLU.

Findings (Table 2) indicate a relationship between input frequency and order of acquisition that will be thoroughly analysed in the section devoted to the discussion, revisiting research question 3.

4.2. Standard deviation analysis

 

So far, all data presented belong to the whole corpus divided into age groups. However, to calculate the standard deviation, a sub-corpus was extracted in order to achieve a balance between the participants. As seen in Figure 1, representing the corpus design, CHIEDE is divided into two sub-corpora: collective interactions and dialogues. In the former communicative setting, the number of subjects is about twenty children (see Figure 1 for exact numbers), and they do not all participate equally. When extracting the phoneme inventory for each of the participants, it was observed that while for some of them the number of words was very high (and therefore they presented a high frequency of phonemes), for others figures were considerably lower due to their moderate participation. Hence, a decision was made to use just the dialogues sub-corpus for this task, as only one child participates in each interaction, so the number of conversational turns increases and therefore his/her production in terms of number of words increases. In addition, it was found that the number of words uttered by the children was similar in each dialogue (Table 5). The balance needed to compare data from different subjects was thus obtained.

Table 5.  Sample sizes per child group.
Age Words
3;0-3;12-year-olds 4,021
4;0-4;12-year-olds 4,416
5;0-6;0-year-olds 4,119
Total 12,556

Thus, the total sample consists of 24 children, equally divided into three age groups -3;0-3;12, 4;0-4;12 and 5;0-6;0 years old- each one made up of eight children, four boys and four girls (see Figure 1). The relative frequency was calculated from the automatic count of the absolute frequency of the twenty-three Spanish phonemes, and then, the standard deviation across all children in each age group was computed. Table 6 presents the values for each age group.

Table 6.  Standard deviation from relative frequency of phoneme tokens by age group (Note 8: as results belong to a sample, the symbol representing the mean is x̄ and the symbol representing the standard deviation is s, instead of µ for mean and σ for standard deviation, symbols conventionally used to describe a population).
3;0-3;12-year-olds 4;0-4;12-year-olds 5;0-6;0-year-olds
Phoneme x̄ s Phoneme x̄ s Phoneme x̄ s
e 12.95 2.40 a 13.51 2.47 e 13.97 0.73
a 12.68 2.20 e 13.51 2.24 a 12.16 1.32
o 11.80 1.32 o 10.35 1.72 o 11.14 0.58
i 8.57 1.78 i 8.19 1.81 s 8.71 1.35
n 7.87 1.12 s 8.02 1.85 i 8.22 1.34
s 7.76 1.39 n 7.65 0.66 n 7.15 0.94
ɾ 4.27 0.84 ɾ 4.82 0.79 ɾ 4.69 0.95
u 4.25 1.12 k 4.15 1.25 l 4.61 0.67
l 4.05 0.97 l 4.13 0.93 k 4.12 0.61
t 4.05 1.02 u 3.95 0.93 m 3.71 0.86
k 3.90 1.05 t 3.83 0.89 t 3.65 0.42
m 3.57 1.16 m 3.50 0.99 u 3.55 0.65
p 3.18 0.78 p 3.03 0.66 d 3.20 0.69
d 2.54 0.19 d 3.01 0.81 p 3.11 0.79
b 2.41 0.93 b 2.48 0.70 b 2.45 0.49
ʎ 1.52 0.67 g 1.28 0.26 g 1.19 0.52
ɡ 1.51 0.41 ʎ 1.07 0.39 ʎ 1.08 0.52
x 0.72 0.37 x 0.99 0.41 θ 1.00 0.25
ʧ 0.63 0.31 θ 0.77 0.44 x 0.70 0.18
θ 0.60 0.42 ʧ 0.52 0.21 ʧ 0.52 0.16
r 0.57 0.21 r 0.51 0.31 r 0.48 0.25
f 0.33 0.24 ɲ 0.45 0.38 f 0.38 0.26
ɲ 0.27 0.14 f 0.31 0.11 ɲ 0.21 0.06
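The per-group values in Table 6 come from computing, for each phoneme, the mean and the sample standard deviation of the children's individual relative frequencies. A minimal sketch follows; the child codes and the figures are invented for illustration and are not taken from the corpus.

```python
from statistics import mean, stdev


def group_stats(children: dict[str, dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Return {phoneme: (sample mean, sample standard deviation)}.

    `children` maps a child code to that child's relative frequencies (%).
    A phoneme a child never produced counts as 0.0 for that child.
    """
    phonemes = sorted({p for freqs in children.values() for p in freqs})
    stats = {}
    for p in phonemes:
        values = [freqs.get(p, 0.0) for freqs in children.values()]
        # stdev() divides by n - 1, i.e. it is the sample statistic s.
        stats[p] = (round(mean(values), 2), round(stdev(values), 2))
    return stats


# Hypothetical relative frequencies for three children of one age group.
group = {
    "child_1": {"e": 12.0, "a": 10.0},
    "child_2": {"e": 13.0, "a": 13.0},
    "child_3": {"e": 14.0, "a": 16.0},
}

stats = group_stats(group)
print(stats)
```

In this toy group both phonemes share the same mean, but /a/ varies three times as much between children as /e/, which is the kind of contrast the standard deviation is meant to capture.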

Noting the values, the degree of deviation of each phoneme in relation to the mean is appreciable, especially for the figures corresponding to the 4;0-4;12-year-old group (11 out of 23 phonemes), which show a higher fluctuation from the mean. In contrast, the 5;0-6;0-year-old group displays less variation, although it is notably salient in four cases: /f/, /g/, /p/ and /ɾ/. To appreciate the differences more clearly, these data have been transferred to boxplots (Figures 2, 3 and 4). For the last values, due to the low frequency of phonemes, differences are hardly substantial; but for higher values, the degree of variability is noticeable.

Figure 2.  Variability in 3-year-old group (3;0-3;12).
medium/medium-LOQUENS-9-1-2-e089-gf2.png
Figure 3.  Variability in 4-year-old group (4;0-4;12).
medium/medium-LOQUENS-9-1-2-e089-gf3.png
Figure 4.  Variability in 5-year-old group (5;0-6;0).
medium/medium-LOQUENS-9-1-2-e089-gf4.png

In the boxplots, the form of the median line shows three distinct blocks: after the first six most frequent phonemes (/e/, /a/, /o/, /i/, /n/, /s/) there is a marked drop, after which the values are kept within a stable range until a second drop in the last and least frequent ones (from /x/ in the 3;0-3;12 and 4;0-4;12-year-old groups, and /θ/ in the 5;0-6;0-year-old group). The highest frequency rates are distributed among six phonemes: the vowels /a/, /e/, /i/, and /o/, the nasal /n/, and the fricative /s/ (mean above 7, Table 6). Within the second block, we find plosives, the vowel /u/, the liquids /l/ and /ɾ/, and the nasal /m/. Finally, the last block (mean below 1, Table 6), in which the frequency of sounds is low, includes the rest of the fricatives, the trill /r/, and the nasal /ɲ/; here the degree of variability decreases due to the low frequency of use.

Despite the fact that the median line pattern is similar for the three charts, in the first two age groups there are more striking irregularities, while the last age group’s plot shows a softer median curve. In the latter case the degree of deviation is lower, showing more consistency.

Again, Friedman’s Two-Way Analysis of Variance by Ranks presents an asymptotic significance of p = 0.018, detailed by age groups as follows:

Table 7 shows significant differences between 5- and 4-year-olds and between 5- and 3-year-olds. However, between the 3- and 4-year-old groups there seems to be no significant difference, which means that in the oldest age group (5;0-6;0 years old) there is a stabilisation of the phonological system, since figures for standard deviation are lower (as can be seen in 8), given that fluctuation from the mean decreases. At ages 3;0-3;12 and 4;0-4;12 the values present a higher variation, especially for the most frequent phonemes. However, from 5 years old these differences disappear and the figures stabilise, decreasing the distance between the values and the mean, in contrast to the irregularities which the other two age groups show, especially the 4;0-4;12-year-old group. Thus, the idea of a turning point at the age of four years in the process of phonological acquisition is reinforced: again, it seems that it is from that age that children’s language begins to approach adult use.

Table 7.  Friedman’s Two-Way Analysis of Variance by Ranks (standard deviation).
Pairwise Comparisons (df 2)
Sample 1/Sample 2 Test Statistic Std. Error Std. Test Statistic Sig.
5/4-year-olds .652 .295 2.212 .027
5/3-year-olds .783 .295 2.654 .008
4/3-year-olds .130 .295 .442 .658

4.3. U-shape development at four years old

 

Linked to the question about whether 4;0 is a turning point in the language acquisition process, and to the above data (standard deviation analysis), it is relevant to describe the finding of the greatest variability of 4-year-olds in the present study as a sign of a U-shaped (inverted in the chart) development pattern. Figure 5 shows how variability (based on standard deviation) is higher for 11 out of 23 phonemes (43.5%) in the 4;0 group: /e/, /a/, /o/, /s/, /i/, /l/, /p/, / θ /, /x/, /r/, and /f/. Therefore, it can be concluded that, at least in these 11 cases, a U-shape development pattern can be observed. This issue will be thoroughly discussed later.

Figure 5.  U-shape development at four years old.
medium/medium-LOQUENS-9-1-2-e089-gf5.png

4.4. Extrapolation of results

 

Phonological frequencies depend on the lexical use and on the lexical selection the child makes (statistical acquisition based on the lexicon, Polo, 2016). “Children who still have a small vocabulary may be very selective in their choice of words, that is, either actively avoid words which are difficult to pronounce or substitute consonants systematically” (Stoll, 2009, p. 94). Therefore, a study such as the one presented here is incomplete if lexical units are not taken into account. To accomplish this, the most frequent lexical units presented in were analysed. But in order to reinforce conclusions, we used not only CHIEDE, but three more corpora from the CHILDES database (MacWhinney & Snow, 1985). In this way, it can be determined whether the results presented here are contextual or, on the contrary, reflect a general tendency. To carry out this test, the methodology was as follows:

  • Among the CHILDES corpora in Spanish language, three corpora which shared features with CHIEDE were selected, especially regarding age range. They were Spanish Díez-Itza Corpus (Díez-Itza, 1995), Spanish BecaCESNo Corpus (Benedet & Snow, 2004) and Spanish Marrero Corpus (Albalá & Marrero, 2004).

  • From two of them, BecaCESNo and Marrero, the files (transcriptions) in which the child was younger than 3;0 or older than 6;0 were discarded, as CHIEDE’s participants are within that age range.

  • Once the corpora were selected, CLAN, a tool provided by the CHILDES Project (MacWhinney & Snow, 1985), was used to extract the list of different forms (types) and their frequency of use.

  • After cleaning up those lists (deleting proper names, as they are contextual, and correcting orthographic mistakes), they were compared and the most frequent lexical units or types common to the four corpora were extracted.

  • The 500 most frequent types were selected and the phonological transcriber was applied to them.
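
The comparison and selection steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually used: in the study the frequency lists came from CLAN, whereas here they are small invented `Counter` objects; the function name `common_top_types` is hypothetical; and ranking the shared types by their summed frequency across corpora is an assumption, since the ranking criterion is not specified in the text.

```python
from collections import Counter


def common_top_types(freq_lists, n=500):
    """Return the n most frequent types that occur in every corpus.

    freq_lists: list of Counter objects mapping type -> frequency.
    Ranking by summed frequency is an assumption made for this sketch.
    """
    # Keep only the types attested in all corpora.
    shared = set(freq_lists[0])
    for counts in freq_lists[1:]:
        shared &= set(counts)
    # Sum each shared type's frequency over the corpora.
    totals = {t: sum(c[t] for c in freq_lists) for t in shared}
    # Sort by descending total, then alphabetically for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]


# Invented toy frequency lists; NOT data from the four corpora.
chiede = Counter({"no": 50, "sí": 40, "casa": 5, "perro": 3})
becacesno = Counter({"no": 45, "sí": 30, "casa": 7, "gato": 2})
diez_itza = Counter({"no": 60, "sí": 20, "casa": 4, "agua": 6})
marrero = Counter({"no": 30, "sí": 35, "casa": 9, "perro": 1})

corpora = [chiede, becacesno, diez_itza, marrero]
print(common_top_types(corpora, n=2))
```

The resulting list of shared types is what would then be passed to the phonological transcriber.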

Table 8.  Four corpora frequency, mean, standard deviation and coefficient of variation (cv).
Phoneme BecaCESNo Díez-Itza Marrero CHIEDE cv
a 13.92 14.06 14.90 13.49 4.18
e 14.26 13.54 13.56 13.19 3.31
o 11.97 11.97 12.40 11.68 2.47
s 8.88 8.77 8.68 8.85 1.04
i 6.94 8.45 7.31 8.11 9.08
n 7.62 7.54 6.80 8.42 8.70
k 4.81 5.20 5.16 4.15 10.05
m 4.06 4.37 3.30 4.68 14.42
l 3.95 4.08 4.34 3.86 5.09
ɾ 3.87 3.53 4.59 3.37 14.09
t 3.82 3.68 3.91 3.61 3.63
u 3.40 3.12 2.44 3.99 19.97
d 2.95 2.75 2.95 2.58 6.34
p 2.95 2.32 2.73 2.85 10.13
b 2.74 2.53 2.77 2.47 5.56
ʎ 1.20 1.32 1.15 1.50 12.01
g 0.83 0.79 0.80 0.98 10.26
θ 0.62 0.52 0.65 0.59 9.42
x 0.40 0.53 0.56 0.46 14.07
ʧ 0.23 0.23 0.22 0.43 34.71
f 0.19 0.29 0.32 0.26 20.47
ɲ 0.25 0.21 0.18 0.28 18.97
r 0.13 0.17 0.28 0.20 31.92

Table 8 shows the results. The most relevant figures are those in the last column, where the coefficient of variation expresses the variability of the four samples relative to their mean. The most homogeneous values (cv below 10) belong to the phonemes /a/, /e/, /o/, /s/, /i/, /n/, /l/, /t/, /d/, /b/, and /θ/. In contrast, /ʧ/, /f/, and /r/ show the most heterogeneous distribution (cv above 20). These phonemes are precisely the least frequent ones not only in CHIEDE but also in the other three corpora, as well as in adult speech, which again supports the assumption that input frequency is related to the order of acquisition of phonemes.
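
As a worked illustration, the coefficient of variation can be recomputed from the rounded frequencies in Table 8. The sketch below assumes that cv is the sample standard deviation divided by the mean, expressed as a percentage (the sample statistic is an assumption, in line with note 8). The function name `coefficient_of_variation` is illustrative, and only four rows of the table are copied. The recomputed values are close to the published ones but not identical, presumably because the table reports frequencies rounded to two decimals.

```python
from statistics import mean, stdev

# Relative frequencies (%) copied from Table 8, in the order
# BecaCESNo, Díez-Itza, Marrero, CHIEDE.
TABLE_8 = {
    "a": [13.92, 14.06, 14.90, 13.49],
    "s": [8.88, 8.77, 8.68, 8.85],
    "ʧ": [0.23, 0.23, 0.22, 0.43],
    "r": [0.13, 0.17, 0.28, 0.20],
}


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)


for phoneme, freqs in TABLE_8.items():
    print(f"/{phoneme}/ cv = {coefficient_of_variation(freqs):.2f}%")
```

Because cv is scaled by the mean, a small absolute difference in a rare phoneme such as /ʧ/ or /r/ produces a much larger cv than a comparable difference in a frequent phoneme such as /a/.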

Broadly speaking, the differences among the four corpora are small, as the frequency figures are very similar. This suggests that the basic lexical units, like the most frequent phonemes, are not context-dependent but general. Therefore, the results obtained from the phonological analysis of CHIEDE can be extrapolated.

5. DISCUSSION

 

Revisiting the research questions in light of the results, the major findings are summarised here. Regarding the first research question, it can be concluded that, according to the sample, the Spanish phonological system is essentially acquired (in terms of intelligibility) by the age of three (as shown in Table 2). Acquisition is understood here as development, that is, as a process in which phonemes are already organised into patterns (what Velleman and Vihman (2002) call templates) typical of the final stages of development in children, showing that units are rooted. According to Velleman and Vihman (2002, p. 20), “templates serve as a stepping stone in the direction of the adult system, despite the decrease in accuracy that may temporarily result”. Vihman and Wauquier (2018, p. 38) also state that “template formation is neither the outcome of a pre-existing principle nor an end in itself, but instead a dynamic (and momentary) child response, in the early stages of acquisition, to the phonological and lexical challenges of the language”.

It is generally accepted in Spanish phonological acquisition research that the most problematic phonemes are liquid consonants, the fricatives /s/, /θ/ and /x/, the nasal /ɲ/, and the plosive /d/ (Acevedo, 1993; Bosch, 1983; Jimenez, 1987). However, after analysing these sounds in CHIEDE, it can be observed that both the fricative /s/ and the liquids /l/ and /ɾ/ are among the most frequent phonemes. CHIEDE’s participants showed no added difficulty in their use, indicating that, although they may be problematic phonemes at the time of their acquisition, from three years old onwards these three sounds do not present any difficulty for children with typical development; in fact, they are widely used. The remaining phonemes considered problematic are characterised by lower use. The higher frequency of certain phonemes over others is a lexical matter: “Thus, when we examine the lexicon (words) of a language, not all sounds have an equal opportunity to appear in all positions” (Bernstein-Ratner, 1994, p. 351). Certain phonemes, such as /r/ or /ɲ/, are less frequent in the Spanish lexicon, and thus their frequency of use is low (as seen in the frequency lists in Tables 2 and 8).

Results from the present study shed light on the existence of a turning point at four years old in the process of L1 acquisition (research question 2). First, figures on PMLU and SMLU (Table 4) indicate that from four to five years of age there is a significant increase towards adult language. Furthermore, standard deviation (Table 6) shows how language becomes stabilised from five years old onwards. It can also be stated that the subjects of this study fit Díez-Itza and Martínez López’s (2004) stages, as it seems that from 3;0 to 5;0 children are in a period of reorganisation of the phonological system, termed “stabilisation” by the authors; from 5;0 onwards, however, children seem to reach the “resolution” stage. The variability shown by the 4;0-4;12 group leads to the conclusion that around four years old there is a landmark which is relevant not only for research on typical language development, but especially for research on speech and language disorders. This turning point is also supported by the U-shaped development pattern evidenced by the analysis in Figure 5. Although the group of 3-year-olds displayed a similar pattern, it appeared in the less frequent phonemes. In contrast, 4-year-olds exhibited higher variation and a U-shaped pattern precisely for those phonemes which are acquired earlier and, therefore, should be stable at this age.

The overriding question guiding this research is to what extent input frequency is relevant in the L1 acquisition process (research question 3). In disciplines such as Psycholinguistics, and more specifically in Speech and Language Therapy, it is widely accepted that phonemes which usually pose a problem in the acquisition process, such as the Spanish trill, are characterised by a more difficult physiological articulation (Bosch, 1983; López Valero et al., 1989). However, this idea is conceived from the standpoint of adult speakers, whose articulatory system is fossilised. The baby’s physiology is ready to adapt to different circumstances, and therefore we cannot determine whether it is difficult for a child to manage his/her articulators to pronounce a sound or whether he/she simply lacks enough examples to learn it. According to Zamuner et al. (2004, p. 1420), “it appears that children are not limited by articulatory or perceptual constraints, but rather that children’s errors are largely influenced by their ability to access stored representations.” For these reasons, and mainly based on the results obtained from CHIEDE, the relevance of probability and frequency in studies on language ontogenesis is highlighted here, as frequency of use may be an essential indicator of typical development.

It is also agreed that at the age of three all vowels are acquired, followed by nasals, approximants, and later plosives. However, at this age, the acquisition of liquids, fricatives and affricates is still largely incomplete (Lleó, 2012). Interestingly, this order of acquisition coincides with the order of frequency of phonemes in spontaneous adult speech in Spanish (Table 2).

Studies such as those by Demuth (2009), Ellis (2017), Kern et al. (2014) or Tomasello (2003), among several others, demonstrate the probabilistic relationship between input and language acquisition. The present study is another example of how input frequency affects language development (in this particular case, phonological acquisition). “Ease of articulation seems to play only a partial role in determining the overall developmental route” (Pye, Ingram & List, 1987, p. 182).

Another factor influencing phonological learning is phonological neighbourhoods, that is, sets of phonologically similar words. Zamuner (2009) showed that the words which are acquired first have denser neighbourhoods than those acquired later. Maekawa and Storkel (2006) also highlighted the importance of phonotactic probability and neighbourhood density. These authors concluded that “[...] phonotactic probability, density and frequency appeared to predict expressive vocabulary development but with individual variation across children” (Maekawa & Storkel, 2006, p. 457).

Likewise, Pierrehumbert (2003) referred to various studies showing that children are sensitive to statistical patterns of sound. This stands in opposition to the idea of a universal inventory from which the individual selects the necessary elements to design his/her phonological system. Her main counter-argument is that this theory does not explain why children take so long from the moment they acquire or distinguish an element as part of their own language until they master its production in an adult manner. Phonetic knowledge is acquired gradually and is updated through experience. “Acquiring the phonetic encoding system of a language involves acquiring probability distributions over the phonetic space” (Pierrehumbert, 2003, p. 184; see note 9).

This last idea leads us to consider how crucial the roles of probability and frequency of use are in the process of language acquisition. Bernstein-Ratner (1994) suggested that the elements children acquire earlier are the most frequent both in adult speech and in all languages throughout the world, while phonemes that present a higher learning difficulty are precisely those that are less represented.

In this study (Table 2), the frequency of use shown by the oldest age group is very similar to that shown by adults in spontaneous speech, whereas the differences between the other two groups of children and the adult group are larger. According to data from CHIEDE, the most common sounds in adult speech are precisely those which, based on previous research (Bosch, 1983; López Valero et al., 1989; Serra, 1983), are acquired earlier and more easily, i.e., vowels and nasals in the first place, followed by plosives and liquids. Lower positions on the frequency list are occupied by fricatives, which are precisely the last and most problematic in the acquisition process.

The same phenomenon occurs in other languages. For example, the English sounds identified as more complicated to learn (Grunwell, 1981) are those which have a lower frequency rate in adult language (Mines et al., 1978). Among these phonemes are the voiceless dental fricative /θ/ and the voiceless and voiced postalveolar affricates /ʧ/ and /ʤ/. There is a clear relationship between the less frequent phonemes in adult language and those which are more problematic in the acquisition process.

The evidence so far leads us to emphasise the importance of input frequency in the study of L1 acquisition and its relation to the most problematic phonemes. Accordingly, the importance of place and manner of articulation as the sole factor causing the delayed acquisition of certain phonemes should be played down (Rose, 2009). As stated by Menn and Stoel-Gammon (1996, p. 352), “A theory of child phonology cannot ignore word frequency although current adult phonological theory has no place for this notion”.

5.1. Limitations

 

Despite the fact that the children in CHIEDE showed a complete (intelligible) acquisition of phonemes at 3 years old, this situation must be regarded with caution, since the participants represent only a part of the whole population of Spanish-speaking children. Giving priority to sub-corpora balance (between the three different age groups’ language production) limited the number of participants per age group. Nevertheless, the comparison of CHIEDE’s data to those from three different corpora supports, to some extent, the findings in the present study.

As Grunwell (1981) stated, language acquisition is characterised by great variation from one individual to another. However, data from CHIEDE may serve as a paradigmatic pattern of linguistic behaviour for research on child language.

Another potential limitation could be the grouping of participants. As 4 years old is hypothesised as a critical age, speakers could have been grouped by different age limits to analyse the range 3;5-4;5. However, a balanced distribution of children in three groups prevailed here. Otherwise, age ranges and number of participants per group would be unbalanced. In addition, it would also be relevant to consider the role of the gender factor for future research.

Concerning the characteristics of the transcription, further research is suggested on issues such as the distribution of phonemes, syllable structure, and the description of clusters or allophones. This would involve a narrow transcription, which exceeds the scope of this research. Moreover, as mentioned, recording conditions were not ideal due to ambient sound.

Finally, it would be interesting to extend this experiment to other languages, particularly to other Spanish dialects and varieties, and observe to what extent patterns coincide.

6. CONCLUSIONS

 

Research on language acquisition beyond English and crosslinguistically has thrived during the last decades, although many unsolved questions still remain. There is a need for large cross-sectional spontaneous speech corpora, sufficiently representative and linguistically annotated. Furthermore, standards must be established to facilitate analysis and comparison. As Stoll (2009, p. 91) complained, “the use of different data sets, different methods or different criteria for coding makes it difficult to compare across languages”. Also, corpus-based analysis of the late acquisition period, that is, beyond 36 months of age, should be increased, as most existing corpora do not include child participants beyond that initial period of language development. The use of representative corpora and computational tools enriches research on language acquisition and is a reliable method for the study of frequency, which, as several investigations reveal, is a significant factor throughout the acquisition process.

The findings from the present study contribute to current research on Spanish-speaking children’s phonological acquisition in three ways:

  • Providing a phonological inventory which may serve as a model for future research on typical and atypical child language development (from 3 years old onwards).

  • Contributing to the assumption that 4 years old is a turning point in the process of language acquisition, as the variability analysis of the frequency of phonemes in CHIEDE shows.

  • Corroborating the importance of the role of input frequency as a factor to take into consideration when analysing child language.

From a methodological point of view, we encourage language acquisition research based on natural language corpora. Corpus Linguistics and Computational Linguistics are essential in language analysis, especially from a usage-based approach, as discussed above and shown in this research. In addition to the three contributions mentioned above, the findings of this research have practical implications for Clinical Linguistics and Speech and Language Therapy, as they can be used as a paradigm for the assessment of child language.

7. ACKNOWLEDGEMENTS

 

This research was supported by the project Adquisición fónica y corpus. Tratamiento en PHON del corpus Koiné de habla infantil (FFI2017-82752-P), funded by FEDER, Ministerio de Ciencia, Innovación y Universidades (Agencia Estatal de Investigación, Proyectos de Excelencia 2017).

NOTES

 
1

https://childes.talkbank.org/

2

https://www.phon.ca/phon-manual/index.html

3

WWM stands for “whole word match”, that is, the child’s pronunciation equals the adult’s.

4

For further details, consult the web site http://www.lllf.uam.es/ESP/Chiede.html

5

Statistical analysis was carried out using the software IBM SPSS Statistics.

6

The phoneme /ʎ/ is the default output of the automatic phonological transcriber. However, it must be clarified that the language variety studied (central Spain) presents yeísmo. Thus, the actual phonetic representation of /ʎ/ is /ʝ/.

7

If adults’ absolute frequency means are compared with the children’s, the asymptotic significance is p = .000 in all cases.

8

As results belong to a sample, the symbol representing the mean is x̄ and the symbol representing the standard deviation is s (instead of µ for mean and σ for standard deviation, symbols conventionally used to describe a population).

9

Pierrehumbert (2003) defined the concept of phonetic space as the acoustic and articulatory parameterisation of speech as a physical event, that is, what Moreno Cabrera (1997) called espacio de variación articulatorio (‘articulatory variation space’), or the different articulatory realisations of a phoneme.

8. REFERENCES

 

Acevedo, M. A. (1993). Development of Spanish consonants in preschool children. Communication Disorders Quarterly, 15(2), 9-15. https://doi.org/10.1177/152574019301500202

Acosta, V., & Ramos, V. (1998). Estudio de los desórdenes del habla infantil desde la perspectiva de los procesos fonológicos. Revista de Logopedia, Foniatría y Audiología, 18, 124-142. https://doi.org/10.1016/S0214-4603(98)75683-9

Albalá, M. J., & Marrero, V. (2004). Spanish Marrero Corpus. TalkBank. https://childes.talkbank.org/access/Spanish/Marrero.html

Benedet, M., & Snow, K. (2004). Spanish BecaCESNo Corpus. TalkBank. https://childes.talkbank.org/access/Spanish/BecaCESNo.html

Bernhardt, B. M., & Stemberger, J. P. (2017). Investigating typical and protracted phonological development across languages. In E. Babatsouli, D. Ingram, & N. Müller (Eds.), Crosslinguistic Encounters in Language Acquisition: Typical and Atypical Development (pp. 71-108). Multilingual Matters. https://doi.org/10.21832/9781783099092-008

Bernstein-Ratner, N. (1994). Phonological analysis of child speech. In J. L. Sokolov, & C.E. Snow (Eds.), Handbook of research in language development using CHILDES (pp. 324-372). Hillsdale, NJ: Lawrence Erlbaum Associates.

Bleses, D., Basbøll, H., Lum, J., & Vach, W. (2010). Phonology and lexicon in a cross-linguistic perspective: the importance of phonetics - a commentary on Stoel-Gammon’s “Relationships between lexical and phonological development in young children”. Journal of Child Language, 38(1), 61-68. https://doi.org/10.1017/S0305000910000437

Bosch, L. (1983). El desarrollo fonológico infantil: una prueba para su evaluación. Anuario de Psicología, 28(1), 87-114.

Bosch, L., & Sebastián-Gallés, N. (2001). Evidence of early language discrimination abilities in infants from bilingual environments. Infancy, 2, 29-49. https://doi.org/10.1207/S15327078IN0201_3

Bosch, L., & Sebastián-Gallés, N. (2003). Language experience and the perception of a voicing contrast in fricatives: infant and adult data. In M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the Fifteenth International Congress of Phonetic Sciences (pp. 1987-1990). Universitat Autónoma de Barcelona.

Bunta, F., & Ingram, D. (2007). The acquisition of speech rhythm by bilingual Spanish-and English-speaking 4-and 5-year-old children. Journal of Speech, Language, and Hearing Research, 50(4), 999-1014. https://doi.org/10.1044/1092-4388(2007/070)

Carreira, M. (1991). The acquisition of Spanish syllable structure. In D. Wanner, & D. A. Kibbee (Eds.), New analyses in Romance linguistics (pp. 3-18). John Benjamins. https://doi.org/10.1075/cilt.69.06car

Chomsky, N. (1980). Rules and representation. MIT Press.

Demuth, K. (2009). The prosody of syllables, words and morphemes. In E. L. Bavin (Ed.), The Cambridge Handbook of Child Language (pp. 183-198). Cambridge University Press. https://doi.org/10.1017/CBO9780511576164.011

Díez-Itza, E. (1995). Procesos fonológicos en la adquisición del español como lengua materna. In J. M. Ruiz, P. H. Sheerin, & E. González-Cascos (Eds.), Actas del XI Congreso Nacional de Lingüística Aplicada (pp. 225-264). Universidad de Valladolid.

Díez-Itza, E., & Martínez López, V. (2004). Las etapas tardías de la adquisición fonológica: procesos de reducción de grupos consonánticos. Anuario de Psicología, 35, 177-202.

Dolgova, N., & Tyler, A. (2019). Applications of Usage-Based Approaches to Language Teaching. In X. Gao (Ed.), Second Handbook of English Language Teaching (pp. 939-961). Springer. https://doi.org/10.1007/978-3-030-02899-2_49

Durgunoğlu, A. Y., & Öney, B. (1999). A cross-linguistic comparison of phonological awareness and word recognition. Reading and Writing: An Interdisciplinary Journal, 11, 281-299. https://doi.org/10.1023/A:1008093232622

Ellis, N. C. (2008). Usage-based and form-focus SLA: The implicit and explicit learning of constructions. In A. Tyler, Y. Kim, & M. Takada (Eds.), Language in the Context of Use: Discourse and Cognitive Approaches to Language, (pp. 93-120). New York: Mouton de Gruyter.

Ellis, N. C. (2017). Cognition, Corpora, and Computing: Triangulating Research in Usage-Based Language Learning. Language Learning, 67(S1), 40-65. https://doi.org/10.1111/lang.12215

Garrote, M. (2010). Los corpus de habla infantil. Metodología y análisis. Servicio de publicaciones de la Universidad Autónoma de Madrid.

Goldstein, B., & Cintrón, P. (2001). An investigation of phonological skills in Puerto-Rican Spanish-speaking 2-year-olds. Clinical Linguistics and Phonetics, 15, 343-361. https://doi.org/10.1080/02699200010017814

Goodman, J. C., Dale, P. S., & Li, P. (2008). Does frequency count? Parental input and the acquisition of vocabulary. Journal of Child Language, 35, 515-531. https://doi.org/10.1017/S0305000907008641

Grunwell, P. (1981). The development of phonology: A descriptive profile. First language, 3, 161-191. https://doi.org/10.1177/014272378100200601

Hedlund, G., & Rose, Y. (2020). Phon 3.1 [Computer Software]. https://phon.ca.

Ingram, D. (1976). Phonological disability in children. Edward Arnold.

Ingram, D. (1979). Phonological patterns in the speech of young children. In P. Fletcher, & M. Garman (Eds.), Language acquisition (pp. 133-149). Cambridge University Press.

Ingram, D. (2002). The measurement of whole-word productions. Journal of Child Language, 29, 713-733. https://doi.org/10.1017/S0305000902005275

Jimenez, B. C. (1987). Acquisition of Spanish consonants in children aged 3-5 years, 7 months. Language, Speech, and Hearing Services in Schools, 18(4), 357-363. https://doi.org/10.1044/0161-1461.1804.357

Kehoe, M., & Lleó, C. (2003). The acquisition of syllable types in monolingual and bilingual German and Spanish children. In B. Beachley, A. Brown, & F. Conlin (Eds.), Proceedings of the Twenty-seventh annual Boston University Conference on Language Development (pp. 402-413). Cascadilla Press.

Kehoe, M., & Lleó, C. (2005). The emergence of language specific rhythm in German-Spanish bilingual children. Arbeiten zur Mehrsprachigkeit: Working Papers in Multilingualism, 58. SFB 538.

Kehoe, M., Lleó, C., & Rakow, M. (2004). Voice onset time in bilingual German-Spanish children. Bilingualism: Language and Cognition, 7, 71-88. https://doi.org/10.1017/S1366728904001282

Kern, S., Gayraud, F., & Chenu, F. (2014). The role of input in early first language morphosyntactic development. Language, Interaction and Acquisition, 5(1), 1-18. https://doi.org/10.1075/lia.5.1.00int

Kouti, M. (2010). Problemas perceptivos de la estructura silábica del español por aprendices griegos de E/LE. Revista electrónica de didáctica. Español lengua extranjera (redELE), 19, 19-36.

Lleó, C. (2002). The role of markedness in the acquisition of complex prosodic structures by German-Spanish Bilinguals. International Journal of Bilingualism, 6, 291-313. https://doi.org/10.1177/13670069020060030501

Lleó, C. (2003). Prosodic licensing of coda in the acquisition of Spanish. Probus, 15, 257-281. https://doi.org/10.1515/prbs.2003.010

Lleó, C. (2006). The acquisition of prosodic word structures in Spanish by monolingual and Spanish-German bilingual children. Language and Speech, 49, 205-229. https://doi.org/10.1177/00238309060490020401

Lleó, C. (2012). First language acquisition of Spanish sounds and prosody. In J. I. Hualde, A. Olarrea, & E. O’Rourke (Eds.), The Handbook of Hispanic Linguistics (pp. 693-710). Blackwell Publishing Ltd. https://doi.org/10.1002/9781118228098.ch32

López Valero, A., Carrillo Hernández, M. R., & Ros Frutos, J. L. (1989). Aportaciones para el estudio del desarrollo del lenguaje infantil en el período comprendido entre los veinticuatro y los treinta meses. Cauce, Revista de Filología y su Didáctica, 12, 145-156.

MacWhinney, B. (1996). Computational analysis of interactions. In P. Fletcher, & B. MacWhinney (Eds.), The Handbook of child language (pp. 152-178). Blackwell. https://doi.org/10.1111/b.9780631203124.1996.00006.x

MacWhinney, B., & Snow, C. (1985). The child language data exchange system. Journal of Child Language, 12(2), 271-295. https://doi.org/10.1017/S0305000900006449

Maekawa, J., & Storkel, H. L. (2006). Individual differences in the influence of phonological characteristics on expressive vocabulary development by young children. Journal of Child Language, 33, 439-459. https://doi.org/10.1017/S0305000906007458

Maratsos, M. P. (1974). Children who get worse at understanding the passive: A replication of Bever. Journal of Psycholinguistic Research, 3(1), 65-74. https://doi.org/10.1007/BF01067222

Martínez Celdrán, E., Fernández Planas, A. M., & Carrera Sabaté, J. (2003). Castilian Spanish. Journal of the International Phonetic Association, 33(2), 255-259. https://doi.org/10.1017/S0025100303001373

McEnery, T., & Wilson, A. (2001). Corpus linguistics: An introduction. Edinburgh University Press.

McLeod, S., & Crowe, K. (2018). Children’s consonant acquisition in 27 languages: A cross-linguistic review. American Journal of Speech-Language Pathology, 27(4), 1546-1571. https://doi.org/10.1044/2018_AJSLP-17-0100

Melgar de González, M. (1976). Cómo detectar al niño con problemas de habla. Trillas.

Menn, L., & Stoel-Gammon, C. (1996). Phonological development. In P. Fletcher, & B. MacWhinney (Eds.), The Handbook of child language (pp. 335-359). Blackwell. https://doi.org/10.1111/b.9780631203124.1996.00014.x

Mines, M. A., Hanson, B. F., & Shoup, J. E. (1978). Frequency of occurrence of phonemes in conversational English. Language and Speech, 21(3), 221-241. https://doi.org/10.1177/002383097802100302

Moreno Cabrera, J. C. (1997). Introducción a la lingüística: enfoque tipológico y universalista. Síntesis.

Moreno Sandoval, A., Torre Toledano, D., De La Torre, R., Garrote, M., & Guirao, J. M. (2008). Developing a phonemic and syllabic frequency inventory for spontaneous spoken Castilian Spanish and their comparison to text-based inventories. Proceedings of the VI Language Resources and Evaluation Conference (LREC), 1097-1100.

Piaget, J. (1926). The language and thought of the child. Kegan Paul, Trench & Trubner.

Pierrehumbert, J. B. (2003). Probabilistic theories of phonology. In R. Bod, J. Hay, & S. Jannedy (Eds.), Probability Theory in Linguistics (pp. 177-228). The MIT Press.

Polo, N. (2016). La investigación actual sobre el desarrollo de la fonología del español como lengua materna. Lenguas modernas, 47, 137-152.

Pye, C. L., Ingram, D., & List, H. (1987). A comparison of initial consonant acquisition in English and Quiché. In K. Nelson, & A. Van Kleeck (Eds.), Children’s Language (Vol. 6) (pp. 175-190). Erlbaum. https://doi.org/10.4324/9781315792668-8

Roark, B., & Demuth, K. (2000). Prosodic constraints and the learner’s environment: A corpus study. In S. C. Howell, S. A. Fish, & T. Keith-Lucas (Eds.), Proceedings of the 24th Annual Boston University Conference on Language Development (Vol. 2, pp. 597-608). Cascadilla Press.

Rose, Y. (2009). Internal and external influences on child language productions. In F. Pellegrino, E. Marsico, I. Chitoran, & C. Coupé (Eds.), Approaches to Phonological Complexity (pp. 329-351). Mouton de Gruyter. https://doi.org/10.1515/9783110223958.329

Serra, M. (1983). Normas estadísticas de articulación para la población escolar de 3 a 7 años en el área metropolitana de Barcelona. Revista de Logopedia, Foniatría y Audiología, 3(4), 232-235. https://doi.org/10.1016/S0214-4603(83)75286-1

Stampe, D. (1969). The acquisition of phonetic representation. In R. I. Binnick, A. Davidson, G. Green, & J. L. Morgan (Eds.), Papers from the Fifth Regional Meeting of the Chicago Linguistic Society (pp. 443-454). Chicago Linguistic Society.

Stoel-Gammon, C. (2006). Infancy: phonological development. Encyclopaedia of Language & Linguistics (Second Edition), 642-648. https://doi.org/10.1016/B0-08-044854-2/00838-5

Stoll, S. (2009). Crosslinguistic approaches to language acquisition. In E. L. Bavin (Ed.), The Cambridge Handbook of Child Language (pp. 89-104). Cambridge University Press. https://doi.org/10.1017/CBO9780511576164.006

Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Harvard University Press.

Valian, V. (2009). Innateness and learnability. In E. L. Bavin (Ed.), The Cambridge Handbook of Child Language (pp. 15-34). Cambridge University Press. https://doi.org/10.1017/CBO9780511576164.002

Velleman, S. L., & Vihman, M. M. (2002). Whole-word phonology and templates. Language, Speech, and Hearing Services in Schools, 33(1), 9-23. https://doi.org/10.1044/0161-1461(2002/002)

Vihman, M. M., DePaolis, R. A., & Keren-Portnoy, T. (2009). A dynamic systems approach to babbling and words. In E. L. Bavin (Ed.), The Cambridge Handbook of Child Language (pp. 163-182). Cambridge University Press. https://doi.org/10.1017/CBO9780511576164.010

Vihman, M., & Wauquier, S. (2018). Templates in child language. In M. Hickmann, E. Veneziano, & H. Jisa (Eds.), Sources of variation in first language acquisition: Languages, contexts, and learners (pp. 27-44). John Benjamins Publishing Company. https://doi.org/10.1075/tilar.22.02vih

Zamuner, T. (2009). The structure and nature of phonological neighbourhoods in children’s early lexicons. Journal of Child Language, 36, 3-21. https://doi.org/10.1017/S0305000908008829

Zamuner, T., Gerken, L., & Hammond, M. (2004). Phonotactic probabilities in young children’s speech production. Journal of Child Language, 31, 515-536. https://doi.org/10.1017/S0305000904006233