Software-assisted identification of non-native pitch elements for Russian-speaking learners of Spanish

1. INTRODUCTION

Improving the phonetic-phonological competence of second language learners is a multifaceted challenge for educators and learners alike. As Patil and Rao (2012) note, the fluency of second language speakers is commonly evaluated by comparing their articulation and prosody with those of native speakers. Various tools have been developed to facilitate the acquisition of acoustic and prosodic features and to identify potential pronunciation errors in second language learners. Noteworthy examples include the tool introduced by Elvira-García, Farrús, and Garrido-Almiñana (2023), which computes speech rate automatically; the tools by Patil and Rao (2012) and Oplustil and Toledo (2019), designed to display phonetic-phonological similarity; and the tool by Strik, Truong, De Wet, and Cucchiarini (2009), tailored for pronunciation error detection. Despite these valuable contributions, no unified tool currently both identifies tonal deviations and provides insight into differences and similarities in L2 learners’ prosody. Moreover, the existing tools do not document or track learners’ progress in acquiring phonetic-phonological competence.

In light of this, Couto-Fernández, Sarymsakova, Condori-Fernández, and Martín-Rodilla (2022) developed a software tool called PAFe (in Spanish: Plugin para el análisis fonético-fonológico en español; in English: Plugin for phonetic-phonological analysis in Spanish) to improve the phonetic-phonological competence of second language learners and to streamline the work of L2 educators. The tool is versatile, serving both didactic and autodidactic purposes. As the studies by Couto-Fernández (2021), Sarymsakova (2022) and Couto-Fernández et al. (2022) show, the current version of PAFe provides comparative-contrastive melodic analysis of suprasegmental speech characteristics, specifically intonational features, for native and non-native speakers. The tool calculates the similarity of intonation production and detects tonal deviations between native and non-native speakers of Spanish. PAFe gives learners feedback through both a similarity percentage and graphs of pitch contrast, and stores the data recorded by reference speakers and learners.

Couto-Fernández (2021) took an empirical approach to evaluating the usability of PAFe, testing the tool with both Spanish L2 students and educators. The assessment included four metrics: efficiency (percentage of task completion relative to allocated time), effectiveness (degree of task completion during tool testing), user satisfaction, and usefulness (alignment with user expectations). The outcomes, as reported in the cited work, were derived from survey responses and revealed a notable level of efficiency, effectiveness, and user satisfaction. The usefulness metrics, however, underscored the need to enhance the tool’s usability and to test it further across diverse domains beyond didactics.

The later studies by Couto-Fernández et al. (2022) and Sarymsakova (2022) found, among other results, that PAFe provides its most accurate feedback on intonational production through intersyllabic analysis. Based on these outcomes, and with the objective of extending contributions to contrastive-comparative prosody studies of Russian and Spanish through the PAFe tool, we address the problem of phonetic-phonological competence acquisition among Russian-speaking learners of Spanish. We formulate the hypothesis of the present investigation as follows: the application of the PAFe tool automates the identification of pitch functional elements in both male and female Russian-speaking learners of Spanish and provides valuable insights into the potential challenges these learners encounter in acquiring the phonetic-phonological competence of Spanish.

To test this hypothesis, this study aims to achieve the following objectives:

To define the f₀ functional elements where the most pitch deviations occurred, by comparing the empirical data obtained via the PAFe tool’s intersyllabic intonation analysis of native speakers and Russian-speaking learners of Spanish.
To identify which parameters of intersyllabic analysis offered by PAFe require further refinement and implementation in future versions.

The subsequent sections of this manuscript are organized as follows. Section 2 provides a brief overview of the intonational systems of Russian and Spanish, alongside a description of the operation of the PAFe tool and its application in second language didactic studies. Section 3 presents the terminology used throughout our study, poses the research questions, and describes the methodology employed to address them. Section 4 describes the experiment and presents the main findings obtained from the PAFe intersyllabic analysis. Finally, we conclude in Section 5.

2. BACKGROUND

2.1. Russian and Spanish prosodic systems

Among prior contrastive-comparative investigations into the phonetic-phonological systems of Russian and Spanish, noteworthy contributions have been made by García-Riverón (1980, 1987), Mazina (1984), Dmítrieva (2017), and Sarymsakova (2022). These authors describe distinct dissimilarities in the functioning of melodic patterns within the respective linguistic frameworks of Russian and Spanish. As explained in subsection 4.1, we have selected four speech acts representing pragmatic categories identified as challenging by Sarymsakova (2022). These acts (namely threat, politeness, irony, and request) have been deemed challenging because of the pivotal role of pitch in activating implicit meaning. Accordingly, we focus on the distinctions in the melodic patterns characterizing these utterances as described in the contrastive-comparative analyses of García-Riverón, Mazina, Dmítrieva, and Sarymsakova.

These authors concur on the observed distinctions in melodic patterns between the Russian and Spanish intonational systems. Regarding the melodic characteristics of speech acts such as threat and request, the Russian intonational system manifests an ascending pitch movement at the centre of its intonational construction. As defined by Bryzgunova (1977), the centre of the intonational construction is a pitch functional element that marks semantic-pragmatic changes within an utterance, whereby a declarative statement becomes an interrogative through modulation of the centre of the intonational construction. Nevertheless, the authors mentioned above underline that the melodic patterns of irony and politeness within the Russian intonational system remain incompletely characterized.

Moreover, Hidalgo-Navarro (2019) and Cantero-Serena and Mateo-Ruiz (2013) emphasize the use of circumflex (ascending-descending) tonal movements in the tonemes associated with politeness and irony (which are not entirely delineated within the Spanish system). Additionally, they identify the use of plain tonemes in threat expressions and ascending tonemes in politeness utterances.

In summary, the aforementioned authors delineate fundamental challenges encountered by Russian speakers in the acquisition of Spanish melodic patterns. Certain pitch features associated with speech acts, such as irony and politeness, lack a clear definition within the Russian intonational system. While the Russian intonational system exhibits modulation of the pitch functional element known as the centre of intonational construction, Spanish pitch features are distinctly specified for particular speech acts.

Drawing upon the abovementioned dissimilarities between the melodic patterns of Russian and Spanish, and the challenges they may pose for phonetic-phonological competence acquisition among Russian-speaking learners of Spanish, we aim to contribute to comparative-contrastive studies of the intonational systems of Russian and Spanish by employing the PAFe tool, which is described below.

2.2. PAFe operation and usage

The PAFe tool was created as an extension to Praat, an existing desktop application for acoustic speech analysis developed by Boersma and Weenink (2019). PAFe consists of a series of scripts in the Python programming language and implements three algorithms for comparing the intonation of an ELE learner (in Spanish: Español como Lengua Extranjera; in English: Spanish as a Foreign Language) with that of a native Spanish speaker. The tool is built on the Praat architecture (Boersma & Weenink, 2019), to which a new module (PAFe, as an extension of Praat) is coupled (see Figure 1); the module is made up of Praat scripts, Python code and a PostgreSQL database. Praat scripting allows command-line calls to other systems, as described by Pop and Altar (2014), making it possible to extend the application with languages and technologies external to Praat.
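As an illustration of this kind of coupling, the following minimal sketch shows what a Python entry point invoked from a Praat script through a command-line call might look like. It is not PAFe's actual code: the argument names, the `--analysis` flag and the identifiers in `ANALYSES` are hypothetical and chosen only to mirror the three analysis types the tool offers.

```python
import argparse
from typing import List, Optional

# Hypothetical identifiers for the three analysis types (global, tonal
# tendency, intersyllabic); PAFe's real names are not given in the text.
ANALYSES = ("global", "tonal_tendency", "intersyllabic")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the arguments a Praat script could pass on the command line."""
    parser = argparse.ArgumentParser(
        description="Hypothetical entry point called from a Praat script"
    )
    parser.add_argument("learner_wav", help="path to the learner recording")
    parser.add_argument("reference_wav", help="path to the native reference")
    parser.add_argument("--analysis", choices=ANALYSES, default="intersyllabic")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> int:
    """Report which comparison was requested; a real module would run it."""
    args = parse_args(argv)
    print(f"{args.analysis}: {args.learner_wav} vs {args.reference_wav}")
    return 0
```

Under these assumptions, a Praat menu script would only need to build the command line and hand it to the operating system, while all comparison logic stays on the Python side.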

Figure 1. PAFe global architecture.

This new module (PAFe) communicates with the original system by means of new Praat scripts associated with the application’s menu items (see Figure 2), from which these files are executed. Sometimes, the new module dispenses with calls to Praat and generates information windows directly from Python code files.

Figure 2. User interface visualising the new functionalities added to Praat.

The PAFe tool uses natural language processing and audio processing techniques, taking voice recordings of native speakers and students as its main source. It offers three types of analysis: global, tonal tendency, and intersyllabic, as Couto-Fernández et al. (2022) describe. In addition, PAFe provides its users with (a) a database containing their user profile, pronunciation exercises and .wav audio files, and (b) a graphical interface, within Praat, including reports on their progress, such as improvement in their pronunciation.

Concerning its application, the PAFe tool has predominantly been tested within the domain of second language didactics. As noted in Section 1, evaluations of the PAFe tool underscore its merits as a computer-assisted instrument for enhancing phonetic-phonological competence for both second language educators and learners (Couto-Fernández, 2021; Sarymsakova, 2022). A notable outcome of prior PAFe evaluations, in addition to the similarity percentages given as feedback on intonational production, is the participants’ recognition of its capacity to visualise native and non-native melodic curves graphically, indicating the tonal movement of each syllable (Couto-Fernández, 2021). This feature was deemed a valuable resource for aiding the improvement and acquisition of melodic patterns in the target language.

Following a brief overview of relevant background related to our study and connected to comparative-contrastive studies on Russian and Spanish intonation, as well as aligned with the PAFe operation and testing, the ensuing section will focus on delineating the methodologies and techniques employed in the processing and interpretation of our data.

3. METHODOLOGY

Because of its widespread use in the design of experiments and empirical studies in information systems and software engineering (Martin-Rodilla, Panach, Gonzalez-Perez, & Pastor, 2018; Panach, España, Dieste, Pastor, & Juristo, 2015), we chose Wohlin’s framework (Wohlin et al., 2012) to design an experiment testing the hypothesis formulated in Section 1. Specifically, Wohlin’s framework comprises the phases inherent to experimental design: scoping, hypothesis formulation, experimental design, operation, analysis and interpretation, hypothesis testing, and conclusion. In addition to its use for software validation and methodological proposals in various domains, the framework has been used for the initial validation of tools and methodologies in areas such as Digital Humanities and Natural Language Processing, as in Martin-Rodilla and Gonzalez-Perez (2023). We therefore consider it an appropriate reference for our experimentation.

Following Wohlin’s framework, we summarise our scoping as follows: we analyse the pitch deviations of Russian-speaking learners of Spanish for the purpose of identifying potential challenges in phonetic-phonological competence acquisition, with respect to the background of previous comparative-contrastive studies, from the point of view of PAFe users, in the context of linguistic studies. This study also aims to identify areas for improvement in the tool’s operability.

To provide accurate contrastive-comparative intonation analysis data for native and non-native speakers through the PAFe tool, we base our study on the Melodic Speech Analysis methodology of Cantero-Serena (2002, 2019) and apply the following essential principles from speech processing:

We annotate the syllables of each speech act in a Praat TextGrid (Boersma & Weenink, 2019); we identify the pitch values of all vowels in the syllables (voiced consonants are also measured), using the Praat script developed by Mateo (2010a, 2010b), which extracts the absolute values in Hz, relativises them and draws the graph of the standardised melody;
We separate relevant from irrelevant frequency differences between tonal segments; according to Cantero-Serena (2002, 2019) and Font-Rotchés and Cantero-Serena (2008, 2009), a difference of less than 10 percent between segments is considered imperceptible.
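The two principles above can be sketched in a few lines of Python. This is an illustrative approximation rather than Mateo's Praat script or PAFe's implementation: it assumes one pitch value in Hz per syllable, expresses each value as a percentage change from the preceding one, rebuilds a standardised curve from an arbitrary start of 100, and flags the transitions that reach the 10 percent threshold. All function names and the sample values are hypothetical.

```python
from typing import List, Tuple

PERCEPTION_THRESHOLD = 10.0  # percent; smaller changes count as imperceptible


def relative_changes(f0_hz: List[float]) -> List[float]:
    """Percentage change of each pitch value relative to the previous one."""
    return [
        (curr - prev) / prev * 100.0
        for prev, curr in zip(f0_hz, f0_hz[1:])
    ]


def standardised_curve(f0_hz: List[float], start: float = 100.0) -> List[float]:
    """Rebuild the melody from an arbitrary start using only relative changes."""
    curve = [start]
    for change in relative_changes(f0_hz):
        curve.append(curve[-1] * (1.0 + change / 100.0))
    return curve


def perceptible_movements(
    f0_hz: List[float], threshold: float = PERCEPTION_THRESHOLD
) -> List[Tuple[int, float]]:
    """Return (syllable index, change) where the change reaches the threshold."""
    return [
        (i + 1, change)
        for i, change in enumerate(relative_changes(f0_hz))
        if abs(change) >= threshold
    ]


# Invented example values in Hz, one per syllable.
example_f0 = [200.0, 210.0, 250.0, 220.0]
print(perceptible_movements(example_f0))
```

Working with relative rather than absolute values is what makes curves from speakers with different pitch ranges comparable, since the standardised curve no longer depends on the starting frequency.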

For the classification of the functional elements of the pitch contour, we follow Cantero-Serena (2002, 2019) and Font-Rotchés and Cantero-Serena (2008, 2009). We take the body to be the part of the melody that runs from the first peak (the first tonic vowel) to the last tonic vowel of the contour, which we call the core (or toneme). We also distinguish the final inflection, which comprises the syllables after the last tonic syllable or core. Finally, the anacrusis comprises the unstressed syllables preceding the first peak.
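These definitions can be made concrete with a short sketch that splits a syllabified utterance into its functional elements. It assumes that stress marks are supplied by the annotator and that the first stressed syllable is the first peak; it also lists the core separately from the body, which is a boundary choice made here for illustration. The function name and the stress marking of the example (the short variant of SA 1 given in subsection 4.1) are our own assumptions.

```python
from typing import Dict, List, Tuple

Syllable = Tuple[str, bool]  # (syllable text, is_stressed)


def functional_elements(syllables: List[Syllable]) -> Dict[str, List[str]]:
    """Split a contour into anacrusis, body, core and final inflection."""
    stressed = [i for i, (_, is_stressed) in enumerate(syllables) if is_stressed]
    if not stressed:
        raise ValueError("at least one stressed syllable is required")
    first_peak, core = stressed[0], stressed[-1]
    texts = [text for text, _ in syllables]
    return {
        "anacrusis": texts[:first_peak],        # unstressed syllables before the first peak
        "body": texts[first_peak:core],         # from the first peak up to the core
        "core": [texts[core]],                  # last tonic syllable (toneme)
        "final_inflection": texts[core + 1:],   # syllables after the core
    }


# "Luego usted será responsable", with stress marks assumed for illustration.
sa1 = [
    ("lue", True), ("go", False), ("us", False), ("ted", True),
    ("se", False), ("rá", True), ("res", False), ("pon", False),
    ("sa", True), ("ble", False),
]
print(functional_elements(sa1))
```

With this marking the utterance has no anacrusis, the syllables us, se and rá fall in the body, and ble forms the final inflection.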

In order to reach the objectives of the study as formulated in Section 1 and apply the aforementioned methods and approaches, the following research questions were specified:

Which pitch functional elements, identified by the PAFe tool, pose particular challenges for male and female Russian-speaking learners of Spanish as a second language (L2) striving to enhance their phonetic-phonological competence?
Does the tool provide sufficient data for intonational analysis of the melodic curves of native speakers and Russian-speaking learners of Spanish?

4. EXPERIMENTAL DESIGN AND DATA PROCESSING RESULTS

4.1. Data collection

In accordance with the experimental design stage of Wohlin’s framework, the preliminary data on the assessment of pitch functional elements via PAFe feedback were obtained through multiple recording sessions with Russian-speaking learners of Spanish. For these recordings, we chose four indirect speech acts identified as “challenging” by Sarymsakova (2022) because of the main role of intonation in activating implicit meaning. These speech acts (SA), whose results are presented in Table 1 and Table 2, have the following lexical-syntactic structures and pragmatic functions (PF):

SA 1: “Y, claro, luego usted será responsable”; PF: indirect threat.¹
SA 2: “No tengo ninguna duda de que sois buenas guías, eso ya lo sé”; PF: negative politeness.
SA 3: “Con eso entiendo que es usted muy listo para dar las explicaciones”; PF: irony.
SA 4: “Pero, eso no puede volver a repetirse”; PF: indirect request.

¹ In the case of male non-native speakers, the text of the reproduced SA 1 and SA 4 differs slightly because the native speakers’ recordings represent semi-spontaneous communicative interaction: “Luego usted será responsable” and “Eso no puede volver a repetirse”. The lexical-syntactic structure of these SA does not affect the intonational similarity results because the SA have the same functional elements of the pitch contour.

In English: SA 1: "And, of course, then you will be responsible"; SA 2: "I have no doubt that you are good guides, I know that"; SA 3: "Saying that I understand that you are very smart to give explanations"; SA 4: "But, that can't happen again".

The recording procedure design consists of the following steps:

Preparation: the students are required to complete certain exercises in order to train and test their phonetic-phonological competence in Spanish as L2.²
Beginning: the students listen to the audio files previously recorded with two native speakers of Spanish (a young female and a young male primary and secondary school teacher from an urban area of Galicia). This stage is repeated at least 5-10 times or until the students memorise and freely reproduce the utterance and intonation of the reference audio. We did not provide the written text of the utterances, so that the participants would demonstrate their level of listening comprehension and, in this way, confirm their intermediate-advanced command of spoken Spanish.
Recording of the utterances: the result is four audio files, one per reproduced utterance (maximum duration 2-4 seconds), for each student.
Data collection: in total, 5 female learners (24-34 years old) and 4 male learners (18-35 years old), all with Russian as L1, participated in this empirical study. All our students are at B1-B2 level and do not currently reside in a Spanish-speaking country. We recorded 36 audio files and, in this step, produced 36 files in the .TextGrid format containing the syllable-by-syllable annotation of each utterance. To automate TextGrid generation, we used the software tools of Schiel (1999) and Kisler, Reichel, and Schiel (2017). In order to perform the intersyllabic type of intonational analysis offered by PAFe, we stored both the audio files and the .TextGrid files of our learners and native speakers in PAFe memory using the “Upload .WAV file” and “Upload.TEXTGRID file” options.

² The didactic sequence for the mentioned exercises was proposed by Sarymsakova (2022).

4.2. Data analysis outcomes

Hereafter, we present the results obtained after synthesising the collected data. Table 1 shows the outcomes of the recordings with our five female non-native speakers (Sp 1F, 2F, 3F, 4F, 5F) and of the intersyllabic analysis performed by the PAFe tool. The task was to reproduce, as accurately as possible, four indirect speech acts pronounced by a native speaker of Spanish, conveying the pragmatic function indicated in subsection 4.1. Comparing the results of PAFe’s automatic calculations when running the intersyllabic analysis algorithm, we observe that the intonation similarity for these speakers ranges from 80.93% to 93.41%, which we consider closely similar to the intonation of the native speaker.

Table 1. Results of the intersyllabic analysis by PAFe tool for female Russian speaking learners. Intonation similarity percentage. SA = speech act. Sp = speaker. F = female.

SA	Sp 1F	Sp 2F	Sp 3F	Sp 4F	Sp 5F
1	88.23	88.69	86.38	88.69	88.23
2	86.05	87.05	90.60	91.10	91.65
3	88.55	88.77	87.86	93.41	89.68
4	80.93	90.71	90.57	88.71	89.79

Table 2. Results of the intersyllabic analysis by PAFe tool for male Russian speaking learners. Intonation similarity percentage. SA = speech act. Sp = speaker. M = male.

SA	Sp 1M	Sp 2M	Sp 3M	Sp 4M
1	80.20	79.90	78.70	83.30
2	90.70	80.10	74.00	74.20
3	87.18	77.50	74.77	78.14
4	87.67	81.92	83.25	76.00

Regarding the intersyllabic analysis results of our male non-native speakers (Sp 1M, 2M, etc.), see Table 2. Following the same recording procedure as described above, the similarity percentages for our male Russian-speaking learners of Spanish range from 74.00% to 90.70%. These results show slightly lower similarity in reproducing the intonation of the native speaker than those of the female learners. However, we still consider them quite satisfactory because none of our participants showed less than 70% similarity. For the pitch difference per syllable of each reproduced speech act, see Figures 3-8.
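The ranges quoted for Tables 1 and 2 can be re-derived with a short script. The sketch below transcribes the similarity percentages from both tables and computes the minimum, maximum and mean per group; the variable and function names are ours, and the mean is an additional descriptive figure not reported in the text.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Similarity percentages per speaker for SA 1-4, transcribed from Table 1.
TABLE_1_FEMALE: Dict[str, List[float]] = {
    "Sp 1F": [88.23, 86.05, 88.55, 80.93],
    "Sp 2F": [88.69, 87.05, 88.77, 90.71],
    "Sp 3F": [86.38, 90.60, 87.86, 90.57],
    "Sp 4F": [88.69, 91.10, 93.41, 88.71],
    "Sp 5F": [88.23, 91.65, 89.68, 89.79],
}

# Similarity percentages per speaker for SA 1-4, transcribed from Table 2.
TABLE_2_MALE: Dict[str, List[float]] = {
    "Sp 1M": [80.20, 90.70, 87.18, 87.67],
    "Sp 2M": [79.90, 80.10, 77.50, 81.92],
    "Sp 3M": [78.70, 74.00, 74.77, 83.25],
    "Sp 4M": [83.30, 74.20, 78.14, 76.00],
}


def summarise(table: Dict[str, List[float]]) -> Tuple[float, float, float]:
    """Return (minimum, maximum, mean) over every score in a table."""
    values = [score for scores in table.values() for score in scores]
    return min(values), max(values), round(mean(values), 2)


for label, table in (("female", TABLE_1_FEMALE), ("male", TABLE_2_MALE)):
    lowest, highest, average = summarise(table)
    print(f"{label}: min={lowest} max={highest} mean={average}")
```

The minimum and maximum of each group reproduce the ranges stated above, and no individual score falls below 70%.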

Figure 3 indicates that the largest difference (from here onwards marked in red for female speakers) for SA 1 produced by female non-native speakers appears in the final inflection, the last unstressed syllable ble (33%). Likewise, the unstressed syllable us, in the body of the pitch contour, shows a difference of 18.2%.

Figure 3. Average difference of pitch deviation by syllables for SA 1 reproduced by female non-native students. Red is used to highlight the syllables with the largest difference.

For SA 1 produced by male non-native speakers, Figure 4 shows the largest differences (from here onwards marked in green for male speakers) in the body: the unstressed syllable se (36%) and the stressed syllable rá (29.5%), both within the nucleus of the verbal syntagm.

Figure 4. Average difference of pitch deviation by syllables for SA 1 reproduced by male non-native students. Green is used to highlight the syllables with the largest difference.

Figure 5. Average difference percentage of pitch deviation by syllables for SA 2 reproduced by both female and male non-native students. Red is used to highlight the syllables with the largest difference in female speakers. Green is used to highlight the syllables with the largest difference in male speakers.

Figure 6. Average difference percentage of pitch deviation by syllables for SA 3 reproduced by both female and male non-native students. Red is used to highlight the syllables with the largest difference in female speakers. Green is used to highlight the syllables with the largest difference in male speakers.

Figure 7. Average difference of pitch deviation by syllables for SA 4 reproduced by female non-native students. Red is used to highlight the syllables with the largest difference in female speakers.

Figure 8. Average difference of pitch deviation by syllables for SA 4 reproduced by male non-native students. Green is used to highlight the syllables with the largest difference in male speakers.

As for SA 2, which has a negative politeness pragmatic function, Figure 5 displays the largest syllabic differences for female speakers in the subordinate clause, in the conjunction que (18.2%) and the unstressed syllable nas (29.6%), both of which belong to the body of the SA 2 utterance. Likewise, the male speakers showed a difference of 24.5% in the same pitch contour element (syllables na and bue).

Figure 6 presents the pitch contour elements with the largest pitch differences for SA 3. The ironic speech act produced by female speakers displays three main tonal deviations: the body of the utterance, formed by do (19%) and to (16%), and the final inflection nes (15.8%). The utterance of the male Russian-speaking learners of Spanish, on the other hand, reveals only two main points of pitch contrast: the syllables tien (26.5%) and pa (27.3%), both from the body of the ironic speech act. It should be noted that, in the case of the final inflection nes, the difference shown by PAFe is 0.01 because the pitch values of this segment are undetectable (below 75 Hz, the default setting of PAFe as an extension of Praat).
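To illustrate why a segment can become non-comparable, the sketch below flags syllables whose f0 falls under the 75 Hz floor and leaves the 0.01 placeholder out of an average. It is a hypothetical reconstruction and not PAFe's code: the function names and the f0 values in hertz are assumptions made for the example.

```python
PITCH_FLOOR_HZ = 75.0   # pitch floor mentioned in the text
PLACEHOLDER = 0.01      # value the text reports for non-comparable segments


def is_comparable(native_hz, learner_hz, floor=PITCH_FLOOR_HZ):
    """A syllable pair is comparable only if both f0 values reach the floor."""
    return native_hz >= floor and learner_hz >= floor


def mean_excluding_placeholder(differences, placeholder=PLACEHOLDER):
    """Average difference percentages, skipping placeholder entries.

    Returns None when no comparable syllable remains.
    """
    valid = [d for d in differences if d != placeholder]
    return sum(valid) / len(valid) if valid else None


# Hypothetical f0 values (Hz) for three syllables: (native, learner).
pairs = [(210.0, 195.0), (180.0, 172.0), (92.0, 68.0)]
print([is_comparable(native, learner) for native, learner in pairs])
# [True, True, False]

# Hypothetical per-syllable differences; the last one is marked as not applicable.
print(mean_excluding_placeholder([20.0, 10.0, PLACEHOLDER]))  # 15.0
```

Skipping the placeholder rather than averaging it in keeps a non-measurable syllable from being read as a near-perfect match.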

Finally, for the indirect request (SA 4), Figure 7 shows the pitch differences in the body, namely in the unstressed syllable vol (23.8%), and in the final inflection, represented by the syllable se (34%). For the male speakers, Figure 8 shows the tonal contrast for SA 4 in the first peak (first stressed vowel), specifically in the stressed syllable e of the demonstrative pronoun (33.5%), as well as at the beginning of the body of the same pitch contour, in the unstressed syllable pe (22%).

To interpret the empirical data obtained from the intersyllabic intonation analysis of native speakers and Russian-speaking learners of Spanish performed with the PAFe tool, we identified the pitch functional elements where most of the f₀ deviations occurred for female and male learners. Figure 9 shows that, for female Russian learners, most of the pitch deviations detected by PAFe occur in the body (10.71%) and in the final inflection (27.60%) of the pitch contour, while our male non-native participants (Figure 10) showed a greater pitch contrast in the body (20.73%) and in the first peak (20.13%).
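The aggregation by functional element can be pictured as a simple group-and-average step. The sketch below uses only the highlighted syllable differences reported above for female speakers, so its averages are illustrative and need not match Figure 9, which presumably covers every syllable. The record layout and the function name are assumptions.

```python
from collections import defaultdict

# (speech act, syllable, functional element, difference %) as highlighted in the
# text for female speakers; syllables the text does not list are omitted.
highlighted_female = [
    ("SA 1", "us", "body", 18.2),
    ("SA 1", "ble", "final inflection", 33.0),
    ("SA 2", "que", "body", 18.2),
    ("SA 2", "nas", "body", 29.6),
    ("SA 3", "do", "body", 19.0),
    ("SA 3", "to", "body", 16.0),
    ("SA 3", "nes", "final inflection", 15.8),
    ("SA 4", "vol", "body", 23.8),
    ("SA 4", "se", "final inflection", 34.0),
]


def mean_by_element(records):
    """Average the difference percentage within each functional element."""
    grouped = defaultdict(list)
    for _speech_act, _syllable, element, difference in records:
        grouped[element].append(difference)
    return {element: sum(vals) / len(vals) for element, vals in grouped.items()}


for element, average in mean_by_element(highlighted_female).items():
    print(f"{element}: {average:.2f}%")
```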

Figure 9. Pitch difference percentage detected by PAFe for functional elements in the Russian female pitch contour.

Figure 10. Pitch difference percentage detected by PAFe for functional elements in the Russian male pitch contour.

Nevertheless, Figures 9 and 10 show that there were instances where the comparison was not applicable, as indicated by tone values equal to 0.01. Given such data, statistical testing became necessary. We therefore used the software tool proposed by Rodríguez-Fernández, Canosa, Mucientes, and Bugarín (2015) to assess the statistical significance of the experimental data. We opted for the Friedman Aligned Ranks test because the experimental data are not normally distributed and do not satisfy the property of homoscedasticity. The resulting p-value of 0.03769 (at a significance level of 0.05) was observed for male first peak values as well as female body tonal values. This implies that the results for these pitch functional elements are statistically significant within our study.
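For readers unfamiliar with the Friedman Aligned Ranks test, the following Python sketch follows its usual textbook formulation: each block is centred on its mean, all aligned values are ranked jointly, and the statistic is compared with a chi-square distribution with k - 1 degrees of freedom. It is an independent illustration and not the code of the tool used in the study. The arrangement of the data into blocks (rows) and treatments (columns) is an assumption, since the text does not specify it, and the toy values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..N in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms for integer df >= 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def friedman_aligned_ranks(blocks):
    """Return (statistic, p_value) for n blocks (rows) by k treatments (columns)."""
    n, k = len(blocks), len(blocks[0])
    aligned = []
    for row in blocks:
        location = sum(row) / k  # block mean used as the location estimate
        # Rounding keeps genuinely tied values tied despite floating-point noise.
        aligned.extend(round(v - location, 9) for v in row)
    ranks = average_ranks(aligned)
    grid = [ranks[i * k:(i + 1) * k] for i in range(n)]
    treatment_totals = [sum(grid[i][j] for i in range(n)) for j in range(k)]
    block_totals = [sum(row) for row in grid]
    numerator = (k - 1) * (sum(t * t for t in treatment_totals)
                           - (k * n * n / 4) * (k * n + 1) ** 2)
    denominator = (k * n * (k * n + 1) * (2 * k * n + 1) / 6
                   - sum(b * b for b in block_totals) / k)
    statistic = numerator / denominator
    return statistic, chi2_sf(statistic, k - 1)


# Hypothetical difference percentages: rows = speech acts, columns = elements.
toy = [
    [12.0, 20.5, 18.0],
    [10.5, 22.0, 25.0],
    [14.0, 19.0, 21.5],
    [11.0, 24.5, 27.0],
]
statistic, p_value = friedman_aligned_ranks(toy)
print(f"T = {statistic:.4f}, p = {p_value:.5f}")
```

A p-value below the chosen significance level (0.05 in the study) would lead to rejecting the hypothesis that all treatments behave alike.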

Overall, the data obtained for Russian-speaking learners of Spanish through the intersyllabic analysis provided by the PAFe tool, and tested statistically, reveal the percentage of tonal deviation in the functional elements of the intonational contour (anacrusis, first peak, body, nucleus and final inflection) and show slight differences in those elements between speech acts produced by male and female speakers.

5. CONCLUSIONS

Our study showed that the PAFe tool facilitates the identification of non-native deviation patterns in the pitch contour and automatically provides quantitative data on them.

In summary, we highlight the following key aspects, which were set out in the Introduction and which meet the initial objectives arising from the hypothesis formulated in Section 1:

According to the empirical data obtained in the intersyllabic PAFe tests carried out with our Russian-speaking learners of Spanish, most of the tonal deviation for female learners occurs in the body of the four reproduced speech acts, whereas for male Russian participants the pitch contour element with the greatest melodic contrast was the first peak. The findings also imply that these pitch functional elements pose noteworthy challenges for Russian-speaking learners attempting to acquire phonetic-phonological proficiency in Spanish, as they show the most prominent tonal deviations. These outcomes support our hypothesis, as the application of PAFe in the current research has effectively automated the discrimination of pitch functional elements and has yielded valuable data on the potential difficulties faced by Russian-speaking learners of Spanish.
The intersyllabic functionality of the PAFe tool enables the automation of contrastive-comparative melodic analysis for native and non-native speakers. However, several aspects may limit the identification of melodic elements. The range of tonal values does not permit the analysis of pitch below 75 Hz. Also, as indicated in the Data collection subsection, the audio and annotation files for native and non-native speakers (PAFe's input files) are obtained separately. To ease the intersyllabic analysis, the collection of input files should be automated and integrated into the PAFe interface; we intend to develop this in future versions and to include these variables in future experiments.

Regarding future lines of research, we expect to deepen the study of gender differences in functional speech elements and to explain why female Russian speakers present more tonal deviation in the final inflection, while male non-native speakers show the greatest contrast in the first peak. Finally, we aim to collect more data on pitch contrast in functional speech elements for native and non-native speakers of Spanish by carrying out studies with students of Spanish with different L1s and, by synthesizing these data with the PAFe tool, to generate a dataset for voice recognition software for non-native speakers of Spanish.