Age and gender effects in European Portuguese spontaneous speech

Aging is part of the normal evolution of human beings. However, knowledge about speech in older age is still dispersed and incomplete. Considering the conflicting findings reported in prior research, this study aims to increase our knowledge about age effects on the spontaneous speech of Portuguese adults. To analyze the effects of age on the rhythmic, intonation and voice quality domains, several parameters were extracted from spontaneous speech produced by 112 adults aged between 35 and 97. Data were obtained through a picture description task. The results showed that the most consistent age-related effects are an increase in speech pauses, mainly in men, and a decrease in Harmonics-to-Noise Ratio (HNR) in women. Speaking fundamental frequency (f0) tends to decrease in women and to slightly increase in men with age. These findings for Portuguese are in line with previous research suggesting that suprasegmental characteristics of speech change with age, with some gender differences.


INTRODUCTION
The older population is quickly increasing worldwide. Demographic aging, due primarily to lower fertility, also reflects a human success story of increased longevity (He, Goodkind, & Kowal, 2016; Kart & Kinney, 2001). According to the World Health Organization, the number of people aged over 65 is rising, and by 2050, 1 in 6 people in the world will be over the age of 65, up from 1 in 11 in 2019 (United Nations Department of Economic and Social Affairs Population Division, 2019). Portugal is also one of the developed countries with the highest share of older people: between 1970 and 2019, the percentage of people aged 65 and over increased from 9.7% to 22.4% (Statistics Portugal, 2015, 2019; United Nations Department of Economic and Social Affairs Population Division, 2019).
Aging, as a normal and inevitable process, involves physiological, cognitive, psychological, and social changes (Hermes, Mertens, & Mücke, 2018; Makiyama & Hirano, 2017). Physiological aging affects all speech organs. Age-related changes in the speech organs, as well as in neuromuscular control, cause considerable variation in speech production and speech planning across the life span (Hermes et al., 2018; Pellegrino, He, & Dellwo, 2018). These changes considerably alter the way individuals speak over time, both segmentally and suprasegmentally (Pellegrino et al., 2018). Although most studies have focused on the production of segments, such as vowels, the suprasegmental characteristics of speech are also particularly vulnerable to age. However, the effect of aging on suprasegmental characteristics of speech, i.e., on speech rhythm and intonation, has been investigated in far less detail.
Concerning European Portuguese (EP), there are only a few studies of age-related speech variation, mainly at the segmental level (Albuquerque, Oliveira, Teixeira, Sa-Couto, & Figueiredo, 2019; Albuquerque et al., 2014; Guimarães & Abberton, 2005; Pellegrini et al., 2013). Still, vowel production in reading tasks has limitations and has failed to provide a clear picture of the age effects in spontaneous speech. To the best of our knowledge, there are almost no data on the acoustic correlates of aging in EP spontaneous speech (Guimarães & Abberton, 2005).
For this reason, the aim of this study is to analyze the effects of aging on speech rhythm (duration measures, speech rate and articulation rate), intonation (speaking fundamental frequency, f0) and voice quality (Harmonics-to-Noise Ratio, HNR) measured in spontaneous speech. This study treats age as a continuous variable in the analysis, avoiding the effects of an arbitrary division into age groups (as in Albuquerque, Oliveira, Teixeira, Sa-Couto, & Figueiredo (2020)).
Since there is a paucity of literature on EP speech acoustics and the available data were collected from a small number of speakers (Albuquerque et al., 2014; Escudero, Boersma, Rauber, & Bion, 2009; Martins, 1973; Oliveira, Cunha, Silva, Teixeira, & Sa-Couto, 2012), this study also provides valuable insights into the effects of aging on natural spoken language production.
Knowledge of how speech changes with age is essential for the development of automatic speech recognition (ASR) systems suitable for older voices (e.g., personalized reading aids and voice prostheses) (Vipperla, Renals, & Frankel, 2010), for the clinical assessment and treatment of speech disorders that are often age-related, and for informing other fields of knowledge (e.g., phonetics, speech science, forensic linguistics and biometric recognition).

RELATED WORK
Many studies have addressed the effects of aging on the acoustic properties of speech (Eichhorn, Kent, Austin, & Vorperian, 2018; Linville, 2001; Schötz, 2006). Most of them have focused on segmental features and have shown an increase in segment duration (Albuquerque et al., 2020; Linville, 2001; Schötz, 2006). Although spontaneous speech characteristics have been investigated in far less detail, acoustic studies have also revealed age-induced effects at the suprasegmental level (i.e., intonation and rhythm), but they have focused mainly on read speech.
For EP, acoustic studies of age-related changes in spontaneous speech are scarce. Previous studies with different speech corpora (e.g., reading or conversation) indicated a slight or significant age-related f0 decrease in women, and no agreement for men (Albuquerque et al., 2019, 2014; Guimarães & Abberton, 2005; Pellegrini et al., 2013). In reading tasks, Pellegrini et al. (2013) report that speech rate decreases and the percent pause ratio increases with age for both genders. Given that previous research used different corpora and analysis procedures, it is hard at this time to draw solid conclusions on the effects of age and gender on EP spontaneous speech.

METHOD
This cross-sectional study was approved by the Ethics Committee of Centro Hospitalar São João/Faculty of Medicine, University of Porto, Portugal (approval number N38/18), and all participants signed a written informed consent form.

Participants
A total of 112 native Portuguese speakers from the central region of Portugal, aged between 35 and 97, participated in this study. They covered 4 age groups. Each participant completed a written questionnaire and reported no history of speech-language impairment, no severe hearing problems, and no history of neurological disorders or head/neck cancer. In addition, participants had to be free of upper respiratory tract infection for 3 weeks prior to the experiment. Participants were excluded if they: 1) were current smokers or had smoked within the previous 5 years; 2) self-reported poor general health; 3) wore hearing aids; 4) had received speech and language therapy; or 5) reported that their voice was different than usual on the day of testing (e.g., having a cold or allergy symptoms). Participants who exhibited any observable sign of speech, voice or severe hearing problems, as assessed by a speech pathologist at the time of recording, and those who were unable to follow directions were also excluded.

Corpus and recording protocol
Spontaneous speech samples were collected from the participants using the picture description task of the Boston Diagnostic Aphasia Examination (Goodglass & Kaplan, 1983), with the standardized "Cookie Theft" picture stimulus. Participants were instructed to describe the picture at a comfortable pitch and loudness level, after familiarizing themselves with the image, in order to obtain induced spontaneous speech (Morgan & Rastatter, 1986; Pakhomov et al., 2011). The instruction given to participants was as follows: "Tell me everything you see in this picture."
Recordings took place in quiet rooms using an AKG condenser microphone and a USB external soundcard (PreSonus), with a sampling rate of 44100 Hz. The image was presented on the computer screen using the SpeechRecorder software (Draxler & Jänsch, 2017). The participants were seated at a table, and the microphone was adjusted to each participant and positioned approximately 15-20 cm from the mouth.

Suprasegmental annotation
The recorded data were segmented using automatic Praat scripts. Speech and pauses were labeled, and within the speech intervals the vowel onsets were also detected.
The automated alignments of silent pauses were manually checked by two trained annotators, who verified the accuracy of the pause and speech intervals and also labeled intervals containing speaker or environmental noise. The intervals were labeled as: pause (breathing sounds were treated as silent pauses), speech, verbal non lexical (i.e., filled pauses), noise (i.e., noise that occurs during the speaker's pauses), vocal non lexical (e.g., laughter, coughing or other human noises), and speech with noise (i.e., speech intervals with environmental noise that could affect the acoustic measurements) (Pellegrino, 2019; Schuller et al., 2013).
Vocal non lexical phenomena were counted as pause time, while verbal non lexical phenomena were not included in the present analysis (Pellegrino, 2019; Tuomainen, Hazan, & Taschenberger, 2019). Speech intervals with noise were excluded from further analysis, and the beginning and end of every recording were also left out because of sentence-initial and sentence-final acoustic variability (in total, 7% of the speech intervals were excluded).
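The treatment of the interval labels described above can be sketched as a simple mapping from label to analysis role. This is only an illustration: the label strings are hypothetical (the tier labels actually used in the Praat annotation are not given), and counting "noise" intervals as pause time is an assumption, since the text only states that such noise occurs during pauses.

```python
from collections import defaultdict

# Hypothetical label strings; the source names the categories, not the tier labels.
ROLE_BY_LABEL = {
    "speech": "speech",
    "pause": "pause",                  # includes breathing sounds
    "vocal_non_lexical": "pause",      # laughter, coughing: counted as pause time
    "noise": "pause",                  # ASSUMPTION: noise inside a pause stays pause time
    "verbal_non_lexical": "excluded",  # filled pauses: left out of the analysis
    "speech_with_noise": "excluded",   # noise could distort the acoustic measures
}

def total_durations(intervals):
    """Sum interval durations (in seconds) per analysis role.

    intervals: iterable of (label, start_s, end_s) tuples.
    """
    totals = defaultdict(float)
    for label, start, end in intervals:
        totals[ROLE_BY_LABEL[label]] += end - start
    return dict(totals)

# Made-up annotation for one short recording (not data from the study).
example = [
    ("speech", 0.0, 2.0),
    ("pause", 2.0, 2.5),
    ("verbal_non_lexical", 2.5, 3.0),
    ("speech", 3.0, 4.5),
    ("vocal_non_lexical", 4.5, 5.0),
]
totals = total_durations(example)
```

Keeping the mapping in one table makes it easy to change a single decision (for example, how "noise" is handled) without touching the rest of the analysis.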
Regarding the syllables spoken, an adapted Praat script of the BeatExtractor (Barbosa, 2006, 2010) was used to detect vowel onsets using a beat wave (a normalized, band-specific amplitude). The cut-off frequency was defined automatically. The thresholds were 0.1 and 0.06, the filter was Butterworth, and the technique was Amplitude. The total number of syllables was obtained automatically as the sum of all vowel onsets detected within all valid speech intervals per speaker. A random check was done to verify the vowel onsets and confirm the script's performance.

Acoustic measurements
Acoustic parameters (f0 and HNR) were automatically extracted from the valid speech intervals using the Praat script ProsodyDescriptor (Barbosa, 2013), with f0 thresholds of 75-400 Hz for males and 120-600 Hz for females. The script extracted and calculated the parameters f0 mean (in semitones re 1 Hz) and HNR. Each value was used to obtain the average speaking f0 for each participant. The f0 values in semitones relative to 1 Hz were then converted to Hertz.
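The conversion between semitones relative to 1 Hz and Hertz follows the standard definition of the semitone scale (12 semitones per octave). The following is a minimal sketch of that conversion; the function names are illustrative and are not taken from ProsodyDescriptor, and the text does not state whether averaging was done before or after the conversion.

```python
import math

def semitones_to_hz(st, ref_hz=1.0):
    """Convert a value in semitones re ref_hz to Hertz."""
    return ref_hz * 2.0 ** (st / 12.0)

def hz_to_semitones(hz, ref_hz=1.0):
    """Convert a frequency in Hertz to semitones re ref_hz."""
    return 12.0 * math.log2(hz / ref_hz)

# One octave above the 1 Hz reference is 12 semitones, i.e. 2 Hz.
one_octave_hz = semitones_to_hz(12.0)

# Round trip for an illustrative speaking f0 of 200 Hz.
st_200 = hz_to_semitones(200.0)
back_to_hz = semitones_to_hz(st_200)
```

Because the scale is logarithmic, converting an average taken in semitones yields the geometric rather than the arithmetic mean in Hertz, so the order of averaging and conversion matters slightly.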
In order to analyze the age effects in rhythm, intonation and voice quality, several parameters (Table 1) were extracted from spontaneous speech, based on the suprasegmental annotation of speech, pauses and syllables.
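As an illustration of how such rhythmic parameters can be derived from the annotation, the sketch below computes them for one speaker. The exact definitions are those of Table 1; the formulas here are the conventional ones and are an assumption (in particular, that speech rate includes pause time, articulation rate excludes it, and variability is the standard deviation of speech interval durations). All names are hypothetical.

```python
import statistics

def rhythm_measures(speech_durs, pause_durs, n_syllables):
    """Rhythm measures for one speaker.

    speech_durs / pause_durs: durations (s) of valid speech and pause intervals.
    n_syllables: number of vowel onsets detected in the valid speech intervals.
    """
    speech_time = sum(speech_durs)
    pause_time = sum(pause_durs)
    total_time = speech_time + pause_time
    return {
        # syllables per second, pauses included (assumed definition)
        "speech_rate": n_syllables / total_time,
        # syllables per second, pauses excluded (assumed definition)
        "articulation_rate": n_syllables / speech_time,
        "percent_pause_time": 100.0 * pause_time / total_time,
        "mean_pause_duration": pause_time / len(pause_durs),
        "total_speech_duration": speech_time,
        # sample standard deviation of speech interval durations (assumed definition)
        "speech_duration_sd": statistics.stdev(speech_durs),
    }

# Made-up values for illustration only (not data from the study).
m = rhythm_measures(speech_durs=[2.0, 1.5, 2.5], pause_durs=[0.5, 1.5], n_syllables=24)
```

With these toy values, the speaker produces 24 syllables in 6 s of speech and 2 s of pauses, so articulation rate (4 syllables/s) exceeds speech rate (3 syllables/s) and the percent pause time is 25%.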

RESULTS
This section presents the results of the selected acoustic parameters, and their statistical analyses.
The multiple linear regression coefficients are displayed in Table 2. The results revealed significant differences between genders for the following parameters: speech rate, articulation rate, speaking f0 and HNR. There was a significant age effect in males for total speech duration, percent pause time, mean pause duration, speech duration variability and number of syllables, and in females for speech rate, articulation rate and HNR.
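The exact form of the regression model is not stated here. One plausible reading, assumed for this sketch, is a model with age, gender and their interaction: value = b0 + b1·age + b2·male + b3·age·male. With a binary gender variable and a full interaction, the least-squares coefficients can be obtained from two separate straight-line fits, one per gender, which is what the code below does. The data are made up and exactly linear, and no significance testing is included.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_age_by_gender(rows):
    """rows: (age, gender, value) tuples with gender 'F' or 'M'.

    Returns the coefficients of the assumed interaction model,
    with females as the reference level.
    """
    lines = {}
    for g in ("F", "M"):
        ages = [a for a, gg, _ in rows if gg == g]
        vals = [v for _, gg, v in rows if gg == g]
        lines[g] = fit_line(ages, vals)
    (f_int, f_slope), (m_int, m_slope) = lines["F"], lines["M"]
    return {
        "intercept": f_int,              # b0: female intercept
        "age": f_slope,                  # b1: age slope in females
        "male": m_int - f_int,           # b2: male offset at age 0
        "age_x_male": m_slope - f_slope, # b3: extra age slope in males
    }

# Toy data (NOT study data): a pause-duration-like value that rises
# faster with age in men than in women.
ages = [35, 50, 65, 80, 95]
rows = [(a, "F", 0.6 + 0.002 * a) for a in ages]
rows += [(a, "M", 0.3 + 0.011 * a) for a in ages]
coef = fit_age_by_gender(rows)
```

In this parameterization, the age slope for men is the sum of the `age` and `age_x_male` coefficients, which makes it easy to see a pattern where an age effect is present in one gender only.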

Rhythmic parameters
Concerning the mean pause duration (see Figure 1) and the percent pause time (see Figure 2), the regression lines showed an increase with age, mainly in males: mean pause duration increases from approximately 0.7 s at age 35 to 1.4 s at age 100, and the percent pause time increases by 20% over the same age interval. The multiple linear regression revealed that, for both parameters, the age effect is only significant for men. Regarding articulation rate (Figure 3): (1) the age effect is gender dependent, with men presenting a 9.4% higher mean articulation rate than women; (2) the difference between genders decreases with age, with older men and women presenting similar mean articulation rates, due to an increase in women's articulation rate. The regression model confirmed the gender differences and also showed a significant increase with age for females. Speech rate (Figure 4), like articulation rate, tended to increase in women. In men, a more pronounced tendency to decrease was observed, due to the rise in percent pause time with age.
Additionally, a decrease in total speech duration with age was seen, mainly in males (Figure 5), which is in line with the significant decrease in the number of syllables also observed for males. For both genders, speech duration variability decreases with age, but the decrease is only significant in males.

Speaking f0
No large age effect on speaking f0 was observed (Figure 6). In men, speaking f0 tended to increase with age. Conversely, in women, speaking f0 tended to decrease slightly with age. As expected, male speakers had a significantly lower speaking f0 than female speakers regardless of age.

DISCUSSION
This study provides a basis to complement previous studies of age effects, mostly at the suprasegmental level. The age effects on rhythmic parameters, speaking f0 and HNR in EP spontaneous speech were explored through a picture description task. Firstly, the older males produced shorter descriptions than younger adults, which may be related to the nature of the task or indicate differences in the linguistic domain (Mortensen, Meyer, & Humphreys, 2006).
Our analyses of rhythmic variation with age are in line with previous findings for other languages, mostly for men (Hartman & Danhauer, 1976; Steffens, 2011), where older men presented significantly more pause time and longer pause duration than young male speakers. Although men spoke faster than women, as in Jacewicz, Fox, & Wei (2010) and Verhoeven et al. (2004), this difference decreased with age. The faster articulation rate in older women is not in agreement with the general trend (Hazan et al., 2018; Linville, 2001), although some studies (Brückl & Sendlmeier, 2003; Jacewicz, Fox, O'Neill, & Salmons, 2009; Jacewicz et al., 2010) report no age-related differences. Furthermore, a longitudinal study of spontaneous speech (Gerstenberg et al., 2018) also reported that articulation rate tends to increase with age in French speakers, presumably due to language-specific effects or to age-related changes in the cardiovascular system (i.e., older speakers may inhale more often, but may compensate for reduced respiratory capacity by increasing articulation rate to maintain information density) (Gerstenberg et al., 2018). Also, Brückl and Sendlmeier (2003) suggest that the duration of speech pauses is a better indicator of age than the articulation rate.
Additionally, for males, our results indicated that the total speech duration decreases with age, even though the mean duration of each speech interval only tends to decrease slightly. However, the speech duration variability significantly decreases with age, which may indicate that older males tend to produce speech intervals of similar duration.

Harmonics-to-Noise Ratio
As for HNR, as can be seen in Figure 7, males presented lower values than females, and the regression results confirmed the gender differences. In females, HNR values tended to decrease with age, while no change was observed in males. The regression line showed a significant decrease of about 4 dB between the ages of 35 and 100 in females.
Regarding speaking f0, our data tend to confirm the claim in the literature that f0 slightly decreases in females with age (Goy et al., 2013; Guimarães & Abberton, 2005; Hazan et al., 2018; Morgan & Rastatter, 1986; Titze, 1994; Winkler, 2004), which has been attributed to the endocrinological changes that occur after menopause (Linville, 2001; Schötz, 2006; Sebastian, Babu, Oommen, & Ballraj, 2012). For males, speaking f0 appears to remain stable with age, based on the regression lines. Nonetheless, the analysis of the same data by age groups in Albuquerque et al. (2021) revealed that speaking f0 decreases until the age group [65-79] and starts to rise after that age, with an increase of about 10 Hz in the older group, which presented the highest mean value. Thus, a nonlinear variation of speaking f0 with age, which has been reported in other studies (Titze, 1994), was also observed in the data. The increase in speaking f0 in older males may be associated with muscle atrophy or with an increase in the stiffness of vocal fold tissue with aging (Higgins & Saxman, 1991; Linville, 2001; Sebastian et al., 2012). Thus, speaking f0 tends to converge across genders as age increases.
Lastly, females exhibit higher values of HNR than males, as in most studies (Ambreen et al., 2019; Dehqan et al., 2013; Goy et al., 2013; Schötz, 2006). The significant decrease of HNR with age in females is also consistent with past studies (Dehqan et al., 2013; Ferrand, 2002; Xue & Deliyski, 2001), and reflects more additive noise in the voiced signal (Dehqan et al., 2013; Ferrand, 2002; Schötz, 2006). According to Mueller (1997), a lower HNR would be predicted because hoarseness tends to be more prevalent in the voices of older speakers due to decrements in laryngeal function, such as ossification of cartilage, degeneration of muscle, connective tissue, and neural tissue, as well as respiratory changes (Dehqan et al., 2013; Mueller, 1997; Titze, 1994).
For males, almost no age-related variation was found in HNR, as in Ambreen et al. (2019), Dunashova (2021) and Goy et al. (2013). However, these results should be considered with caution, since the type of speech sample used in the present research did not consist of sustained vowels. Still, a longitudinal study (Dunashova, 2021), using reading samples of one male speaker at different ages (age 59 and age 74), found no changes in HNR with age.
Conclusions on the effects of aging on spontaneous speech should be drawn with caution due to differences in the recording environment, and because of the automatic extraction procedures. Although not all labeled syllables were manually verified, they were obtained using the same procedure for all speakers.

CONCLUSION
This article explores the age effects at the suprasegmental level in EP spontaneous speech. These results provide, essentially, a point of departure to establish the normal patterns of rhythm and intonation in spontaneous speech across age among adult Portuguese native speakers.
Fundamentally, as age progresses, male speakers tend to talk less, with a decrease in the variability of speech interval duration and with more pause time. Female participants present a significant increase in speech rate and articulation rate and also an HNR decrease, which means that women speak faster and with lower voice quality.
Additional studies of HNR and articulation rate are required, and longitudinal studies might provide additional insights into the question of how getting older modifies the characteristics of spontaneous speech.