1. INTRODUCTION
Speech rhythm has long been a controversial topic, especially in acoustic phonetics. Points of controversy include, to name a few: a) how to define and measure speech rhythm, and whether the concept is valid and/or useful (Arvaniti, 2012; Tilsen, 2016); b) whether distinct rhythm classes or types, such as stress-timed and syllable-timed languages, exist and whether they reflect linguistic or perceptual categories (Dauer, 1983; Ramus et al., 1999; White & Mattys, 2007); and c) how speech rhythm relates to and interacts with other prosodic or segmental features such as intonation, stress, vowel quality, and syllable structure (Grabe & Low, 2002; Arvaniti et al., 2008; Dellwo et al., 2015). That such issues persist reflects the diverse and complex nature of research in this area, which in turn has served as the rationale for the current study.
Moreover, one significant contribution of such studies to the literature has been the introduction of the above-mentioned issues into at least four branches of applied phonetics, namely 1) children’s speech and second language learning (Lee et al., 2014; Polyanskaya & Ordin, 2015), 2) speech technology (Barbosa & Bailly, 1994; Gibbon, 2023), 3) speech pathology (Leong & Goswami, 2014; Liss et al., 2009; White et al., 2010; Magne et al., 2016), and 4) forensic phonetics (Dellwo et al., 2012; Leemann et al., 2014; Asadi et al., 2018).
The distinction between speaker variability and language-dependent rhythmic characteristics is highly relevant, as it helps determine the sources and effects of rhythmic variation in speech (Fuchs, 2016). Nonetheless, drawing a clear line between the two is not easy, since they tend to interact and influence each other in complex ways (Mok & Dellwo, 2008). For example, some speaker-specific features may or may not be pronounced depending on the language or dialect spoken by the speaker (Leemann et al., 2014), while some language-specific features may be realized more strongly or weakly depending on the speaker’s individual style or preference (Asadi et al., 2018).
Consequently, to distinguish between speaker variability and language-dependent rhythmic characteristics, both acoustic measurements and perceptual judgments of speech rhythm are needed. Acoustic measurements provide objective and quantitative data on temporal features of speech such as duration, intensity, frequency, and variability (Gibbon, 2023).
Against this background, the present study set out to identify appropriate measures of between-speaker and between-sentence rhythmic variability in two speaking styles of Kalhori. Kalhori was selected because it has features that may affect its speech rhythm, such as its complex syllable structure, (C)(C)V(V)(C)(C), its stress pattern (penultimate or final), its vowel harmony system (front-back and round-unround), and its tonal accent system (Karimi-Doostan, 2002; Kreyenbroek, 2005; Thackston, 2006). Therefore, segmental intervals, consonant and vowel intervals, vocalic and consonantal intervals, voiced and unvoiced intervals, syllable intervals, and syllable peak intervals were examined in both spontaneous and read speech in Kalhori, a variety of Kurdish.
Kurdish is a cover term for a group of Northwestern Iranian languages spoken in parts of Turkey, Iran, Armenia, Iraq, Syria, and Azerbaijan (Windfuhr, 1989). There is no general agreement on the classification of Kurdish dialects, whether in Iran or in other countries. McCarus (2009), for instance, holds that Kurdish cannot be placed in a single group among Iranian languages because, according to Gharib (2011) and McCarus (1959), it shares syntactic and morphological similarities with Balouchi, Gilaki, Taleshi, and Farsi. Dabirmoghadam (2013), Daneshpazhouh (2010), Thackston (2006), and Kreyenbroek (2005) provide different classifications for Kurdish. It is, however, mainly divided into three groups: Northern Kurdish or “Kurmanji”, Central Kurdish or “Sorani”, and Southern Kurdish. According to Fattah (2000), Southern Kurdish is spoken by three million people across an extensive region covering Kermanshah, Ilam, and parts of Lorestan and Kurdistan Provinces in Iran, and Khanaqin and Mandali in Iraq. As Figure 1 illustrates, Southern Kurdish consists of several varieties, including Kermashani, Feyli, Laki, and Kalhori. The data for the current study come from Kalhori, the variety of one of the biggest tribes of Kermanshah and the second biggest tribe in Iran. Kalhori is spoken in Iran’s Kermanshah (in Eslamabad, Gilan-e-Gharb, and the southern part of Qasr-e-Shirin), Ilam (in Abdanan), and Kurdistan (in Bijar and Qorveh) provinces, and in Iraq (in Khanaqin, Kalar, Kofri, and Diyala). Figure 1 shows the revised map of the distribution of Southern Kurdish dialects (Belelli, 2019).
The current study addresses the following research questions using two speech styles, read and spontaneous speech:
- Q1: What is the typology of Kalhori rhythm based on the read corpus?
- Q2: How does sentence structure impact the rhythmic measures of Kalhori’s read speech?
- Q3: Which durational measures have a significant impact on the between-speaker rhythmic variability in read and spontaneous Kalhori speech?
Accordingly, the first question helped to document and describe the rhythmic typology of Kalhori, while the second allowed for the analysis of between-sentence variation by revealing how the rhythmic measures varied across different sentences within the same speakers and speech style. Lastly, the third question showed speakers’ consistency and/or flexibility in producing and maintaining their speech rhythm, and how they adapted it across different sentence structures, contents, and styles.
2. LITERATURE REVIEW
Approaches to speech rhythm can be roughly divided into three categories: durational, modulation, and prominence (He, 2022). Durational approaches measure the variability of different phonetic intervals in speech, especially vocalic and consonantal intervals (Ramus et al., 1999; Grabe & Low, 2002; White & Mattys, 2007; Dellwo, 2009, 2010; Arvaniti, 2012; Dellwo et al., 2015). Modulation approaches analyze the temporal envelope of speech and extract the recurring frequencies and phase relationships of different modulation rates, such as the syllable and stress rates (O’Dell & Nieminen, 1999; Barbosa, 2002; Tilsen & Johnson, 2008; Leong et al., 2014; Malisz et al., 2017; Lancia et al., 2019; Gibbon, 2023). Prominence approaches examine the intensity or spectral variability of speech and use it to identify the rhythmic skeleton or pattern of speech (Todd, 1985; Cummins & Port, 1998; Lee & Todd, 2004).
Since this study explores the durational aspects of Kalhori rhythm, a brief overview of the durational approach is provided. Durational approaches can be traced back to theories of isochrony in stress-timed and syllable-timed languages. The theory was first proposed by James (1929, 1938) and Pike (1945), who claimed that “stress-timed” languages, such as English, German, and Dutch, have equal/periodic feet, whereas “syllable-timed” languages, such as French, Italian, and Spanish, have equal/periodic syllables. Nonetheless, attempts to verify this claim showed that isochrony or quasi-isochrony of durational intervals was not observable in several languages (Pointon, 1980; Roach, 1982; Dauer, 1983; Bertrán, 1999). Later on, phoneticians proposed other measures of speech rhythm. Ramus et al. (1999) examined the standard deviation of consonantal and vocalic intervals (∆C and ∆V) as well as the percentage of vocalic intervals (%V) for each sentence to determine the rhythmic typology of different languages. Their data consisted of five 15- to 19-syllable sentences read by four native speakers in each of eight languages. The entire database contained 2,720 syllables, 340 per language. The results indicated that English is a stress-timed language while French is a syllable-timed language based on ∆V and %V.
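As an illustration of how these three measures can be obtained from the interval durations of one sentence, the following is a minimal sketch. The function name `ramus_measures`, the input lists, and the use of the population standard deviation are assumptions made for this example; they are not taken from Ramus et al. (1999) or from the script used in this study.

```python
from statistics import pstdev


def ramus_measures(vocalic, consonantal):
    """Return (%V, deltaV, deltaC) for one sentence.

    vocalic / consonantal: durations (in seconds) of the vocalic and
    consonantal intervals of the sentence.
    Assumption: the population standard deviation is used; some
    implementations use the sample standard deviation instead.
    """
    total_v = sum(vocalic)
    total_c = sum(consonantal)
    # %V: share of the sentence duration taken up by vocalic intervals
    percent_v = 100.0 * total_v / (total_v + total_c)
    # deltaV / deltaC: standard deviation of the interval durations
    delta_v = pstdev(vocalic)
    delta_c = pstdev(consonantal)
    return percent_v, delta_v, delta_c


# Hypothetical durations, for illustration only
percent_v, delta_v, delta_c = ramus_measures(
    [0.10, 0.20, 0.10, 0.20], [0.05, 0.15, 0.10]
)
print(round(percent_v, 2), round(delta_v, 4), round(delta_c, 4))
```

Because ∆V and ∆C are raw standard deviations, they grow with slower speech, which is one motivation for the rate-normalized measures discussed later in this section.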
To measure durational variability between successive vocalic and consonantal intervals, Grabe and Low (2002) introduced the pairwise variability index (nPVI-V and rPVI-C). They examined 16 languages; for each language, one native speaker read the original text or a translation of the story “North Wind and the Sun”. The English version contains 141 syllables. Assuming that each language version has about the same number of syllables as the English one, the total number of syllables examined was roughly 2,256 (16 × 141). According to the results, English rhythm patterns more closely with stress-timed languages, while French leans closer to syllable-timed languages.
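The raw and normalized pairwise variability indices can be sketched as follows, using the commonly cited definitions: the mean absolute difference between successive intervals (rPVI), and the same difference divided by the mean of each pair and multiplied by 100 (nPVI). The function names and the example durations are hypothetical.

```python
def _successive_pairs(durations):
    """Pairs of neighbouring intervals: (d1, d2), (d2, d3), ..."""
    if len(durations) < 2:
        raise ValueError("at least two intervals are needed")
    return list(zip(durations, durations[1:]))


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    pairs = _successive_pairs(durations)
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def npvi(durations):
    """Normalized PVI: each difference is divided by the pair mean,
    then the average is scaled by 100."""
    pairs = _successive_pairs(durations)
    total = sum(abs(a - b) / ((a + b) / 2.0) for a, b in pairs)
    return 100.0 * total / len(pairs)


# Hypothetical interval durations in seconds
vocalic = [0.10, 0.20, 0.10, 0.20]
consonantal = [0.05, 0.15, 0.10]
print(npvi(vocalic), rpvi(consonantal))
```

A perfectly regular sequence (all intervals equal) yields 0 on both indices; the more successive intervals differ, the higher the value.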
White and Mattys (2007) studied PVI, ∆C, ∆V, VarcoC, VarcoV, and %V in English and Dutch as representatives of stress-timed languages and in Spanish and French as representatives of syllable-timed languages. Their database included five speakers of each language, who read the text of a short story.
Dellwo (2009, 2010) proposed two further ways of normalizing rhythm measures for speech rate, the variation coefficient (Varco) and the natural logarithm transformation, using the Bonn Tempo corpus, which at the time of the research comprised 12 German speakers, 7 English speakers, and 7 French speakers.
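The effect of these two normalizations can be sketched in a few lines. The sketch assumes the usual definitions: Varco is the standard deviation divided by the mean and multiplied by 100, and the ln-based measure is the standard deviation of the natural logarithms of the durations. The function names, the population standard deviation, and the example durations are assumptions of this illustration, not details of Dellwo's implementation.

```python
import math
from statistics import mean, pstdev


def varco(durations):
    """Variation coefficient: standard deviation as a percentage of the mean."""
    return 100.0 * pstdev(durations) / mean(durations)


def delta_ln(durations):
    """Standard deviation of the ln-transformed durations."""
    return pstdev([math.log(d) for d in durations])


# Hypothetical consonantal intervals at a normal rate, and the same
# intervals uniformly compressed to half their duration (faster speech)
normal = [0.10, 0.20, 0.10, 0.20]
fast = [d * 0.5 for d in normal]

# The raw standard deviation halves, whereas both normalized measures
# stay the same under this uniform change of rate.
print(pstdev(normal), pstdev(fast))
print(varco(normal), varco(fast))
print(delta_ln(normal), delta_ln(fast))
```

Real changes in speech rate are rarely uniform across intervals, so in practice the normalized measures reduce rather than remove rate effects.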
Moreover, Arvaniti (2012) investigated the repetition of acoustic information at the level of syllables rather than segmental units by introducing the amplitude envelope measure of rhythm. She used three styles in her study: story reading, spontaneous speech, and sentence reading. Participants spoke one of six languages (Greek, English, German, Spanish, Korean, and Italian), with eight speakers per language. In the story reading section, the text of the story “North Wind and the Sun” was recited for about one to two minutes in the form of spontaneous speech style, and in the sentence reading section, five sentences were read by each speaker.
Subsequent research has shown that vocalic and consonantal rhythm measures can vary significantly within a language depending on the speaker’s performance (Wiget et al., 2010; Yoon, 2010; Loukina et al., 2011; Arvaniti, 2012; Leemann et al., 2014). However, Wiget et al. (2010) indicated that %V and VarcoV are more variable than nPVI among different English speakers. Dellwo and Fourcin (2013) proposed that speaker-specific information is also carried by the durations of voiced and unvoiced intervals in Swiss German. Dellwo et al. (2015), in turn, suggested that between-speaker variability of speech rhythm measures is robust across different within-speaker conditions. They considered %V, ∆V (ln), ∆C (ln), and ∆peak (ln) in relation to the speed of articulator movement and to linguistic structure, using data in which 12 German speakers read seven sentences at five speech rates: very slow, slow, normal, fast, and very fast.
Within-speaker and between-speaker differences in Persian at different speech rates were studied by Asadi et al. (2018), in which 10 Persian speakers read the story “The North Wind and the Sun” at five speech rates. The results showed that %V is a robust parameter for distinguishing between speakers. Taghva et al. (2021) studied a read text in Persian, in which ten Persian speakers read “The North Wind and the Sun”, and indicated that VarcoC and %V are robust measures of between-sentence differences.
To the best of the present researchers’ knowledge, no study has yet comprehensively investigated quantitative rhythmic measures for Kurdish and its varieties, which, as mentioned in Section 1, is spoken in parts of Turkey, Iran, Armenia, Iraq, Syria, and Azerbaijan (Windfuhr, 1989). To fill this gap, this study examined between-sentence and between-speaker rhythmic measures in two speaking styles (read and spontaneous speech) in Kalhori, a variety of Kurdish.
3. METHOD
Ten native speakers of the Kalhori variety, five males and five females, participated in this study. All were originally from the same region (Kermanshah, the largest Kurdish-speaking city in Iran [Borjian, 2017]). Ages ranged between 21 and 40, with a mean of 31.72 years and an SD of 8.81. To ensure that they belonged to the same social group, all participants were recruited from among Shiraz University students.
The experiment took place in Shiraz University’s acoustic room, where the researchers were able to use a Zoom H4 recorder. The recorder was mounted on a stand and positioned diagonally, around 20 cm from the participants’ mouths.
For the experiment, two corpora were compiled. Gibbon (2022) indicated that, depending on style, the degree of rhythmicity may vary from more rhythmical in the rhetoric of public speeches, poetry recitation, and reading aloud to more arrhythmical in planning discussions. Therefore, in the first corpus, the participants read an identical story so that the rhythmic typology of the Kalhori variety could be determined and between-sentence and between-speaker rhythmic variability in Kalhori read speech could be examined. Following previous studies (Pellegrino, 2019; Gibbon, 2022; Asadi et al., 2018), the potential effects of age, style, and speech rate on the rhythmic metrics were controlled by selecting participants of approximately the same age group (21-40) who read the same story at a normal speed. In the second corpus, participants were interviewed to observe between-speaker rhythmic variability in Kalhori spontaneous speech.
3.1. Experiment 1: Read corpus
To elicit precise instances of between-sentence and between-speaker differences, the first experiment aimed to provide identical conditions for all participants. To this end, we gave the participants the Kalhori version of the “North Wind and the Sun” story (written in Persian orthography) before beginning the interview and asked them to read it at a normal speed. This story was selected because the International Phonetic Association has recognized it as a standard for the phonetic documentation of many languages, and speech scientists have frequently used it to analyze both sound segments and prosody (Baird et al., 2022). The Kalhori version comprises seven complex sentences, giving a total of 70 tokens (10 speakers × 7 sentences). If a participant made a mistake while reading, they were asked to read the sentences again.
3.2. Experiment 2: Spontaneous corpus
To compile the spontaneous corpus, we interviewed the participants by asking them six questions whose content was unknown to them prior to the study. Then, 21 sentences were extracted from each participant’s speech. The selected sentences were grammatically well-formed and meaningful, and were produced without hesitation or pronunciation problems. The final data set for this experiment thus comprised 210 tokens (10 speakers × 21 sentences).
3.3. Data editing
The research corpora were analyzed in Praat version 6.1.41 using TextGrids with five manually annotated tiers and a sixth, automatically generated tier. In the first tier, the onset and offset of each segment were determined manually and transcribed in IPA by the first author (NT), then checked by the fourth author (RT), a native speaker of Kalhori Kurdish. In the second tier, the vowels and consonants were tagged. In the third tier, the vowel and consonant intervals were labeled according to the number of consonants and vowels they contained, and in the fourth tier, the vocalic and consonantal intervals were identified. In the fifth tier, the syllable boundaries were tagged manually. Finally, in the sixth tier, the peak of each syllable was identified automatically according to the sonority principle, using Dellwo’s script (https://www.cl.uzh.ch/de/people/team/phonetics/vdellw/software.html). An example of a TextGrid is presented in Figure 2.
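The step from tagged segments to vocalic and consonantal intervals amounts to merging runs of adjacent segments of the same class. The sketch below illustrates that idea only. The tuple format `(label, start, end)`, the labels `c` and `v`, and the function name are hypothetical, and the sketch assumes a pause-free, contiguous stretch of speech; it does not reproduce the Praat annotation or Dellwo’s script.

```python
from itertools import groupby


def merge_cv_intervals(segments):
    """Merge adjacent segments of the same class into C and V intervals.

    segments: list of (label, start, end) tuples in temporal order,
    where label is 'c' (consonant) or 'v' (vowel).
    Returns a list of (interval_label, start, end) tuples.
    """
    intervals = []
    for label, run in groupby(segments, key=lambda seg: seg[0]):
        run = list(run)
        # An interval spans from the start of the first segment in the
        # run to the end of the last one.
        intervals.append((label.upper(), run[0][1], run[-1][2]))
    return intervals


# Hypothetical segment tier for a short stretch of speech: c c v c v v
segments = [
    ("c", 0.00, 0.08), ("c", 0.08, 0.15), ("v", 0.15, 0.30),
    ("c", 0.30, 0.36), ("v", 0.36, 0.45), ("v", 0.45, 0.60),
]
for interval in merge_cv_intervals(segments):
    print(interval)
```

In this example the two initial consonants form one consonantal interval and the two final vowels form one vocalic interval, so six segments reduce to four intervals.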
3.4. The measures
Sixty-eight speech rhythm measures from previous studies were used in this research (Ramus et al., 1999; Grabe & Low, 2002; White & Mattys, 2007; Dellwo, 2009, 2010; Dellwo et al., 2012; Dellwo et al., 2015). All measures were calculated automatically with Dellwo’s script (https://www.cl.uzh.ch/de/people/team/phonetics/vdellw/software.html). The measures are listed by TextGrid tier in Table 1.
Tier | Tier’s name | Measures |
---|---|---|
1 | segment | rateSeg, meanSeg |
2 | cv segment | rateCon, meanCon, meanConLn, ∆Con, ∆ConLn, VarcoCon, rPVI_Con, nPVI_Con, rateVow, meanVow, meanVowLn, ∆Vow, ∆VowLn, VarcoVow, rPVI_Vow, nPVI_Vow |
4 | cv interval | rateC, meanC, meanCLn, ∆C, ∆CLn, VarcoC, rPVI_C, nPVI_C, rateV, meanV, meanVLn, ∆V, ∆VLn, VarcoV, rPVI_V, nPVI_V, %V, %VO, nVoiced, meanVoiced, meanVoicedLn, ∆Voiced, ∆VoicedLn, VarcoVoiced, rPVI_Voiced, nPVI_Voiced, nUnvoiced, meanUnvoiced, meanUnvoicedLn, ∆Unvoiced, ∆UnvoicedLn, VarcoUnvoiced, rPVI_Unvoiced, nPVI_Unvoiced |
5 | syllable | rateSyl, meanSyl, meanSylLn, ∆Syl, ∆SylLn, VarcoSyl, rPVI_Syl, nPVI_Syl |
6 | peak tier | meanPeak, ratePeak, meanPeakLn, ∆Peak, ∆PeakLn, VarcoPeak, rPVI_Peak, nPVI_Peak |
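In this part, one item from each type of measure is described:

$$\%V = 100 \times \frac{\sum_{i=1}^{n_v} v_i}{\sum_{i=1}^{n_v} v_i + \sum_{i=1}^{n_c} c_i} \qquad (1)$$

where $n_v$ is the number of vowel intervals, $n_c$ is the number of consonant intervals, $v_i$ is the duration of the $i$-th vowel interval, and $c_i$ is the duration of the $i$-th consonant interval.

$$rateSyl = \frac{N_{Syl}}{d} \qquad (2)$$

where $N_{Syl}$ is the number of syllable intervals in the sentence, and $d$ is the sentence duration without considering the pauses.

The rate-normalized standard deviation of the different intervals (the standard deviation divided by the mean, called varco) is given in Formula 3:

$$VarcoC = \frac{\Delta C}{meanC} \qquad (3)$$

where $\Delta C$ is the standard deviation of consonant intervals and $meanC$ is the mean duration of consonant intervals.

$$nPVI\_V = 100 \times \frac{1}{m-1} \sum_{k=1}^{m-1} \left| \frac{d_k - d_{k+1}}{(d_k + d_{k+1})/2} \right| \qquad (4)$$

where $m$ is the number of vowel intervals and $d_k$ is the duration of the $k$-th vowel interval.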
Where Invl is vowel, consonant or peak intervals and N is the number of these intervals.
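The definitions above can be illustrated with a short Python sketch. It is not Dellwo's Praat script; the function names are hypothetical, durations are assumed to be in seconds, and the population standard deviation is assumed for varco (the source does not say whether the sample or population form is used).

```python
from statistics import mean, pstdev


def percent_v(vowel_durs, consonant_durs):
    """%V: percentage of the total interval duration that is vocalic."""
    total = sum(vowel_durs) + sum(consonant_durs)
    return 100.0 * sum(vowel_durs) / total


def rate(n_intervals, duration_s):
    """Rate measure (e.g. rateSyl): intervals per second, pauses excluded."""
    return n_intervals / duration_s


def varco(durs):
    """Varco: standard deviation divided by the mean (population SD assumed)."""
    return pstdev(durs) / mean(durs)


def npvi(durs):
    """nPVI: mean normalized difference between successive intervals, times 100."""
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(durs, durs[1:])]
    return 100.0 * sum(terms) / len(terms)


# Toy durations (seconds); these are illustrative values, not data from the study.
vowels = [0.10, 0.20, 0.10]
consonants = [0.05, 0.15]
print(percent_v(vowels, consonants), rate(12, 3.0), varco(consonants), npvi(vowels))
```

Each function takes a plain list of interval durations, so the same `varco` and `npvi` helpers can be applied to consonantal, vocalic, syllable, or peak intervals.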
3.5. Data analysis
To calculate all the rhythm measures in Praat, the script written by Dellwo (https://www.cl.uzh.ch/de/people/team/phonetics/vdellw.html) was used. Then, Pearson correlation analysis was run to determine how the measures correlate with one another. Pearson correlation is a statistical method that measures the linear relationship between two continuous variables. It is useful for feature selection, the process of choosing the most relevant variables for analysis and reducing the dimensionality of the data (James et al., 2013). As shown in Table 1, we calculated 68 durational rhythmic measures, which was too large a number for effective analysis. We therefore applied Pearson correlation as a feature selection method to reduce the number of measures and retain the most relevant ones for speech rhythm analysis. Pearson correlation allowed us to examine the linear relationship between each pair of measures (He & Dellwo, 2016) and eliminate those that were highly correlated (r > 0.5) with others so that redundant information about speech rhythm would be avoided. Those measures that had low correlation (r < 0.5) were kept since they provided independent information about speech rhythm. Moreover, sentences and/or speakers were considered as the independent variables and the rhythmic measures as the dependent variables.
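The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say which member of a highly correlated pair was dropped, so the greedy first-come order used here is an assumption, as is the use of the absolute value of r. The names `pearson_r` and `select_low_correlation` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_low_correlation(measures, threshold=0.5):
    """Keep a measure only if |r| < threshold with every measure kept so far.

    `measures` maps a measure name to its per-token values. The greedy order
    (dictionary order) is an assumption, not the authors' documented procedure.
    """
    kept = []
    for name, values in measures.items():
        if all(abs(pearson_r(values, measures[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy values only: "b" is almost a linear copy of "a", "c" is uncorrelated with "a".
toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],
    "c": [2, 1, 3, 1, 2],
}
print(select_low_correlation(toy))
```

With 68 measures, the outcome of such a greedy filter depends on the order in which the measures are visited, so a real analysis would need to fix and report that order.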
Afterwards, since the data in the read corpus were balanced and orthogonal, a one-way ANOVA test was run to ascertain the between-sentence variability of the rhythmic measures in Kalhori. ANOVA was used to see how language and method affect the measures. It was also applied to sentence types to determine whether the means differ significantly, which helped the authors understand the data’s variability and patterns (see Arvaniti, 2012).
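As a minimal sketch of what the one-way ANOVA computes, the F-ratio can be written out directly: the mean square between groups divided by the mean square within groups. The function name and the toy groups below are hypothetical, and no p-value is computed here, since that requires the F distribution from a statistics package.

```python
def one_way_anova_f(groups):
    """F = (SS_between / df_between) / (SS_within / df_within)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy example: one rhythm measure for three "sentences", three tokens each.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(toy_groups))
```

In the study's design, each group would hold the values of one measure for one sentence across speakers.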
Furthermore, to explore the between-speaker variability of the rhythmic measures in Kalhori, a mixed-design ANOVA or a MANOVA was used. MANOVA is a statistical method that compares the means of multiple dependent variables across different groups and conditions, while accounting for both between-subjects and within-subjects factors (Stevens, 1996). To interpret the results of MANOVA, both the multivariate tests and the univariate tests were studied. Multivariate tests determine the significance of the overall effects of the factors on the combination of dependent variables, and univariate tests show the effects of the factors on each dependent variable (Stevens, 1996). In this study, MANOVA allowed simultaneous comparison of multiple dependent variables (rhythmic measures) across two independent variables, i.e., styles and speakers. It showed whether any rhythmic measures differed significantly between the two styles when considered together.
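The multivariate statistics reported for a MANOVA can be illustrated with a deliberately simplified sketch. It assumes a one-way design with a single factor and exactly two dependent variables, whereas the study crosses style and speaker over five measures. Wilks' Lambda is det(E) / det(H + E) and Pillai's Trace is trace(H (H + E)^-1), where H and E are the between-group and within-group sums-of-squares-and-cross-products matrices. All function names are hypothetical.

```python
def centroid(rows):
    """Mean vector of a list of (y1, y2) observations."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def add_outer(matrix, d, weight=1.0):
    """Add weight * (d d^T) to a 2x2 matrix in place."""
    for i in range(2):
        for j in range(2):
            matrix[i][j] += weight * d[i] * d[j]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def one_way_manova(groups):
    """Return (Wilks' Lambda, Pillai's Trace) for two dependent variables."""
    grand = centroid([r for g in groups for r in g])
    H = [[0.0, 0.0], [0.0, 0.0]]  # between-group SSCP
    E = [[0.0, 0.0], [0.0, 0.0]]  # within-group SSCP
    for g in groups:
        c = centroid(g)
        add_outer(H, (c[0] - grand[0], c[1] - grand[1]), weight=len(g))
        for r in g:
            add_outer(E, (r[0] - c[0], r[1] - c[1]))
    T = [[H[i][j] + E[i][j] for j in range(2)] for i in range(2)]
    det_t = det2(T)
    wilks = det2(E) / det_t
    t_inv = [
        [T[1][1] / det_t, -T[0][1] / det_t],
        [-T[1][0] / det_t, T[0][0] / det_t],
    ]
    pillai = sum(H[i][j] * t_inv[j][i] for i in range(2) for j in range(2))
    return wilks, pillai


# Toy data: two groups that differ only on the first dependent variable.
g1 = [(0, 0), (2, 0), (0, 2), (2, 2)]
g2 = [(4, 0), (6, 0), (4, 2), (6, 2)]
print(one_way_manova([g1, g2]))
```

A Wilks' Lambda near 0 and a Pillai's Trace near 1 both indicate that the group factor accounts for much of the multivariate variation. The F approximations and p-values reported by statistical software are not reproduced in this sketch.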
4. RESULTS
To identify the typology of Kalhori rhythmic features and determine the between-speaker and between-sentence rhythmic variabilities, we used the measures investigated in previous studies (Ramus et al., 1999; White & Mattys, 2007; Grabe & Low, 2002; Dellwo, 2009, 2010; Dellwo et al., 2012; Dellwo et al., 2015).
4.1. Read corpus analysis
The sum of interval durations considered in the read experiment is shown in Table 2.
Intervals | Sum |
---|---|
segmental intervals | 3880 |
syllable intervals | 1795 |
consonantal intervals | 1632 |
vocalic intervals | 1599 |
consonantal-vocalic intervals | 3439 |
consonant intervals | 1215 |
vowel intervals | 1581 |
peak intervals | 1658 |
voiced intervals | 1599 |
unvoiced intervals | 1632 |
4.1.1. The rhythmic typology of Kalhori
To determine the typology of rhythm in the Kalhori variety, ∆C, %V, and nPVI-V (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2010) were explored. The descriptive statistics are as follows (Table 3):
Measure | Mean | Std. | Skewness | Kurtosis |
---|---|---|---|---|
∆C | .056 | .01 | .59 | -.02 |
%V | 42.28 | 5.61 | .42 | -.01 |
nPVI_V | 47.36 | 9.02 | .08 | -.541 |
The comparison of the results of Table 3 with Ramus et al. (1999) and Grabe and Low (2002) shows that the mean value of ∆C is 0.056, which is relatively low compared to some stress-timed languages like English (0.07). The mean value of %V is 42.28, which is relatively high compared to some stress-timed languages like English (38.5), and the mean value of nPVI_V is 47.36, which is also relatively low compared to some stress-timed languages like English (52.1). Table 4 presents the standard deviations of %V and ∆C and the mean nPVI-V of Kalhori Kurdish in comparison with English as a stress-timed language and French as a syllable-timed language, derived from Ramus et al. (1999) and Grabe and Low (2002).
Language | %V (std) | ∆C (std) | nPVI-V (mean) |
---|---|---|---|
English | 5.4 | 1.63 | 54 |
French | 4.5 | 0.74 | 43.05 |
Kalhori Kurdish | 5.61 | 0.01 | 47.36 |
4.1.2. Between-sentence measures in read corpus
To answer the second question of this study and understand the impact of sentence structure on the rhythmic measures of read Kalhori speech, Pearson correlation analysis was first run to keep the measures with low correlation (r < 0.5). The results showed that rateSyl, ∆SylLn, VarcoC, nPVI-V, and %V are the least correlated measures in the read corpus. As mentioned in Section 3.4, rateSyl measures the overall speech rate, ∆SylLn shows how the syllable lengths vary within an utterance, VarcoC reveals how consonantal intervals vary with regard to their average length, nPVI-V indicates how similar or different the vowel durations are from each other, and %V tells us how much of the utterance is occupied by vowels. The results of the Pearson correlation analysis of these five measures are presented in Table 5.
Measure | rateSyl | ∆SylLn | VarcoC | nPVI_V | %V |
---|---|---|---|---|---|
rateSyl | 1 | -.01 | -.02 | .27* | .39** |
∆SylLn | -.01 | 1 | .12 | .04 | -.07 |
VarcoC | -.02 | .12 | 1 | -.01 | .06 |
nPVI_V | .27* | .04 | -.01 | 1 | .33** |
%V | .39** | -.07 | .06 | .33** | 1 |
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
Table 5 indicates that the selected measures (rateSyl, ∆SylLn, VarcoC, nPVI_V, and %V) are less related to each other compared to the rest of the measures because of their low correlation coefficients (r < 0.5) which suggests that they capture different aspects of speech rhythm and do not provide redundant information.
Afterwards, a one-way ANOVA test was run on the measures selected by the Pearson correlation analysis. We considered the sentences of the read corpus as the independent variable and the measures as the dependent variables (Table 6).
Measures | Sum of squares | F | Sig. |
---|---|---|---|
rateSyl | 2.56 | 1.29 | .27 |
∆SylLn | .04 | 1.93 | .08 |
VarcoC | .13 | 2.40 | .03 |
nPVI-V | 978.92 | 2.21 | .05 |
%V | 739.24 | 5.41 | .00 |
The results of the one-way ANOVA test (Table 6) indicate that VarcoC and %V are statistically significant. Although VarcoC is significant, its significance level is 0.03, which is very close to 0.05, the usual threshold for rejecting the null hypothesis. This means that VarcoC is only marginally significant. Moreover, the F-value of VarcoC is 2.40, which is much lower than the F-value of %V, which is 5.41. The F-value is the ratio of the variance between groups to the variance within groups for each measure. Therefore, the higher the F-value, the greater the between-sentence variability of the measure. This means that VarcoC has a smaller ratio of between-group to within-group variance than %V, and it explains less of the total variation in the data than %V. Therefore, VarcoC is not as effective as %V in discriminating between sentences based on their speech rhythm. So, comparing the significant measures, %V (F-value = 5.41) is the most efficient measure to reflect the Kalhori between-sentence variability based on this study’s data. Figure 3 shows the %V and VarcoC changes for the sentences of the study.
Based on the boxplots in Figure 3, the sentences can be compared in terms of their %V values. For example, sentence 5 has the lowest median %V value, suggesting that this sentence on average has fewer vowels than the other sentences. It also has the lowest variability in %V values, which means this sentence has less variation in vowel density compared to other sentences. Sentence 6 has the highest median %V value, which means that this sentence on average has more vowels than the other sentences. It also has the highest variability in %V values, which means that this sentence has more variation in vowel density than the other sentences.
Moreover, VarcoC comparison between the sentences indicates that sentence 2 has the lowest median VarcoC value, meaning this sentence on average and compared to others has less variability in consonant length. It also has the lowest variability in VarcoC values, which means that this sentence has more consistent consonant length than the other sentences. Sentence 7 on average and compared to others has the highest median VarcoC value, indicating more variability in consonant length. It also has the highest variability in VarcoC values, which means that this sentence has more variation in consonant length than the other sentences.
4.2. Spontaneous corpus analysis
We investigated 210 tokens of spontaneous Kalhori sentences (10 speakers × 21 sentences) in the second experiment. The sum of interval durations considered in this experiment is shown in Table 7.
Intervals | Sum |
---|---|
segmental intervals | 4999 |
syllable intervals | 2364 |
consonantal intervals | 2109 |
vocalic intervals | 2000 |
consonantal-vocalic intervals | 4479 |
consonant intervals | 1641 |
vowel intervals | 1974 |
peak intervals | 2208 |
voiced intervals | 2000 |
unvoiced intervals | 2109 |
According to the results for the Pearson correlation analysis, measures including rateSyl, ∆SylLn, VarcoC, nPVI-V and %V had low correlation. Table 8 shows the results of these five measures’ Pearson correlation analysis.
Table 8 indicates that the selected measures (rateSyl, ∆SylLn, VarcoC, nPVI_V, and %V), which had low correlation coefficients (r < 0.5), are less related to each other compared to the rest of the measures. In other words, they capture different aspects of speech rhythm, and do not provide redundant information.
Measure | rateSyl | ∆SylLn | VarcoC | nPVI_V | %V |
---|---|---|---|---|---|
rateSyl | 1 | .07 | .05 | -.10 | .35** |
∆SylLn | .07 | 1 | .20** | .14* | .00 |
VarcoC | .05 | .20** | 1 | .02 | .09 |
nPVI_V | -.10 | .14* | .02 | 1 | .09 |
%V | .35** | .00 | .09 | .09 | 1 |
** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
4.3. Between-speaker measures in read and spontaneous corpus
To answer the third question, regarding which durational measures have a significant impact on the between-speaker rhythmic variability in read and spontaneous Kalhori speech, a MANOVA test was run on the measures retained by the Pearson correlation analysis in sections 4.1.2 and 4.2.2 for both corpora. In this study, style and speaker were used as the independent variables and the rhythmic measures as the dependent variables. Table 9 presents the Multivariate Tests and Table 10 shows the Tests of Between-Subjects Effects (univariate tests).
Effect | Test | Value | F | Sig. |
---|---|---|---|---|
Intercept | Pillai’s Trace | .98 | 3993.95 | .00 |
Intercept | Wilks’ Lambda | .01 | 3993.95 | .00 |
Style | Pillai’s Trace | .34 | 27.31 | .00 |
Style | Wilks’ Lambda | .65 | 27.31 | .00 |
speakers | Pillai’s Trace | .55 | 3.63 | .00 |
speakers | Wilks’ Lambda | .53 | 3.86 | .00 |
Style * speakers | Pillai’s Trace | .19 | 1.14 | .23 |
Style * speakers | Wilks’ Lambda | .82 | 1.15 | .22 |
The MANOVA results (Table 9) for the multivariate tests demonstrate that both “Styles” and “Speakers” have significant and individual impacts on the variations in rhythmic measures. While the interaction between styles and speakers may not be statistically significant, the main effects of styles and speakers are indeed significant and contribute to the observed variability in the dataset.
The results of the Tests of Between-Subjects Effects (Table 10) for the dependent variables (rateSyl, ∆SylLn, VarcoC, nPVI_V, %V) under the Intercept effect all show significant p-values (p < .001). This suggests that these features are highly effective in distinguishing between the styles and speakers.
Source | Dependent Variable | F | Sig. |
---|---|---|---|
Intercept | rateSyl | 7532.33 | .00 |
Intercept | ∆SylLn | 2853.23 | .00 |
Intercept | VarcoC | 3837.55 | .00 |
Intercept | nPVI_V | 2638.98 | .00 |
Intercept | %V | 9540.12 | .00 |
Styles | rateSyl | 76.74 | .00 |
Styles | ∆SylLn | 23.16 | .00 |
Styles | VarcoC | .06 | .80 |
Styles | nPVI_V | .73 | .39 |
Styles | %V | 5.44 | .02 |
Speakers | rateSyl | 7.41 | .00 |
Speakers | ∆SylLn | .78 | .63 |
Speakers | VarcoC | .41 | .92 |
Speakers | nPVI_V | 2.43 | .01 |
Speakers | %V | 8.13 | .00 |
Styles * speakers | rateSyl | 1.92 | .04 |
Styles * speakers | ∆SylLn | .42 | .92 |
Styles * speakers | VarcoC | 1.16 | .32 |
Styles * speakers | nPVI_V | .92 | .50 |
Styles * speakers | %V | .90 | .52 |
For the “Styles” effect (Table 10), some features have significant p-values (rateSyl, ∆SylLn, %V), indicating their level of importance in distinguishing between the two speaking styles (read and spontaneous). However, VarcoC (p = .80) and nPVI-V (p = .39) do not indicate a significant effect, suggesting that they might not be as effective in differentiating styles.
Under the “Speakers” effect, the dependent variables rateSyl, nPVI_V, and %V show significant p-values (p < .001), suggesting that they are useful in distinguishing between individual speakers. On the other hand, ∆SylLn and VarcoC have higher p-values (∆SylLn: p = .11, VarcoC: p = .67), indicating that they might be less effective in differentiating individual speakers. This also means that these measures do not vary much across speakers in either read or spontaneous speech. However, the interaction between style and speakers is not statistically significant.
Based on these results, it can be concluded that %V and rateSyl are the rhythmic measures that best discriminate speakers, followed by nPVI-V. These two rhythmic measures (%V and rateSyl) show significant speaker effects (p < .001) and have relatively large F-values compared to the other measures, which means that they vary significantly across speakers in both read and spontaneous speech. Table 11 and Figure 4 show the %V and rateSyl changes for the participants of the study.
Speaker | rateSyl (Read) | rateSyl (Spo) | rateSyl (Mean) | %V (Read) | %V (Spo) | %V (Mean) |
---|---|---|---|---|---|---|
1 | 4.77 | 5.32 | 5.18 | 45.23 | 41.39 | 42.35 |
2 | 4.25 | 5.51 | 5.20 | 51.24 | 46.77 | 47.89 |
3 | 4.09 | 5.82 | 5.39 | 41.71 | 39.61 | 40.14 |
4 | 3.94 | 4.72 | 4.52 | 36.47 | 34.47 | 34.97 |
5 | 3.36 | 4.41 | 4.15 | 39.71 | 42.33 | 41.68 |
6 | 4.29 | 5.42 | 5.14 | 38.52 | 33.53 | 34.78 |
7 | 4.91 | 5.19 | 5.12 | 43.62 | 42.90 | 43.08 |
8 | 4.41 | 4.77 | 4.68 | 42.85 | 38.23 | 39.39 |
9 | 5.08 | 6.14 | 5.87 | 41.90 | 42.15 | 42.09 |
10 | 4.44 | 6.03 | 5.63 | 41.56 | 41.69 | 41.66 |
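The Mean values in Table 11 are closer to the spontaneous values than to the read values, which is what a token-weighted average would produce. The sketch below checks this under an explicit assumption: 21 spontaneous tokens per speaker (as stated for the spontaneous corpus) and 7 read tokens per speaker (an assumption, not stated in this section). The function name is hypothetical.

```python
def pooled_mean(read_mean, spo_mean, n_read=7, n_spo=21):
    """Token-weighted mean of two style means.

    n_spo=21 follows the spontaneous corpus description; n_read=7 is an
    assumption made only for this illustration.
    """
    return (n_read * read_mean + n_spo * spo_mean) / (n_read + n_spo)


# Speaker 1 from Table 11: rateSyl (4.77, 5.32) and %V (45.23, 41.39).
print(round(pooled_mean(4.77, 5.32), 2), round(pooled_mean(45.23, 41.39), 2))
```

Under these assumed token counts, the weighted means for speaker 1 agree with the tabulated Mean values to two decimals, so the Mean columns should be read as pooled over tokens rather than as the simple average of the two style means.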
The comparison of rateSyl in both corpora, presented in Table 11, indicates that Speaker 9 exhibits the highest mean rateSyl value in both styles, implying the fastest speech rate on average compared to other speakers, while Speaker 5 displays the lowest mean rateSyl value in both styles, suggesting the slowest speech rate on average in comparison with other speakers. Speakers 2, 3, 6, and 8 have similar mean rateSyl values in both modes, indicating relatively consistent speech rates between their read and spontaneous speech. However, Speakers 1, 4, 7, and 10 have intermediate mean rateSyl values in both styles, which suggests moderate speech rates compared to the other speakers.
The mean %V value varies among the speakers in both read and spontaneous speech styles. Speaker 2 exhibits the highest mean %V value in both styles while speaker 4 displays the lowest mean %V. Speakers 1, 5, 7, 8, 9, and 10 have intermediate mean %V values in both styles. However, Speaker 6 shows a notable difference in mean %V value between read and spontaneous speech modes, with a lower value in spontaneous speech compared to read speech. The comparison of %V in both corpora is shown in Table 11 and Figure 4.
The boxplots for rateSyl (Figure 4) show how the 10 speakers differ in their speech rate in Kalhori speech. According to the plots, the range of rateSyl values for speaker 9 is from 4 to 7.2, meaning that this speaker sometimes speaks as slow as 4 syllables per second and sometimes as fast as 7.2 syllables per second. This is while other speakers’ ranges were from 2.8 to 6.8. Speaker 2 also produces the lowest variability in rateSyl values, as indicated by the width and shape of the box and whiskers. The range of rateSyl values for speaker 2 is from 3 to 4.6, which means that this speaker does not change their speech rate as much and speaks consistently around 3 to 4 syllables per second. This is a narrower range compared to others which range from 3 to 7.2. The other speakers have median rateSyl values ranging from 35 to 40, and variabilities ranging from low to high. Therefore, rateSyl varies significantly between these 10 speakers, and it signifies different levels of variability, different medians, and different ranges across speakers.
The boxplots for %V (Figure 4) show how the 10 speakers differ in their vocalic intervals in both spontaneous and read speech. Accordingly, speaker 1 has the highest median %V value, meaning that this speaker on average has more vowels in speech than the other speakers. This speaker also has the highest variability in %V values, which means that this speaker, compared to others, produces more variation in vocalic intervals. Speaker 3 has the lowest median %V value and also the lowest variability in %V values. The other speakers’ median %V values range from 35% to 40%, and their variabilities range from low to high. Some speakers also have outliers, extreme values that deviate from the rest of the data. These outliers indicate that some speakers in some cases produce very low or very high %V values. Therefore, %V varies significantly between these 10 speakers, as it shows different levels of variability, different medians, and different ranges across speakers.
5. DISCUSSION AND CONCLUSION
Documenting
and describing languages, whether they are endangered or widely spoken,
has many purposes, from conserving the inherited knowledge of the
language community to exploring the range of structures and
communication events the human mind can handle (Gibbon, 2022Gibbon, D. (2022). Speech rhythms: learning to discriminate speech styles. Proc. Speech Prosody 2022, 302-306.
).
One aspect of this range is how language relates to other modes of
communication, and one feature of this aspect is the specific rhythm
patterns of speech that distinguish a language community, along with
other regular events in daily life and culture (Gibbon, 2022Gibbon,
D. (2022). New Perspectives on Ibibio Speech Rhythm. In Current Issues
in Descriptive Linguistics and Digital Humanities: A Festschrift in Honor of Professor Eno-Abasi Essien Urua (pp. 457-486). Singapore: Springer Nature Singapore. https://doi.org/10.1007/978-981-19-2932-8_34
).
To respond to the first research question (i.e., to study the rhythmic typology of Kalhori based on the read corpus), ∆C, %V, and nPVI-V were analyzed. By calculating ∆C and %V, Ramus et al. (1999) showed that English is a stress-timed language and French is a syllable-timed language: stress-timed languages demonstrated a high ∆C, reflecting high C-interval variability, and a low %V, reflecting high V-interval variability, whereas syllable-timed languages showed a low ∆C and a high %V. The nPVI studied by Grabe and Low (2002) likewise classified English as a stress-timed language and French as a syllable-timed language, since the variability of consecutive vocalic intervals was higher in stress-timed languages than in syllable-timed languages.
Findings of the descriptive analysis of the read corpus (Table 3) demonstrate that the Kalhori nPVI-V is 47.36, the std of %V is 5.61, and the std of ∆C is 0.016. Table 4 compares the ∆C and %V of French and English (derived from Ramus et al., 1999) and their nPVI-V (derived from Grabe & Low, 2002) with the findings of this study. These findings are comparable to the outcome of this study since both Ramus et al. (1999) and Grabe and Low (2002) used the story of “The North Wind and the Sun” to collect their data.
As a lower value of %V shows more variability of vowel intervals and a lower value of ∆C reflects less variability of consonant intervals (Dellwo, 2010), Table 4 shows that Kalhori Kurdish has less variability of vowel intervals and less variability of consonant intervals than English and French. Moreover, nPVI-V reflects the variability of successive vocalic intervals. Drawing on Grabe and Low (2002), Kalhori read speech is placed among the stress-timed languages since Table 4 shows that the variability of vowel intervals in Kalhori Kurdish is higher than in French but lower than in English. Consequently, the rhythm class of Kalhori Kurdish can be placed between stress-timed and syllable-timed based on the read corpus, which was collected in a controlled situation in which participants of the same age group read a story at a normal speed.
Furthermore, conducting the first experiment on the read corpus allowed us to investigate the impact of sentence structure on the rhythmic measures of read Kalhori speech. Five measures (rateSyl, ∆SylLn, VarcoC, nPVI-V, and %V) were selected based on Pearson correlation analysis. The results indicate that only two of these measures (VarcoC and %V) are significantly different between sentences. While VarcoC is a measure of consonantal variability and reflects the degree of variation in the duration of consonantal intervals, %V is a measure of vocalic proportion and shows the percentage of vowel duration in the total duration of the utterance. These two measures are related to the syllable structure and the vowel-consonant ratio of the sentences (Dellwo, 2010). According to the results (Table 6), VarcoC is only marginally significant and shows a low F-value, accounting for a small part of the total variation in the data.
On the other hand, %V is highly significant, meaning that the difference between sentences is due to sentence structure rather than random variation. Moreover, %V has a high F-value (5.41), which is indicative of a large part of the total variation in the data. The results suggest that, based on the data, %V is the best measure of Kalhori between-sentence variability. In other words, sentences with different structures have different proportions of vowel duration in their total duration. This may be related to the phonological and morphological features of Kalhori, such as vowel harmony, vowel lengthening, and consonant clusters. Hence, the outcome of this study is aligned with the results of Taghva et al. (2021), who showed that VarcoC and %V are robust measures of between-sentence differences in Persian.
To respond to the research question probing the most efficient durational rhythmic measures for between-speaker rhythmic variability in Kalhori speech, both the read and the spontaneous speech styles were examined using five rhythmic measures selected by Pearson correlation analysis: rateSyl, ∆SylLn, VarcoC, nPVI-V, and %V. A MANOVA (Tables 9 and 10) was conducted to examine which rhythmic measure or measures best discriminated between speakers. The results revealed that:
- rateSyl, %V, and nPVI-V differed significantly between both speech styles and speakers. However, the F-value of nPVI-V (4.37) is lower than those of rateSyl (11.036) and %V (11.121).
- ∆SylLn and VarcoC did not show significant differences between speakers.
Therefore, based on this analysis, the rhythmic measures that best discriminated between Kalhori speakers in both read and spontaneous speech styles were %V and rateSyl. In other words, the rate of syllable intervals and the vocalic proportion of speech are the most useful features for identifying speakers on the basis of durational rhythmic measures. These findings are in line with those of Asadi et al. (2018) and Dellwo et al. (2015) for Persian and German. Asadi et al. (2018) demonstrated the robustness of %V against two sources of within-speaker variability: time lapse and speech-rate variability.
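To make the logic behind the reported F-values more concrete, the sketch below computes rateSyl for an utterance and a univariate one-way F ratio (between-speaker mean square divided by within-speaker mean square) for a single measure. This is a simplified illustration only: the study itself used a MANOVA, and the function names and per-utterance %V values here are hypothetical.

```python
from statistics import mean


def rate_syl(n_syllables, duration_s):
    """rateSyl: number of syllables per second of speech."""
    return n_syllables / duration_s


def one_way_f(groups):
    """One-way ANOVA F ratio for one measure, with one list of values per speaker.

    Simplification: univariate F, not the multivariate test used in the study.
    """
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical %V values (one per utterance) for three speakers.
speakers = [
    [41.0, 42.5, 40.8],
    [45.2, 46.0, 44.9],
    [49.1, 48.3, 50.0],
]
print(round(rate_syl(12, 3.0), 2), round(one_way_f(speakers), 2))
```

A larger F ratio means that speakers differ from one another more than utterances of the same speaker differ among themselves, which is why measures with high F-values are the more promising speaker discriminators.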
In conclusion, the use of durational measures as a forensic cue may have important implications for situations where speaker identification information is required (Arvaniti, 2012; Leemann et al., 2014; Dellwo et al., 2015; He & Dellwo, 2016; Asadi et al., 2018). The findings of this study therefore hold great potential for enhancing speaker identification in diverse forensic cases. In particular, the identification of %V and rateSyl as the most distinguishing measures between speakers suggests their potential as valuable acoustic-prosodic features for forensic voice comparison tasks.
However, comparing the most discriminative measures for between-sentence variability (VarcoC and %V) with those for between-speaker variability (rateSyl and %V) reveals that %V is influenced by both language-specific and speaker-specific factors, which may affect its variability between sentences and speakers. Hence, while rhythmic measures such as %V hold promise as effective discriminators between speakers, their performance can be influenced by factors other than the voice alone, including linguistic peculiarities. Forensic practitioners must therefore exercise caution in adapting and validating speaker identification models to account for the specific linguistic and contextual characteristics of the language being investigated.
This study thus sheds light on the complex interplay between language-specific factors and speaker identification, highlighting the need for a nuanced and comprehensive approach to ensure the accuracy and reliability of forensic voice analysis techniques. Analyzing other varieties of Kurdish would also be a fruitful direction for future research.