1. INTRODUCTION
The way an individual speaks is highly idiosyncratic, as it is largely determined by his/her anatomy, sex, age, language background, social status and health conditions (Dellwo et al., 2007). During social interactions, however, the way individuals sound is also influenced by the characteristics of the interlocutor (e.g., age, dialect, social status), the formality of the communicative setting (i.e., formal vs informal) and the quality of background conditions (i.e., noisy vs quiet) (Giles & Ogay, 2007). When addressing infants rather than adults, for example, we typically speak more slowly, use longer pauses, exaggerate pitch variations and hyper-articulate vowels (see a.o. Fernald et al., 1989; Soderstrom, 2007). Most of these acoustic characteristics serve to gain an infant’s attention and to facilitate language acquisition. They are also present when talking to elderly people (Kemper, 1994) and, to some extent, to second language speakers (Ferguson, 1975), as well as when the interaction takes place in a noisy environment (Hazan & Baker, 2011), with the aim of fostering comprehension.
In addition to adjusting their speech to interlocutors’ characteristics and to communicative and background conditions, interlocutors tend to adjust their verbal and non-verbal behaviour during and after exposure to a communication partner or a model talker. This phenomenon is known as accommodation (Giles & Ogay, 2007), alignment (Pickering & Garrod, 2006), entrainment (Brennan, 1996), synchrony (Edlund et al., 2009), mimicry (Pentland, 2008) and chameleon effect (Chartrand & Bargh, 1999). Evidence of mutual adjustments between speakers has been found in conversation and shadowing tasks (see a.o. Pardo et al., 2018), and these adjustments encompass many linguistic, para- and extralinguistic features. Indeed, accommodation has been found at the level of lexical choices (e.g., Bell, 2001; Ward & Litman, 2007), grammatical and syntactic structures (Branigan et al., 2000; Reitter et al., 2006), pronunciation (e.g., vowel quality, voice onset time, rate, f0, intensity) (see a.o. Babel, 2012; Nielsen, 2011; Sancier & Fowler, 1997; Zellou, 2016; Levitan & Hirschberg, 2011; Manson et al., 2013; Ross et al., 2021), facial expressions (Lakin, 2013) and body movements (Dijksterhuis & Bargh, 2001). Accommodation is not exclusive to human-human interactions: evidence of this phenomenon has also been found in human-computer interactions (see a.o. Bell et al., 2003; Raveh et al., 2019; Gessinger et al., 2021) and in animal communication (see Ruch et al., 2017, for a review).
For the domain of human-human communication, two major theoretical models have been proposed to account for interspeaker adjustments. The first is the social approach of Communication Accommodation Theory (CAT) (e.g., Giles et al., 1991; Shepard et al., 2001). The second is the automatic account of the Interactive Alignment Model (IAM) proposed by Pickering & Garrod (2004). The former postulates that speakers express social closeness to or distance from their interlocutors by becoming acoustically more similar (convergence) or more dissimilar (divergence), respectively (Soliz & Giles, 2016). The latter, instead, assumes that convergence in conversation is regulated by a priming mechanism based on the automatic link between perception and production. Evidence in support of CAT comes from studies showing that social factors affect the amount and direction of accommodation (e.g., Babel et al., 2013; 2014; Schweitzer & Lewandowski, 2014; Michalsky & Schoormann, 2017; Gregory & Webster, 1996). These factors include speakers’ perceived friendliness, dominance and attractiveness, as well as attitudes or stereotypes towards a specific language variety. IAM, instead, is supported by a line of studies documenting convergence in non-interactive settings such as shadowing tasks. In these settings, participants are either not instructed to imitate the model talker or are explicitly asked to avoid imitation (e.g., Goldinger, 1998; Shockley et al., 2004; Walker & Campbell-Kibler, 2015; cf. Dufour & Nguyen, 2013, for a comparison between imitation and shadowing tasks).
Studies on phonetic convergence, however, have pointed out that factors other than social ones also influence speakers’ accommodation behaviour. Individuals have been observed to vary greatly in the amount and direction of convergence depending on several such factors:
-
the frequency characteristics of the lexical items (Goldinger, 1998; Goldinger & Azuma, 2004; Nielsen, 2011);
-
previous exposure to lexical items (Goldinger, 1998; Goldinger & Azuma, 2004);
-
the cognitive load involved in a task (Abel & Babel, 2017);
-
the phonetic distance between interlocutors’ language repertoires (Babel, 2012; Walker & Campbell-Kibler, 2015; Walters et al., 2013).
Neither IAM nor CAT accounts for the effect of these linguistic and phonetic factors. A more dominant view, the so-called hybrid approach, reconciles the social and the automatic perspectives and integrates the effect of linguistic-phonetic factors on accommodation (Babel, 2012; Pardo, 2012; Pardo et al., 2017). In this view, social, linguistic and phonetic factors act as catalysts or inhibitors of convergence, in that they can boost or diminish the strength of the link between perception and production.
The aim of the present paper is to advance the understanding of the forms of, and the factors evoking, convergence. To this end, it shifts attention from the typical acoustic correlates of phonetic convergence (i.e., vowel quality, rate, pitch, intensity, voice onset time) to speech rhythm, conceptualized here as the variability of segmental durational characteristics. Rhythmic convergence is studied using a pre-existing dataset designed to study cross-dialectal vowel convergence (cf. 2.1). This will ultimately make it possible to compare the accommodation behaviour of the same speakers across different measures. It will also make it possible to test which type of factor drives convergence or divergence: linguistic (cross-dialectal phonetic distance) or social (dialect markedness).
1.1. Rhythmic Accommodation
Three basic questions may arise when studying rhythmic accommodation: (a) Can speech rhythm, in terms of segmental timing properties, be the object of adjustments between speakers? (b) In which communicative contexts is it possible to study rhythmic accommodation? (c) Why is research on accommodation in segmental timing a worthwhile pursuit?
With respect to (a), speech rhythm research has shown that the durational characteristics of consonantal and vocalic intervals, as well as amplitude envelope characteristics, vary in response to the interlocutor’s age and cognitive development. For example, studies comparing the rhythmic characteristics of infant- and adult-directed speech have shown that:
a) English, Catalan and Spanish mothers show less durational variability of consonantal and vocalic intervals, as well as longer vowel durations, when speaking to their children than when addressing adults (Payne et al., 2009);
b) in Australian English, delta modulations corresponding to prosodic stress were greater in infant- than in adult-directed speech, while theta modulations, tracking syllable patterns, dominated the adult-directed speech modulation spectrum (Leong et al., 2017).
Speech rhythm varies not only with the interlocutor’s characteristics. The very presence of an interlocutor (i.e., a reading partner) has also been shown to influence the degree of rhythm entrainment in synchronous reading tasks (Cerda-Oñate et al., 2021). In light of these findings, it seems plausible to assume that speakers can also mutually adapt the production of segmental timing features after exposure to a dialogue partner. On the other hand, the timing properties of different speech intervals (e.g., consonants, vowels, voicing) have been shown to resist different sources of within-speaker variability, namely speaking style, prosodic and linguistic factors (Dellwo et al., 2015; Leemann et al., 2014). In view of this evidence, we cannot exclude that speakers maintain their segmental durational characteristics in post-dialogue productions. The present study tests precisely these two competing hypotheses.
With respect to (b), one context in which rhythmic accommodation can be studied is that of dialects in contact. In this setting, one might examine whether speakers of dialects that are mutually intelligible but have distinct rhythmic features converge rhythmically after being exposed to each other’s dialect. The linguistic situation of German-speaking Switzerland is an excellent testing ground for studying such cross-dialectal rhythmic accommodation. Swiss German dialects differ not only in segmental features, speech rate and intonation contours (see Leemann, 2012, for a review), but also in their rhythmic properties. Midland vs Alpine dialects, as well as Eastern vs Western dialects, have been shown to group according to their rhythmic characteristics, measured acoustically as the timing variability of consonantal and vocalic intervals (Leemann et al., 2012).
With respect to (c), it has been argued that assessments of phonetic convergence based on a single (supra)segmental feature hardly capture the complexity of the phenomenon (Pardo et al., 2017). Nevertheless, focusing on one acoustic attribute rather than another remains a valid approach in two cases. The first is when the aim is to understand the dynamics of sound variation and change (Pardo et al., 2017). The second is when decisions must be made about which aspects of human-human interaction should be modelled in interactive speech systems to achieve human-likeness (Beňuš, 2014). Understanding whether rhythmic properties, in terms of segmental durational characteristics, are the object of mutual adaptation can also be crucial for interpreting evidence in forensic phonetic speaker comparisons. Acoustic adjustments between interlocutors might lead analysts to mistake within-speaker for between-speaker variability, and thus increase the error rate in speaker recognition.
2. THE STUDY
2.1. Material
To study rhythmic accommodation in a dialect contact situation, we used a corpus of speech material in Zurich and Grison German (henceforth ZHG and GRG). These two Swiss German dialects exhibit marked segmental and suprasegmental differences (cf. 2.2), which justify the expectation of interspeaker adjustments after exposure to the interlocutor’s dialect.
The corpus was designed, collected and annotated by Hanna Ruch to study vowel accommodation between GRG and ZHG (Ruch, 2015). It includes speech samples from:
-
2 audio-recorded diapix tasks (i.e., speakers comparing pictures that contain a certain number of differences, cf. Van Engen et al., 2010), performed by 18 pairs of previously unacquainted female GRG and ZHG speakers;
-
18 pre- and 18 post-dialogue recordings (a picture naming task and the retelling of a story based on a comic), performed individually by GRG and ZHG participants.
The diapix tasks were designed to elicit the same target words that appear in the picture naming and story retelling tasks. All tasks were carried out in a single recording session.
2.2. Cross-dialectal phonetic differences
Grison and Zurich German present noticeable differences at several linguistic levels (Eckhardt, 1991; Fleischer & Schmid, 2006; Christen et al., 2010; Leemann, 2012). Phonetically, these differences concern the quality of front vowels, the realization of word-initial and post-vocalic k, speech rate and intonation contours. Of particular interest for this study is that GRG and ZHG also exhibit segmental durational differences, which give the two dialects a distinct rhythmic organisation. As reported in the literature on acoustic differences between GRG and ZHG (see a.o. Ruch, 2018), these differences concern: a) intervocalic sonorant gemination (henceforth ISG) in words ending in -e; b) open syllable lengthening (henceforth OSL); c) vowel reduction in word-final position (henceforth RedVow).
Segmental timing properties are among the acoustic correlates of speech rhythm. For this reason, in this paper we refer to the three cross-dialectal differences in ISG, OSL and RedVow as rhythmic differences. Regarding ISG, GRG intervocalic sonorants can be realized either as geminates or as single consonants, while ZHG allows only the singleton realization. As for OSL, GRG open syllables may or may not be lengthened, while no lengthening tendency has been documented for ZHG. With respect to RedVow, GRG word-final vowels are not reduced in quality, and presumably not in duration either, while ZHG word-final vowels are always reduced. (Cf. Table 1 for examples of cross-dialectal realizations of ISG, OSL and RedVow.)
Feature | Example | GRG realization | ZHG realization |
---|---|---|---|
ISG | Sonne ‘sun’ | nn [ˈsunnɐ] or n [ˈsunɐ] | n [ˈsunə] |
OSL | Sohle ‘sole’ | V: [ˈsoːlɐ] or V [ˈsolɐ] | V [ˈsolə] |
RedVow | Suppe ‘soup’ | ɐ [ˈsuppɐ] | ə [ˈsuppə] |
Leemann et al. (2012) provided evidence that the differences in the quality of final vowels also come with distinct timing patterns. They showed that the durational variability of vocalic intervals was higher in Midland dialects (to which ZHG belongs) than in Alpine dialects (to which GRG belongs). They interpreted this result in view of the tendency of Alpine dialects to retain full vowels in unstressed position.
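To make "durational variability of vocalic intervals" concrete, the sketch below computes two widely used rhythm metrics on a list of interval durations. VarcoV is 100 × standard deviation / mean. The normalised Pairwise Variability Index (nPVI) averages the normalised differences between successive intervals. This is only an illustration, under our own assumptions. The choice of these two metrics, the use of the population standard deviation, and the durations are ours, and are not the measures or data of Leemann et al. (2012). Alternating full and reduced vowels, as expected for ZHG-like final-vowel reduction, yield higher values than a sequence of full vowels.

```python
import statistics
from typing import Sequence


def varco(durations_ms: Sequence[float]) -> float:
    """Rate-normalised variability: 100 * SD / mean (population SD assumed)."""
    return 100 * statistics.pstdev(durations_ms) / statistics.mean(durations_ms)


def npvi(durations_ms: Sequence[float]) -> float:
    """Normalised Pairwise Variability Index over successive intervals."""
    pairs = list(zip(durations_ms, durations_ms[1:]))
    # Each term: absolute difference divided by the mean of the two intervals.
    total = sum(abs(a - b) / ((a + b) / 2) for a, b in pairs)
    return 100 * total / len(pairs)


# Hypothetical vocalic interval durations in ms, invented for illustration.
reduced = [100, 50, 100, 50]  # stressed full vowel alternating with a reduced vowel
full = [100, 90, 100, 90]     # unstressed vowels kept (nearly) full
print(round(varco(reduced), 2), round(npvi(reduced), 2))  # 33.33 66.67
```

The "classic rhythm metrics" mentioned in the note to 2.3 are of this kind. The present study instead uses item-level duration ratios.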
2.3. Method
To determine whether pairs of GRG and ZHG speakers produce the rhythmic features more similarly after participating in the diapix tasks, we took the following steps:
-
From the pre- and post-dialogue recordings of individual speakers, we extracted the lexical items instantiating the three target rhythmic features (ISG, OSL and RedVow).¹
¹ As mentioned above, in this paper speech rhythm is conceived in a narrow sense, namely as the variability in segmental durational characteristics at the word level. For this reason, the analysis of convergence focuses on the three cross-dialectal segmental timing features (ISG, OSL and RedVow). However, as pointed out by one of the reviewers, studying rhythmic convergence in a broad sense would entail measuring more general parameters, such as the classic rhythm metrics.
-
For every item, we measured the duration of individual segments. The raw measures of segment duration served as a basis for the calculation of three ratio measures designed ad hoc to capture inter-dialectal differences in ISG, OSL and RedVow.
-
For ISG, we calculated the ratio between the duration of intervocalic sonorants (l, n) in -CCe words (e.g., Sonne, Welle) and that of the corresponding sonorant in -Ce words (l or n from the item Melone).
-
For OSL, we calculated the ratio between the duration of stressed vowels in open syllables and that of unstressed vowels within the same item.
-
For RedVow, we calculated the ratio between the duration of stressed vowels in open and closed syllables and that of the unstressed vowel within the same item.
-
To determine whether pairs of GRG and ZHG speakers converge, diverge or maintain their rhythmic behaviour after the interaction, we calculated:
-
the Euclidean distance within individual pairs for each of the three ratio measures in the pre- and post-dialogue recordings (i.e., dist 1 = |GRG pre - ZHG pre|; dist 2 = |GRG post - ZHG post|);
-
the difference between the distance separating the two speakers’ productions of a word before the dialogues (dist 1) and the distance after the dialogues (dist 2). Accommodation within a pair (DDpair) was calculated as follows: DDpair = dist 2 - dist 1. A negative value is evidence of convergence, a positive value indicates divergence, and a value of 0 indicates maintenance.
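The steps above can be sketched in a few lines of Python for a single pair and a single item. This is a minimal illustration, not the authors' analysis code. The function names and all durations are hypothetical. The Euclidean distance between two scalar ratios is taken to be their absolute difference.

```python
# Minimal sketch of the ratio measures and the DDpair index.
# All durations are hypothetical values in milliseconds, not corpus data.


def isg_ratio(sonorant_cce_ms: float, sonorant_melone_ms: float) -> float:
    """ISG: sonorant in a -CCe word (e.g. Sonne) / same sonorant in Melone."""
    return sonorant_cce_ms / sonorant_melone_ms


def vowel_ratio(stressed_ms: float, unstressed_ms: float) -> float:
    """OSL and RedVow: stressed vowel / unstressed vowel of the same item."""
    return stressed_ms / unstressed_ms


def distance(grg_ratio: float, zhg_ratio: float) -> float:
    """Euclidean distance between two scalar ratios, i.e. their absolute difference."""
    return abs(grg_ratio - zhg_ratio)


def dd_pair(grg_pre: float, zhg_pre: float, grg_post: float, zhg_post: float) -> float:
    """DDpair = dist 2 - dist 1 for one pair and one item."""
    dist1 = distance(grg_pre, zhg_pre)    # before the diapix tasks
    dist2 = distance(grg_post, zhg_post)  # after the diapix tasks
    return dist2 - dist1


def classify(dd: float) -> str:
    """Negative: convergence; positive: divergence; zero: maintenance."""
    if dd < 0:
        return "convergence"
    if dd > 0:
        return "divergence"
    return "maintenance"


# Hypothetical ISG example for the item Sonne (n duration / n duration in Melone).
grg_pre, zhg_pre = isg_ratio(120, 60), isg_ratio(75, 60)    # 2.0 and 1.25
grg_post, zhg_post = isg_ratio(105, 60), isg_ratio(90, 60)  # 1.75 and 1.5
dd = dd_pair(grg_pre, zhg_pre, grg_post, zhg_post)
print(dd, classify(dd))  # -0.5 convergence
```

In this invented example, both speakers move towards each other, so the distance shrinks from 0.75 to 0.25. With real measurements, exact zeros are unlikely. Whether to treat small values as maintenance would therefore be a separate analytical decision.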
2.4. Data analysis and statistics
The present study reports on the data extracted from the picture naming task (henceforth PNT). Linguistic factors have been shown to influence accommodation (Goldinger, 1998; Goldinger & Azuma, 2004; Nielsen, 2011). In view of this evidence, the main advantage of analysing the PNT data is that it controls for the effect of item variability in the assessment of:
-
cross-dialectal differences before the interactions;
-
differences in distance between ZHG and GRG speakers in a pair before and after the interaction.
The lexical items used in this study and the dialectal features they instantiate are listed in Table 2 in Standard German spelling.
Feature | Lexical items |
---|---|
ISG (5 items per speaker) | Brunnen, Pfanne, Sonne, Spinne, Welle |
OSL (8 items per speaker) | Besen, Esel, Graben, Käfer, Lupe, Nase, Schlafen, Melone |
RedVow (15 items per speaker) | Besen, Brunnen, Flosse, Graben, Lampe, Lunge, Lupe, Melone, Nase, Pfanne, Schlafen, Sonne, Spinne, Suppe, Welle |
To test (a), i.e., whether pairs of GRG and ZHG speakers realised the three durational contrasts differently before the interaction, and thus to make sure that there was room for rhythmic accommodation, we ran three separate Linear Mixed Effects Models. The ratio measures (ISG, OSL and RedVow) were the dependent variables, dialect (ZHG and GRG) was the fixed factor, and speaker and lexical item were random effects (i.e., random intercepts).
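Written out, each of these three models can be sketched as follows. The sketch assumes treatment coding with ZHG as the reference level, which is our assumption since the coding scheme is not stated. Here $y_{si}$ is the ratio (ISG, OSL or RedVow) produced by speaker $s$ for item $i$. $\mathrm{GRG}_s$ is 1 for GRG speakers and 0 for ZHG speakers.

$$
y_{si} = \beta_0 + \beta_1\,\mathrm{GRG}_s + u_s + w_i + \varepsilon_{si},
\qquad u_s \sim \mathcal{N}(0,\sigma_u^2),\quad w_i \sim \mathcal{N}(0,\sigma_w^2),\quad \varepsilon_{si} \sim \mathcal{N}(0,\sigma^2)
$$

Under this coding, $\beta_1$ estimates the GRG minus ZHG difference in the ratio. The expected pre-interaction differences therefore correspond to $\beta_1 > 0$ for ISG and OSL, and to $\beta_1 < 0$ for RedVow.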
In light of the segmental durational differences between GRG and ZHG mentioned above, we formulate the following hypotheses regarding the rhythmic behaviour of ZHG and GRG speakers before the interaction:
-
The ISG ratio is higher in GRG than in ZHG, given that GRG intervocalic sonorants can also be pronounced as geminates, while ZHG ones are pronounced only as singletons.
-
The OSL ratio is higher in GRG than in ZHG, given that GRG open syllables can be lengthened, while ZHG open syllables typically are not.
-
The RedVow ratio is higher in ZHG than in GRG, given that ZHG word-final vowels are reduced, while GRG ones are pronounced as full vowels.
To test (b), i.e., whether pairs of GRG and ZHG speakers produce the three rhythmic features more similarly after the diapix tasks, we compared the within-pair Euclidean distances in ISG, OSL and RedVow before and after the interactions. We ran three separate Linear Mixed Effects Models, with the Euclidean distance in ISG, OSL and RedVow as dependent variables and Session (1 = before the interaction; 2 = after the interaction) as the fixed factor. Given that the change in Euclidean distance from before to after the interaction may vary across pairs, we first included a by-Pair random slope for Session in the random-effects structure. However, this model was too complex to be supported by the data. We therefore simplified the random effects by replacing the random slope with a random intercept for the interaction between Session and Pair. The random part of the model also comprised a random intercept for Item.
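The within-pair distance compared across sessions can be illustrated with a minimal sketch. It assumes one ratio value per speaker, item and session, in which case the Euclidean distance between the two pair members reduces to the absolute difference of their values; the dictionary layout, the key order and the toy numbers are hypothetical and are not data from the study. Only the standard library function `math.dist` is used.

```python
import math

# Hypothetical long-format store: (pair, dialect, item, session) -> ratio value.
# Toy numbers for illustration only; they are not data from the study.
ratios = {
    ("P1", "ZHG", "Nase", 1): 1.20, ("P1", "GRG", "Nase", 1): 1.80,
    ("P1", "ZHG", "Nase", 2): 1.30, ("P1", "GRG", "Nase", 2): 1.70,
}


def within_pair_distance(data, pair, item, session):
    """Euclidean distance between the ZHG and GRG member of a pair for one item.

    With a single ratio value per speaker this equals the absolute difference.
    """
    zhg = [data[(pair, "ZHG", item, session)]]
    grg = [data[(pair, "GRG", item, session)]]
    return math.dist(zhg, grg)


def distances_by_session(data, pair, item):
    """Return {1: distance before, 2: distance after} the interaction."""
    return {s: within_pair_distance(data, pair, item, s) for s in (1, 2)}


d = distances_by_session(ratios, "P1", "Nase")
print(d[2] < d[1])  # True: in this toy case the pair moved closer after the interaction
```

Per-item distances of this kind, one per Session, are what the Session models take as their dependent variable. In lme4-style notation (an assumption, since the text names only RStudio) the simplified model would read `distance ~ Session + (1 | Pair:Session) + (1 | Item)`.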
We hypothesise that, if rhythmic features are subject to accommodation, dyad members will adjust their rhythmic behaviour such that the Euclidean distance in ISG, OSL and RedVow is lower after than before the interaction. In view of findings showing that speakers converge more on features that differ most between dialects (MacLeod, 2012; Ruch, 2015; Walker & Campbell-Kibler, 2015; Clopper & Dossey, 2020) and between the speakers and the model talkers (Babel, 2012), we hypothesise that RedVow evokes more accommodation than ISG and OSL. RedVow is indeed one of the features that best distinguish the two dialects, whereas the contrasts underlying ISG and OSL are less clear-cut: ZHG also exhibits open syllable lengthening, though in articulatory contexts other than those of GRG, and presents longer nasal duration in -CCer words. However, given that the realisation of reduced vowels is also a strong dialect marker for ZHG (Ruch, 2018), and in view of evidence of little convergence for features that are dialect markers (Babel, 2010), we cannot exclude that speakers may diverge from each other or maintain their original behaviour for RedVow.
To test these hypotheses, we ran a single Linear Mixed Effects Model with DDpair as the dependent variable and Ratio Type (ISG, OSL and RedVow) as the fixed factor. The random part of the model comprised a random intercept for the interaction between Pair and Ratio Type, as well as a random intercept for Item. Statistical analyses were performed in RStudio (2009-2019), version 1.2.1335.
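The DDpair model can likewise be sketched as an equation. The notation is ours: DD_pri is the DDpair value for pair p, ratio type r and item i; OSL_r and RedVow_r are indicator variables with ISG as the reference level, which is what the coefficient names RatioOSL and RatioRedVow in Table 5 suggest; a_pr is the random intercept for each Pair × Ratio Type combination and w_i the random intercept for Item.

```latex
\[
\mathrm{DD}_{pri} = \beta_0 + \beta_1\,\mathrm{OSL}_r + \beta_2\,\mathrm{RedVow}_r + a_{pr} + w_i + \varepsilon_{pri},
\qquad
a_{pr} \sim \mathcal{N}(0,\sigma_a^2),\;
w_i \sim \mathcal{N}(0,\sigma_w^2),\;
\varepsilon_{pri} \sim \mathcal{N}(0,\sigma^2)
\]
```

Under this coding, β1 and β2 are the differences in mean DDpair of OSL and RedVow relative to ISG. In lme4-style notation (assumed, not stated in the text) the model would read `DDpair ~ Ratio + (1 | Pair:Ratio) + (1 | Item)`.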
2.5. Results and Discussion
Regarding (a), i.e., cross-dialectal differences in ISG, OSL and RedVow before the interaction, the results from the pre-dialogue recordings show a significant main effect of Dialect for all three measures (Table 3).
Ratio | Estimate | SE | t | p |
---|---|---|---|---|
ISG | -0.78 | 0.10 | -7.81 | <0.001 |
OSL | -0.31 | 0.06 | -4.87 | <0.001 |
RedVow | -0.12 | 0.03 | -3.16 | <0.01 |
As shown in Fig. 1, GRG speakers obtained higher scores than ZHG speakers in all three ratio measures.
While the results for ISG and OSL are in line with the predictions, the finding that RedVow is lower in ZHG than in GRG is more surprising. One plausible explanation is that in the picture naming task, in which speakers were asked to produce words in isolation, ZHG speakers do not drastically reduce the duration of unstressed word-final vowels, as these vowels are subject to pre-pausal lengthening. In other words, in ZHG the durational difference between stressed vowels and unstressed word-final vowels may be smaller than one might expect.
With respect to (b), i.e., the accommodation behaviour in ISG, OSL and RedVow, the statistical analysis revealed no significant main effect of Session (pre- vs post-dialogue recordings) on the Euclidean distances (Table 4).
Ratio | Estimate | SE | t | p |
---|---|---|---|---|
ISG | -0.02 | 0.10 | -0.19 | 0.84 |
OSL | 0.05 | 0.06 | 0.83 | 0.41 |
RedVow | 0.008 | 0.03 | 0.25 | 0.82 |
In other words, the Euclidean distance between dyad members did not change significantly from before to after the interactions (Fig. 2).
Contrary to the hypothesis that RedVow is more prone to convergence than OSL and ISG, no significant differences in the degree and direction of accommodation (DDpair) were found between the three ratio measures (Table 5).
Term (reference: ISG) | Estimate | SE | t | p |
---|---|---|---|---|
RatioOSL | 0.08 | 0.07 | 1.15 | 0.25 |
RatioRedVow | 0.04 | 0.06 | 0.71 | 0.47 |
Unlike findings on vowel accommodation between GRG and ZHG or between other dialects, which show more convergence for phonetically more distant features (Ruch, 2015; MacLeod, 2012; Walker & Campbell-Kibler, 2015; Clopper & Dossey, 2020) and more divergence for acoustic attributes perceived as strong dialect markers (Babel, 2010; Clopper & Dossey, 2020), interpretations of accommodation based on phonetic distance or degree of dialect markedness do not seem tenable for ISG, OSL and RedVow (Fig. 3).
As shown in Fig. 3, RedVow was neither more nor less prone to accommodation than OSL and ISG. Rather, the values of all three measures cluster around zero, pointing to rhythmic maintenance.
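The reading of near-zero values as maintenance can be made explicit with a small sketch. It assumes that DDpair is the within-pair distance before the interaction minus the distance after it, so that positive values indicate convergence, negative values divergence, and values near zero maintenance; this definition, the record layout, the toy values and the `tolerance` used for labelling are all hypothetical. The inferential test in the study is the mixed model reported in Table 5, not a threshold.

```python
from statistics import mean


def dd_pair(dist_pre, dist_post):
    """Assumed DD measure: pre-interaction distance minus post-interaction distance.

    > 0 means the pair moved closer (convergence), < 0 means it moved apart (divergence).
    """
    return dist_pre - dist_post


def mean_dd_by_ratio(records):
    """Average DD per ratio type from (ratio_type, dist_pre, dist_post) tuples."""
    groups = {}
    for ratio, pre, post in records:
        groups.setdefault(ratio, []).append(dd_pair(pre, post))
    return {ratio: mean(values) for ratio, values in groups.items()}


def label(dd, tolerance=0.05):
    """Descriptive label only; the tolerance is an arbitrary illustrative choice."""
    if dd > tolerance:
        return "convergence"
    if dd < -tolerance:
        return "divergence"
    return "maintenance"


# Toy records (not study data): small changes in both directions.
records = [("ISG", 0.50, 0.52), ("ISG", 0.40, 0.37), ("RedVow", 0.30, 0.31)]
summary = mean_dd_by_ratio(records)
print({ratio: label(dd) for ratio, dd in summary.items()})
# {'ISG': 'maintenance', 'RedVow': 'maintenance'}
```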
There are at least two possible explanations for this result. (1) Like the rhythmic metrics analysed in previous research (e.g., Leemann et al., 2014; Dellwo et al., 2015), the three timing measures examined here may be robust against sources of within-speaker variability. Exposure to the distinct rhythmic behaviour of the dialogue partner may not have altered the post-dialogue realisation of ISG, OSL and RedVow, unlike what was observed for vowel formants. We cannot exclude, however, that accommodation in segmental timing properties occurred in the more spontaneous tasks of the corpus, which were not examined in the present investigation. In future research, it will be interesting to examine whether the same pattern replicates when rhythm is analysed at the utterance level, using the metrics typically employed in speech rhythm research (see a.o. Ramus, Nespor & Mehler, 1999; Grabe & Low, 2002; Dellwo, 2006; White & Mattys, 2007; He & Dellwo, 2016). (2) Another possible explanation may instead have to do with the perceptual salience of the cross-dialectal features captured by the three rhythmic measures. Given that differences must be perceptible in order to be imitated (Mitterer & Müsseler, 2013), the interspeaker differences in ISG, OSL and RedVow may be too subtle to be perceived or retained after the interaction. This would also be in line with research on Swiss German dialect recognition showing that listeners rely on segmental features to a greater degree than on rhythmic and prosodic features when identifying the dialectal origin of speakers (see Leemann et al., 2018; for varieties of English, see a.o. Fuchs, 2015).
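For orientation, two of the utterance-level metrics cited above can be sketched from lists of interval durations: the normalised Pairwise Variability Index, nPVI (Grabe & Low, 2002), typically applied to vocalic intervals, and the rate-normalised variation coefficient, VarcoC (Dellwo, 2006), i.e., the standard deviation of consonantal interval durations divided by their mean, times 100. This is a generic illustration, not part of the present analysis; it assumes the speech has already been segmented into vocalic and consonantal intervals, and the choice of the sample (rather than population) standard deviation is ours, as implementations differ on this point.

```python
from statistics import mean, stdev


def npvi(durations):
    """nPVI: 100 * mean over successive pairs of |d_k - d_k+1| / ((d_k + d_k+1) / 2)."""
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


def varco(durations):
    """Varco: 100 * SD / mean; applied to consonantal intervals this gives VarcoC.

    Uses the sample standard deviation (an illustrative choice).
    """
    if len(durations) < 2:
        raise ValueError("Varco needs at least two intervals")
    return 100 * stdev(durations) / mean(durations)


# Hypothetical interval durations in ms, not data from the study.
print(round(npvi([120, 60, 150, 90]), 1))  # 67.5
print(round(varco([70, 95, 110, 60]), 1))
```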
The differences in the accommodation behaviour of the same ZHG and GRG speakers across segmental and rhythmic measures confirm the complexity and multifaceted nature of vocal accommodation. As pointed out by Sanker (2015) and Cohen Priva and Sanker (2018), convergence patterns in one measure within a pair or within a speaker cannot be taken as representative of that pair's or speaker's convergence patterns in other measures.
3. CONCLUSIONS
Based on a corpus of pre- and post-dialogue picture naming tasks performed by 18 speakers of GRG and ZHG, the results reveal that members of the pairs, who showed significant durational differences before the interaction, did not noticeably shift their production of ISG, OSL and RedVow after being exposed to the interlocutor's dialect. Although evidence from rhythmic variability in child- and adult-directed speech, as well as from synchronous reading, supports the view that rhythmic features can be subject to interspeaker adjustments, these adjustments can be unidirectional and irrespective of the rhythmic behaviour of the dialogue partners