This paper asks whether there is an ‘interlanguage intelligibility benefit’ in perception of word stress, as has been reported for global sentence recognition. L1 English listeners, and L2 English listeners who are L1 speakers of Arabic dialects from Jordan and Egypt, performed a binary forced-choice identification task on English near-minimal pairs (such as [ˈɒbdʒɛkt] ~ [əbˈdʒɛkt]) produced by an L1 English speaker and by two L2 English speakers from Jordan and Egypt, respectively. The results show an overall advantage for L1 English listeners, which replicates the findings of an earlier study for general sentence recognition, and which is also consistent with earlier findings that L1 listeners rely more on structural knowledge than on acoustic cues in stress perception. Non-target-like L2 productions of words with final stress (which are primarily cued in L1 production by vowel reduction in the initial unstressed syllable) were less accurately recognized by L1 English listeners than by L2 listeners, but there was no evidence of a generalized advantage for L2 listeners in response to other L2 stimuli.
An ‘interlanguage intelligibility benefit’ has been reported for global sentence perception (Bent & Bradlow,
We use the term ‘stress’ to denote word-level stress or lexical prominence, and the term ‘accent’ to denote phrase-level stress or post-lexical prominence. The focus of our study is word-level stress as produced and perceived by speakers of English as first (L1) and as second or additional (L2) language. We note that—to investigate stress in languages such as English and Arabic in which both stress and accent are marked (Jun,
The acoustic correlates of stress have been shown to include duration, F0, overall intensity, frequency-sensitive intensity (spectral balance) and formant frequencies (F1/F2). Gordon and Roettger (
It is widely assumed that F0 is the most prominent and consistent cue to stress in English, based on the influential early study by Fry (
There has been less prior investigation of the acoustic correlates of stress in Arabic. Cross-dialectal variation in the acoustic cues to stress is likely, since cross-dialectal variation in phonological stress assignment is well established (Watson,
One of the first studies of the correlates of stress in Arabic was on Jordanian Arabic (JA), and indicated that the cues to stress in JA are duration and F1 (de Jong & Zawaydeh,
In a previous study we compared the correlates of stress in JA and EA—the two dialects investigated in the present study—and found that both dialects made use of duration, intensity and F0, but not formant frequencies or spectral balance (Almbark, Bouchhioua, & Hellmuth,
There is also cross-linguistic variation in the relative weighting of acoustic cues to stress in perception, and in the extent to which acoustic cues are relied upon compared to other factors.
Several studies have shown that listeners may rely on only a subset of the available acoustic cues in the signal. A recent study explored the perceptual behavior of English, Russian and Mandarin listeners in a forced-choice identification task, in response to disyllabic pseudo-word stimuli in which F0, duration, intensity and F1/F2 of target vowels were systematically varied; vowel quality (F1/F2) had the greatest influence on the choices of listeners from all three language backgrounds, but there was variation in the relative weighting of suprasegmental cues (Chrabaszcz, Winn, Lin, & Idsardi,
Stress perception is also influenced by the phonological status of stress in the listener’s first language (L1). French is a language which does not display word-level stress, and a sequence of studies has shown that although French listeners are able to perceive the acoustic cues to stress in an AX discrimination task, they are unable to discriminate stress minimal pairs in a sequence recall task which requires phonological encoding of those acoustic cues in lexical representations (Dupoux, Pallier, Sebastián-Gallés, & Mehler,
Finally, perception of stress is not influenced solely by acoustic correlates to stress and their relative weighting or phonological status. Several studies have shown that ‘bottom-up’ phonetic cues are used alongside ‘top-down’ cues such as lexico-semantic information in perception and processing of stress (Cole, Mo, & Hasegawa-Johnson,
The term ‘interlanguage’ describes patterns of language use, displayed by second language learners, which fall somewhere between the grammar of the native language and the target language being acquired (Selinker,
The concept of an interlanguage speech intelligibility benefit was proposed by Bent and Bradlow (
The two main groups of L2 English listeners in the Bent and Bradlow (
In this study we explore whether there is an interlanguage speech intelligibility benefit in respect of L1 versus L2 realization of the phonetic cues to word-level stress.
The main aim of the paper is to determine whether there is an interlanguage intelligibility benefit in perception of English word stress. We use stimuli elicited with a paradigm designed to produce English stress near-minimal pairs in a context in which the target word is realized without a phrase-level accent, thus focusing on listeners’ ability to make use of the phonetic cues to stress in the absence of cues to accent. Since vowel reduction is the primary cue to word stress for native English listeners (as noted in Section 2.2 above), it was important to use stimuli in which vowel reduction could appear, to determine whether failure to produce target words with appropriate vowel reduction reduces intelligibility, perhaps differentially so for native versus non-native listeners. We therefore used near-minimal pairs in which vowel reduction in the unstressed syllable provides a segmental cue to stress alongside suprasegmental cues such as duration and intensity. The stimuli were produced by a native speaker of English (NE) and by two L2 English speakers (L2) from Jordan and Egypt, respectively. The listeners in the forced-choice identification task are L1 English listeners (NE) and L2 English listeners from Jordan and Egypt (L2). The overarching research question stated in the title of this paper thus breaks down into three sub-questions, which we address in the present study by exploring the interaction of listener language and stimulus language in a single study with a crossed factor design:
Do NE listeners identify the position of stress in the productions of a NE speaker more accurately than in those of L2 speakers?
Do L2 listeners identify the position of stress in the productions of L2 speakers more accurately than in those of a NE speaker?
Do L2 listeners identify the position of stress in the productions of an L2 speaker from their own L1 dialect background more accurately than in those of an L2 speaker from a different L1 dialect background?
Based on Bent and Bradlow’s (
Stimuli which contrast in the position of stress were elicited using the nine English disyllabic near-minimal pairs, listed in
English near-minimal pairs, with stress on the first or second syllable.
stress on first syllable | word | stress on second syllable | word |
---|---|---|---|
ˈsʌbdʒɛkt | subject (n.) | səbˈdʒɛkt | subject (v.) |
ˈɹɛkɔːd | record (n.) | ɹɪˈkɔːd | record (v.) |
ˈkɒntɹæst | contrast (n.) | kənˈtɹæst | contrast (v.) |
ˈdaɪdʒɛst | digest (n.) | dɪˈdʒɛst | digest (v.) |
ˈkɒntɹækt | contract (n.) | kənˈtɹækt | contract (v.) |
ˈpɜːmɪt | permit (n.) | pəˈmɪt | permit (v.) |
ˈɒbdʒɛkt | object (n.) | əbˈdʒɛkt | object (v.) |
ˈkɒntɛnt | content (n.) | kənˈtɛnt | content (adj.) |
ˈkɒndʌkt | conduct (n.) | kənˈdʌkt | conduct (v.) |
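The reduced vowel in the unstressed first syllable of the final-stress forms (schwa in seven items, [ɪ] in two) can be tallied directly from the transcriptions in the table. The Python sketch below is illustrative only: the short vowel inventory it searches is an assumption that suffices for these nine items and is not a general IPA parser.

```python
from collections import Counter

# Final-stress forms, transcribed as in the table above.
FINAL_STRESS_FORMS = [
    "səbˈdʒɛkt", "ɹɪˈkɔːd", "kənˈtɹæst", "dɪˈdʒɛst", "kənˈtɹækt",
    "pəˈmɪt", "əbˈdʒɛkt", "kənˈtɛnt", "kənˈdʌkt",
]

# Illustrative vowel symbols, longest first so that "aɪ" is never read as "ɪ".
VOWELS = ("aɪ", "ɜː", "ɔː", "ə", "ɪ", "ɒ", "ʌ", "ɛ", "æ")


def unstressed_first_vowel(transcription: str) -> str:
    """Return the vowel of the syllable preceding the primary-stress mark."""
    unstressed = transcription.split("ˈ", 1)[0]
    for vowel in VOWELS:
        if vowel in unstressed:
            return vowel
    raise ValueError(f"no listed vowel in {unstressed!r}")


reduction_counts = Counter(unstressed_first_vowel(t) for t in FINAL_STRESS_FORMS)
print(reduction_counts)  # 7 x schwa, 2 x [ɪ]
```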
Three further pairs (
The intended accent status of the target word was varied by using a carrier phrase that either attracts focus to the target word [+accent] or diverts focus away from it [−accent], again following Bouchhioua (
Target word (in bold) placed in carrier phrases to vary ±accent status. CAPITALS denote expected position of sentence accents under focus.
+accent | Say topic again. |
−accent | The subject is a grammatical category. |
The experimental stimuli for the present study were extracted from target-like tokens (as judged to consensus by the first and third authors) produced in −
(1) a. stress on first syllable: SAY ˈsʌbdʒɛkt again.
    b. stress on second syllable: SAY səbˈdʒɛkt again.
The stimuli for the perception experiment were produced by three male speakers: one from Cairo, Egypt (EA), one from Amman, Jordan (JA), and one from the UK (a native speaker of British English, NE). The speakers were aged 26, 20, and 39 years, respectively. The Arabic speakers had learned English at school for 12 years but had never resided in an English-speaking country; they were selected from participants in an earlier production study (Almbark
The results of acoustic analysis of the selected stimuli for duration, F0, intensity, F1/F2 and two measures of spectral tilt (H1–H2 and H1–A3), comparing properties of the vowel in the initial syllable only, in the stressed and unstressed conditions, are illustrated in
Median and interquartile range for values of (from left to right) maximum F0, peak intensity, normalized vowel duration and two measures of spectral emphasis, H1–H2 and H1–A3, in the first vowel of experimental stimuli produced by the native speaker of Egyptian Arabic (EA; top row), Jordanian Arabic (JA; middle row), and English (NE; bottom row), grouped by stress condition (whether the vowel in which measurements were taken was stressed or unstressed).
F1/F2 plot of the first vowel in experimental stimuli produced by the native speaker of Egyptian Arabic (EA), Jordanian Arabic (JA) and English (NE), where the vowel is stressed (black dots) or unstressed (white dots).
The acoustic analysis shows that F0 differentiates stressed and unstressed syllables in the L2 English productions of the JA speaker, but not in those of the EA or NE speakers. Similarly, although intensity is somewhat higher in stressed than in unstressed syllables for all three speakers, including the EA speaker (for whom intensity is the strongest cue to stress on average), the difference is significant only in the JA speaker’s productions. In contrast, neither vowel duration nor spectral tilt (H1–H2 or H1–A3) differentiates stressed and unstressed syllables for any of the speakers. Finally, both F1 and F2 differentiate stressed and unstressed syllables to a significant extent in the NE speaker’s productions, but not in those of the EA and JA speakers.
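The figure summarizes each cue by its median and interquartile range per speaker and stress condition. A minimal Python sketch of that summary follows; the measurement values are invented placeholders rather than data from this study, and the quartile convention (linear interpolation, equivalent to R's default type 7) is an assumption, since the software used for the figure is not stated here.

```python
import statistics


def median_iqr(values: list[float]) -> tuple[float, float]:
    """Return (median, interquartile range) for one speaker x stress cell.

    method="inclusive" interpolates quartiles linearly between order
    statistics, as R's default quantile(type = 7) does; whether the original
    figure used this convention is an assumption.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return median, q3 - q1


# Invented peak-intensity values (dB) for one speaker: placeholders, not study data.
toy_intensity = {
    "stressed": [72.0, 74.5, 73.1, 75.2, 71.8],
    "unstressed": [70.2, 69.9, 71.5, 68.7, 70.0],
}
for condition, values in toy_intensity.items():
    med, iqr = median_iqr(values)
    print(f"{condition}: median = {med:.1f} dB, IQR = {iqr:.1f} dB")
```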
The differences among the three speakers in the observed cues to stress in the experimental stimuli match the generalizations reported for the full set of speakers who participated in the study from which the stimuli were extracted (Almbark
Participants were recruited by email invitation among the friends and family of graduate students of linguistics from Egypt and Jordan, and among students at the University of York. A total of 42 listeners meeting our inclusion criteria (by native language/dialect, excluding early bilinguals) completed the online perception experiment on a voluntary basis. From these, a balanced subset of 36 was selected at random to yield three listener groups by native language: EA, JA or English (NE), with six male and six female listeners in each group. The Arabic-speaking listeners had all studied English for at least 12 years; six had English-medium schooling (two EA, four JA); one JA listener was in the UK at the time of taking the test.
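The balanced random selection described above amounts to stratified sampling over listener group × sex cells. In the Python sketch below, the pool composition (seven respondents per cell) and the record fields are hypothetical, because the breakdown of the 42 respondents is not reported here; only the target of six per cell comes from the text.

```python
import random
from collections import Counter, defaultdict


def balanced_subset(pool: list[dict], per_cell: int, seed: int = 0) -> list[dict]:
    """Randomly draw `per_cell` listeners from each (group, sex) cell."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for listener in pool:
        cells[(listener["group"], listener["sex"])].append(listener)
    selected = []
    for key in sorted(cells):  # sorted so the draw order is reproducible
        members = cells[key]
        if len(members) < per_cell:
            raise ValueError(f"cell {key} has only {len(members)} listeners")
        selected.extend(rng.sample(members, per_cell))
    return selected


# Hypothetical pool of 42 respondents, 7 per cell; the real breakdown is not reported.
pool = [
    {"id": f"{group}-{sex}{i}", "group": group, "sex": sex}
    for group in ("EA", "JA", "NE")
    for sex in ("F", "M")
    for i in range(7)
]
subset = balanced_subset(pool, per_cell=6, seed=1)
print(len(subset), Counter((p["group"], p["sex"]) for p in subset))
```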
The experiment was run using an online survey tool (SurveyGizmo,
Participants were familiarized with the test paradigm in a training phase, in which a selection of English stimuli differing in stress position, as in the main test, was presented using the target word ‘project’ [ˈpɹɒdʒɛkt] ~ [pɹəˈdʒɛkt]. Participants were asked to answer the following question for each word they heard: “Was it PROject (first syllable) or proJECT (second syllable)?” Feedback was given as to whether the answer provided was correct or incorrect.
After the training phase, in the first test phase the 36 sound files produced by the two L2 English speakers (9 target words × 2 stress conditions × 2 speakers = 36) were presented in randomized order. Each sound file was shown on a separate page with the question “Is it ___ (first syllable) or ___ (second syllable)?” and two answers (e.g., “SUBject with stress on the first syllable” or “subJECT with stress on the second syllable”) to choose from, in a binary forced choice. Then, in the second test phase, the 18 sound files produced by the L1 English speaker were presented, following the same procedure as for the first test phase.
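The two-block structure can be sketched as follows. The survey tool performed the actual randomization; this Python version, with its hypothetical seed and trial fields, only illustrates how the 36 L2 trials and the 18 L1 trials arise from crossing the nine words with the two stress positions and the relevant speakers.

```python
import random

# The nine target words from the stimulus table. Speaker labels follow the text:
# EA and JA are the L2 English speakers, NE is the L1 English speaker.
WORDS = ["subject", "record", "contrast", "digest", "contract",
         "permit", "object", "content", "conduct"]
STRESS_POSITIONS = ["first", "second"]


def make_block(speakers: list[str], rng: random.Random) -> list[dict]:
    """Cross word x stress position x speaker, then shuffle within the block."""
    trials = [
        {"word": word, "stress": stress, "speaker": speaker}
        for speaker in speakers
        for word in WORDS
        for stress in STRESS_POSITIONS
    ]
    rng.shuffle(trials)
    return trials


rng = random.Random(2024)                 # hypothetical seed, for reproducibility only
l2_block = make_block(["EA", "JA"], rng)  # 9 x 2 x 2 = 36 trials, presented first
l1_block = make_block(["NE"], rng)        # 9 x 2 x 1 = 18 trials, presented second
session = l2_block + l1_block
print(len(l2_block), len(l1_block))       # 36 18
```

Keeping the L2 and L1 speakers in separate blocks, rather than shuffling all 54 trials together, mirrors the rationale given in the next paragraph.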
We presented all L2 speech in one block, then all L1 speech in a separate block, to restrict the listeners’ task to word recognition. Randomization of tokens extracted from L1 and L2 speech within a single block might have drawn listeners’ attention to the degree of foreign accent rather than to the intended focus, the intelligibility (i.e., recognition) of the utterances.
Each response was coded for
Median (bold vertical line within bars) and interquartile range (bar size) of the count of accurate responses for each individual participant, grouped by listener language, stimulus language, and elicited position of stress. EA = Egyptian Arabic; JA = Jordanian Arabic.
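Each binary response can be scored against the elicited stress position and tallied per participant, stimulus language and stress condition, which is the count plotted in the figure. In this Python sketch the field names are hypothetical (they mirror the variable names in the GLMM formula) and the responses are toy values, not data from this study.

```python
from collections import Counter


def count_accurate(responses: list[dict]) -> Counter:
    """Tally correct identifications per (participant, stimlang, stress) cell.

    A response counts as accurate when the syllable the listener chose matches
    the stress position elicited from the speaker; cells with no correct
    answers are kept with a count of 0.
    """
    counts = Counter()
    for r in responses:
        key = (r["participant"], r["stimlang"], r["stress"])
        counts[key] += int(r["response"] == r["stress"])
    return counts


# Toy responses for one hypothetical listener (not data from this study).
responses = [
    {"participant": "P01", "stimlang": "EA", "stress": "first", "response": "first"},
    {"participant": "P01", "stimlang": "EA", "stress": "second", "response": "first"},
    {"participant": "P01", "stimlang": "EA", "stress": "second", "response": "second"},
    {"participant": "P01", "stimlang": "NE", "stress": "first", "response": "second"},
]
print(count_accurate(responses))
```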
However, any variation across listener groups is clearly mediated by variation within listener groups that reflects the elicited position of stress in the word: EA listeners are less accurate at identifying words produced by the English speaker with initial stress; English listeners, in turn, are less accurate at identifying words produced by the Egyptian speaker with final stress. In contrast, accuracy rates of JA listeners show largely overlapping distributions by both position of stress and by stimulus language.
These effects were explored in a series of GLMM models; the best fit model includes fixed factors for stress condition (
Summary of the best fit GLMM [accuracy ~ listlang * stimlang * stress + (1 | item) + (1 | participant)].
Fixed effects | Estimate (log odds) | SE | z | p |
---|---|---|---|---|
intercept | 1.88724 | 0.34165 | 5.524 | < .001 |
stimlangJA | -0.07544 | 0.38553 | -0.196 | 0.8448 |
stimlangNE | -1.30793 | 0.34590 | -3.781 | 0.0001 |
listlangJA | -0.79566 | 0.41624 | -1.912 | 0.0559 |
listlangNE | 1.57067 | 0.61761 | 2.543 | 0.0109* |
stresssecond | -0.84735 | 0.39831 | -2.127 | 0.0333 |
stimlangJA:listlangJA | -0.51935 | 0.49107 | -1.058 | 0.2902 |
stimlangNE:listlangJA | 0.58733 | 0.45942 | 1.278 | 0.2011 |
stimlangJA:listlangNE | -1.03615 | 0.71293 | -1.453 | 0.1461 |
stimlangNE:listlangNE | 1.30792 | 0.79567 | 1.644 | 0.1002 |
stimlangJA:stresssecond | -0.07017 | 0.49480 | -0.142 | 0.8872 |
stimlangNE:stresssecond | 1.57107 | 0.47366 | 3.317 | 0.0009*** |
listlangJA:stresssecond | 0.25170 | 0.46696 | 0.539 | 0.5898 |
listlangNE:stresssecond | -1.40917 | 0.65985 | -2.136 | 0.0327* |
stimlangJA:listlangJA:stresssecond | 1.01822 | 0.65254 | 1.560 | 0.1186 |
stimlangNE:listlangJA:stresssecond | -0.85047 | 0.63254 | -1.345 | 0.1787 |
stimlangJA:listlangNE:stresssecond | 1.59293 | 0.84933 | 1.876 | 0.0607 |
stimlangNE:listlangNE:stresssecond | -1.28632 | 0.92208 | -1.395 | 0.1630 |
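The z and p columns follow from the estimates and standard errors by the usual Wald test, assuming (as is standard for binomial GLMM summaries fitted with an lme4-style formula like the one above) that p is the two-sided normal-approximation value. The Python sketch below recomputes one row; any difference in the last digit reflects rounding of the tabled inputs.

```python
import math


def wald_test(estimate: float, se: float) -> tuple[float, float]:
    """Wald z statistic and two-sided normal p-value for one fixed effect."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Estimate and SE copied from the listlangNE row of the table.
z, p = wald_test(1.57067, 0.61761)
print(f"z = {z:.3f}, p = {p:.3f}")  # z = 2.543, p = 0.011
```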
For the dependent variable the reference level in all models was ‘incorrect’; the models thus predict the log odds of improved accuracy resulting from a change in
Predicted marginal means (and 95 % CI) for the best fit binomial GLMM by listener language, stimulus language (
The best fit model shows no significant three-way interactions and no main effect of stimulus language or stress position. There were no significant interactions between listener language and stimulus language; it is this type of interaction that would indicate an interlanguage intelligibility benefit.
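The coefficient names suggest treatment (dummy) coding, with EA listener, EA stimulus and first-syllable stress as reference levels. Under that assumption, the predicted log odds of a correct response in any cell is the intercept plus every coefficient whose terms are all active in that cell, and an interlanguage benefit would show up as listener-by-stimulus terms (e.g., stimlangJA:listlangJA) that raise accuracy in matched L2 cells. The Python sketch below applies this to the tabled estimates with random intercepts set to zero (a typical participant and item). On the probability scale this is not identical to a population-averaged marginal mean, so it approximates the plotted values rather than reproducing them.

```python
import math

# Fixed-effect estimates (log odds of a correct response), copied from the table.
COEF = {
    "intercept": 1.88724,
    "stimlangJA": -0.07544, "stimlangNE": -1.30793,
    "listlangJA": -0.79566, "listlangNE": 1.57067,
    "stresssecond": -0.84735,
    "stimlangJA:listlangJA": -0.51935, "stimlangNE:listlangJA": 0.58733,
    "stimlangJA:listlangNE": -1.03615, "stimlangNE:listlangNE": 1.30792,
    "stimlangJA:stresssecond": -0.07017, "stimlangNE:stresssecond": 1.57107,
    "listlangJA:stresssecond": 0.25170, "listlangNE:stresssecond": -1.40917,
    "stimlangJA:listlangJA:stresssecond": 1.01822,
    "stimlangNE:listlangJA:stresssecond": -0.85047,
    "stimlangJA:listlangNE:stresssecond": 1.59293,
    "stimlangNE:listlangNE:stresssecond": -1.28632,
}


def cell_log_odds(listlang: str, stimlang: str, stress: str) -> float:
    """Linear predictor for one cell under (assumed) treatment coding.

    Reference levels: listener EA, stimulus EA, stress on the first syllable.
    Random intercepts for participant and item are set to 0.
    """
    active = set()
    if stimlang != "EA":
        active.add(f"stimlang{stimlang}")
    if listlang != "EA":
        active.add(f"listlang{listlang}")
    if stress != "first":
        active.add(f"stress{stress}")
    return sum(
        value for name, value in COEF.items()
        if name == "intercept" or set(name.split(":")) <= active
    )


def predicted_accuracy(listlang: str, stimlang: str, stress: str) -> float:
    """Inverse logit of the linear predictor: probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-cell_log_odds(listlang, stimlang, stress)))


for listener in ("EA", "JA", "NE"):
    row = [f"{predicted_accuracy(listener, s, 'second'):.2f}" for s in ("EA", "JA", "NE")]
    print(listener, "listeners, final stress, stimuli EA/JA/NE:", row)
```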
There is a main effect of listener language: English listeners are much more accurate than JA listeners (
There is a significant interaction between stress and listener language: English listeners were less accurate at identifying words with final stress across the board, regardless of stimulus language (
Our specific research question was to explore a possible interlanguage intelligibility benefit in perception of English word stress; that is, to test the hypothesis that L2 listeners will more accurately interpret English word stress when produced by other L2 speakers. We found no evidence to support this hypothesis in this study, as there are no significant interactions between any levels of listener language and stimulus language in our data. This also rules out the type of interlanguage intelligibility benefit found by Stibbard and Lee (
Our results replicate the finding of Stibbard and Lee (
The study shows two significant interactions of listener/speaker language with the position of stress. The first of these is that NE listeners displayed lower accuracy in response to words with final stress, regardless of speaker language. The expected NE realization in these words has vowel reduction in the first syllable, to schwa [ə] in 7 out of 9 of our stimuli, and to [ɪ] in the other two cases (see
The aim of this paper was to explore a possible interlanguage intelligibility benefit for L2 listeners in perception of stress, due to potential transfer of L1 patterns of phonetic realization of stress into L2 productions. The results did not show any interlanguage intelligibility benefit but did confirm the previous finding of an overall advantage for L1 English listeners in lexical recognition tasks, which we attribute to the L1 listeners’ ability to make use of top-down lexical knowledge in perception of stress. This strategy supports accurate recognition in the face of non-target-like phonetic cues to stress encountered in L2 English productions, but we show that non-target-like cues can result in reduced intelligibility of L2 speakers when the primary cue expected by L1 listeners (here, vowel reduction) is the same cue that the L2 speaker fails to produce to a target-like extent.