1. INTRODUCTION

1.1. The identity of stress and accent

The terms stress and accent have often been used inconsistently, sometimes referring to the same thing and sometimes to different things. In this study I take the position that they are different concepts, and I examine their respective, and likewise different, cues. The term stress is used to denote word-level prominence, i.e., metrical prominence assigned to a specific syllable in a given word. This type of prominence is generally referred to as lexical stress. Definitions of lexical stress nevertheless vary, and they usually differ in the correlates assigned to it. Lehiste (1970) considers lexical stress an abstract property of the word that operates only when the word is placed in a context. Sluijter & Van Heuven (1996) adopt this view of lexical stress as an abstract part of a word’s identity that has no phonetic substance of its own, and define it as “a structural linguistic property of a word that specifies which syllable in the word is the strongest” (p. 2471). Stress for them is an abstract property of a word that serves as an index to the syllable within the word that has the potential to receive a pitch accent. It is therefore an abstract linguistic representation whose phonetic properties come to the foreground only when an accent is realized on it. Lexically stressed syllables are considered always to have an accent-lending pitch movement associated with them when they occur within a single word produced in narrow focus (Sluijter, 1995). Stress is thus determined by the language system rather than by language behavior.

The term accent, by contrast, is used to denote phrasal-level prominence, where one or more words within an utterance are selected and assigned intonational prominence in addition to metrical prominence. This kind of prominence is a property of the utterance and is present in most languages of the world, as it provides a means for the speaker to express communicative intentions and to organize speech. For various communicative purposes, speakers can expand their pitch range and project their voice to highlight new or important information in a particular sentence or phrase. The highlighted piece of information is said to receive phrasal stress, or an accent. However, languages may differ, each according to its own phonological system, in the cues used to signal phrasal stress.

Phonetic correlates of phrasal stress other than f₀ include duration, intensity, spectral balance, and vowel quality. Duration seems to be the second most important perceptual cue to phrasal stress after a change in f₀. Phrasal stress has been reported to change the temporal pattern of words, lengthening a constituent within the word, whether this is the stressed vowel, the syllable, or a larger domain (Beckman, Edwards & Fletcher, 1992; Turk & Sawusch, 1997; Turk & White, 1999). In addition, phrasal stress is correlated with an increase in the intensity of the speech signal and with a flatter spectral tilt, both of which could be explained by the increased pulmonary activity and vocal effort made to produce phrasal stress. Intensity has been considered to play an important role as a cue to phrasal stress. However, spectral tilt has been proposed as a rival to duration in signaling lexical and phrasal stress, which diminishes the importance of intensity, while vowel quality has been considered a weaker cue to phrasal stress overall (Sluijter, 1995; Sluijter, Shattuck-Hufnagel, Stevens, & Van Heuven, 1995).

To summarize the identity of phrasal stress and the way it is realized: when a speaker produces a word or a phrase that includes new or important information, s/he may anticipate the additional effort the listener needs to incorporate it into memory. S/he marks this word or phrase as communicatively important by placing it under focus (Baart, 1987; Ladd, 1980). The focused constituent is realized by placing a pitch accent on the prosodic head of the word or phrase. The prosodic head within the word is generally the lexically stressed syllable. In addition, the speaker may slow down and pronounce the focused word more carefully. This helps the listener decide where s/he has to pay special attention. The acoustic correlates associated with this kind of prominence are expected to differ to some degree from those of unfocused constituents.

1.2. Stress and accent in Arabic

Lexical stress placement in Arabic varies from the standard language to the dialects and from one dialect to another, but in all cases stress is fixed and conditioned by syllable weight. De Jong and Zawaydeh (1999) examined vowel durations, vowel qualities, and f₀ patterns of speakers of Arabic from Amman, Jordan. Lexical stress is reported to consistently affect duration and the F1 of low vowels. In fact, stress was found to interact with quantity in such a way as to expand the already clear distinction between short and long vowels in Ammani Arabic. Surprisingly, there was no main effect of lexical focus on duration, since it did not induce any lengthening, nor did it increase durational differences due to quantity. These findings confirm previous research on Arabic (Rajouani, Najim, Chiadmi, & Zyoute, 1987; Braham, 1997), which also found that duration is a very weak cue to stress in Modern Standard Arabic (MSA). Rajouani et al. (1987) studied the effects of intensity, pitch, and duration on the perception of word stress in MSA produced by Moroccan speakers and found that the most important cue is pitch, followed by intensity, and that the least important cue is duration. Braham (1997) reports no effects of stress on the duration of short vowels in MSA produced by Tunisian speakers. Duration is ranked last in that study, while the most important cue to stress was f₀. Other research on the factors that affect syllable structure in MSA (Ben Slama, 2002) also showed that stress had no effect on the duration of vowels in either normal or rapid speech. Braham (1997) explained the weakness of duration as a correlate of stress in MSA by noting that Arabic has a phonemic length contrast that has to be maintained even under stress, so that intrinsically long vowels are not confused with vowels lengthened by stress. However, none of these studies distinguished between lexical and phrasal stress, so the correlates of the two are confounded.

This study investigates whether duration, vowel quality, and spectral balance are correlates of stress and of accent, independently of each other, in two typologically different languages: Southern British English (SBE) and Tunisian Arabic (TA). The aims are, first, to allow a cross-linguistic comparison of the cues used to signal stress and accent in typologically distant languages and, second, to gain a deeper understanding of the phonetic features of English, since it is currently taught as an important foreign language in Tunisia. The similarities or differences in the use of stress and accent are expected to affect the production of English word stress and accent by speakers of Tunisian Arabic, and the results will hopefully provide more insight into the production and perception of L2 English prosodic features.

2. METHODOLOGY

2.1. Experiment 1: Measuring the acoustic correlates of stress and accent in SBE

2.1.1. Test material

The test material in this experiment comprised two sets of 12 disyllabic English minimal pairs of the kind ˈpermit ~ perˈmit, which differ only in their stress pattern. The use of minimal pairs makes it possible to investigate the factor stress while other factors are kept constant. Thus, acoustic measures made on the first syllable of ˈpermit (noun) are compared with the same measures made on the first syllable of perˈmit (verb). The word pairs were placed in focused and non-focused conditions in order to assess the impact that focus may have on stress and the interaction between them. The frame sentences in which the target words were placed were the same. They were designed to elicit the desired prosodic contour naturally from the speakers and to aid segmentation. Two contexts of the same carrier sentence were contrasted, one which directed phrasal stress onto the target word and one which directed phrasal stress away from it. The word that was to be made prominent was written in bold capital letters. The target word was non-final in both focus conditions to eliminate any effects of constituent-final lengthening. The content of the frame sentences was such that it established a semantic relationship between the target word in the experimental sentence and a foil word in the preceding sentence. This was meant to help the speakers give prominence to the desired word for the purposes of the experiment. In addition, the category of the word was marked on the cards the subjects read, so that they knew where to place lexical stress. Examples are provided in (1).

(1) [+Focus] condition: Lexical stress + phrasal stress
Say LICENCE again.
Say PERMIT again.

Since this is a command and since there is a semantic relationship between the two words in bold, the target syllable in permit for example will be both lexically and phrasally stressed. It is focused because it is contrasted to licence in the sentence before and the pitch accent therefore falls on it. The example in (2), however, illustrates the condition where lexical stress is measured independently of focus.

(2) [−Focus] condition: Lexical stress only (phrasal stress is placed on the word in bold)
A permit is another word for licence.
WRITE permit again.
SAY permit again.

Here a precursor sentence is used to suggest the likely location of new or unpredictable information, on the assumption that unpredictable words are likely to bear phrasal stress. In this example, phrasal stress is placed on write and say, which contrast with each other and constitute the new information. Focus is therefore directed away from the target word permit, so the target syllable per is analyzed as lexically stressed only, with no focus placed on it. The terms [+Focus] and [−Focus] and their abbreviations [+F] and [−F] are used to indicate the focus condition, while the abbreviations [+S] and [−S] refer to stressed and unstressed targets (vowels or syllables). In order to ensure that the findings can be generalized, twelve minimal pairs were used as test words in each experimental condition. The subjects were instructed to produce the desired lexical and phrasal stress in the appropriate focus conditions.

2.1.2. Subjects

The subjects for this study were six speakers (5 males and 1 female) of British English with a southern accent and with no known hearing or speaking disorders. They were between 26 and 55 years of age and volunteered to be recorded.

2.1.3. Recording procedure

Recordings took place in one of the recording studios in the Department of Theoretical and Applied Linguistics, University of Edinburgh. They were made in a soundproof room using an AKG hypercardioid microphone and downsampled to 16 kHz mono. In the recording session, speakers first read five practice sentences randomly selected from the experimental sentences and other sentences. The sentences were presented to the subjects in two blocks of 24 each, one block of nouns (initial stress) and one block of verbs (final stress), composed of the 24 experimental sentences and 24 other filler sentences placed in a random order.

Subjects were instructed to read the sentences at a normal speech rate, not to pause between words, and to focus the words written in bold capital letters. They read the cards presented to them three times. After each repetition, the cards were shuffled to randomize their order again. The subjects were asked to repeat sentences that were incorrectly uttered.

2.1.4. Data collected

The data comprised 24 test words in two focus conditions, [+F] and [−F], pronounced three times by 6 native speakers of English. The result was 24 (words) * 2 (focus conditions) * 6 (speakers) * 3 (repetitions) = 864 words. Fifty-three misproduced utterances were omitted, leaving 811 utterances to segment and analyze.
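The token count above follows directly from the fully crossed design. As a minimal sketch (the function name `design_cells` and the string labels for the focus conditions are illustrative assumptions, not part of the original procedure), the design can be enumerated and counted as follows:

```python
from itertools import product

def design_cells(n_words, focus_levels, n_speakers, n_repetitions):
    """Enumerate every (word, focus, speaker, repetition) combination."""
    return list(product(range(n_words), focus_levels,
                        range(n_speakers), range(n_repetitions)))

# Experiment 1: 24 words x 2 focus conditions x 6 speakers x 3 repetitions.
cells = design_cells(24, ("+F", "-F"), 6, 3)
recorded = len(cells)

# 53 utterances were reported as omitted.
discarded = 53
analysed = recorded - discarded

print(recorded, analysed)
```

Enumerating the cells rather than only multiplying the factors makes it easy to attach a label to each token later, for instance when marking which utterances were discarded.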

2.2. Experiment 2: Measuring the acoustic correlates of stress and accent in Tunisian Arabic

2.2.1. Test material

Two sets of 10 disyllabic Tunisian Arabic near-minimal pairs of the kind /ˈbeddel/ ~ /bedˈdelt/ were used. These near-minimal pairs were chosen to allow the effect of stress to be investigated in the same consonantal environment. Thus, the acoustic measurements made on the first syllable of /ˈbeddel/ ‘he changed’ are compared with the same measures made on the first syllable of /bedˈdelt/ ‘I changed’. The word pairs were again placed in a focused and a non-focused condition in order to assess the impact that focus may have on stress and the interaction between them. The frame sentences in which the target words were placed were the same, and were designed to elicit the desired prosodic contour naturally from the speakers and to aid segmentation. Two contexts of the same carrier sentence were again contrasted, one which directed phrasal stress onto the target word and one which directed phrasal stress away from it. The word that was to be made prominent was written in larger bold letters in Arabic script. The target word was again non-final in both focus conditions, for the same reason as in experiment 1. The content of the frame sentences was such that it established a semantic relationship between the target word in the experimental sentence and a filler word in the preceding sentence. When no semantic relationship between the words could be found, a word that rhymes with the test word was chosen instead. This was meant to help the speakers give prominence to the desired word. Examples are provided in (3).

(3) [+Focus] condition: Lexical stress + phrasal stress
/qul ˈχɑmmɑm mɑrtin/
/qul ˈfɑkkɑr mɑrtin/
Translation:
‘Say consider twice’
‘Say think twice’

Since this is a command and since there is a semantic relationship between the two words in bold (synonyms), the target syllable (underlined) in the target word (/ˈfɑkkɑr/ in this example) will be both lexically and phrasally stressed. It is focused because it is related to /ˈχɑmmɑm/ in the sentence before and the pitch accent, therefore, falls on it. The example below, however, illustrates the condition where lexical stress is measured independently of focus in experiment 2.

(4) [−Focus] condition: Lexical stress only (phrasal stress is placed on the word in bold)
/ˈfɑkkɑr kɪlmә sәhlә/
/qul ˈfɑkkɑr mɑrtin/
/ʢɑwɪd ˈfɑkkɑr mɑrtin/
Translation:
‘Think is an easy word’
‘SAY think twice’
‘REPEAT think twice’

A precursor sentence is used to suggest the likely location of new or unpredictable information. In this example, phrasal stress is placed on /qul/ and /ˈʢɑwɪd/, which contrast with each other and constitute the new information in this context. Focus is therefore directed away from the target word /ˈfɑkkɑr/, and the target syllable /fɑk/ is analyzed as lexically stressed only, with no focus and hence no pitch accent placed on it. Ten near-minimal pairs were used in each experimental condition.

2.2.2. Subjects

Six Tunisian students were recorded. They were all native speakers of Tunisian Arabic with no hearing or speaking disorders. There were three females and three males, between 22 and 26 years of age, all majoring in English.

2.2.3. Recording procedure

Recordings took place in a soundproof room using a professional microphone. They were recorded directly onto a computer at a sampling rate of 44.1 kHz and then downsampled to 16 kHz mono. The subjects had two training sessions with the author to become familiar with the test words. The procedure paralleled that of experiment 1: in the recording session, the subjects first read five practice sentences selected randomly from the experimental sentences and other sentences. The sentences were presented to the subjects in two blocks of 20 each, composed of the 20 experimental sentences and 20 other foil sentences in a random order (one block for the items that receive initial stress and one block for the items that receive final stress). Subjects were instructed to emphasize the words written in bold, not to pause between words, and to keep a normal pace in their speech. The subjects read each block of cards three times, and the cards were shuffled before each reading.

2.2.4. Data collected

The data comprised 20 (words) * 2 (focus conditions) * 6 (speakers) * 3 (repetitions) = 720 words to be segmented and analyzed.

2.3. Segmentation criteria

The segmentation criteria were based on the spectral characteristics most readily identifiable in spectrograms, with the aid of waveform displays, which are helpful in showing dips and rises in amplitude that often correspond to the onsets of constrictions and their releases. The corpus comprised various classes of sounds, including oral stops, fricatives, and nasal stops. As the onsets and releases of the oral consonantal constrictions that produce stops, fricatives, and affricates often coincide with abrupt spectral changes (Turk, Nakai & Sugahara, 2006), acoustic segment durations were determined by the intervals that these oral consonantal events define. The duration of a vowel is defined as the duration of the interval between a C1 constriction release landmark and a following C2 constriction onset landmark in a C1VC2 sequence. This interval is, however, not entirely vocalic, as it includes the formant transitions and noise bursts that cue the identity of the surrounding consonants, in addition to any aspiration from preceding voiceless aspirated stops. For geminate consonants in the Arabic data, the temporal midpoint was taken as the end of the first portion of the sound and the beginning of the second.
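The two interval definitions above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the landmark times in the usage lines are hypothetical, and landmarks are assumed to be stored in seconds, as in typical label files.

```python
def vowel_duration_ms(c1_release_s, c2_onset_s):
    """Vowel duration in a C1VC2 sequence: from C1 release to C2 onset."""
    if c2_onset_s <= c1_release_s:
        raise ValueError("C2 onset must follow C1 release")
    return (c2_onset_s - c1_release_s) * 1000.0

def split_geminate(closure_onset_s, release_s):
    """Split a geminate at its temporal midpoint into two portions."""
    if release_s <= closure_onset_s:
        raise ValueError("release must follow closure onset")
    midpoint = (closure_onset_s + release_s) / 2.0
    return (closure_onset_s, midpoint), (midpoint, release_s)

# Hypothetical landmark times (in seconds) for a word like /ˈbeddel/.
duration = vowel_duration_ms(0.120, 0.205)
first_half, second_half = split_geminate(0.205, 0.365)
print(round(duration, 1), first_half, second_half)
```

Under this convention the first half of the geminate closes the first syllable and the second half opens the following one, so syllable durations on either side of the geminate share the same boundary.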

2.4. Data Analysis

Records of segmentation decisions were kept for each audio file in associated label files. Praat scripts were then used for automatic measurement of the dependent variables, mainly duration, F1, F2, and spectral balance. The scripts exported the measurements in a format that can be opened in spreadsheets. Measures of f₀ at the midpoint of the target vowel and of intensity at the peak of the target vowel were obtained by hand, but the results are not reported here because they need further refinement. Statistical analysis of the values obtained was performed in SPSS, version 15.
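As an illustration of the kind of post-processing such spreadsheet-style output allows, the sketch below reads tab-separated measurements and averages duration within each speaker, stress, and focus cell. The column names and the numeric values are invented placeholders; they are neither the study's data nor the actual output format of the Praat scripts.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Placeholder rows in an assumed tab-separated layout (values are invented).
SAMPLE_TSV = (
    "speaker\tstress\tfocus\tduration_ms\n"
    "S1\t+S\t+F\t200\n"
    "S1\t+S\t+F\t210\n"
    "S1\t+S\t+F\t190\n"
    "S1\t-S\t+F\t150\n"
    "S1\t-S\t+F\t160\n"
)

def cell_means(tsv_text):
    """Average duration over repetitions within each design cell."""
    cells = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        key = (row["speaker"], row["stress"], row["focus"])
        cells[key].append(float(row["duration_ms"]))
    return {key: mean(values) for key, values in cells.items()}

means = cell_means(SAMPLE_TSV)
for key, value in sorted(means.items()):
    print(key, value)
```

Collapsing repetitions into one mean per speaker and condition is one common way to prepare data for a by-subject analysis; whether the original analysis did this is not stated in the text.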

3. RESULTS

3.1. Experiment 1: The acoustic correlates of stress and accent in SBE

3.1.1. Duration

Duration measurements taken from the disyllabic test words were obtained for both initial and final stressed and unstressed syllables in the two focus conditions ([+F] and [−F]). The aim is to examine the effects that stress, focus, and syllable position (initial vs. final) have on duration in SBE, as well as the interactions among these factors.

The mean length differences between initial stressed and unstressed syllables in the [+F] context show that, when a pitch accent is realized on the word, the duration of stressed syllables in SBE exceeds that of unstressed syllables by almost 37 %. When no pitch accent is realized on the word, that is, in the [−F] context, a mean length difference of almost 50 ms is found between initial stressed and unstressed English syllables, representing a difference of about 23 %. To check the significance of the values obtained, the duration values of the initial syllable were subjected to a two-way repeated-measures ANOVA with stress [+S, −S] and focus [+F, −F] as fixed effects, repetition as a repeated measure, and subject as a random factor. The factor stress was significant (F (1, 5) = 64.3; p < .001), focus was also significant (F (1, 5) = 96.04; p < .001), and the interaction between stress and focus was non-significant. This shows that duration is a correlate of lexical stress in SBE, since stressed syllables are longer even when the target syllable is not under focus, and that duration is a correlate of accent (phrasal stress) as well, since focus has a significant lengthening effect of its own when the target word is placed under focus and receives the pitch accent.

The effect of phrasal stress on the duration of both stressed and unstressed syllables in both initial and final positions was then examined. The values obtained show a mean length difference of about 26 ms between initial stressed English syllables in the [+F] and [−F] contexts. Focus thus seems to increase the duration of the initial stressed syllable. To explore the interaction between syllable position and focus effects, the final stressed syllable of the disyllabic test words was also measured, and the way it is affected by focus was assessed. The duration values obtained show that all six speakers produced longer durations when they placed the word under focus. A mean length difference of almost 45 ms is found between the two focus conditions, representing a lengthening of 13 %. Focus therefore appears to lengthen final stressed syllables in SBE as well.

In order to check the statistical significance of the focus effect on the duration of stressed syllables and to assess any possible interaction between focus and syllable position for stressed syllables in English, a two-way ANOVA was used with focus and syllable position as fixed effects, repetition as a repeated measure, and subject as a random factor. As expected from the mean values obtained, the factor focus was highly significant (F (1, 5) = 61.78; p < .001) and syllable position was also highly significant (F (1, 5) = 56.18; p < .001), but the interaction between them was non-significant, which shows that focus increases the duration of the stressed syllable in SBE disyllabic words whether it is in initial or final position.

The effect of phrasal stress on the duration of unstressed syllables, both initial and final, was then examined to find out whether focus lengthens unstressed syllables in SBE. Comparing the mean duration of initial unstressed syllables in the [+F] and [−F] contexts, a small length difference of about 10 ms is found. For unstressed final syllables, the values show that all speakers produced longer durations when a pitch accent was used. The length difference between the focus contexts is about 26 ms. Focus thus seems to lengthen the final unstressed syllable in English. Since previous literature on English (Turk & Sawusch, 1997; Turk & White, 1999) leads us to expect an interaction between focus-related lengthening and syllable position for unstressed syllables, a two-way ANOVA with focus and syllable position as fixed effects, repetition as a repeated measure, and subject as a random factor was used for unstressed syllables. Results revealed that syllable position is highly significant (F (1, 5) = 88.05; p < .001). The factor focus is also significant (F (1, 5) = 14.47; p < .001). However, unlike for stressed syllables, a significant interaction is found between syllable position and focus for unstressed syllables (F (1, 5) = 10.72; p < .05). In SBE, focus thus seems to lengthen unstressed syllables only when they are final. Figure 1 illustrates these results.

Figure 1. Effects of focus on duration (in ms) of stressed and unstressed syllables in initial and final positions by six speakers of SBE.

3.1.2. Spectral Balance Results for SBE

Results for two measures of spectral balance H1−A3 and H1−A2 are presented for SBE. H1 and H2 refer to the amplitudes of the first and second harmonics, respectively, while A2 and A3 refer to the amplitudes of F2 (second formant) and F3 (third formant), respectively. Unstressed vowels are expected to have less energy at higher frequencies, hence higher values for H1−A2 and H1−A3. These two measures were investigated for both lexical stress positions (stressed and unstressed) and in both focus conditions for four vowels produced by 5 male native English speakers. The vowels from which the tilt measures are taken are all the monophthongs occurring in the first syllable of the minimal pairs test words. They include /ɒ/ in pairs like ˈcontract ~ conˈtract, /e/ in ˈrecord ~ reˈcord, /ɜ/ in ˈpermit ~ perˈmit, and /ʌ/ in ˈsubject ~ subˈject. The /ɒ/ in the pair ˈobject ~ obˈject was discarded from the vowel analysis because it occurs word-initially and was often glottalized by most of the informants and glottalization is known to independently affect spectral balance.
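The two measures can be sketched as simple differences between component amplitudes in dB. The example below is a minimal illustration under stated assumptions, not the measurement script used in the study: the spectrum is given as a list of (frequency, amplitude) pairs, the search window of half the fundamental frequency is an assumed choice, and all names and numeric values are hypothetical.

```python
def strongest_near(spectrum, target_hz, tol_hz):
    """Highest amplitude (dB) among components within tol_hz of target_hz."""
    candidates = [amp for freq, amp in spectrum
                  if abs(freq - target_hz) <= tol_hz]
    if not candidates:
        raise ValueError(f"no spectral component near {target_hz} Hz")
    return max(candidates)

def spectral_balance(spectrum, f0_hz, f2_hz, f3_hz):
    """Return H1-A2 and H1-A3 in dB for one vowel spectrum."""
    tol = f0_hz / 2  # assumed search window: half the harmonic spacing
    h1 = strongest_near(spectrum, f0_hz, tol)  # first harmonic
    a2 = strongest_near(spectrum, f2_hz, tol)  # strongest harmonic near F2
    a3 = strongest_near(spectrum, f3_hz, tol)  # strongest harmonic near F3
    return {"H1-A2": h1 - a2, "H1-A3": h1 - a3}

# Hypothetical harmonic amplitudes (Hz, dB) for a vowel with f0 = 120 Hz.
harmonics = [(120, 50), (240, 48), (1440, 38), (1560, 40),
             (2400, 25), (2520, 28)]
balance = spectral_balance(harmonics, f0_hz=120, f2_hz=1500, f3_hz=2500)
print(balance)
```

A larger value means less energy around the formant relative to the first harmonic, which is why unstressed vowels are expected to show higher H1−A2 and H1−A3.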

The effect of stress is first investigated on the H1−A3 measure of spectral balance in the presence of focus on the target word, i.e. in the [+F] condition. Table 1 presents the mean values and standard deviations of this tilt measure, which reflects glottal closure and the skewness of the glottal pulse, for the four English vowels in stressed and unstressed position; the [+F] columns are considered first.

Table 1: Mean values and standard deviations of H1−A3 (in dB) for four English vowels produced by five male speakers in the [+F] and [−F] contexts.

Vowel	Mean H1−A3 (dB), [+F] context			Mean H1−A3 (dB), [−F] context
	[−S, +F]	[+S, +F]	St. dev.	[−S, −F]	[+S, −F]	St. dev.
/ɜ/	16.50	8.50	2.33	26.15	20.30	0.66
/e/	25.10	17.95	1.89	29.85	23.90	1.45
/ɒ/	26.20	15.20	2.86	28.90	21.15	2.06
/ʌ/	26.40	17.20	1.93	30.30	20.90	2.33

As shown in Table 1, unstressed vowels had higher H1−A3 values, indicating greater high-frequency emphasis for stressed vowels. In the [−F] context, too, unstressed vowels have higher H1−A3 values. To check the significance of these values, a three-way fixed-effect ANOVA was used with vowel type, stress, and focus as fixed factors, repetition as a repeated measure, and speaker as a random factor. The factor vowel type is non-significant (F (3, 288) = .32; p > .05), which shows that H1−A3 does not differ significantly among these vowels. The effect of stress on this tilt measure is, however, highly significant (F (1, 288) = 54.42; p < .001).

Focus is also found to be highly significant (F (1, 288) = 74.33; p < .001). No significant interaction is found between vowel type and stress or between vowel type and focus. A tendency towards significance is found for the interaction between stress and focus (F (1, 288) = 5.33; p > .05 (p = .069)). The three-way interaction between vowel type, stress, and focus is non-significant.

The effect of stress was then investigated on the H1−A2 measure of spectral balance in the presence of focus on the target word (see Table 2).

Table 2: Mean values and standard deviations of H1−A2 (in dB) for four English vowels produced by five male speakers in the [+F] and [−F] contexts.

Vowel	Mean H1−A2 + 10 (dB) [1], [+F] context			Mean H1−A2 + 10 (dB), [−F] context
	[−S, +F]	[+S, +F]	St. dev.	[−S, −F]	[+S, −F]	St. dev.
/ɜ/	15.33	8.35	2.94	18.50	12.80	2.75
/e/	18.24	7.95	3.61	18.35	12.10	2.15
/ɒ/	18.98	9.45	3.05	21.70	14.85	2.25
/ʌ/	20.36	4.55	6.33	23.15	10.80	3.65

[1] Note that the H1−A2 values are offset by 10 dB to avoid negative values.

Table 2 presents mean values and standard deviations of H1−A2, as a measure of glottal closure and skewness of the glottal pulse, for the four English vowels in stressed and unstressed position. It is clear from the [+F] columns that unstressed vowels have higher H1−A2 values, indicating greater high-frequency emphasis for stressed vowels. Unstressed vowels in the [−F] context also had higher H1−A2 values than stressed vowels, almost twice as high in some cases and more than twice as high for /ʌ/.

Just like for the H1−A3 measure, a three-way analysis of variance is used to check the significance of the results and see whether there is any significant interaction between the factors vowel type, stress, and focus. The effect of vowel type is found to be non-significant, which shows that H1−A2 is not significantly different for these different vowels. This is probably because the vowels chosen are not very distant in the vocalic space. The effect of stress is highly significant (F (1, 288) = 164.57; p < .001). Results showed a significant effect of focus, too (F (1, 288) = 65.57; p < .001). No significant interaction is found between vowel type and stress or between vowel type and focus. A very significant interaction is however found between stress and focus (F (1, 288) = 13.40; p < .001). The significant interaction found between stress and focus in the H1−A2 measure shows that the magnitude of the stress effect depends on the focus value as there is a greater effect of stress on focused constituents. It also shows that the effect of focus depends on the stress value. It is more pronounced for stressed vowels. The three-way interaction between stress, focus and vowel type is, however, non-significant.

Summarizing the results of experiment 1 on spectral balance as a correlate of stress and/or accent in SBE, we find that both measures, H1−A2 and H1−A3, are reliable measures of spectral tilt. In both focus conditions, unstressed vowels have higher H1−A2 and H1−A3 values, indicating greater high-frequency emphasis for stressed vowels. The factors stress and focus both have significant effects on these measures of spectral balance, while the effect of vowel type is non-significant. Spectral balance can accordingly be considered a correlate of both lexical and phrasal stress in SBE.
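The direction of the stress by focus interaction can be illustrated descriptively from the cell means in Tables 1 and 2. The sketch below computes the stress effect (unstressed minus stressed) for each vowel and averages it within each focus condition; the variable and function names are illustrative, and this unweighted averaging of cell means is not a substitute for the ANOVA reported above. Because each value is a difference, the 10 dB offset in Table 2 cancels out.

```python
# Cell means copied from Tables 1 and 2, ordered as
# ([-S,+F], [+S,+F], [-S,-F], [+S,-F]); all values in dB.
H1_A3 = {
    "ɜ": (16.50, 8.50, 26.15, 20.30),
    "e": (25.10, 17.95, 29.85, 23.90),
    "ɒ": (26.20, 15.20, 28.90, 21.15),
    "ʌ": (26.40, 17.20, 30.30, 20.90),
}
H1_A2_PLUS_10 = {
    "ɜ": (15.33, 8.35, 18.50, 12.80),
    "e": (18.24, 7.95, 18.35, 12.10),
    "ɒ": (18.98, 9.45, 21.70, 14.85),
    "ʌ": (20.36, 4.55, 23.15, 10.80),
}

def mean_stress_effect(table):
    """Mean (unstressed - stressed) difference under [+F] and under [-F]."""
    focused = [u - s for (u, s, _, _) in table.values()]
    unfocused = [u - s for (_, _, u, s) in table.values()]
    return sum(focused) / len(focused), sum(unfocused) / len(unfocused)

a3_focused, a3_unfocused = mean_stress_effect(H1_A3)
a2_focused, a2_unfocused = mean_stress_effect(H1_A2_PLUS_10)
print(f"H1-A3 stress effect: [+F] {a3_focused:.2f} dB, [-F] {a3_unfocused:.2f} dB")
print(f"H1-A2 stress effect: [+F] {a2_focused:.2f} dB, [-F] {a2_unfocused:.2f} dB")
```

On these cell means the stress effect is larger under focus for both measures (roughly 8.8 vs. 7.2 dB for H1−A3 and 10.7 vs. 7.8 dB for H1−A2), which is in line with the significant interaction reported for H1−A2 and the weaker tendency reported for H1−A3.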

3.1.3. Vowel quality results for SBE

The data analyzed are those of the five male speakers only. The female speaker was excluded from the analysis because sex is known to affect vowel formant values, and a single female speaker does not make it possible to include sex as a variable in the analysis. The formant values (F1 and F2) were calculated at the midpoint of the vowel segment as marked in a Praat TextGrid. The procedure was run automatically through a script, and the values were also checked manually.

The vowels measured are those occurring in the disyllabic minimal-pair test words, except the diphthong in digest, which was excluded to allow a more homogeneous comparison among the monophthongs /ɒ/, /e/, /ɜ/, and /ʌ/.

Formant values F1 and F2 were measured in stressed and unstressed position in both focus conditions. The values obtained were compared and tested for significance through fixed-effect ANOVA tests. A three-way ANOVA was performed on each formant separately, with vowel type, stress, and focus as fixed effects, repetition as a repeated measure, and speaker as a random factor. The results for F1 show that vowel type and focus have no significant effect. By contrast, the effect of stress on F1 is highly significant (F (1, 240) = 44.02; p < .001). No significant two-way interaction is found between vowel and stress, between vowel and focus, or between stress and focus. The three-way interaction between stress, vowel, and focus is also non-significant. The F1 of these British English vowels thus seems to be affected by lexical stress only, and not by focus.

The results for F2 show that the effect of vowel type is highly significant (F (3, 240) = 32.34; p < .001). Neither focus nor stress had a significant main effect on F2. The effect of stress can, however, be seen in the significant interaction between vowel type and stress (F (3, 240) = 14.32; p < .001), which indicates that the magnitude of the stress effect on F2 depends on vowel type. The interaction between vowel and focus is non-significant, as is the interaction between stress and focus. The three-way interaction between stress, vowel, and focus is also non-significant. The results of experiment 1 on vowel quality as a correlate of stress and accent in SBE thus show that F1 in these British English vowels is affected by stress only and not by focus. Focus does not affect F2 either. Stress nevertheless seems to have an effect on F2, as shown by the significant interaction between vowel type and stress, which indicates that the stress effect on F2 depends on the type of vowel in SBE. Figures 2 and 3 show the effects of the variables stress, focus, and vowel on each formant separately.

Figure 2. Effects of vowel type, stress, and focus on F1 (in Hz) in four English vowels by five male speakers.

Figure 3. The effects of vowel type, stress, and focus on F2 (in Hz) in four English vowels.

3.2. Experiment 2: The acoustic correlates of stress and accent in TA

3.2.1. Duration

Duration measurements taken from the disyllabic TA test words were obtained for initial stressed and unstressed syllables as well as final stressed and unstressed syllables in the two focus conditions ([+F] and [−F]). This is again intended to examine the effects that stress, focus, and syllable position (initial vs. final) have on duration in TA as well as the possible interaction between these factors. It would also allow comparison between English and Tunisian Arabic.

The effects of lexical and phrasal stress on the duration of the initial syllable in TA were measured. The results show that when focus is placed on the target word, initial stressed syllables are on average 26 ms longer than unstressed syllables (a difference of about 22 %). This does not hold for all the subjects in the experiment, however, as one of them produced longer unstressed syllables in two of his repetitions. When the target word is not focused, by contrast, no noticeable difference between the length of stressed and unstressed syllables is observed in TA. Table 3 shows that two speakers (S2 and S4) produced longer unstressed syllables in the [−F] context.

Table 3: The mean duration (in ms) of stressed and unstressed TA syllables in the [−F] context.

Speakers	Initial stressed syllable [−F]	Initial unstressed syllable [−F]
S1	162	149
S2	192	195
S3	172	164
S4	182	185
S5	168	150
S6	174	172
Grand mean	174	168
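As a purely descriptive check on Table 3, the per-speaker differences can be recomputed from the rounded means. The sketch below is not the ANOVA reported in this study: it only runs a paired t statistic on the six per-speaker means and compares it with the two-tailed critical value of t for 5 degrees of freedom at the .05 level (2.571). The function names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Per-speaker mean durations (ms) in the [-F] context, copied from Table 3:
# speaker -> (initial stressed, initial unstressed)
TABLE_3 = {
    "S1": (162, 149),
    "S2": (192, 195),
    "S3": (172, 164),
    "S4": (182, 185),
    "S5": (168, 150),
    "S6": (174, 172),
}

def stress_differences(table):
    """Stressed-minus-unstressed duration (ms) for each speaker."""
    return {spk: stressed - unstressed for spk, (stressed, unstressed) in table.items()}

def paired_t(differences):
    """One-sample t statistic on per-speaker differences (df = n - 1)."""
    values = list(differences)
    return mean(values) / (stdev(values) / sqrt(len(values)))

diffs = stress_differences(TABLE_3)
reversed_speakers = [spk for spk, d in diffs.items() if d < 0]
t_value = paired_t(diffs.values())

# Two-tailed critical value of t for df = 5 at alpha = .05
T_CRIT_DF5 = 2.571

print(diffs)
print("Speakers with longer unstressed syllables:", reversed_speakers)
print("t(5) =", round(t_value, 2), "| exceeds critical value:", abs(t_value) > T_CRIT_DF5)
```

On these rounded means the t statistic stays below the critical value, which agrees with the observation that stressed and unstressed syllables do not differ noticeably in the [−F] context.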

To verify the significance of the values obtained, the duration values of the initial syllable were tested through a two-way repeated-measures ANOVA with stress and focus as fixed effects, repetition as a repeated measure, and subject as a random factor. The results show that the factor stress was non-significant (F (1, 5) = .06; p > .05), the factor focus was significant (F (1, 5) = 29.6; p < .001), and the interaction between stress and focus was also significant (F (1, 5) = 56.3; p < .001). In the absence of focus on the target word, therefore, duration does not appear to be a correlate of stress in TA. The significant length differences found in the [+F] context are better attributed to the pitch accent with which the target word is realized than to lexical stress. The significant interaction between stress and focus shows that duration distinguishes stressed from unstressed syllables only when the target word is placed under focus in Tunisian Arabic. Duration can therefore be considered a correlate of accent only, and not of lexical stress.
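The logic of a stress by focus interaction with 1 and 5 degrees of freedom can be sketched as follows. In a fully within-subject 2 × 2 design, each single-df effect can be tested on one contrast score per speaker, and the resulting F equals the squared one-sample t on those scores. This is a simplified illustration, not the model fitted in this study (which also included repetition as a repeated measure), and the cell means below are hypothetical values invented for the example.

```python
from statistics import mean, variance

def contrast_f(scores):
    """F(1, n - 1) for a single-df within-subject contrast (the squared one-sample t)."""
    n = len(scores)
    return n * mean(scores) ** 2 / variance(scores)

def two_by_two_within(cells):
    """cells maps speaker -> (S+F+, S-F+, S+F-, S-F-) mean durations in ms."""
    stress, focus, interaction = [], [], []
    for s_pf, u_pf, s_mf, u_mf in cells.values():
        stress.append((s_pf + s_mf) / 2 - (u_pf + u_mf) / 2)   # stressed vs. unstressed
        focus.append((s_pf + u_pf) / 2 - (s_mf + u_mf) / 2)    # [+F] vs. [-F]
        interaction.append((s_pf - u_pf) - (s_mf - u_mf))      # stress effect in [+F] minus in [-F]
    return {
        "stress": contrast_f(stress),
        "focus": contrast_f(focus),
        "stress x focus": contrast_f(interaction),
    }

# HYPOTHETICAL per-speaker cell means (ms); not data from this study.
HYPOTHETICAL_CELLS = {
    "A": (250, 222, 170, 168),
    "B": (262, 240, 180, 182),
    "C": (245, 215, 165, 160),
    "D": (258, 236, 176, 178),
    "E": (240, 216, 160, 150),
    "F": (255, 225, 172, 170),
}

results = two_by_two_within(HYPOTHETICAL_CELLS)
for effect, f_value in results.items():
    print(f"{effect}: F(1, {len(HYPOTHETICAL_CELLS) - 1}) = {f_value:.2f}")
```

The interaction contrast is the quantity of interest here: it is large when the stressed/unstressed difference appears mainly under focus, which is the pattern described above for TA.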

Measurements made on initial stressed syllables [+F] were compared to those of the same syllables [−F]. The results show that initial stressed syllables are much longer in the [+F] context than in the [−F] context, with a length difference of almost 80 ms (27 %). Although the final syllable is not the target for examining duration in different stress and focus conditions, its whole duration was measured to assess the effect of phrasal stress on the final syllable and to see whether there is any lengthening due to final position or to word boundary effects. The values obtained show a length difference of about 123 ms (34 %) between the two focus conditions. To check whether there is any interaction between syllable position (initial or final) and focus condition ([+F] or [−F]) for stressed syllables in TA, a two-way ANOVA was used, with focus and syllable position as fixed effects, speaker as a random factor, and repetition as a repeated measure. The results show that syllable position has a significant main effect on the duration of stressed syllables in TA (F (1, 5) = 89.1; p < .001). Focus is also found to be highly significant (F (1, 5) = 21.86; p < .001). The two-way interaction between focus and syllable position is significant as well (F (1, 5) = 7.94; p < .05), showing that the effect of focus on stressed syllables depends on their position. Final stressed syllables are more affected by focus (34 %) than initial stressed syllables (27 %).

The effect of focus on the duration of initial and final unstressed syllables was also assessed. The results revealed that initial unstressed syllables were consistently longer in the [+F] context than in the [−F] context for all the speakers. The average length difference between initial unstressed syllables [+F] and [−F] was about 50 ms, representing 34 % of lengthening. Focus thus seems to lengthen unstressed syllables in TA as it does in English. To check whether focus has an effect on the duration of the final syllable even when it is lexically unstressed, final unstressed syllables in the two focus conditions were compared; they differed by about 42 ms on average, that is, a 21 % length difference.

In order to check the significance of this length difference and to see whether there is any interaction between syllable position (initial or final) and focus condition ([+F] or [−F]) for these unstressed syllables in TA, a two-way ANOVA was used, with focus and syllable position as fixed effects, speaker as a random factor, and repetition as a repeated measure. The results revealed that position had no significant effect on the duration of unstressed syllables in TA (F (1, 5) = 2.26; p > .05). Focus is, nonetheless, found to be highly significant (F (1, 5) = 59.76; p < .001), whereas the two-way interaction between focus and syllable position is non-significant (F (1, 5) = 5.36; p > .05). This means that the focus effect on unstressed syllables does not depend on their position in the word. While a significant interaction is found between syllable position and focus for unstressed syllables in SBE, no such interaction is found in TA. It is worth recalling that a significant interaction is found between syllable position and focus for TA stressed syllables (see Figure 4).

Figure 4. The effect of focus on stressed and unstressed syllables in initial and final positions in TA.

3.2.2. Spectral Balance in TA

The vowels measured for spectral balance are the four vowels occurring in closed syllables in the Tunisian near minimal pairs used in the study, as produced by the six Tunisian speakers participating in the experiment. The vowel nuclei were measured in both stress and focus conditions; they include /ɑ/ as in ˈfɑkkɑr/fɑkˈkɑrt, /e/ as in ˈbeddel/bedˈdelt, /ɪ/ as in ˈkɪsbɪt/kɪsˈbuh, and /ʊ/ as in ˈmʊχʈɪr/mʊχˈʈɑr.

The effect of stress on H1−A3 in the [+F] and [−F] contexts was examined first.

Mean values of this measure, which reflects glottal closure and the skewness of the glottal pulse, were obtained for the four stressed and unstressed TA vowels in the [+F] context. Unstressed vowels in TA were found to have higher H1−A3 values, indicating a greater high-frequency emphasis for stressed vowels. A mean difference of about 9 dB was observed between the H1−A3 values of unstressed and stressed vowels in this [+F] condition. In the [−F] context, too, TA unstressed vowels have higher H1−A3 values than stressed vowels (see Table 4).

Table 4: Standard deviation and mean values of H1−A3 for four TA vowels by six speakers in [+F] and [−F] contexts.

	Mean H1−A3 (in dB)			Mean H1−A3 (in dB)
Vowels	[−S, +F]	[+S, +F]	St. deviation	[−S, −F]	[+S, −F]	St. deviation
/ɑ/	23.22	16.35	2.56	30.17	26.95	1.55
/e/	23.98	14.61	3.86	27.35	22.71	2.01
/ɪ/	24.85	17.38	2.89	27.34	21.51	2.88
/ʊ/	32.23	19.73	3.95	36.72	31.76	1.78
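The size of the stress effect on H1−A3 can be read directly off Table 4. The short sketch below subtracts the stressed from the unstressed mean for each vowel and averages the result per focus condition; the function name is an illustrative assumption, and the calculation is descriptive only, not a significance test.

```python
from statistics import mean

# Mean H1-A3 values (dB) copied from Table 4:
# vowel -> focus condition -> (unstressed, stressed)
TABLE_4 = {
    "ɑ": {"+F": (23.22, 16.35), "-F": (30.17, 26.95)},
    "e": {"+F": (23.98, 14.61), "-F": (27.35, 22.71)},
    "ɪ": {"+F": (24.85, 17.38), "-F": (27.34, 21.51)},
    "ʊ": {"+F": (32.23, 19.73), "-F": (36.72, 31.76)},
}

def stress_effect(table, focus):
    """Unstressed-minus-stressed H1-A3 (dB) per vowel in one focus condition."""
    return {
        vowel: round(cells[focus][0] - cells[focus][1], 2)
        for vowel, cells in table.items()
    }

plus_f = stress_effect(TABLE_4, "+F")
minus_f = stress_effect(TABLE_4, "-F")
mean_plus_f = mean(list(plus_f.values()))
mean_minus_f = mean(list(minus_f.values()))

print("[+F]:", plus_f, "mean =", round(mean_plus_f, 2))
print("[-F]:", minus_f, "mean =", round(mean_minus_f, 2))
```

The [+F] average comes out at roughly 9 dB, matching the mean difference reported in the text, while the [−F] average is smaller.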

To check the significance of these results, a three-way analysis of variance was used with vowel type, stress, and focus as fixed effects. Unlike in English, the effect of vowel type is found to be significant (F (3, 288) = 13.90; p < .001). The effect of stress is highly significant (F (1, 288) = 35.87; p < .001). There is also a significant effect of focus on this tilt measure for stressed and unstressed vowels (F (1, 288) = 15.98; p < .001). As far as interactions are concerned, the results of this test show no significant interaction between the independent variables: the interactions between vowel and stress, vowel and focus, and stress and focus, as well as the three-way interaction between vowel, stress, and focus, are all non-significant.

The effect of stress on H1−A2 in the [+F] and [−F] contexts was then checked. Mean values of this tilt measure were first obtained for the four stressed and unstressed TA vowels in the [+F] context. Tunisian Arabic unstressed vowels had much higher H1−A2 values in this context. TA unstressed vowels in the [−F] context also have higher H1−A2 values than stressed vowels. In order to check the significance of these results, a three-way ANOVA was used with vowel type, stress, and focus as fixed effects, repetition as a repeated measure, and speaker as a random factor. As with the H1−A3 measure, the effect of vowel type is found to be significant (F (3, 288) = 6.67; p < .001). The effect of stress is highly significant (F (1, 288) = 26.95; p < .001). There is also a significant effect of focus on this tilt measure (F (1, 288) = 14.036; p < .001). Concerning interactions between the independent variables, the results show no significant interaction between vowel and stress or between vowel and focus. There is, nevertheless, a significant interaction between stress and focus (F (1, 288) = 8.32; p < .05). The three-way interaction between vowel, stress, and focus is non-significant. As in English, the significant interaction between stress and focus for this H1−A2 measure means that the effect of stress is much larger in the presence of focus (see Table 5).

Table 5: Standard deviation and mean values of H1−A2 for four TA vowels by six speakers in [+F] and [−F] contexts.

	Mean H1−A2 (in dB)			Mean H1−A2 (in dB)
Vowels	[−S, +F]	[+S, +F]	St. deviation	[−S, −F]	[+S, −F]	St. deviation
/ɑ/	17.25	10.02	4.22	20.91	17.77	2.55
/e/	23.68	11.26	5.55	22.62	18.62	1.98
/ɪ/	28.05	16.93	6.79	30.24	26.20	2.33
/ʊ/	22.99	12.74	5.95	25.38	19.41	3.88
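The stress by focus interaction on H1−A2 can be illustrated descriptively as a difference of differences computed from the means in Table 5: the stress effect under focus minus the stress effect without focus. This is a sketch with an illustrative function name, not a replacement for the ANOVA reported above.

```python
from statistics import mean

# Mean H1-A2 values (dB) copied from Table 5:
# vowel -> ([-S, +F], [+S, +F], [-S, -F], [+S, -F])
TABLE_5 = {
    "ɑ": (17.25, 10.02, 20.91, 17.77),
    "e": (23.68, 11.26, 22.62, 18.62),
    "ɪ": (28.05, 16.93, 30.24, 26.20),
    "ʊ": (22.99, 12.74, 25.38, 19.41),
}

def interaction_contrast(table):
    """Mean stress effect (dB) in [+F], in [-F], and the difference between them."""
    effect_plus_f = mean([u_pf - s_pf for u_pf, s_pf, _, _ in table.values()])
    effect_minus_f = mean([u_mf - s_mf for _, _, u_mf, s_mf in table.values()])
    return effect_plus_f, effect_minus_f, effect_plus_f - effect_minus_f

effect_plus_f, effect_minus_f, contrast = interaction_contrast(TABLE_5)
print(f"Stress effect in [+F]: {effect_plus_f:.2f} dB")
print(f"Stress effect in [-F]: {effect_minus_f:.2f} dB")
print(f"Difference of differences: {contrast:.2f} dB")
```

A positive difference of differences corresponds to the pattern described in the text, namely a larger unstressed/stressed gap when the word is focused.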

3.2.3. Vowel quality results for TA

The effects of stress and accent on vowel quality in TA were assessed taking the variable sex into account, since the subjects were equally divided between the sexes (3 males and 3 females). Again, the effects of stress and focus were checked through ANOVA tests. The four vowels measured are those making up the nucleus of closed syllables in the near minimal pairs used as test items in this experiment. These vowels are /ɑ/ as in ˈfɑkkɑr/fɑkˈkɑrt, /e/ as in ˈbeddel/bedˈdelt, /ɪ/ as in ˈkɪsbɪt/kɪsˈbuh, and /ʊ/ as in ˈmʊχʈɪr/mʊχˈʈɑr.

The effects of lexical and phrasal stress on F1 in the [+F] and [−F] conditions were explored. The variations due to stress, focus, vowel type, sex of the speaker, or to the interaction between these factors were assessed through a four-way ANOVA with sex, vowel type, stress, and focus as fixed effects, repetition as a repeated measure, and speaker as a random factor. The results show that sex has no main effect, despite an observed disparity in reduction between males and females, especially in the [−F] condition. Vowel type, stress, and focus are all significant, with F (3, 288) = 28.70; p < .001 for vowel type, F (1, 288) = 63.58; p < .05 for stress, and F (1, 288) = 139.01; p < .001 for focus. No significant interaction is found either between stress and sex or between vowel and stress. A significant three-way interaction is, however, found between sex, stress, and vowel type (see Figure 5).

Figure 5. Effects of sex, stress, and vowel type on F1 (in Hz) in TA vowels.

This significant three-way interaction between sex, stress, and vowel type indicates that the effect of stress on F1 depends on the nature of the vowel and the sex of the speaker. It can be concluded from these results that the first formant of Arabic vowels is highly affected by focus, stress, and by vowel type as well, and that it can be used as a predictor of stress and focus in TA, especially for male speakers.

Results for F2 patterns under stress and focus are reported for the same four TA vowels as produced by the same male and female speakers. To check the significance of the values found, a four-way analysis of variance was used with sex, vowel type, stress, and focus as fixed effects, repetition as a repeated measure, and speaker as a random factor. The results show that the factor sex is highly significant (F (1, 288) = 75.45; p < .001), as is vowel type (F (3, 288) = 158.27; p < .001). Stress and focus are, however, non-significant. All interactions between the fixed effects are non-significant. It can be concluded from these results that only the type of the vowel and the sex of the speaker affect F2 in TA vowels; stress and focus do not cause significant changes to this formant. Plotting the vowels in the F1/F2 space for male and female speakers in the two focus conditions revealed that unstressed TA vowels are not as centralized as English unstressed vowels (see Figure 6). A more comprehensive discussion of the results of experiment 2 on the role of vowel quality in cueing stress and/or accent in TA is provided in the next section.

Figure 6. Vowel plot for TA stressed and unstressed vowels in [+F] and [−F] conditions.

4. DISCUSSION

4.1. Duration

In this study, duration is found to be a correlate of lexical stress in SBE, which is consistent both with the studies that distinguish between lexical stress and accent (Sluijter, 1995, for American English and Dutch) and with those where stress and accent are confounded (Fry, 1958; Lieberman, 1960; McClean & Tiffany, 1973). In TA, however, the results of experiment 2 show that duration is not used to signal lexical stress by Tunisian speakers. No significant difference is found between stressed and unstressed syllables in the absence of a pitch accent on the word. The results confirm Braham’s (1997) findings for MSA, which revealed no significant effect of lexical stress on the duration of short vowels, and those of Ben Slama (2002), who found no significant effect of stress on the duration of vowels in MSA in either normal or rapid speech. It should be noted, however, that neither Braham’s (1997) nor Ben Slama’s (2002) experiments controlled for focus effects, so we cannot know to which type of stress their results relate.

The lack of durational involvement in lexical stress observed in TA can be considered similar to the behavior of Japanese (Beckman, 1986; Mitsuya & Sugito, 1978, in Beckman, 1986), Turkish (Levi, 2005) and Welsh (Williams, 1985), where no significant durational differences between stressed and unstressed vowels and syllables could be found. In fact, TA seems to confirm Berinstein’s (1979) hypothesis that languages with phonemic length do not use duration as an acoustic correlate of stress (e.g., K’ekchi and Latvian; Bond, 1991). This implies that when a prosodic parameter is used to encode a certain contrast in the phonological system of a language, its importance as a stress cue may be diminished. Length is phonemic in TA, just as in MSA and most of the other dialects of Arabic. Any extra lengthening brought about by lexical stress could change the phonemic structure of a segment and therefore change meaning.

Concerning accent, the results show that duration does signal accent in both SBE and TA. When a pitch accent is realized on the word, significant length differences between stressed and unstressed syllables are found in the two languages. Although both groups of speakers show similar focus-related lengthening effects on stressed syllables, English but not Arabic speakers show an asymmetric lengthening pattern with respect to the effect of focus on unstressed syllables. In SBE, focus lengthens only final unstressed syllables, while in TA both initial and final unstressed syllables are affected by focus. The results of both experiments demonstrate that duration is a reliable acoustic correlate of phrasal stress. This finding is consistent with previous work on the acoustic correlates of stress in various languages of the world, which shows that duration is a very reliable and constant cue to stress and accent, especially in those languages that do not use length phonemically.

In TA, although duration is found not to be a correlate of lexical stress, stressed syllables placed under focus differed significantly from their unstressed counterparts in terms of length. Furthermore, focus is found to increase the duration of both stressed and unstressed syllables. The temporal expansion of accented items here is meant to highlight the word and draw the listener’s attention to it. It thus seems to have a linguistic, communicative function.

4.2. Spectral Balance

Spectral balance is a reliable correlate of both stress and accent in the languages explored. In SBE, both tilt measures (H1−A2 and H1−A3) are found to clearly distinguish between lexically stressed and unstressed syllables in the [−F] as well as in the [+F] condition. Similar results are found for TA. In fact, ANOVA tests show that in TA, just like in SBE, glottal pulses are more “sinusoidal” in unstressed vowels. The mid and high-frequency emphasis (shown through H1−A2 and H1−A3 values) is weaker for unstressed vowels. This is known in the literature to indicate gentler and slower vocal fold movements. Focus does affect the rate of the closure as the results show that both stressed and unstressed vowels of focused elements have more mid and high-frequency emphasis than their unfocused counterparts. These findings are similar to results reported for Dutch and American English by Sluijter (1995).

The results of this research on spectral balance reinforce its significance and strength in cueing stress and accent. Previous studies suggested that the importance of this cue overrides that of f₀ and overall intensity and that it equals duration in its strength in discriminating stressed from unstressed constituents (Sluijter, 1995; Sluijter & Van Heuven, 1996). It is also considered as such because it signals not only focal accent but also lesser degrees of accent (Sluijter & Van Heuven, 1996). In addition, this acoustic correlate of stress and accent was found to be significant in typologically different languages: English and Dutch (Sluijter, 1995), Polish, Macedonian and Bulgarian (Crosswhite, 2003), Russian (Gordieva et al., 2003), Maya (Remijsen, 2001), and Tunisian Arabic in the present study, which reinforces its strength in cueing stress and accent.

4.3. Vowel Quality

The results for vowel quality as a cue to stress and accent in the languages explored in the present study exhibit a lot of variation. In SBE, vowel quality is found to be a reliable correlate of lexical stress. The statistical tests show that stress affects both F1 and F2 values of British vowels. In TA, however, only the first formant of the vowels used in experiment 2 is affected by lexical stress and could be used to predict it; the second formant was not affected by stress. Regarding accent, the two languages behave the same way, as in neither language is vowel quality used as a correlate of accent. The statistical tests in the two experiments show that focus has no significant effect on either F1 or F2 of the vowels explored. This differs from Sluijter’s (1995) findings on the role of vowel quality in cueing accent in American English and Dutch, where focused constituents marked by a pitch accent had a fuller vowel quality than unfocused constituents. Stress and focus affect the first formant of TA vowels but not their second formant. Although these vowels undergo some changes due to stress and focus, the degree of F2 change under stress and focus differs from vowel to vowel and between male and female speakers. In fact, these vowels have not changed their front-back positions. Extreme cases of reduction, where vowels lose their quality and become schwa-like, are scarcely observed in this experiment, especially in the [+F] condition, that is, when a pitch accent is realized on the vowel. The results of experiment 2 suggest that only gradient vowel height is a correlate of stress in TA.

The type of change occurring in unstressed vowels in TA seems rather similar to what Harris (2004) referred to as “centrifugal” reduction, in which vowels are dispersed to the far corners of the vocalic space, as opposed to what the same author called “centripetal” reduction, where reduced reflexes are drawn into a central region of the vocalic space. Both centripetal and centrifugal reduction have the shared effect of diminishing the amount of phonetic information in the speech signal (Harris, 2004).

The nature of vowel reduction observed in TA in this experiment may also be due to the nature of the speech used: controlled speech, which is evidently different from spontaneous speech. More severe spectral changes are likely to occur in spontaneous speech. Further experiments on other types of speech (spontaneous or rapid) should be performed to gain more insight into the role of vowel quality in signaling stress and/or accent in TA.

Larger parts of this research also explored the roles of both f₀, measured at the midpoint of the target vowel, and intensity, measured at the peak of the target vowel, but these results are not reported here since they need further refinement. More accurate and precise measurements of these two parameters would certainly provide more insight into their roles as correlates of stress and/or accent in English and Tunisian Arabic.


REFERENCES

Baart, J.L.G. (1987). Focus, syntax, and accent placement: towards a rule system for the derivation of pitch accent patterns in Dutch. PhD Dissertation, Leiden University.
Beckman, M. (1986). Stress and non-stress accent. Dordrecht: Foris. https://doi.org/10.1515/9783110874020
Beckman, M., Edwards, J.R., & Fletcher, J. (1992). Prosodic structure and tempo in a sonority model of articulatory dynamics. In G. Docherty and D.R. Ladd (Eds.), Papers in Laboratory Phonology, 2. Cambridge: Cambridge University Press, 68-86. https://doi.org/10.1017/cbo9780511519918.004
Ben Slama, N. (2002). The factors influencing syllable structure in Standard Arabic. Unpublished MA Thesis, University of Carthage, Tunisia.
Berinstein, A. (1979). A cross-linguistic study on the perception and production of stress. Working Papers in Phonetics, 47. University of California, Los Angeles.
Braham, A.F. (1997). The temporal organization of speech in Arabic (a perceptual study). PhD Dissertation. Faculty of Letters, La Manouba, University of Tunis I.
Bond, D. (1991). Vowel and word duration in Latvian. Journal of Baltic Studies, 22, 133-144.
Crosswhite, K. (2003). Spectral tilt as a cue to word stress in Polish, Macedonian and Bulgarian. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona.
De Jong, K., & Zawaydeh, B. (1999). Focus, phonological focus, quantity, and voicing effects on vowel duration in Ammani Arabic. Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco.
Fry, D.B. (1958). Experiments in the perception of stress. Language and Speech, 1, 126-152.
Gordieva, O., Mennen, I., & Scobbie, J. (2003). Vowel duration and spectral balance in Scottish English and Russian. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona.
Harris, J. (2004). Vowel reduction as an information loss. In P. Carr, J. Durand, & C.J. Ewen (Eds.), Headhood, elements, specification, and contrastivity (pp. 121-132). Amsterdam: John Benjamins.
Ladd, D.R. (1980). The structure of intonational meaning: Evidence from English. Bloomington: Indiana University Press.
Lehiste, I. (1970). Suprasegmentals. Cambridge, MA: MIT Press.
Levi, S.V. (2005). Acoustic correlates of lexical accent in Turkish. Journal of the International Phonetic Association, 35(1), 73-97. https://doi.org/10.1017/S0025100305001921
Lieberman, P. (1960). Some acoustic correlates of word stress in American English. Journal of the Acoustical Society of America, 32, 451-454. https://doi.org/10.1121/1.1908095
McClean, M.D., & Tiffany, W.R. (1973). The acoustic parameters of stress in relation to the syllable position, speech loudness, and rate. Language and Speech, 16, 283-290.
Rajouani, A., Najim, A., Chiadmi, D., & Zyoute, M. (1987). Synthesis-by-rule of Arabic language. European Conference on Speech Technology. Edinburgh, Scotland.
Remijsen, B. (2001). Word-prosodic systems of Raja Ampat languages. Leiden: LOT Utrecht publishers.
Sluijter, A.M. (1995). Phonetic correlates of stress and accent. PhD Dissertation. University of Leiden.
Sluijter, A.M.C. & Van Heuven, V. (1996). Spectral balance as an acoustic correlate of linguistic stress. Journal of the Acoustical Society of America, 100(4), 2471-2485. https://doi.org/10.1121/1.417955
Sluijter, A.M.C., Shattuck-Hufnagel, S., Stevens, K.N., & Van Heuven, V.J. (1995). Supra-laryngeal resonance and glottal pulse shape as correlates of prosodic stress and accent in American English. Proceedings of the 13th International Congress of Phonetic Sciences, Stockholm, 630-633.
Turk, A. & Sawusch, J. R. (1997). The domain of accentual lengthening in American English. Journal of Phonetics, 25, 25-41. https://doi.org/10.1006/jpho.1996.0032
Turk, A. & White, L. (1999). Structural influences on accentual lengthening in English. Journal of Phonetics, 27, 171-206. https://doi.org/10.1006/jpho.1999.0093
Turk, A., Nakai, S. & Sugahara, M. (2006). Acoustic segment durations in prosodic research: A practical guide. In Sudhoff, S., D. Lenertová, R. Meyer, S. Pappert, P. Augurzky, I. Mleinek, N. Richter & J. Schließer (Eds.) Methods in Empirical Prosody Research. Berlin, New York: De Gruyter, 1-28. https://doi.org/10.1515/9783110914641.1
Williams, B. (1985). Pitch and duration in Welsh stress perception: the implications for intonation. Journal of Phonetics, 13, 381-401.

Typological variation in the phonetic realization of lexical and phrasal stress: Southern British English vs. Tunisian Arabic
