Barkhuysen, P., Krahmer, E., & Swerts, M. (2008). The interplay between the auditory and visual modality for end-of-utterance detection. The Journal of the Acoustical Society of America, 123(1), 354–365.
This study investigated two research questions: first, which modalities (auditory, visual, or the combination of both) help listeners detect the finality of utterances, and second, how sensitive native listeners are to the signals these modalities offer. Two perception experiments were designed to answer these questions. The first was a reaction time experiment. Answers from eight native Dutch speakers to an interview-style elicitation were selected as stimuli, in fragments either five or three words long. The recordings were then edited into three modality versions: auditory-only (AO), visual-only (VO), and audio-visual (AV). Each participant received all stimuli, with the modalities presented in separate blocks, and reaction times were measured. Results showed that end-of-utterance detection was easier for the longer (five-word) fragments, especially in the VO condition. More importantly, participants responded more quickly in the AV and AO conditions than in the VO condition. The second experiment, a classification task, was then conducted to clarify which factors contribute to the judgment. Stimuli were two-word and one-word targets chosen from the same eight speakers’ recordings. This time, finality was added as a within-group factor, and modality became a between-group factor. Results showed that participants assigned to the bimodal (AV) condition achieved the highest accuracy, and that longer stimuli again evoked more correct answers. Based on these findings, the authors concluded that participants performed better when provided with bimodal cues, and that when finality cues were not available, they needed longer fragments to make a decision.