Toward a Unified Theory of Audiovisual Integration in Speech Perception
Auditory and visual speech recognition unfolds in real time and occurs effortlessly for normal hearing listeners. However, model theoretic descriptions of the systems level cognitive processes responsible for integrating auditory and visual speech information are currently lacking, primarily because they rely too heavily on accuracy rather than reaction time predictions. Speech and language researchers have argued about whether audiovisual integration occurs in a parallel or in coactive fashion, and also the extent to which audiovisual occurs in an efficient manner. The Double Factorial Paradigm introduced in Section 1 is an experimental paradigm that is equipped to address dynamical processing issues related to architecture (parallel vs. coactive processing) as well as efficiency (capacity). Experiment 1 employed a simple word discrimination task to assess both architecture and capacity in high accuracy settings. Experiments 2 and 3 assessed these same issues using auditory and visual distractors in Divided Attention and Focused Attention tasks respectively. Experiment 4 investigated audiovisual integration efficiency across different auditory signal-to-noise ratios. The results can be summarized as follows: Integration typically occurs in parallel with an efficient stopping rule, integration occurs automatically in both focused and divided attention versions of the task, and audiovisual integration is only efficient (in the time domain) when the clarity of the auditory signal is relatively poor--although considerable individual differences were observed. In Section 3, these results were captured within the milieu of parallel linear dynamic processing models with cross channel interactions. Finally, in Section 4, I discussed broader implications for this research, including applications for clinical research and neural-biological models of audiovisual convergence.