The study of how people perceive synthetic character emotion is important for two reasons. First, a greater understanding of how humans perceive these displays may lead to more streamlined designs for avatar and robotic emotional displays. Second, it allows researchers to address the more fundamental question of how humans integrate information to arrive at final emotional assignments.
This study was motivated by the well-known work of McGurk and MacDonald, who found that in the presence of conflicting syllabic audio-visual information, the combined percept may be a syllable different from that presented in either individual channel.
The question of perceptual integration in the presence of conflicting emotional cues has most commonly been studied using still photographs [deGelder 2000, Massaro 2000, deGelder 1999, Hietanen 2004] presented concurrently with emotional vocalizations. Participants were asked to identify the emotion expressed by either the combined presentation or a single channel (voice only or face only) using a discrete emotion category (e.g., happy vs. sad), also referred to as forced-choice analysis. The results showed that the facial emotion expression biased participants' emotional perception more strongly than the vocal emotion expression [deGelder 2000]. In another study, using film [Parke07], researchers combined emotional music with neutral video content. They modeled viewers' emotional perceptions using linear regression and found that the music accompanying a film clip had a stronger effect on perception than the clip's visual content.
The evaluations are performed using a web interface (see image below). Participants are randomly presented with a clip from one of three categories: video and audio, audio only, or video only. The clips are rated from 0 to 100 along the dimensions of valence, activation, and dominance (VAD). The rating scores are z-score normalized along all three dimensions to allow for comparisons between individuals.
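The normalization step above can be sketched as follows. This is a minimal illustration, not the study's actual analysis code: the function name `z_normalize` and the rating values are hypothetical, and it assumes z-scores are computed per participant within each VAD dimension.

```python
import statistics

def z_normalize(scores):
    """Convert raw 0-100 ratings to z-scores: (x - mean) / std.

    Uses the population standard deviation; the original analysis
    may have used the sample standard deviation instead.
    """
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]

# One participant's ratings across several clips
# (illustrative numbers, not real data).
ratings = {
    "valence":    [20, 55, 90, 35],
    "activation": [60, 60, 80, 40],
    "dominance":  [10, 50, 50, 90],
}

# Normalize each dimension separately for this participant, so that
# ratings can be compared across individuals with different scale usage.
normalized = {dim: z_normalize(vals) for dim, vals in ratings.items()}
```

After normalization, each dimension has mean 0 and unit standard deviation for that participant, which removes individual differences in how people use the 0 to 100 scale.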
This work is supported by the National Science Foundation's Graduate Research Fellowship Program, the Herbert Kunzel Engineering Fellowship, and the Intel Foundation Fellowship.