Human Perception of Synthetic Character Emotion

Overview

The study of the perception of synthetic character emotion is important for two reasons. First, a greater understanding of how humans perceive synthetic emotional displays may lead to more streamlined designs for avatar and robotic emotional displays. Second, it allows researchers to address the more fundamental question of how humans integrate information to arrive at a final emotional assignment.

Background

This study was motivated by the well-known work of McGurk and MacDonald, who found that when conflicting syllabic audio-visual information is presented, the combined percept may be a syllable different from the one presented in either individual channel.

The question of perceptual integration in the presence of conflicting emotional cues has most commonly been studied by presenting still photographs [deGelder 2000, Massaro 2000, deGelder 1999, Hietanen 2004] together with emotional vocalizations. Participants were then asked to assign the combined or single-channel (voice-only or face-only) stimulus to a discrete emotion category (e.g., happy vs. sad), a procedure also referred to as forced-choice analysis. The results showed that the facial emotion expression biased participants' emotional perception more strongly than the vocal emotion expression [deGelder 2000]. In another study, using film [Parke07], researchers combined emotional music with neutral video content. They modeled viewers' emotional perceptions using linear regression and found that the music accompanying a film clip had a stronger effect on perception than the clip's visual content.

Approach

The evaluations are performed using a web interface (see image below). Each participant is randomly presented with a clip from one of three conditions: video and audio, audio only, or video only. The clips are rated from 0 to 100 along the dimensions of valence, activation, and dominance (VAD). The ratings are z-score normalized along all three dimensions to allow for comparisons between individuals.
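As a rough illustration only (not the project's actual analysis code), the normalization step might look like the sketch below. The table layout, column names, and the assumption that z-scoring is done within each participant are illustrative choices, not details taken from the study.

```python
import pandas as pd

# Hypothetical ratings table: one row per (participant, clip) with raw
# 0-100 scores on the valence/activation/dominance (VAD) dimensions.
ratings = pd.DataFrame({
    "participant": ["p1", "p1", "p1", "p2", "p2", "p2"],
    "clip":        ["c1", "c2", "c3", "c1", "c2", "c3"],
    "condition":   ["audio+video", "audio", "video"] * 2,
    "valence":     [70, 20, 55, 90, 35, 60],
    "activation":  [60, 30, 50, 80, 40, 55],
    "dominance":   [50, 25, 45, 75, 30, 50],
})

VAD = ["valence", "activation", "dominance"]

# Z-score each dimension within each participant (assumed grouping), so that
# ratings from individuals who use the 0-100 scale differently become comparable.
ratings[VAD] = ratings.groupby("participant")[VAD].transform(
    lambda x: (x - x.mean()) / x.std(ddof=0)
)

print(ratings)
```

Normalizing within each rater removes individual differences in how the 0-100 scale is used before the audio, video, and combined conditions are compared.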

Videos

Images

Experimental Setup
Four Faces

Publications
Support

This work is supported by the National Science Foundation's Graduate Research Fellowship Program, the Herbert Kunzel Engineering Fellowship, and the Intel Foundation Fellowship.

Contact

Emily Mower: mower at usc dot edu
Emily Mower's Personal Page