Affect Recognition with AutoTutor
The challenge in this project is to develop a classifier that distinguishes affective states such as boredom, flow, and frustration from face videos of 28 participants interacting with an intelligent tutoring system (AutoTutor). AutoTutor appears as an animated agent that acts as a dialog partner with the learner (Graesser, Chipman, Haynes, & Olney, 2005). The animated agent delivers AutoTutor's dialog moves with synthesized speech, intonation, facial expressions, and gestures. This project is an NSF-funded collaboration between the University of Memphis and the MIT Media Lab. Please see D'Mello et al. (2005) for a brief overview of the project.
On this web page you will find the following:
• Description of affective states
• Suggestions for face detection and feature extraction
• Suggestions for classification
• Link to video clips and labels
• Link to affective-DBN Matlab code
The videos are labeled with one of the following affective states:
• Boredom - being weary or restless through lack of interest
• Confusion - a noticeable lack of understanding
• Delight - a high degree of satisfaction
• Flow - a state of interest that results from involvement in an activity
• Frustration - dissatisfaction or annoyance
• Neutral - no apparent emotion or feeling
• Surprise - wonder or amazement, especially from the unexpected
A study with 28 undergraduate students was conducted in two sessions. The first session consisted of a pretest, interaction with AutoTutor, a posttest, and judgments of the emotions the students experienced while interacting with AutoTutor (self judgments). The second session consisted of judgments of the emotions of a peer's interaction with AutoTutor (peer judgments). Additionally, two judges trained on the Facial Action Coding System (Ekman & Friesen, 1978) judged all of the sessions separately.
The rating sessions proceeded by displaying video streams of the learner's screen and of the face captured during the AutoTutor session. Judges were instructed to judge which affective states were present at each 20-second interval at which the video automatically paused (mandatory judgments), as well as any affective states they observed in between the 20-second stops (voluntary judgments).
Interrater reliabilities (Cohen's kappa) computed across the six rater pairs were: self-peer (.08), self-trained judge1 (.14), self-trained judge2 (.16), peer-trained judge1 (.14), peer-trained judge2 (.18), and trained judge1-trained judge2 (.36). The trained judges clearly had the highest agreement, as confirmed by statistical analyses. Additionally, agreement was higher at the voluntary points than at the mandatory points.
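For reference, Cohen's kappa measures agreement between two raters beyond what would be expected by chance (values near 0 indicate chance-level agreement, 1 indicates perfect agreement). A minimal sketch of computing such a score with scikit-learn's cohen_kappa_score is shown below; the two rater label sequences are made-up illustrations, not data from this study.

    from sklearn.metrics import cohen_kappa_score

    # Made-up affect labels from two hypothetical raters on the same ten observations.
    judge1 = ["boredom", "flow", "confusion", "neutral", "flow",
              "frustration", "neutral", "boredom", "confusion", "neutral"]
    judge2 = ["boredom", "neutral", "confusion", "neutral", "flow",
              "frustration", "flow", "boredom", "delight", "neutral"]
    print(cohen_kappa_score(judge1, judge2))  # ~0 = chance agreement, 1.0 = perfect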
Because the trained judges showed the highest agreement, the labels in the accompanying Excel file are based on voluntary points where the trained judges agreed on the learner's emotion. Additionally, the emotions were sampled so as to obtain an approximately consistent distribution among participants. The samples include only boredom, confusion, delight, frustration, and neutral, because surprise rarely occurred and flow occurred mainly during the mandatory sampling. A description of the study can be found in Graesser et al. (2006).
The first step in any facial analysis system is face detection. We recommend using Intel's open-source computer vision library, OpenCV, to find the face in the videos. The function is called facedetect and uses Haar-like features to find the face; it returns the coordinates of the located face(s).
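As a rough sketch (not part of the original toolset), the same Haar-cascade approach is available through OpenCV's Python bindings; the video filename below is a placeholder, and the frontal-face cascade bundled with the opencv-python package is assumed.

    import cv2

    # Load a pretrained frontal-face Haar cascade shipped with opencv-python.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    video = cv2.VideoCapture("sample_clip.avi")  # placeholder clip name
    while True:
        ok, frame = video.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Returns (x, y, w, h) rectangles for detected faces in this frame.
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:
            face_region = gray[y:y + h, x:x + w]  # crop to feed feature extraction
    video.release()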
The face location serves as your starting point for feature extraction. There are many approaches to feature extraction, including image-based methods (e.g., the whole face region, Gabor wavelets) and appearance-based methods (e.g., shape deformation of facial features).
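For instance, one image-based option is a small Gabor filter bank applied to the cropped face region, pooled into a fixed-length feature vector. The sketch below uses illustrative filter parameters that are not tuned for this data.

    import cv2
    import numpy as np

    def gabor_features(face_gray, wavelengths=(4, 8), orientations=4):
        """Mean and standard deviation of each Gabor filter response (illustrative)."""
        face = cv2.resize(face_gray, (48, 48)).astype(np.float32)
        feats = []
        for lambd in wavelengths:                 # wavelength of the sinusoid
            for k in range(orientations):         # filter orientation
                theta = k * np.pi / orientations
                kernel = cv2.getGaborKernel((15, 15), sigma=3.0, theta=theta,
                                            lambd=lambd, gamma=0.5, psi=0)
                response = cv2.filter2D(face, cv2.CV_32F, kernel)
                feats.extend([response.mean(), response.std()])
        return np.array(feats)  # 2 * len(wavelengths) * orientations values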
Using a supervised learning paradigm, pick a classification mechanism such as support vector machines, hidden Markov models, or Bayesian networks. In making your choice, consider factors such as: (1) is the classifier discriminative or generative? (2) is it probabilistic? (3) does it take temporal information into account? (4) is the model hierarchical?
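As a minimal supervised-learning sketch, the sort of feature vectors described above could be fed to an SVM (a discriminative, non-temporal choice) via scikit-learn; the random arrays below are stand-ins for real per-clip features and labels.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def train_affect_classifier(X, y):
        """Fit an RBF-kernel SVM on one feature vector per clip with affect labels y."""
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
        clf.fit(X, y)
        return clf

    # Toy usage: random features stand in for real Gabor/shape features.
    X = np.random.rand(50, 32)
    y = np.random.choice(["boredom", "confusion", "delight", "frustration", "neutral"], 50)
    model = train_affect_classifier(X, y)
    print(model.predict(X[:5]))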
How many classifiers have
you used to recognize the set of affective states? How did you make that
decision?
What feature extraction approach did you choose? What factors did you consider in that decision (speed, accuracy, minimal manual preprocessing)?
What features were the
most discriminative? Is this different for each affective state?
What type of evaluation procedure did you follow (cross-validation?)? Were the tests person-dependent or person-independent, and what is the difference in accuracy between the two? (See the sketch after these questions for one person-independent setup.)
Which factors confused the classifier(s) (e.g., lighting changes, head movement, face occlusion)?
Which affective state
yields the highest classification rate? Which affective state is the hardest to
detect? Why?
What is the confusion
matrix for these states?
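One way to make the person-independent evaluation concrete is leave-one-participant-out cross-validation, which also produces the confusion matrix asked about above. The sketch below uses random stand-in features and labels; the per-clip participant IDs would come from the accompanying labels file.

    import numpy as np
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    labels = ["boredom", "confusion", "delight", "frustration", "neutral"]

    # Stand-in data: one feature vector, label, and participant ID per clip.
    X = np.random.rand(140, 32)
    y = np.random.choice(labels, 140)
    participants = np.repeat(np.arange(28), 5)  # 28 participants, as in the study

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    # Person-independent: each fold holds out all clips from one participant.
    preds = cross_val_predict(clf, X, y, groups=participants, cv=LeaveOneGroupOut())
    print(confusion_matrix(y, preds, labels=labels))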
To download the video clips (about 3 seconds each) and accompanying labels:
1. Use ftp to connect to 141.225.14.27.
2. User: anonymous
3. Password: your email address
4. "cd pub"
5. "get Mem-Exp107-AffectSamples.tar.gz"
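Alternatively, the download can be scripted; the following is just a convenience sketch using Python's standard ftplib (substitute your own email address as the anonymous password).

    from ftplib import FTP

    # Anonymous FTP download of the sample archive.
    ftp = FTP("141.225.14.27")
    ftp.login(user="anonymous", passwd="your.email@example.com")  # your email as password
    ftp.cwd("pub")
    with open("Mem-Exp107-AffectSamples.tar.gz", "wb") as f:
        ftp.retrbinary("RETR Mem-Exp107-AffectSamples.tar.gz", f.write)
    ftp.quit()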
Please
note that we do not have permission from most of the participants to display
these clips at conferences or publish them in journal articles. If you would
like to do so, please let us know so we can locate the signed release forms for
the few participants who gave us permission.
Download the affective-DBN Matlab code (by Rana el Kaliouby) here. The code uses Kevin Murphy's BNT.
For more information, please email Rana el Kaliouby (Kaliouby@media.mit.edu) or Sidney D'Mello (sdmello@memphis.edu).
1. D'Mello, S. K., Craig, S. D., Gholson, B., Franklin, S., Picard, R., & Graesser, A. C. (2005). Integrating affect sensors in an intelligent tutoring system. In Affective Interactions: The Computer in the Affective Loop, Workshop at the 2005 International Conference on Intelligent User Interfaces (pp. 7-13). New York: ACM Press. http://affect.media.mit.edu/pdfs/05.dmello-etal.pdf
2. Ekman, P., & Friesen, W. V. (1978). The facial action coding system: A technique for the measurement of facial movement. Palo Alto: Consulting Psychologists Press.
3. Graesser, A. C., Chipman, P., Haynes, B. C., & Olney, A. (2005). AutoTutor: An intelligent tutoring system with mixed-initiative dialogue. IEEE Transactions on Education, 48, 612-618. http://emotion.autotutor.org/files/graesser-autotutor-ieee-05.pdf
4. Graesser, A. C., McDaniel, B., Chipman, P., Witherspoon, A., D'Mello, S., & Gholson, B. (2006). Detection of emotions during learning with AutoTutor. In Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 285-290). Cognitive Science Society. http://emotion.autotutor.org/files/graesser-emotions-cogsci06.pdf