Affect Recognition with AutoTutor

The challenge in this project is to develop a classifier that distinguishes affective states such as boredom, flow, and frustration in face videos of 28 participants interacting with an intelligent tutoring system, AutoTutor. AutoTutor appears as an animated agent that acts as a dialog partner with the learner (Graesser, Chipman, Haynes, & Olney, 2005). The agent delivers AutoTutor's dialog moves with synthesized speech, intonation, facial expressions, and gestures. This project is an NSF-funded collaboration between the University of Memphis and the MIT Media Lab. See D'Mello et al. (2005) for a brief overview of the project.

On this web page you will find the following:

• Description of affective states
• Suggestions for face detection and feature extraction
• Suggestions for classification
• Issues to think about
• Link to video clips and labels
• Link to affective-DBN MATLAB code

Description of affective states

The videos are labeled with one of the following affective states:

• Boredom - being weary or restless through lack of interest
• Confusion - a noticeable lack of understanding
• Delight - a high degree of satisfaction
• Flow - a state of interest that results from involvement in an activity
• Frustration - dissatisfaction or annoyance
• Neutral - no apparent emotion or feeling
• Surprise - wonder or amazement, especially from the unexpected


The study involved 28 undergraduate students and was divided into two sessions. The first session consisted of a pretest, an interaction with AutoTutor, a posttest, and judgments of the emotions the learners experienced while interacting with AutoTutor (self judgments). In the second session, each learner judged the emotions of a peer during that peer's interaction with AutoTutor (peer judgments). In addition, two judges trained on the Facial Action Coding System (Ekman & Friesen, 1978) independently judged all of the sessions.


During the rating sessions, judges watched two video streams captured during the AutoTutor session: the learner's screen and the learner's face. Judges were instructed to indicate which affective states were present at each 20-second interval, when the video paused automatically (mandatory judgments), and to report any affective states they observed between these stops (voluntary judgments).



Interrater reliabilities (Cohen's kappa) computed across the six rater pairs were: self-peer (.08), self-trained judge 1 (.14), self-trained judge 2 (.16), peer-trained judge 1 (.14), peer-trained judge 2 (.18), and trained judge 1-trained judge 2 (.36). The two trained judges had the highest agreement, which statistical analyses confirmed. Agreement was also higher for voluntary judgments than for mandatory ones.
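For readers running their own agreement checks, Cohen's kappa compares the observed agreement p_o between two raters with the agreement p_e expected by chance from each rater's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal Python sketch; the function name and the six example labels are made up for illustration and are not drawn from the study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of the raters' marginal rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two judges for the same six video segments.
judge1 = ["boredom", "confusion", "neutral", "confusion", "frustration", "neutral"]
judge2 = ["boredom", "neutral", "neutral", "confusion", "confusion", "neutral"]
print(round(cohens_kappa(judge1, judge2), 3))  # prints 0.52
```

With these made-up labels the judges agree on 4 of 6 segments (p_o = 24/36) and chance agreement is p_e = 11/36, so kappa = (13/36) / (25/36) = 0.52.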

The labels in the accompanying Excel file are therefore based on voluntary judgment points at which the two trained judges agreed on the learner's emotion. The samples were also drawn so that their distribution is approximately even across participants. Only boredom, confusion, delight, frustration, and neutral are included: surprise occurred rarely, and flow was observed mainly at the mandatory sampling points. The study is described in Graesser et al. (2006).
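The selection rule above can be sketched as a filter followed by a per-participant cap. This is a hypothetical illustration: the Judgment record layout, the max_per_participant cap, and random subsampling are assumptions, since this page does not describe how the Excel file is organized or exactly how the balancing was done.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical record layout; the real Excel file may use different columns.
@dataclass(frozen=True)
class Judgment:
    participant: str
    time_s: float   # position in the session video, in seconds
    kind: str       # "voluntary" or "mandatory"
    judge1: str     # label from trained judge 1
    judge2: str     # label from trained judge 2


KEPT_STATES = {"boredom", "confusion", "delight", "frustration", "neutral"}


def select_samples(records: Iterable[Judgment], max_per_participant: int, seed: int = 0) -> List[Judgment]:
    """Keep voluntary points where both trained judges agreed, capped per participant."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_participant = defaultdict(list)
    for r in records:
        if r.kind == "voluntary" and r.judge1 == r.judge2 and r.judge1 in KEPT_STATES:
            by_participant[r.participant].append(r)
    selected: List[Judgment] = []
    for participant in sorted(by_participant):
        items = by_participant[participant]
        # Randomly subsample participants with many agreed points so none dominates.
        if len(items) > max_per_participant:
            items = rng.sample(items, max_per_participant)
        selected.extend(sorted(items, key=lambda j: j.time_s))
    return selected
```

Capping rather than upsampling keeps every selected clip a genuine judge-agreed sample; whether the original labels were balanced this way is not stated here.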

Suggestions for face detection and feature extraction

The first step in any facial analysis system is face detection. We recommend using OpenCV, Intel's open-source computer vision library, to find the face in the videos. The facedetect example that ships with OpenCV uses a cascade of Haar-like features to locate faces and returns the coordinates (bounding box) of each detected face.
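A minimal sketch using OpenCV's Python bindings (the opencv-python package), which expose the same Haar-cascade detector through cv2.CascadeClassifier and detectMultiScale. The cascade file, the parameter values (scaleFactor, minNeighbors, minSize), and the choice to keep only the largest box per frame (assuming one learner per clip) are assumptions to tune, not settings from the project.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def largest_face(faces: Sequence[Box]) -> Optional[Box]:
    """Return the box with the largest area, or None when nothing was detected."""
    return max(faces, key=lambda b: b[2] * b[3], default=None)


def load_face_cascade():
    import cv2  # imported lazily; requires the opencv-python package
    # Frontal-face Haar cascade bundled with the opencv-python wheels.
    return cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def detect_faces_in_video(path: str, step: int = 1) -> List[Tuple[int, Optional[Box]]]:
    """Run the cascade on every `step`-th frame; return (frame index, largest box)."""
    import cv2
    cascade = load_face_cascade()
    capture = cv2.VideoCapture(path)
    results = []
    index = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of clip or unreadable frame
                break
            if index % step == 0:
                # Haar cascades work on grayscale; equalization reduces lighting effects.
                gray = cv2.equalizeHist(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
                faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(40, 40))
                boxes = [tuple(int(v) for v in f) for f in faces]
                results.append((index, largest_face(boxes)))
            index += 1
    finally:
        capture.release()
    return results
```

Frames where the detector returns None (for example, during strong head turns or occlusion) are worth logging, since they bear directly on the question of which factors confuse the classifier.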


The face location is the starting point for feature extraction. There are many approaches to feature extraction, including image- or appearance-based methods (e.g., the whole face region, Gabor wavelets) and geometric, feature-based methods (e.g., the shape deformation of facial features).
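As an illustration of one appearance-based option, the real part of a 2-D Gabor filter is a Gaussian envelope multiplied by a cosine carrier along orientation theta: g(x, y) = exp(-(x'^2 + gamma^2 y'^2) / (2 sigma^2)) cos(2 pi x' / lambda + psi), with x' = x cos(theta) + y sin(theta) and y' = -x sin(theta) + y cos(theta). The pure-Python sketch below builds such kernels; the kernel size, the bank of orientations and wavelengths, and the parameter values are illustrative assumptions. In practice a library routine such as cv2.getGaborKernel and a vectorized convolution would be much faster.

```python
import math
from typing import List

Kernel = List[List[float]]


def gabor_kernel(size: int, sigma: float, theta: float, wavelength: float,
                 gamma: float = 0.5, psi: float = 0.0) -> Kernel:
    """Real part of a 2-D Gabor filter as a size x size list of rows."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along orientation theta.
            x_r = x * math.cos(theta) + y * math.sin(theta)
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x_r / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_response(patch: Kernel, kernel: Kernel) -> float:
    """Inner product of an equally sized grayscale patch with the kernel."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))


# A small illustrative bank: 4 orientations x 2 wavelengths.
bank = [gabor_kernel(9, sigma=2.0, theta=t * math.pi / 4, wavelength=w)
        for t in range(4) for w in (4.0, 8.0)]
```

Responses of the face region (or of patches around the eyes, brows, and mouth) to every kernel in the bank can be concatenated into one feature vector for the classifier.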

Suggestions for classification

Using a supervised learning paradigm, choose a classification method such as support vector machines, hidden Markov models, or Bayesian networks. In making your choice, consider questions such as: (1) Is the classifier discriminative or generative? (2) Is it probabilistic? (3) Does it take temporal information into account? (4) Is the model hierarchical?
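To make the generative, probabilistic, temporal option concrete, one common pattern is to train one hidden Markov model per affective state and label a clip with the state whose model gives the highest likelihood. The sketch below implements only the scoring step (the scaled forward algorithm) for discrete observations. It assumes facial features have already been quantized into integer symbols, and the two models and their probabilities are invented for illustration. Training (e.g., Baum-Welch) is not shown, and this is not the approach used in the affective-DBN code.

```python
import math
from typing import Dict, List, Sequence


class DiscreteHMM:
    """Hidden Markov model over integer observation symbols 0..M-1."""

    def __init__(self, start: List[float], trans: List[List[float]], emit: List[List[float]]):
        self.start = start  # start[i] = P(first hidden state is i)
        self.trans = trans  # trans[i][j] = P(next hidden state j | current state i)
        self.emit = emit    # emit[i][o] = P(observe symbol o | hidden state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """log P(obs | model), computed with the scaled forward algorithm."""
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        log_prob = 0.0
        for t in range(len(obs)):
            if t > 0:
                # Propagate one step through the transitions, then weight by emission.
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][obs[t]]
                    for j in range(n)
                ]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # the sequence is impossible under this model
            log_prob += math.log(scale)
            alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
        return log_prob


def classify(models: Dict[str, DiscreteHMM], obs: Sequence[int]) -> str:
    """Pick the affective state whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(obs))


# Invented two-state models over symbols 0 ("little facial movement") and 1 ("much movement").
sticky = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "boredom": DiscreteHMM([0.5, 0.5], sticky, [[0.9, 0.1], [0.8, 0.2]]),
    "frustration": DiscreteHMM([0.5, 0.5], sticky, [[0.2, 0.8], [0.1, 0.9]]),
}
print(classify(models, [0, 0, 0, 1, 0]))  # prints boredom
```

A discriminative alternative such as an SVM would instead learn a decision boundary directly from per-clip feature vectors, trading the temporal model for typically stronger static discrimination.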

Issues to think about

How many classifiers did you use to recognize the set of affective states? How did you make that decision?

What feature extraction approach did you choose? What factors did you consider in that decision (speed, accuracy, minimal manual preprocessing)?

What features were the most discriminative? Is this different for each affective state?

What evaluation procedure did you follow (e.g., cross-validation)? Were the tests person-dependent or person-independent, and what is the difference in accuracy between the two?
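A person-independent evaluation can be set up as leave-one-participant-out cross-validation: each fold tests on all clips of one participant and trains on the clips of the others, so no person appears in both sets. A minimal sketch, assuming you have a list giving the participant ID of each clip (the IDs below are hypothetical):

```python
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_participant_out(
    participants: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held-out participant, train indices, test indices), one fold per participant."""
    for held_out in sorted(set(participants), key=str):
        train = [i for i, p in enumerate(participants) if p != held_out]
        test = [i for i, p in enumerate(participants) if p == held_out]
        yield held_out, train, test


# Hypothetical participant ID for each of six clips.
clip_participants = ["P01", "P01", "P02", "P03", "P03", "P03"]
for held_out, train_idx, test_idx in leave_one_participant_out(clip_participants):
    print(held_out, "train:", train_idx, "test:", test_idx)
```

A person-dependent evaluation, by contrast, splits clips at random so the same participant can appear in both training and test folds; comparing the two shows how much accuracy depends on having seen the test person during training.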

Which factors confused the classifier(s) (e.g., lighting changes, head movement, face occlusion)?

Which affective state yields the highest classification rate? Which affective state is the hardest to detect? Why?

What is the confusion matrix for these states?
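A confusion matrix counts, for each true state (row), how often each state was predicted (column); the classification rate for a state is its diagonal entry divided by its row total. A small sketch with made-up predictions; the function names are illustrative:

```python
from typing import Dict, Hashable, List, Sequence


def confusion_matrix(
    true_labels: Sequence[Hashable], predicted: Sequence[Hashable], labels: Sequence[Hashable]
) -> List[List[int]]:
    """Rows are true states and columns are predicted states, both in `labels` order."""
    if len(true_labels) != len(predicted):
        raise ValueError("true and predicted label lists must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_labels, predicted):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_state_rate(matrix: List[List[int]], labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Classification rate (recall) per true state; NaN if a state never occurs."""
    rates = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        rates[label] = matrix[i][i] / total if total else float("nan")
    return rates


STATES = ["boredom", "confusion", "delight", "frustration", "neutral"]
truth = ["boredom", "boredom", "confusion", "neutral", "frustration", "confusion"]
guess = ["boredom", "neutral", "confusion", "neutral", "confusion", "confusion"]
cm = confusion_matrix(truth, guess, STATES)
for label, row in zip(STATES, cm):
    print(f"{label:>12}", row)
print(per_state_rate(cm, STATES))
```

Off-diagonal cells show which pairs of states are confused (here, frustration predicted as confusion), which helps answer why a particular state is hard to detect.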

Download the video clips and labels

To download the video clips (about 3 seconds each) and accompanying labels:

1. Use ftp and log in to
2. User: anonymous
3. Password: your email address
4. "cd pub"
5. "get Mem-Exp107-AffectSamples.tar.gz"
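The manual FTP session above can also be scripted with Python's standard ftplib module. This is a sketch: the server name is missing from this page, so the host is left as a parameter you must supply, and the function name download_samples is hypothetical.

```python
from ftplib import FTP

ARCHIVE = "Mem-Exp107-AffectSamples.tar.gz"


def download_samples(host: str, email: str, dest: str = ARCHIVE) -> str:
    """Fetch the archive by anonymous FTP, mirroring the manual steps above."""
    with FTP(host) as ftp:
        ftp.login(user="anonymous", passwd=email)  # steps 2-3: anonymous login
        ftp.cwd("pub")                             # step 4
        with open(dest, "wb") as out:
            # Step 5: binary-mode transfer so the gzip archive is not corrupted.
            ftp.retrbinary(f"RETR {ARCHIVE}", out.write)
    return dest
```

For example, download_samples(host, "you@example.edu"), with the server you were given as host, saves the archive in the current directory; it can then be unpacked with tar -xzf.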


Please note that we do not have permission from most of the participants to display these clips at conferences or publish them in journal articles. If you would like to do so, please let us know so we can locate the signed release forms for the few participants who gave us permission.


Download the affective-DBN MATLAB code

Download the affective-DBN MATLAB code (by Rana el Kaliouby) here. The code uses Kevin Murphy's Bayes Net Toolbox (BNT) for MATLAB.


For more information, please email Rana el Kaliouby or Sidney D'Mello.


1. D'Mello, S. K., Craig, S. D., Gholson, B., Franklin, S., Picard, R., & Graesser, A. C. (2005). Integrating affect sensors in an intelligent tutoring system. In Affective Interactions: The Computer in the Affective Loop Workshop at the 2005 International Conference on Intelligent User Interfaces (pp. 7-13). New York: ACM Press.

2. Ekman, P., & Friesen, W. V. (1978). The facial action coding system: A technique for the measurement of facial movement. Palo Alto: Consulting Psychologists Press.

3. Graesser, A. C., Chipman, P., Haynes, B. C., & Olney, A. (2005). AutoTutor: An intelligent tutoring system with mixed-initiative dialogue. IEEE Transactions on Education, 48, 612-618.

4. Graesser, A. C., McDaniel, B., Chipman, P., Witherspoon, A., D'Mello, S., & Gholson, B. (2006). Detection of emotions during learning with AutoTutor. In Proceedings of the 28th Annual Conference of the Cognitive Science Society (pp. 285-290). Cognitive Science Society.