Classifying Positive and Negative Affective Responses through Facial Expressions

Kyunghee Kim

Media Arts and Sciences

Purpose

A study was conducted to observe customers' affective responses through their facial expressions after they tried two different kinds of drink. Participants' facial expressions were coded as positive or negative by human coders at two moments: the moment a participant received the drink from a vending machine, and the moment a participant took a sip of the drink. This project aims to identify the Facial Action Units that differ significantly between positive and negative facial expressions, so that computer vision algorithms can improve their recognition rate for these expressions.

Data

Of all the participants in this study, 17 were classified as showing expressive facial expressions. Each of these participants took a sip of drink 30 times, giving 17*30 = 510 trials: 510 video recordings of participants receiving the outcome from the vending machine and 510 video recordings of participants taking a sip of the drink. Human coders labeled these recordings as positive (1), negative (-1) or neutral (0). Table 1 summarizes the data.
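For concreteness, the labels can be thought of as a 510 x 2 matrix, one row per trial and one column per coding moment. A minimal MATLAB sketch of this layout (the variable names here are mine, not the study's actual file format):

    nParticipants = 17;
    nTrials       = 30;
    nTotal        = nParticipants * nTrials;    % 510 trials in all

    % One row per trial; column 1 = outcome moment, column 2 = sipping moment.
    % Entries are +1 (positive), -1 (negative) or 0 (neutral), as assigned by
    % the human coders.
    labels = zeros(nTotal, 2);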

Table 1. Data

If the tracker mis-tracked the feature points of a participant's face, that data point was excluded from the analysis of the tracker case. Each movie clip was classified as positive or negative by two other human coders, and I also watched it to extract features. If I could not find any distinctive feature to support a positive or negative label, such as an eyebrow raise, upper lip raise, or lip corner depressor, the data point was excluded from the analysis of the human coder case even though the two other coders had labeled it positive or negative. In short: for positive expressions at the outcome moment, 10 data points were excluded for the tracker and 1 for the human coder, so the tracker has 9 fewer data points than the human coder. For negative expressions at the outcome moment, 1 data point was excluded for each, so the two have the same number of data points. For positive expressions at the sipping moment, no data points were excluded for the tracker and 3 were excluded for the human coder, so the tracker has 3 more data points than the human coder. For negative expressions at the sipping moment, 2 data points were excluded for the tracker and 18 for the human coder, so the tracker has 16 more data points than the human coder.

Feature Selection

Table 2 Feature Selection

While observing the video data, I found some features that can distinguish positive from negative expressions. For example, when people are satisfied they close their eyes, raise their inner eyebrows, make a slight smile with the corners of their lips, and move their shoulders up and down slowly. When they are disappointed, they wrinkle their eyebrows and lower the corners of their lips. I identified 16 features that are frequently observed in positive or negative facial expressions: jaw drop, disappointed nodding, satisfied head nodding, frown, eyebrow raise, closed eyes, a big smile, lip corner puller, lip corner depressor, lip puckerer, lip suck, upper lip raiser, shoulder move, lip move to the left, eye rolling, and headshaking. Among these 16 features, some, such as eyebrow raise, shoulder move and head nodding, are found frequently in both positive and negative facial expressions, while others, such as lip corner puller and a big smile, are found distinctly in positive expressions. Frown and headshaking are found distinctly in negative expressions.

 

The other method used to extract features was the FaceReader [1] software. For this tracker, 10 features were chosen to classify positive and negative expressions: lip puckerer, smile, eyebrow raiser, lip corner depressor, head nodding, headshake, jaw drop, lip stretch, mouth stretch, and lip corner puller.

Table 2 shows the 16 features observed by the human coder and the 10 features tracked by the tracker.
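One way to turn these observations into classifier input is to encode each trial as a binary feature vector, with one entry per feature. The sketch below assumes this encoding (the variable names and the column order are mine, not necessarily the project's actual coding scheme):

    % 16 human-coded features per trial (order follows the list above; the
    % numbering in Table 10 may differ).
    humanFeatureNames = {'jaw drop', 'disappointed nodding', 'satisfied head nodding', ...
        'frown', 'eyebrow raise', 'closed eyes', 'big smile', 'lip corner puller', ...
        'lip corner depressor', 'lip puckerer', 'lip suck', 'upper lip raiser', ...
        'shoulder move', 'lip move to the left', 'eye rolling', 'headshaking'};

    % Xhuman(i,j) = 1 if feature j was observed in trial i, 0 otherwise.
    % Xtracker is the analogous matrix for the tracker's 10 features.
    nTotal   = 510;
    Xhuman   = zeros(nTotal, numel(humanFeatureNames));
    Xtracker = zeros(nTotal, 10);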

Classification Results

Three different classification methods were used: SVM, KNN, and Decision Tree. A dot-product (linear) kernel was chosen for the SVM, K was set to 1 for KNN, and for the Decision Tree 'prune' was 'on' and 'splitmin' was 10. 'prune' and 'splitmin' are parameters of MATLAB's decision tree function, documented as follows:

'prune'      'on' (default) to compute the full tree and the optimal sequence of pruned subtrees, or 'off' for the full tree without pruning

 'splitmin'   A number K such that impure nodes must have K or more observations to be split (default 10)
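As a rough illustration of these settings, here is a minimal MATLAB sketch using the (older) Statistics/Bioinformatics Toolbox functions svmtrain, knnclassify and classregtree. Xtrain, Xtest, ytrain and ytest are placeholders (e.g., rows of the feature matrices above and the corresponding labels as a cell array of 'pos'/'neg' strings); this is a sketch of the configuration, not the project's exact code.

    % SVM with a dot-product (linear) kernel
    svmStruct = svmtrain(Xtrain, ytrain, 'kernel_function', 'linear');
    svmPred   = svmclassify(svmStruct, Xtest);

    % K-nearest neighbours with K = 1
    knnPred = knnclassify(Xtest, Xtrain, ytrain, 1);

    % Decision tree with 'prune' on and 'splitmin' = 10
    tree     = classregtree(Xtrain, ytrain, 'prune', 'on', 'splitmin', 10);
    treePred = eval(tree, Xtest);

    % Accuracy, e.g. for the decision tree
    accuracy = mean(strcmp(treePred, ytest));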

 

Table 3 SVM Classification

Table 4 KNN Classification

Table 5 Decision Tree Classification

As we can see from the SVM, KNN and Decision Tree classification results above, classification of the data coded by the human coder is quite accurate, close to 100%. But classification of the data tracked by the tracker is much poorer. Why is the tracker so poor at recognizing positive and negative facial expressions?

Differences in feature extraction between the human coder and the tracker

Intuitively, I found that the main differences between the human coder and the tracker occur when they detect Lip Corner Puller and Lip Corner Depressor movements. Lip Corner Puller and Lip Corner Depressor are the Facial Action Units shown in Figure 1. People are good at distinguishing Lip Corner Puller from Lip Corner Depressor, but the tracker tends to classify most lip movements as Lip Corner Depressor. For example, as shown in Table 6, the tracker coded a lip movement as Lip Corner Depressor when the human coder considered it a Lip Corner Puller.

Figure 1 Left: Lip Corner Puller, Right: Lip Corner Depressor

Table 6 Lip Corner Depressor: comparison between the human coder and the tracker
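A rough way to quantify this kind of disagreement is to tally, over the trials coded by both, how often the human coder saw a Lip Corner Puller while the tracker reported a Lip Corner Depressor. A minimal sketch, assuming humanLip and trackerLip are vectors with 1 for Puller, -1 for Depressor and 0 for neither (this encoding is mine, not the project's):

    pullerCodedAsDepressor = sum(humanLip == 1 & trackerLip == -1);
    lipAgreementRate       = mean(humanLip == trackerLip);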

Semi-Automatic Coding

The tracker automatically extracts the 10 features listed in Table 2. Since its lip movement detection differs significantly from the human coder's, I tried semi-automatic coding as a third method: the tracker's lip movement features, Lip Corner Puller and Lip Corner Depressor, are replaced with the human coder's. As shown in Table 7, Table 8 and Table 9, the performance of the SVM, KNN and Decision Tree classifiers improved considerably; for the outcome case, classification accuracy improved from 42.86% to 63.41% for positive expressions and from 53.66% to 80.49% for negative expressions.
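The replacement itself amounts to overwriting the two lip columns in the tracker's feature matrix. A minimal sketch, with hypothetical column indices (the real positions depend on the feature order in Table 2):

    LIP_PULLER_COL    = 9;     % hypothetical column index in Xtracker
    LIP_DEPRESSOR_COL = 10;    % hypothetical column index in Xtracker

    Xsemi = Xtracker;                                   % start from the automatic features
    Xsemi(:, LIP_PULLER_COL)    = XhumanLipPuller;      % human-coded Lip Corner Puller
    Xsemi(:, LIP_DEPRESSOR_COL) = XhumanLipDepressor;   % human-coded Lip Corner Depressor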

Table 7 SVM Classification

Table 8 KNN Classification

Table 9 Decision Tree Classification

Which one is the most important feature?

Important features were extracted using the floating search methods [2] covered in class. For the outcome case, Lip Corner Depressor is the most important feature, which agrees with my intuition. For the sipping case, Upper Lip Raiser and Lip Corner Depressor are the most important features, which also agrees with my intuition. The features are numbered from 1 to 16 as shown in Table 10; please refer to this numbering when reading Table 11 and Table 12.
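The project used the floating search implementation from [2]; as a stand-in, the sketch below uses MATLAB's sequentialfs, which performs plain sequential forward selection (without the floating step) with a 1-nearest-neighbour error criterion. It is meant only to show the general shape of the procedure, not the exact algorithm used.

    % Criterion: number of misclassified test observations for a 1-NN classifier.
    critFun = @(Xtr, ytr, Xte, yte) sum(~strcmp(yte, knnclassify(Xte, Xtr, ytr, 1)));

    % Select features from the 16-column human-coded matrix with 10-fold CV;
    % y is the corresponding cell array of 'pos'/'neg' labels.
    opts = statset('Display', 'iter');
    [selected, history] = sequentialfs(critFun, Xhuman, y, 'cv', 10, 'options', opts);

    find(selected)   % indices of the selected features (assuming columns follow Table 10's order)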

Table 10. 16 features are numbered from 1 to 16.

Table 11 Feature Selection for the outcome case

Table 12 Feature Selection for the sipping case

Data Adjustment

Since I excluded data points from the human coder case whenever I could not find any distinctive feature, this might have inflated the accuracy of the human coder case. I therefore adjusted the data so that the human coder case and the tracker case contain the same data points: only trials that the tracker tracked correctly and in which I observed distinctive features are kept. The classifier performance on the adjusted data is shown below. As we can see from Tables 13 to 15, the human coder's Decision Tree classification accuracy decreased by 10% compared to the previous data; all other cases are similar to the previous results.

Table 13 SVM Classification

Table 14 KNN Classification

Table 15 Decision Tree Classification
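The adjustment amounts to keeping only the trials that survive both exclusion rules. A minimal sketch, assuming validTracker and validHuman are logical vectors over the 510 trials (these names are placeholders):

    keep = validTracker & validHuman;     % trials usable for both cases

    XtrackerAdj = Xtracker(keep, :);
    XhumanAdj   = Xhuman(keep, :);
    yAdj        = y(keep);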

Lessons I learned from this project

1. Lip Corner Depressor was chosen as the most important feature by the feature selection algorithms, followed by Jaw Drop as the second and Head Nodding as the third. I had intuitively expected Lip Corner Depressor to be a main feature for the classification, but I assumed Frown or Upper Lip Raiser would be more important than Jaw Drop and Head Nodding. Identifying important features in this way therefore lets us confirm our intuitions or revise our assumptions about which features distinguish positive from negative expressions.

2. Once we identify the important features, we can contribute to improving the accuracy of computer vision software that automatically infers positive or negative emotional states. For example, we can focus on implementing more accurate Action Unit detectors for these features, or use these features in a Dynamic Bayesian Network rather than less important ones. We can also introduce a semi-automatic coding method for human coders who analyze video data: once the software tells the user that a lip movement has been detected, the human coder can label it as Lip Corner Depressor or Lip Corner Puller.

3. While observing the video data and comparing my observations with those of the two other human coders, I found quite a few cases where the three of us disagreed. My recognition of positive and negative expressions differs not only from the software's but also from other people's. It would therefore be interesting to compare the observations of different people. Such comparisons could also serve as an educational database to teach people on the Autism Spectrum that differences in facial expression recognition exist not only between people on the Autism Spectrum and people who are not, but also among people who are not on the Autism Spectrum.



[1] API for Facial Analysis and Tagging. http://web.media.mit.edu/~kaliouby/API.html

[2] P. Somol, P. Pudil, Feature Selection Algorithms. http://ro.utia.cz/fs/fs_guideline.html#SRCCODES