The previous sections detailed feature selection techniques for grounding facial classes to visual features. Both the filter and the wrapper approaches localize features corresponding to some of the classes. The following subsections describe the classification methods used to distinguish these facial feature classes. More emphasis is placed on the classes that contribute most to the training dataset: facial classes such as sex and expression form the bulk of the dataset, while property-related classes contribute less.
In a generic binary classifier, an input facial image generates a scalar output whose polarity determines class membership. The magnitude of the scalar output can usually be interpreted as a measure of belief or certainty in the decision. Nearly all binary classifiers may be viewed in these terms: for density-based classifiers, the output is the log-likelihood ratio, while for kernel-based methods, the output is related to the distance from the separating boundary. To extend these techniques to the multi-class setting, classification decisions are made using one-vs-all or all-pairs classifiers.
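As a minimal sketch of this decision rule (the class labels and scores below are invented for illustration), a one-vs-all decision simply takes the class whose binary classifier reports the largest scalar output:

```python
def one_vs_all_predict(scores):
    """scores maps class label -> scalar output of that class's binary
    classifier; sign gives membership, magnitude gives confidence, so the
    one-vs-all rule picks the class with the largest output."""
    return max(scores, key=scores.get)

print(one_vs_all_predict({"male": -0.3, "female": 1.2}))  # prints: female
```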
The basic training principle behind SVMs is to find the optimal linear hyperplane such that good generalization performance is achieved. According to the structural risk minimization principle, a function that classifies the training data well and belongs to the set of functions with the lowest VC dimension will generalize best, regardless of the dimensionality of the space. Linear classifiers have a VC dimension of d+1, where d is the dimensionality of the space; this is the lowest among the family of classifiers. For linearly non-separable data, SVMs map the input to a higher-dimensional space in which a separating linear hyperplane can be obtained. The SVM paradigm rests on the preliminaries listed below:
· The function space should have a countable basis, and a dot product must be defined on the space; the space can be infinite-dimensional (a Reproducing Kernel Hilbert Space, RKHS).
· For an RKHS there exists a unique, symmetric, positive definite kernel, called the reproducing kernel, that maps a cross product of d-dimensional spaces (or their subsets) to the real numbers. The kernel possesses the reproducing property.
· A symmetric positive definite kernel can be expressed as a dot product of implicit non-linear projections (Mercer's theorem).
· A function can be described as a linear combination of eigenfunctions in the function space.
· The solution to the Tikhonov regularization problem of minimizing a loss function with a regularization penalty on the norm of the function can be expressed as f(x) = Σi αi K(x, xi) (Representer theorem).
· The special case of the Tikhonov regularization problem with the hinge loss as the loss function and the norm defined on the RKHS yields the Support Vector Machine problem.
Several kernels, such as the linear, polynomial, and Gaussian radial basis function kernels, have been shown to satisfy both the reproducing property and Mercer's theorem.
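The Mercer condition above can be checked numerically: the Gram matrix of a valid kernel must be symmetric and positive semi-definite. A small sketch for the Gaussian RBF kernel, assuming random data and σ = 1 (both chosen here for illustration):

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def gram(kernel, X):
    """Gram matrix K[i, j] = kernel(X[i], X[j])."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K = gram(rbf, X)
assert np.allclose(K, K.T)                   # symmetric
assert np.linalg.eigvalsh(K).min() > -1e-10  # positive semi-definite
```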
A one-dimensional HMM may be generalized to give it the appearance of a two-dimensional structure by allowing each state in the one-dimensional HMM to itself be an HMM. In this way, the HMM consists of a set of states called super-states along with a set of embedded states. Transitions between embedded states in different super-states are not allowed. The elements of the HMM topology and related parameters are listed below.
Besides these, we also used Gaussian mixture models, one-dimensional HMMs, and neural networks for classification.
The following subsections describe our study of each of the classification problems.
For gender classification, the original images were down-sampled to 32x32. Image regions that discriminated gender from other classes were taken into account and image vectors were formed. Class priors were estimated from the training dataset. Given the fairly even distribution of the priors, principal component analysis was performed on the set of vectors, and the first hundred components with the highest eigenvalues were retained. A total of 1997 images were used in our experiments. For each classification technique, we used 5-fold cross-validation over the training set, i.e., 1/5th of the training set was used as a validation set and the process was repeated with non-overlapping rotations.
The SVM classifier was used with various kernels: linear, polynomial, and Gaussian RBF. For each of these kernels, we needed to determine the regularization parameter C, which controls the trade-off between empirical risk minimization and the norm of the function. A larger value of the parameter weights the empirical loss more than the norm of the function; this results in overfitting and longer training times. On the other hand, a smaller value restricts the norm of the function and hence over-generalizes the problem. The value of C was estimated using the validation set. In the case of polynomial kernels, the degree of the polynomial was estimated using the same cross-validation technique.
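The estimation of C can be sketched as a grid search over candidate values, with a hypothetical `validate` function standing in for the actual train-and-score step on the held-out fold:

```python
def select_C(candidates, validate):
    """Pick the regularization parameter with the best validation accuracy.
    `validate` is a stand-in for training an SVM with a given C and scoring
    it on the validation set (hypothetical, not the actual training code)."""
    return max(candidates, key=validate)

# Toy stand-in whose accuracy peaks at C = 19.
toy_validate = lambda C: -abs(C - 19)
assert select_C([1, 5, 19, 40, 100], toy_validate) == 19
```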
The following table reports the average accuracy for each of the kernel methods, evaluated using parameters estimated from cross-validation.
Kernel Method               | Regularization Parameter | Std. Deviation Parameter | Avg. Cross-Validation Accuracy | Avg. Test Set Accuracy
Gaussian RBF                | C = 19                   | σ = 13.42                | 83.04%                         | 80.6%
Cubic Polynomial Kernel     | C = 27                   | -                        | 79.7%                          | 74.36%
Quadratic Polynomial Kernel | C = 33                   | -                        | 76.2%                          | 72.41%
Linear Kernel               | C = 40                   | -                        | 71%                            | 69.3%
HMM-based classifiers were also studied. A two-dimensional embedded HMM with five super-states was trained on faces cropped from the images for each of the classes. A 10x10 window was used to scan the images top to bottom and left to right. For each 10x10 window, 2D-DCT coefficients were calculated, resulting in a 3x3 observation vector per block. The HMMs were trained using the OpenCV Embedded HMM toolkit. The average test set accuracy is reported in the following table.
Classifier         | Average Test Set Accuracy
HMM without priors | 70.1%
HMM with priors    | 63%
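The per-block observation extraction can be sketched as follows; the naive DCT-II below is a stand-in for the OpenCV implementation, and the 3x3 low-frequency coefficients are kept as the observation vector:

```python
import math

def dct2(block):
    """Naive 2-D DCT-II of a square block (list of lists of floats)."""
    n = len(block)
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = c(u) * c(v) * s
    return out

def observation(block, keep=3):
    """Flatten the low-frequency keep x keep DCT coefficients (9 values)."""
    d = dct2(block)
    return [d[u][v] for u in range(keep) for v in range(keep)]

flat = [[128.0] * 10 for _ in range(10)]  # constant 10x10 window
obs = observation(flat)
assert len(obs) == 9
assert abs(obs[0] - 1280.0) < 1e-6  # only the DC term survives
```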
Linear as well as quadratic density-based classifiers were also tried on the training dataset.
The age classification problem is a multi-class problem. We used SVMFu, a trainable multi-class SVM package, to train support vector machines with both the one-vs-all and the all-pairs approaches. The software reports the average leave-one-out error over the training dataset, and we used this leave-one-out bound to choose the regularization parameters. Linear as well as polynomial kernels were used to train the multi-class SVMs. Due to the uneven distribution of the priors, Gaussian RBF kernels overfitted the training data.
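The all-pairs scheme can be sketched as a majority vote over all pairwise binary classifiers; the class labels and the `pairwise` stand-in below are invented for illustration:

```python
from itertools import combinations

def all_pairs_predict(classes, pairwise):
    """pairwise(a, b) returns the winner of the binary a-vs-b classifier;
    the all-pairs rule takes a majority vote over all class pairs."""
    votes = dict.fromkeys(classes, 0)
    for a, b in combinations(classes, 2):
        votes[pairwise(a, b)] += 1
    return max(votes, key=votes.get)

# Toy stand-in where "adult" wins every comparison it appears in.
toy = lambda a, b: "adult" if "adult" in (a, b) else a
assert all_pairs_predict(["child", "teen", "adult", "senior"], toy) == "adult"
```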
Kernel Method               | Regularization Parameter | Std. Deviation Parameter | Avg. Cross-Validation Accuracy | Avg. Test Set Accuracy
Gaussian RBF                | C = 5                    | σ = 23.9                 | 95.05%                         | 86.7% (Baseline)
Cubic Polynomial Kernel     | C = 14                   | -                        | 91.7%                          | 86.3%
Quadratic Polynomial Kernel | C = 17                   | -                        | 89.2%                          | 85.63%
Linear Kernel               | C = 22                   | -                        | 87.04%                         | 83.2%
Linear Discriminant Analysis was performed on the 64x64 images, taking into account the regions chosen by the feature selection algorithms. The 4-class problem resulted in 3-dimensional feature vectors. Density-based classifiers were studied on the resulting feature vectors, and Gaussian mixture models were trained on the resulting dataset; 5-fold cross-validation was used to determine the number of mixtures, as in gender classification. Embedded HMMs were also trained for each of the classes, using the same dataset and topology as above.
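The dimensionality drop (4 classes to 3 dimensions) follows from the rank of the between-class scatter matrix, which is at most C−1 for C classes. A minimal Fisher-LDA sketch on synthetic data (the data itself is invented for illustration):

```python
import numpy as np

def lda_projection(X, y):
    """Fisher LDA: eigenvectors of pinv(Sw) @ Sb. Since rank(Sb) <= C - 1,
    a 4-class problem yields at most 3 useful projection directions."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-vals.real)[: len(classes) - 1]
    return vecs.real[:, order]  # d x (C-1) projection matrix

rng = np.random.default_rng(1)
y = np.repeat(np.arange(4), 50)            # 4 classes, 50 samples each
X = rng.normal(size=(200, 10)) + y[:, None]  # class-dependent mean shift
W = lda_projection(X, y)
assert W.shape == (10, 3)  # 4 classes -> 3-dimensional features
```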
Classifier                          | Number of Mixtures | Average Test Set Accuracy
Gaussian Mixture Models with priors | 3                  | 85.8%
HMMs with priors                    | -                  | 86.7% (Baseline accuracy)
Ethnicity classification is again a multi-class SVM problem, this time with uneven priors. Linear discriminant analysis yielded 4-dimensional feature vectors.
Kernel Method               | Regularization Parameter | Std. Deviation Parameter | Avg. Cross-Validation Accuracy | Avg. Test Set Accuracy
Gaussian RBF                | C = 4                    | σ = 19                   | 86.05%                         | 85.7% (Baseline)
Cubic Polynomial Kernel     | C = 16                   | -                        | 87.7%                          | 85.9%
Quadratic Polynomial Kernel | C = 20                   | -                        | 85.24%                         | 82.63%
Linear Kernel               | C = 23                   | -                        | 81.12%                         | 78.2%
Classifier                          | Number of Mixtures | Average Test Set Accuracy
Gaussian Mixture Models with priors | 6                  | 85.4%
HMMs with priors                    | 3                  | 85%
Among the facial properties studied, the feature selection algorithms grounded moustache and glasses satisfactorily. The remaining facial properties, such as bandana and beard, were left ungrounded. The following tables report experimental results on the classification of moustaches, glasses, and hats.
Classifier                      | Average Test Set Accuracy
LDA and Gaussian Mixture Models | 97.06%
One-dimensional HMM             | 99.2%
Embedded HMM                    | 78.8%
Classifier                      | Average Test Set Accuracy
Gaussian Mixture Models and LDA | 99.1%
One-dimensional HMM             | 99.1%
Embedded HMM                    | 99%
Classifier                      | Average Test Set Accuracy
Gaussian Mixture Models and LDA | 99.56%
One-dimensional HMM             | 99.43%
Embedded HMM                    | 99.6%
Final Results

GENDER – 80.6% with Gaussian RBF kernel SVM / 73.9% with simple classifier
AGE – 85.8%, below baseline, with quadratic kernel / 84.4% with simple classifier
ETHNICITY – 85.9% with cubic kernel SVM / 85.3% with simple classifier
EXPRESSION – 80% with simple classifier
MOUSTACHE – 97.9% with one-dimensional HMM / 95% with simple classifier
GLASS – 99.6% with embedded HMM / 98.2% with simple classifier
BANDANA – 99.5% with simple classifier
HAT – 99.1% with one-dimensional HMM / 97.7% with simple classifier