Further classifiers & results

 

The previous sections detailed the feature selection techniques used to ground bindings between facial classes and visual features. Both the filter and the wrapper approaches localize features corresponding to some of the classes. The following subsections describe the classification methods used to distinguish these facial feature classes. More emphasis is placed on the classes that contribute most to the training dataset: classes such as gender and expression form the bulk of the dataset, while property-related classes contribute less.

 

In a generic binary classifier, an input facial image generates a scalar output whose polarity determines class membership. The magnitude of the scalar output can usually be interpreted as a measure of belief or certainty in the decision. Nearly all binary classifiers may be viewed in these terms: for density-based classifiers, the output is the log-likelihood ratio, while for kernel-based methods, the output is related to the distance from the separating boundary. To extend these techniques to the multi-class setting, classification decisions are made using one-vs-all or all-pairs classifiers, as sketched below.
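As a concrete illustration of the one-vs-all decision rule (a minimal sketch in Python; the scores and class count are hypothetical, not taken from the experiments):

```python
import numpy as np

def one_vs_all_predict(scores):
    """Pick the class whose binary classifier produces the largest
    (most confident) scalar output for the input image."""
    # scores: one real-valued output per binary classifier, e.g. a
    # log-likelihood ratio or a signed distance from the boundary.
    return int(np.argmax(scores))

# Example with three binary classifiers; the second is most confident.
print(one_vs_all_predict(np.array([-0.4, 1.3, 0.2])))  # prints 1
```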

 

Support Vector Machines

 

            The basic training principle behind SVMs is to find the optimal linear hyperplane such that good generalization performance is achieved. According to the structural risk minimization principle, a function that effectively classifies the training data and belongs to the set of functions with the lowest VC dimension will generalize best regardless of the dimensionality of the space. Linear classifiers have a VC dimension of d+1, where d is the dimensionality of the space, which is the lowest among the families of classifiers considered. For linearly non-separable data, SVMs map the input to a higher-dimensional space in which a separating linear hyperplane can be obtained. The SVM paradigm rests on the preliminaries listed below.

 

·        The function space should have a countable basis and a dot product defined on it; the space may be infinite dimensional (a reproducing kernel Hilbert space, RKHS).

·        For an RKHS there exists a unique, symmetric, positive definite kernel, called the reproducing kernel, that maps the Cartesian product of the d-dimensional input space (or a subset of it) with itself to the real numbers. The kernel possesses the reproducing property.

·        A symmetric, positive definite kernel can be expressed as a dot product of implicit nonlinear projections (Mercer's theorem).

·        A function can be described as a linear combination of eigenfunctions in the function space.

·        The solution to the Tikhonov regularization problem of minimizing a loss function together with a regularization penalty on the norm of the function can be expressed as

o       f(x) = Σ_i a_i K(x, x_i)    (Representer Theorem)

 

·        A special case of the Tikhonov regularization problem, with the hinge loss as the loss function and the norm defined on the RKHS, results in the Support Vector Machine problem (see the formulation sketched after this list).
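For concreteness, this special case can be written as follows (our notation, not reproduced from the original derivation): the average hinge loss plus an RKHS-norm penalty is minimized, and by the Representer Theorem the minimizer has the kernel-expansion form above.

```latex
\min_{f \in \mathcal{H}_K} \; \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i f(x_i)\bigr)
  \;+\; \lambda\, \lVert f \rVert_{\mathcal{H}_K}^{2},
\qquad
f(x) \;=\; \sum_{i=1}^{n} a_i\, K(x, x_i).
```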

 

 

Several kernels, such as the linear, polynomial and Gaussian radial basis function kernels, have been shown to satisfy both the reproducing property and Mercer's theorem.
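For reference, the three kernels mentioned above can be written as follows (a minimal sketch; the degree, offset and bandwidth values are illustrative defaults, not the parameters used in the experiments):

```python
import numpy as np

def linear_kernel(x, z):
    return np.dot(x, z)

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    return (np.dot(x, z) + coef0) ** degree

def gaussian_rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))
```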

 

Embedded HMMs

           

            A one-dimensional HMM may be generalized to give it the appearance of a two-dimensional structure by allowing each state of the one-dimensional HMM to itself be an HMM. The resulting model consists of a set of states called super-states, each containing a set of embedded states. Transitions between embedded states belonging to different super-states are not allowed. The elements of the HMM topology and the related parameters are listed below.
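The transition constraint can be pictured with a small sketch (hypothetical sizes and a left-to-right super-state ordering are our assumptions; this is not the OpenCV data structure): embedded-state transitions form a block-diagonal structure, so states in different super-states never connect directly.

```python
import numpy as np

# Hypothetical topology: 5 super-states, each with 3 embedded states.
n_super, n_embedded = 5, 3

# Allowed transitions among super-states (assumed left-to-right here).
super_transitions = np.triu(np.ones((n_super, n_super)))

# Allowed embedded-state transitions: block-diagonal, i.e. embedded
# states of different super-states cannot transition to one another.
total = n_super * n_embedded
embedded_allowed = np.zeros((total, total), dtype=bool)
for s in range(n_super):
    lo, hi = s * n_embedded, (s + 1) * n_embedded
    embedded_allowed[lo:hi, lo:hi] = True
```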

 

 

 

Besides these, we also used Gaussian mixture models, one-dimensional HMMs and neural networks for classification.

 

 

 

Experiments

 

The following subsections describe our study of each of the classification problems.

 

Gender Classification

 

            In our study of gender classification, the original images were downsampled to 32x32. Image regions found by the feature selection stage to discriminate gender were retained and formed into image vectors. Class priors were estimated from the training dataset. Given the fairly even distribution of the priors, principal component analysis was performed on the set of vectors, and the first hundred components with the highest eigenvalues were kept. A total of 1997 images were used in our experiments. For each classification technique, we used 5-fold cross-validation over the training set, i.e., 1/5th of the training set was used as a validation set and the process was repeated over non-overlapping rotations.
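A minimal sketch of this preprocessing and validation loop (using scikit-learn and synthetic stand-in data; the arrays, label values and RBF kernel choice below are placeholders, not the actual dataset or final parameters):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-ins for the 32x32 face images and gender labels.
rng = np.random.default_rng(0)
images = rng.random((200, 32, 32))
labels = rng.integers(0, 2, size=200)

X = images.reshape(len(images), -1)          # 1024-dimensional image vectors
X = PCA(n_components=100).fit_transform(X)   # keep the top 100 components

# 5-fold cross-validation: 1/5 of the data held out per rotation.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in cv.split(X, labels):
    clf = SVC(kernel="rbf").fit(X[train_idx], labels[train_idx])
    scores.append(clf.score(X[val_idx], labels[val_idx]))
print(np.mean(scores))
```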

 

The SVM classifier was trained with several kernels: linear, polynomial and Gaussian RBF. For each kernel we needed to determine the regularization parameter C, which controls the tradeoff between the empirical risk and the norm of the function. A larger value weighs the empirical loss more heavily than the norm of the function, which tends to overfit and lengthens training time. A smaller value, on the other hand, restricts the norm of the function and hence over-generalizes. The value of C was estimated using the validation set. For the polynomial kernels, the degree of the polynomial was estimated using the same cross-validation technique.
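This selection step could be carried out as below (a sketch with scikit-learn rather than the tools used in the original experiments; the data and candidate grids are hypothetical):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical PCA-projected training data (placeholders, as before).
rng = np.random.default_rng(0)
X, labels = rng.random((200, 100)), rng.integers(0, 2, size=200)

# Choose C (and the polynomial degree / RBF width) on validation folds.
param_grid = [
    {"kernel": ["linear"], "C": [1, 10, 20, 40]},
    {"kernel": ["poly"], "C": [1, 10, 20, 40], "degree": [2, 3]},
    {"kernel": ["rbf"], "C": [1, 10, 20, 40], "gamma": [1e-3, 1e-2]},
]
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, labels)
print(search.best_params_, search.best_score_)
```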

 

The following table reports the average cross-validation and test set accuracies for each of the kernel methods, evaluated with the parameters estimated from cross-validation.

 

Kernel Method          | Regularization Parameter | Std. Deviation Parameter | Avg. Cross-Validation Accuracy | Avg. Test Set Accuracy
Gaussian RBF           | C = 19                   | σ = 13.42                | 83.04%                         | 80.6%
Cubic Polynomial       | C = 27                   | -                        | 79.7%                          | 74.36%
Quadratic Polynomial   | C = 33                   | -                        | 76.2%                          | 72.41%
Linear                 | C = 40                   | -                        | 71%                            | 69.3%

 

 

HMM-based classifiers were also studied. A two-dimensional embedded HMM with five super-states was trained on faces cropped from the images for each of the classes. A 10x10 window was used to scan the images top to bottom and left to right; for each window, 2D-DCT coefficients were computed and a 3x3 block of coefficients was kept as the observation vector. The HMMs were trained using the OpenCV embedded HMM toolkit. The average test set accuracy is reported in the following table.
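A sketch of this observation-vector extraction (the window stride and the choice of the top-left 3x3 coefficients are our assumptions, not the exact OpenCV implementation):

```python
import numpy as np
from scipy.fftpack import dct

def dct_observations(image, win=10, step=10, keep=3):
    """Scan the image with win x win windows and keep the top-left
    keep x keep 2D-DCT coefficients of each window as an observation."""
    obs = []
    h, w = image.shape
    for r in range(0, h - win + 1, step):
        for c in range(0, w - win + 1, step):
            block = image[r:r + win, c:c + win]
            coeffs = dct(dct(block, axis=0, norm="ortho"),
                         axis=1, norm="ortho")
            obs.append(coeffs[:keep, :keep].ravel())
    return np.array(obs)

# Example on a synthetic 32x32 image: 9 windows, 9 coefficients each.
print(dct_observations(np.random.rand(32, 32)).shape)  # (9, 9)
```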

 

 

Classifier          | Average Test Set Accuracy
HMM without priors  | 70.1%
HMM with priors     | 63%

 

 

 

Linear as well as quadratic density-based classifiers were also tried on the training dataset.

 

 

Age Classification

 

            Age classification is a multi-class problem. We used SVMFu, a multi-class trainable SVM package, to train support vector machines with both the one-vs-all and the all-pairs approach. The software reports the average leave-one-out error over the training dataset, and we used this leave-one-out bound to choose the regularization parameters. Linear as well as polynomial kernels were used to train the multi-class SVMs. Due to the uneven distribution of the priors, Gaussian RBF kernels overfitted the training data.
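The two multi-class strategies can be illustrated as follows (a sketch with scikit-learn, not SVMFu; the data, label values and kernel parameters below are placeholders):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import SVC

# Hypothetical 4-class age data standing in for the real feature vectors.
rng = np.random.default_rng(0)
X, y = rng.random((120, 100)), rng.integers(0, 4, size=120)

# One-vs-all: one binary SVM per age class; all-pairs: one per class pair.
one_vs_all = OneVsRestClassifier(SVC(kernel="poly", degree=2, C=17)).fit(X, y)
all_pairs = OneVsOneClassifier(SVC(kernel="poly", degree=2, C=17)).fit(X, y)
print(one_vs_all.predict(X[:5]), all_pairs.predict(X[:5]))
```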

 

 

 

Kernel Method          | Regularization Parameter | Std. Deviation Parameter | Avg. Cross-Validation Accuracy | Avg. Test Set Accuracy
Gaussian RBF           | C = 5                    | σ = 23.9                 | 95.05%                         | 86.7% (baseline)
Cubic Polynomial       | C = 14                   | -                        | 91.7%                          | 86.3%
Quadratic Polynomial   | C = 17                   | -                        | 89.2%                          | 85.63%
Linear                 | C = 22                   | -                        | 87.04%                         | 83.2%

 

           

            Linear Discriminant Analysis was performed on the 64x64 images, taking into account the regions chosen by the feature selection algorithms. The 4-class problem yields 3-dimensional feature vectors. Density-based classifiers were studied on the resulting feature vectors, and Gaussian mixture models were trained on the resulting dataset; 5-fold cross-validation was used to determine the number of mixtures, as in the gender classification experiments. Embedded HMMs were also trained for each of the classes, using the same dataset and the same topology as above.
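A sketch of this LDA-plus-GMM pipeline (scikit-learn, with synthetic stand-in data for the 64x64 image vectors and four hypothetical age classes; three mixtures per class is only an example value):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.mixture import GaussianMixture

# Placeholder data: 200 flattened 64x64 images with 4-class labels.
rng = np.random.default_rng(0)
X, y = rng.random((200, 4096)), rng.integers(0, 4, size=200)

# LDA on a 4-class problem yields at most 3 discriminant directions.
X_lda = LinearDiscriminantAnalysis(n_components=3).fit_transform(X, y)

# One GMM per class; the number of mixtures would be chosen by
# 5-fold cross-validation as described above (3 is just an example).
models = {c: GaussianMixture(n_components=3).fit(X_lda[y == c])
          for c in range(4)}
```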

 

 

 

 

Classifier                           | Number of Mixtures | Average Test Set Accuracy
Gaussian Mixture Models with priors  | 3                  | 85.8%
HMMs with priors                     | -                  | 86.7% (baseline accuracy)

 

 

 

Ethnicity Classification

 

*  A multi-class SVM problem

*  Uneven class priors

*  Linear discriminant analysis yielding 4-dimensional feature vectors

 

 

 

Kernel Method          | Regularization Parameter | Std. Deviation Parameter | Avg. Cross-Validation Accuracy | Avg. Test Set Accuracy
Gaussian RBF           | C = 4                    | σ = 19                   | 86.05%                         | 85.7% (baseline)
Cubic Polynomial       | C = 16                   | -                        | 87.7%                          | 85.9%
Quadratic Polynomial   | C = 20                   | -                        | 85.24%                         | 82.63%
Linear                 | C = 23                   | -                        | 81.12%                         | 78.2%

 

 

 

 

 

Classifier                           | Number of Mixtures | Average Test Set Accuracy
Gaussian Mixture Models with priors  | 6                  | 85.4%
HMMs with priors                     | 3                  | 85%

 

 

 

 

 

           

Properties Classification

 

 

            Among the facial properties studied, the feature selection algorithms grounded moustache and glasses satisfactorily; the remaining facial properties, such as bandana and beard, were left ungrounded. The following tables report the classification results for moustaches, glasses and hats.

 

 

 

Moustache/None                   | Average Test Set Accuracy
LDA and Gaussian Mixture Models  | 97.06%
One-dimensional HMM              | 99.2%
Embedded HMM                     | 78.8%

 

Hat/None                         | Average Test Set Accuracy
Gaussian Mixture Models and LDA  | 99.1%
One-dimensional HMM              | 99.1%
Embedded HMM                     | 99%

 

Glasses/None                     | Average Test Set Accuracy
Gaussian Mixture Models and LDA  | 99.56%
One-dimensional HMM              | 99.43%
Embedded HMM                     | 99.6%

 

Final Results

GENDER – 80.6% with Gaussian RBF kernel SVM / 73.9% with simple classifier
AGE – 85.8% (below baseline) with quadratic kernel / 84.4% with simple classifier
ETHNICITY – 85.9% with cubic kernel SVM / 85.3% with simple classifier
EXPRESSION – 80% with simple classifier
MOUSTACHE – 97.9% with one-dimensional HMM / 95% with simple classifier
GLASSES – 99.6% with embedded HMM / 98.2% with simple classifier
BANDANA – 99.5% with simple classifier
HAT – 99.1% with one-dimensional HMM / 97.7% with simple classifier

 

 

                         

  

 

 
