The previous sections detailed feature selection techniques for grounding facial classes to visual features. Both the filter and the wrapper approaches localize features corresponding to some of the classes. The following subsections describe the classification methods used to distinguish these facial feature classes. More emphasis is placed on the classes that contribute most to the training dataset: facial classes such as sex and expression form the bulk of the dataset, while property-related classes contribute less.
In a generic binary classifier, an input facial image generates a scalar output whose polarity determines class membership. The magnitude of the scalar output can usually be interpreted as a measure of belief or certainty in the decision. Nearly all binary classifiers may be viewed in these terms: for density-based classifiers, the output is the log-likelihood ratio, while for kernel-based methods, the output is related to the distance from the separating boundary. To extend these techniques to the multi-class setting, classification decisions are made using one-vs-all or all-pairs classifiers.
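As a minimal sketch of this decision rule (the class labels and scores below are invented for illustration), a one-vs-all decision simply takes the class whose binary classifier reports the largest scalar output:

```python
def one_vs_all_predict(scores):
    """scores maps class label -> scalar output of that class's binary
    classifier; sign gives membership, magnitude gives confidence, so the
    one-vs-all rule picks the class with the largest output."""
    return max(scores, key=scores.get)

print(one_vs_all_predict({"male": -0.3, "female": 1.2}))  # prints: female
```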
The basic training principle behind SVMs is to find the optimal linear hyperplane such that good generalization performance is achieved. According to the structural risk minimization principle, a function that classifies the training data well and belongs to the set of functions with the lowest VC dimension will generalize best, regardless of the dimensionality of the space. Linear classifiers have a VC dimension of d+1, where d is the dimensionality of the space; this is the lowest among the family of classifiers. For linearly non-separable data, SVMs map the input to a higher-dimensional space in which a separating linear hyperplane can be obtained. The SVM paradigm rests on the preliminaries listed below:
· The function space should have a countable basis, and a dot product must be defined on the space; the space can be infinite-dimensional (a Reproducing Kernel Hilbert Space, RKHS).
· For an RKHS there exists a unique, symmetric, positive definite kernel, called the reproducing kernel, that maps a cross product of d-dimensional spaces (or their subsets) to the real numbers. The kernel possesses the reproducing property.
· A symmetric positive definite kernel can be expressed as a dot product of implicit non-linear projections (Mercer's theorem).
· A function can be described as a linear combination of eigenfunctions in the function space.
· The solution to the Tikhonov regularization problem of minimizing a loss function with a regularization penalty on the norm of the function can be expressed as f(x) = Σi αi K(x, xi) (Representer theorem).
· The special case of the Tikhonov regularization problem with the hinge loss as the loss function and the norm defined on the RKHS yields the Support Vector Machine problem.
Several kernels, such as the linear, polynomial, and Gaussian radial basis function kernels, have been shown to satisfy both the reproducing property and Mercer's theorem.
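The Mercer condition above can be checked numerically: the Gram matrix of a valid kernel must be symmetric and positive semi-definite. A small sketch for the Gaussian RBF kernel, assuming random data and σ = 1 (both chosen here for illustration):

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def gram(kernel, X):
    """Gram matrix K[i, j] = kernel(X[i], X[j])."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K = gram(rbf, X)
assert np.allclose(K, K.T)                   # symmetric
assert np.linalg.eigvalsh(K).min() > -1e-10  # positive semi-definite
```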
A one-dimensional HMM may be generalized to give it the appearance of a two-dimensional structure by allowing each state in the one-dimensional HMM to itself be an HMM. In this way, the HMM consists of a set of states called super-states along with a set of embedded states. Transitions between embedded states in different super-states are not allowed. The elements of the HMM topology and related parameters are listed below.
Besides these, we also used Gaussian mixture models, one-dimensional HMMs, and neural networks for classification.
The following subsections describe our study of each of the classification problems.
For gender classification, the original images were down-sampled to 32x32. Image regions that discriminated gender from other classes were taken into account and image vectors were formed. Class priors were estimated from the training dataset. Given the fairly even distribution of the priors, principal component analysis was performed on the set of vectors, and the first hundred components with the highest eigenvalues were retained. A total of 1997 images were used in our experiments. For each classification technique, we used 5-fold cross-validation over the training set, i.e., 1/5th of the training set was used as a validation set and the process was repeated with non-overlapping rotations.
The SVM classifier was used with various kernels: linear, polynomial, and Gaussian RBF. For each of these kernels, we needed to determine the regularization parameter C, which controls the trade-off between empirical risk minimization and the norm of the function. A larger value of the parameter weights the empirical loss more than the norm of the function; this results in overfitting and longer training times. On the other hand, a smaller value restricts the norm of the function and hence over-generalizes the problem. The value of C was estimated using the validation set. In the case of polynomial kernels, the degree of the polynomial was estimated using the same cross-validation technique.
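The estimation of C can be sketched as a grid search over candidate values, with a hypothetical `validate` function standing in for the actual train-and-score step on the held-out fold:

```python
def select_C(candidates, validate):
    """Pick the regularization parameter with the best validation accuracy.
    `validate` is a stand-in for training an SVM with a given C and scoring
    it on the validation set (hypothetical, not the actual training code)."""
    return max(candidates, key=validate)

# Toy stand-in whose accuracy peaks at C = 19.
toy_validate = lambda C: -abs(C - 19)
assert select_C([1, 5, 19, 40, 100], toy_validate) == 19
```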
The following table reports the average accuracy for each of the kernel methods, evaluated using parameters estimated from cross-validation.
Kernel Method               | Regularization Parameter | Std. Deviation Parameter | Avg. Cross-Validation Accuracy | Avg. Test Set Accuracy
Gaussian RBF                | C = 19                   | σ = 13.42                | 83.04%                         | 80.6%
Cubic Polynomial Kernel     | C = 27                   | -                        | 79.7%                          | 74.36%
Quadratic Polynomial Kernel | C = 33                   | -                        | 76.2%                          | 72.41%
Linear Kernel               | C = 40                   | -                        | 71%                            | 69.3%
HMM-based classifiers were also studied. A two-dimensional embedded HMM with five super-states was trained on faces cropped from the images for each of the classes. A 10x10 window was used to scan the images top to bottom and left to right. For each 10x10 window, 2D-DCT coefficients were calculated, resulting in a 3x3 observation vector per block. The HMMs were trained using the OpenCV Embedded HMM toolkit. The average test set accuracy is reported in the following table.
Classifier         | Average Test Set Accuracy
HMM without priors | 70.1%
HMM with priors    | 63%
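The per-block observation extraction can be sketched as follows; the naive DCT-II below is a stand-in for the OpenCV implementation, and the 3x3 low-frequency coefficients are kept as the observation vector:

```python
import math

def dct2(block):
    """Naive 2-D DCT-II of a square block (list of lists of floats)."""
    n = len(block)
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = c(u) * c(v) * s
    return out

def observation(block, keep=3):
    """Flatten the low-frequency keep x keep DCT coefficients (9 values)."""
    d = dct2(block)
    return [d[u][v] for u in range(keep) for v in range(keep)]

flat = [[128.0] * 10 for _ in range(10)]  # constant 10x10 window
obs = observation(flat)
assert len(obs) == 9
assert abs(obs[0] - 1280.0) < 1e-6  # only the DC term survives
```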
Linear as well as quadratic density-based classifiers were also tried on the training dataset.
The age classification problem is a multi-class problem. We used SVMFu, a trainable multi-class SVM package, to train support vector machines with both the one-vs-all and the all-pairs approaches. The software reports the average leave-one-out error over the training dataset, and we used this leave-one-out bound to choose the regularization parameters. Linear as well as polynomial kernels were used to train the multi-class SVMs. Due to the uneven distribution of the priors, Gaussian RBF kernels overfitted the training data.
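The all-pairs scheme can be sketched as a majority vote over all pairwise binary classifiers; the class labels and the `pairwise` stand-in below are invented for illustration:

```python
from itertools import combinations

def all_pairs_predict(classes, pairwise):
    """pairwise(a, b) returns the winner of the binary a-vs-b classifier;
    the all-pairs rule takes a majority vote over all class pairs."""
    votes = dict.fromkeys(classes, 0)
    for a, b in combinations(classes, 2):
        votes[pairwise(a, b)] += 1
    return max(votes, key=votes.get)

# Toy stand-in where "adult" wins every comparison it appears in.
toy = lambda a, b: "adult" if "adult" in (a, b) else a
assert all_pairs_predict(["child", "teen", "adult", "senior"], toy) == "adult"
```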
Kernel Method               | Regularization Parameter | Std. Deviation Parameter | Avg. Cross-Validation Accuracy | Avg. Test Set Accuracy
Gaussian RBF                | C = 5                    | σ = 23.9                 | 95.05%                         | 86.7% (Baseline)
Cubic Polynomial Kernel     | C = 14                   | -                        | 91.7%                          | 86.3%
Quadratic Polynomial Kernel | C = 17                   | -                        | 89.2%                          | 85.63%
Linear Kernel               | C = 22                   | -                        | 87.04%                         | 83.2%
Linear Discriminant Analysis was performed on the 64x64 images, taking into account the regions chosen by the feature selection algorithms. The 4-class problem resulted in 3-dimensional feature vectors. Density-based classifiers were studied on the resulting feature vectors, and Gaussian mixture models were trained on the resulting dataset; 5-fold cross-validation was used to determine the number of mixtures, as in gender classification. Embedded HMMs were also trained for each of the classes, using the same dataset and topology as above.
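The dimensionality drop (4 classes to 3 dimensions) follows from the rank of the between-class scatter matrix, which is at most C−1 for C classes. A minimal Fisher-LDA sketch on synthetic data (the data itself is invented for illustration):

```python
import numpy as np

def lda_projection(X, y):
    """Fisher LDA: eigenvectors of pinv(Sw) @ Sb. Since rank(Sb) <= C - 1,
    a 4-class problem yields at most 3 useful projection directions."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-vals.real)[: len(classes) - 1]
    return vecs.real[:, order]  # d x (C-1) projection matrix

rng = np.random.default_rng(1)
y = np.repeat(np.arange(4), 50)            # 4 classes, 50 samples each
X = rng.normal(size=(200, 10)) + y[:, None]  # class-dependent mean shift
W = lda_projection(X, y)
assert W.shape == (10, 3)  # 4 classes -> 3-dimensional features
```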
Classifier                          | Number of Mixtures | Average Test Set Accuracy
Gaussian Mixture Models with priors | 3                  | 85.8%
HMMs with priors                    | -                  | 86.7% (Baseline accuracy)
Ethnicity classification is again a multi-class SVM problem, this time with uneven priors. Linear discriminant analysis yielded 4-dimensional feature vectors.
Kernel Method               | Regularization Parameter | Std. Deviation Parameter | Avg. Cross-Validation Accuracy | Avg. Test Set Accuracy
Gaussian RBF                | C = 4                    | σ = 19                   | 86.05%                         | 85.7% (Baseline)
Cubic Polynomial Kernel     | C = 16                   | -                        | 87.7%                          | 85.9%
Quadratic Polynomial Kernel | C = 20                   | -                        | 85.24%                         | 82.63%
Linear Kernel               | C = 23                   | -                        | 81.12%                         | 78.2%
Classifier                          | Number of Mixtures | Average Test Set Accuracy
Gaussian Mixture Models with priors | 6                  | 85.4%
HMMs with priors                    | 3                  | 85%
Among the facial properties studied, the feature selection algorithms grounded moustache and glasses satisfactorily. The remaining facial properties, such as bandana and beard, were left ungrounded. The following tables report experimental results on the classification of moustaches, glasses, and hats.
Classifier                      | Average Test Set Accuracy
LDA and Gaussian Mixture Models | 97.06%
One-dimensional HMM             | 99.2%
Embedded HMM                    | 78.8%
Classifier                      | Average Test Set Accuracy
Gaussian Mixture Models and LDA | 99.1%
One-dimensional HMM             | 99.1%
Embedded HMM                    | 99%
Classifier                      | Average Test Set Accuracy
Gaussian Mixture Models and LDA | 99.56%
One-dimensional HMM             | 99.43%
Embedded HMM                    | 99.6%
Final Results

GENDER – 80.6% with Gaussian RBF kernel SVM / 73.9% with simple classifier
AGE – 85.8%, below baseline, with quadratic kernel / 84.4% with simple classifier
ETHNICITY – 85.9% with cubic kernel SVM / 85.3% with simple classifier
EXPRESSION – 80% with simple classifier
MOUSTACHE – 97.9% with one-dimensional HMM / 95% with simple classifier
GLASS – 99.6% with embedded HMM / 98.2% with simple classifier
BANDANA – 99.5% with simple classifier
HAT – 99.1% with one-dimensional HMM / 97.7% with simple classifier