Fisher Linear Discriminant
The FLD technique is primarily a dimensionality-reduction method. It projects the data so that the distributions of the reduced data are maximally separated under a Gaussian criterion: the means of the two projected classes end up as far apart as possible, while their variances are kept as small as possible. Reducing to a single dimension for classification, as I did here, is probably the simplest application of Fisher discriminants, although it is possible to reduce to other numbers of dimensions as well.
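As a rough sketch of the two-class projection (not the exact code used for these results; the array shapes and names are illustrative), the Fisher direction can be computed from the class means and within-class scatter:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Return the unit vector w maximizing the Fisher criterion
    (between-class scatter over within-class scatter) for two classes.
    X1, X2: (n_samples, n_features) arrays, one per class."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)        # within-class scatter, class 1
    S2 = (X2 - m2).T @ (X2 - m2)        # within-class scatter, class 2
    Sw = S1 + S2
    w = np.linalg.pinv(Sw) @ (m1 - m2)  # w is proportional to Sw^-1 (m1 - m2)
    return w / np.linalg.norm(w)

# Projecting onto w reduces each sample to a single scalar:
# y1, y2 = X1 @ w, X2 @ w
```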
I was pleasantly surprised by this technique; I had completely underestimated it. I expected it to perform only slightly better than Bayesian classification, since after the dimensionality reduction a conventional Bayes classification still takes place. The FLD step does improve the separation, but I did not anticipate it would improve it this much. In fact, this turned out to be the most accurate classifier.
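Concretely, the classification step after the reduction is just a one-dimensional Gaussian Bayes decision on the projected values. A minimal sketch, with the class priors left as parameters (names are again illustrative):

```python
import numpy as np

def gaussian_bayes_1d(y1_train, y2_train, y, prior1=0.5, prior2=0.5):
    """Classify projected value(s) y by comparing one-dimensional Gaussian
    class-conditional log-posteriors fit to each class's training data."""
    def log_post(train, prior):
        mu, var = train.mean(), train.var()
        return (-0.5 * np.log(2 * np.pi * var)
                - (y - mu) ** 2 / (2 * var)
                + np.log(prior))
    # Label 1 where class 1 has the larger posterior, label 2 otherwise
    return np.where(log_post(y1_train, prior1) >= log_post(y2_train, prior2), 1, 2)
```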
As usual, I began by surveying the features one at a time against the validation data (a sketch of this survey loop follows the table). I was impressed by how many of them (22 out of 24) scored higher than the priors:
| Rate (%) | Feature # |
|---------:|----------:|
| 60.4167 | 7 |
| 60.4167 | 21 |
| 62.5000 | 14 |
| 64.5833 | 5 |
| 64.5833 | 17 |
| 66.6667 | 1 |
| 66.6667 | 2 |
| 66.6667 | 3 |
| 66.6667 | 4 |
| 66.6667 | 6 |
| 66.6667 | 8 |
| 66.6667 | 9 |
| 66.6667 | 10 |
| 66.6667 | 11 |
| 66.6667 | 12 |
| 66.6667 | 13 |
| 66.6667 | 15 |
| 66.6667 | 16 |
| 66.6667 | 18 |
| 66.6667 | 20 |
| 66.6667 | 22 |
| 66.6667 | 23 |
| 66.6667 | 24 |
| 68.7500 | 19 |
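The survey itself is a short loop. Reusing the two helpers sketched above, and with hypothetical per-class training and validation arrays `Xtr1`, `Xtr2`, `Xva1`, `Xva2`, it might look like this:

```python
import numpy as np

def validation_rate(feat_idx, Xtr1, Xtr2, Xva1, Xva2, prior1=0.5, prior2=0.5):
    """Train the FLD + Gaussian-Bayes pipeline on the chosen feature
    columns and return the validation classification rate in percent."""
    w = fisher_direction(Xtr1[:, feat_idx], Xtr2[:, feat_idx])
    y1, y2 = Xtr1[:, feat_idx] @ w, Xtr2[:, feat_idx] @ w
    p1 = gaussian_bayes_1d(y1, y2, Xva1[:, feat_idx] @ w, prior1, prior2)
    p2 = gaussian_bayes_1d(y1, y2, Xva2[:, feat_idx] @ w, prior1, prior2)
    correct = np.sum(p1 == 1) + np.sum(p2 == 2)
    return 100.0 * correct / (len(Xva1) + len(Xva2))

# Survey each of the 24 features on its own, sorted by rate:
# rates = sorted((validation_rate([f], Xtr1, Xtr2, Xva1, Xva2), f + 1)
#                for f in range(24))
```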
Improved Feature Separation
FLD improves the separation of otherwise poor features. For example, recall the track density features examined in detail in the Bayesian classification section; FLD improved them significantly:
| Feature | Bayes Rate w/o FLD | Bayes Rate w/ FLD |
|---|---|---|
| Kick Density | 60.66% | 66.66% |
| Snare Density | 60.66% | 66.66% |
| Clap Density | 60.66% | 66.66% |
One can also see this in the graphs of their distributions. The FLD transform made the datasets more separable, which is clearest in features 2 and 3, although not necessarily in feature 1. Also notice the vertical axis: the pre-FLD distributions peaked at heights of around 4, while the FLD transform significantly narrowed the distributions, radically lowering their variances:
![](img/fif01.jpg)
Feature 1 after FLD
![](img/fif02.jpg)
Feature 2 after FLD
![](img/fif03.jpg)
Feature 3 after FLD
Next, I selected groups of potentially useful features using forward, backward, and random selection processes and tested them (a sketch of the forward-selection loop appears after the table):
| Feature | Fwd | Bkwd | Rand 1 | Rand 2 | Rand 3 |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 0 | 0 |
| 2 | 1 | 0 | 1 | 1 | 1 |
| 3 | 1 | 0 | 1 | 0 | 1 |
| 4 | 0 | 0 | 0 | 0 | 1 |
| 5 | 1 | 0 | 1 | 0 | 1 |
| 6 | 1 | 0 | 0 | 1 | 0 |
| 7 | 1 | 0 | 0 | 1 | 1 |
| 8 | 1 | 1 | 1 | 1 | 1 |
| 9 | 0 | 0 | 1 | 0 | 0 |
| 10 | 0 | 1 | 0 | 1 | 0 |
| 11 | 1 | 0 | 1 | 0 | 0 |
| 12 | 1 | 1 | 1 | 0 | 1 |
| 13 | 1 | 0 | 0 | 0 | 1 |
| 14 | 0 | 0 | 0 | 0 | 0 |
| 15 | 0 | 0 | 0 | 1 | 0 |
| 16 | 1 | 1 | 1 | 0 | 1 |
| 17 | 0 | 0 | 1 | 1 | 0 |
| 18 | 1 | 1 | 0 | 1 | 1 |
| 19 | 0 | 0 | 0 | 1 | 0 |
| 20 | 1 | 1 | 1 | 0 | 0 |
| 21 | 0 | 1 | 1 | 1 | 0 |
| 22 | 0 | 1 | 1 | 1 | 0 |
| 23 | 1 | 1 | 0 | 1 | 0 |
| 24 | 0 | 1 | 1 | 0 | 1 |
| Validation Rate | 75.00% | 66.66% | 64.5833% | 72.9167% | 77.08% |
| Testing Rate | 60.3% | 58.1% | 61.0% | 56.6% | 61.03% |
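For reference, a greedy forward-selection loop along these lines could look like the following sketch; `rate_fn` stands in for the hypothetical `validation_rate` helper shown earlier, and this is a reconstruction rather than the original script:

```python
def forward_select(n_features, rate_fn):
    """Greedy forward selection: repeatedly add the single feature that most
    improves rate_fn(list_of_feature_indices) until no feature helps."""
    chosen, best_rate = [], 0.0
    improved = True
    while improved:
        improved, best_f = False, None
        for f in range(n_features):
            if f in chosen:
                continue
            r = rate_fn(chosen + [f])
            if r > best_rate:
                best_rate, best_f, improved = r, f, True
        if improved:
            chosen.append(best_f)
    return chosen, best_rate

# Example (with the hypothetical data arrays from the survey sketch):
# feats, rate = forward_select(24, lambda fs: validation_rate(fs, Xtr1, Xtr2, Xva1, Xva2))
```

Backward elimination and the random searches follow the same pattern with the selection rule swapped.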
Improved Separation
The following five graphs show the class distributions after FLD reduction for each selected feature group:
![](img/fi_c1.jpg)
Forward Selected Distributions
![](img/fi_c2.jpg)
Backward Selected Distributions
![](img/fi_c3.jpg)
Random #1 Selected Distributions
![](img/fi_c4.jpg)
Random #2 Selected Distributions
![](img/fi_c5.jpg)
Random #3 Selected Distributions
Cheating
Just out of curiosity, I ran the random feature-seeking script against the testing data in order to find the theoretical maximum possible classification rate. It reached a maximum of 65.44%. This result cannot be counted as an actual classification result; it is just a reference number suggesting how much better the classification might get with more data.
Results
The correct classification rate on the testing data for the group of features that scored highest against the validation data is 61.03%.