k Nearest Neighbor

Examining the graphs of features plotted against each other, one can make an observation: although the features are very entangled, there are some local pockets where one color prevails. This suggests that even if some slightly differing rhythms get misclassified, a method that looks at local neighborhoods might correctly classify the rhythms that are similar enough to fall in the same class.

For example, in the following graph there is a patch of red x's near the top, several patches of green o's on the right, and a few more green o's in the bottom left.

Graph: feature 12 vs. feature 9

This suggests that classification techniques that rely on the data surrounding a point could be useful; k Nearest Neighbor and Parzen windows come to mind. Both can generate extremely complex decision boundaries with relatively little computation. First, I looked at kNN.
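For reference, here is a minimal sketch of the kNN rule itself; the array names and the use of Euclidean distance are illustrative assumptions, not details fixed by this project.

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k=5):
        """Classify point x by majority vote among its k nearest training points."""
        dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every training point
        nearest = np.argsort(dists)[:k]              # indices of the k closest points
        votes = Counter(y_train[nearest])            # tally the class labels among those neighbors
        return votes.most_common(1)[0][0]            # the most common label wins

Each validation rhythm is simply assigned the class most common among its k closest training rhythms; nothing is fit beyond storing the training data.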

As usual, I swept through the features independently to find which ones might contribute the most (a sketch of this single-feature sweep follows the table below). I was pleased to note that many of them classified better than the Priors; I had not had such luck so far on this project. In fact, the top feature with kNN was nearly as good as the carefully sifted combination using Bayes. It was also intriguing to see that the top five features were 4, 12, 15, 16, and 17.

Classification Rate (%)  Feature #
57.4468 19
57.4468 22
57.4468 24
59.5745 1
59.5745 2
59.5745 14
59.5745 21
59.5745 23
61.7021 3
61.7021 9
61.7021 11
61.7021 13
63.8298 5
63.8298 7
63.8298 10
63.8298 18
63.8298 20
65.9574 6
65.9574 8
65.9574 17
68.0851 4
68.0851 12
68.0851 15
70.2128 16
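
Below is a sketch of that single-feature sweep, assuming scikit-learn and hypothetical feature matrices X_train / X_val (one row per rhythm, one column per feature) with label vectors y_train / y_val; the k used for the table is not recorded here, so the value in the sketch is a placeholder.

    from sklearn.neighbors import KNeighborsClassifier

    def sweep_single_features(X_train, y_train, X_val, y_val, k=5):
        """Rank each feature by the validation classification rate of a kNN
        classifier trained on that feature alone."""
        rates = []
        for f in range(X_train.shape[1]):
            clf = KNeighborsClassifier(n_neighbors=k)
            clf.fit(X_train[:, [f]], y_train)        # train on the single feature
            acc = clf.score(X_val[:, [f]], y_val)    # classification rate on the validation data
            rates.append((100.0 * acc, f + 1))       # feature numbers are 1-based, as in the table
        return sorted(rates)                         # ascending rate, like the table above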

In this graph, you can see the best two features plotted against each other. You will probably see lots of good local regions. Since these graphs represent my personal preferences, when I look at this I sometimes think, "This is what my brain looks like."

Graph: feature 12 vs. feature 15

Ironically enough, that is when I realized that the axes were off. Two points that are, say, a distance of 5 apart in the horizontal direction are very far apart, but the same distance in the vertical direction is not far at all. Therefore, I decided to pre-process the features for kNN classification, normalizing each one by subtracting its mean and dividing by the square root of its variance. This also makes it easier to relate the scaled graph to the results.
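
Here is a minimal sketch of that normalization step, assuming the features are stored in NumPy arrays (the array names are illustrative); the statistics are computed on the training set and reused for the other data.

    import numpy as np

    def normalize_features(X_train, X_other):
        """Scale each feature to zero mean and unit variance so that Euclidean
        distance treats every axis equally."""
        mu = X_train.mean(axis=0)
        sigma = X_train.std(axis=0)          # square root of the variance
        sigma[sigma == 0] = 1.0              # guard against constant features
        return (X_train - mu) / sigma, (X_other - mu) / sigma

Note that subtracting the mean does not change any pairwise distances; it is the division by the standard deviation that fixes the metric. The shift simply keeps the scaled plots centered and easier to read.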

The rightmost column in the table below shows that the classification rate was a little higher using this improved distance metric. It is also interesting to note that the optimal k for that result is k = 1, whereas the optimal k was much larger when the axes were uncorrected. All of these results are against the validation data. (A sketch of scoring a feature subset like these follows the table.)

Feat. # rand init fwd int bwd rand, norm.
1 0 1 1 0
2 0 1 0 0
3 0 1 1 0
4 1 1 1 1
5 0 0 1 0
6 0 0 0 1
7 0 1 1 0
8 0 0 0 1
9 0 0 1 0
10 0 0 1 1
11 0 0 0 1
12 0 1 1 1
13 0 0 0 1
14 0 0 1 0
15 1 1 1 1
16 1 1 1 0
17 0 0 1 0
18 0 0 1 0
19 0 0 0 0
20 0 0 0 1
21 0 0 0 0
22 0 0 0 1
23 0 0 1 0
24 0 0 0 0
Rate 76.60% 76.60% 72.34% 77.083%
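
The columns of this table appear to correspond to different feature-selection runs. Whatever search produced them, the inner loop is the same: score a candidate feature subset on the validation data over a range of k. Here is a sketch of that inner loop, with illustrative array names and assuming scikit-learn.

    from sklearn.neighbors import KNeighborsClassifier

    def score_subset(X_train, y_train, X_val, y_val, features, k_values=range(1, 26)):
        """Best validation classification rate, and the k that achieves it, for a
        kNN classifier restricted to the given 1-based feature numbers."""
        cols = [f - 1 for f in features]             # 1-based feature numbers -> column indices
        best_rate, best_k = 0.0, None
        for k in k_values:
            clf = KNeighborsClassifier(n_neighbors=k)
            clf.fit(X_train[:, cols], y_train)
            rate = 100.0 * clf.score(X_val[:, cols], y_val)
            if rate > best_rate:
                best_rate, best_k = rate, k
        return best_rate, best_k

    # For example, the normalized-feature subset from the rightmost column above
    # (Xn_train / Xn_val are hypothetical names for the normalized feature matrices):
    # score_subset(Xn_train, y_train, Xn_val, y_val, [4, 6, 8, 10, 11, 12, 13, 15, 20, 22])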


Overfitting

Of course, the feature selection scheme may have led to overfitting: when classifying the testing data using the best features above, the results were much less exciting, only 59.6%.

These results might have been improved with a better window function.
