Markov Model

This is a completely different kind of classifier from all those we have looked at so far. Since it models transitions between states over time, I hoped it would be a magic bullet: rhythm data are inherently temporal, just like a Markov Model. A rhythm consists of drum strikes arranged in a sequence of time steps, and certain combinations of drum strikes are more common in one class than in the other.

Consider the state transition matrices for the good and bad classes. Each grid gives the probability that one drum sound will be heard following another. It is easy to see, for example, that transitions from state 1 to state 1 (silence followed by silence) happen more often in bad rhythms than in good ones:

good =

    0.2702    0.3007    0.3064    0.3016    0.2842    0.2853    0.2485    0.2486
    0.1530    0.1608    0.1410    0.1467    0.1554    0.1706    0.1775    0.1568
    0.1694    0.1556    0.1718    0.1467    0.1554    0.1324    0.1420    0.1405
    0.0701    0.0745    0.0731    0.0870    0.0784    0.0706    0.0858    0.0649
    0.1580    0.1451    0.1474    0.1630    0.1421    0.1500    0.1627    0.1784
    0.0722    0.0641    0.0615    0.0571    0.0717    0.0706    0.0828    0.0811
    0.0708    0.0627    0.0641    0.0734    0.0691    0.0882    0.0592    0.0649
    0.0365    0.0366    0.0346    0.0245    0.0438    0.0324    0.0414    0.0649


bad =

    0.3178    0.2933    0.3176    0.2895    0.2934    0.3235    0.3012    0.3463
    0.1505    0.1538    0.1541    0.1654    0.1614    0.1581    0.1352    0.1245
    0.1430    0.1342    0.1645    0.1203    0.1504    0.1342    0.1639    0.1089
    0.0656    0.0871    0.0728    0.0846    0.0747    0.0551    0.0676    0.0739
    0.1567    0.1502    0.1238    0.1523    0.1411    0.1434    0.1639    0.1401
    0.0757    0.0818    0.0662    0.0902    0.0646    0.0735    0.0676    0.0739
    0.0599    0.0631    0.0699    0.0695    0.0729    0.0680    0.0717    0.0739
    0.0308    0.0364    0.0312    0.0282    0.0415    0.0441    0.0287    0.0584
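
As a rough sketch, a matrix like this can be estimated by counting transitions over a set of labeled state sequences and normalizing. The function below is illustrative, not the original code; it normalizes rows so that entry (i,j) approximates Pr[i -> j], which matches how the worked example below reads the matrices (the printed matrices themselves appear to have columns, not rows, summing to one, so the original normalization may have differed):

    function T = estimate_transitions(seqs, nstates)
      % seqs: cell array of state sequences (values 1..nstates).
      % Counts every transition, including the wrap-around from the last
      % step back to the first, then row-normalizes so that T(i,j)
      % approximates Pr[i -> j].
      C = zeros(nstates);
      for k = 1:numel(seqs)
        s = seqs{k};
        for t = 1:numel(s)
          u = s(t);
          v = s(mod(t, numel(s)) + 1);   % last step wraps around to the first
          C(u, v) = C(u, v) + 1;
        end
      end
      rowsums = sum(C, 2);
      rowsums(rowsums == 0) = 1;         % guard against empty rows
      T = C ./ rowsums;
    end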

For example, if a rhythm segment happened to have, as its first four steps:

kick, silent, snare, silent

that would get translated into states:
2,1,3,1

To determine which model the pattern fits best, that of the "good" rhythm or that of the "bad" rhythm, look up the three transition probabilities under both classes and multiply them:

Pr[2->1], Pr[1->3], Pr[3->1]

good: .1530 * .3064 * .1694 = .007941336480

bad: .1505 * .3176 * .1430 = .006835228400

Thus, it is more likely the pattern is a "good" rhythm.

Note that since the rhythm is repeated, the 16th transition is from the 16th step to the first step.
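
In code, the classification step might look like the sketch below (the names are mine, not the original script). Summing log-probabilities rather than multiplying avoids numerical underflow when all sixteen transitions are included:

    function label = classify_rhythm(s, good, bad)
      % s: a 16-step state sequence; good, bad: 8x8 transition
      % matrices read as entry (i,j) = Pr[i -> j].
      lg = 0;
      lb = 0;
      for t = 1:numel(s)
        u = s(t);
        v = s(mod(t, numel(s)) + 1);   % the 16th transition wraps to step 1
        lg = lg + log(good(u, v));
        lb = lb + log(bad(u, v));
      end
      if lg > lb
        label = 'good';
      else
        label = 'bad';
      end
    end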

Classification Results

Out of curiosity, I test-classified all three datasets, because I thought the model should have done much better, especially when classifying the data it was trained on:

Dataset      Classification Rate
Training     49.5%
Validation   64.6%
Testing      52.9%

A Possible Improvement

The results are not great. In trying to reason about why, I considered that, when hearing a rhythm, it is sometimes not so important which drum gets hit; what matters is simply that, at that instant in time, some drum gets hit. From this perspective, a rhythm consists of a repeating sequence of noises with varying amplitudes.

In other words, what I was trying to do was examine the transition of a feature through time. It might be possible, for example, to say that a rhythm consists of a short group of loud sounds, followed by a longer group of quiet ones, more often than other combinations of loud and quiet sounds for brief and long intervals.

At first, this concept sounded like too much additional work for this project. However, I noticed a shortcut that let me report some results anyway. I simply remapped the states 1-8 through a vector that looked like this:

remap=[ 1 2 2 3 2 3 3 4 ];

The significance of that vector is that each entry is one more than the number of drums struck simultaneously in the corresponding state. In other words, it is a rough measure of amplitude. A beat that goes:

kick+snare, silence, clap, silence

would get translated to:
4, 1, 5, 1 (state names)

3, 1, 2, 1 (magnitudes)
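
In MATLAB this remapping is a single indexing operation (the variable names here are illustrative):

    remap  = [1 2 2 3 2 3 3 4];
    states = [4 1 5 1];            % kick+snare, silence, clap, silence
    mags   = remap(states)         % prints 3 1 2 1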

Magnitude Classification Results

Once again, the results are not very good, even against the training dataset:

Dataset      Classification Rate
Training     50.7%
Validation   50.0%
Testing      50.8%

Arbitrary Remapping Classification Results

After thinking about this for a while, I realized that remapping the signals into a smaller subspace reduced the amount of information that could be encoded in the model. In other words, instead of an 8-state model, I now had a 4-state one.

So perhaps it was a better idea to remap the 8 states into a new 8-state space with 8 varying levels of amplitude. There are 8! possible such mappings, namely:

[1 2 3 4 5 6 7 8]

[2 1 3 4 5 6 7 8]

[3 1 2 4 5 6 7 8]

[3 2 1 4 5 6 7 8]

[2 3 1 4 5 6 7 8]

[4 2 3 1 5 6 7 8]

[ .... ]

[8 7 6 5 4 3 2 1]
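
For reference, MATLAB can enumerate the full set of 8! = 40320 permutations with perms, although an exhaustive search was not attempted here:

    allmaps = perms(1:8);    % one permutation per row
    size(allmaps, 1)         % prints 40320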

There was almost no time left on the assignment, so rather than explore this space carefully, I just sampled it randomly. This was easy, because the Markov Model classifier runs very quickly. I wrote a script that generated three hundred different amplitude remappings, classified with each of them, and saved the best remapping.
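
A sketch of that script, assuming a hypothetical helper evaluate_remap(m) that remaps the data and returns the resulting classification rate (the helper and variable names are mine, not the original code):

    best_rate = 0;
    best_map  = [];
    for trial = 1:300
      m = randi([0 7], 1, 8);     % a random amplitude remapping
      rate = evaluate_remap(m);   % hypothetical helper, see above
      if rate > best_rate
        best_rate = rate;
        best_map  = m;
      end
    end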

Unfortunately, this technique was very prone to overfitting. On the validation dataset, it produced 73% correct classification, but that number dropped to 49.6% on the testing data. The magnitude remapping it found was:

mags=[0 0 7 0 7 7 1 0]


I decided to repeat the experiment, training the magnitude remapper on the training set instead of the validation set. This resulted in a rate of 57.8% on validation data, which dropped to 51.6% on testing data. The magnitude remapping it chose was:

mags=[7 4 7 7 7 3 6 6]


Conclusion

Overall, the Markov Model in this application was not very effective at classifying the data. Why could that be? I believe the biggest weakness in the model is that it has no idea where in time it is. This is important, because these seven patterns would all get classified exactly the same, as the quick check after the list confirms, yet they sound quite different:
1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
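
A quick sanity check (my own snippet, not from the original write-up) confirms that every one of these patterns has the same circular transition statistics, so a first-order model assigns them identical likelihoods:

    % The seven patterns above, one per row (0 = silence, 1 = hit):
    P = [1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
         1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
         1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
         1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
         1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
         1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
         1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0];
    for r = 1:7
      s = P(r, :);
      n = s([2:end 1]);                    % circular next step
      fprintf('%d hit->silence, %d silence->hit, %d silence->silence\n', ...
              sum(s == 1 & n == 0), sum(s == 0 & n == 1), sum(s == 0 & n == 0));
    end

Every row prints the same tally: 2 hit-to-silence, 2 silence-to-hit, and 12 silence-to-silence transitions.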

Another chronic problem arises when two beats are similar but a middle section is shifted. This ruins the rhythm altogether, yet makes only a small difference in the probabilities. For example, if the states go:

1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

That might sound really nice. However, if a pair of mutations occurs (one step inserted and another deleted a few steps later), producing:

1 2 2 3 4 1 2 3 4 1 2 3 1 2 3 4

then the only changes to the likelihood of the sequence are to replace one transition from 3 to 4 with one from 2 to 2, and one transition from 4 to 1 with one from 3 to 1. Those changes could be very small, depending entirely on the rest of the training data. Only 2 of the 16 transition factors (1/8) have changed, but 10 of the 16 steps (5/8) of the rhythm have changed, so the rhythm may well have changed classes while the model barely notices.

Moreover, such a change in this model is indistinguishable from this sequence:

1 2 2 3 4 1 2 3 4 1 2 3 4 1 2 3
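
This is straightforward to verify by tallying circular transition counts; the helper transcounts below is my own sketch, not part of the original code:

    a = [1 2 2 3 4 1 2 3 4 1 2 3 1 2 3 4];
    b = [1 2 2 3 4 1 2 3 4 1 2 3 4 1 2 3];
    isequal(transcounts(a, 4), transcounts(b, 4))   % prints 1 (true)

    function C = transcounts(s, nstates)
      % C(i,j) = number of i -> j steps, with the last step
      % wrapping around to the first.
      C = zeros(nstates);
      for t = 1:numel(s)
        u = s(t);
        v = s(mod(t, numel(s)) + 1);
        C(u, v) = C(u, v) + 1;
      end
    end

Both calls return identical 4x4 count matrices, so the two sequences receive exactly the same likelihood under either model.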
