My reasoning for choosing this model was that a rhythm often moves through several phases over time. For example, it might spend a few steps in one state, alternating between kicks and silence, then switch to a new state consisting mostly of snares and kick+snare+clap hits. That second state is more likely to follow some states than others:
```
75% probability
clap   . . . . . . . X X . X X . X . .
snare  . . . . . . X X X X X . X X . .
kick   X . X . X . . X X . X . . X . .

25% probability
clap   . . . . . . X . X X X . X X . .
snare  . . . . . . X . X X X . X X . .
kick   X . X . X . . . . . . . . . . .
```
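To make this generative picture concrete, here is a minimal Python sketch (Python standing in for the MATLAB-style workflow whose output appears later in this section) that samples a 16-step pattern from a hypothetical two-state HMM. All of the probabilities below are invented for illustration, not taken from the trained models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state HMM over 3-bit step symbols (kick=1, snare=2, clap=4);
# the numbers are made up to illustrate the idea, not taken from the trained models.
symbols = np.array([0, 1, 2, 7])      # silence, kick, snare, kick+snare+clap
A = np.array([[0.9, 0.1],             # each state tends to persist and
              [0.1, 0.9]])            # occasionally hands off to the other
B = np.array([[0.5, 0.5, 0.0, 0.0],   # state 0: kicks and silence
              [0.0, 0.0, 0.6, 0.4]])  # state 1: snares and kick+snare+clap

state, steps = 0, []
for _ in range(16):
    steps.append(rng.choice(symbols, p=B[state]))  # emit a symbol
    state = rng.choice(2, p=A[state])              # then transition

# Render the sampled symbols back into a drum grid.
for name, bit in (("clap", 4), ("snare", 2), ("kick", 1)):
    print(f"{name:<6}", " ".join("X" if s & bit else "." for s in steps))
```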
I also believed that the concept of an internal state is similar to how a human percussionist thinks. Inside a musician's mind, the train of thought may resemble an HMM's state transitions: for example, "There's been enough of these drums for a bit, let's switch to some others."
Unfortunately, the HMM is still vulnerable to the same problems as the MMs in this context. It has no idea when a pattern is coming to a close, so it cannot repeat patterns properly, and it remains vulnerable to the mutation problem (mentioned in the sections on MMs). Finally, and perhaps worst of all, within an internal state it has no idea what the pattern is. For example, it cannot tell the difference between these two rhythms (in the figures below, each step's drum combination is encoded as a 3-bit symbol: kick = 1, snare = 2, clap = 4, so kick+snare+clap = 7):
```
clap   . . . . X . . . X . X . X . X .
snare  . . . X X X . . . X . X . X . X
kick   X . X X X X X . X . X . X . X .
state  1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
symbol 1 0 1 3 7 3 1 0 5 2 5 2 5 2 5 2

clap   . . X . . . . . X X X X . . . .
snare  . . X X X . . . . . . . X X X X
kick   X . X X X X X . X X X X . . . .
state  1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
symbol 1 0 7 3 3 1 1 0 5 5 5 5 2 2 2 2
```
Both rhythms yield the same estimated transition matrix a and the same per-state symbol densities:

```
a = [ 7/8  1/8
      1/8  7/8 ]

state 1 symbol density:  symbol 0: 2/8   symbol 1: 3/8   symbol 3: 2/8   symbol 7: 1/8
state 2 symbol density:  symbol 2: 1/2   symbol 5: 1/2
```
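The ambiguity is easy to check directly. This minimal Python sketch tallies per-state symbol counts for the two rhythms above, using the (symbol, state) pairs copied from the figures, and confirms the tallies are identical:

```python
from collections import Counter

# The two rhythms above as (symbol, state) pairs, copied from the figures.
r1 = [(1,1),(0,1),(1,1),(3,1),(7,1),(3,1),(1,1),(0,1),
      (5,2),(2,2),(5,2),(2,2),(5,2),(2,2),(5,2),(2,2)]
r2 = [(1,1),(0,1),(7,1),(3,1),(3,1),(1,1),(1,1),(0,1),
      (5,2),(5,2),(5,2),(5,2),(2,2),(2,2),(2,2),(2,2)]

def per_state_counts(seq):
    """Tally how often each symbol is emitted from each state."""
    counts = {}
    for symbol, state in seq:
        counts.setdefault(state, Counter())[symbol] += 1
    return counts

# Same state path and same per-state symbol counts, so an HMM assigns the
# two rhythms identical likelihood: the ordering within a state is lost.
assert per_state_counts(r1) == per_state_counts(r2)
print(per_state_counts(r1))
```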
HMMs have other issues as well: the right number of internal states is not obvious, and they can take a long time to train.
Given these limitations, I gave them a very brief treatment. I trained only once on each class, with 8 internal states. I picked 8 because it equaled the number of transition states in the earlier MM models, so the two could be crudely compared. It also trained within a reasonable time (about half an hour per class).
Each class was trained until every entry of every matrix had converged to within 5 percent of its previous value. This typically required about 28 iterations.
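A minimal sketch of this stopping rule as I read it, applied entry-wise to successive parameter estimates (the re-estimation step itself, e.g. one round of Baum-Welch, is omitted):

```python
import numpy as np

def converged(old_mats, new_mats, tol=0.05):
    """True once every entry of every matrix is within tol (here 5%) of its
    value from the previous iteration. The 1e-12 floor means entries that
    were exactly zero must stay (numerically) zero."""
    return all(
        np.all(np.abs(new - old) <= tol * np.maximum(np.abs(old), 1e-12))
        for old, new in zip(old_mats, new_mats)
    )

# Example: the largest relative change below is about 3.3%, so training stops.
A_prev = np.array([[0.70, 0.30], [0.40, 0.60]])
A_next = np.array([[0.71, 0.29], [0.41, 0.59]])
print(converged([A_prev], [A_next]))  # True
```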
Given that the number of internal states matched the number of transition states in the earlier MM models, I anticipated that this model would produce somewhat better results than the MMs, because it had at least as much information encoded into it.
However, when I looked at the final results for b, the symbol distribution matrix, I was disappointed. In both classes, several columns contained suspiciously low values. The model had converged to a local maximum of the likelihood in which columns 4, 7 and 8 of class 1's b, and columns 2, 6, 7 and 8 of class 2's, were near zero. The lowest value in class 1's b was about 10^-7; in class 2 it was about 10^-9. I do not believe this could actually represent the symbol distribution:
```
mb(:,:,1) =
  0.3260  0.0869  0.4682  0.0000  0.1181  0.0008  0.0000  0.0000
  0.4737  0.1128  0.3293  0.0000  0.0840  0.0002  0.0000  0.0000
  0.7893  0.0270  0.1119  0.0000  0.0718  0.0000  0.0000  0.0000
  0.6469  0.0531  0.2775  0.0000  0.0214  0.0011  0.0000  0.0000
  0.6442  0.0479  0.2277  0.0000  0.0800  0.0001  0.0000  0.0000
  0.2791  0.0986  0.5953  0.0000  0.0267  0.0004  0.0000  0.0000
  0.8248  0.0315  0.1259  0.0000  0.0173  0.0004  0.0000  0.0000
  0.4712  0.0522  0.4056  0.0000  0.0707  0.0002  0.0000  0.0000

mb(:,:,2) =
  0.9017  0.0000  0.0483  0.0236  0.0263  0.0000  0.0000  0.0000
  0.8075  0.0000  0.0217  0.0561  0.1147  0.0000  0.0000  0.0000
  0.7639  0.0000  0.0570  0.0906  0.0885  0.0000  0.0000  0.0000
  0.8810  0.0000  0.0213  0.0542  0.0434  0.0000  0.0000  0.0000
  0.1728  0.0000  0.7097  0.0280  0.0894  0.0000  0.0000  0.0000
  0.7960  0.0000  0.0697  0.0758  0.0584  0.0000  0.0000  0.0000
  0.3455  0.0001  0.3376  0.2912  0.0255  0.0000  0.0000  0.0000
  0.8222  0.0000  0.1581  0.0154  0.0043  0.0000  0.0000  0.0000

ma(:,:,1) =
  0.0803  0.0917  0.1472  0.1020  0.1437  0.1145  0.1973  0.1234
  0.0849  0.0995  0.1504  0.0995  0.1475  0.1058  0.1922  0.1202
  0.1278  0.1192  0.1229  0.0872  0.1412  0.1359  0.1321  0.1337
  0.0851  0.1038  0.1574  0.1000  0.1496  0.0956  0.1947  0.1138
  0.1089  0.1103  0.1340  0.0924  0.1448  0.1243  0.1555  0.1298
  0.0513  0.0771  0.1848  0.1079  0.1469  0.0696  0.2656  0.0967
  0.1074  0.1157  0.1429  0.0931  0.1487  0.1086  0.1619  0.1218
  0.0858  0.0975  0.1509  0.0994  0.1472  0.1076  0.1897  0.1219

ma(:,:,2) =
  0.1919  0.1731  0.1612  0.1553  0.0569  0.1427  0.0294  0.0893
  0.2033  0.1859  0.1534  0.1685  0.0341  0.1446  0.0212  0.0892
  0.2105  0.1639  0.1593  0.1595  0.0421  0.1418  0.0220  0.1009
  0.1970  0.1836  0.1591  0.1660  0.0394  0.1437  0.0246  0.0866
  0.1612  0.0743  0.0964  0.0826  0.2842  0.1061  0.0379  0.1574
  0.2002  0.1665  0.1527  0.1541  0.0561  0.1434  0.0266  0.1003
  0.1994  0.1015  0.1122  0.1111  0.1438  0.1266  0.0317  0.1738
  0.1817  0.1458  0.1516  0.1347  0.1090  0.1364  0.0382  0.1026
```
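The collapsed columns are easy to confirm mechanically. This sketch copies class 1's b from the printout above and flags every column whose largest entry falls below 10^-3:

```python
import numpy as np

# Class 1's emission matrix, copied from the printout above.
mb1 = np.array([
    [0.3260, 0.0869, 0.4682, 0.0000, 0.1181, 0.0008, 0.0000, 0.0000],
    [0.4737, 0.1128, 0.3293, 0.0000, 0.0840, 0.0002, 0.0000, 0.0000],
    [0.7893, 0.0270, 0.1119, 0.0000, 0.0718, 0.0000, 0.0000, 0.0000],
    [0.6469, 0.0531, 0.2775, 0.0000, 0.0214, 0.0011, 0.0000, 0.0000],
    [0.6442, 0.0479, 0.2277, 0.0000, 0.0800, 0.0001, 0.0000, 0.0000],
    [0.2791, 0.0986, 0.5953, 0.0000, 0.0267, 0.0004, 0.0000, 0.0000],
    [0.8248, 0.0315, 0.1259, 0.0000, 0.0173, 0.0004, 0.0000, 0.0000],
    [0.4712, 0.0522, 0.4056, 0.0000, 0.0707, 0.0002, 0.0000, 0.0000],
])
# Flag columns whose probability is below 1e-3 in every row (state).
dead = np.where(mb1.max(axis=0) < 1e-3)[0] + 1  # 1-indexed columns
print("near-zero columns:", dead)                # -> [4 7 8]
```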
Despite all of this, the results were an improvement over the first- and second-order MMs. The following table shows testing against all three datasets:
| Dataset    | Classification Rate |
|------------|---------------------|
| Training   | 58.1%               |
| Validation | 68.8%               |
| Testing    | 58.6%               |
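The scoring procedure is not spelled out above, but the standard approach for HMM classification, and presumably the one used here, is to evaluate each rhythm's likelihood under both class models with the forward algorithm and pick the higher-scoring class. A minimal sketch with placeholder two-state models (not the trained 8-state ones):

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Forward algorithm in log space. obs is a list of symbol indices;
    pi, A, B are the initial, transition, and emission matrices."""
    log_alpha = np.log(pi) + np.log(B[:, obs[0]])
    for o in obs[1:]:
        log_alpha = (np.logaddexp.reduce(log_alpha[:, None] + np.log(A), axis=0)
                     + np.log(B[:, o]))
    return np.logaddexp.reduce(log_alpha)

def classify(obs, models):
    """Return the index of the model that assigns obs the highest likelihood."""
    return int(np.argmax([log_likelihood(obs, *m) for m in models]))

# Placeholder models over two symbols; both start mostly in state 0.
pi = np.array([0.9, 0.1])
A  = np.array([[0.9, 0.1], [0.1, 0.9]])
B0 = np.array([[0.8, 0.2], [0.2, 0.8]])  # model 0: state 0 favors symbol 0
B1 = np.array([[0.2, 0.8], [0.8, 0.2]])  # model 1: state 0 favors symbol 1
print(classify([0, 0, 0, 1, 1, 1], [(pi, A, B0), (pi, A, B1)]))  # -> 0
```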