My reasoning for choosing this model was that a rhythm often moves through several phases over time. For example, it might spend a few steps in one state, alternating between kicks and silence, then switch to a new state consisting mostly of snares and kick+snare+clap hits. That second state is more likely to follow some states than others:
```
75% probability
clap   . . . . . . . X X . X X . X . .
snare  . . . . . . X X X X X . X X . .
kick   X . X . X . . X X . X . . X . .

25% probability
clap   . . . . . . X . X X X . X X . .
snare  . . . . . . X . X X X . X X . .
kick   X . X . X . . . . . . . . . . .
```
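To make this generative picture concrete, here is a minimal Python sketch (Python standing in for the MATLAB-style workflow whose output appears later in this section) that samples a 16-step pattern from a hypothetical two-state HMM. All of the probabilities below are invented for illustration, not taken from the trained models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state HMM over 3-bit step symbols (kick=1, snare=2, clap=4);
# the numbers are made up to illustrate the idea, not taken from the trained models.
symbols = np.array([0, 1, 2, 7])      # silence, kick, snare, kick+snare+clap
A = np.array([[0.9, 0.1],             # each state tends to persist and
              [0.1, 0.9]])            # occasionally hands off to the other
B = np.array([[0.5, 0.5, 0.0, 0.0],   # state 0: kicks and silence
              [0.0, 0.0, 0.6, 0.4]])  # state 1: snares and kick+snare+clap

state, steps = 0, []
for _ in range(16):
    steps.append(rng.choice(symbols, p=B[state]))  # emit a symbol
    state = rng.choice(2, p=A[state])              # then transition

# Render the sampled symbols back into a drum grid.
for name, bit in (("clap", 4), ("snare", 2), ("kick", 1)):
    print(f"{name:<6}", " ".join("X" if s & bit else "." for s in steps))
```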
I also believed that the concept of an internal state is similar to how a human percussionist thinks. Inside a musician's mind, the train of thought may resemble an HMM's state transitions: for example, "There's been enough of these drums for a bit, let's switch to some others."
Unfortunately, the HMM is still vulnerable to the same problems as the MMs in this context. It has no idea when a pattern is coming to a close, so it cannot repeat patterns properly, and it remains vulnerable to the mutation problem (mentioned in the sections on MMs). Finally, and perhaps worst of all, within an internal state it has no idea what the pattern is. For example, it cannot tell the difference between these two rhythms (in the figures below, each step's drum combination is encoded as a 3-bit symbol: kick = 1, snare = 2, clap = 4, so kick+snare+clap = 7):
```
clap   . . . . X . . . X . X . X . X .
snare  . . . X X X . . . X . X . X . X
kick   X . X X X X X . X . X . X . X .
state  1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
symbol 1 0 1 3 7 3 1 0 5 2 5 2 5 2 5 2

clap   . . X . . . . . X X X X . . . .
snare  . . X X X . . . . . . . X X X X
kick   X . X X X X X . X X X X . . . .
state  1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
symbol 1 0 7 3 3 1 1 0 5 5 5 5 2 2 2 2
```
Both rhythms yield the same estimated transition matrix a and the same per-state symbol densities:

```
a = [ 7/8  1/8
      1/8  7/8 ]

state 1 symbol density:  symbol 0: 2/8   symbol 1: 3/8   symbol 3: 2/8   symbol 7: 1/8
state 2 symbol density:  symbol 2: 1/2   symbol 5: 1/2
```
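The ambiguity is easy to check directly. This minimal Python sketch tallies per-state symbol counts for the two rhythms above, using the (symbol, state) pairs copied from the figures, and confirms the tallies are identical:

```python
from collections import Counter

# The two rhythms above as (symbol, state) pairs, copied from the figures.
r1 = [(1,1),(0,1),(1,1),(3,1),(7,1),(3,1),(1,1),(0,1),
      (5,2),(2,2),(5,2),(2,2),(5,2),(2,2),(5,2),(2,2)]
r2 = [(1,1),(0,1),(7,1),(3,1),(3,1),(1,1),(1,1),(0,1),
      (5,2),(5,2),(5,2),(5,2),(2,2),(2,2),(2,2),(2,2)]

def per_state_counts(seq):
    """Tally how often each symbol is emitted from each state."""
    counts = {}
    for symbol, state in seq:
        counts.setdefault(state, Counter())[symbol] += 1
    return counts

# Same state path and same per-state symbol counts, so an HMM assigns the
# two rhythms identical likelihood: the ordering within a state is lost.
assert per_state_counts(r1) == per_state_counts(r2)
print(per_state_counts(r1))
```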
HMMs have other issues as well: the right number of internal states is not obvious, and they can take a long time to train.
Given these limitations, I gave them a very brief treatment. I trained only once on each class, with 8 internal states. I picked 8 because it equaled the number of transition states in the earlier MM models, so the two could be crudely compared. It also trained within a reasonable time (about half an hour per class).
Each class was trained until every entry of every matrix had converged to within 5 percent of its previous value. This typically required about 28 iterations.
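A minimal sketch of this stopping rule as I read it, applied entry-wise to successive parameter estimates (the re-estimation step itself, e.g. one round of Baum-Welch, is omitted):

```python
import numpy as np

def converged(old_mats, new_mats, tol=0.05):
    """True once every entry of every matrix is within tol (here 5%) of its
    value from the previous iteration. The 1e-12 floor means entries that
    were exactly zero must stay (numerically) zero."""
    return all(
        np.all(np.abs(new - old) <= tol * np.maximum(np.abs(old), 1e-12))
        for old, new in zip(old_mats, new_mats)
    )

# Example: the largest relative change below is about 3.3%, so training stops.
A_prev = np.array([[0.70, 0.30], [0.40, 0.60]])
A_next = np.array([[0.71, 0.29], [0.41, 0.59]])
print(converged([A_prev], [A_next]))  # True
```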
Given that the number of internal states matched the number of transition states in the earlier MM models, I anticipated that this model would produce somewhat better results than the MMs, because it had at least as much information encoded into it.
However, when I looked at the final results for b, the symbol distribution matrix, I was disappointed. In both classes, several columns contained suspiciously low values. The model had converged to a local maximum of the likelihood in which columns 4, 7 and 8 of class 1's b, and columns 2, 6, 7 and 8 of class 2's, were near zero. The lowest value in class 1's b was about 10^-7; in class 2 it was about 10^-9. I do not believe this could actually represent the symbol distribution:
```
mb(:,:,1) =
  0.3260  0.0869  0.4682  0.0000  0.1181  0.0008  0.0000  0.0000
  0.4737  0.1128  0.3293  0.0000  0.0840  0.0002  0.0000  0.0000
  0.7893  0.0270  0.1119  0.0000  0.0718  0.0000  0.0000  0.0000
  0.6469  0.0531  0.2775  0.0000  0.0214  0.0011  0.0000  0.0000
  0.6442  0.0479  0.2277  0.0000  0.0800  0.0001  0.0000  0.0000
  0.2791  0.0986  0.5953  0.0000  0.0267  0.0004  0.0000  0.0000
  0.8248  0.0315  0.1259  0.0000  0.0173  0.0004  0.0000  0.0000
  0.4712  0.0522  0.4056  0.0000  0.0707  0.0002  0.0000  0.0000

mb(:,:,2) =
  0.9017  0.0000  0.0483  0.0236  0.0263  0.0000  0.0000  0.0000
  0.8075  0.0000  0.0217  0.0561  0.1147  0.0000  0.0000  0.0000
  0.7639  0.0000  0.0570  0.0906  0.0885  0.0000  0.0000  0.0000
  0.8810  0.0000  0.0213  0.0542  0.0434  0.0000  0.0000  0.0000
  0.1728  0.0000  0.7097  0.0280  0.0894  0.0000  0.0000  0.0000
  0.7960  0.0000  0.0697  0.0758  0.0584  0.0000  0.0000  0.0000
  0.3455  0.0001  0.3376  0.2912  0.0255  0.0000  0.0000  0.0000
  0.8222  0.0000  0.1581  0.0154  0.0043  0.0000  0.0000  0.0000

ma(:,:,1) =
  0.0803  0.0917  0.1472  0.1020  0.1437  0.1145  0.1973  0.1234
  0.0849  0.0995  0.1504  0.0995  0.1475  0.1058  0.1922  0.1202
  0.1278  0.1192  0.1229  0.0872  0.1412  0.1359  0.1321  0.1337
  0.0851  0.1038  0.1574  0.1000  0.1496  0.0956  0.1947  0.1138
  0.1089  0.1103  0.1340  0.0924  0.1448  0.1243  0.1555  0.1298
  0.0513  0.0771  0.1848  0.1079  0.1469  0.0696  0.2656  0.0967
  0.1074  0.1157  0.1429  0.0931  0.1487  0.1086  0.1619  0.1218
  0.0858  0.0975  0.1509  0.0994  0.1472  0.1076  0.1897  0.1219

ma(:,:,2) =
  0.1919  0.1731  0.1612  0.1553  0.0569  0.1427  0.0294  0.0893
  0.2033  0.1859  0.1534  0.1685  0.0341  0.1446  0.0212  0.0892
  0.2105  0.1639  0.1593  0.1595  0.0421  0.1418  0.0220  0.1009
  0.1970  0.1836  0.1591  0.1660  0.0394  0.1437  0.0246  0.0866
  0.1612  0.0743  0.0964  0.0826  0.2842  0.1061  0.0379  0.1574
  0.2002  0.1665  0.1527  0.1541  0.0561  0.1434  0.0266  0.1003
  0.1994  0.1015  0.1122  0.1111  0.1438  0.1266  0.0317  0.1738
  0.1817  0.1458  0.1516  0.1347  0.1090  0.1364  0.0382  0.1026
```
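The collapsed columns are easy to confirm mechanically. This sketch copies class 1's b from the printout above and flags every column whose largest entry falls below 10^-3:

```python
import numpy as np

# Class 1's emission matrix, copied from the printout above.
mb1 = np.array([
    [0.3260, 0.0869, 0.4682, 0.0000, 0.1181, 0.0008, 0.0000, 0.0000],
    [0.4737, 0.1128, 0.3293, 0.0000, 0.0840, 0.0002, 0.0000, 0.0000],
    [0.7893, 0.0270, 0.1119, 0.0000, 0.0718, 0.0000, 0.0000, 0.0000],
    [0.6469, 0.0531, 0.2775, 0.0000, 0.0214, 0.0011, 0.0000, 0.0000],
    [0.6442, 0.0479, 0.2277, 0.0000, 0.0800, 0.0001, 0.0000, 0.0000],
    [0.2791, 0.0986, 0.5953, 0.0000, 0.0267, 0.0004, 0.0000, 0.0000],
    [0.8248, 0.0315, 0.1259, 0.0000, 0.0173, 0.0004, 0.0000, 0.0000],
    [0.4712, 0.0522, 0.4056, 0.0000, 0.0707, 0.0002, 0.0000, 0.0000],
])
# Flag columns whose probability is below 1e-3 in every row (state).
dead = np.where(mb1.max(axis=0) < 1e-3)[0] + 1  # 1-indexed columns
print("near-zero columns:", dead)                # -> [4 7 8]
```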
Despite all of this, the results were an improvement over the first- and second-order MMs. The following table shows testing against all three datasets:
| Dataset    | Classification Rate |
|------------|---------------------|
| Training   | 58.1%               |
| Validation | 68.8%               |
| Testing    | 58.6%               |
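The scoring procedure is not spelled out above, but the standard approach for HMM classification, and presumably the one used here, is to evaluate each rhythm's likelihood under both class models with the forward algorithm and pick the higher-scoring class. A minimal sketch with placeholder two-state models (not the trained 8-state ones):

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Forward algorithm in log space. obs is a list of symbol indices;
    pi, A, B are the initial, transition, and emission matrices."""
    log_alpha = np.log(pi) + np.log(B[:, obs[0]])
    for o in obs[1:]:
        log_alpha = (np.logaddexp.reduce(log_alpha[:, None] + np.log(A), axis=0)
                     + np.log(B[:, o]))
    return np.logaddexp.reduce(log_alpha)

def classify(obs, models):
    """Return the index of the model that assigns obs the highest likelihood."""
    return int(np.argmax([log_likelihood(obs, *m) for m in models]))

# Placeholder models over two symbols; both start mostly in state 0.
pi = np.array([0.9, 0.1])
A  = np.array([[0.9, 0.1], [0.1, 0.9]])
B0 = np.array([[0.8, 0.2], [0.2, 0.8]])  # model 0: state 0 favors symbol 0
B1 = np.array([[0.2, 0.8], [0.8, 0.2]])  # model 1: state 0 favors symbol 1
print(classify([0, 0, 0, 1, 1, 1], [(pi, A, B0), (pi, A, B1)]))  # -> 0
```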