Features

What features best describe this dataset? Since this is a somewhat new problem, they are not set in stone. One constraint, though, is that all features must be rotationally invariant along the time axis: because a rhythm can begin on any of its 16 steps, up to 16 different grids may represent the same rhythm at different phases. When we listen to rhythms repeating, the mind and body work together to get into the groove (or not), and assign the strongest part of the sound to the first step in the grid. For example, all three grids below sound exactly the same after a few repetitions:

clap   X . . X . . X . . . . . . . . . 

snare  . . . . . . . . X . . . . . X . 

kick   X . . . X . . . X . X . X . . . 

clap   . X . . X . . X . . . . . . . . 

snare  . . . . . . . . . X . . . . . X 

kick   . X . . . X . . . X . X . X . . 

clap   . . X . . X . . X . . . . . . . 

snare  X . . . . . . . . . X . . . . . 

kick   . . X . . . X . . . X . X . X . 
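One way to make this invariance concrete is to map every grid to a canonical rotation, so that all 16 phases of a rhythm collapse to a single representative. The following is a minimal sketch, not code from the original project; the function names are illustrative.

```python
# Sketch: treat each track as a 16-character string of "X" and ".".
# Two grids are the "same rhythm" if one is a rotation of the other.

def rotate(grid, k):
    """Rotate every track of a grid left by k steps."""
    return [track[k:] + track[:k] for track in grid]

def canonical(grid):
    """The lexicographically smallest of all 16 rotations."""
    return min(rotate(grid, k) for k in range(16))

def same_rhythm(a, b):
    return canonical(a) == canonical(b)

# The first two grids above: the second is the first shifted right one step.
g1 = ["X..X..X.........",   # clap
      "........X.....X.",   # snare
      "X...X...X.X.X..."]   # kick
g2 = rotate(g1, 15)          # right shift by 1 == left shift by 15

print(same_rhythm(g1, g2))  # True
```

Any rotation-invariant feature can then be computed on `canonical(grid)` without worrying about phase.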

Density

After rating a number of rhythms, some features came to mind. For example, a track that was very dense, that is, one with a large number of hits out of the 16 steps, usually sounded bad. Therefore, three features were computed: the density of each of the three tracks. In the following grid, the clap track is very dense (13/16), the snare track is very sparse (2/16), and the kick track is somewhere in between (6/16):

clap   X X X X . X X X X . X X X X . X 

snare  . . . . . . X . . . . . . . . X 

kick   . . . X X . . X . X . . X X . . 

Frequency Domain

Since rhythms are periodic signals, and often seem to embody periodic signals within themselves, it makes sense to examine them in the frequency domain. Important point: in this project, the frequency domain representations of the grids were examined, as opposed to the audio of the actual drum sounds being played.

Although the grid is two-dimensional, the experience of the sound is one-dimensional in time. Therefore, the 3 tracks were summed together to create a 16-step discrete signal, whose frequency domain was examined. To build this signal, drum hits were replaced with 1's and silences with 0's, and then the three tracks were summed in the vertical direction. For example:

clap   X . . . . . . . . . . . X . . X 

snare  X . X . . . . . X . X . X . . . 

kick   X . X . X X X X . . . . X . . . 

clap   1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 

snare  1 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 

kick   1 0 1 0 1 1 1 1 0 0 0 0 1 0 0 0 

sum    3 0 2 0 1 1 1 1 1 0 1 0 3 0 0 1 

fd=fft([3 0 2 0 1 1 1 1 1 0 1 0 3 0 0 1]); 
Since the signal "sum" is purely real, its spectrum is conjugate-symmetric (bins 9-15 mirror bins 1-7), and the DC offset is already accounted for in the track densities, so the only relevant components are the 1st through 8th harmonics.
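The same computation in pure Python, for readers without MATLAB, using the standard library's cmath instead of fft (a sketch, equivalent up to floating-point error):

```python
import cmath

def harmonics(signal):
    """Magnitudes of harmonics 1..8 of a 16-step signal (naive DFT)."""
    n = len(signal)
    mags = []
    for k in range(1, 9):
        bin_k = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        mags.append(abs(bin_k))
    return mags

total = [3, 0, 2, 0, 1, 1, 1, 1, 1, 0, 1, 0, 3, 0, 0, 1]
print(harmonics(total))
```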

Harmonic Centroid

In addition to the 8 harmonic magnitude features, several more metrics were generated. In order to differentiate rhythms with lots of high-frequency components from those dominated by low-frequency components, a harmonic centroid feature was computed: the sum of the magnitudes of the eight harmonics, each multiplied by its index.
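As stated, the feature is an index-weighted sum of the eight magnitudes (the `mags` input here is assumed to come from a DFT of the summed grid, as in the previous section):

```python
# Harmonic centroid: each harmonic magnitude weighted by its index 1..8.

def harmonic_centroid(mags):
    return sum(k * m for k, m in enumerate(mags, start=1))

# Energy concentrated in high harmonics scores higher:
low  = [4, 0, 0, 0, 0, 0, 0, 0]
high = [0, 0, 0, 0, 0, 0, 0, 4]
print(harmonic_centroid(low), harmonic_centroid(high))  # 4 32
```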

Sub Magnitude

Since rhythms often seem to embody subrhythms at half (or less) the length of their fundamental period, I computed a metric called "submags": the combined magnitude of harmonics 2, 4 and 8 divided by the sum of all 8 harmonic magnitudes.
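A direct sketch of that ratio (again taking the eight harmonic magnitudes as input):

```python
def submags(mags):
    # mags[k-1] is the magnitude of harmonic k
    sub = mags[1] + mags[3] + mags[7]   # harmonics 2, 4 and 8
    return sub / sum(mags)

mags = [1, 2, 1, 2, 1, 2, 1, 2]
print(submags(mags))  # (2+2+2)/12 = 0.5
```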

Phase Correlation

It is clear that, although the phase of the overall rhythm is irrelevant, the phase of each track with respect to the other tracks is important. For example, the two following grids sound quite different, even though the only differences between them are the relative phases of the clap and snare tracks with respect to the kick.

clap   X . . . X . . . X . . X . . X . 

snare  X . . . X . . . X . . . X . . . 

kick   X . . . X . . . X . . . X . . . 

clap   X . X . . . X . . . X . . X . . 

snare  . X . . . X . . . X . . . X . . 

kick   X . . . X . . . X . . . X . . . 

Therefore, the frequency domain representation of each of the tracks was taken, and the angles of their first eight harmonics were computed. A feature called Phase Correlation was generated: for each harmonic, the three shortest angular paths between the phase angles of each pair of tracks are summed. The phase correlation of the first pattern above is higher than that of the second.
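One reading of that definition, sketched below with illustrative names. Note that a raw sum of angular distances grows as tracks drift out of phase, so the project may have negated or rescaled it to make "correlation" the right word; the distance sum itself is what the text describes.

```python
import cmath, itertools

def track_angles(track):
    """Phase angles of harmonics 1..8 for one 16-step track."""
    sig = [1 if c == "X" else 0 for c in track]
    n = len(sig)
    return [cmath.phase(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                            for t, x in enumerate(sig)))
            for k in range(1, 9)]

def angle_dist(a, b):
    """Shortest path between two angles on the circle."""
    d = abs(a - b) % (2 * cmath.pi)
    return min(d, 2 * cmath.pi - d)

def phase_correlation(grid):
    angles = [track_angles(t) for t in grid]
    total = 0.0
    for k in range(8):                                  # each harmonic
        for i, j in itertools.combinations(range(3), 2):  # 3 track pairs
            total += angle_dist(angles[i][k], angles[j][k])
    return total

# Three perfectly aligned tracks have zero total angular distance:
print(phase_correlation(["X...X...X...X..."] * 3))  # 0.0
```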

Dissonance

In 1969, Kameoka and Kuriyagawa proposed an algorithm for calculating the sensory dissonance of audio waveforms from their frequency domain representations. Essentially, it is a weighted sum of products of harmonic magnitudes whose frequency ratios are less than 2. Since some people, including the author, believe there is a connection between the mind's perception of harmony and rhythm, I decided to include this as a feature. The weights K&K used correspond to measured properties of the human ear. Since this is a different domain, I eschewed these; given the time constraint, I made them all equal, although with more research time I would like to explore the optimization of this feature.
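The equal-weight variant described above reduces to a short loop. This is a sketch of that simplification, not K&K's original algorithm, which uses ear-derived weights:

```python
# Equal-weight "dissonance": sum the products of every pair of harmonic
# magnitudes whose frequency (index) ratio is less than 2.

def dissonance(mags):
    total = 0.0
    for i in range(len(mags)):
        for j in range(i + 1, len(mags)):
            if (j + 1) / (i + 1) < 2:      # harmonic indices are i+1, j+1
                total += mags[i] * mags[j]
    return total

# With flat magnitudes, 12 of the 28 pairs have ratio < 2:
print(dissonance([1] * 8))  # 12.0
```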

Repetitiveness

While rating the rhythms for this project, it became clear that some patterns with a large amount of repetition were unpleasing. Therefore, a metric was computed representing how many times short patterns of beats were sequentially repeated.
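The text does not give an exact formula, so the following is one hedged reading: count how often a length-L chunk of a track (or of the summed pattern) is immediately repeated, over a few chunk lengths.

```python
# Hypothetical repetitiveness score: +1 whenever a chunk equals the chunk
# immediately following it, for chunk lengths 1, 2 and 4.

def repetitiveness(track, lengths=(1, 2, 4)):
    score = 0
    for L in lengths:
        for i in range(0, len(track) - L, L):
            if track[i:i + L] == track[i + L:i + 2 * L]:
                score += 1
    return score

print(repetitiveness("X.X.X.X.X.X.X.X."))   # highly repetitive
print(repetitiveness("X..X.X...XX..X.."))   # less so
```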

State Probability

While examining the results of Markov modeling, a new feature, a variation on repetitiveness, came to mind: the probability density of each "state", where a state is a number from 0-7 representing which drums are hit simultaneously on a given step. Each track has a power-of-two weight: kick=1, snare=2, clap=4. For example, if all drums sound at once, the state is 7; silence corresponds to 0.

Given the expected probability distribution of each track, deviations can be measured. Specifically, at the time of generation, each drum sound at each time step has a 1/3 chance of being enabled. Therefore, the expected probability density of each state is as follows:

state  calculation        probability
0      (2/3)^3*(1/3)^0    .2963
1      (2/3)^2*(1/3)^1    .1481
2      (2/3)^2*(1/3)^1    .1481
3      (2/3)^1*(1/3)^2    .0741
4      (2/3)^2*(1/3)^1    .1481
5      (2/3)^1*(1/3)^2    .0741
6      (2/3)^1*(1/3)^2    .0741
7      (2/3)^0*(1/3)^3    .0370
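The state encoding and the expected distribution in the table can be sketched as follows (illustrative names; the expected values follow directly from each drum having a 1/3 chance of sounding):

```python
from collections import Counter

def states(clap, snare, kick):
    """State 0-7 per step: kick=1, snare=2, clap=4."""
    return [4 * (c == "X") + 2 * (s == "X") + 1 * (k == "X")
            for c, s, k in zip(clap, snare, kick)]

def state_density(clap, snare, kick):
    """Empirical probability of each of the 8 states over 16 steps."""
    counts = Counter(states(clap, snare, kick))
    return [counts[s] / 16 for s in range(8)]

# Expected density of state s: (2/3)^(drums silent) * (1/3)^(drums hit)
expected = [(2/3) ** (3 - bin(s).count("1")) * (1/3) ** bin(s).count("1")
            for s in range(8)]
print([round(p, 4) for p in expected])
```

Deviation between the empirical `state_density` and `expected` is then what the feature measures.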

Probabilities for all 8 states were considered as features.

Final Thoughts on Features

Rhythms are very sensitive things. Changing them ever so slightly may completely ruin an otherwise excellent drum beat. But it might not. Most of the features considered are rather linear. That is, small changes to the grid, like an extra drum hit, or moving one to the side, cause small changes in most of the features. Yet, there is a strong possibility that such a small change will move a pattern from one class to the next.

The most linear features are the track densities. The least linear are state probability and phase correlation. The frequency domain features are somewhere in between. Given more time, I would have liked to explore more features, especially complex ones like "intra-interval distribution" and autocorrelation. But alas, this project is not about features, it is about classification, so let's move on!
