Features
What features best describe this dataset? Since this is a somewhat new problem, they are not
set in stone. One point to keep in mind, though, is that all features must be
rotationally invariant along the time axis, because 16 different grids may all represent
the same rhythm, each beginning at a different phase. When we listen to a rhythm repeat,
the mind and body work together to get into the groove (or not), and assign the strongest
part of the sound to the first step of the grid. For example, all three grids below
sound exactly the same after a few repetitions:
clap X . . X . . X . . . . . . . . .
snare . . . . . . . . X . . . . . X .
kick X . . . X . . . X . X . X . . .

clap . X . . X . . X . . . . . . . .
snare . . . . . . . . . X . . . . . X
kick . X . . . X . . . X . X . X . .

clap . . X . . X . . X . . . . . . .
snare X . . . . . . . . . X . . . . .
kick . . X . . . X . . . X . X . X .
Density
After rating a number of rhythms, some candidate features came to mind.
For example, a track that was very dense, that is, had a large number of hits out of
the 16 steps, usually sounded bad. Therefore, three features were computed: the density
of each of the three tracks. In the following grid, the clap track is very dense (13/16), the snare
track is very sparse (2/16), and the kick track is somewhere in between (6/16):
clap X X X X . X X X X . X X X X . X
snare . . . . . . X . . . . . . . . X
kick . . . X X . . X . X . . X X . .
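With each track encoded as 16 ones and zeros, density is simply the hit count divided by 16. A minimal sketch in Python (the project's own code appears to be MATLAB, so the names here are illustrative only):

```python
def density(track):
    """Fraction of the 16 grid steps on which the track sounds."""
    return sum(track) / len(track)

# The grid above, with hits as 1 and silences as 0.
clap  = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
snare = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
kick  = [0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0]

print(density(clap), density(snare), density(kick))  # 0.8125 0.125 0.375
```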
Frequency Domain
Since rhythms are periodic signals, and often seem to embody periodic
signals within themselves, it makes sense to examine them in the
frequency domain. An important point: in this project the frequency domain
representations of the grids themselves were examined, not the audio of the
actual drum sounds being played.
Although the grid is two-dimensional, the experience of the sound
is one-dimensional in time. Therefore, the three tracks were summed together
to create a 16-step discrete signal, whose frequency domain was examined.
The conversion was done by replacing drum hits with
1's and silences with 0's, then summing the three tracks in the
vertical direction. For example:
clap X . . . . . . . . . . . X . . X
snare X . X . . . . . X . X . X . . .
kick X . X . X X X X . . . . X . . .
clap 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
snare 1 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0
kick 1 0 1 0 1 1 1 1 0 0 0 0 1 0 0 0
sum 3 0 2 0 1 1 1 1 1 0 1 0 3 0 0 1
fd = fft([3 0 2 0 1 1 1 1 1 0 1 0 3 0 0 1]);
Since the signal "sum" is purely real, its spectrum is conjugate-symmetric:
harmonics 9 through 15 mirror harmonics 1 through 7. The DC offset (harmonic 0)
is already accounted for by the track densities, so the only relevant
components are the 1st through 8th harmonics.
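The same eight magnitudes can be computed in Python with a direct DFT, keeping the sketch self-contained with the standard library (the document's own one-liner uses MATLAB's `fft`):

```python
import cmath

def harmonics(signal, n=8):
    """Magnitudes of DFT harmonics 1..n of a real 16-step signal.
    Harmonic 0 (the DC offset) is skipped because it is already
    captured by the track densities."""
    N = len(signal)
    mags = []
    for k in range(1, n + 1):
        bin_k = sum(x * cmath.exp(-2j * cmath.pi * k * t / N)
                    for t, x in enumerate(signal))
        mags.append(abs(bin_k))
    return mags

summed = [3, 0, 2, 0, 1, 1, 1, 1, 1, 0, 1, 0, 3, 0, 0, 1]  # the "sum" row above
print([round(m, 3) for m in harmonics(summed)])
```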
Harmonic Centroid
In addition to the 8 features for the harmonic magnitudes, several more
metrics were generated. In order to differentiate rhythms
with lots of high frequency components from those dominated by low
frequency components, a harmonic centroid feature was computed.
This is the sum of the magnitudes of the harmonics, each multiplied by its index.
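Note that, as defined here, the centroid is an unnormalized weighted sum (the usual spectral centroid would divide by the total magnitude). A sketch:

```python
def harmonic_centroid(mags):
    """Sum of each harmonic's magnitude times its 1-based index, so
    spectra weighted toward high harmonics score higher."""
    return sum(i * m for i, m in enumerate(mags, start=1))

# The same amount of energy scores 8x more in harmonic 8 than in harmonic 1:
print(harmonic_centroid([0, 0, 0, 0, 0, 0, 0, 4]))  # 32
print(harmonic_centroid([4, 0, 0, 0, 0, 0, 0, 0]))  # 4
```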
Sub Magnitude
Since rhythms often seem to embody subrhythms of half (or a quarter, or an
eighth of) the length of their fundamental period, I computed a metric
called "submags": the summed magnitudes of harmonics 2, 4
and 8 divided by the sum of all 8 harmonic magnitudes.
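In a 16-step loop, harmonics 2, 4 and 8 correspond to period-8, period-4 and period-2 subdivisions. A sketch, assuming `mags` holds the 8 harmonic magnitudes in order:

```python
def submags(mags):
    """Fraction of total harmonic magnitude carried by harmonics 2, 4
    and 8 (0-based indices 1, 3, 7), i.e. the half-, quarter- and
    eighth-period subrhythms of the 16-step loop."""
    total = sum(mags)
    return (mags[1] + mags[3] + mags[7]) / total if total else 0.0
```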
Phase Correlation
It is clear that, although the phase of the overall rhythm
is irrelevant, the phase of each track with respect to the other
tracks is important. For example, the two grids below have
quite different sounds, and the only differences between
them are the relative phases of the clap and snare tracks:
clap X . . . X . . . X . . X . . X .
snare X . . . X . . . X . . . X . . .
kick X . . . X . . . X . . . X . . .
clap X . X . . . X . . . X . . X . .
snare . X . . . X . . . X . . . X . .
kick X . . . X . . . X . . . X . . .
Therefore, the frequency domain representation of each track
was taken, and the angles of its first eight harmonics
were computed. A feature called phase correlation was generated:
the sum of the three pairwise shortest paths between
the tracks' angles at each harmonic. The phase correlation
of the first pattern is higher than that of the second.
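The exact weighting used in the project is not spelled out; one plausible reading of the description, sketched in Python with an assumed 0/1 track encoding, sums the three pairwise shortest angular distances at each harmonic:

```python
import cmath

def track_phases(track, n=8):
    """Phase angles of DFT harmonics 1..n of one track."""
    N = len(track)
    return [cmath.phase(sum(x * cmath.exp(-2j * cmath.pi * k * t / N)
                            for t, x in enumerate(track)))
            for k in range(1, n + 1)]

def angle_dist(a, b):
    """Shortest angular path between two phases, in [0, pi]."""
    d = abs(a - b) % (2 * cmath.pi)
    return min(d, 2 * cmath.pi - d)

def phase_correlation(clap, snare, kick, n=8):
    """Sum, over harmonics 1..n, of the three pairwise shortest angular
    distances between the tracks' phases (one reading of the feature
    described in the text)."""
    pc, ps, pk = (track_phases(t, n) for t in (clap, snare, kick))
    return sum(angle_dist(a, b) + angle_dist(b, c) + angle_dist(a, c)
               for a, b, c in zip(pc, ps, pk))
```

Identical tracks give a sum of zero; shifting one track against the others makes it grow.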
Dissonance
In 1969, Kameoka and Kuriyagawa proposed an algorithm for calculating
the sensory dissonance of audio waveforms from their frequency domain
representation. Essentially, it is a weighted sum of products of
harmonic magnitudes whose frequency ratios are less than 2. Since some
people, including the author, believe there is a connection between the
mind's perception of harmony and rhythm, I decided to include this
as a feature. The weights K&K used correspond to measured properties
of the human ear. Since this is a different domain,
I eschewed them. Given the time constraint, I made them all equal,
although with more research time, I would like to explore the optimization
of this feature.
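A sketch of the equal-weight variant just described, applied to the 8 harmonic magnitudes; this is not K&K's original algorithm (whose weights model the ear), only the simplification the text settles on:

```python
def dissonance(mags):
    """Equal-weight stand-in for the Kameoka-Kuriyagawa idea: sum the
    products of magnitudes of every pair of harmonics whose frequency
    ratio is less than 2 (all psychoacoustic weights set to 1)."""
    total = 0.0
    for i in range(len(mags)):
        for j in range(i + 1, len(mags)):
            if (j + 1) / (i + 1) < 2:   # harmonic indices are 1-based
                total += mags[i] * mags[j]
    return total
```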
Repetitiveness
While rating the rhythms for this project, it became clear that
some patterns with a large amount of repetition were unpleasing.
Therefore, a metric was computed representing how many times
patterns of beats were sequentially repeated.
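The text does not give the exact formula, so the following is only one plausible reading: score a step sequence by the largest number of times a short sub-pattern tiles it back-to-back.

```python
def repetitiveness(steps):
    """One plausible reading of the metric: the largest number of times
    a contiguous sub-pattern repeats back-to-back to fill the whole
    sequence (the text does not pin down the exact definition)."""
    n = len(steps)
    best = 1
    for length in range(1, n):
        if n % length == 0 and steps == steps[:length] * (n // length):
            best = max(best, n // length)
    return best
```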
State Probability
While examining the results of Markov modeling, a new feature,
a variation on repetitiveness, came to mind: the probability
of each "state", where a state is a number from
0-7 encoding which drums were hit simultaneously.
Each track has a power-of-two weight: Kick=1, Snare=2, Clap=4.
For example, if all drums are sounded at once, that state is 7;
silence corresponds to 0.
Given the expected probability distribution of each track,
deviations can be measured. Specifically, at the time of
generation, each drum sound at each time step has a 1/3
chance of being enabled. Therefore, the expected probability
of each state is as follows:
state | calculation     | probability
  0   | (2/3)^3*(1/3)^0 | .2963
  1   | (2/3)^2*(1/3)^1 | .1481
  2   | (2/3)^2*(1/3)^1 | .1481
  3   | (2/3)^1*(1/3)^2 | .0741
  4   | (2/3)^2*(1/3)^1 | .1481
  5   | (2/3)^1*(1/3)^2 | .0741
  6   | (2/3)^1*(1/3)^2 | .0741
  7   | (2/3)^0*(1/3)^3 | .0370
Probabilities for all 8 states were considered as features.
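A sketch of the empirical side of this feature, assuming the 0/1 track encoding used earlier (function and variable names are illustrative):

```python
from collections import Counter

def state_probs(kick, snare, clap):
    """Empirical probability of each state 0..7, packing the
    simultaneous hits at each step as kick=1, snare=2, clap=4."""
    states = [k + 2 * s + 4 * c for k, s, c in zip(kick, snare, clap)]
    counts = Counter(states)
    n = len(states)
    return [counts[state] / n for state in range(8)]
```

Comparing these eight values against the expected probabilities in the table measures how far a pattern deviates from the generator's 1/3-per-hit distribution.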
Final Thoughts on Features
Rhythms are very sensitive things. Changing one ever so slightly may
completely ruin an otherwise excellent drum beat. But it might not.
Most of the features considered are rather linear: small changes to the grid, like an
extra drum hit, or moving one to the side, cause correspondingly small changes
in the feature values. Yet there is a strong possibility that such a small
change will move a pattern from one class to the next.
The most linear features are the track densities. The least linear are
state probability and phase correlation. The frequency domain features
are somewhere in between. Given more time, I would have liked to explore
more features, especially complex ones like "intra-interval distribution"
and autocorrelation. But alas, this project is not about features, it is
about classification, so let's move on!