Automatic Drum Samples Classification

A final project for Pattern Recognition MAS 622J/1.126J

Eyal Shahar, MIT Media Lab


All quotes are taken from the sketch "More Cowbell" as performed on "Saturday Night Live"

Background

"I put my pants on just like the rest of you -- one leg at a time. Except, once my pants are on, I make gold records."

Musicians today, both professional and hobbyist, who rely heavily on their computers to make music usually find themselves with hard drives full of music samples of all sorts. Many of these are individual drum samples, often called “hits” or “one-shots”. Arranging these samples into folders is usually done manually, by listening to every sample and moving it into the desired folder. Later, while making music, retrieving these samples requires, once more, a tedious audition of each and every sample. This project is a first step towards making the life of the computer-based musician a little bit easier by automatically classifying these samples and allowing better methods of retrieval.

Objective

"Before we're done here.. y'all be wearing gold-plated diapers."

The goal of this project is to automatically classify drum samples, comparing classification techniques and searching for optimal feature sets.

Training And Testing Sets

"I gotta have more cowbell, baby!"

The training set consists of 1000 samples, divided into six classes: bass drums, snares, hi-hats, cymbals, tom-toms, and claps.
The testing set consists of 1200 samples.
The following table describes the distribution of the sets.

Features

"... The last time I checked, we don't have a whole lot of songs that feature the cowbell."

Most of the feature extraction was done using the MIRtoolbox for Matlab, by the University of Jyväskylä. These are:

In addition, two more features were extracted using custom algorithms:
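As an illustration, a minimal MIRtoolbox extraction sketch for two of the features named later in the Results section (brightness and the MFCCs) might look like the following; the wrapper function extractFeatures is hypothetical, and the project's full feature list is not reproduced here:

    % Minimal feature-extraction sketch using MIRtoolbox. miraudio,
    % mirbrightness, mirmfcc and mirgetdata are real toolbox calls; the
    % choice and combination of features here is an assumption.
    function v = extractFeatures(wavfile)
        a = miraudio(wavfile);               % load one drum sample
        b = mirgetdata(mirbrightness(a));    % spectral brightness
        m = mirgetdata(mirmfcc(a));          % MFCC coefficients (13 by default)
        v = [b; m(:)];                       % stack into one feature vector
    end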

The Matlab GUI

"... and, Gene - Really explore the studio space this time."

To manage the learning and testing processes, a Matlab GUI was created. It provides quick and intuitive access to feature extraction; loading and saving of the training and testing data sets; saving and loading of classification models; selection of the active model; invocation of the learning and testing routines; and graphic visualization of the feature space.

Classification methods

"Let's just do the thing."

Support vector machine

For this method, Matlab’s SVM tools were used. The main drawback of this implementation is that without the optimization package, which was absent on my computer, the algorithm is limited to a linear kernel.

Six SVMs were trained, one for each class, using a one-versus-all approach.

For validation, a leave-one-out method was used.
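A minimal sketch of the one-versus-all scheme, assuming six per-class binary SVMs with a linear kernel; fitcsvm and predict are the current Statistics and Machine Learning Toolbox calls, not necessarily the exact tools used in this project:

    % One-versus-all linear SVMs: one binary model per drum class.
    classes = {'bassdrum', 'snare', 'hihat', 'cymbal', 'tomtom', 'clap'};
    models = cell(1, numel(classes));
    for c = 1:numel(classes)
        isClass = strcmp(labels, classes{c});    % labels: cellstr, one per row of X
        models{c} = fitcsvm(X, isClass, 'KernelFunction', 'linear');
    end

    % Classify a sample x by picking the class whose SVM scores highest.
    scores = zeros(1, numel(classes));
    for c = 1:numel(classes)
        [~, s] = predict(models{c}, x);          % s(2): score for the "true" class
        scores(c) = s(2);
    end
    [~, best] = max(scores);
    predicted = classes{best};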

K Nearest Neighbors

A custom KNN algorithm was written for this method and was trained to find the optimal k over odd values of k in the range 1 < k < 15. Validation was done using a leave-one-out approach.
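A minimal sketch of such a leave-one-out search, assuming Euclidean distance and simple majority voting (both assumptions; the custom algorithm's details are not shown):

    % Leave-one-out search for the best odd k.
    n = size(X, 1);
    bestK = 3; bestAcc = 0;
    for k = 3:2:13                                     % odd values, 1 < k < 15
        correct = 0;
        for i = 1:n
            d = sum(bsxfun(@minus, X, X(i, :)).^2, 2); % squared Euclidean distances
            d(i) = inf;                                % leave sample i out
            [~, idx] = sort(d);
            votes = labels(idx(1:k));                  % labels of the k nearest neighbors
            [u, ~, j] = unique(votes);
            pred = u{mode(j)};                         % majority vote
            correct = correct + strcmp(pred, labels{i});
        end
        acc = correct / n;
        if acc > bestAcc, bestAcc = acc; bestK = k; end
    end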

Neural Network

Matlab’s neural network tools were used for this algorithm, testing both one and two hidden layers, with each layer tested for 5 to 10 units. Validation is part of the toolbox’s features, so no additional validation was done; instead, the MSE calculated during the learning process was used as the performance measure for determining the best network and feature-set configuration.
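A sketch of the configuration search, assuming patternnet from the Neural Network Toolbox and the validation performance recorded in the training record as the selection criterion (both assumptions about the toolbox version used):

    % Search over one- and two-hidden-layer networks, 5 to 10 units each.
    bestPerf = inf;
    for h1 = 5:10
        for h2 = [0 5:10]                  % 0 means no second hidden layer
            if h2 == 0
                sizes = h1;
            else
                sizes = [h1 h2];
            end
            net = patternnet(sizes);       % validation split is built into train
            [net, tr] = train(net, X', T');% X: samples x features, T: one-hot targets
            if tr.best_vperf < bestPerf    % keep the net with best validation performance
                bestPerf = tr.best_vperf;
                bestNet = net;
            end
        end
    end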

Feature Selection

"Well, it's just that I find Gene's cowbell playing distracting."

In the learning process of all classification methods, forward feature selection was implemented: at first, the algorithm was run with each single feature as input. The feature that performed best remained in the feature set, and the algorithm was then tested again with each of the remaining features added as a second feature. This process repeated itself until performance no longer improved by more than 0.5%.
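A sketch of this loop; evalAccuracy is a hypothetical stand-in for whichever validation routine (SVM, KNN, or neural network) is being driven:

    % Forward feature selection with a 0.5% improvement threshold.
    selected = [];
    remaining = 1:size(X, 2);
    bestAcc = 0;
    improved = true;
    while improved && ~isempty(remaining)
        improved = false;
        for f = remaining
            acc = evalAccuracy(X(:, [selected f]), labels);
            if acc > bestAcc + 0.005       % demand more than 0.5% improvement
                bestAcc = acc;
                bestF = f;
                improved = true;
            end
        end
        if improved
            selected = [selected bestF];   % keep the best new feature
            remaining = setdiff(remaining, bestF);
        end
    end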

Results And Performance

"...And I'd be doing myself a disservice and every member of this band, if I didn’t perform the hell out of this!"

The K-nearest-neighbors classifier gave the best results, with k = 9. The selected features were Brightness, Irregularity, Decay, MFCC 1, MFCC 2, MFCC 3, and MFCC 5.

The neural-network learning algorithm produced a two-hidden-layer network, with 9 and 7 units respectively.

The SVM learning algorithm found these features to be optimal:

The following table and graph show the detection accuracy on the testing set:



Conclusions

"Guess what? I got a fever! And the only prescription.. is more cowbell!"

Random Insights

Possible Improvements

The following steps can be considered in order to improve recognition results:

Future Work

As stated earlier, this work can serve as the framework for a system with stronger capabilities, such as:


Final project presentation (.pdf)
Project proposal presentation (.pdf)