Approach to Hand Gesture Recognition


This page describes in detail the methodology we followed while exploring two supervised machine learning algorithms, Hidden Markov Models and neural networks, for the hand gesture recognition problem.

Feature extraction and selection.

The data carry no correspondence between the six markers from one frame to the next, so it is not possible to tell from the raw data alone which markers belong to the three fingers and which to the back of the hand. The raw points therefore cannot be used as features directly, and other features had to be extracted from the data. After brainstorming, we decided to use features that capture the net movement of the fingers and that rely on a local reference instead of an absolute one. Because individual markers cannot be identified, traditional approaches such as measuring the distance between two or more specific finger markers are not available; instead, we extract features that give the net movement of the points with respect to a local point on the hand. To find this local reference, we observed that the three markers on the back of the hand are arranged in a triangle. We developed a small algorithm that exploits the fact that the shape of this triangle does not change much from frame to frame. In this way we were able to identify the three markers on the back of the hand, and the centroid of these three points became our local reference.
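
The identification of the back-of-hand markers can be illustrated with a minimal sketch. It is not the algorithm used in the project, whose details are not given here; it assumes that the sorted side lengths of the reference triangle (ref_sides) are already known, for example from a previous frame, and simply picks the triple of markers whose triangle matches them best. The function names find_back_of_hand and triangle_sides are hypothetical.

```python
from itertools import combinations

import numpy as np


def triangle_sides(points):
    """Return the sorted side lengths of the triangle formed by three 3-D points."""
    a, b, c = points
    return np.sort([
        np.linalg.norm(a - b),
        np.linalg.norm(b - c),
        np.linalg.norm(c - a),
    ])


def find_back_of_hand(frame, ref_sides):
    """Pick the marker triple whose triangle best matches ref_sides.

    frame     : (6, 3) array of unordered marker positions for one frame
    ref_sides : sorted side lengths of the reference triangle
                (assumed known; this is an assumption of the sketch)
    Returns (indices of the three markers, centroid of those markers).
    """
    frame = np.asarray(frame, dtype=float)
    best_idx, best_err = None, np.inf
    # Six markers give only 20 triples, so an exhaustive search is cheap.
    for idx in combinations(range(len(frame)), 3):
        sides = triangle_sides(frame[list(idx)])
        err = np.sum((sides - ref_sides) ** 2)
        if err < best_err:
            best_idx, best_err = idx, err
    centroid = frame[list(best_idx)].mean(axis=0)
    return best_idx, centroid
```

Updating ref_sides with the side lengths found in each frame would let the reference follow slow changes in the triangle, which matches the observation that its shape varies little between consecutive frames.
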
The following pictures show the different features that were extracted.



Figure 1    Figure 2    Figure 3    Figure 4

The feature in Figure 1 is the sum of all possible distances between markers. This feature has the advantage that it does not require knowing where each marker is placed. The feature in Figure 2 is the sum of the distances between the three fingers and the centroid. The feature in Figure 3 is the sum of all distances between the three fingers. We also extracted three more features based on the relative displacement in the X, Y and Z directions with respect to the identified centroid for each frame.
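
The features described above can be sketched per frame as follows. This is an illustration under stated assumptions, not the project's code: the function name frame_features is hypothetical, back_idx is assumed to hold the indices of the three back-of-hand markers already identified, and the three displacement features are interpreted here as the offset of the mean finger position from the centroid, which is only one possible reading of the description.

```python
from itertools import combinations

import numpy as np


def frame_features(frame, back_idx):
    """Compute six features for one frame of six unordered 3-D markers.

    frame    : (6, 3) array of marker positions
    back_idx : indices of the three back-of-hand markers (assumed known)
    Returns [f1, f2, f3, dx, dy, dz].
    """
    frame = np.asarray(frame, dtype=float)
    finger_idx = [i for i in range(len(frame)) if i not in back_idx]
    fingers = frame[finger_idx]
    centroid = frame[list(back_idx)].mean(axis=0)  # local reference point

    def pair_sum(points):
        # Sum of distances over every unordered pair of points.
        return sum(np.linalg.norm(p - q) for p, q in combinations(points, 2))

    f1 = pair_sum(frame)                                    # all 15 marker pairs
    f2 = np.linalg.norm(fingers - centroid, axis=1).sum()   # fingers to centroid
    f3 = pair_sum(fingers)                                  # finger to finger
    # Assumption: displacement = mean finger position relative to the centroid.
    dx, dy, dz = fingers.mean(axis=0) - centroid
    return np.array([f1, f2, f3, dx, dy, dz])
```

Note that f1 needs no marker identification at all, whereas f2, f3 and the displacement features depend on the back-of-hand markers having been found first.
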

Based on these features, we explain below the two approaches that we used for recognizing different hand gestures.

Hidden Markov Model

Multilayer Perceptron


  1. Feature selection and feature extraction are crucial steps in pattern recognition problems. Extracting relevant features is a challenge.
  2. Feature selection was made more difficult by the fact that the experiment did not identify the markers, so there was no correspondence from one frame to the next. One solution would be to devise an experiment that provides this identification.
  3. Raw data from real-world scenarios are difficult to work with.
  4. A variety of models can be applied once the features are extracted.
  5. The results from the HMM and the MLP contrast with each other for the pushing gesture.
  6. A combination of algorithms can be used to come up with a more robust model.