Clustering of Home Based Activities via Simple Sensors
Presentation: (622 Slides)
MAS622J, Fall 2004


Motivation

This part of the project aims to explore how simple clustering methods can help alleviate an area of complexity in this research that is often taken for granted: the complexity and expense of labeling activities from these simple sensor firings. The manual labeling process is inexact and may take as much as five hours of labeling per one hour of sensor activations, even for the most basic activity labels. This leads to a strong desire for a method that reduces the amount of data that must be labeled. This project begins to tackle the problem with the most elementary of semi-supervised techniques, namely cluster-and-label. The idea is to assess how some of the more basic unsupervised clustering methods cluster the sensor activity. Based on these clusters and a small set of labeled data, we are curious to see whether the clusters can be automatically labeled and used with reasonable accuracy; in short, whether we can find structure in the data. This project is only a first approach to the problem.

Features

The clustering methods used in this project are well defined and well studied. However, there are still some free variables in these algorithms, the selection of which requires an artistic touch. The first design choice is which features to use with each clustering method. The second is how to measure the similarity of features. The first question is discussed here, while discussion of the second is deferred to the description of each clustering method.

  1. Time
    Two measures of time are relevant to our data. Both features were recorded using seconds as units of measurement.

    - The first has to do with the period of time in which the sensor activation happens. It is reasonable to assume that sensor activations that happen close to each other in time belong to the same activity. Specifically, we considered both the start and stop time of each sensor activation as a feature.

    - The second time-based feature was the length of a sensor activation. This differs from the first feature in that it is independent of when in a day or week a sensor fired; the duration feature depends only on how long the sensor is activated. We believed that activities might be distinguished by the length of activation of a sensor. For instance, washing hands may have a ten second duration while washing the face may take forty or sixty seconds.

  2. Location
    As with time, two measures of location were considered useful.

    - The first measure was the x,y Cartesian coordinates of the sensor locations. These were extracted from a JPEG top-view map of the home where the sensors were placed. The motivation here is that the majority of the action in an activity happens within a small radius.

    - The second measure is based on room adjacency. Distance is measured from the room location of two sensors and the adjacency of the rooms. Thus sensors in the same room are closer than sensors in separate rooms joined by a passageway, which in turn are closer than two sensors in separate rooms not joined by a passageway, and so forth. Distances are pre-computed from the home layout and stored in a lookup matrix (see the sketch after this list).

  3. Frequency
    The final measure is based on frequency of a sensor activation within a defined period of time. Thus if a sensor is fired three times in one minute during an activity, then it has a frequency of three and may be used to help identify an activity.
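
To make the features concrete, here is a minimal Matlab sketch of how they might be assembled. The variable names (sensorId, startTimes, stopTimes, roomOf, roomDist) and all the numbers are hypothetical stand-ins, not the actual sensor log or home layout used in this project.

    % Toy sensor log (made-up values, purely for illustration).
    sensorId   = [1; 2; 1; 3];                 % which sensor fired
    startTimes = [10; 12; 40; 95];             % activation start times, in seconds
    stopTimes  = [25; 14; 55; 130];            % activation stop times, in seconds
    roomOf     = [1 1 2];                      % room index of each sensor

    % Time features: period (start/stop) and duration of each activation.
    period   = [startTimes stopTimes];
    duration = stopTimes - startTimes;

    % Room-adjacency distances, pre-computed from the home layout and kept in
    % a lookup matrix: 0 = same room, 1 = rooms joined by a passageway, etc.
    roomDist = [0 1 2;
                1 0 1;
                2 1 0];
    d_room_3_4 = roomDist(roomOf(sensorId(3)), roomOf(sensorId(4)));

    % Frequency: firings of the same sensor within a one-minute window.
    win  = 60;
    sameSensor = bsxfun(@eq, sensorId, sensorId');
    nearInTime = abs(bsxfun(@minus, startTimes, startTimes')) < win;
    freq = sum(sameSensor & nearInTime, 2);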

Approaches

All code is in Matlab. All methods were evaluated on a test set held out during training, and 10-fold cross validation was used to estimate the free parameters.
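
For reference, a minimal sketch of that evaluation setup is below; it assumes the Statistics Toolbox function cvpartition and uses a made-up number of labeled examples.

    % 10-fold cross-validation skeleton for tuning the free parameters.
    n  = 200;                                  % number of labeled examples (toy value)
    cv = cvpartition(n, 'KFold', 10);
    for i = 1:cv.NumTestSets
        trainIdx = training(cv, i);            % folds used to label the clusters
        testIdx  = test(cv, i);                % held-out fold used for scoring
        % ... cluster the data, label clusters from trainIdx, evaluate on testIdx ...
    end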

1. K Means clustering 

Method Overview: K-Means is one of the most basic clustering methods used to solve an unsupervised classification problem. The procedure is a simple way to partition a given data set into k clusters fixed a priori. The main idea is to define k centroids, one for each cluster. The next step is to take each point in the data set and associate it with the nearest centroid. After all points have been assigned, the first step is complete. At this point k new centroids are calculated from the groupings of the last step. After the k new centroids are formed, a new binding is made between the same data points and the nearest new centroid. This produces a loop, and as a result the k centroids change their locations step by step until convergence.
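
Below is a minimal, hand-rolled Matlab sketch of this loop on toy two-dimensional data; the Statistics Toolbox kmeans function does the same job, and nothing here is specific to our sensor features.

    % Minimal k-means sketch (illustration only, toy 2-D data).
    X = [randn(50,2); randn(50,2) + 5];        % two loose blobs
    k = 2;
    C = X(randperm(size(X,1), k), :);          % k data points as initial centroids
    for iter = 1:100
        % Assignment step: bind each point to its nearest centroid.
        D = zeros(size(X,1), k);
        for j = 1:k
            D(:,j) = sum(bsxfun(@minus, X, C(j,:)).^2, 2);
        end
        [~, idx] = min(D, [], 2);
        % Update step: recompute each centroid from its assigned points.
        Cnew = C;
        for j = 1:k
            if any(idx == j)
                Cnew(j,:) = mean(X(idx == j, :), 1);
            end
        end
        if max(abs(Cnew(:) - C(:))) < 1e-6, break; end   % centroids stopped moving
        C = Cnew;
    end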
 

The main things we played around with here were the number of clusters and the similarity measures. The two metrics we used were the MINKOWSKI distance and a WEIGHTED SUM. The motivation behind the weighted sum was to model the competition between certain feature types, such as x,y location and period of sensor activation: while both may be good measures, a combination of the two may be insightful, although not with equal contributions.

 We decided to play with the following free parameters to see how they affected our results:

  1. K: the total number of clusters

  2. W: the weight matrix, which influences the nature of the overall grouping. We modified the weight matrix to reflect the competition between different feature types. This was done by using the inverse covariance matrix, based on the belief that particular activities are best described by a number of features whose differing scales can greatly affect the overall clustering. The weight matrix alleviates this problem by scaling each feature by the appropriate part of the covariance of the entire feature matrix (see the sketch after this list).

  3. SM: similarity metric (MINKOWSKI)

    The primary similarity measure used was the MINKOWSKI distance for each feature, d(x, y) = (sum_i |x_i - y_i|^p)^(1/p). This is the generalized family of metrics that contains the Euclidean distance, which is the MINKOWSKI distance of order p = 2.
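
The sketch below (referenced from parameter 2 above) illustrates both metrics on made-up feature rows: the weight matrix is taken as the inverse covariance of the feature matrix, in the spirit of a Mahalanobis distance, and p = 2 recovers the plain Euclidean case of the MINKOWSKI family.

    % Weighted-sum vs. Minkowski distance on toy [duration  x  y] features.
    F = [600*rand(20,1), 10*rand(20,2)];        % duration in seconds, x,y map coordinates
    W = inv(cov(F));                            % weight matrix: inverse covariance of all features
    a = F(1,:);  b = F(2,:);
    d_weighted  = sqrt((a - b) * W * (a - b)'); % weighted sum, Mahalanobis-style
    p = 2;                                      % Minkowski order; p = 2 is Euclidean distance
    d_minkowski = sum(abs(a - b).^p)^(1/p);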

The detailed results show that the best measure seems to be based on location. As would be expected, k-means tends to be very sensitive to feature scaling issues: the weight matrix increases accuracy when two features of different scales, such as time and distance, are considered together, but decreases accuracy when using just time or just distance as a metric. This is logical, since in the single-type case all the scaling does is clump the data together, making the differences harder to distinguish.

2. Fuzzy C-Means Clustering

Method Overview: A generalization of K-means is fuzzy C-means. Fuzzy c-means clustering is a form of overlapping clustering; in other words, a single data point may belong to more than one cluster. This method is based on optimizing the following objective function:

    J_m = sum_{i=1}^{N} sum_{j=1}^{C} u_{ij}^m ||x_i - c_j||^2 ,

where m is any real number greater than 1, u_ij is the degree of membership of x_i in cluster j, x_i is the ith d-dimensional measured data point, c_j is the d-dimensional center of cluster j, and ||*|| is any norm expressing the similarity between a measured data point and a center.
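
A hand-rolled sketch of the alternating updates that minimize this objective is shown below; the Fuzzy Logic Toolbox fcm function implements the same idea, and the data and fuzzifier value here are toy choices.

    % Minimal fuzzy c-means sketch (illustration only, toy 2-D data).
    X = [randn(50,2); randn(50,2) + 5];
    C = 2;                                      % number of clusters
    m = 2;                                      % fuzzifier, must be > 1
    N = size(X,1);
    U = rand(N, C); U = bsxfun(@rdivide, U, sum(U,2));   % random memberships, rows sum to 1
    for iter = 1:100
        % Centroid update: weighted mean of the data with weights u_ij^m.
        Um = U.^m;
        centers = bsxfun(@rdivide, Um' * X, sum(Um,1)');
        % Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        D = zeros(N, C);
        for j = 1:C
            D(:,j) = sqrt(sum(bsxfun(@minus, X, centers(j,:)).^2, 2)) + eps;
        end
        U = 1 ./ (D.^(2/(m-1)) .* repmat(sum(D.^(-2/(m-1)), 2), 1, C));
    end
    [~, hardLabels] = max(U, [], 2);            % defuzzify for comparison with k-means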

We decided to play with the following free parameters to see how they affected our results:

  1. C: the total number of clusters

  2. M: This variable adjusts the amount of blending between clusters. As M approaches 1 from above, the blending disappears and fuzzy c-means becomes discrete, i.e., k-means. For any M > 1, a data point is allowed to belong to multiple clusters.

  3. SM: similarity metric (MINKOWSKI)

    The primary similarity measure used was again the MINKOWSKI distance for each feature, the generalized family of metrics that contains the Euclidean distance (the MINKOWSKI distance of order p = 2).


The detailed results show that for our data set the fuzziness didn't add anything. This is probably due to the unbalanced nature of our data acting as a black hole, pulling all clusters towards a few classes.

It was also interesting to note that the degree of fuzziness (M) was helpful for features that are more ambiguous with respect to which activities they belong to. Thus for the duration feature we saw an increase in accuracy over k-means: a sensor activation of ten seconds can belong to almost any activity, far more readily than, say, being in the kitchen can. I (probably) won't be in the kitchen and taking a bath. Conversely, the fuzziness hindered classification with features that are strongly correlated with certain activities, such as rooms. Apart from duration, our data does not seem to overlap much, and activities seemed to be fairly sequential; thus, once again, low values of M (referenced as b in the figures) tended to be more accurate due to less blending, and k-means (basically no blending at all) did better.
 

3. Hierarchical Clustering

Method Overview: Given a set of N items to be clustered, and an N*N distance (or similarity) matrix, the basic process of hierarchical clustering is as follows:

  • Start by assigning each item to a cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters be the same as the distances (similarities) between the items they contain.
  • Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one cluster less.
  • Compute distances (similarities) between the new cluster and each of the old clusters.
  • Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.

This type of hierarchical clustering is called agglomerative clustering because we cluster from the bottom up, starting with N clusters and working our way towards one.

As with k-means, the way in which distances between clusters are measured can be changed to change which clusters are formed. The specific form of clustering we chose was complete-linkage clustering. In complete-linkage clustering (also called the diameter or maximum method), the distance between one cluster and another is taken to be the greatest distance from any member of one cluster to any member of the other. This was used in order to discourage the growth of elongated clusters. The complete-linkage method is so named because it basically produces a complete subgraph: every pair of members of a merged cluster is linked within the linkage distance.
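
A short sketch of this pipeline, assuming the Statistics Toolbox functions pdist, linkage, and cluster, is given below; the toy coordinates stand in for the real x,y sensor locations.

    % Complete-linkage agglomerative clustering sketch (Statistics Toolbox assumed).
    XY = [rand(20,2); rand(20,2) + 3];          % toy x,y sensor-style coordinates
    D  = pdist(XY, 'euclidean');                % pairwise distances (a room-adjacency matrix could be used instead)
    Z  = linkage(D, 'complete');                % merge clusters by their maximum pairwise distance
    labels = cluster(Z, 'maxclust', 4);         % cut the tree at a chosen number of clusters
    % dendrogram(Z) would plot the merge tree to inspect the cluster hierarchy.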

We implemented hierarchical agglomerative clustering with the following features:

  1. Period 
  2. Duration
  3. XY Location

The main thing we played with here was finding what number of clusters was best, that is, what level of the hierarchy was optimal.

The results show that our success with hierarchical clustering was less than pleasing. This actually makes sense given our features, however. We see that x,y location did the best, because those distances lend themselves to a hierarchy: for example, the sensors around the fridge and the sink form small groups, which in turn group into the kitchen.

Conversely, the way the duration feature was constructed does not really lend itself to hierarchical clustering.

A major problem with agglomerative clustering is that it is O(N^2), and it is highly susceptible to local minima since each stage builds on the previous one and merges cannot be undone.

4. Mixture Modeling

Method Overview: This approach models the data as a mixture of K Gaussians. EM is used to learn the parameters of each of the K Gaussian components, and the resulting model is then used to fit the data.

We implemented Gaussian mixture modeling with the following features:

  1. Period 
  2. Duration
  3. Number of Sensor Activations
  4. XY Location

The main thing we played with here was finding what number of components was best, that is, how many Gaussians were optimal.

The stopping criterion used for this algorithm was the log-likelihood of the data, i.e., the log of the probability of the data given the model parameters we have found, which measures how well those parameters explain the given data. The higher the log-likelihood, the better the fit to the data.
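
The sketch below shows an equivalent fit in Matlab using the Statistics Toolbox fitgmdist interface rather than our own EM code; the data and the range of K are placeholders.

    % Fit mixtures of K Gaussians with EM and compare fits by log-likelihood.
    X = [randn(100,2); randn(100,2) + 4];       % toy 2-D data with two modes
    for K = 1:4
        gm = fitgmdist(X, K, 'Replicates', 5, 'RegularizationValue', 1e-6);
        fprintf('K = %d   log-likelihood = %.1f\n', K, -gm.NegativeLogLikelihood);
    end
    gm  = fitgmdist(X, 2);                      % refit with the chosen number of components
    idx = cluster(gm, X);                       % hard assignment of each point to a component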

The results show that almost all of our features did better with smaller K. This is again most likely due to the large number of sensors in one area of the home and the sparse coverage in the rest of the home. The accuracies tended to be slightly worse than the k-means algorithm, but much better than fuzzy c-means. Also, mixtures of Gaussians have the advantage of incorporating prior information, which may come in handy in later work.

Conclusions

Overall the methods explored here seemed mediocre, but given the nature of the data, which is close to real data as opposed to a toy problem, some results were encouraging. The distance features seem to be the best features in terms of accuracy across all the methods.

The most promising methods seem to be the simple k-means clustering and the more robust mixture of Gaussians. The nature of the data did not seem to mesh with hierarchical clustering, but different features may be more successful with that method.

Future directions

Future work will need to focus on better feature selection and combination. It would also be very useful to try more robust semi-supervised learning algorithms instead of using unsupervised methods and merely doing naive cluster-and-label. Balancing the data may bring about better results as well.