Wednesday, September 1, 2010

Reading #2: GRANDMA lover

Comments:
The Grove Master!

Summary:
GRANDMA, or Gesture Recognizers Automated in a Novel Direct Manipulation Architecture, is Dean Rubine's spark that ignited the field of gesture-based sketch recognition. It operates on single-stroke gestures, as demonstrated by GDP (the Gesture-based Drawing Program) that he references throughout the paper. New gestures are added via training sets, with around 15 examples recommended per gesture. The user then attaches the desired behavior to each gesture, and GDP is ready for testing.

A stroke is a sequence of points, each recorded as an (x, y, t) tuple denoting X-coordinate, Y-coordinate, and time, respectively. From this sequence, Rubine computes 13 key features of the stroke. Given the resulting feature vector, he then finds the best match among the trained gesture classes using a linear classifier. With 15 examples per class to match against, Rubine found that he could achieve a 96% success rate.
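
To make that concrete, here is a minimal sketch of the pipeline in Python with NumPy. The feature set is abbreviated to a handful of the 13 (reconstructed from the paper), and train follows the closed-form estimate Rubine gives: per-class feature means plus a pooled covariance. Names and details are illustrative, not Rubine's actual code.

    import numpy as np

    def features(stroke):
        # A few of Rubine's 13 features; stroke is an (n, 3) array
        # of (x, y, t) rows. Abbreviated set, for illustration only.
        xy, t = stroke[:, :2], stroke[:, 2]
        d = np.diff(xy, axis=0)                       # per-segment deltas

        first = xy[2] - xy[0]
        f1, f2 = first / (np.linalg.norm(first) + 1e-12)  # cos, sin of initial angle
        f3 = np.linalg.norm(xy.max(0) - xy.min(0))    # bounding-box diagonal
        f5 = np.linalg.norm(xy[-1] - xy[0])           # first-to-last distance
        f8 = np.hypot(d[:, 0], d[:, 1]).sum()         # total stroke length
        turn = np.arctan2(d[1:, 0] * d[:-1, 1] - d[1:, 1] * d[:-1, 0],
                          d[1:, 0] * d[:-1, 0] + d[1:, 1] * d[:-1, 1])
        f9 = turn.sum()                               # total angle traversed
        f13 = t[-1] - t[0]                            # duration
        return np.array([f1, f2, f3, f5, f8, f9, f13])

    def train(examples):
        # examples maps class label -> (n_c, d) array of feature vectors,
        # roughly 15 rows per class. Closed form per the paper: pooled
        # covariance, w_c = inv(cov) @ mean_c, w_c0 = -(1/2) w_c . mean_c.
        labels = sorted(examples)
        means = np.array([examples[c].mean(axis=0) for c in labels])
        n = sum(len(examples[c]) for c in labels)
        cov = sum((examples[c] - m).T @ (examples[c] - m)
                  for c, m in zip(labels, means)) / (n - len(labels))
        inv = np.linalg.pinv(cov)        # pinv in case cov is singular
        weights = means @ inv            # inv is symmetric, so row c is inv @ mean_c
        bias = -0.5 * (weights * means).sum(axis=1)
        return labels, weights, bias

    def classify(f, labels, weights, bias):
        # Score each class as bias[c] + weights[c] . f and take the max.
        scores = bias + weights @ f
        return labels[int(np.argmax(scores))]

The real recognizer uses all 13 features; the point is just that both training and classification reduce to a little linear algebra.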

Discussion:
Rubine is obviously a complete beast. It might seem trivial to us now that he used a classifier to best-match strokes, but at the time it must have been an insane idea. I also prefer the idea of rejection over undo: it makes more sense to me to set a lower threshold for recognition and reject strokes that do not reach it. That way, user inattention will not lead to bad examples being added to the training set. But then again, I'm not sure whether accepted recognitions are used to grow the training set or not; if that's in the paper, I don't remember seeing it. All in all, I think this paper was a good introduction to gesture-based recognition.
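
For the record, the paper does include a rejection option along these lines: the linear scores are converted into a probability estimate, and an ambiguous stroke is rejected rather than classified. Here is a minimal sketch on top of the classifier above (the 0.95 cutoff is illustrative):

    def classify_with_reject(f, labels, weights, bias, min_prob=0.95):
        # Reject ambiguous strokes instead of forcing a match: estimate
        # P(best | f) with a softmax over the linear scores and return
        # None when the winner is not probable enough.
        scores = bias + weights @ f
        best = int(np.argmax(scores))
        p_best = 1.0 / np.exp(scores - scores[best]).sum()
        return labels[best] if p_best >= min_prob else None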

4 comments:

  1. I agree that Rubine was way ahead of his time in using a simple linear classifier to solve what was then a complex problem. I hope I have a moment of inspiration like that some day.

    As for undo versus rejection, I think it depends a lot on the application, and also on the point you mentioned (whether every successful recognition feeds the training set). On Palm Pilots, for instance, undo seems very natural: every stroke gives visual feedback, so instead of a blank space for a rejection, a letter or symbol appears even if it was not the intended one. And in that case undo is very simple, so it does not harm the user experience.

  2. I was a little surprised to find that Rubine did not explain the intuition behind those features in much detail, while Dr. Hammond's tutorial does. I also somewhat doubt the high accuracy he demonstrated; the examples he was using seem pretty trivial. What about more complicated cases?

  3. I think high accuracy can occur for several reasons. For example, users may always make standard gestures that are very similar to the training examples. The choice of dataset also matters: a recognizer can work well on dataset A but not on dataset B.

    Researchers always publish results on the datasets where they get their best results. Sometimes it is more important to find good datasets than good algorithms.

  4. Hi. I'm Dean Rubine. Thanks for the kind words, Chris Aikens. It never occurred to me to have test gestures added to the training set, so let's call that your idea. I like it. If the user does undo, you can always remove the test gesture.

    Jianjie Zhang, I didn't cherry pick the recognition rate studies. I set up the program, had my friend Peter (who had never used the system before) do the training and testing examples, and then I reported the results. It wasn't a large sample, but the results are unbiased. I always thought the rates were pretty low, anyway.

    The idea of a linear classifier came from Duda and Hart, which I was reading in preparation for doing the work. I recall trying a few different classifiers from that book. As for which features, I went by intuition and just experimented to find a small set that worked. I don't remember all that well, but I started from the basic set and added a few more as I found gestures that looked different but had close feature vectors. Yue Li is right that I should have written more about this, and I can think of lots of other stuff I could have talked about as well. Of course, there's plenty that I could have left out too. Unfortunately, when you're in graduate school long enough, the question changes from "what would people really like to know about this work?" to "how can I get this document past my committee so I can graduate?"

    I saw a paper recently that classified just by doing a raw, point-by-point comparison of the input to each training gesture, which the author claimed performed better than my classifier (there's a sketch of that idea after the comments). My way was of course more efficient (important back when I started, since I was doing the work on a slow Microvax II), but that wouldn't matter these days.

    Of course Yue Li is again right that the examples are relatively trivial, but I think that's reflective of real applications, where the user can't really remember too many gestures, and it's hard to remind users which gestures are available (unlike, say, menu entries). Gordon Kurtenbach had this idea of pie menus which would make gesturing more transparent.

    Thanks for your interest in my work from 20 or so years ago.

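The point-by-point matcher Rubine mentions sounds like the template recognizers of the $1 family: resample every stroke to a fixed number of points and pick the nearest training example. A rough, hypothetical sketch of that idea, continuing the NumPy code above (the published recognizers also normalize scale, rotation, and position, which is omitted here):

    def resample(xy, n=64):
        # Resample a stroke to n points spaced evenly along its arc
        # length so two strokes can be compared point by point.
        seg = np.hypot(*np.diff(xy, axis=0).T)
        cum = np.concatenate([[0.0], np.cumsum(seg)])
        s = np.linspace(0.0, cum[-1], n)
        return np.column_stack([np.interp(s, cum, xy[:, 0]),
                                np.interp(s, cum, xy[:, 1])])

    def nearest_template(xy, templates):
        # templates is a list of (label, resampled_xy) training strokes;
        # classify by the smallest mean point-to-point distance.
        probe = resample(xy)
        return min(templates,
                   key=lambda lt: np.hypot(*(probe - lt[1]).T).mean())[0]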