Comments:
Hong-Hoe Kim
Summary:
Protractor is a template-based, single-stroke gesture recognizer. It calculates the similarities between gestures via a nearest-neighbor classifier. This means that when a new gesture is input by the user, Protractor compares it to the stored examples within the system to determine the best (nearest) match. Gestures are processed into equal-length vectors, and the nearest neighbor is thus the one with the smallest angle between it and the input gesture (think text documents in the vector space model). Protractor differs from its closest companion, the $1 recognizer, in that it takes up 1/4 of the memory and boasts a faster recognition time. The smaller space requirement comes from representing each gesture with a fixed set of 16 points spaced equally along its length. Yang Li, the creator, stresses that this increased efficiency makes Protractor ideal for mobile touch-based systems.
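To make the vector-space idea concrete, here is a minimal sketch in Python of the pipeline described above: resample a stroke to 16 equally spaced points, flatten it into a unit-length vector, and pick the template with the smallest angle to the input. The function names (`resample`, `vectorize`, `angular_distance`, `recognize`) are my own, and the sketch deliberately leaves out the rotation handling from the paper, so treat it as an illustration rather than Li's actual implementation.

```python
import math

def resample(points, n=16):
    """Resample a stroke to n points equally spaced along its path length."""
    pts = list(points)
    path_len = sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    step = path_len / (n - 1)
    out = [pts[0]]
    acc = 0.0  # distance travelled since the last emitted point
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= step:
            # Interpolate a new point exactly one step from the last one
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # continue measuring from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    # Rounding can leave the list one point short; pad with the end point
    while len(out) < n:
        out.append(pts[-1])
    return out[:n]

def vectorize(points, n=16):
    """Turn a stroke into a unit-length vector of 2n centered coordinates."""
    pts = resample(points, n)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    flat = []
    for x, y in pts:
        flat += [x - cx, y - cy]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0  # guard degenerate strokes
    return [v / norm for v in flat]

def angular_distance(u, v):
    """Angle in radians between two unit-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

def recognize(stroke, templates):
    """Return the name of the template with the smallest angle to the stroke.

    templates: dict mapping a gesture name to its stored vector.
    """
    query = vectorize(stroke)
    return min(templates, key=lambda name: angular_distance(query, templates[name]))
```

Because every vector is normalized to unit length, the size of the drawn gesture drops out of the comparison; only its shape affects the angle.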
Discussion:
I was somewhat familiar with the idea of using vector space for comparisons thanks to Dr. Caverlee and his Information Retrieval course. Regardless, the equations in the paper owned me at first. A potential issue with Protractor is the fact that only 16 points are sampled from each gesture. What if the stroke is extremely long, and therefore a lot of crazy stuff happens in between the sample points? In this case, the standardized length of the resulting vector would be based on an incorrect set of assumed points. This might limit the gestures recognized to broader, simpler patterns. But even this could be a good thing because people can learn more when faced with less complexity. Catch 22?
Also, I guess using just a single template for the similarity comparison is questionable! What about when we have enough templates to compare, say more than 20? The nearest-neighbor classifier can easily accommodate k=3 or k=5, which would probably give more accurate matches than comparing against a single template. It may be worth a try... :)
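A rough sketch of what the k=3 or k=5 idea could look like, assuming each stored template is already a unit-length vector with a class label. The name `knn_classify` and the tie-breaking rule are my own choices for illustration; nothing here comes from the paper.

```python
import math
from collections import Counter

def angular_distance(u, v):
    """Angle in radians between two unit-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

def knn_classify(query, templates, k=3):
    """Majority vote among the k templates closest in angle to the query.

    templates: list of (label, unit_vector) pairs.
    """
    ranked = sorted(templates, key=lambda t: angular_distance(query, t[1]))
    top = ranked[:k]
    votes = Counter(label for label, _ in top)
    best = max(votes.values())
    # Break ties in favor of the label whose template ranks nearest
    for label, _ in top:
        if votes[label] == best:
            return label
```

With k=1 this reduces to the single-nearest-template behavior, so the two approaches can be compared on the same template set.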
->chris aikens :
I also agree with your idea about using 16 sample points. In a long gesture, some feature points may be omitted while other, less important points are chosen instead. In his paper he does not say why he uses 16 points instead of 32 or 64 points. To convince us, he should have given some empirical results showing that 16 points still give good results compared with 32 or 64, I think.
->sampath :
In my view, a single template should be enough to get a good result in the general case. But more importantly, if a gesture is recognized as belonging to one class, it should have a VERY HIGH confidence value for the correct class, much higher than for the other class labels. So the right gesture class must be at the top of the candidate list; if it is in the top 2 or top 3 instead of top 1, the result is ambiguous and we need further classification.
What do you guys think of the preprocessing step? Li showed two kinds of rotation, namely the orientation-invariant and orientation-sensitive ways. I am not sure what his purpose is in doing it this way. Any ideas?
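One way to picture the two modes, as I understand the paper's description: both rotate the gesture around its centroid based on the "indicative angle" (the direction from the centroid to the first point). The invariant mode rotates that angle to zero, so the same shape drawn at any rotation looks identical; the sensitive mode only snaps it to the nearest of eight base orientations, so shapes that differ mainly by direction stay distinguishable. The sketch below is an assumption-laden illustration with my own function names, not code from the paper.

```python
import math

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def indicative_angle(points):
    """Angle from the stroke's centroid to its first point."""
    cx, cy = centroid(points)
    return math.atan2(points[0][1] - cy, points[0][0] - cx)

def align(points, orientation_sensitive=False):
    """Rotate a stroke about its centroid.

    Invariant mode: rotate the indicative angle to zero.
    Sensitive mode: rotate only to the nearest multiple of 45 degrees.
    """
    angle = indicative_angle(points)
    if orientation_sensitive:
        base = (math.pi / 4) * round(angle / (math.pi / 4))
        delta = base - angle
    else:
        delta = -angle
    cx, cy = centroid(points)
    cos_d, sin_d = math.cos(delta), math.sin(delta)
    return [(cx + (x - cx) * cos_d - (y - cy) * sin_d,
             cy + (x - cx) * sin_d + (y - cy) * cos_d)
            for x, y in points]
```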