
Monday, September 27, 2010

Reading #10: Graphical Input through Machine Recognition of Sketches

Comments:
Jorge

Summary:
This paper, written in 1976, outlines what a sketch-based system must be able to do in order to be of use to humans, and then describes three experiments in computer processing of sketches.

HUNCH: Can a computer interpret a sketch without knowing the domain? Such was the question behind experiment one, known as HUNCH. Testing was done using an actual pen and a big piece of paper set over the surface of a drawing tablet (this was done to mimic a natural sketching environment). HUNCH essentially found corners by marking points where drawing speed slowed as corners. It originally "latched" together endpoints, meaning that if two endpoints were within a certain radius of each other, they were said to meet. The system was then augmented to use speed to determine 'intended' endpoints. Experiments were also conducted to detect and interpret over-traced lines, project into the third dimension, and find paths between rooms in a hand-drawn floor plan. The conclusion was that context is important, even at the lowest levels of interpretation.
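
Just to pin down those two ideas for myself, here's a rough Python sketch of corner-by-speed and latching (the speed ratio and latching radius are numbers I made up, not values from the paper):

```python
import math

def slow_points(stroke, speed_ratio=0.4):
    """Flag samples drawn well below average speed as corner candidates.
    stroke is a list of (x, y, t) samples."""
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(stroke, stroke[1:]):
        dt = (t1 - t0) or 1e-6
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    avg = sum(speeds) / len(speeds)
    # the pen slows down at corners, so slow samples are corner candidates
    return [stroke[i + 1] for i, s in enumerate(speeds) if s < speed_ratio * avg]

def latch(end_a, end_b, radius=10.0):
    """If two stroke endpoints fall within the latching radius, snap them
    to a shared point; otherwise leave them alone."""
    (xa, ya), (xb, yb) = end_a, end_b
    if math.hypot(xa - xb, ya - yb) <= radius:
        merged = ((xa + xb) / 2.0, (ya + yb) / 2.0)
        return merged, merged
    return end_a, end_b
```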

Experiment 2: Specifying the context of drawings was tested next. They found that "... the system is only as good as matching machinery". In other words, context can help, but it doesn't guarantee that the user will draw everything that the computer needs to identify the proper context.

Experiment 3: An interactive system that combines the strengths of HUNCH and Experiment 2 was tested next. For a user to interact with the system, basic recognition must be done in real time. Measuring the speed of a stroke and how bent it is can help with this.

The conclusion of the paper mentions how the goal is to allow users to modify interpretations as needed. A sketch-based system should be smart, adaptable, and easy to interact with.

Discussion:
Part of my truss recognition algorithm uses latching, and has similar issues. Scaling the latching radius to the length of the strokes in question could definitely help me out!
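
Something like this is what I have in mind (the fraction and the pixel floor are pure guesses on my part):

```python
def latch_radius(stroke_length, fraction=0.05, floor=3.0):
    """Latching radius that scales with the stroke: a small fraction of its
    length, but never below a few pixels so short strokes can still latch."""
    return max(floor, fraction * stroke_length)
```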

Really, my one issue with this paper is that it seemed to end abruptly. I liked where it was going and the concise discussions of experiments and findings, but then all of a sudden it was over. The conclusion could have been a page or more in length! I am thinking that it was cut short to meet some sort of page cap. Regardless, I liked this paper because it made me think of new ways to solve problems and create interfaces.

Reading #9: PaleoSketch

Comments:
Danielle

Summary:
PaleoSketch provides highly accurate low-level primitive recognition and beautification. It places very few constraints on the user, thus allowing them to focus on their drawings and not on having to learn the software that is implementing the recognizer. Paleo takes a stroke and runs it through a series of 8 different classifiers. Each classifier then returns whether or not the stroke could be of that primitive type, along with a beautified stroke if it matches. Paleo also incorporates two new features: the normalized distance between direction extremes (NDDE) and the direction change ratio (DCR). The NDDE is basically the percentage of the stroke's length that lies between its two direction extremes. The DCR is the maximum change in direction divided by the average change in direction, with the first and last 5% of the stroke ignored.
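
To make sure I understood the two new features, here's a quick Python sketch of how I'd compute them (resampling and edge cases are glossed over, and the helper names are mine, not Paleo's):

```python
import math

def _directions(points):
    """Direction (angle) of each segment; points is a list of (x, y) pairs."""
    return [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def _lengths(points):
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def ndde(points):
    """Normalized distance between direction extremes: the fraction of the
    stroke's length lying between its highest- and lowest-direction points."""
    dirs, lens = _directions(points), _lengths(points)
    a, b = sorted((dirs.index(max(dirs)), dirs.index(min(dirs))))
    return sum(lens[a:b + 1]) / (sum(lens) or 1e-6)

def dcr(points, trim=0.05):
    """Direction change ratio: max change in direction over the average change,
    ignoring the first and last 5% of the stroke."""
    dirs = _directions(points)
    changes = [abs(d1 - d0) for d0, d1 in zip(dirs, dirs[1:])]
    k = int(len(changes) * trim)
    core = changes[k:len(changes) - k] or changes
    avg = sum(core) / len(core)
    return max(core) / (avg or 1e-6)
```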

The different possible interpretations for a stroke are stored in an interpretation hierarchy. This hierarchy is based on the minimum number of corners that would result when classifying a stroke a certain way. The resulting stroke with the fewest corners is chosen as the best interpretation. Because each interpretation is computed, however, users can easily choose an alternate interpretation if the chosen best match is not what they meant to draw.

[Image: a blurry PaleoSketch at work on some recognition tests.]

Discussion:
98.56% of the time, it works every time. But really, PaleoSketch pushes it to the limit. It seems like the only thing Paleo cannot do is force users to not be lazy. Some of the recognition errors resulted because a tester drew a circle that looked exactly like an ellipse. The other errors were from complex shapes that were combinations of primitives. Can you really fault something for doing its job too well? I, sirs and madams, cannot. PaleoSketch is awesome.

Thursday, September 9, 2010

Reading #8: Lightweight Multistroke Recognizer for UI Prototypes

Comments:
Yue Li

Summary:
When the $1 Recognizer grew up, it evolved into the $N Recognizer. $N is a lightweight, multistroke recognizer that can provide increased accuracy and decreased recognition time via the addition of optional optimizations. Again, the focus is on providing designers with an easy-to-implement-and-use recognition system for augmenting their software and designs. Here are the changes from $1:

  1. A novel way to represent a multistroke as a set of unistrokes representing all possible stroke orders and directions
  2. The conception of a multistroke as inherently a unistroke, but where part of this unistroke is made away from the sensing surface
  3. The recognition of 1D gestures (e.g., lines)
  4. The use of bounded rotation invariance to support recognition of more symbols
  5. An evaluation of $N on a set of handwritten algebra symbols made in situ by middle and high school students working with a math tutor prototype on Tablet PCs

In order to not be super annoying, and thus allow programmers to just draw a template with a single orientation and stroke order, $N automatically calculates and stores the "unistroke permutations" of the provided multistroke templates. This treats the template as a unistroke design where part of the stroke occurs off of the drawing surface (think of it as being invisible). They provide a nice example of this in the paper.
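
Here's a little Python sketch of how I picture the permutation step (the function name is mine; $N also does resampling and other cleanup that I'm skipping here):

```python
from itertools import permutations, product

def unistroke_permutations(strokes):
    """Turn a multistroke (a list of strokes, each a list of (x, y) points) into
    every unistroke obtainable by reordering the strokes and reversing any of them."""
    results = []
    for order in permutations(strokes):                       # every stroke order
        for flips in product((False, True), repeat=len(order)):  # every direction
            unistroke = []
            for stroke, flipped in zip(order, flips):
                unistroke.extend(reversed(stroke) if flipped else stroke)
            results.append(unistroke)
    return results

# a two-stroke "X" yields 2! * 2^2 = 8 unistroke templates
x_shape = [[(0, 0), (10, 10)], [(10, 0), (0, 10)]]
print(len(unistroke_permutations(x_shape)))  # 8
```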

Through their user study, it was found that $N had a 96.6% accuracy when using 15 templates per shape. Additionally, a 96.7% accuracy was obtained using 9 templates of the original gestures tested with $1.

Discussion:
If stroke order and direction are not important, then the $N recognizer seems to be pretty awesome. In some cases, such as accommodating both left- and right-handed users, the ability to match the final gesture regardless of how it was drawn is very important because it minimizes user frustration and increases system accuracy.

Reading #7: Sketch Based Interfaces

Comments:
THE GROVE MASTER (CAPSLOCK IS STUCK)

Summary:
A user interface that feels like smart paper. Combined with the goal of direct manipulation, you have the basis for this paper by Sezgin and company. The paper's focus is on the first step of the sketch recognition process: converting pixels into geometric shapes. This process is broken up into three phases:

1. Approximation: Minimize error and avoid overfitting.
The first part of approximation is vertex detection. Taking advantage of features such as slower speeds and increased curvature at corners, the authors find outliers above a computed threshold and treat them as potential vertices. Computed features are then combined in an attempt to further drop false positives.
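
Here's a rough Python sketch of that vertex-detection idea as I understand it (the thresholds are invented, and the real system combines the speed and curvature candidates more carefully than a simple OR):

```python
import math

def vertex_candidates(points, times, speed_frac=0.25, curv_frac=1.5):
    """Corner candidates from speed minima and curvature maxima.
    points: list of (x, y); times: matching timestamps."""
    n = len(points)
    speeds, curvs = [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        dt = (times[i + 1] - times[i - 1]) or 1e-6
        speeds[i] = math.hypot(x2 - x0, y2 - y0) / dt
        # curvature approximated by the change in segment direction at point i
        turn = abs(math.atan2(y2 - y1, x2 - x1) - math.atan2(y1 - y0, x1 - x0))
        curvs[i] = min(turn, 2 * math.pi - turn)
    avg_speed = sum(speeds[1:-1]) / max(1, n - 2)
    avg_curv = sum(curvs[1:-1]) / max(1, n - 2)
    return [i for i in range(1, n - 1)
            if speeds[i] < speed_frac * avg_speed or curvs[i] > curv_frac * avg_curv]
```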


The second part of approximation is curve handling... read that for yourself...

2. Beautification: Modify output to be more visually appealing
This step is basically a line straightener. Lines that are in groups are rotated by their midpoints to try and maintain close connections at vertices.
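
Something along these lines is what I picture for the midpoint rotation, though I'm not claiming this is exactly their procedure:

```python
import math

def rotate_about_midpoint(p0, p1, new_angle):
    """Re-orient a line segment to new_angle (radians) while keeping its midpoint
    and length fixed -- one way to straighten lines without drifting far from
    the original vertices."""
    (x0, y0), (x1, y1) = p0, p1
    mx, my = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half = math.hypot(x1 - x0, y1 - y0) / 2.0
    dx, dy = half * math.cos(new_angle), half * math.sin(new_angle)
    return (mx - dx, my - dy), (mx + dx, my + dy)
```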

3. Basic Object Recognition: Produce stroke interpretations
Ovals, circles, rectangles, and squares are basic objects. Template matching is employed to detect these geometric objects.


The authors found that people liked being able to use multiple strokes to draw a single object (go figure). The shapes used in the study were pretty crazy, thus proving the system was capable of being awesome.

Discussion:
I didn't really connect with this paper... I'm not sure why. Maybe they just didn't stress the impact that their system had enough for me to identify with it. Can anyone clear that up for me?

Tuesday, September 7, 2010

Reading #4: Master Sutherland vs. the Machine

Comments:
liwenzhe

Summary:
Ivan Sutherland created a sketch-input system named Sketchpad before even the mouse was in use. Sketchpad is a light pen / giant board of buttons combination that allows users to draw items (known as symbols) and place constraints upon them. The computer stores these constraints and symbols so that their properties and the visual drawings themselves can be reproduced. Users can even add new symbols to the library to use in the future. That's basically the origins of object-oriented programming we are talking about, people! Not to mention GUIs, Computer-Aided Drafting (CAD), and computer graphics in general. The display had some crazy zoom for adding details (Sutherland mentions that the roughly 7-inch screen is a window onto a drawing area about 1/4 mile on a side), and the reusable nature of symbols, plus the fact that information attached to them was retained, allowed Sutherland to show off some impressive examples.

Sutherland does note that in the instance of electrical circuit diagrams the user felt that drawing by hand would be faster. But again thanks to the stored symbol library, once the needed objects were correctly drawn and constrained, even this medium could be expanded upon and prove useful to engineers. A positive example that stands out to me is the bridge idea that he tested. He essentially created a library and set up constraints to handle free body diagrams using Sketchpad. This allowed him to test how different loads and supports would affect members of the truss structure. You're welcome, AutoCAD!

Discussion:
"It is only worthwhile to make drawings on the computer if you get something more out of the drawing than just a drawing."
That quote should be considered by anyone who creates something based on a sketch system. Sutherland himself stresses this in his paper, and sadly it has yet to be accepted as a field standard.

I wonder if Sutherland realized what he was really doing when he designed Sketchpad... did he just create something to get his PhD thesis taken care of? Or did he really set out to revolutionize computer design and capabilities? This paper was awesome. Sutherland is awesome. Man and machine are friends.

Sunday, September 5, 2010

Reading #5: $1 Recognition

Comments:
liwenzhe

Summary:
97% accuracy with only one provided example. Such is the boast of the $1 Recognizer. This paper by Jacob Wobbrock, Andrew Wilson, and Yang Li (the man behind Protractor) explains the implementation and reasoning behind the creation of an easy and cheap recognizer designed for novice designers. The authors stress that gesture-based interaction can be very useful, but that it was previously difficult for non-experts in the field to implement a system. The authors even provide this very nice summary of contributions:

1. be resilient to variations in sampling due to movement speed or sensing;
2. support optional and configurable rotation, scale, and position invariance;
3. require no advanced mathematical techniques (e.g., matrix inversions, derivatives, integrals);
4. be easily written in few lines of code;
5. be fast enough for interactive purposes (no lag);
6. allow developers and application end-users to "teach" it new gestures with only one example;
7. return an N-best list with sensible [0..1] scores that are independent of the number of input points;
8. provide recognition rates that are competitive with more complex algorithms previously used in HCI.

So what are the limitations of this $1 Recognizer? Because it is rotation, scale, and position invariant, it cannot tell the difference between squares and rectangles, ovals and circles, vertical and horizontal lines, etc. In some instances, the differences between these sorts of things may be critical, and the authors are quick to stress this to readers. By testing $1 against Rubine and Dynamic Time Warping (not a form of space travel), the authors found that their solution was highly accurate. Not too shabby for a $1 charge.
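
Since the whole selling point is that it fits in a few lines of code, here's a compressed Python sketch of the pipeline: resample, normalize, and compare by average point distance. I've left out the rotation-invariance step (the indicative angle plus golden-section search), and the helper names are mine:

```python
import math

N = 64  # number of resampled points, as in the paper

def resample(points, n=N):
    """Resample a stroke to n points spaced evenly along its path length."""
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    interval, d = path / (n - 1), 0.0
    out, pts = [points[0]], list(points)
    i = 1
    while i < len(pts):
        seg = math.dist(pts[i - 1], pts[i])
        if d + seg >= interval and seg > 0:
            t = (interval - d) / seg
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)      # q becomes the start of the next segment
            d = 0.0
        else:
            d += seg
        i += 1
    out = out[:n]
    while len(out) < n:           # rounding can leave us a point short
        out.append(pts[-1])
    return out

def normalize(points):
    """Scale to a unit box and move the centroid to the origin."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    return [((x - cx) / w, (y - cy) / h) for x, y in points]

def path_distance(a, b):
    """Average point-to-point distance between two processed strokes."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def recognize(stroke, templates):
    """templates: dict of name -> raw point list. Returns (best name, distance)."""
    s = normalize(resample(stroke))
    prepared = {name: normalize(resample(t)) for name, t in templates.items()}
    return min(((name, path_distance(s, t)) for name, t in prepared.items()),
               key=lambda nt: nt[1])
```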

Discussion:
The authors themselves brought up my own biggest issue - was the $1 Recognizer actually easy for novice users to implement? As of this paper, that had yet to be determined. If the whole focus was to help more people integrate and use gesture recognition, then it might be important to actually have some people implement it and evaluate it themselves. Regardless, I think that by using a few simple modification tricks, the $1 Recognizer is a great example of gesture recognition for the layman programmer.

Thursday, September 2, 2010

Reading #1: Hammond Blog (apparently it's required)

Comments:
Yue Li

Summary:
Dr. Hammond's paper entitled 'Gesture Recognition' is essentially an introduction to... well... gesture recognition. The first thing to keep in mind is that a gesture represents the path of the pen. As such, gestures must be drawn in a single stroke and in the same direction or they will not match up to other preset or example gestures. This paper discusses a few key pieces of research in the field of gesture recognition. It begins with a discussion of Rubine's recognition method, which is based on 13 calculated features of a gesture. Next, it outlines Long's quill system, which used a total of 22 features to classify gestures on the fly and to provide system designers with feedback about the gestures they are using. Finally, Hammond's paper talks about the $1 recognizer developed by Jacob Wobbrock. This recognizer standardizes gestures to make matching between templates and new input a faster process.

Discussion:
Dr. Hammond's paper serves as a great complement piece to the other papers by each respective author that we {have, will, are supposed to} read during the first week or so of class. The section focusing on Rubine's features was especially helpful to me. My only problem is that it wasn't really made clear that we had to blog about it! Get on that, Paul!

Reading #6: I've got a Protractor!

Comments:
Hong-Hoe Kim

Summary:
Protractor is a template-based, single-stroke gesture recognizer. It calculates the similarity between gestures via a nearest-neighbor classifier. This means that when a new gesture is input by the user, Protractor compares it to the stored examples within the system to determine the best (nearest) match. Gestures are processed into equal-length vectors, and the nearest neighbor is thus the one with the smallest angle between it and the input gesture (think text documents in the vector space model). Protractor differs from its closest companion, the $1 recognizer, in that it takes up 1/4 of the memory and boasts a faster recognition time. The smaller space requirement can be attributed to the fixed 16 points that are sampled equally spaced along the length of the gesture. Yang Li, the creator, stresses that this increased efficiency makes Protractor ideal for mobile touch-based systems.
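
Here's a Python sketch of the closed-form comparison as I understand it; my resampling is cruder than Protractor's (which samples by path length), and this version is fully rotation invariant, whereas Protractor can bound the allowed rotation:

```python
import math

def resample(points, n=16):
    """Crude even resampling to n points by index (just to keep the sketch short)."""
    step = (len(points) - 1) / (n - 1)
    return [points[round(i * step)] for i in range(n)]

def vectorize(points):
    """Center the 16 sampled points and flatten to a unit-length 32-element vector."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    v = []
    for x, y in points:
        v.extend((x - cx, y - cy))
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]

def angular_distance(v1, v2):
    """Minimum angle between two gesture vectors over all rotations,
    using the closed-form optimal alignment."""
    a = sum(v1[i] * v2[i] + v1[i + 1] * v2[i + 1] for i in range(0, len(v1), 2))
    b = sum(v1[i] * v2[i + 1] - v1[i + 1] * v2[i] for i in range(0, len(v1), 2))
    return math.acos(max(-1.0, min(1.0, math.hypot(a, b))))

def recognize(points, templates):
    """templates: dict of name -> raw point list; smaller angle = better match."""
    v = vectorize(resample(points))
    return min(((name, angular_distance(v, vectorize(resample(t))))
                for name, t in templates.items()), key=lambda nt: nt[1])
```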

Discussion:
I was somewhat familiar with the idea of using vector space for comparisons thanks to Dr. Caverlee and his Information Retrieval course. Regardless, the equations in the paper owned me at first. A potential issue with Protractor is the fact that only 16 points are sampled from each gesture. What if the stroke is extremely long, and therefore a lot of crazy stuff happens in between the sample points? In this case, the standardized length of the resulting vector would be based on an incorrect set of assumed points. This might limit the gestures recognized to broader, simpler patterns. But even this could be a good thing because people can learn more when faced with less complexity. Catch 22?

Reading #3: You're doing it wrong!

Comments:
Jianjie Zhang


Summary:
Long, Landay, and Rowe noticed that people were incorporating new interaction technologies into their user interfaces, but that they did so naively, causing issues with recognition and frustrating users. To alleviate these frustrations in a gesture-based system, they developed quill. quill continuously analyzes gestures and warns designers when a gesture may be confused with other gestures, either by the computer or by users. Long and company use the same algorithm that Rubine developed to train their system (about 15 examples per class), but focus everything on the designer end.
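
quill's actual analysis is fancier than this, but here's the flavor of it in Python: flag gesture classes whose average feature vectors sit suspiciously close together (the distance metric and threshold are my own stand-ins, not quill's):

```python
import math
from itertools import combinations

def class_means(examples):
    """examples: dict of class name -> list of feature vectors (one per example).
    Returns the mean feature vector per class."""
    return {name: [sum(col) / len(vectors) for col in zip(*vectors)]
            for name, vectors in examples.items()}

def likely_confusions(examples, threshold=1.0):
    """Flag class pairs whose mean feature vectors are within threshold of each
    other -- a stand-in for quill's 'these may be confused' warning."""
    means = class_means(examples)
    warnings = []
    for a, b in combinations(means, 2):
        d = math.dist(means[a], means[b])
        if d < threshold:
            warnings.append((a, b, d))
    return warnings
```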

Because quill is giving advice back to designers, Long and company found that the user interface was very important. Their three main areas of consideration were the timing of advice, the amount of advice to display, and the content of said advice. In the end, messages are most often displayed when the designer is testing gestures. They are kept concise, and include hyperlinks to more thorough explanations and solutions. Finally, the messages are written plainly in English to avoid ambiguity and frustration.

Discussion:
Cool story, bro! But really... I liked the idea of giving designers feedback on their proposed gestures. Embracing new technologies is great, but if you don't consider users and interpretation in your system design, then it's better to not even use the technology! I also liked how part of the paper focused on the feedback system and its challenges. That topic is still of great relevance today, and is something that we all have considered or will consider during our own projects and applications.

Wednesday, September 1, 2010

Reading #2: GRANDMA lover

Comments:
The Grove Master!

Summary:
GRANDMA, or Gesture Recognizers Automated in a Novel Direct Manipulation Architecture, is Dean Rubine's spark that ignited the field of gesture-based sketch recognition. It is based on single strokes, as demonstrated by the GDP (Gesture-based Drawing Program) that he references throughout this paper. Adding gestures is done via training sets, with a recommended 15 examples being given. The background functionality of these gestures is set by the user, and then the GDP is ready for testing.

Strokes are composed of sets of points of the form (x, y, t), denoting X-coordinate, Y-coordinate, and time, respectively. Using this information, Rubine calculates 13 key features of a stroke. Given this new set of features, Rubine then finds the best match as compared to a preset gesture. This is done with a linear classifier. With 15 examples for each class to match against, Rubine found that he could achieve a 96% success rate.
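
For my own reference, here's a Python sketch of a few of those features and the linear classifier step (the weights would come from Rubine's training procedure, which I'm not reproducing here):

```python
import math

def features(stroke):
    """A few of Rubine's 13 features for a stroke of (x, y, t) samples
    (assumes at least three samples)."""
    xs, ys, ts = zip(*stroke)
    pts = list(zip(xs, ys))
    dx0, dy0 = xs[2] - xs[0], ys[2] - ys[0]          # initial direction uses the third point
    d0 = math.hypot(dx0, dy0) or 1e-6
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    path_len = sum(math.hypot(x1 - x0, y1 - y0)
                   for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return [dx0 / d0,                                 # cosine of the initial angle
            dy0 / d0,                                 # sine of the initial angle
            math.hypot(w, h),                         # bounding-box diagonal length
            math.atan2(h, w),                         # bounding-box diagonal angle
            math.hypot(xs[-1] - xs[0], ys[-1] - ys[0]),  # first-to-last point distance
            path_len,                                 # total stroke length
            ts[-1] - ts[0]]                           # total drawing time

def classify(stroke, weights):
    """Linear classifier: each class c scores w_c[0] + sum_i w_c[i] * f_i,
    and the highest-scoring class wins. weights: dict of name -> [w0, w1, ...]."""
    f = features(stroke)
    def score(w):
        return w[0] + sum(wi * fi for wi, fi in zip(w[1:], f))
    return max(weights, key=lambda name: score(weights[name]))
```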

Discussion:
Rubine is obviously a complete beast. It might seem trivial to us now that he used a classifier to best-match strokes, but at the time it must have been an insane idea. I also prefer the idea of rejection over undo. It makes more sense to me to set a certain lower threshold for recognition, and reject strokes that do not reach it. In this way, user inattention will not lead to bad examples being added to the training set. But then again, I'm not sure if accepted recognitions are used to increase the training set or not. If it was in there, I don't remember seeing it. All in all, I think this paper was a good introduction to gesture-based recognition.

Let me hit you with some knowledge


Contact: heychrisaikens@gmail.com

Standing: 1st year Masters

Why I'm here: Sketch Recognition is becoming a very key part of my studies. It's time that I learn the trade.

Experience: One semester of research in SketchRec via the SRL@TAMU.

10 years from now: I plan on running my own company. Pervasive Systems, anyone?

The next big thing in CS: My company. No not really... Probably advances in connectivity and the integration of computers into even more facets of our lives.

Favorite course of yesteryear: Information Retrieval.

Favorite movie: Sunshine. Science Fiction + The Human Condition -> Win.

Time traveling: I would find the inventor of the time machine and destroy his plans. We have enough to worry about without the time/space paradox thrown in.

Interesting fact: I love Sprite Zero. That stuff is off the chain. Also, in retrospect, this Grey is hard to read.