
Monday, December 13, 2010

Reading #30: Tahuti

Comments:
Sam-bo

Summary:
Tahuti is a dual-view sketch recognition environment. It shows users both their original strokes and the interpreted UML view of them. Users are able to sketch just as they would on paper and have the system create the UML structures that would correspond to their sketches. Tahuti uses a multi-layer framework to process, select, recognize, and identify strokes. The paper provides some nice information as to how each step is performed and what algorithms are used.

Through user studies, the authors discovered that Tahuti's interpreted view was deemed easier to draw in and easier to edit in than comparable systems.

Discussion:
This paper was written in 2002. Since that time, a number of different systems have been designed that aid users in creating UML diagrams. I am not sure that a sketch-based approach to this task is still relevant or efficient, but at the time it seemed like an excellent idea. Limiting user frustration is a must!


Full Blog Completion Status Achieved!

Reading #29: Scratch Input

Comments:
Sam

Summary:
Scratch Input is an acoustic-based gesture recognizer. It uses a modified stethoscope with a single microphone embedded in it in order to capture the propagation of sound waves through a solid, flat surface. Scratches have a high frequency, and thus frequency thresholds are employed to eliminate almost all noise from the system. Gestures are distinguished mainly by the number of amplitude peaks in their signal. People slow down when approaching corners in a drawing, and thus the lowest points in a signal correspond to "corners" in a gesture. Strokes can thus be segmented as the peaks between these corners. The single input sensor was unable to differentiate between gestures that contain the same number of strokes, but it had an accuracy of 90% for the gestures tested. The hardware device designed by the authors was extremely affordable, thus allowing Scratch Input to be applied to a variety of large surfaces as needed by potential users.
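Here is a minimal sketch of the peak-counting idea as I understand it (my own toy code, not the authors' implementation), assuming we already have a filtered amplitude envelope from the microphone:

```python
# Hypothetical sketch of the peak-counting idea, NOT the authors' code. Assumes
# `envelope` is an amplitude envelope of the (high-pass filtered) microphone signal.

def count_strokes(envelope, high=0.6, low=0.1):
    """Count amplitude peaks separated by near-silent dips ("corners")."""
    strokes = 0
    in_peak = False
    for a in envelope:
        if not in_peak and a >= high:    # rising edge: a new stroke's peak begins
            strokes += 1
            in_peak = True
        elif in_peak and a <= low:       # fell back into a dip: wait for the next peak
            in_peak = False
    return strokes

# A gesture made of three back-and-forth scratches yields a count of 3, and gestures
# are then distinguished purely by this count.
print(count_strokes([0.0, 0.7, 0.2, 0.05, 0.8, 0.1, 0.02, 0.9, 0.0]))  # -> 3
```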

Discussion:
Good ol' scratch input! Our Sound Board project was based heavily on the ideas in this paper. I would love to see the authors return to this work and employ multiple sensors. That would essentially do what Drew, George, and I tried to do with our 3rd and Final projects.

Reading #27: K-Sketch: A “Kinetic” Sketch Pad for Novice Animators

Comments:
Francisco the Awesome Millionaire with the Diamond Suit

Summary:
K-Sketch is a 2D animation sketching system designed for novice users. The authors conducted interviews with both animators and non-animators in order to come up with a range of tasks that could be supported by K-Sketch. They then implemented a nice set of features that allow users to quickly modify their basic sketches so that they can carry out simple animations.

In a laboratory experiment that compared K-Sketch to a more formal animation tool (PowerPoint), participants worked three times faster, needed half the learning time, and had significantly lower cognitive load with K-Sketch.
That sums it up!

Discussion:
K-Sketch reminds me of a sketch-based Prezi tool. It includes lots of features, some of which are mapped strangely, and allows users to create some pretty cool stuff given a little bit of time and some patience. It seems like the system would really allow users to express themselves, which is the best thing that could be hoped for.

Reading #26: Picturephone: A Game for Sketch Data Capture

Comments:
Marty

Summary:
Picturephone is a sketching game used to gather labeled data in a way that is fun (entertaining) for users. Picturephone works in a similar manner to that of the game Telephone, wherein players repeat phrases to each other in a linear fashion and see how the phrase evolves with each iteration. In Picturephone the players alternate between drawing a sketch and describing it. An additional player is then assigned the task of judging how similar the sketches are. In this way, the authors manage to get labeled data and a relevance/accuracy metric without having to do any of the work themselves.
Discussion:
Picturephone was discussed in reading #24. I liked it then and I like it now. Again, these types of sketching games seem like a good way to gather labeled data without having users repetitively draw the same shape over and over and over.

Reading #25: A descriptor for large scale image retrieval based on sketched feature lines

Comments:
Paco

Summary:
So you need to find an image online... do you describe it in words? What if you describe the wrong parts of the image because those are what you deem important? How do you quantify an image's importance? You could draw the image, but then why do you NEED it if you can draw?!

Ok, so obviously sketching to search for images would be cool. So cool in fact that the authors of this paper developed a system that does it. Their system is designed to query beastly databases containing millions of images. Actual images and the user's input sketch are preprocessed the same way, which allows for matching based on similar descriptors.


Database image descriptors are cached in memory, and clusters are created based on similar colors. Searches take up to 3.5 seconds.
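At this level of detail, the retrieval step reduces to comparing one query descriptor against a huge pile of cached ones. A hedged, generic sketch (my own stand-in; the paper's actual descriptor is built from sketched feature lines and is far more involved):

```python
import math, random

# Generic descriptor-matching sketch, NOT the paper's actual descriptor or index. The
# point is only that the query sketch and the database images are reduced to the same
# kind of fixed-length vector, so retrieval becomes a nearest-descriptor ranking.

random.seed(0)
database = [[random.random() for _ in range(64)] for _ in range(10_000)]  # cached image descriptors
query = [random.random() for _ in range(64)]                              # descriptor of the input sketch

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

top10 = sorted(range(len(database)), key=lambda i: dist(database[i], query))[:10]
print(top10)  # indices of the ten closest images, which would be shown to the user
```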

Discussion:
Awesome! Like Paco and I were talking about, this idea could be used to teach both users and the system words in different languages. If you draw simple items such as a tree or a cat, then you could also provide the written description (word) in your native language. Once you select a result, the system could then "learn" that your word describes that image and be able to employ cross-language searches in the future.

Reading #24: Games For Sketch Data Collection

Comments:
Kim!

Summary:
The authors of this paper are interested in allowing users to freely move between sketching domains rather than be restricted to a certain one. This allows for a more natural sketching session akin to the use of pen-and-paper. In order to gather data on sketches and user-provided descriptions, the authors implemented a multiplayer sketching game.

The two online games created for data gathering are called Picturephone and Stellasketch. With Picturephone, players switch between describing a scene and drawing it. The next user interprets either the drawing as a new description or the description as a new drawing. Players then rate the various drawings to denote similarity (the more similar the better). The Stellasketch game is similar to Pictionary. A single user is given a topic and begins to draw it. The other players privately label the sketch with their guesses at various stages in its design process. Because users enjoyed playing the game, the authors were able to gather labeled sketches in the background.

Discussion:
I like this idea. You can hide data gathering techniques in games that people enjoy playing. We should implement something like this in SOUSA studies because sometimes the redundancy of providing examples is tiring. Make users label their own stuff! Reduce your workload!

Reading #23: InkSeine: In Situ Search for Active Note Taking

Comments:
SAM

Summary:
Active note taking: capturing and extending creative ideas, reflecting on a topic, or sketching designs.

InkSeine is a fluid interface designed to allow users to engage in active note taking. It employs a handwriting recognizer in order to allow users to add a new depth to their notes with the incorporation of searches that can serve as extensions to their selected note or information feed. It also uses gestures such as the lasso to trigger actions such as searching for the encircled, hand-written phrase. Sketch recognition techniques are used to aid users in their sensemaking tasks, and they are applied intuitively. The authors took the time to conduct initial user studies with lo-fidelity prototypes in order to maximize usability and focus on potential user scenarios and tasks. Context-based searches minimize cognitive overhead and, based on the authors' formative studies, lead to happy users.

Discussion:
This paper seems like something you would read in Dr. Kerne's class. It is an excellent example of iterative design, user interface concerns, and affordances and mappings. Don Norman is probably using InkSeine right now trying to figure out how some poorly designed door opens. I also like the authors' use of popup windows that do not require users to navigate away from their current tasks just to view initial results. Good design and good use of sketching.

Reading #22: Plushie: An Interactive Design System for Plush Toys

Kids made these!


Comments:
George! Again!

Summary:
First things first... go read the post I wrote on Teddy (Reading #21, below). All done? Good.

Plushie is a pretty cool extension of a 2D to 3D modeling system that uses gestures to perform different editing operations. Plushie affords (shocker!) creating original plush toys. A big contribution the authors make is providing dynamic feedback on both what the user's 3D model looks like and what its 2D textured pieces look like. They tested the system with kids, and even they were able to make some awesome new plush toys with minimal difficulty. A system that kids can use and enjoy is a success.

Discussion:
I had no idea that so many people were interested in creating plush toys from 3D models. I don't think it's particularly weird or anything... but it's definitely interesting. I liked the fact that the authors allow users to change their 2D textures and receive constant updates on them as they modify their 3D model. And as I mentioned, making something fun that kids can enjoy is great. Reel those kids into computer science as early as possible!

Reading #21: Teddy: A Sketching Interface for 3D Freeform Design

Comments:
George!

Summary:
Teddy affords the user drawing 2D strokes and then automatically constructs potential 3D polygonal surfaces based on said strokes. Users interactively specify the silhouettes of objects, and the system attempts to create a 3D model which would match that silhouette (yes, I kind of just repeated myself). Keep in mind that Teddy was designed for rapid approximations. And the authors succeeded in this goal! Users were able to create basic models after as little as 10 minutes of getting used to the system. Once Teddy generates an initial 3D shape, users are able to view their model from different angles and can modify it with various gestures (as shown above). The rest of the paper focuses on the algorithms used to perform the various modeling operations.

Discussion:
What I took away from this paper is that basic gestures can be used to perform some heavy back-end stuff... and to create animations that kids would love. What is important about this work is that the gestures seem intuitive, and thus users can understand how everything works quickly. I recommend that you look at the examples of each action for yourself, as this paper is filled with great screenshots of gestures in action.

Saturday, December 11, 2010

Reading #20: MathPad2: A System for the Creation and Exploration of Mathematical Sketches

Comments:
Sam

Summary:
MathPad lets users draw mathematical expressions, symbols, and diagrams. Some diagrams can even be animated by the system! The authors suggest that allowing users to visualize their problems (which they will naturally do on pen and paper anyway) can help them in their sensemaking tasks.

As shown in the table above, basic gestures are used to inform MathPad of user intent for various items. Expressions are further converted to strings that can be evaluated by MATLAB. MathPad also includes a nice set of computational functions that can aid users. And you can change your stroke color to help organize your work. It's the little things that count.

Discussion:
I did not go into much detail about MathPad, but do not let that in any way discredit it. The system sounds great! It also reminded me of Mechanix because the authors gave some consideration to usability and the aids they could provide to users. I think Mechanix takes it a step further in terms of features and freedom of drawing (recognition is off the chain), but MathPad sounds like a very smartly designed system all around. Good show.

Reading #19: Diagram Structure Recognition by Bayesian Conditional Random Fields


Comments:
Jonathan

Summary:
The recognition method discussed in this paper is based on Bayesian conditional random fields (BCRFs). BCRFs consider both spatial and temporal information, and can correlate features. CRFs are prone to overfitting, meaning that they do awesome for training data and horrible for new data. You could simulate this failnomenon by using the same training files over and over and over when building your feature set.

The authors are interested in discriminating between the containers and connectors in organization charts (see the figure at the top). They had 17 participants draw the chart shown, and ran 5 different algorithms to test the classification. The BCRFs proved to have the best recognition rates.

Discussion:
This paper was out of my league. Do you ever read something that makes you feel like you don't actually know anything about a given field? That was this paper. A little bit of side Googling returned some helpful links on BCRFs, etc., but I still felt lost. The results section basically showed me that everything they did was awesome and that it worked.

Reading #18: Spatial Recognition and Grouping of Text and Graphics

Comments:
Sam

Summary:
This paper discusses a spatial approach to the grouping and recognition of sketches. The process, as shown in the stolen image below, can be done in real time. Strokes near each other are shown with the labeled graph in (b). Shapes are computed and matched to templates in (d). Templates return potential scores (0 to 1) which are used to determine the best overall classification for the user's strokes.

I know what you're thinking... isn't speed an issue here?! The neighborhood graph in (b) helps to eliminate possible classifications based on vertex count and proximity. The authors also discard candidate stroke groups of more than K components, where K is the number of strokes in the current largest template. Oh, and everything is based on machine learning (including the A* search). A user need only provide examples.
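To make the proximity pruning concrete, here is a hedged sketch of a stroke neighborhood graph (my own interpretation, not the paper's code): strokes whose bounding boxes fall within a radius of each other get an edge, and only connected groups are matched against templates.

```python
from itertools import combinations

# Hypothetical stroke neighborhood graph (not the authors' implementation). Each stroke
# is a list of (x, y) points; two strokes are "neighbors" if the gap between their
# bounding boxes is under `radius`, and only connected groups get matched to templates.

def bbox(stroke):
    xs, ys = zip(*stroke)
    return min(xs), min(ys), max(xs), max(ys)

def gap(b1, b2):
    # horizontal/vertical separation between two boxes (0 if they overlap)
    dx = max(0, max(b1[0], b2[0]) - min(b1[2], b2[2]))
    dy = max(0, max(b1[1], b2[1]) - min(b1[3], b2[3]))
    return (dx ** 2 + dy ** 2) ** 0.5

def neighborhood_graph(strokes, radius=20.0):
    boxes = [bbox(s) for s in strokes]
    return {(i, j) for i, j in combinations(range(len(strokes)), 2)
            if gap(boxes[i], boxes[j]) <= radius}

strokes = [[(0, 0), (10, 0)], [(12, 1), (12, 10)], [(200, 200), (210, 200)]]
print(neighborhood_graph(strokes))  # -> {(0, 1)}: only the first two strokes form a group
```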

Discussion:
AdaBoost sounds like a deliciously nerdy energy drink. As the authors discuss in their... Discussion... an off the shelf system that is both efficient and accurate would be boss. If this work could be furthered to achieve similar results with fewer templates, then Rubine himself might rejoice and raise an AdaBoost toast to designer-accessible sketch recognition plug-ins.

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink

Comments:
Kim

Summary:
What is the probability that a system can distinguish your text from your graphics when you draw with a stylus? Such is the question behind the work in this paper. The system described is broken into three main approaches.

  1. Independent Strokes: Sequences of points between pen-down and pen-up events are taken to be strokes. 11 features are computed for each stroke. A multilayer perceptron is used to train a classifier as to which feature vectors correspond to either text or graphics.
  2. Hidden Markov Model (HMM): The order of strokes can lend a clue as to how they should be classified (unless the user jumps between a letter and a shape because they are weird). By looking at overall classification patterns, the HMM can be used to predict the current stroke's class given the last stroke's (a toy sketch of this idea follows the list).
  3. Bi-partite HMM: The gaps between strokes can lend additional information. A user will leave a different sized gap between two text strokes, two graphics strokes, or a mixture of the two.
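Here is the toy HMM sketch promised above (my own simplification, not the paper's model): take per-stroke text/graphics scores from the independent classifier, add a transition matrix that favors staying in the same class, and pick the most likely label sequence with Viterbi.

```python
import math

# Toy Viterbi over two states, a simplification of the paper's HMM stage (not its code).
# `emissions[i]` holds the independent-stroke classifier's scores for stroke i, and the
# transition matrix simply favors consecutive strokes sharing a class.
TRANS = {("text", "text"): 0.8, ("text", "graphics"): 0.2,
         ("graphics", "graphics"): 0.8, ("graphics", "text"): 0.2}

def viterbi(emissions, states=("text", "graphics")):
    best = [{s: math.log(emissions[0][s]) for s in states}]   # best log-prob per state
    back = []                                                 # backpointers per step
    for em in emissions[1:]:
        scores, ptrs = {}, {}
        for s in states:
            prev, score = max(((p, best[-1][p] + math.log(TRANS[(p, s)])) for p in states),
                              key=lambda t: t[1])
            scores[s], ptrs[s] = score + math.log(em[s]), prev
        best.append(scores)
        back.append(ptrs)
    state = max(states, key=lambda s: best[-1][s])            # trace the best path back
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.append(state)
    return list(reversed(path))

print(viterbi([{"text": 0.9,  "graphics": 0.1},
               {"text": 0.45, "graphics": 0.55},   # ambiguous stroke, slightly "graphics"
               {"text": 0.9,  "graphics": 0.1}]))  # -> ['text', 'text', 'text']
```

The ambiguous middle stroke gets pulled toward "text" by its neighbors, which is exactly the kind of context the independent-stroke classifier alone cannot use.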


Discussion:
I think I read this paper before... anyway! I did not like the way that they presented their results. Call me old fashioned, but I think you should always put your accuracies in plain old X (where X is your written language used). And was that plot drawn in paint? I felt like I was interpreting their findings rather than reading about them! Besides that, I thought the paper was very interesting.

Reading #16: An Efficient Graph-Based Symbol Recognizer

Comments:
George

Summary:
This paper discusses (you guessed it) an efficient graph-based symbol recognizer. Using an Attributed Relational Graph (ARG), the authors can describe symbols in terms of their geometry and topology. A symbol's geometry would include its primitive shapes and structures, which are treated as nodes. As the authors state, recognizing sketches involves matching graphs. And FYI, this can be hard as hell. So your user forgot to draw a primitive shape like a circle? There goes a node and a couple of edges. Although representing a symbol in terms of its topology allows rotated and scaled versions to be matched, it cannot help the issue of missing components.


ARG for a perfect square.

Given training examples of each symbol class, the system constructs an "average ARG" for that class. To improve the average, the authors maintain stroke order and orientation across all examples. In testing, the system was proven to return the correct symbol in a top 3 list with over 93% accuracy.
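For a concrete picture of the caption above, here is a hedged toy encoding of an ARG for a square (my own representation, not the authors' data structure): the nodes are the four line primitives and the edges carry pairwise attributes such as relative angle and connectivity.

```python
# Toy attributed relational graph (ARG) for a square, NOT the paper's data structure.
# Nodes are the four line primitives; edges carry pairwise attributes such as the
# relative angle between lines and whether they share an endpoint.

square_arg = {
    "nodes": {
        "L1": {"type": "line", "rel_length": 1.0},
        "L2": {"type": "line", "rel_length": 1.0},
        "L3": {"type": "line", "rel_length": 1.0},
        "L4": {"type": "line", "rel_length": 1.0},
    },
    "edges": {
        ("L1", "L2"): {"angle_deg": 90, "touching": True},
        ("L2", "L3"): {"angle_deg": 90, "touching": True},
        ("L3", "L4"): {"angle_deg": 90, "touching": True},
        ("L4", "L1"): {"angle_deg": 90, "touching": True},
        ("L1", "L3"): {"angle_deg": 0,  "touching": False},  # opposite sides are parallel
        ("L2", "L4"): {"angle_deg": 0,  "touching": False},
    },
}

# Recognition then amounts to (approximate) graph matching: score the input's ARG
# against the averaged ARG of each class and rank the classes by that score.
```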

Discussion:
I like how complex the symbols are that the authors tested in this paper. Some of them remind me of good old Civil Sketch. The trade-off between accuracy and speed of calculation is important, although even the longest recognition time only took 67.8 ms. Not too bad, but the fastest took 2.0 ms... but with accuracy around 79%. Accuracy vs. Time is an epic battle.

Saturday, December 4, 2010

Reading #15: An Image-Based Trainable Symbol Recognizer for Sketch-Based Interfaces

Comments:
JJ

Summary:
Image-based recognition with only a single provided template. Such is the boast of the system designed by Kara and Stahovich. This paper also outlines the unique and low-cost polar coordinate analysis which is used to achieve rotation invariance. A three step process that begins with rotational checks is used to prune possible templates for any given sketch. Each sketch is treated as a 48x48 bitmap image which preserves the input's aspect ratio. Template matching is then carried out through the use of four different techniques. Results of these four techniques are then "parallelized" and "normalized", resulting in values between 0 and 1 which are used to determine how close an input sketch is to the different templates. Through a series of tests, the authors proved that their system was able to recognize the sketches of amateurs using only one or two templates with an accuracy of over 90%.
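The polar trick, as I read it, is that expressing each point relative to the sketch's centroid turns a rotation of the drawing into a plain shift along the angle axis, which is cheap to search over. A hedged sketch (my own toy code, not Kara and Stahovich's):

```python
import math

# Hypothetical sketch of the polar-coordinate idea (not Kara & Stahovich's code):
# expressing points as (radius, angle) about the centroid turns a rotation of the
# drawing into a plain shift along the angle axis, which is cheap to search over.

def to_polar(points):
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx)) for x, y in points]

def rotate(points, theta):
    return [(x * math.cos(theta) - y * math.sin(theta),
             x * math.sin(theta) + y * math.cos(theta)) for x, y in points]

pts = [(0, 0), (2, 0), (2, 1), (0, 1)]
for (r1, a1), (r2, a2) in zip(to_polar(pts), to_polar(rotate(pts, math.pi / 4))):
    # radii are unchanged; every angle shifts by the same constant (pi/4 here)
    print(round(r1 - r2, 6), round((a2 - a1) % (2 * math.pi), 6))
```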

Discussion:
Though I brushed over it in the summary, a great new idea coming out of this paper is the polar transformation used to handle rotations. It is very constricting to require users to always draw with the exact same rotation. You could always create templates for different rotations of the same gesture... but why waste the time when you can use something as efficient as the transformation presented here?

Wednesday, November 3, 2010

Reading #28: A leap through the reading list to iCanDraw?

Comments:
PaPaPaco

Summary:
iCanDraw? is the first system ever (EVER) to provide direction and feedback for drawing faces with a computer. The goal of iCanDraw? is to actually teach people how to draw the human face using actual metrics and techniques. It starts by generating a template of the face to be drawn so that the feedback can be tailored to the current task. Feedback is provided explicitly when the user finishes a step and asks for input. The system even includes some helpful features such as erase and undo gestures, reference line markers, and straight edges to provide guidelines. The user is led through drawing a face piece by piece, with help provided at the termination of each step. They can then choose to make changes or keep what they have. The users are not, however, constrained to certain steps and have the freedom to draw as much as they want or to go back and correct earlier work.

The authors found 9 key design principles for assisting the act of drawing via sketch recognition. Also, they found out that users gained confidence after using the assistive system.



The omitted middle session of a different image with feedback turned on led to the user drawing a somewhat creepier and more correct baby.


Discussion:
iCanDraw? software is in our lab, and I have even seen people use it, but I have yet to use it myself. And after reading this paper I want to! I think that having someone or something actively (but unobtrusively) provide feedback is an excellent way to hone one's skills without getting frustrated. I am guilty of "teaching" myself something and it being horribly, horribly wrong (you do not drive with both feet). Teaching tools are awesome. The End.

Reading #14: Shape vs. Text. The Ultimate Showdown

Comments:
Sam

Summary:
Text vs. Shape returns in this paper, with an all star ink feature paving the way to high accuracy.
Entropy is the measure of the degree of randomness in a source (that is from the paper, I swear). Text strokes are normally more dense than shape strokes, thus giving them a higher entropy. And from here, the authors go beast mode.
The authors created an entropy model 'alphabet' that is used to assign a symbol to the angle a point makes with its two adjacent points in a given stroke. Printing out the entrobet (now that one I made up) provides a visual cue as to the changes that a stroke undergoes in terms of curvature. The points are measured 4 pixels apart and the assigned values are averaged over the bounding box of the stroke in order to ensure scale independence. Testing data was measured for the percentage of strokes that were classified as shape or text, and the accuracy of said classifications. Overall, the entrobet had an accuracy of 92.06%.
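A hedged toy version of the entropy feature (my reading of the description above, not SRL's implementation): quantize the turning angle at each resampled point into a small alphabet, then take the Shannon entropy of the symbol distribution. Wiggly text strokes spread their mass over more symbols than a clean line or arc does.

```python
import math
from collections import Counter

# Toy version of the entropy feature (not the authors' code). Assumes `points` has
# already been resampled so consecutive points sit a fixed distance apart.

def turning_angles(points):
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        a = math.atan2(y2 - y1, x2 - x1) - math.atan2(y1 - y0, x1 - x0)
        angles.append((a + math.pi) % (2 * math.pi) - math.pi)   # wrap to [-pi, pi)
    return angles

def entropy(points, bins=8):
    symbols = [int((a + math.pi) / (2 * math.pi) * bins) for a in turning_angles(points)]
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

line     = [(i, 0) for i in range(20)]      # straight stroke: one symbol only
squiggle = [(i, i % 2) for i in range(20)]  # zig-zag: symbols alternate
print(entropy(line), entropy(squiggle))     # the squiggle scores higher
```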

Discussion:
I did not know this was an SRL paper until they began talking about SOUSA in the data collection section (I skipped over the authors' names somehow). Good thing I didn't say anything bad about it! But seriously, entropy as a measure for shape vs. text can be deemed a go-to option based on the research presented in this paper. The issue of dashed lines in the COA data set accounted for a high level of the errors, so including a system that can pre-process out these dashed lines would lead to even greater accuracy. How would you remove shapes made up of dashes?

Reading #13: Ink Features for Diagram Recognition

Comments:
Amir

Summary:
Ink features are another name for... well ink features. Curvature, time, speed, intersection, and more are calculated and used to distinguish between different shapes and between shapes and text. In this paper, the authors look at different features to determine which ones actually aid in the shape/text division. A total of 46 ink features were tested over 1519 strokes drawn by 26 different participants. Each sample sketch included a mixture of text and shapes that the authors felt was representative of the overall use of computer-aided recognition.



In the end, 8 different features were found to really make a difference (as shown in this figure). Or do they? Upon testing, the authors found that using these ink features is beneficial, but that not all of them together provide the best results. Inter-stroke gaps, for instance, are much more helpful.

Discussion:
Making the distinction between shape and text is super easy for people, but super hard for computers. Constructing a feature set that can make this distinction with high accuracy would allow for crazy things to be done with computer-aided sketch recognition. It's frustrating when you have a domain that could benefit from the inclusion of handwriting and you find out that you suck at telling text apart from shapes. Someone needs to make this their thesis work.

Tuesday, November 2, 2010

Reading #12: Constellation Models for Sketch Recognition


Comments:
Sam!

Summary:
A constellation model is a 'pictorial structure' model used to recognize strokes of particular classes. Each model is trained with labeled data in order to provide a higher probability of a successful match with testing sketches. This model works by looking for required and optional parts of each sketch with consideration to how each part is related to others. It is based on two key assumptions: that a single instance of each mandatory part appears in a sketch, and that similar parts will be drawn with similar strokes.

The constellation model was tested on facial recognition. Parts of a face (eyes, mouth, beard) were checked for both existence and spatial relation to other parts. A probability distribution for each object was calculated by training the recognizer with labeled data. A maximum likelihood search is then run to determine what an object 'is'. A sketch is checked multiple times as new objects are labeled so as to take advantage of the relational nature of the recognizer.

Discussion:
At first, I did not understand how only one of a required object could ever get the job done. Cyclops-only facial recognizer? But the authors state that each eye is treated as a different required object, thus bypassing this limitation. If the authors carry out their idea of having primitives be constructed from multiple strokes, then this model-based approach would afford a larger degree of freedom. Regardless, I like the idea.

Reading #11: LADDER

Comments:
Danielle

Summary:
LADDER is a sketching language used to describe how diagrams in a domain are drawn, displayed, and edited. With LADDER, creating a sketch system for a new domain is simply a matter of writing a domain description. Such a description should include what the shapes look like and how they should be displayed and edited. Low-level shapes can be reused to create more complex shapes, thus simplifying the domain description. Users can also specify hard and soft constraints in order to better recognize different shapes or their subsets. LADDER shapes must have a graphical grammar, be distinguishable based only on LADDER's supported constraints, and have limited detail (thus aiding recognition and saving time).

The constraints that LADDER affords can be custom made or selected from the initial set. Examples include parallel, contains, above, and posSlope. Users can also specify editing options that override shape recognition, and can view beautified versions of drawn shapes. LADDER is the first language that allows users to specify how shapes are recognized, displayed, and edited.
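To make the "just write a domain description" point concrete, here is a hedged sketch of what such a description might look like if you wrote it out as plain Python data (my own toy encoding of an arrow; LADDER's real grammar differs):

```python
# Toy, LADDER-inspired domain description for an "arrow" shape, written as plain Python
# data. LADDER's real grammar differs; this only illustrates that a new domain is
# described declaratively instead of by programming a new recognizer.

arrow = {
    "components": {
        "shaft": "Line",
        "head1": "Line",
        "head2": "Line",
    },
    "constraints": [
        ("coincident", "shaft.p2", "head1.p1"),
        ("coincident", "shaft.p2", "head2.p1"),
        ("shorter",    "head1",    "shaft"),
        ("shorter",    "head2",    "shaft"),
        ("acuteAngle", "head1",    "shaft"),   # soft constraints could carry weights
    ],
    "display": ["beautify: straighten lines"],
    "editing": [("drag", "shaft.p1", "move the whole arrow")],
}
```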

Discussion:
I like the fact that some of the initial domains tested with LADDER are ones that we have worked with again in our class. The complexity of COA diagrams was definitely increased in the data sets we were viewing! Anyway, LADDER is pretty awesome. Every sketch assignment that I have worked on used it, and thus it is hard for me to imagine not thinking in terms of constraints and subshapes.

Monday, September 27, 2010

Reading #10: Graphical Input through Machine Recognition of Sketches

Comments:
Jorge

Summary:
This paper, written in 1976, outlines the things that a sketch-based system must be able to do in order to be of use to humans. It describes three experiments in computer processing of sketches.

HUNCH: Can a computer interpret a sketch without knowing the domain? Such was the question behind experiment one, known as HUNCH. Testing was done using an actual pen and a big piece of paper set over the surface of a drawing tablet (this was done to mimic a natural sketching environment). HUNCH essentially found corners by marking points of slower drawing speed as corners. It originally "latched" together endpoints, meaning that if two endpoints were within a certain radius of each other, they were said to meet. The system was then augmented to use speed to determine 'intended' endpoints. Experiments were also conducted to determine and interpret over-traced lines, project into the third dimension, and find paths between rooms in a hand-drawn floor plan. The conclusion was that context is important, even at the lowest levels of interpretation.
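A minimal sketch of the latching idea (my own toy version, not HUNCH's code), with the radius scaled by stroke length, which is the tweak I mention in the discussion below:

```python
import math

# Toy endpoint "latching" (my own sketch, not HUNCH's code): if two stroke endpoints
# fall within a radius of each other, snap them to their midpoint so the lines meet.
# The radius scales with stroke length, the tweak suggested in the discussion.

def length(stroke):
    return sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))

def latch(strokes, factor=0.05):
    for i, a in enumerate(strokes):
        for b in strokes[i + 1:]:
            radius = factor * min(length(a), length(b))
            for pa in (0, -1):                # first/last point of stroke a
                for pb in (0, -1):            # first/last point of stroke b
                    if math.dist(a[pa], b[pb]) <= radius:
                        mid = ((a[pa][0] + b[pb][0]) / 2, (a[pa][1] + b[pb][1]) / 2)
                        a[pa], b[pb] = mid, mid
    return strokes

s1 = [(0.0, 0.0), (10.0, 0.1)]
s2 = [(10.3, 0.0), (10.0, 10.0)]
print(latch([s1, s2]))   # the two nearby endpoints get snapped to the same point
```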

Experiment 2: Specifying the context of drawings was tested next. They found that "... the system is only as good as matching machinery". In other words, context can help, but it doesn't guarantee that the user will draw everything that the computer needs to identify the proper context.

Experiment 3: An interactive system that combines the strengths of HUNCH and Experiment 2 was tested next. For a user to interact with the system, basic recognition must be done in real time. Measuring the speed of a stroke and how bent it is can help with this.

The conclusion of the paper mentions how the goal is to allow users to modify interpretations as needed. A sketch-based system should be smart, adaptable, and easy to interact with.

Discussion:
Part of my truss recognition algorithm uses latching, and has similar issues. Adjusting the radius used to the length of the strokes in question could definitely help me out!

Really, my one issue with this paper is that it seemed to end abruptly. I liked where it was going and the concise discussions of experiments and findings, but then all of a sudden it was over. The conclusion could have been a page or more in length! I am thinking that it was cut short to meet some sort of page cap. Regardless, I liked this paper because it made me think of new ways to solve problems and create interfaces.

Reading #9: PaleoSketch

Comments:
Danielle

Summary:
PaleoSketch provides highly accurate low-level primitive recognition and beautification. It places very few constraints on the user, thus allowing them to focus on their drawings and not on having to learn the software that is implementing the recognizer. Paleo takes a stroke and runs it through a series of 8 different classifiers. Each classifier then returns whether or not the stroke could be of that primitive type along with a beautified stroke if it matches. Paleo also incorporates two new features: the normalized distance between direction extremes (NDDE) and the direction change ratio (DCR). The NDDE is basically the percentage of the stroke that lies between the two extremes of slope. The DCR is the maximum change in direction divided by the average change in direction, with the first and last 5% of the stroke ignored.
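Here is a hedged sketch of how NDDE and DCR might be computed from a resampled stroke, based only on the definitions above (my own reading, not PaleoSketch's code):

```python
import math

# Toy NDDE / DCR computation based only on the definitions above (my reading, not
# PaleoSketch's code). `points` is a resampled stroke: a list of (x, y) tuples.

def directions(points):
    return [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def ndde(points):
    """Fraction of the stroke's length lying between its direction extremes."""
    d = directions(points)
    seg = [math.dist(p, q) for p, q in zip(points, points[1:])]
    lo, hi = sorted((d.index(min(d)), d.index(max(d))))
    return sum(seg[lo:hi + 1]) / sum(seg)

def dcr(points):
    """Max direction change over mean direction change, trimming 5% at each end."""
    d = directions(points)
    trim = max(1, int(0.05 * len(d)))
    changes = [abs(b - a) for a, b in zip(d[trim:-trim], d[trim + 1:-trim])]
    return max(changes) / (sum(changes) / len(changes))

arc      = [(math.cos(t / 20), math.sin(t / 20)) for t in range(32)]        # smooth arc
polyline = [(x, 0.0) for x in range(6)] + [(5.0, y) for y in range(1, 6)]   # an "L" corner
print(ndde(arc), dcr(polyline))   # arcs score high on NDDE; sharp corners spike DCR
```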

The different possible interpretations for a stroke are stored in an interpretation hierarchy. This hierarchy is based on the minimum number of corners that would result when classifying a stroke a certain way. The resulting stroke with the fewest corners is chosen as the best interpretation. Because each interpretation is computed, however, users can easily choose an alternate interpretation if the chosen best match is not what they meant to draw.

Above you can see a blurry PaleoSketch do work on some recognition tests.

Discussion:
98.56% of the time, it works every time. But really, PaleoSketch pushes it to the limit. It seems like the only thing Paleo cannot do is force users to not be lazy. Some of the recognition errors resulted because a tester drew a circle that looked exactly like an ellipse. The other errors were from complex shapes that were combinations of primitives. Can you really fault something for doing its job too well? I, sirs and madams, cannot. PaleoSketch is awesome.

Thursday, September 9, 2010

Reading #8: Lightweight Multistroke Recognizer for UI Prototypes

Comments:
Yue Li

Summary:
When the $1 Recognizer grew up, it evolved into the $N Recognizer. $N is a lightweight, multistroke recognizer that can provide increased accuracy and decreased time via the addition of optional optimizers. Again, the focus is on providing designers with an easy to implement and use recognition system for augmenting their software and designs. Here are the changes from $1:

  1. A novel way to represent a multistroke as a set of unistrokes representing all possible stroke orders and directions
  2. The conception of a multistroke as inherently a unistroke, but where part of this unistroke is made away from the sensing surface
  3. The recognition of 1D gestures (e.g., lines)
  4. The use of bounded rotation invariance to support recognition of more symbols
  5. An evaluation of $N on a set of handwritten algebra symbols made in situ by middle and high school students working with a math tutor prototype on Tablet PCs

In order to not be super annoying, and thus allow programmers to just draw a template with a single orientation and stroke order, $N automatically calculates and stores the "unistroke permutations" of the provided multistroke templates. This permutation treats the template as a unistroke design where part of the stroke occurs off of the drawing surface (think about it being invisible). They provide a nice example of this in the paper.
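As my own toy take on the permutation step (not the published pseudocode; the real $N also joins the pieces into proper unistrokes and can filter unlikely candidates), here is a sketch that enumerates every stroke order and every per-stroke direction of a multistroke template:

```python
from itertools import permutations, product

# Toy version of $N's "unistroke permutations" (not the published pseudocode): for a
# multistroke template, enumerate every stroke order and every stroke direction, then
# flatten each combination into one point sequence for unistroke matching.

def unistroke_permutations(strokes):
    unistrokes = []
    for order in permutations(range(len(strokes))):
        for flips in product((False, True), repeat=len(strokes)):
            points = []
            for idx, flip in zip(order, flips):
                s = strokes[idx]
                points.extend(reversed(s) if flip else s)
            unistrokes.append(points)
    return unistrokes

# An "X" drawn with two strokes yields 2! * 2^2 = 8 candidate unistrokes.
x_template = [[(0, 0), (10, 10)], [(10, 0), (0, 10)]]
print(len(unistroke_permutations(x_template)))  # -> 8
```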

Through their user study, it was found that $N had a 96.6% accuracy when using 15 templates per shape. Additionally, a 96.7% accuracy was obtained using 9 templates of the original gestures tested with $1.

Discussion:
If stroke order and direction are not important, then the $N recognizer seems to be pretty awesome. In some cases, as in accommodating left and right-handed users, the ability to match the final gesture is very important because it minimizes user frustration and increases system accuracy.

Reading #7: Sketch Based Interfaces

Comments:
THE GROVE MASTER (CAPSLOCK IS STUCK)

Summary:
A user interface that feels like smart paper. Combined with the goal of direct manipulation, you have the basis for this paper by Sezgin and company. The paper's focus is on the first step of the sketch recognition process: converting pixels into geometric shapes. This process is broken up into three phases:

1. Approximation: Minimize error and avoid overfitting.
The first part of approximation is vertex detection. Taking advantage of features such as slower speeds and increased curvature at corners, the authors find outliers above a computed threshold and treat them as potential vertices. Computed features are then combined in an attempt to further drop false positives.
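A hedged sketch of the speed half of vertex detection (my own simplification, not Sezgin et al.'s implementation): compute per-point speed from the timestamps and flag points whose speed drops well below the stroke's average.

```python
import math

# Toy speed-based corner candidate finder (a simplification, not Sezgin et al.'s code).
# `stroke` is a list of (x, y, t) samples from the digitizer.

def corner_candidates(stroke, threshold=0.4):
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(stroke, stroke[1:]):
        dt = max(t1 - t0, 1e-6)
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    avg = sum(speeds) / len(speeds)
    # a point is a candidate vertex if the pen slowed to well below its average speed there
    return [i + 1 for i, v in enumerate(speeds) if v < threshold * avg]

# Fast along the first leg, one slow sample around the corner, fast along the second leg.
stroke = [(0, 0, 0.00), (5, 0, 0.05), (10, 0, 0.10), (10, 1, 0.30),
          (10, 6, 0.35), (10, 11, 0.40)]
print(corner_candidates(stroke))  # -> [3], the slow point at the corner
```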


The second part of approximation is curve handling... read that for yourself...

2. Beautification: Modify output to be more visually appealing
This step is basically a line straightener. Lines that are in groups are rotated by their midpoints to try and maintain close connections at vertices.

3. Basic Object Recognition: Produce stroke interpretations
Ovals, circles, rectangles, and squares are basic objects. Template matching is employed to detect these geometric objects.


The authors found that people liked being able to use multiple strokes to draw a single object (go figure). The shapes used in the study were pretty crazy, thus proving the system was capable of being awesome.

Discussion:
I didn't really connect with this paper... I'm not sure why. Maybe they just didn't stress the impact that their system had enough for me to identify with it. Can anyone clear that up for me?

Tuesday, September 7, 2010

Reading #4: Master Sutherland vs. the Machine

Comments:
liwenzhe

Summary:
Ivan Sutherland created a sketch-input system named Sketchpad before the mouse was even in use. Sketchpad is a light pen / giant board of buttons combination that allows users to draw items (known as symbols) and place constraints upon them. The computer stores these constraints and symbols so that their properties and the visual drawings themselves can be reproduced. Users can even add new symbols to the library to use in the future. That's basically the origins of object-oriented programming we are talking about, people! Not to mention GUIs, Computer-Aided Drafting (CAD), and computer graphics in general. The display had some crazy zoom for adding details (Sutherland mentions that the 7-inch screen is a window onto a drawing surface roughly 1/4 mile on a side), and the reusable nature of symbols, along with the fact that information attached to them was retained, allowed Sutherland to show off some impressive examples.

Sutherland does note that in the instance of electrical circuit diagrams the user felt that drawing by hand would be faster. But again thanks to the stored symbol library, once the needed objects were correctly drawn and constrained, even this medium could be expanded upon and prove useful to engineers. A positive example that stands out to me is the bridge idea that he tested. He essentially created a library and set up constraints to handle free body diagrams using Sketchpad. This allowed him to test how different loads and supports would affect members of the truss structure. You're welcome, AutoCAD!

Discussion:
"It is only worthwhile to make drawings on the computer if you get something more out of the drawing than just a drawing."
That quote should be considered by anyone who creates something based on a sketch system. Sutherland himself stresses this in his paper, and sadly it has yet to be accepted as a field standard.

I wonder if Sutherland realized what he was really doing when he designed Sketchpad... did he just create something to get his PhD thesis taken care of? Or did he really set out to revolutionize computer design and capabilities? This paper was awesome. Sutherland is awesome. Man and machine are friends.

Sunday, September 5, 2010

Reading #5: $1 Recognition

Comments:
liwenzhe

Summary:
97% accuracy with only one provided example. Such is the boast of the $1 Recognizer. This paper by Jacob Wobbrock, Andrew Wilson, and Yang Li (the man behind Protractor) explains the implementation and reasoning behind the creation of an easy and cheap recognizer designed for novice designers. The authors stress that gesture-based interaction can be very useful, but that it was previously difficult for non-experts in the field to implement a system. The authors even provide this very nice summary of contributions:

1. be resilient to variations in sampling due to movement speed or sensing;
2. support optional and configurable rotation, scale, and position invariance;
3. require no advanced mathematical techniques (e.g., matrix inversions, derivatives, integrals);
4. be easily written in few lines of code;
5. be fast enough for interactive purposes (no lag);
6. allow developers and application end-users to “teach” it new gestures with only one example;
7. return an N-best list with sensible [0..1] scores that are independent of the number of input points;
8. provide recognition rates that are competitive with more complex algorithms previously used in HCI.

So what are the limitations of this $1 Recognizer? Because it is rotation, scale, and position invariant, it cannot tell the difference between squares and rectangles, ovals and circles, vertical and horizontal lines, etc. In some instances, the differences between these sorts of things may be critical, and the authors are quick to stress this to readers. By testing $1 against Rubine and Dynamic Time Warping (not a form of space travel), the authors found that their solution was highly accurate. Not too shabby for a $1 charge.
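For the curious, here is a compressed sketch of the $1 pipeline (my own condensed version; the published pseudocode also rotates by the indicative angle and refines the match with a golden-section search): resample to a fixed number of points, scale to a unit square, translate to the origin, and score candidates by average point-to-point distance.

```python
import math

# Condensed sketch of the $1 pipeline, NOT the published pseudocode (the real version
# also rotates by the indicative angle and refines the match with a golden-section search).

N = 64  # fixed number of resampled points

def resample(points, n=N):
    path = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    step, out, acc, pts, i = path / (n - 1), [points[0]], 0.0, list(points), 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= step:
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)    # keep measuring from the interpolated point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:
        out.append(points[-1])
    return out[:n]

def normalize(points):
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    scaled = [(x / w, y / h) for x, y in points]                 # scale to a unit square
    cx = sum(p[0] for p in scaled) / len(scaled)
    cy = sum(p[1] for p in scaled) / len(scaled)
    return [(x - cx, y - cy) for x, y in scaled]                 # centroid to the origin

def score(candidate, template):
    a, b = normalize(resample(candidate)), normalize(resample(template))
    return sum(math.dist(p, q) for p, q in zip(a, b)) / N        # lower is a better match

line = [(0, 0), (100, 5)]
vee  = [(0, 100), (50, 0), (100, 100)]
print(score([(2, 1), (98, 3)], line) < score([(2, 1), (98, 3)], vee))  # -> True
```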

Discussion:
The authors themselves brought up my own biggest issue - was the $1 Recognizer actually easy for novice users to implement? As of this paper, that fact was yet to be determined. If the whole focus was to help more people integrate and use gesture recognition, then it might be important to actually have some people implement it and evaluate it themselves. Regardless, I think that by using a few simple modification tricks, the $1 Recognizer is a great example of gesture recognition for the laymen programmer.

Thursday, September 2, 2010

Reading #1: Hammond Blog (apparently it's required)

Comments:
Yue Li

Summary:
Dr. Hammond's paper entitled 'Gesture Recognition' is essentially an introduction to... well... gesture recognition. The first thing to keep in mind is that a gesture represents the path of the pen. As such, gestures must be drawn in a single stroke and in the same direction or they will not match up to other preset or example gestures. This paper discusses some of the key research done in the field of gesture recognition. It begins with a discussion of Rubine's recognition method, which is based on 13 calculated features of a gesture. Next, it outlines Long's quill system, which used a total of 22 features to classify gestures on the fly and to provide system designers with feedback about the gestures they are using. Finally, Hammond's paper talks about the $1 recognizer developed by Jacob Wobbrock. This recognizer standardizes gestures to make matching between templates and new input a faster process.

Discussion:
Dr. Hammond's paper serves as a great complement piece to the other papers by each respective author that we {have, will, are supposed to} read during the first week or so of class. The section focusing on Rubine's features was especially helpful to me. My only problem is that it wasn't really made clear that we had to blog about it! Get on that, Paul!

Reading #6: I've got a Protractor!

Comments:
Hong-Hoe Kim

Summary:
Protractor is a template-based, single-stroke gesture recognizer. It calculates the similarities between gestures via a nearest neighbors classifier. This means that when a new gesture is input by the user, Protractor compares it to the stored examples within the system to determine the best (nearest) match. Gestures are processed to be equal-length vectors, and the nearest neighbor is thus the one with the smallest angle between it and the input gesture (think text documents in the vector space model). Protractor differs from its closest companion, the $1 recognizer, in that it takes up 1/4 of the memory and boasts a faster recognition time. The smaller space requirement can be attributed to the fixed 16 points that are equally spaced along the length of the gesture. Yang Li, the creator, stresses that this increased efficiency makes Protractor ideal for mobile touch-based systems.
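A hedged sketch of the nearest-neighbor scoring (my own minimal version; the real Protractor also solves for an optimal rotation in closed form): flatten the resampled points into one vector and rank templates by cosine similarity.

```python
import math

# Minimal sketch of Protractor-style matching, not Yang Li's code (the real recognizer
# also computes a closed-form optimal rotation). Each gesture is reduced to 16 points,
# flattened into a 32-dimensional vector, and compared by cosine similarity.

def to_vector(points):
    # flatten [(x, y), ...] into [x1, y1, x2, y2, ...], centered on the origin, unit length
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    v = [c for x, y in points for c in (x - cx, y - cy)]
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]

def similarity(a, b):
    return sum(x * y for x, y in zip(to_vector(a), to_vector(b)))  # cosine of the angle

def classify(gesture, templates):
    return max(templates, key=lambda name: similarity(gesture, templates[name]))

# In reality each gesture would be 16 resampled points; 4 are used here for brevity.
templates = {"line":  [(0, 0), (1, 0), (2, 0), (3, 0)],
             "caret": [(0, 0), (1, 2), (2, 0), (3, 2)]}
print(classify([(0, 1), (1, 1), (2, 1), (3, 1)], templates))  # -> line
```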

Discussion:
I was somewhat familiar with the idea of using vector space for comparisons thanks to Dr. Caverlee and his Information Retrieval course. Regardless, the equations in the paper owned me at first. A potential issue with Protractor is the fact that only 16 points are sampled from each gesture. What if the stroke is extremely long, and therefore a lot of crazy stuff happens in between the sample points? In this case, the standardized length of the resulting vector would be based on an incorrect set of assumed points. This might limit the gestures recognized to broader, simpler patterns. But even this could be a good thing because people can learn more when faced with less complexity. Catch 22?

Reading #3: You're doing it wrong!

Comments:
Jianjie Zhang


Summary:
Long, Landay, and Rowe noticed that people were incorporating new interaction technologies into their user interfaces, but that they did so naively, causing issues with recognition and frustrating users. To alleviate these frustrations in a gesture-based system, they developed quill. quill continuously analyzes gestures and warns designers when they may be confused with other gestures by the computer or users. Long and company use the same algorithm that Rubine developed to train their system (about 15 examples per class), but focus everything on the designer end.

Because quill is giving advice back to designers, Long and company found that the user interface was very important. Their three main areas of consideration were the timing of advice, the amount of advice to display, and the content of said advice. In the end, messages are most often displayed when the designer is testing gestures. They are kept concise, and include hyperlinks to more thorough explanations and solutions. Finally, the messages are written plainly in English to avoid ambiguity and frustration.

Discussion:
Cool story, bro! But really... I liked the idea of giving designers feedback on their purported gestures. Embracing new technologies is great, but if you don't consider users and interpretation in your system design then it's better to not even use the technology! I also liked how part of the paper focused on the feedback system and its challenges. That topic is still of great relevance today, and is something that we all have considered or will consider during our own projects and applications.

Wednesday, September 1, 2010

Reading #2: GRANDMA lover

Comments:
The Grove Master!

Summary:
GRANDMA, or Gesture Recognizers Automated in a Novel Direct Manipulation Architecture, is Dean Rubine's spark that ignited the field of gesture-based sketch recognition. It is based on single strokes, as demonstrated by the GDP (Gesture-based Drawing Program) that he references throughout this paper. Adding gestures is done via training sets, with a recommended 15 examples being given. The background functionality of these gestures is set by the user, and then the GDP is ready for testing.

Strokes are composed of sequences of points, each recording an x-coordinate, a y-coordinate, and a timestamp. Using this information, Rubine calculates 13 key features of a stroke. Given this new set of features, Rubine then finds the best match among the preset gesture classes. This is done with a linear classifier. With 15 examples for each class to match against, Rubine found that he could achieve a 96% success rate.
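A hedged toy version of the classification step (my own example with two made-up features instead of Rubine's 13 and hand-picked weights instead of trained ones): each gesture class gets a bias plus a weight vector, and the class with the highest linear score wins.

```python
# Toy linear classifier in the spirit of Rubine's recognizer, with two made-up features
# instead of his 13 and hand-picked weights instead of trained ones.

def features(stroke):
    # stroke: list of (x, y, t). Stand-in features: total path length and duration.
    length = sum(((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
                 for (x0, y0, _), (x1, y1, _) in zip(stroke, stroke[1:]))
    duration = stroke[-1][2] - stroke[0][2]
    return [length, duration]

# weights[c] = (bias, [w_1 ... w_F]); in Rubine's method these come from the training examples.
weights = {
    "tap":  (1.0,  [-0.5, -0.1]),
    "line": (-2.0, [0.3, 0.05]),
}

def classify(stroke):
    f = features(stroke)
    scores = {c: b + sum(w * x for w, x in zip(ws, f)) for c, (b, ws) in weights.items()}
    return max(scores, key=scores.get)   # rejection would threshold this winning score

print(classify([(0, 0, 0.0), (1, 1, 0.1)]))                  # short and quick -> tap
print(classify([(0, 0, 0.0), (30, 0, 0.2), (60, 0, 0.5)]))   # long -> line
```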

Discussion:
Rubine is obviously a complete beast. It might seem trivial to us now that he used a classifier to best-match strokes, but at the time it must have been an insane idea. I also prefer the idea of rejection over undo. It makes more sense to me to set a certain lower threshold for recognition, and reject strokes that do not reach it. In this way, user inattention will not lead to bad examples being added to the training set. But then again, I'm not sure if accepted recognitions are used to increase the training set or not. If it was in there, I don't remember seeing it. All in all, I think this paper was a good introduction to gesture-based recognition.

Let me hit you with some knowledge


Contact: heychrisaikens@gmail.com

Standing: 1st year Masters

Why I'm here: Sketch Recognition is becoming a very key part of my studies. It's time that I learn the trade.

Experience: One semester of research in SketchRec via the SRL@TAMU.

10 years from now: I plan on running my own company. Pervasive Systems, anyone?

The next big thing in CS: My company. No not really... Probably advances in connectivity and the integration of computers into even more facets of our lives.

Favorite course of yesteryear: Information Retrieval.

Favorite movie: Sunshine. Science Fiction + The Human Condition -> Win.

Time traveling: I would find the inventor of the time machine and destroy his plans. We have enough to worry about without the time/space paradox thrown in.

Interesting fact: I love Sprite Zero. That stuff is off the chain. Also, in retrospect, this Grey is hard to read.