
Wednesday, November 3, 2010

Reading #28: A leap through the reading list to iCanDraw?

Comments:
PaPaPaco

Summary:
iCanDraw? is the first system ever (EVER) to provide direction and feedback for drawing faces with a computer. The goal of iCanDraw? is to actually teach people how to draw the human face using real metrics and techniques. It starts by generating a template of the face to be drawn so that the feedback can be tailored to the current task. Feedback is provided explicitly when the user finishes a step and asks for input. The system even includes helpful features such as erase and undo gestures, reference line markers, and straight edges to provide guidelines. The user is led through drawing the face piece by piece, with help available at the end of each step; they can then choose to make changes or keep what they have. Users are not, however, constrained to the current step: they have the freedom to draw as much as they want or to go back and correct earlier work.
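Just to wrap my head around the flow, here is a minimal Python sketch of that check-at-the-end-of-a-step loop. The template regions, tolerances, and function names are all made up by me, not pulled from the authors' code:

```python
import math

# Hypothetical face template: each drawing step maps to a target region
# (center x, center y, allowed radius) derived from the reference face.
# None of these names or numbers come from the iCanDraw? paper.
TEMPLATE = {
    "head outline": (200, 250, 40),
    "eye line":     (200, 220, 15),
    "left eye":     (170, 220, 10),
}

def stroke_center(points):
    """Centroid of a stroke given as (x, y) tuples."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def feedback(step, stroke_points):
    """Compare the finished step's stroke against its template region.
    Feedback only fires when the user finishes a step and asks for it."""
    cx, cy, radius = TEMPLATE[step]
    sx, sy = stroke_center(stroke_points)
    error = math.hypot(sx - cx, sy - cy)
    if error <= radius:
        return f"{step}: looks good -- keep it or keep refining."
    return f"{step}: off by about {error:.0f}px -- erase gesture and redraw?"

print(feedback("left eye", [(150, 210), (155, 215), (160, 212)]))
```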

The authors found 9 key design principles for assisting the act of drawing via sketch recognition. They also found that users gained confidence after using the assistive system.



The omitted middle session, which used a different reference image with feedback turned on, led to the user drawing a somewhat creepier but more accurate baby.


Discussion:
iCanDraw? software is in our lab, and I have even seen people use it, but I have yet to use it myself. After reading this paper, I want to! I think that having someone or something actively (but unobtrusively) provide feedback is an excellent way to hone one's skills without getting frustrated. I am guilty of having "taught" myself something that was horribly, horribly wrong (you do not drive with both feet). Teaching tools are awesome. The End.

Reading #14: Shape vs. Text. The Ultimate Showdown

Comments:
Sam

Summary:
Text vs. Shape returns in this paper, with an all-star ink feature paving the way to high accuracy.
Entropy is the measure of the degree of randomness in a source (that is from the paper, I swear). Text strokes are normally denser than shape strokes, thus giving them a higher entropy. And from here, the authors go beast mode.
The authors created an entropy model 'alphabet' that is used to assign a symbol to the angle a point makes with its two adjacent points in a given stroke. Printing out the entrobet (now that one I made up) provides a visual cue as to the changes a stroke undergoes in terms of curvature. The points are measured 4 pixels apart, and the assigned values are averaged over the bounding box of the stroke to ensure scale independence. The testing data was measured for the percentage of strokes classified as shape or text, and for the accuracy of those classifications. Overall, the entrobet achieved an accuracy of 92.06%.
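To make the entrobet concrete, here is a minimal Python sketch of the idea. The 4-pixel resampling comes from the paper; the six-letter alphabet, the angle binning, and everything else are my own guesses, not the authors' implementation:

```python
import math
from collections import Counter

def resample(points, gap=4.0):
    """Keep points roughly `gap` pixels apart (the paper measures the
    angles on points 4 pixels apart)."""
    out = [points[0]]
    for p in points[1:]:
        if math.dist(out[-1], p) >= gap:
            out.append(p)
    return out

def entropy_string(points, alphabet="ABCDEF"):
    """Assign each interior point a letter based on the angle it makes
    with its two neighbors, so straight runs and tight turns get
    different symbols. The binning here is my own guess."""
    letters = []
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        a = math.atan2(prev[1] - cur[1], prev[0] - cur[0])
        b = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
        angle = abs(a - b) % (2 * math.pi)
        angle = min(angle, 2 * math.pi - angle)   # fold into [0, pi]
        i = min(int(angle / math.pi * len(alphabet)), len(alphabet) - 1)
        letters.append(alphabet[i])
    return "".join(letters)

def shannon_entropy(s):
    """H = -sum(p * log2(p)) over the symbol frequencies."""
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

# A wiggly (text-like) stroke should score higher than a smooth shape stroke.
wiggle = [(x, 10 * math.sin(x / 3)) for x in range(0, 120, 2)]
print(shannon_entropy(entropy_string(resample(wiggle))))
```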

Discussion:
I did not know this was an SRL paper until they began talking about SOUSA in the data collection section (I somehow skipped over the authors' names). Good thing I didn't say anything bad about it! But seriously, based on the research presented in this paper, entropy can be deemed a go-to option for shape vs. text classification. Dashed lines in the COA data set accounted for a high share of the errors, so adding a pre-processing step that strips out these dashed lines would lead to even greater accuracy. How would you remove shapes made up of dashes, though?

Reading #13: Ink Features for Diagram Recognition

Comments:
Amir

Summary:
Ink features are another name for... well, ink features. Curvature, time, speed, intersections, and more are calculated and used to distinguish between different shapes and between shapes and text. In this paper, the authors look at different features to determine which ones actually aid in the shape/text division. A total of 46 ink features were tested over 1519 strokes drawn by 26 different participants. Each sample sketch included a mixture of text and shapes that the authors felt was representative of the overall use of computer-aided recognition.
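For flavor, here is a rough Python sketch computing a handful of ink-feature-style measurements from a timestamped stroke. The names and formulas are my own stand-ins, not the paper's exact 46:

```python
import math

def ink_features(stroke):
    """A few ink-feature-style measurements for a stroke of (x, y, t)
    samples. Names and formulas are illustrative only."""
    pts = [(x, y) for x, y, _ in stroke]
    times = [t for _, _, t in stroke]
    length = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    duration = times[-1] - times[0]
    xs, ys = zip(*pts)
    # Total absolute curvature: sum of turning angles along the stroke.
    curvature = 0.0
    for a, b, c in zip(pts, pts[1:], pts[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        turn = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi
        curvature += abs(turn)
    return {
        "length": length,
        "duration": duration,
        "avg_speed": length / duration if duration else 0.0,
        "bbox_area": (max(xs) - min(xs)) * (max(ys) - min(ys)),
        "total_curvature": curvature,
    }

# Text strokes tend to be short, dense, and curvy; shape strokes long and smooth.
print(ink_features([(0, 0, 0.00), (5, 8, 0.02), (9, 3, 0.04), (14, 9, 0.06)]))
```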



In the end, 8 different features were found to really make a difference. Or do they? Upon testing, the authors found that using these ink features is beneficial, but that not all of them together provide the best results. Some features, such as inter-stroke gaps, are much more helpful than others.

Discussion:
Making the distinction between shape and text is super easy for people, but super hard for computers. Constructing a feature set that can make this distinction with high accuracy would allow for crazy things to be done with computer-aided sketch recognition. It's frustrating when you have a domain that could benefit from the inclusion of handwriting, only to find out that your system sucks at telling text apart from shapes. Someone needs to make this their thesis work.

Tuesday, November 2, 2010

Reading #12: Constellation Models for Sketch Recognition


Comments:
Sam!

Summary:
A constellation model is a 'pictorial structure' model used to recognize strokes belonging to particular classes. Each model is trained with labeled data in order to provide a higher probability of a successful match with testing sketches. The model works by looking for required and optional parts of each sketch while considering how each part relates to the others. It is based on two key assumptions: that a sketch contains a single instance of each mandatory part, and that similar parts will be drawn with similar strokes.

The constellation model was tested on facial recognition. Parts of a face (eyes, mouth, beard) were checked both for existence and for spatial relation to other parts. A probability distribution for each object was calculated by training the recognizer with labeled data. A maximum likelihood search is then run to determine what an object 'is'. A sketch is checked multiple times as new objects are labeled, so as to take advantage of the relational nature of the recognizer.
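Here is a toy Python sketch of that maximum likelihood search, with made-up unary scores and a hand-rolled spatial term standing in for the paper's learned distributions. Note that each eye is its own required part, which matters for the cyclops issue below:

```python
import itertools
import math

# Made-up log-probabilities that a stroke "looks like" a part.
UNARY = {
    ("s1", "left eye"): -0.2, ("s2", "right eye"): -0.3,
    ("s3", "mouth"): -0.4, ("s1", "mouth"): -3.0,
    ("s2", "mouth"): -2.5, ("s3", "left eye"): -2.8,
}

def pair_score(labeling, positions):
    """Spatial layout term: eyes roughly level, mouth below them.
    A hand-rolled stand-in for the paper's learned pairwise terms."""
    left = positions[labeling["left eye"]]
    right = positions[labeling["right eye"]]
    mouth = positions[labeling["mouth"]]
    score = -abs(left[1] - right[1]) / 10.0       # eyes on the same line
    score += 0.0 if mouth[1] > left[1] else -5.0  # mouth below the eyes
    return score

def max_likelihood(strokes, parts, positions):
    """Brute-force the assignment of strokes to required parts and keep
    the labeling with the best total unary + pairwise log-score."""
    best, best_score = None, -math.inf
    for perm in itertools.permutations(strokes, len(parts)):
        labeling = dict(zip(parts, perm))
        s = sum(UNARY.get((stroke, part), -9.0)
                for part, stroke in labeling.items())
        s += pair_score(labeling, positions)
        if s > best_score:
            best, best_score = labeling, s
    return best, best_score

positions = {"s1": (40, 50), "s2": (80, 52), "s3": (60, 90)}
print(max_likelihood(["s1", "s2", "s3"],
                     ["left eye", "right eye", "mouth"], positions))
```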

Discussion:
At first, I did not understand how a single instance of each required object could ever get the job done. Cyclops-only facial recognizer? But the authors state that each eye is treated as a separate required object, thus bypassing this limitation. If the authors carry out their idea of having primitives constructed from multiple strokes, this model-based approach would afford a larger degree of freedom. Regardless, I like the idea.

Reading #11: LADDER

Comments:
Danielle

Summary:
LADDER is a sketching language used to describe how diagrams in a domain are drawn, displayed, and edited. With LADDER, creating a sketch system for a new domain is simply a matter of writing a domain description. Such a description should include what the shapes look like and how they should be displayed and edited. Low-level shapes can be reused to create more complex shapes, thus simplifying the domain description. Users can also specify hard and soft constraints in order to better recognize different shapes or their subsets. LADDER shapes must have a graphical grammar, be distinguishable based only on LADDER's supported constraints, and have limited detail (thus aiding recognition and saving time).

The constraints that LADDER affords can be custom-made or selected from the predefined set. Examples include parallel, contains, above, and posSlope. Users can also specify editing options that override shape recognition, and can view beautified versions of drawn shapes. LADDER is the first language that allows users to specify how shapes in a domain are recognized, displayed, and edited.
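Real LADDER descriptions are written in LADDER's own language, so take this as a toy Python rendering of the kind of information a domain description carries. The Arrow shape and most constraint names here are my own inventions (parallel, contains, above, and posSlope are the real ones mentioned above):

```python
# A toy, Python-flavored stand-in for a LADDER-style domain description.
# Real LADDER has its own syntax; the component and constraint names here
# are illustrative, not copied from the paper.
arrow = {
    "name": "Arrow",
    "components": {          # low-level shapes reused to build this one
        "shaft": "Line",
        "head1": "Line",
        "head2": "Line",
    },
    "constraints": [         # hard constraints the recognizer must satisfy
        ("coincident", "shaft.p2", "head1.p1"),
        ("coincident", "shaft.p2", "head2.p1"),
        ("acuteAngle", "head1", "shaft"),
        ("acuteAngle", "head2", "shaft"),
    ],
    "display": "beautified",             # draw cleaned-up ideal strokes
    "editing": [("drag", "shaft", "move the whole arrow")],
}

print(f"{arrow['name']}: {len(arrow['components'])} components, "
      f"{len(arrow['constraints'])} hard constraints")
```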

Discussion:
I like the fact that some of the initial domains tested with LADDER are ones we have worked with in our class as well. The complexity of the COA diagrams has definitely increased in the data sets we have been viewing! Anyway, LADDER is pretty awesome. Every sketch assignment that I have worked on has used it, so it is hard for me to imagine not thinking in terms of constraints and subshapes.