
Monday, December 13, 2010

Reading #30: Tahuti

Comments:
Sam-bo

Summary:
Tahuti is a dual-view sketch recognition environment. It shows users both their original strokes and the interpreted UML view of them. Users are able to sketch just as they would on paper and have the system create the UML structures that would correspond to their sketches. Tahuti uses a multi-layer framework to process, select, recognize, and identify strokes. The paper provides some nice information as to how each step is performed and what algorithms are used.

Through user studies, the authors discovered that Tahuti's interpreted view was deemed easier to draw in and easier to edit in than comparative systems.
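
Since the summary above walks through a staged pipeline, here is a minimal sketch of how such a multi-layer flow (process, select, recognize, identify) might be organized. Every name and heuristic below is my own stand-in, not Tahuti's actual code.

    # Hypothetical staged pipeline in the spirit of Tahuti's multi-layer
    # framework. None of these names or heuristics come from the paper.

    def preprocess(strokes):
        """Clean raw strokes (here: drop consecutive duplicate points)."""
        return [[p for i, p in enumerate(s) if i == 0 or p != s[i - 1]]
                for s in strokes]

    def select(strokes):
        """Group strokes that plausibly form one symbol (here: all of them)."""
        return [strokes]

    def recognize(group):
        """Map a stroke group to scored candidate interpretations."""
        # Toy heuristic: four-ish strokes suggest a class box, fewer an edge.
        return [("class_box", 0.8)] if len(group) >= 4 else [("association", 0.7)]

    def identify(candidates):
        """Commit to the best-scoring interpretation."""
        return max(candidates, key=lambda c: c[1])

    def pipeline(strokes):
        return [identify(recognize(g)) for g in select(preprocess(strokes))]

    print(pipeline([[(0, 0), (0, 0), (10, 0)], [(10, 0), (10, 10)],
                    [(10, 10), (0, 10)], [(0, 10), (0, 0)]]))
    # -> [('class_box', 0.8)]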

Discussion:
This paper was written in 2002. Since that time, a number of different systems have been designed that aid users in creating UML diagrams. I am not sure that a sketch-based approach to this task is relevant or efficient anymore, but at the time it seems like it was an excellent idea. Limiting user frustration is a must!


Full Blog Completion Status Achieved!

Reading #29: Scratch Input

Comments:
Sam

Summary:
Scratch Input is an acoustic-based gesture recognizer. It uses a modified stethoscope with a single microphone embedded into it in order to capture the propagation of sound waves through a solid, flat surface. Scratches have a high frequency, and thus frequency thresholds are employed to eliminate almost all noise from the system. Gestures are distinguished mainly by the number of amplitude peaks in their signal. People slow down when approaching corners in a drawing, and thus the lowest points in a signal correspond to "corners" in a gesture. Strokes can thus be segmented as the peaks between these corners. The single input sensor used was unable to differentiate between gestures that contain the same number of strokes, but it achieved an accuracy of 90% for the gestures tested. The hardware device designed by the authors was extremely affordable, thus allowing Scratch Input to be applied to a variety of large surfaces as needed by potential users.
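
To make the peak-counting idea concrete, here is a toy sketch: a fake amplitude envelope with three bursts, and a threshold-crossing count that recovers the three strokes. The signal and threshold are invented; the real system works on filtered, high-frequency microphone input.

    import numpy as np

    # Toy amplitude envelope of a "scratch" signal: three strokes show up
    # as three bursts of energy. Values are made up for illustration.
    t = np.linspace(0, 3, 3000)
    envelope = (np.exp(-((t - 0.5) ** 2) / 0.01) +
                np.exp(-((t - 1.5) ** 2) / 0.01) +
                np.exp(-((t - 2.5) ** 2) / 0.01))

    threshold = 0.5
    above = envelope > threshold
    # Count rising edges: each crossing from below to above is one stroke.
    strokes = np.sum(above[1:] & ~above[:-1])
    print("stroke count:", strokes)  # -> 3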

Discussion:
Good ol' Scratch Input! Our Sound Board project was based heavily on the ideas in this paper. I would love to see the authors return to this work and employ multiple sensors. That would essentially do what Drew, George, and I tried to do with our third and final projects.

Reading #27: K-Sketch: A “Kinetic” Sketch Pad for Novice Animators

Comments:
Francisco the Awesome Millionaire with the Diamond Suit

Summary:
K-Sketch is a 2D animation sketching system designed for novice users. The authors conducted interviews with both animators and non-animators in order to come up with a range of tasks that could be supported by K-Sketch. They then implemented a nice set of features that allow users to quickly modify their basic sketches so that they can carry out simple animations.

In a laboratory experiment that compared K-Sketch to a more formal animation tool (PowerPoint), participants worked three times faster, needed half the learning time, and had significantly lower cognitive load with K-Sketch.
That sums it up!

Discussion:
K-Sketch reminds me of a sketch-based Prezi tool. It includes lots of features, some of which are mapped strangely, and allows users to create some pretty cool stuff given a little bit of time and some patience. It seems like the system would really allow users to express themselves, which is the best thing that could be hoped for.

Reading #26: Picturephone: A Game for Sketch Data Capture

Comments:
Marty

Summary:
Picturephone is a sketching game used to gather labeled data in a way that is fun (entertaining) for users. Picturephone works much like the game Telephone, wherein players repeat phrases to each other in a linear fashion and see how the phrase evolves with each iteration. In Picturephone the players alternate between drawing a sketch and describing it. An additional player is then assigned the task of judging how similar the sketches are. In this way, the authors manage to get labeled data and a relevance/accuracy metric without having to do any of the work themselves.

Discussion:
Picturephone was discussed in reading #24. I liked it then and I like it now. Again, these types of sketching games seem like a good way to gather labeled data without having users repetitively draw the same shape over and over and over.

Reading #25: A descriptor for large scale image retrieval based on sketched feature lines

Comments:
Paco

Summary:
So you need to find an image online... do you describe it in words? What if you describe the wrong parts of the image because those are what you deem important? How do you quantify an image's importance? You could draw the image, but then why do you NEED it if you can draw?!

Ok, so obviously sketching to search for images would be cool. So cool, in fact, that the authors of this paper developed a system that does it. Their system is designed to query beastly databases containing millions of images. Actual images and the user's input sketch are preprocessed the same way, which allows for matching based on similar descriptors.

Database image descriptors are cached in memory, and clusters are created based on similar colors. Searches take up to 3.5 seconds.
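
The query flow can be sketched in a few lines: descriptors sit in memory, and a search is just a similarity scan against the query sketch's descriptor. The random vectors below are stand-ins; the paper's descriptors encode sketched feature lines, and its database is far larger.

    import numpy as np

    # Toy descriptor search: cached database descriptors vs. one query
    # descriptor. Real descriptors encode local edge structure; these
    # random unit vectors are placeholders.
    rng = np.random.default_rng(0)
    db = rng.random((100_000, 32), dtype=np.float32)   # cached in memory
    db /= np.linalg.norm(db, axis=1, keepdims=True)

    query = rng.random(32, dtype=np.float32)
    query /= np.linalg.norm(query)

    scores = db @ query                     # cosine similarity, one matmul
    top10 = np.argsort(scores)[-10:][::-1]  # best-matching image ids
    print(top10)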

Discussion:
Awesome! Like Paco and I were talking about, this idea could be used to teach both users and the system words in different languages. If you draw simple items such as a tree or a cat, then you could also provide the written description (word) in your native language. Once you select a result, the system could then "learn" that your word describes that image and be able to employ cross-language searches in the future.

Reading #24: Games For Sketch Data Collection

Comments:
Kim!

Summary:
The authors of this paper are interested in allowing users to freely move between sketching domains rather than being restricted to a single one. This allows for a more natural sketching session akin to the use of pen and paper. In order to gather data on sketches and user-provided descriptions, the authors implemented a multiplayer sketching game.

The two online games created for data gathering are called Picturephone and Stellasketch. With Picturephone, players alternate between describing a scene and drawing it: the next player turns the drawing into a new description, or the description into a new drawing. Players then rate the various drawings to denote similarity (the more similar the better). The Stellasketch game is similar to Pictionary. A single user is given a topic and begins to draw it. The other players privately label the sketch with their guesses at various stages in its design process. Because users enjoyed playing the games, the authors were able to gather labeled sketches in the background.

Discussion:
I like this idea. You can hide data gathering techniques in games that people enjoy playing. We should incorporate something like this into Sousa studies, because sometimes the redundancy of providing examples is tiring. Make users label their own stuff! Reduce your workload!

Reading #23: InkSeine: In Situ Search for Active Note Taking

Comments:
SAM

Summary:
Active note taking: capturing and extending creative ideas, reflecting on a topic, or sketching designs.

InkSeine is a fluid interface designed to allow users to engage in active note taking. It employs a handwriting recognizer in order to allow users to add a new depth to their notes with the incorporation of searches that can serve as extensions to their selected note or information feed. It also uses gestures, such as a lasso, to trigger actions such as searching for the encircled, handwritten phrase. Sketch recognition techniques are used to aid users in their sensemaking tasks, and they are applied intuitively. The authors took the time to conduct initial user studies with lo-fidelity prototypes in order to maximize usability and focus on potential user scenarios and tasks. Context-based searches minimize cognitive overhead and, based on the authors' formative studies, lead to happy users.
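
As a flavor of the lasso-to-search step, here is a hedged sketch of the selection half: a standard ray-casting test decides which strokes fall inside the lasso, and those strokes would then be handed to the handwriting recognizer to build the query. None of this is InkSeine's actual code.

    # Hypothetical lasso selection: pick the ink strokes whose points
    # mostly fall inside the lasso polygon.

    def point_in_polygon(pt, poly):
        """Standard ray-casting point-in-polygon test."""
        x, y = pt
        inside = False
        for i in range(len(poly)):
            x1, y1 = poly[i]
            x2, y2 = poly[(i + 1) % len(poly)]
            if (y1 > y) != (y2 > y):
                if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                    inside = not inside
        return inside

    def lasso_select(strokes, lasso):
        """Keep strokes with a majority of points inside the lasso."""
        return [s for s in strokes
                if sum(point_in_polygon(p, lasso) for p in s) > len(s) / 2]

    lasso = [(0, 0), (10, 0), (10, 10), (0, 10)]
    strokes = [[(2, 2), (3, 3)], [(20, 20), (21, 21)]]
    print(lasso_select(strokes, lasso))  # -> only the first stroke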

Discussion:
This paper seems like something you would read in Dr. Kerne's class. It is an excellent example of iterative design, user interface concerns, and affordances and mappings. Don Norman is probably using InkSeine right now trying to figure out how some poorly designed door opens. I also like the authors' use of popup windows that do not require users to navigate away from their current tasks just to view initial results. Good design and good use of sketching.

Reading #22: Plushie: An Interactive Design System for Plush Toys

Kids made these!

Comments:
George! Again!

Summary:
First things first... go read This Blog I wrote on Teddy. All done? Good.

Plushie is a pretty cool extension of a 2D to 3D modeling system that uses gestures to perform different editing options. Plushie affords (shocker!) creating original plush toys. A big contribution the authors make is providing dynamic feedback on both what the user's model looks like and what its 2D textured pieces look like. They tested the system with kids, and even the kids were able to make some awesome new plush toys with minimal difficulty. A system that kids can use and enjoy is a success.

Discussion:
I had no idea that so many people were interested in creating plush toys from 3D models. I don't think it's particularly weird or anything... but it's definitely interesting. I liked the fact that the authors allow users to change their 2D textures and receive constant updates on them as they modify their 3D model. And as I mentioned, making something fun that kids can enjoy is great. Reel those kids into computer science as early as possible!

Reading #21: Teddy: A Sketching Interface for 3D Freeform Design

Comments:
George!

Summary:
Teddy affords the user drawing 2D strokes and then automatically constructs potential 3D polygonal surfaces based on said strokes. Users interactively specify the silhouettes of objects, and the system attempts to create a 3D model which would match that silhouette (yes, I kind of just repeated myself). Keep in mind that Teddy was designed for rapid approximations. And the authors succeeded in this goal! Users were able to create basic models after as little as 10 minutes of getting used to the system. Once Teddy generates an initial 3D shape, users are able to view their model from different angles and can modify it with various gestures (as shown above). The rest of the paper focuses on the algorithms used to perform the various modeling operations.
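
The core inflation intuition can be faked in a few lines: elevate each pixel inside the silhouette by an amount tied to its distance from the outline, so wide regions puff up more than narrow ones. Teddy actually builds a mesh from the chordal axis; the distance transform below is just a simpler stand-in for the same idea.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    # Rough "inflation" of a 2D silhouette: height grows with distance
    # from the outline. A circular silhouette stands in for a user stroke.
    mask = np.zeros((64, 64), dtype=bool)
    yy, xx = np.mgrid[0:64, 0:64]
    mask[(yy - 32) ** 2 + (xx - 32) ** 2 < 25 ** 2] = True

    dist = distance_transform_edt(mask)   # distance to the silhouette edge
    height = np.sqrt(dist / dist.max())   # smooth, dome-like elevation
    print("peak height at center:", height[32, 32])  # -> 1.0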

Discussion:
What I took away from this paper is that basic gestures can be used to perform some heavy back-end stuff... and to create animations that kids would love. What is important about this work is that the gestures seem intuitive, and thus users can understand how everything works quickly. I recommend that you look at the examples of each action for yourself, as this paper is filled with great screenshots of gestures in action.

Saturday, December 11, 2010

Reading #20: MathPad2: A System for the Creation and Exploration of Mathematical Sketches

Comments:
Sam

Summary:
MathPad lets users draw mathematical expressions, symbols, and diagrams. Some diagrams can even be animated by the system! The authors suggest that allowing users to visualize their problems (which they will naturally do on pen and paper anyway) can help them in their sensemaking tasks.

As shown in the table above, basic gestures are used to inform MathPad of user intent for various items. Expressions are further converted to strings that can be evaluated by MatLab. MathPad also includes a nice set of computational functions that can aid users. And you can change your stroke color to help organize your work. It's the little things that count.
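
The recognize-then-evaluate handoff is easy to picture with a toy example. MathPad sends its strings to MatLab; Python's eval stands in below, and the token list is a made-up recognizer output.

    # Hypothetical output from the handwriting recognizer, turned into a
    # string a math engine can evaluate. MathPad uses MatLab for this;
    # Python's eval is a stand-in to show the idea.
    tokens = ["3", "*", "x", "^", "2", "+", "1"]
    expr = "".join(tokens).replace("^", "**")   # MatLab-ish to Python syntax
    x = 2
    print(expr, "=", eval(expr))                # -> 3*x**2+1 = 13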

Discussion:
I did not go into much detail about MathPad, but do not let that in any way discredit it. The system sounds great! It also reminded me of Mechanix, because the authors gave some consideration to usability and the aid they could provide to users. I think Mechanix takes it a step further in terms of features and freedom of drawing (recognition is off the chain), but MathPad sounds like a very smartly designed system all around. Good show.

Reading #19: Diagram Structure Recognition by Bayesian Conditional Random Fields

Comments:
Jonathan

Summary:
The recognition method discussed in this paper is based on Bayesian conditional random fields (BCRFs). BCRFs consider both spatial and temporal information, and can correlate features. CRFs are prone to overfitting, meaning that they perform awesomely on training data and horribly on new data. You could simulate this failnomenon by using the same training files over and over and over when building your feature set.
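
That overfitting failure mode is easy to reproduce with any expressive model; nothing BCRF-specific is required. With random labels there is nothing real to learn, yet an unconstrained decision tree aces its own training data and scores at chance on fresh data:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Overfitting in miniature (illustrates the general failure mode the
    # authors guard against; this is a decision tree, not a BCRF).
    rng = np.random.default_rng(0)
    X_train, y_train = rng.random((200, 10)), rng.integers(0, 2, 200)
    X_test, y_test = rng.random((200, 10)), rng.integers(0, 2, 200)

    model = DecisionTreeClassifier().fit(X_train, y_train)
    print("train accuracy:", model.score(X_train, y_train))  # ~1.0
    print("test accuracy:", model.score(X_test, y_test))     # ~0.5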

The authors are interested in discriminating between the containers and connectors in organization charts (see the figure at the top). They had 17 participants draw the chart shown, and ran 5 different algorithms to test the classification. The BCRFs proved to have the best recognition rates.

Discussion:
This paper was out of my league. Do you ever read something that makes you feel like you don't actually know anything about a given field? That was this paper. A little bit of side Googling returned some helpful links on BCRFs, etc., but I still felt lost. The results section basically showed me that everything they did was awesome and that it worked.

Reading #18: Spatial Recognition and Grouping of Text and Graphics

Comments:
Sam

Summary:
This paper discusses a spatial approach to the grouping and recognition of sketches. The process, as shown in the stolen image below, can be done in real time. Strokes near each other are shown with the labeled graph in (b). Shapes are computed and matched to templates in (d). Templates return potential scores (0 to 1) which are used to determine the best overall classification for the user's strokes.

I know what you're thinking... isn't speed an issue here?! The neighborhood graph in (b) helps to eliminate possible classifications based on vertex count and proximity. The authors also discard candidate groups of more than K strokes, where K is the number of strokes in the current largest template. Oh, and everything is based on machine learning (including the A* search). A user need only provide examples.
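
A hedged sketch of the neighborhood-graph idea: only connect strokes that are close together, so that only nearby strokes are ever considered as parts of one symbol. The distance metric and threshold here are my own inventions.

    import itertools

    # Hypothetical proximity graph over strokes: an edge means two strokes
    # are close enough to be grouped into one candidate symbol.

    def center(stroke):
        xs, ys = zip(*stroke)
        return (sum(xs) / len(xs), sum(ys) / len(ys))

    def neighborhood_graph(strokes, max_dist=20.0):
        edges = []
        for i, j in itertools.combinations(range(len(strokes)), 2):
            (x1, y1), (x2, y2) = center(strokes[i]), center(strokes[j])
            if ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 <= max_dist:
                edges.append((i, j))
        return edges

    strokes = [[(0, 0), (5, 5)], [(8, 8), (12, 12)], [(100, 100), (105, 105)]]
    print(neighborhood_graph(strokes))  # -> [(0, 1)]: the far stroke is pruned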

Discussion:
AdaBoost sounds like a deliciously nerdy energy drink. As the authors discuss in their... Discussion... an off the shelf system that is both efficient and accurate would be boss. If this work could be furthered to achieve similar results with fewer templates, then Rubine himself might rejoice and raise an AdaBoost toast to designer-accessible sketch recognition plug-ins.

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink

Comments:
Kim

Summary:
What is the probability that a system can distinguish your text from your graphics when you draw with a stylus? Such is the question behind the work in this paper. The system described is broken into three main approaches.

  1. Independent Strokes: Sequences of points between pen-down and pen-up events are taken to be strokes. 11 features are computed for each stroke. A multilayer perceptron is trained to classify each feature vector as either text or graphics.
  2. Hidden Markov Model (HMM): The order of strokes can lend a clue as to how they should be classified (unless the user jumps between a letter and a shape because they are weird). By looking at overall classification patterns, the HMM can be used to predict the current stroke given the last stroke (see the sketch after this list).
  3. Bi-partite HMM: The gaps between strokes can lend additional information. A user will leave a different sized gap between two text strokes, two graphics strokes, or a mixture of the two.
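
Here is the minimal sketch promised in item 2: strokes tend to stay in the same class, so the previous label shifts the odds for the current one. The transition numbers are made up, and a real HMM would run Viterbi over the whole sequence rather than this greedy step.

    # Toy transition model: strokes usually stay in the same class.
    transition = {
        "text":     {"text": 0.9, "graphics": 0.1},
        "graphics": {"text": 0.2, "graphics": 0.8},
    }

    def predict(prev_label, likelihood):
        """Combine per-stroke classifier scores with transition priors."""
        scores = {c: likelihood[c] * transition[prev_label][c]
                  for c in ("text", "graphics")}
        return max(scores, key=scores.get)

    # An ambiguous stroke (classifier is torn 50/50) gets pulled toward
    # the class of the stroke before it:
    print(predict("text", {"text": 0.5, "graphics": 0.5}))      # -> text
    print(predict("graphics", {"text": 0.5, "graphics": 0.5}))  # -> graphics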

Discussion:
I think I read this paper before... anyway! I did not like the way that they presented their results. Call me old fashioned, but I think you should always put your accuracies in plain old X (where X is your written language used). And was that plot drawn in Paint? I felt like I was interpreting their findings rather than reading about them! Besides that, I thought the paper was very interesting.

Reading #16: An Efficient Graph-Based Symbol Recognizer

Comments:
George

Summary:
This paper discusses (you guessed it) an efficient graph-based symbol recognizer. Using an Attributed Relational Graph (ARG), the authors can describe symbols in terms of their geometry and topology. A symbol's geometry includes its primitive shapes and structures, which are treated as nodes. As the authors state, recognizing sketches involves matching graphs. And FYI, this can be hard as hell. So your user forgot to draw a primitive shape like a circle? There goes a node and a couple of edges. Although representing a symbol in terms of its topology allows rotations and scaling to be matched, it cannot help the issue of missing components.


ARG for a perfect square.

Given training examples of each symbol class, the system constructs an "average ARG" for that class. To improve the average, the authors maintain stroke order and orientation across all examples. In testing, the system was shown to return the correct symbol in its top-3 list with over 93% accuracy.
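
To make the "ARG for a perfect square" caption concrete, here is a hedged sketch of what such a graph could hold: nodes for the four line primitives, and edges carrying relational attributes like relative angle and connectivity. The paper's exact attribute set differs; this is illustrative.

    # Hypothetical ARG for a perfect square: nodes are primitive lines,
    # edges describe the relationships between pairs of primitives.
    nodes = {
        0: {"type": "line", "angle": 0},    # top
        1: {"type": "line", "angle": 90},   # right
        2: {"type": "line", "angle": 0},    # bottom
        3: {"type": "line", "angle": 90},   # left
    }
    edges = {
        (0, 1): {"rel_angle": 90, "connected": True},
        (1, 2): {"rel_angle": 90, "connected": True},
        (2, 3): {"rel_angle": 90, "connected": True},
        (3, 0): {"rel_angle": 90, "connected": True},
        (0, 2): {"rel_angle": 0,  "connected": False},  # parallel sides
        (1, 3): {"rel_angle": 0,  "connected": False},
    }
    # Matching an input ARG against a class's "average ARG" then reduces
    # to scoring how well node and edge attributes line up.
    print(len(nodes), "nodes,", len(edges), "edges")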

Discussion:
I like how complex the symbols are that the authors tested in this paper. Some of them remind me of good old Civil Sketch. The trade-off between accuracy and speed of calculation is important, although even the longest recognition time was only 67.8 ms. Not too bad, but the fastest took 2.0 ms... albeit with accuracy around 79%. Accuracy vs. Time is an epic battle.

Saturday, December 4, 2010

Reading #15: An Image-Based Trainable Symbol Recognizer for Sketch-Based Interfaces

Comments:
JJ

Summary:
Image-based recognition with only a single provided template. Such is the boast of the system designed by Kara and Stahovich. This paper also outlines the unique and low-cost polar coordinate analysis which is used to achieve rotation invariance. A three-step process that begins with rotational checks is used to prune possible templates for any given sketch. Each sketch is treated as a 48x48 bitmap image that preserves the input's aspect ratio. Template matching is then carried out through the use of four different techniques. Results of these four techniques are then "parallelized" and "normalized", resulting in values between 0 and 1 which are used to determine how close an input sketch is to the different templates. Through a series of tests, the authors showed that their system was able to recognize the sketches of amateurs using only one or two templates with an accuracy of over 90%.
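
Here is a rough sketch of the bitmap-matching half: rasterize a stroke into a 48x48 grid with a single scale factor (which preserves aspect ratio), then score overlap against a template with a Tanimoto-style coefficient. The paper combines four different measures; this shows just one, and the rasterization details are my own.

    import numpy as np

    # Hypothetical rasterize-and-compare step. One shared scale factor
    # for x and y keeps the input's aspect ratio, as the paper requires.

    def rasterize(points, size=48, samples=200):
        pts = np.asarray(points, dtype=float)
        t = np.linspace(0, len(pts) - 1, samples)   # densify the polyline
        xs = np.interp(t, np.arange(len(pts)), pts[:, 0])
        ys = np.interp(t, np.arange(len(pts)), pts[:, 1])
        xs -= xs.min(); ys -= ys.min()
        scale = (size - 1) / max(xs.max(), ys.max(), 1e-9)
        grid = np.zeros((size, size), dtype=bool)
        grid[(ys * scale).astype(int), (xs * scale).astype(int)] = True
        return grid

    def tanimoto(a, b):
        union = np.logical_or(a, b).sum()
        return np.logical_and(a, b).sum() / union if union else 0.0

    template = rasterize([(0, 0.0), (10, 0.0), (20, 0.0)])  # straight line
    sketch = rasterize([(0, 0.3), (10, 0.0), (20, 0.2)])    # wobbly redraw
    print("similarity:", tanimoto(template, sketch))        # -> 1.0 here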

Discussion:
Though I brushed over it in the summary, a great new idea coming out of this paper is the polar transformation used to handle rotations. It is very restrictive to require users to always draw with the exact same rotation. You could always create templates for different rotations of the same gesture... but why waste the time when you can use something as efficient as the transformation presented here?
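
The appeal of the polar idea can be shown in miniature: about the centroid, rotating a shape leaves every radius untouched and just adds a constant to every angle, so rotation becomes a cheap 1D shift instead of a pile of rotated templates. This is only the core intuition, not the paper's exact procedure.

    import numpy as np

    # Rotation in polar coordinates about the centroid: radii are
    # unchanged and every theta shifts by the same amount.

    def to_polar(points):
        pts = np.asarray(points, dtype=float)
        pts -= pts.mean(axis=0)                  # center on the centroid
        r = np.hypot(pts[:, 0], pts[:, 1])
        theta = np.arctan2(pts[:, 1], pts[:, 0])
        return r, theta

    square = np.array([(0, 0), (2, 0), (2, 2), (0, 2)], dtype=float)
    angle = np.deg2rad(30)                       # rotate the square 30 degrees
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    rotated = square @ rot.T

    r1, t1 = to_polar(square)
    r2, t2 = to_polar(rotated)
    print("radii unchanged:", np.allclose(r1, r2))                    # -> True
    print("theta shift (deg):", np.rad2deg((t2 - t1) % (2 * np.pi)))  # all ~30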