Tuesday, September 25, 2007

Recognizing and Beautifying Low-Level Sketch Shapes with Two New Features and Ranking Algorithms

by

Brandon Paulson and Tracy Hammond

Summary

The authors discuss a new low-level recognizer that recognizes eight primitive shapes and achieves a recognition rate of 98.6%. The recognizer is capable of returning multiple interpretations, allowing for quick selection of alternate interpretations. The system can recognize: line, polyline, circle, ellipse, arc, curve, spiral and helix. The authors give a brief overview of the previous work; this section includes many, if not all, of the papers we have read previously.

The stroke goes through a pre-recognition process in which graphs and values are computed for the stroke's direction, speed, curvature and corners. Corners are calculated using a multi-pass approach. Two new features are calculated: NDDE (Normalized Distance between Direction Extremes) and DCR (Direction Change Ratio). Curved shapes have a high NDDE while polylines have a lower one; polylines have higher DCR values than curved strokes. Two additional tests determine whether the stroke is overtraced and whether it is a closed figure. For each of the eight shapes there is a set of tests in which various values are compared against thresholds. A hierarchy is used to determine whether a stroke is interpreted as complex or as a polyline. Seven of the low-level descriptors are given scores; the scores are summed, and the interpretation with the lowest score is selected.
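
A minimal sketch of how the two new features might be computed from a raw point list (my own formulation: chord angles via atan2 on unsmoothed points, whereas the paper computes these on its pre-processed direction graph):

```python
import math

def direction_series(points):
    """Direction (angle) of each chord between consecutive stroke points."""
    return [math.atan2(b[1] - a[1], b[0] - a[0])
            for a, b in zip(points, points[1:])]

def stroke_length(points):
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def ndde(points):
    """Normalized Distance between Direction Extremes: arc length between
    the points of maximum and minimum direction, over total stroke length."""
    dirs = direction_series(points)
    i_max = max(range(len(dirs)), key=lambda i: dirs[i])
    i_min = min(range(len(dirs)), key=lambda i: dirs[i])
    lo, hi = sorted((i_min, i_max))
    return stroke_length(points[lo:hi + 2]) / stroke_length(points)

def dcr(points):
    """Direction Change Ratio: maximum direction change over mean change."""
    dirs = direction_series(points)
    deltas = [abs(d2 - d1) for d1, d2 in zip(dirs, dirs[1:])]
    return max(deltas) / (sum(deltas) / len(deltas))
```

On a smooth arc the direction extremes sit near the endpoints, so NDDE approaches 1 and DCR stays near 1; a polyline's sharp corner spikes the DCR while its direction extremes can sit close together.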

The recognizer was compared against the authors' implementation of Sezgin, their full recognizer, their recognizer without the two new features, and their recognizer without rankings. The recognizer is capable of running in real time. The correct shape interpretation was returned 99.89% of the time (with the returned list averaging 2.69 interpretations), and 98.56% of the time the correct interpretation was ranked at the top. The most frequent errors were circles misclassified as ellipses or complex shapes, and polylines misclassified as complex shapes.

Discussion

I noticed that in the complex test, sub-strokes are combined into larger strokes, and those strokes are tested against the primitives. Just by looking at my Sezgin results I can see when this would be helpful. The future work section was good; it gave me ideas to cherry-pick for possible class projects.

Citation

Monday, September 24, 2007

Streaming Algorithms for Line Simplification

by

Mohammad Ali Abam, Mark de Berg, Peter Hachenberger and Alireza Zarei

Summary

Given a streaming input of data points, it may be impossible to store the complete set. A compromise is to create a simplification, or approximation, of those points. The simplification problem is: given a path P with points p0, p1, ..., pn, find a path Q with fewer vertices. If the vertices of Q must be a subset of P's, it is the restricted version of the problem; in the unrestricted version this does not have to be so. There are two types of optimization problems, min-k and min-delta. In min-k, given a maximum error, find a simplification with as few vertices as possible that stays below the error threshold. In min-delta, given a maximum of k vertices, find the simplification with the smallest possible error that uses at most k vertices. The streaming setting demands iterative algorithms, so many existing line simplification algorithms cannot be used.

The error between the simplified version and the original is calculated using Hausdorff and Fréchet distances. The goal is a good competitive ratio: the simplification algorithm's error compared against that of the optimal offline algorithm.

The algorithm uses a priority queue to hold the vertex links. We compute the error of the incoming link and insert it into the queue at the appropriate priority. Then we extract the vertex with the minimum priority from the queue and update the priorities of the vertices on both sides. The paper goes on to define c-monotonicity: the error of a link cannot be worse than c times the error of any link that encloses it.
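
The loop above can be sketched naively as follows (my own simplification: it recomputes link errors each step instead of maintaining a real priority queue with the paper's error oracle, and scores a link only by the distance from the skipped vertex to the shortcut segment, so it illustrates the structure, not the paper's competitive guarantees):

```python
import math

def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def link_error(path, i):
    """Error of the link that skips interior vertex i: distance from the
    skipped vertex to the shortcut segment joining its neighbours."""
    return point_segment_dist(path[i], path[i - 1], path[i + 1])

def simplify_stream(points, k):
    """Maintain a simplification of at most k vertices as points stream in.
    Whenever the path grows past k vertices, drop the interior vertex whose
    removal introduces the least error."""
    path = []
    for p in points:
        path.append(p)
        if len(path) > k:
            # endpoints are always kept; remove the cheapest interior vertex
            i = min(range(1, len(path) - 1), key=lambda j: link_error(path, j))
            del path[i]
    return path
```

Collinear interior points cost nothing to drop, while corners survive until the budget forces them out.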

The paper goes on to discuss the Hausdorff error function and how the oracle makes use of it. The same treatment is given to Fréchet, which uses two parameters: the width of the points and the length of the largest back path. Each distance function succeeds on different types of point distributions.

Discussion

I had no trouble understanding the basic algorithm structure. I was a bit confused by the use of the oracles and the two distance functions, but with a closer look and more examples I feel it should become clear. I wonder how much streaming simplification/beautification we will actually do. I cannot see us wanting to beautify the stroke as the user goes. But if we can do such a simplification in the background, it would give a head start when it comes time to process the stroke, and reduce the delay significantly. Can this be applied to arcs at all (I know they posed it for a polygonal path)? Could the basic algorithm (though not the distance functions) work?

Citation

Abam, M. A., de Berg, M., Hachenberger, P., and Zarei, A. 2007. Streaming algorithms for line simplification. In Proceedings of the Twenty-Third Annual Symposium on Computational Geometry (Gyeongju, South Korea, June 06 - 08, 2007). SCG '07. ACM Press, New York, NY, 175-183. DOI= http://doi.acm.org/10.1145/1247069.1247103

Thursday, September 20, 2007

A curvature estimation for pen input segmentation in sketch-based modeling

by

Dae Hyun Kim and Myoung-Jun Kim

Summary

The paper presents a new curvature estimation method that uses local convexity and monotonicity. The paper notes that proper stroke segmentation requires good curvature estimation. They note two classes of approaches in previous work: scale-space decomposition (Sezgin) and estimating curvature directly from the input data. Their contribution is a new means of determining the region of support. They discuss the sliding-window method. They provide two algorithms: one estimates curvature using local shape information, the other using local convexity. They were able to achieve a success rate of 95%. They mention comparing the four algorithms, but never give success rates.
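
The fixed-size sliding-window baseline that the paper improves on might look like this (my own sketch: turning angle over a window of +/- `window` neighbours divided by the window's arc length; Kim and Kim's contribution is adapting this region of support using convexity and monotonicity, which is not shown here):

```python
import math

def curvature_estimates(points, window=3):
    """Curvature at each point: turning angle across a sliding window of
    +/- `window` neighbours, divided by the window's arc length.
    Endpoints are assigned zero curvature."""
    n = len(points)
    curv = [0.0] * n
    for i in range(1, n - 1):
        lo, hi = max(0, i - window), min(n - 1, i + window)
        a, b, c = points[lo], points[i], points[hi]
        ang_in = math.atan2(b[1] - a[1], b[0] - a[0])
        ang_out = math.atan2(c[1] - b[1], c[0] - b[0])
        turn = abs(math.atan2(math.sin(ang_out - ang_in),
                              math.cos(ang_out - ang_in)))  # wrapped to [0, pi]
        arc = sum(math.dist(p, q)
                  for p, q in zip(points[lo:hi], points[lo + 1:hi + 1]))
        curv[i] = turn / arc if arc else 0.0
    return curv
```

On a right-angle polyline the estimate peaks at the corner, which is exactly the signal a segmenter keys on; the window size trades noise suppression against smearing nearby corners together, which is why choosing the region of support adaptively matters.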

Discussion

I had difficulty reading this paper; the organization left a lot to be desired. I have no idea why they discussed an evaluation test but didn't feel the need to share the results. This is one of those concepts I feel I will understand better once I start to implement it. The change in curvature (convexity) makes sense as a means of recognizing points of curvature.

Citation

Kim, D. H., and Kim, M.-J. 2006. A curvature estimation for pen input segmentation in sketch-based modeling. Computer-Aided Design 38, 3, 238--248.

Wednesday, September 12, 2007

A Domain-Independent System for Sketch Recognition

by

Bo Yu and Shijie Cai

Summary

The paper notes that the two main deficiencies of existing sketch recognition systems are their strict restrictions on the manner in which the user sketches and their inability to analyze hybrid or smooth curves and decompose them into primitive shapes. The authors list six attributes of a practical sketch recognition system: users can draw naturally, recognition results are consistent, it understands hierarchical relations among graphic objects, it tries to guess what the user intended to draw, drawings can be modified, and it is easy to integrate into other systems.

The authors' system has a two-stage process: the first stage is imprecise stroke approximation; the second is post-processing. The system also has three levels of hierarchical output: the lowest is raw data (the original stroke points), the middle is the syntactic level (vertices and primitive shapes), and the highest is the semantic level (recognition of primitive shapes and basic objects). Vertex detection is combined with primitive shape approximation in their system. A stroke is taken and compared to a primitive shape. If the primitive shape is not a match, the stroke is divided at the highest point on the curvature curve. The resulting sections are then compared to primitives, and the process repeats. Their system makes no use of the speed data of the stroke.
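
Yu and Cai's split-and-fit loop might be sketched like this (my simplification: only the line primitive is tested, with a chord-distance fit test and a turning-angle maximum standing in for their curvature curve; their system also tests circle and arc primitives at each step):

```python
import math

def dist_to_chord(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    length = math.dist(a, b)
    if length == 0:
        return math.dist(a, p)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length

def fits_line(points, tol=0.05):
    """Accept the stroke as a line if every point stays close to the
    endpoint chord, relative to the chord's length."""
    a, b = points[0], points[-1]
    length = math.dist(a, b) or 1.0
    return all(dist_to_chord(p, a, b) / length < tol for p in points)

def curvature_peak(points):
    """Index of the sharpest interior turn (a stand-in for the paper's
    'highest point on the curvature curve')."""
    def turn(i):
        a, b, c = points[i - 1], points[i], points[i + 1]
        u = math.atan2(b[1] - a[1], b[0] - a[0])
        v = math.atan2(c[1] - b[1], c[0] - b[0])
        return abs(math.atan2(math.sin(v - u), math.cos(v - u)))
    return max(range(1, len(points) - 1), key=turn)

def decompose(points):
    """If the stroke does not fit a primitive, divide it at the curvature
    peak and recurse on both halves."""
    if len(points) <= 2 or fits_line(points):
        return [points]
    i = curvature_peak(points)
    return decompose(points[:i + 1]) + decompose(points[i:])
```

An L-shaped stroke fails the line test once, splits at the corner, and both halves then fit, so the stroke decomposes into exactly two line primitives.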

A stroke can be recognized as a line if it forms a horizontal line on the direction graph, or simply if its points can be fitted to an actual line. In circle approximation, when the stroke is put onto a direction graph it tends to form a line with slope (2*pi)/n. An arc should have a smaller slope than a circle: as the slope decreases the line becomes flatter, approaching the horizontal slope of a line segment. Self-intersecting strokes pose a problem. Their system addresses overlapping by merging the strokes.
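
The direction-graph slope test can be sketched as follows (my own formulation: directions are unwrapped so a circle plots as a straight ramp instead of sawtoothing at +/-pi, then a least-squares slope is fit against the sample index):

```python
import math

def direction_graph(points):
    """Unwrapped chord direction at each step along the stroke."""
    dirs, prev = [], None
    for a, b in zip(points, points[1:]):
        d = math.atan2(b[1] - a[1], b[0] - a[0])
        if prev is not None:
            while d - prev > math.pi:   # undo the +/-pi wraparound
                d -= 2 * math.pi
            while prev - d > math.pi:
                d += 2 * math.pi
        dirs.append(d)
        prev = d
    return dirs

def direction_slope(points):
    """Least-squares slope of direction versus sample index."""
    y = direction_graph(points)
    n = len(y)
    xm, ym = (n - 1) / 2, sum(y) / n
    num = sum((i - xm) * (v - ym) for i, v in enumerate(y))
    den = sum((i - xm) ** 2 for i in range(n))
    return num / den
```

A full circle sampled with n chords comes out with slope 2*pi/n, a straight line with slope 0, and an arc in between, which is exactly the ordering the summary describes.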

The authors have created a "module" that can recognize basic objects. This module sounds similar to LADDER in that it describes more complex shapes using simple shapes. Editing strokes are briefly discussed, but no real mention is made of how to switch between modes.

The overall recognition rate for primitive shapes and polylines was near 98%. Arcs alone were at 94%, and hybrid shapes with smooth connections between lines and arcs had a recognition rate of 70%.

Discussion

The authors note that "another noticeable feature of our system is the convenient and efficient user interface for modifying recognition result." I am still unsure how the system differentiates drawn strokes from strokes used for editing. How does one switch into the modification state?

The idea of recognizing primitives by breaking strokes down until they become a set of primitives seems very intuitive. Also, the notion of a recognition module built for specific domains from low-level descriptions of objects seems familiar.

Decent paper, though the 70% doesn't bode well for us. In many of these papers the recognition rates seem a bit inflated, so for them to openly admit 70% makes me conservative about our ability to recognize certain shapes.

Citation

Yu, B. and Cai, S. 2003. A domain-independent system for sketch recognition. In Proceedings of the 1st international Conference on Computer Graphics and interactive Techniques in Australasia and South East Asia (Melbourne, Australia, February 11 - 14, 2003). GRAPHITE '03. ACM Press, New York, NY, 141-146. DOI= http://doi.acm.org/10.1145/604471.604499

Tuesday, September 11, 2007

Sketch Based Interfaces: Early Processing for Sketch Understanding

by

Tevfik Metin Sezgin, Thomas Stahovich, Randall Davis

Summary

The approach taken in this paper is a shift from the past few we have read (Rubine, Long). Sezgin et al. believe that a sketch-based system should respond to how an object looks, not how it was drawn. The goal of these sketch systems is to allow people to work with computational tools in the same manner they work with people (messy, informal), but without losing the nice computational gains. The authors state that the first step in accomplishing this goal is the ability to convert from pixels to geometric objects. This early processing must be able to find corners and fit lines and curves.

Their early processing approach has three phases: approximation, beautification and basic recognition. To accomplish stroke approximation, vertices must first be detected. Their scheme uses both stroke direction and speed data to accomplish the detection. The extrema are compared against an average-based threshold to remove noise. This works on the principle that pen speed drops when going around a corner. On certain drawings, though, pen speed alone will not reveal the corners; thus the intersection of the vertices found by each method is used. The paper goes into further detail on how a hybrid fit is calculated and how curves are handled. Beautification, as described in the paper, is the minor adjustment of lines to ensure that the intent of the drawing is preserved. An example of intent is parallelism, which is difficult to draw freehand but whose intent is not hard to recognize, and thus to correct strokes for. Recognition is accomplished with hand-tailored templates that examine these properties.

A user study was performed comparing this system to drawing with XFig. Participants were almost unanimous in selecting the sketching system as the preferred choice. Once again it is pointed out that this system places no limitations on how the user wishes to draw the object. It is for the computer to understand the user, not for the user to sketch in a way the computer understands.

Discussion

"Just drink the Kool-Aid already." - Brian David Eoff Approx. 17 Minutes Ago

Does this idea that the interactions between a user and a computer could be "messy" seem bizarre? Interaction with a computer seems to be about removing ambiguities, and if there was a problem it was caused by the user, right? Maybe it doesn't have to be rigid? Maybe we can have the best of both worlds: the ability to take all the good capabilities computation provides, coupled with natural human interaction in all its messy, imperfect, yet nuanced glory. Really think about this: the GUI was huge compared to the command line, but that was ~30 years ago. Aren't we due for something different? Even if it isn't always better (and for some things I still go to the command line), shouldn't there be something else? The next. If not, what have we done with all this computational power and connectivity we've been chasing?

Well, now onto the paper.
The paper makes an interesting observation: if users draw slowly they are most likely being precise, and the system should respect that precision when recognizing the drawing. Also, the authors make note of their handling of overtracing, a natural movement that happens when one draws, which gesture recognition could not possibly handle. This is a movement toward looking at what the user drew and using how they drew it to clear up ambiguities, rather than depending solely on the physical act of the user drawing the object. Now that tablets can record tilt metrics on pen input, could tilt be used along with speed and curvature to discover corners? The same with pressure. Were these values not available when this system was created, or were they deemed unnecessary?

Citation

Sezgin, T. M., Stahovich, T., and Davis, R. 2001. Sketch based interfaces: early processing for sketch understanding. In Proceedings of the 2001 Workshop on Perceptive User interfaces (Orlando, Florida, November 15 - 16, 2001). PUI '01, vol. 15. ACM Press, New York, NY, 1-8. DOI= http://doi.acm.org/10.1145/971478.971487

Tuesday, September 4, 2007

MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned From a Single Example Using a Dual-Classifier

by

Brandon Paulson and Tracy Hammond

Summary

The authors propose an algorithm that can recognize unconstrained symbols. Other systems put constraints on the sketches: they must be single-stroke, oriented the same way, or drawn at the same scale. The authors propose a system in which images can be retrieved using sketches the user has previously associated with them. There are three goals for the algorithm: recognize unconstrained sketch symbols, recognize symbols from only a single training example, and learn from successful queries to improve accuracy over time. The application created to demonstrate this algorithm is MARQS, Media Albums Retrieved by a Query Sketch. The system uses the user's search queries over time to train itself.

The system uses a linear classifier, with a set of global features, when enough examples are available. The system normalizes the orientation of each drawing by putting an axis through the two farthest points. The system uses four global features: the bounding box's aspect ratio, pixel density, average curvature and the number of perceived corners. When only a single sketch is available, its features are compared directly to those of the user's query sketch. The error is the difference between the feature values; the stored sketch with the lowest error is chosen.
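
The single-example matcher might be sketched like this (heavily hedged: the exact feature formulations below are my guesses, since the summary does not specify how each is computed, and the paper's error metric may weight features differently):

```python
import math

def global_features(points):
    """Four MARQS-style global features (formulations are assumptions)."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    w = (max(xs) - min(xs)) or 1.0
    h = (max(ys) - min(ys)) or 1.0
    aspect = w / h                      # bounding-box aspect ratio
    density = len(points) / (w * h)     # pixel-density proxy
    turns = []
    for a, b, c in zip(points, points[1:], points[2:]):
        u = math.atan2(b[1] - a[1], b[0] - a[0])
        v = math.atan2(c[1] - b[1], c[0] - b[0])
        turns.append(abs(math.atan2(math.sin(v - u), math.cos(v - u))))
    avg_curv = sum(turns) / len(turns) if turns else 0.0   # average curvature
    corners = sum(t > math.pi / 4 for t in turns)          # perceived corners
    return [aspect, density, avg_curv, corners]

def best_match(query, gallery):
    """Single-example classifier: return the index of the stored sketch
    whose features differ least (sum of absolute differences) from the
    query sketch's features."""
    qf = global_features(query)
    def error(sketch):
        return sum(abs(a - b) for a, b in zip(qf, global_features(sketch)))
    return min(range(len(gallery)), key=lambda i: error(gallery[i]))
```

With only four cheap features, a query that is simply a rescaled version of a stored sketch still lands on the right match, which fits the paper's scale-independence goal.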

The single classifier was used 27% of the time, the linear classifier 73%. The average search rank was 1.7 for the single classifier, 1.44 for the linear classifier, and 1.51 for the hybrid system.

The paper notes that stroke length and speed are not good features when symbols are drawn in multiple strokes.

Discussion

The system, while not requiring the user to train it, does require the user to create a sketch for each item they want to search for (whole albums or single images). Users are often hesitant, or lazy, about creating metadata to go with their images. Also, the responsibility is on the user to create a sketch they can mentally connect to the image (not an easy task). It could be argued, though, that this is a more personal system than one that tries to compare a sketch to the actual image.

I was surprised that Brandon and Dr. Hammond were able to get such performance out of a small number of features. It would seem there is a tradeoff between how much performance could improve and the computation time needed to calculate additional features.

Citation

Monday, September 3, 2007

Visual Similarity of Pen Gestures

by

A. Chris Long, Jr. , James A. Landay, Lawrence A. Rowe and Joseph Michiels

Summary

The paper is a precursor to the prior paper, going into more detail about the experiments to determine why users find gestures similar. The researchers hope to roll these findings into a tool (quill) capable of notifying designers of gestures that are likely to be perceived as similar by users, that will be difficult for users to learn and remember, or that may be unrecognizable by a computer. Similarity can affect how easily users learn and recall gestures.

A brief overview of pen-based interfaces is given. The authors point out that the handwriting recognition on the Newton was initially criticized (it recognized Cyrillic great, by the way). Also noted is the power of gestures: they can identify the operator and the operand simultaneously. The paper continues with an overview of perceptual similarity, focusing on the psychology and on multidimensional scaling.

The researchers ran two separate experiments. The goals of the first were to determine which measurable geometric properties influence similarity and to produce a model that could predict how similar two gestures would appear to a user. The similarity of two gestures is determined by the Euclidean distance between their feature sets. Long et al. propose additional predictors beyond Rubine's features, of which they list only 11. The paper notes that the participants clumped into two separate groups, but gives no information about what distinguishes them. The second study focused on how varying different types of features affected perceived similarity.
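
The distance computation at the heart of the model is simple enough to sketch (uniform weights here for illustration; in the paper the per-feature weights come from fitting the model to the multidimensional-scaling results):

```python
import math

def gesture_distance(f1, f2, weights=None):
    """Predicted dissimilarity of two gestures: weighted Euclidean distance
    between their feature vectors."""
    weights = weights or [1.0] * len(f1)
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, f1, f2)))
```

Two gestures whose feature vectors nearly coincide would be flagged as perceptually similar regardless of what shapes produced those features.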

Discussion

Why don't they list features 12 and 13 of Rubine? Those are the features that take time data into consideration, and none of their additional features consider time. Also, Long et al. note that users seem to fall into two separate groups, yet they give no additional information about those groups (i.e., the criteria for placement in them). I wonder how additional physical metrics of input (pen pressure, pen tilt) could be used to create additional Rubine features?

Citation

Long, A. C., Landay, J. A., Rowe, L. A., and Michiels, J. 2000. Visual similarity of pen gestures. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (The Hague, The Netherlands, April 01 - 06, 2000). CHI '00. ACM Press, New York, NY, 360-367. DOI= http://doi.acm.org/10.1145/332040.332458

"Those Look Similar!" Issues in Automating Gesture Design Advice

by

A. Chris Long, Jr. , James A. Landay and Lawrence A. Rowe

Summary

This paper presents quill, an application that aids interface designers in determining whether gestures are similar. The tool can also advise designers on how to make their gestures less similar, not only gestures users perceive as similar but also gestures a computer might confuse.

The researchers conducted three user studies to determine a model of gesture similarity. In the first two, 42 participants chose the most different gesture from a group of three; in the third study, 266 participants judged the similarity of pairs of gestures on an absolute scale.

The goal of quill is to allow designers with no advance knowledge of gesture recognition to create gestures that a computer can recognize but that users won't confuse as too similar. The quill system learns gestures using the Rubine algorithm and a training set provided by the user. Quill analyzes the gestures in the background and provides feedback if they are too similar to other gesture classes. The paper goes on to discuss concerns about when to give feedback and how much to provide. The paper also discusses implementation concerns, particularly how to lock the user's actions so analysis can complete.

The researchers note that their models tended to overestimate gesture similarity when gestures were, or contained, a letter. The authors also propose that future work could automatically repair similar gestures by morphing them into a new form.

Discussion

Just thinking about the fact that letters were deemed similar raises an interesting thought. We view letters as dissimilar because of the metaphor attached to the shape (a sound), but the shapes of letters do have a lot of similarity ("u" and "v", "m" and "n"). Many times when looking at people's handwritten notes, the only way I could identify an individual letter was in the context of a word. The question becomes: how similar do we want gestures to be so that users can mentally group them into a category (all editing strokes), versus making them dissimilar enough that they can be recognized and drawn easily? For example, the cut and paste gestures need to convey similarity (both are editing gestures) but at the same time be inherently opposite. This seems like a difficult thing to accomplish. Sure, quill can tell whether gestures are similar, but I don't think it can help in making a metaphorical connection between gestures and the actions they are supposed to produce.

Citation

Long, A. C., Landay, J. A., and Rowe, L. A. 2001. "Those look similar!" issues in automating gesture design advice. In Proceedings of the 2001 Workshop on Perceptive User interfaces (Orlando, Florida, November 15 - 16, 2001). PUI '01, vol. 15. ACM Press, New York, NY, 1-5. DOI= http://doi.acm.org/10.1145/971478.971510