Thursday, December 6, 2007

Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes

by

Jacob O. Wobbrock, Andrew D. Wilson, Yang Li

Summary

The paper begins by stating that gesture recognition has mostly been the domain of AI and pattern matching experts. Wobbrock et al. believe that the techniques used are too difficult for HCI practitioners to incorporate into their designs, and thus gesture recognition isn't getting proper attention from the HCI community due to the high learning curve. Undergraduate HCI students often do not have the experience to implement traditional recognizers. In this paper Wobbrock et al. describe a simple recognizer that is easy to implement and accurate, and to boot they provide the complete algorithm. The recognizer was designed against explicit goals of resilience to variation in how gestures are drawn and ease of implementation.

The $1 recognizer is a template-based recognizer. For each symbol to be recognized, at least one sample must be provided. The first step in the $1 recognizer is to resample the gesture so that it has 64 equidistantly spaced points. Next, the indicative angle is found for the purposes of rotation. After rotation the gesture is scaled to a reference square and translated to a reference point. The distance between each point in the candidate gesture and its corresponding point in the template gesture is then calculated. The best match is the template with the minimal path distance (the resulting score is normalized to between 0 and 1). The indicative angle, though, is not necessarily the optimal angle from which to make the comparison; Wobbrock et al. fine-tune the angle of the candidate so that its path distance to the template is minimized. A hill climbing approach was tried first, and it worked well when the candidate and template were similar (7.2 rotations on average). Unfortunately, the number of rotations needed for dissimilar gestures was too high, so Golden Section Search was used in place of the hill climber.
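Since the paper's whole pitch is that the algorithm fits on a page, here is a minimal Python sketch of the pipeline described above. The 64 resampled points and the +/-45 degree search window with a 2 degree stopping threshold are the paper's values; the 250px reference square size, the function names, and the coding choices are mine, so treat this as an illustration rather than the authors' reference pseudocode.

    import math

    N, SIZE = 64, 250.0                    # 64 points; assumed reference square
    PHI = 0.5 * (math.sqrt(5) - 1)         # golden ratio, for the search

    def path_length(p):
        return sum(math.dist(a, b) for a, b in zip(p, p[1:]))

    def resample(p, n=N):
        # Step 1: resample into n equidistantly spaced points.
        interval, acc, out, pts, i = path_length(p) / (n - 1), 0.0, [p[0]], list(p), 1
        while i < len(pts):
            d = math.dist(pts[i - 1], pts[i])
            if d > 0 and acc + d >= interval:
                t = (interval - acc) / d
                q = (pts[i-1][0] + t * (pts[i][0] - pts[i-1][0]),
                     pts[i-1][1] + t * (pts[i][1] - pts[i-1][1]))
                out.append(q); pts.insert(i, q); acc = 0.0
            else:
                acc += d
            i += 1
        return (out + [pts[-1]] * n)[:n]

    def centroid(p):
        return (sum(x for x, _ in p) / len(p), sum(y for _, y in p) / len(p))

    def rotate(p, theta):
        (cx, cy), s, c = centroid(p), math.sin(theta), math.cos(theta)
        return [((x-cx)*c - (y-cy)*s + cx, (x-cx)*s + (y-cy)*c + cy) for x, y in p]

    def normalize(p):
        # Steps 1-4: resample, rotate indicative angle to zero, scale, translate.
        p = resample(p)
        cx, cy = centroid(p)
        p = rotate(p, -math.atan2(cy - p[0][1], cx - p[0][0]))
        xs, ys = [x for x, _ in p], [y for _, y in p]
        w, h = (max(xs) - min(xs)) or 1, (max(ys) - min(ys)) or 1
        p = [(x * SIZE / w, y * SIZE / h) for x, y in p]
        cx, cy = centroid(p)
        return [(x - cx, y - cy) for x, y in p]

    def path_distance(a, b):
        return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

    def distance_at_best_angle(cand, tmpl):
        # Golden Section Search over rotation: the fix for the hill climber.
        lo, hi = -math.radians(45), math.radians(45)
        f = lambda t: path_distance(rotate(cand, t), tmpl)
        x1, x2 = PHI * lo + (1 - PHI) * hi, (1 - PHI) * lo + PHI * hi
        f1, f2 = f(x1), f(x2)
        while hi - lo > math.radians(2):
            if f1 < f2:
                hi, x2, f2 = x2, x1, f1
                x1 = PHI * lo + (1 - PHI) * hi; f1 = f(x1)
            else:
                lo, x1, f1 = x1, x2, f2
                x2 = (1 - PHI) * lo + PHI * hi; f2 = f(x2)
        return min(f1, f2)

    def recognize(candidate, templates):
        # templates: {name: already-normalized point list}; score in [0, 1].
        cand = normalize(candidate)
        name, d = min(((n, distance_at_best_angle(cand, t))
                       for n, t in templates.items()), key=lambda x: x[1])
        return name, 1 - d / (0.5 * math.sqrt(2 * SIZE ** 2))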

The $1 recognizer does have some shortcomings. It cannot distinguish gestures whose identity depends on orientation (or on scale or position, since those are normalized away). Gestures must be drawn in the style of the template, or multiple templates for each gesture must be provided.

4800 gestures were collected from ten subjects, covering sixteen gesture types. Each gesture was drawn by each user at three different speeds. The $1 recognizer had a 0.98% error rate. As the number of templates for each symbol increased, the number of errors decreased. The authors note that drawing speed has an impact on errors. The $1 recognizer is faster than Rubine and significantly faster than DTW. They were able to achieve 99% accuracy overall, and 97% accuracy with a single template.

Discussion

Why the concern over the three different speeds? I do believe this would have been a good first algorithm to implement, if for no other reason than there are no unknown thresholds to bother with. Nothing was worse than trying to "find" the thresholds that got certain authors their amazing recognition rates, and constantly wondering why they weren't in the paper.

It is a fine paper; I am just not sure why it was in UIST. It seems more of a paper for CS education than anything else. No real new technique is proposed.

Citation

Wobbrock, J. O., Wilson, A. D., and Li, Y. 2007. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In Proceedings of the 20th Annual ACM Symposium on User interface Software and Technology (Newport, Rhode Island, USA, October 07 - 10, 2007). UIST '07. ACM, New York, NY, 159-168. DOI= http://doi.acm.org/10.1145/1294211.1294238

Properties of Real World Digital Logic Diagrams

by

Christine Alvarado and Michael Lazzareschi

Summary

Pure sketch recognition is difficult. To get around this fact (which should be painfully obvious to us all by now), many designers put restrictions on how the user can draw. This becomes a balancing act between making the recognition easier and providing a natural drawing experience. These restrictions are often imposed to make the developer's life easier, with no attempt to understand natural drawing behavior.

The authors wanted to observe the natural drawing behavior of students. They focused primarily on three aspects: stroke order, stroke timing, and stroke number. In their study students were given a tablet to use for their digital design course and were instructed to use it in lieu of paper. Thirteen students participated, 98 diagrams were collected, and each one was labeled by hand. The authors looked at the temporally contiguous nature of the strokes, and how consistent each student was. 19% of all the symbols drawn by the students were done with non-consecutive strokes; these strokes were often corrections. Also noted was that how long students paused between strokes of different symbols overlaps greatly with how long they pause between strokes within the same symbol, so pause length is not a clear indicator of when the user is drawing a new symbol. The researchers observed that many students did not consistently draw symbols with the same number of strokes. However, non-consistent students were consistently non-consistent, and consistent students were consistently consistent. If someone can say the prior sentence five times real fast I'll give them a cookie. Few properties were consistent across all students.

Discussion

Judging from the chart, the stroke timing overlap is minimal; the distributions do not overlap greatly as the authors state. Is it perfect? No, but I think categorizing it as great overlap is off. Also, the statement that they focused on digital logic diagrams because little work has been done in the domain seems strange to me. Almost every sketch recognition paper I have ever read uses this as an example. The fact that drawing mechanics do not seem to be similar across users bodes well for my semester project.

Citation

Christine Alvarado and Michael Lazzareschi. Properties of Real World Digital Logic Diagrams. 1st International Workshop on Pen-based Learning Technologies (PLT) 2007.

What are Intelligence? And Why?

by

Randall Davis

Summary

Various fields have attempted to adopt AI as their own, and in doing so they project their assumptions, models and metaphors onto the field. Many times this doesn't mesh well, and, well, that is how academic religious wars start. Davis begins by defining the four behaviors used to distinguish intelligent behavior, and also lists the five fields contributing to "reason". Davis then goes on to give a brief summary of each field's approach to AI (machine logic, intelligence as a biological function...). Davis then transitions to the why of intelligence, discussing how evolution got us here. Evolution is blind search; it isn't engineering. Davis also notes that "Biology provides no definition of a problem until it has been revealed by the advantage of a solution." In essence nature is a fairly lousy engineer, and it is a satisficer, not an optimizer. The paper points out that the human mind might not be the pinnacle of achievement, comparing it to a legacy application designed with add-ons. I think he is trying to compare it to Microsoft Word, I really do. He finishes this section by pondering the question: what if the human mind was written for another purpose and was adopted for its current usage only after the fact?

At this point the paper switches to intelligence in the animal community. Vervet monkeys are discussed, and how they can use transitive inference to establish their place in the hierarchy. The monkeys also have a rudimentary vocabulary (TIGER!!!), but they don't have language (Hey, remember that tiger, man, that thing was scary.).

Davis pontificates (overwrought word, but I got nothing else) that thinking could be a form of internal visualization. He gives two suggestions: 1) a notion of visual reasoning, reasoning with diagrams as things that we look at, whose visual nature is a central part of the representation; 2) thinking as a form of reliving. Davis notes that the senses, vision and language are not simply I/O, but part of the thought process itself. Perception can be useful for thinking. A nice quote is when Davis writes about "creating for ourselves a concrete mini-world where we carry out mental actions and then examine the results."

Davis concludes by stating that intelligence are many things (it pains me to not type "is"). AI should be the study of the design spaces of intelligences (note the plural). Thinking is visual, and diagrams are useful to reason with, not just about.

Discussion

This is one of those papers that is not about understanding an algorithm. It is a decade old, but from a historical standpoint it is interesting. This isn't taking anything away from what he said; it is a good read: insightful, balanced and funny. The thing is, the more he talked about visual intelligence and the use of diagrams, the only thing I could think of was "He means sketch recognition." The whole "reasoning with diagrams" notion seems fundamental to what sketch recognition hopes to accomplish. The pen is the simplest tool for getting these designs into the system. It is more natural than using a pen-menu based system.

Citation

Advances in Mathematical Sketching: Moving Toward the Paradigm's Full Potential

by

Joe LaViola

Summary

The paper focuses on mathematical sketching, specifically the building of an application that allows for sketching of expressions and the animation of drawn diagrams that correspond to those expressions. The system is MathPad2. The UI depends heavily on the user performing numerous actions by way of a lasso and tap. The expressions must be manually segmented by the user. Associations can be made between the mathematical equations and the diagrams. These can be implicit, such as a variable from an expression appearing on a diagram, or explicit, by having the user draw a line between two entities that should be associated. MathPad2 is capable of graphing functions, solving equations and evaluating expressions. The system uses Matlab as its backend.

To aid with the recognition, users must provide samples (10 to 20) of each symbol they will use for the system to train on. Key points in the stroke are used for recognition. Pairwise examination provided an improvement in recognition over past versions of the system. LaViola is able to achieve 90.8% parsing accuracy.

The need to fix diagrams so that what the user drew matches what the math specifies is also covered. Sketching cannot have the precision that math demands, so the angle, location and size of the various parts of the diagram are adjusted.

The author wishes this system to be used by students: a mathematical notebook of sorts, one in which the diagrams can be animated depending on the expressions we compose.

Discussion

There seems to be a lot of dependency on the lasso as a means of interaction. This paper reminds me of the work Adler is doing. It is difficult to be precise when sketching, and in the engineering world precision is required for the computation produced from these sketches to be accurate. Adler is attempting to address this by using an additional mode of interaction (voice). LaViola is doing something similar by using the equations as an additional mode.

Citation

LaViola, J. "Advances in Mathematical Sketching: Moving Toward the Paradigm's Full Potential", IEEE Computer Graphics and Applications, 27(1):38-48, January/February 2007.

Sunday, November 25, 2007

Ink Features for Diagram Recognition

by

Patel, R., Plimmer, B., Grundy, J., Ihaka, R.

Summary

The paper begins by noting that there are three approaches to recognition: bottom-up, top-down, or a combination of both. Ink features - measured aspects of an ink stroke - are used by a variety of gesture recognition algorithms (Rubine, Long). Despite their frequent use, no one has studied the effectiveness of the various features. Patel et al. conducted an experiment to find ink features that distinguish text strokes from shape strokes, for use in a divider application ("divider" here being a component that determines whether a stroke is text or a shape). They experimented with 46 features over seven categories: size, time, intersections, curvature, pressure, operating system recognition values, and inter-stroke gaps. The goal was to find the most significant features for dividing text from shape.

26 people participated in the study; nine sketches were taken from each user, and 46 features were calculated on each stroke. The authors took a decision tree, or statistical partitioning, approach to determine the most useful features; the feature at the root of the decision tree is the most discriminative. Eight features were identified as significant. Testing on the training data, their eight-feature approach misclassified shapes 10.8% of the time and text 8.8%. On test data, shapes were misclassified 42.1% of the time and text 21.4%. Inter-stroke gap is the most important feature in determining the split, followed by the size of the shape. Pressure, intersections, and time were not useful for separating the strokes. The authors suggest an HMM technique to provide more flexibility than their statistical partitioning approach.
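The ranking idea is easy to reproduce with any decision tree package. Below is a hedged scikit-learn sketch; the feature names and numbers are stand-ins I made up (the paper's actual 46 features are not reproduced here), and the point is only that the root split of the fitted tree identifies the most discriminative feature, which is how the authors ranked them.

    from sklearn.tree import DecisionTreeClassifier, export_text

    FEATURES = ["interstroke_gap", "bbox_width", "duration", "num_intersections"]

    # Each row is one stroke's feature vector; labels say what the stroke was.
    X = [[120.0,  15.0, 0.30, 0],   # hypothetical text stroke
         [ 40.0, 180.0, 0.90, 2],   # hypothetical shape stroke
         [150.0,  12.0, 0.25, 1],
         [ 35.0, 220.0, 1.10, 3]]
    y = ["text", "shape", "text", "shape"]

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(export_text(tree, feature_names=FEATURES))  # root = best divider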

Discussion

The paper was interesting because finally, in one place, there is a listing of a large set of ink features. I know of Rubine's initial set and Long's follow-up additions, but they don't come close to the 46 given in this paper. The accuracy statistics were disappointing.

Citation

Wednesday, November 7, 2007

Speech and Sketching: An Empirical Study of Multimodal Interaction

by

Aaron Adler and Randal Davis

Summary

Oltmans attempted to use speech as a way to overcome the ambiguities of sketching, but the system he proposed was limited: it was very domain specific, and communication only went one way. Adler and Davis hope to create a whiteboard system that incorporates both sketch and speech to aid in early design work. The system should be able to engage the user in natural dialog.

A user study was performed in an almost Wizard-of-Oz setup. The participant was given a variety of sketching goals to accomplish. The experimenter had an identical tablet and engaged the user in dialog while the user was attempting to accomplish their goals. The participant was able to change the color of their strokes, and it was noted that strokes fall into one of four categories: creation, modification, selection or writing. The authors made three observations about the participants' speech. First, they were disfluent and often repeated words or phrases. Second, when prompted with a question from the experimenter they often responded with words that were used in the original question. Third, the speech utterances were related to what the user was drawing at the moment. Other observations were that users would often list objects and then sketch them in that order, that they wrote out words they used in their speech, and that participants often paused their speech so as to finish the drawing they were describing.

When prompted with a question from the experimenter, the user often gave much more elaborate answers than were necessary. Often they would spot errors or ambiguities when giving these responses. Participants also made comments not related to the sketch, but related to the domain. The paper goes on to give details about the connection between the timing of the sketch and word/phrase groupings, noting that speech phrases preceded sketching.

Discussion

There were a lot of observations presented that could be incorporated into a system; this paper seems to be the groundwork for an implementation. I question whether having the experimenter in the room caused the participant to interact more. Would the participants be so forthcoming if it was only them and a machine? I liked the use of color as a way to give context about the drawing. There could be additional layers to provide context that wouldn't necessarily need to show up on the sketch. Interested to see where they take this.

Citation

A. Adler and R. Davis. Speech and sketching: An empirical study of multimodal interaction. In Fourth Eurographics Conference on Sketch Based Interfaces and Modeling, Riverside, California, August 2-3 2007.

Monday, November 5, 2007

Three Main Concerns in Sketch Recognition and an Approach to Addressing Them

by

James V. Mahoney, Markus P. J. Fromherz

Summary

The paper begins by noting that sketch recognition technology must meet three requirements: it must be able to cope with ambiguity, it must have interactive performance, and it must be extensible. To demonstrate their approach to solving these problems, Mahoney and Fromherz took on the task of labeling stick figure drawings (i.e., labeling legs and arms). They believe there are three sources of ambiguity: sloppy drawing, articulation, and interaction with a background context. The sloppiness causes problems with segmentation, which makes matching to a model difficult. The authors propose a variety of preprocessing techniques to get around this (proximity linking, virtual junction splitting, spurious segment jumping). They further deal with the issues involved in sub-graph matching and model acquisition.

Discussion

I was expecting something much more from the title. A bit disappointed. It all seemed too focused on stick figures.

Citation

Mahoney, J. V. and Fromherz, M. P. J. 2002. Three main concerns in sketch recognition and an approach to addressing them. AAAI Spring Symposium on Sketch Understanding, pp. 105-112, March 25-27, 2002.

Tuesday, October 23, 2007

Naturally Conveyed Explanations of Device Behavior

by

Michael Oltmans and Randall Davis

Summary

The paper combines speech recognition techniques and sketch recognition into a system titled ASSISTANCE. The system is able to recognize 2-D kinematic diagrams through the visuals provided by sketching and the context given by the user voicing additional information about what should occur in the sketch. Sketch recognition is in constant pursuit of supplementing the process that designers use in their early stages; this means quick low-level sketches, and also vocal discussions. The designer should be able to explain the intended actions by drawing additional context (arrows to indicate direction), pointing and giving spoken feedback. All of this information is used to build a model of the device, and the user can ask the system questions about the model. The paper tackles how to handle overlapping descriptions to infer meanings. ASSISTANCE takes three inputs: the structural model of the sketch generated by ASSIST, the phrases recognized by the speech recognizer (ViaVoice) and the sketched arrows. ASSISTANCE must be able to unify multiple description instances and resolve deictic references. The system must also understand causal links.

Discussion

I like this paper because it presents the combination of multiple natural methods of interaction to create something that would otherwise require a much more restrictive technique. It is a good starting point. I am unsure how the difference between pointing and drawing is detected. Also, does one simply draw, switch modes, and then begin to explain (using the pointer to make references)? Or can the user give verbal input throughout the whole experience? What happens if a mistake has been made? Can contradictory statements be made, with only the later statement used?

Citation

Oltmans, M. and Davis, R. 2001. Naturally conveyed explanations of device behavior. In Proceedings of the 2001 Workshop on Perceptive User interfaces (Orlando, Florida, November 15 - 16, 2001). PUI '01, vol. 15. ACM Press, New York, NY, 1-8. DOI= http://doi.acm.org/10.1145/971478.971498

Wednesday, October 17, 2007

Ambiguous Intentions: a Paper-like Interface for Creative Design

by

Mark D. Gross and Ellen Yi-Luen Do

Summary

"While most computer interfaces are text-based, we wonder how far we can push pen based, freehand additions."

Gross and Do begin by stating that most designers working in the early design stages prefer pen and paper to a computer. It makes sense; pen and paper provide more fluidity than menu-based computer applications. Computers are definitive and precise. The authors propose that a pen based computer application could embrace ambiguity and imprecision, and be created in such a way as to encourage incremental formalization of ideas. I had often viewed ambiguity and imprecision as something to be avoided, but the authors point out that ambiguity allows for alternatives to be considered and that imprecision allows for decisions to be postponed.

Gross and Do developed the Electronic Cocktail Napkin (they used Macintosh Common Lisp, my heart is aflutter). The program was supposed to function like a normal drawing application, but also to allow for retrieval, simulation, design critiquing and collaborative work. ECN can recognize "configurations", which are user defined patterns and beautification rules. Recognition of these configurations depends on the context in which the user is drawing. The system allows for alternative recognitions; it is capable of waiting for more information to clarify an ambiguous recognition. In the system, contexts are chained: the system checks whether the glyph is recognized by the first context and then works its way down. Recognition of glyphs and configurations takes place with consideration of the context, and the system is able to recognize which context the author is creating in.

A variety of user studies were performed, focusing on how architecture students would use such a system. Gross and Do wanted ECN to be a general purpose drawing application that end users could make more specific.

Discussion

The retrieval component seemed similar to Brandon's MARQS application. The whole "general purpose that users make more specific" idea is something that LADDER accomplishes with domain descriptions. Architecture is such a drawing-based field; I would find it more interesting if they pursued a discipline that didn't rely so heavily on drawing and showed how it could work more efficiently through pen-based input. All of these types of applications depend heavily on the concept of diagrams. They don't want people to actually sketch what they want; they want them to sketch simple shapes that have a metaphorical connection to what the author is visualizing.

Citation

Gross, M. D. and Do, E. Y. 1996. Ambiguous intentions: a paper-like interface for creative design. In Proceedings of the 9th Annual ACM Symposium on User interface Software and Technology (Seattle, Washington, United States, November 06 - 08, 1996). UIST '96. ACM Press, New York, NY, 183-192. DOI= http://doi.acm.org/10.1145/237091.237119

Monday, October 15, 2007

Graphical input through machine recognition of sketches

by

Christopher F. Herot

Summary

1976 was the year of the United States bicentennial. It was also the year that Apple Computer was formed. So one might ask why we are reading a paper from 1976. Our readings so far have led us to believe - wrongly - that sketch recognition went from Sutherland's SketchPad straight to Rubine's gesture recognizer. This paper shatters that belief.

The paper opens by noting that the research was motivated by the "desire to involve the computer in the early stages of the design process, where the feedback generated by the machine can be most useful." If this sounds familiar, it is because every paper on sketch recognition uses a very similar motivation to frame its work. Herot states that the machine observes the sketches of the subject while they are being created, and this information could be used to make inferences about the user's attitude toward the sketch. He also presents the question: could a machine make useful interpretations of sketches without knowing the domain in which the sketches are made?

Herot also notes - which I wrongly assumed Sezgin discovered - that the speed of the stroke drops at the corners. Herot even plots this onto a graph, noting the speed minima at the corners. He also notes that the recognition of the system is influenced by the drawing styles of those who created it: during demonstrations the system would work well for some, but not for others. The paper also discusses latching - the connecting of near-miss edges - and overtracing. The paper states that context should be used at the lowest levels of recognition, and that the program must be tuned to the user.

Herot states that the user should not be removed from the recognition process, noting that a promising approach involves "the user to make decisions of which the machine is not capable, but still affording the unobtrusive input method of sketching."

Discussion

I didn't realize that Negroponte was involved in sketch recognition until I looked through the works cited section. It is weird to see the same language used to describe the problem in 1976 still being used in 2007. The bentness metric and the use of speed seem very similar to the metrics that Sezgin used. More work still needs to be done with latching and overtracing.

Citation

Herot, C. F. 1976. Graphical input through machine recognition of sketches. In Proceedings of the 3rd Annual Conference on Computer Graphics and interactive Techniques (Philadelphia, Pennsylvania, July 14 - 16, 1976). SIGGRAPH '76. ACM Press, New York, NY, 97-102. DOI= http://doi.acm.org/10.1145/563274.563294

Sunday, October 7, 2007

Perceptually Based Learning of Shape Descriptions for Sketch Recognition

by

Olya Veselova and Randall Davis

Summary

A huge chunk of this paper can be summarized through one quote: "people pay unequal attention to different features". The goal for the authors is to have a system learn descriptions from a single example. The system should capture only the relevant features that users would care about. The paper makes heavy use of Goldmeier's work on human perception. Goldmeier identified properties called singularities: features in which small variations make a qualitative difference. Goldmeier's singularities include verticality, symmetry, parallelism, horizontality and straightness. The paper ranks the importance of these different constraints, which allows for a score. The score can be adjusted by obstruction, tension lines, and grouping. For example, if two lines are near each other, their being parallel is an important constraint; if they are far apart and a number of primitives sit between them, that constraint isn't so important.
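To make the scoring idea concrete, here is a toy Python sketch under invented numbers: each singularity gets a base importance, and the adjustments (here just distance and obstruction, for parallelism) scale it down. Nothing below comes from the paper except the general shape of the idea.

    # Invented base importances for a few of Goldmeier's singularities.
    BASE = {"parallel": 0.8, "vertical": 0.9, "symmetric": 0.85}

    def constraint_score(kind, distance_apart=0.0, primitives_between=0):
        score = BASE[kind]
        if kind == "parallel":
            # farther apart, with more clutter in between, means the
            # parallelism matters less (the paper's obstruction idea)
            score /= 1 + 0.1 * distance_apart + 0.5 * primitives_between
        return score

    print(constraint_score("parallel", distance_apart=1.0, primitives_between=0))
    print(constraint_score("parallel", distance_apart=8.0, primitives_between=3))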

To test their system, the authors created a study and measured how often their system agreed with people's perceptual judgments on near-perfect drawings.

Discussion

The paper reminds me of those standardized tests in which the student is given multiple shapes in a group and must recognize which shape does not belong. The work is interesting and has a direct connection to how we make our domain descriptions in LADDER. We should only focus on the constraints that the user believes to be relevant.

Citation

Veselova, O. and Davis, R. 2006. Perceptually based learning of shape descriptions for sketch recognition. In ACM SIGGRAPH 2006 Courses (Boston, Massachusetts, July 30 - August 03, 2006). SIGGRAPH '06. ACM Press, New York, NY, 28. DOI= http://doi.acm.org/10.1145/1185657.1185789

Saturday, October 6, 2007

Interactive Learning of Structural Shape Descriptions from Automatically Generated Near-miss Examples

by

Tracy Hammond and Randall Davis

Summary

The paper tackles the problem of creating shape descriptions for sketch recognition systems. It builds on LADDER, a language that describes how shapes are drawn, displayed and edited. The authors point out that recognition is based on what the shapes look like, not how they were drawn. The problem is that descriptions can be under-constrained, recognizing shapes that the designer did not intend, or over-constrained, not recognizing shapes that the designer did intend. The big problem is that it is much easier to draw a shape than to create a formal description.

The authors propose a visual debugger of shape descriptions. The system uses a hand-drawn positive example to build the initial structural description, then asks the user to identify near-miss shapes as positive or negative. To check whether the description is over-constrained, a description is created with one constraint negated and a shape is generated from it; if the user views this shape as correct, the constraint is unnecessary. To test for under-constraint, a list of possible missing constraints is created, the negation of each candidate constraint is added, and users are again queried; this determines whether a constraint should be included. The paper also describes the parameters the authors used when generating new shapes. A toy sketch of the over-constraint loop follows this summary.

The end goal is to produce descriptions that contain no unnecessary constraints and also aren't missing needed constraints.
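Here is that toy version of the over-constraint test, with everything invented: shapes are dicts of measured properties, constraints are named predicates, and user_accepts stands in for the human judging each generated near-miss.

    def prune_overconstraints(constraints, generate_near_miss, user_accepts):
        kept = dict(constraints)
        for name in list(kept):
            near_miss = generate_near_miss(violating=name)  # negate one constraint
            if user_accepts(near_miss):
                del kept[name]   # near-miss accepted anyway: constraint unneeded
        return kept

    # e.g. an "arrow" description where the right-angle constraint turns out
    # to be unnecessary because the user only cares about the shaft:
    constraints = {
        "shaft_is_line":       lambda s: s["shaft_straight"],
        "head_at_right_angle": lambda s: s["head_angle"] == 90,
    }
    gen = lambda violating: {
        "shaft_straight": violating != "shaft_is_line",
        "head_angle": 45 if violating == "head_at_right_angle" else 90,
    }
    user = lambda s: s["shaft_straight"]
    print(list(prune_overconstraints(constraints, gen, user)))  # ['shaft_is_line']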

Discussion

The section about users drawing similar examples was spot on. As I did my user study, most of the participants would draw the same line over and over again, so figure 1 gets a special place in my heart. I chose this paper for my user study report and also my final project. In my final project I need users to draw shapes that will be game pieces. Looking back, in my Plinko description I softened constraints because the system seemed unable to recognize my drawing as what I intended. A system like this would be better: I could draw the shape I was interested in, and then through an iterative process the shape description would be perfected. I could see this being a way to get new users started in creating descriptions. Let them draw a shape, build a description and then let them modify it. It gets around the blank page problem. When I am writing there is always a fear of the blank page; I usually write on top of an outline, and that gets me over the initial hump. I could see new users intimidated by creating new shapes from scratch.

Citation

Hammond, T. and Davis, R. 2006. Interactive learning of structural shape descriptions from automatically generated near-miss examples. In Proceedings of the 11th international Conference on intelligent User interfaces (Sydney, Australia, January 29 - February 01, 2006). IUI '06. ACM Press, New York, NY, 210-217. DOI= http://doi.acm.org/10.1145/1111449.1111495

Wednesday, October 3, 2007

LADDER, a sketching language for user interface developers

by

Tracy Hammond and Randall Davis

Summary

There are people who are good at making sketch recognition systems; unfortunately those people aren't usually domain experts. The opposite is also true: some people really know their domains, but it is difficult for them to transfer this knowledge into a sketch recognition system. LADDER allows for the description of a domain's shapes at a simple level. This description consists of the primitive shapes and the constraints on those primitive shapes. It should also include information about how the shape should be edited and how it should be displayed to the user.

In LADDER, shapes are built hierarchically, re-using lower-level shapes. Both hard and soft constraints can be put on shapes; thus the way a user draws a shape need not be identical to the template to be recognized, but if a shape is often drawn in a common progression, this information can be used in recognition. A rough illustration of such a description is below.
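The illustration uses Python dicts as a stand-in for LADDER's actual description language (this is paraphrased structure, not real LADDER syntax): an arrow built hierarchically out of lines, with its constraints, editing behavior and display rules in one place.

    arrow_description = {
        "components": {"shaft": "Line", "head1": "Line", "head2": "Line"},
        "constraints": [
            ("coincident", "shaft.p2", "head1.p1"),
            ("coincident", "shaft.p2", "head2.p1"),
            ("equalLength", "head1", "head2"),
            ("acuteAngle", "head1", "shaft"),   # soft: roughly arrow-like will do
        ],
        "editing": [("drag", "shaft.p1", "translate the whole arrow")],
        "display": "replace raw strokes with cleaned-up lines",
    }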

A user study was done with thirty users to gain an understanding of how people go about describing shapes and constraints in common language. These concepts were built into the description language for LADDER.

LADDER is unable to describe abstract shapes, shapes must be composed of primitive shapes, and curves are inherently difficult for the system. The paper goes on to discuss the shape definitions, how shapes can be grouped, and the predefined shapes, conditions and display methods. Vectors are also discussed; vectors are used to recognize shapes with a variable number of sub-shapes, the example given being a dashed line. Shapes are recognized using a bottom-up approach.

Discussion

The geometric recognition is a movement away from Rubine's feature-based approach (which is more gesture than sketch) toward a system that doesn't constrain the user. The way domain descriptions are built is nice in that it is a very reusable system. I can see that trying to create complex shapes could get very frustrating.

Citation

Hammond, T. and Davis, R. 2007. LADDER, a sketching language for user interface developers. In ACM SIGGRAPH 2007 Courses (San Diego, California, August 05 - 09, 2007). SIGGRAPH '07. ACM Press, New York, NY, 35. DOI= http://doi.acm.org/10.1145/1281500.1281546

Tuesday, September 25, 2007

Recognizing and Beautifying Low-Level Sketch Shapes with Two New Features and Ranking Algorithms

by

Brandon Paulson and Tracy Hammond

Summary

The authors discuss a new low-level recognizer which is able to recognize eight primitive shapes and achieve recognition rates of 98.6%. The recognizer is capable of returning multiple interpretations, thus allowing for quick selection of alternate interpretations. The system can recognize: line, polyline, circle, ellipse, arc, curve, spiral and helix. The authors give a brief overview of the previous work; this section includes many if not all of the papers we have read previously.

The stroke goes through a pre-recognition process in which graphs and values are computed for the stroke's direction, speed, curvature and corners. Corners are calculated using a multi-pass approach. Two new features are calculated: NDDE (Normalized Distance between Direction Extremes) and DCR (Direction Change Ratio); a sketch of both is below. Curved shapes have a high NDDE, polylines a lower one; polylines have higher DCR values than curved strokes. Two additional tests are done to determine whether the stroke is overtraced or is a closed figure. For each of the eight shapes there is a set of tests in which various values must be compared against thresholds. A hierarchy is used to determine whether a stroke is interpreted as complex or polyline. Seven of the low-level descriptors are given scores; these scores are then summed, with the lower score being selected.
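From the definitions above, the two new features are easy to sketch in Python. The code assumes a stroke of at least four (x, y) points and skips the end-trimming the paper applies before computing DCR.

    import math

    def directions(points):
        return [math.atan2(y2 - y1, x2 - x1)
                for (x1, y1), (x2, y2) in zip(points, points[1:])]

    def arc_length(points, i, j):
        return sum(math.dist(points[k], points[k + 1]) for k in range(i, j))

    def ndde(points):
        # fraction of the stroke's length between its direction extremes:
        # high for curved shapes, lower for polylines
        d = directions(points)
        i, j = sorted((d.index(max(d)), d.index(min(d))))
        return arc_length(points, i, j) / arc_length(points, 0, len(points) - 1)

    def dcr(points):
        # maximum direction change divided by the average direction change:
        # high for polylines (sharp corners), low for smooth curves
        d = directions(points)
        changes = [abs(b - a) for a, b in zip(d, d[1:])]
        return max(changes) / (sum(changes) / len(changes))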

The recognizer was compared against the authors' implementation of Sezgin, their recognizer, their recognizer without the two new features, and their recognizer without rankings. The recognizer is capable of running in real time. The correct shape interpretation was returned somewhere in the list 99.89% of the time (the list being on average 2.69 interpretations long), and 98.56% of the time the correct interpretation was at the top. Circles misclassified as ellipses or complex shapes, and polylines classified as complex shapes, were the most frequent errors.

Discussion

I noticed that in the complex test, the recognizer attempts to combine sub-strokes into larger strokes and tests those against the primitives. Just by looking at my Sezgin results I can see when this would be helpful. The future work section was good; it gave me ideas to cherry-pick for possible class projects.

Citation

Monday, September 24, 2007

Streaming Algorithms for Line Simplification

by

Mohammad Ali Abam, Mark de Berg, Peter Hachenberger and Alireza Zarei

Summary

Given a streaming input of data points, it may be impossible to store the complete set. A compromise is to create a simplification or approximation of those points. The simplification problem is: given a path P with points p0, p1, ..., pn, find a path Q with fewer vertices. If Q's vertices must be a subset of P's, it is the restricted version; in the unrestricted version this does not have to be so. There are two types of optimization problems, min-k and min-delta. In min-k, given a maximum error, find a simplification with as few vertices as possible that stays below the error threshold. In min-delta, given a maximum of k vertices, find the simplification with the smallest possible error that uses at most k vertices. Because of the streaming nature of the input, iterative algorithms are needed, and thus many existing line simplification algorithms cannot be used for streaming.

The error between the simplified version and the original is calculated using Hausdorff and Fréchet distances. The goal is a competitive ratio comparing the streaming simplification algorithm to the optimal offline algorithm.

The algorithm uses a priority queue to hold the vertices, keyed by the error of their links. We compute the error of the incoming link and insert it into the queue at the appropriate priority. Then we extract the vertex with the minimum priority from the queue and update the priorities of the vertices on both sides. The paper goes on to define c-monotone: the error of a link cannot be worse than c times the error of any link that encloses it.
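Here is a rough Python sketch of that loop for the fixed-budget setting (keep at most k vertices). The seg_dist error oracle below is a simple point-to-segment distance standing in for the paper's Hausdorff/Fréchet oracles, and stale heap entries are detected and skipped on pop; it assumes distinct points and is meant to show the structure, not match the paper's guarantees.

    import heapq, math

    def seg_dist(p, a, b):
        # distance from point p to segment a-b: a stand-in error oracle
        (px, py), (ax, ay), (bx, by) = p, a, b
        dx, dy = bx - ax, by - ay
        if dx == dy == 0:
            return math.dist(p, a)
        t = max(0.0, min(1.0, ((px-ax)*dx + (py-ay)*dy) / (dx*dx + dy*dy)))
        return math.dist(p, (ax + t * dx, ay + t * dy))

    def simplify_stream(points, k):
        kept, heap = [], []
        def push(v):
            i = kept.index(v)
            if 0 < i < len(kept) - 1:   # only interior vertices can be dropped
                heapq.heappush(heap, (seg_dist(v, kept[i-1], kept[i+1]), v))
        for pt in points:
            kept.append(pt)
            if len(kept) > 2:
                push(kept[-2])          # the newly interior vertex gets a priority
            while len(kept) > k and heap:
                err, v = heapq.heappop(heap)
                if v not in kept:
                    continue            # stale: already removed
                i = kept.index(v)
                if err != seg_dist(v, kept[i-1], kept[i+1]):
                    continue            # stale: priority no longer current
                kept.remove(v)          # drop the cheapest interior vertex
                for n in (kept[i-1], kept[i]):
                    push(n)             # update priorities on both sides
        return kept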

The paper goes on to discuss the Hausdorff error function and how the error oracle makes use of it. The same treatment is given to Fréchet, which uses two parameters: the width of the points and the length of the largest back path. Each distance function is successful at different types of point distributions.

Discussion

I had no trouble understanding the basic algorithm structure. I was a bit confused by the use of the oracles and the two distance functions, but with a closer look and more examples I feel it should become clear. I wonder how much streaming simplification/beautification we will actually do; I cannot see us wanting to beautify the stroke as the user goes. If we can do such a simplification in the background it would give a head start when it comes time to process the stroke, and reduce the delay significantly. Can this be applied to arcs at all (I know they posed it for a polygonal path)? Could the basic algorithm (not the distance functions) work?

Citation

Abam, M. A., de Berg, M., Hachenberger, P., and Zarei, A. 2007. Streaming algorithms for line simplification. In Proceedings of the Twenty-Third Annual Symposium on Computational Geometry (Gyeongju, South Korea, June 06 - 08, 2007). SCG '07. ACM Press, New York, NY, 175-183. DOI= http://doi.acm.org/10.1145/1247069.1247103

Thursday, September 20, 2007

A curvature estimation for pen input segmentation in sketch-based modeling

by

Dae Hyun Kim and Myoung-Jun Kim

Summary

The paper presents a new curvature estimation method that uses local convexity and monotonicity. The paper notes that good curvature estimation is needed for proper stroke segmentation. The authors note two classes of approaches in previous work: scale space decomposition (Sezgin) and estimating curvature directly from the input data. Their contribution is a new means of determining the region of support. They discuss the sliding window method, and provide two algorithms: one which performs curvature estimation with local shape information, and another which performs curvature estimation with local convexity. They were able to achieve a success rate of 95%. They mention comparing the four algorithms, but never give success rates.
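Here is a hedged Python sketch of the general idea, not the authors' exact method: grow a region of support around a point while the turning keeps the same sign (local convexity), then report total angle change per unit arc length over that region. It assumes at least four points and an index i away from the stroke's ends.

    import math

    def turn(p0, p1, p2):
        # signed cross product: > 0 left turn, < 0 right turn
        return (p1[0]-p0[0]) * (p2[1]-p1[1]) - (p1[1]-p0[1]) * (p2[0]-p1[0])

    def curvature(points, i):
        lo = hi = i
        sign = math.copysign(1, turn(points[i-1], points[i], points[i+1]))
        while lo > 1 and math.copysign(1, turn(points[lo-2], points[lo-1], points[lo])) == sign:
            lo -= 1                      # extend left while convexity holds
        while hi < len(points) - 2 and math.copysign(1, turn(points[hi], points[hi+1], points[hi+2])) == sign:
            hi += 1                      # extend right while convexity holds
        a0 = math.atan2(points[lo][1] - points[lo-1][1], points[lo][0] - points[lo-1][0])
        a1 = math.atan2(points[hi+1][1] - points[hi][1], points[hi+1][0] - points[hi][0])
        d = (a1 - a0 + math.pi) % (2 * math.pi) - math.pi   # wrapped angle change
        length = sum(math.dist(points[k], points[k+1]) for k in range(lo-1, hi+1))
        return abs(d) / length           # angle change per unit arc length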

Discussion

I had difficulty reading this paper; the organization left a lot to be desired. I have no idea why they discussed an evaluation test and didn't feel the need to share the results. This is one of those concepts I feel I will understand better once I start to implement it. The change in curvature (convexity) makes sense as a means of recognizing segmentation points.

Citation

Kim, D. H., and Kim, M.-J. 2006. A curvature estimation for pen input segmentation in sketch-based modeling. Computer-Aided Design 38, 3, 238--248.

Wednesday, September 12, 2007

A Domain-Independent System for Sketch Recognition

by

Bo Yu and Shijie Cai

Summary

The paper notes that the two main deficiencies of existing sketch recognition systems are their strict restrictions on the manner in which the user sketches and their inability to analyze hybrid or smooth curves and decompose them into primitive shapes. The authors list six attributes of a practical sketch recognition system: users can draw naturally, recognition results are consistent, it understands hierarchical relations among graphic objects, it should try to guess what the user intended to draw, drawing modification should be possible, and it should be easy to integrate into other systems.

The authors' system has a two stage process: the first stage is imprecise stroke approximation; the second is post-processing. The system also has three levels of hierarchical output; the lowest is raw data (the original stroke points), the middle is the syntactic level (vertices and primitive shapes), and the highest is the semantic level (recognition of primitive shapes and basic objects). Vertex detection is combined with primitive shape approximation in their system. A stroke is taken and compared to a primitive shape. If the primitive shape is not a match, the stroke is divided at the highest point on the curvature curve. The sections are then compared to primitives, and the process repeats. Their system makes no use of the stroke's speed data.

A stroke can be recognized as a line if its direction curve is roughly horizontal and its points can be fitted to an actual line. In circle approximation, when the stroke is put onto a direction graph it tends to form a line with slope (2*pi)/n. An arc produces a similar line covering less direction change than a circle; as the slope decreases the line becomes flatter, approaching the horizontal slope of a line segment. Self-intersecting strokes pose a problem. Their system addresses overlapping by merging the strokes. A sketch of the direction-graph test is below.
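The direction-graph test is simple to reproduce in Python. In the sketch below, the angle unwrapping and least-squares slope fit are standard; the thresholds are invented stand-ins, not the paper's values.

    import math

    def direction_graph(points):
        dirs, prev = [], None
        for (x1, y1), (x2, y2) in zip(points, points[1:]):
            a = math.atan2(y2 - y1, x2 - x1)
            if prev is not None:         # unwrap so the graph stays continuous
                while a - prev > math.pi:  a -= 2 * math.pi
                while prev - a > math.pi:  a += 2 * math.pi
            dirs.append(a); prev = a
        return dirs

    def fitted_slope(ys):
        n = len(ys)
        xm, ym = (n - 1) / 2, sum(ys) / n
        return (sum((i - xm) * (y - ym) for i, y in enumerate(ys))
                / sum((i - xm) ** 2 for i in range(n)))

    def classify(points):
        d = direction_graph(points)
        if abs(fitted_slope(d)) * len(d) < 0.5:       # nearly flat: a line
            return "line"
        # a full circle turns ~2*pi; anything clearly short of that is an arc
        return "circle" if abs(d[-1] - d[0]) > 1.8 * math.pi else "arc"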

The authors have created a "module" that can recognize basic objects. This module sounds similar to LADDER in that it describes using more complex shapes using simple shapes. Editing strokes are briefly discussed, no real mention of how to switch between modes is given.

The overall recognition rate on primitive shapes and polylines was near 98%. Arcs alone were at 94%, and hybrid shapes with smooth connections between lines and arcs had a recognition rate of 70%.

Discussion

The authors note "another noticeable feature of our system is the convient and efficient user interface for modifying recognition result." I am still unsure how the system differentiates drawn strokes versus strokes used for editing. How does one switch into modification state?

The idea of being able to recognize primitives, and thus breaking strokes down until it becomes a set of primitives seems very intuitive. Also the notion of the module for recognition being built for specific domains based on low-level descriptions of objects seems familiar.

Decent paper, the 70% doesn't bode well for us though. In many of these papers the recognition rates seem a bit inflated, so for them to openly admit 70% makes me conservative about our ability to recognize certain shapes.

Citation

Yu, B. and Cai, S. 2003. A domain-independent system for sketch recognition. In Proceedings of the 1st international Conference on Computer Graphics and interactive Techniques in Australasia and South East Asia (Melbourne, Australia, February 11 - 14, 2003). GRAPHITE '03. ACM Press, New York, NY, 141-146. DOI= http://doi.acm.org/10.1145/604471.604499

Tuesday, September 11, 2007

Sketch Based Interfaces: Early Processing for Sketch Understanding

by

Tevfik Metin Sezgin, Thomas Stahovich, Randall Davis

Summary

The approach taken in this paper is a shift from the past few we have read (Rubine, Long). Sezgin et al. believe that a sketch based system should respond to how an object looks, not how it was drawn. The goal of these sketch systems is to allow people to work with computational tools in the same manner they work with people (messy, informal), but without losing the nice computational gains. The authors state that the first step in accomplishing this goal is to be able to convert from pixels to geometric objects. This early processing must be able to find corners and fit lines and curves.

Their early processing approach has three phases: approximation, beautification and basic recognition. To accomplish stroke approximation, vertices must first be detected. Their scheme uses both stroke direction and speed data for the detection, and the extrema are compared against an average-based threshold to remove noise. This works on the principle that pen speed drops when going around a corner (see the sketch below). On certain drawings, though, pen speed alone will not find all corners, so the intersection of the vertex sets found by each method is used as the starting fit. The paper goes into further detail on how a hybrid fit is calculated and how curves are handled. Beautification, as described in the paper, is the minor adjustment of lines to ensure that the intent of the drawing holds. An example of intent could be parallelism, which is difficult to draw freehand but not hard to recognize, so strokes can be corrected for it. Recognition is accomplished with hand-tailored templates that examine these properties.
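The speed half of the vertex detection is easy to sketch in Python. Below, corners are speed minima that also fall under a fraction of the mean speed; the 0.9 fraction is illustrative, and intersecting these candidates with the curvature-based ones (not shown) would complete the hybrid fit. Timestamps are assumed strictly increasing.

    import math

    def speeds(points, times):
        return [math.dist(p, q) / (t2 - t1)
                for p, q, t1, t2 in zip(points, points[1:], times, times[1:])]

    def speed_corners(points, times, frac=0.9):
        v = speeds(points, times)
        mean = sum(v) / len(v)
        corners = []
        for i in range(1, len(v) - 1):
            # a local minimum of speed that is also slow in absolute terms
            if v[i] < v[i-1] and v[i] < v[i+1] and v[i] < frac * mean:
                corners.append(i + 1)    # point index near the slow segment
        return [0] + corners + [len(points) - 1]   # endpoints are always vertices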

A user study was performed comparing this system to drawing with XFig. Participants were almost unanimous in selecting the sketching system as the preferred choice. Once again it is pointed out that this system puts no limitations on how the user wishes to draw the object: it is for the computer to understand the user, not for the user to sketch in a way the computer understands.

Discussion

"Just drink the Kool-Aid already." - Brian David Eoff Approx. 17 Minutes Ago

Does it seem bizarre, this idea that the interactions between a user and a computer could be "messy"? Interaction with a computer seems to be about removing ambiguities, and if there was a problem it was caused by the user, right? Maybe it doesn't have to be rigid? Maybe we can have the best of both worlds: the ability to take all the good capabilities computation provides, coupled with natural human interaction in all its messy, imperfect, yet nuanced glory. Really think about this: the GUI was huge compared to the command line, but that was ~30 years ago. Aren't we due for something different? Even if it isn't always better (and for some things I still go to the command line), shouldn't there be something else? The next. If not, what have we done with all this computational power and connectivity we've been chasing?

Well now onto the paper.
The paper makes an interesting observation: if users draw slowly they are most likely being precise, and the system should respect that precision when recognizing the drawing. Also, the authors make note of their handling of overtracing, a natural movement that happens when one draws, which gesture recognition could not possibly handle. This is a movement toward looking at what the user drew, and using how they drew it to clear up ambiguities, versus depending solely on the physical act of the user drawing the object. Now that tablets can record tilt metrics on pen input, could tilt be used along with speed and curvature to discover corners? The same with pressure. Were these values not available when this system was created, or were they deemed unnecessary?

Citation

Sezgin, T. M., Stahovich, T., and Davis, R. 2001. Sketch based interfaces: early processing for sketch understanding. In Proceedings of the 2001 Workshop on Perceptive User interfaces (Orlando, Florida, November 15 - 16, 2001). PUI '01, vol. 15. ACM Press, New York, NY, 1-8. DOI= http://doi.acm.org/10.1145/971478.971487

Tuesday, September 4, 2007

MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned From a Single Example Using a Dual-Classifier

by

Brandon Paulson and Tracy Hammond

Summary

The authors propose an algorithm that can recognize unconstrained symbols. Other systems put constraints on the sketches - they must be single stroke, oriented in the same way, or drawn at the same scale. The authors propose a system in which images can be retrieved using sketches the users have connected to the images. There are three goals associated with the algorithm: recognize unconstrained sketch symbols, recognize symbols from only a single training example, and learn from successful queries to improve accuracy over time. The application created to demonstrate this algorithm is MARQS, Media Albums Retrieved by a Query Sketch. The system is able to use the user's search queries over time to train itself.

The system uses a linear classifier with a set of global features when enough examples are available. The system normalizes the orientation of each drawing by putting an axis through the two farthest points. The system uses four global features: a bounding box feature, pixel density, average curvature and the number of perceived corners. When only a single sketch is available, its features are compared directly to those of the user's query sketch: the error is the sum of the differences of the feature values, and the sketch with the lowest error is chosen. A sketch of this single-example path is below.
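In the Python sketch below, the feature formulas are plausible stand-ins rather than the paper's exact definitions, and the 0.7-radian corner threshold is invented; the structure (four global features per sketch, templates ranked by summed feature differences) follows the summary above.

    import math

    def global_features(points):
        xs, ys = [p[0] for p in points], [p[1] for p in points]
        w, h = (max(xs) - min(xs)) or 1, (max(ys) - min(ys)) or 1
        angles = [math.atan2(y2 - y1, x2 - x1)
                  for (x1, y1), (x2, y2) in zip(points, points[1:])]
        turns = [abs(b - a) for a, b in zip(angles, angles[1:])]
        return [w / h,                               # bounding box aspect ratio
                len(points) / (w * h),               # pixel density
                sum(turns) / max(len(turns), 1),     # average curvature
                sum(t > 0.7 for t in turns)]         # perceived corners (crude)

    def rank_templates(query, library):
        # library: {name: point list}; returns the best match first
        qf = global_features(query)
        def error(item):
            f = global_features(item[1])
            return sum(abs(a - b) / (abs(a) + abs(b) + 1e-9)
                       for a, b in zip(qf, f))
        return sorted(library.items(), key=error)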

The single-example classifier was used 27% of the time, the linear classifier 73%. The average search rank was 1.7 for the single classifier, 1.44 for the linear classifier, and 1.51 for the hybrid system.

The paper notes that stroke length and speed are not good features when symbols are drawn in multiple strokes.

Discussion

The system, while not requiring the user to train it in the usual sense, does require the user to create a sketch for each item that they want to search for (whole albums, or single images). Users are often hesitant, or lazy, about creating metadata to go with their images. Also, the responsibility is on the user to create a sketch that they can mentally connect to the image (not an easy task). It could be argued, though, that this is a more personal system than trying to compare a sketch to the actual image.

I was surprised that Brandon and Dr. Hammond were able to get such performance out of a small number of features. It would seem there would come a tradeoff between how much performance could improve and the computation time needed to calculate additional features.

Citation

Monday, September 3, 2007

Visual Similarity of Pen Gestures

by

A. Chris Long, Jr. , James A. Landay, Lawrence A. Rowe and Joseph Michiels

Summary

The paper is a precursor to the prior paper and goes into more detail about the experiments to determine why users find gestures similar. The researchers hope to roll these findings into a tool (quill) that is capable of notifying designers of gestures that are likely to be perceived as similar by users, that will be difficult for users to learn and remember, or that may be misrecognized by a computer. Similarity can affect how easily users can learn and recall gestures.

A brief overview of pen based interfaces is given. The authors point out that the handwriting recognition on the Newton was initially criticized (it recognized Cyrillic great, by the way). Also noted is the power of gestures in that they can identify the operator and the operand simultaneously. The paper continues with an overview of perceptual similarity, focusing on the psychology and on multi-dimensional scaling.

The researchers ran two separate experiments. The goals of the first experiment were to determine what measurable geometric properties influence similarity and to produce a model that could predict how similar two gestures would appear to a user. The similarity of two gestures is determined by the Euclidean distance between their feature sets. Long et al. propose additional predictors beyond Rubine's features, of which they list only 11. The paper notes that the participants clumped into two separate groups, but gives no information about what distinguishes the two. The second study focused on how varying different types of features affected perceived similarity.

Discussion

Why don't they list features 12 and 13 of Rubine? Those are the features that take time data into consideration, and none of their additional features consider time. Also, Long et al. make note of the fact that users seem to fall into two separate groups, yet they give no additional information about those groups (i.e., the criteria for placement in said groups). I wonder how additional physical metrics of input (pressure of the pen, tilt of the pen) could be used in creating additional Rubine features?

Citation

Long, A. C., Landay, J. A., Rowe, L. A., and Michiels, J. 2000. Visual similarity of pen gestures. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (The Hague, The Netherlands, April 01 - 06, 2000). CHI '00. ACM Press, New York, NY, 360-367. DOI= http://doi.acm.org/10.1145/332040.332458

"Those Look Similar!" Issues in Automating Gesture Design Advice

by

A. Chris Long, Jr. , James A. Landay and Lawrence A. Rowe

Summary

This paper presents quill, an application that aids interface designers in determining if gestures are similar. The tool is also capable of advising designers on how to make their gestures less similar, covering not only gestures that users would view as similar but also gestures that a computer might confuse.

The researchers conducted three user studies to determine a model of gesture similarity. In the first two, 42 participants chose the most different gesture from a group of three; in the third study, 266 participants judged the similarity of pairs of gestures on an absolute scale.

The goal of quill is to allow designers with no advance knowledge of gesture recognition to create gestures that a computer can recognize, but that users won't confuse as too similar. The quill system learns gestures using the Rubine algorithm and a training set provided by the user. Quill analyzes the gestures in the background and provides feedback if a gesture is too similar to other gesture classes. The paper goes on to discuss concerns about when to give feedback and how much feedback to provide. The paper also discusses implementation concerns, particularly how to lock the user's actions so analysis can be completed.

The researchers note that their models tended to overestimate gesture similarity when gestures were, or contained, a letter. The authors also propose that future work could include automatically repairing similar gestures by morphing them into a new form.

Discussion

Just thinking about the fact that letters were deemed to be similar raises an interesting thought. We view letters as dissimilar because of the metaphor attached to the shape (a sound), but the shapes of letters do have a lot of similarity ("u" and "v", "m" and "n"). Many times when looking at people's handwritten notes, the only way I could determine an individual letter was from the context of the word. The question becomes: how similar do we want gestures to be so that users can mentally group them as being in a category (all editing strokes), versus making them dissimilar enough that they can be recognized and drawn easily? For example, the cut/paste gestures need to convey similarity (both are editing gestures), but at the same time be inherently opposite. This seems like a difficult thing to accomplish. Sure, quill can tell if gestures are similar, but I don't think it can help in making a metaphorical connection between gestures and the actions they are supposed to produce.

Citation

Long, A. C., Landay, J. A., and Rowe, L. A. 2001. "Those look similar!" issues in automating gesture design advice. In Proceedings of the 2001 Workshop on Perceptive User interfaces (Orlando, Florida, November 15 - 16, 2001). PUI '01, vol. 15. ACM Press, New York, NY, 1-5. DOI= http://doi.acm.org/10.1145/971478.971510

Thursday, August 30, 2007

Specifying Gestures By Example

by Dean Rubine


Summary

The paper proposes a system for creating a gesture recognizer automatically from example gestures. This system is incorporated into GRANDMA (Gesture Recognizers Automated in a Novel Direct Manipulation Architecture), which allows the gestures to be used in a direct manipulation environment. The paper describes the interactions needed to control a drawing application, GDP. Of note is that GDP is not sketch recognition: the user utilizes a gesture to create a shape's starting point and then finishes the creation of the shape by dragging (rubberbanding). The system is single stroke only; this limitation avoids the segmentation issue, and single strokes coincide with a single tensing-and-relaxing muscle movement. The author notes that 15 examples per gesture class is adequate. The author also describes how gesture handlers are created for each gesture, flaunting the object-oriented nature of the project. Strokes are recorded by taking in x and y positions along with a timestamp. A set of 13 features is calculated for each stroke; the features are chosen so that a small change in the gesture corresponds to a small change in the feature value. The per-class sample estimates are the averages of the features over a class's examples, and classification is done with a linear discriminant over the features (sketched below). At around 50 training examples the performance gain plateaus; as the number of possible classes increases the recognition rate drops, but not below 90%. Two extensions to the system are proposed: eager recognition and multi-finger recognition.
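A compact Python sketch of the classification side: each class c has a weight vector and bias, and a gesture with feature vector f goes to the class maximizing the linear score. Rubine derives the weights from a pooled covariance matrix; the identity-covariance shortcut below (which reduces to nearest-mean) is my simplification, kept only to show the shape of the method.

    def mean(vectors):
        n = len(vectors)
        return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

    def train(examples):
        # examples: {class_name: [feature_vector, ...]}. With identity
        # covariance the discriminant is w_c = mean_c, w_c0 = -|mean_c|^2 / 2.
        weights = {}
        for cls, vecs in examples.items():
            m = mean(vecs)
            weights[cls] = (-0.5 * sum(x * x for x in m), m)
        return weights

    def classify(weights, f):
        def score(cls):
            w0, w = weights[cls]
            return w0 + sum(wi * fi for wi, fi in zip(w, f))
        return max(weights, key=score)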

Discussion

The paper makes mention of multi-finger recognition, which surprised me a little. Such devices have become popular recently, with Jeff Han's TED presentation and the popularity of the iPhone, but this paper was written in 1991. Sixteen years is quite a gap for an idea incubated in research to begin appearing in consumer products.

The algorithm is quite nice in that it doesn't prevent new features from being added, and the feature values appear to be weighted. As I read this paper I kept thinking about Palm's Graffiti language and the similarity in design. It did seem odd that the delete 'x' gesture uses the gesture's starting point, rather than the object the gesture overlaps, to determine what to delete.

Citation

Rubine, D. 1991. Specifying gestures by example. In Proceedings of the 18th Annual Conference on Computer Graphics and interactive Techniques SIGGRAPH '91. ACM Press, New York, NY, 329-337. DOI= http://doi.acm.org/10.1145/122718.122753

Wednesday, August 29, 2007

Sketchpad: A Man-Machine Graphical Communication System

by Ivan Sutherland



Summary

This paper is about Sutherland's graphical interaction system; it should be noted that the paper was published in 1963. The paper opens with a great quote: "Heretofore, most interaction between man and computers has been slowed down by the need to reduce all communications to written statements that can be typed; in the past, we have been writing letters to rather than conferring with computers." The paper then gives an example of drawing a hexagon: the drawing is accomplished with a light pen and a variety of buttons and knobs. The light pen is used to draw the line, by way of rubberbanding, and the other inputs are used to constrain the drawing. Throughout the paper Sutherland makes many references to what resemble object-oriented programming concepts (such as "instance" and inheritance). He also notes that not only is the drawing stored, but also information about its creation (the constraints). Changing a basic shape ripples through the document if instances of that shape have been used. The paper notes four broad areas in which such a system is useful: storing and updating drawings, gaining scientific or engineering understanding of operations that can be described graphically, topological input for circuit simulators, and highly repetitive drawings. The paper describes the ring structure, light pen, and display technology used in the system. Also discussed are the abstraction mechanisms and the recursive merging and deleting of structures. Of note is the discussion of how constraints can be applied to already-created drawings. The paper lists patterns, linkages, dimension lines, bridges, artistic drawings, and electrical circuit diagrams as example domains.
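
The master/instance rippling can be made concrete with a toy sketch. This is purely my analogy in Python; Sketchpad actually stored everything in ring structures, and nothing below resembles its implementation.

```python
# Toy illustration of master/instance rippling: instances read their
# geometry from the master at draw time, so editing the master
# changes every instance (hypothetical classes, my own naming).

class Master:
    def __init__(self, points):
        self.points = list(points)      # defining geometry
        self.instances = []

    def instantiate(self, dx, dy):
        inst = Instance(self, dx, dy)
        self.instances.append(inst)
        return inst

    def edit(self, points):
        self.points = list(points)      # every instance now reflects this

class Instance:
    def __init__(self, master, dx, dy):
        self.master, self.dx, self.dy = master, dx, dy

    def render(self):
        # Geometry is pulled from the master, never copied.
        return [(x + self.dx, y + self.dy) for x, y in self.master.points]
```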


Discussion

Sutherland could foresee the issue I would have when he put the sentence "It is only worthwhile to make drawings on the computer if you can get something more out of the drawing than just a drawing" into the conclusion. This is one of those papers that I always get a weird feeling about. Sure, it was 1963, and it was an incredible breakthrough. He did amazing things. My concern is that, for one reason or another, these ideas did not catch on. This wasn't some nameless researcher; Sutherland is widely respected. He was at MIT; he had funding. Is it because the possible uses are limited? Have we become so used to keyboard/mouse interaction that what was once considered a more natural input technique (the pen) is no longer natural?


Citation

Sutherland, I. E. 1964. Sketch pad a man-machine graphical communication system. In Proceedings of the SHARE Design Automation Workshop DAC '64. ACM Press, New York, NY, 6.329-6.346. DOI= http://doi.acm.org/10.1145/800265.810742

Introduction to Sketch Recognition

by Tracy Hammond and Kenneth Mock

Summary

The paper discusses pen-based computing and sketch recognition with particular attention paid to educational environments. Background is given on the creation of the field, specifically Ivan Sutherland's early contributions. Tablet PCs are described, as well as the distinction between passive digitizers (which respond to contact) and active digitizers (which require a special stylus). Also described are different pen input devices (tablets, UMPCs, drawing tablets) and the software that can be used with them. The paper then focuses on this type of input in educational settings, discussing how lectures can be annotated, edited, and prepared using pen input.

The paper then moves into a discussion of sketch recognition and lists domains - art, music, chemistry, mathematics, finite-state machines, UML, COA - in which such recognition makes sense. The paper also gives a brief overview of FLUID (LADDER and GUILD) and comments on how sketches can be tied to higher-level functionality. The authors describe how shape definitions are created in LADDER using primitive shapes and constraints. Also noted are two challenges with pen input: 1) recognition systems being able to handle varied input, and 2) administrative concerns. The authors then discuss two case studies of tablet use in classrooms. The paper concludes by noting the increase in Tablet PC adoption in educational markets.
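
For flavor, here is a hedged analogue of what a LADDER shape definition conceptually contains, expressed as a plain Python structure. LADDER has its own description language, so this is illustrative only; the shape, component names, and constraint names are my own.

```python
# Illustrative analogue of a LADDER-style shape definition:
# a shape is a set of primitive components plus geometric
# constraints among them (names here are hypothetical).

arrow = {
    "name": "Arrow",
    "components": {
        "shaft": "Line",
        "head1": "Line",
        "head2": "Line",
    },
    "constraints": [
        ("coincident", "shaft.p2", "head1.p1"),
        ("coincident", "shaft.p2", "head2.p1"),
        ("shorter", "head1", "shaft"),
        ("shorter", "head2", "shaft"),
    ],
}
```

The appeal of this style is that a recognizer can test a candidate sketch against the constraint list rather than requiring training examples for every new shape.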

Discussion

Both of the case studies take place in math classes; it would have been better to have each case study focus on a different subject. The paper gives a good overview of current pen input devices, those devices' roles in an educational environment, and what is happening in sketch recognition. The work was a bit fluffy, but the authors openly admit that.

Something that concerns me about sketch recognition is that it seems heavily domain specific. I have difficulty moving beyond the "that's neat" feeling into viewing it as something that can bring about a fundamental shift in the way people interact with computers.

Play this game: how would you want to interact with a computer if anything were possible? Paper has some great benefits, but far too many of them, I feel, are physical rather than qualities of the interface. By physical I mean the lightness, the contrast. With pen input I lose some of those nice physical qualities, but I gain the nice digital ones (the ability to edit, to reproduce, and to collaborate over a large distance).

Introduction



Name: Brian David Eoff

Year: PhD 3rd

Email: firstname.lastname@gmail.com

Academic Interest: User Interface, HCI, IR, Semantic Web, Personal Information Management, Ubiquitous Computing, Information Discovery

Relevant Experience: Hammond's HCI class, Choe's AI class, Worked with LADDER and GUILD before.

Why Are You Taking This Course: To explore an alternate way for people to interact with computers.

What do you hope to gain?: To be exposed to new ideas and paradigms of interaction, to stimulate thought about how a user interacts with a machine, and to try to make that process more natural.

What do you think you will be doing in 5 years? Finished with my Ph.D., or having fled academia and started a small company with friends.

What do you think you will be doing in 10 years? Living in Seward, Alaska. In a compound. Preparing a small revolutionary army.

What are your non-academic interests? Reading, Basketball, Swimming, Sleeping.

Tell a fun story about yourself: As a young boy I really wanted to be a soldier (GI Joe hooked me hard). For some reason I associated the word "Vietnam" with being a soldier. At a dinner party one of my parent's friends asked me what I wanted to be when I grew up. I responded I wanted to be Vietnam. They laughed. And I never became a soldier.