Thursday, December 6, 2007

Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes

by

Jacob O. Wobbrock, Andrew D. Wilson, Yang Li

Summary

The paper begins by stating that gesture recognition has mostly been the domain of AI and pattern matching experts. Wobbrock et al. believe that the techniques used are too difficult for HCI experts to incorporate into their designs, and thus gesture recognition isn't getting proper attention from the HCI community because of the steep learning curve. Undergraduate HCI students often do not have the experience to implement traditional recognizers. In this paper Wobbrock et al. describe a simple recognizer that is easy to implement and accurate, and to boot they provide the complete algorithm. The recognizer is designed around a set of goals dealing with resilience and ease of creation.

The $1 recognizer is a template-based recognizer. For each symbol to be recognized, at least one sample must be provided. The first step is to resample the gesture so that it has 64 equidistantly spaced points. Next, the indicative angle is found for the purposes of rotation. After rotation the gesture is scaled to a reference square and translated to a reference point. The distance from each point in the candidate gesture to its corresponding point in the template gesture is then calculated. The best match is the template with the minimal distance (which is normalized to between 0 and 1). The indicative angle, though, is not necessarily the optimal angle for making the comparison, so Wobbrock et al. fine-tune the angle of the candidate so that its path distance to the template is minimized. A hill-climbing approach was tried first, and it needed few rotations when the candidate and template were similar (7.2 on average). Unfortunately, the number of rotations needed for dissimilar gestures was too high, so Golden Section Search was used in place of the hill climber.
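To make the steps above concrete, here is a rough Python sketch of the pipeline as I read it from the summary. It is not the authors' published pseudocode. The 250-unit reference square, the search bounds of plus or minus 45 degrees, the 2 degree stopping tolerance, the score formula, and every function name are my assumptions for illustration.

```python
import math

N = 64        # points per resampled gesture, as in the summary
SIZE = 250.0  # side of the reference square (assumed value)
PHI = 0.5 * (math.sqrt(5.0) - 1.0)  # golden ratio fraction, about 0.618

def resample(pts, n=N):
    """Return n points spaced equally along the drawn path."""
    pts = list(pts)
    total = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    interval, acc, out, i = total / (n - 1), 0.0, [pts[0]], 1
    while i < len(pts):
        (x0, y0), (x1, y1) = pts[i - 1], pts[i]
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            out.append(q)
            pts.insert(i, q)  # q starts the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:       # rounding can drop the final point
        out.append(pts[-1])
    return out[:n]

def centroid(pts):
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

def rotate_by(pts, theta):
    """Rotate pts around their centroid by theta radians."""
    cx, cy = centroid(pts)
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c - (y - cy) * s + cx, (x - cx) * s + (y - cy) * c + cy)
            for x, y in pts]

def normalize(pts):
    """Resample, rotate the indicative angle to zero, scale, and translate."""
    pts = resample(pts)
    cx, cy = centroid(pts)
    # Indicative angle: from the first point to the centroid.
    pts = rotate_by(pts, -math.atan2(cy - pts[0][1], cx - pts[0][0]))
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    sx = SIZE / w if w > 0 else 1.0  # guard against degenerate gestures
    sy = SIZE / h if h > 0 else 1.0
    pts = [(x * sx, y * sy) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]  # centroid moved to the origin

def path_distance(a, b):
    """Average distance between corresponding points."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def distance_at_best_angle(cand, tmpl, lo=math.radians(-45),
                           hi=math.radians(45), tol=math.radians(2)):
    """Golden Section Search over the rotation of cand (bounds are assumed)."""
    x1 = PHI * lo + (1 - PHI) * hi
    x2 = (1 - PHI) * lo + PHI * hi
    f1 = path_distance(rotate_by(cand, x1), tmpl)
    f2 = path_distance(rotate_by(cand, x2), tmpl)
    while hi - lo > tol:
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = PHI * lo + (1 - PHI) * hi
            f1 = path_distance(rotate_by(cand, x1), tmpl)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * lo + PHI * hi
            f2 = path_distance(rotate_by(cand, x2), tmpl)
    return min(f1, f2)

def recognize(points, templates):
    """templates: list of (name, normalized points). Returns (name, score)."""
    cand = normalize(points)
    dist, name = min((distance_at_best_angle(cand, t), n) for n, t in templates)
    half_diagonal = 0.5 * math.sqrt(2.0) * SIZE
    return name, 1.0 - dist / half_diagonal
```

Templates are run through normalize() once up front, for example templates = [("L", normalize(l_points))], and each candidate is then handed to recognize(). Note that the rotation step is exactly why the recognizer cannot tell apart gestures that differ only in orientation.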

The $1 recognizer does have some shortcomings. It cannot distinguish gestures whose identity depends on orientation. Gestures must be drawn in the style of the template, or multiple templates for each gesture must be provided.

4800 gestures were collected from ten subjects. There were sixteen gesture types, and each gesture was drawn by the user at three different speeds. The $1 recognizer had a 0.98% error rate. As the number of templates for each symbol increased, the number of errors decreased. The authors note that speed has an impact on errors. The $1 recognizer is faster than Rubine and significantly faster than DTW. They were able to achieve 99% accuracy, and 97% accuracy with a single template.

Discussion

Why the concern over the three different speeds? I do believe this would have been a good first algorithm to implement if for no other reason than there are no unknown thresholds to bother with. Nothing was worse than trying to "find" the thresholds that got certain authors their amazing recognition rates, and constantly wondering why they weren't in the paper.

It is a fine paper; I am just not sure why it was in UIST. It seems more of a paper for CS education than anything else. No real new technique is proposed.

Citation

Wobbrock, J. O., Wilson, A. D., and Li, Y. 2007. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In Proceedings of the 20th Annual ACM Symposium on User interface Software and Technology (Newport, Rhode Island, USA, October 07 - 10, 2007). UIST '07. ACM, New York, NY, 159-168. DOI= http://doi.acm.org/10.1145/1294211.1294238

Properties of Real World Digital Logic Diagrams

by

Christine Alvarado and Michael Lazzareschi

Summary

Pure sketch recognition is difficult. To get around this fact (which should be painfully obvious to us all by now), many designers put restrictions on how the user can draw. This becomes a balancing act between making the recognition easier and providing a natural drawing experience. These restrictions are often imposed to make the developer's life easier, with no attempt to understand natural drawing behavior.

The authors wanted to observe the natural drawing behavior of students. They focused primarily on three aspects: stroke order, stroke timing, and stroke number. In their study students were given a tablet to use for their digital design course and were instructed to use it in lieu of paper. Thirteen students participated; 98 diagrams were collected and each one was labeled by hand. The authors looked at whether the strokes of each symbol were temporally contiguous, and at how consistent each student was. 19% of all the symbols drawn by the students were done with non-consecutive strokes. These strokes were often corrections. Also noted was that how long the students paused between symbols overlaps greatly with how long they paused between strokes in the same symbol, thus the pause is not a clear indicator of when the user is starting a new symbol. The researchers observed that many students did not draw symbols consistently with the same number of strokes. However, non-consistent students were consistently non-consistent, and consistent students were consistently consistent. If someone can say the prior sentence five times real fast I'll give them a cookie. Few properties were consistent across all students.
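For anyone wanting to run the same kind of check on their own data, here is a small Python sketch of how the "non-consecutive strokes" statistic could be computed. It assumes every stroke has already been hand-labeled with the symbol it belongs to. The function names and label strings are made up for illustration and are not from the paper.

```python
def symbols_with_interrupted_strokes(labels):
    """labels: the symbol id of each stroke, in the order the strokes were drawn.

    Returns the set of symbols the user left and later came back to,
    i.e. symbols whose strokes were not drawn consecutively.
    """
    finished = set()     # symbols the user has already moved away from
    interrupted = set()
    previous = None
    for symbol in labels:
        if symbol != previous:
            if symbol in finished:
                interrupted.add(symbol)  # returned to an earlier symbol
            if previous is not None:
                finished.add(previous)
        previous = symbol
    return interrupted

def interrupted_fraction(labels):
    """Fraction of distinct symbols drawn with non-consecutive strokes."""
    symbols = set(labels)
    if not symbols:
        return 0.0
    return len(symbols_with_interrupted_strokes(labels)) / len(symbols)
```

With a stroke sequence such as ["and1", "and1", "wire1", "and1", "or1"], the third "and1" stroke comes after a wire stroke, so "and1" counts as interrupted.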

Discussion

Judging from the chart the stroke timing overlap is minimal; the pauses do not overlap greatly as the authors state. Is it perfect? No, but I think categorizing it as great overlap is off. Also the statement about focusing on digital logic diagrams because little work has been done in the domain seems strange to me. Almost every sketch recognition paper I have ever read uses this as an example. The fact that drawing mechanics seem to not be similar across users bodes well for my semester project.

Citation

Christine Alvarado and Michael Lazzareschi. Properties of Real World Digital Logic Diagrams. 1st International Workshop on Pen-based Learning Technologies (PLT) 2007.

What are Intelligence? And Why?

by

Randall Davis

Summary

Various fields have attempted to adopt AI as their own, and in doing so they impose their assumptions, models and metaphors on the field. Many times this doesn't mesh well, and well, that is how academic religious wars start. Davis begins by defining the four behaviors used to distinguish intelligent behavior, and also lists the five fields contributing to "reason". Davis then goes on to give a brief summary of each field's approach to AI (machine logic, intelligence as a biological function....). Davis then transitions to the why of intelligence, discussing how evolution got us here. Evolution is blind search; it isn't engineering. Davis also notes that "Biology provides no definition of a problem until it has been revealed by the advantage of a solution." In essence nature is a fairly lousy engineer, and it is a satisficer, not an optimizer. The paper points out that the human mind might not be the pinnacle of achievement, comparing it to a legacy application built up from add-ons. I think he is trying to compare it to Microsoft Word, I really do. He finishes this section by pondering the question: what if the human mind was written for another purpose and was adopted for its current usage only after the fact?

At this point the paper switches to intelligence in the animal community. Vervet monkeys are discussed and how they can use transitive inference to establish place in the hierarchy. The monkeys also have a rudimentary vocabulary (TIGER!!!), but they don't have language (Hey, remember that tiger, man, that thing was scary.).

Davis pontificates (overly wordy, but I got nothing else) that thinking could be a form of internal visualization. He thus gives two suggestions: 1) the notion of visual reasoning, reasoning with diagrams as things that we look at, whose visual nature is a central part of the representation, and 2) thinking is a form of reliving. Davis notes that sense, vision and language are not simply I/O, but part of the thought process itself. Perception can be useful for thinking. A nice quote is when Davis writes about "creating for ourselves a concrete mini-world where we carry out mental actions and then examine the results."

Davis concludes by stating that intelligence are many things (it pains me to not type "is"). AI should be the study of the design spaces of intelligences (note the plural). Thinking is visual, and diagrams are useful to reason with not just about.

Discussion

This is one of those papers that is not about understanding an algorithm. It is a decade old, but from a historical standpoint it is interesting. This isn't taking anything away from what he said; it is a good read, insightful, balanced and funny. The thing is, the more he talked about visual intelligence and the use of diagrams, the only thing I could think of was "He means sketch recognition." The whole "reasoning with diagrams" seems to be fundamental to what sketch recognition hopes to accomplish. The pen is the simplest tool for getting these designs into the system. It is more natural than using a pen-menu based system.

Citation

Advances in Mathematical Sketching: Moving Toward the Paradigm's Full Potential

by

Joe LaViola

Summary

The paper focuses on mathematical sketching, specifically on building an application that allows the sketching of expressions and the animation of drawn diagrams that correspond to those expressions. The system is MathPad2. The UI depends heavily on the user performing numerous actions by way of a lasso and tap. The expressions must be manually segmented by the user. Associations can be made between the mathematical equations and the diagrams. These can be implicit, such as when a variable in an expression also appears on a diagram, or explicit, where the user draws a line between two entities that should be associated. MathPad2 is capable of graphing functions, solving equations and evaluating expressions. The system uses Matlab as its backend.

To aid recognition, users must provide 10 to 20 samples of each symbol they will use for the system to train on. Key points in the stroke are used for recognition. Pairwise examination provided an improvement in recognition over past versions of the system. LaViola is able to achieve 90.8% parsing accuracy.

The need to rectify diagrams, reconciling what the user drew with what the math specifies, is also covered. Sketching cannot have the precision that math requires, so the angle, location and size of the various parts of the diagram are adjusted.

The author wishes this system to be used by students as a mathematical notebook of sorts, one in which the diagrams can be animated according to the expressions we compose.

Discussion

There seems to be a lot of dependency on the lasso as a means of interaction. This paper reminds me of the work Adler is doing. It is difficult to be precise when sketching, and in the engineering world precision is required for the computation produced from these sketches to be accurate. Adler is attempting to do this by using an additional mode of interaction (voice). LaViola is doing something similar by using the equations as an additional mode.

Citation

LaViola, J. "Advances in Mathematical Sketching: Moving Toward the Paradigm's Full Potential", IEEE Computer Graphics and Applications, 27(1):38-48, January/February 2007.

Sunday, November 25, 2007

Ink Features for Diagram Recognition

by

Patel, R., Plimmer, B., Grundy, J., Ihaka, R.

Summary

The paper begins by noting that there are three approaches to recognition: bottom-up, top-down, or a combination of both. Ink features - measured aspects of an ink stroke - are used by a variety of gesture recognition algorithms (Rubine, Long). Despite their frequent use no one has studied the effectiveness of the various features. Patel et al. conducted an experiment to find distinguishing ink features of text and shape strokes to be used in a divider application. Dividing here is the act of determining whether a stroke is text or a shape. They experimented with 46 features over seven categories: size, time, intersections, curvature, pressure, operating system recognition values, and inter-stroke gaps. The goal was to find the most significant features to divide text from shape.
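To give a feel for what an "ink feature" is, here is a small Python sketch that computes a handful of features in the spirit of the size, curvature, time and inter-stroke gap categories. These are my own illustrative definitions, not the paper's 46 features. The stroke format (a list of (x, y, t) samples) and all names are assumptions.

```python
import math

def stroke_features(stroke):
    """stroke: list of (x, y, t) samples in drawing order (assumed format)."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    # Size: ink length along the stroke.
    length = sum(math.hypot(b[0] - a[0], b[1] - a[1])
                 for a, b in zip(stroke, stroke[1:]))
    # Curvature: sum of absolute heading changes between successive segments.
    turning = 0.0
    for a, b, c in zip(stroke, stroke[1:], stroke[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        delta = math.atan2(math.sin(h2 - h1), math.cos(h2 - h1))  # wrap to [-pi, pi]
        turning += abs(delta)
    return {
        "bbox_diagonal": math.hypot(width, height),
        "path_length": length,
        "total_turning": turning,
        "duration": stroke[-1][2] - stroke[0][2],
    }

def inter_stroke_gap(previous, current):
    """Time and distance between the end of one stroke and the start of the next."""
    end, start = previous[-1], current[0]
    return {
        "time_gap": start[2] - end[2],
        "distance_gap": math.hypot(start[0] - end[0], start[1] - end[1]),
    }
```

Each stroke then becomes a row of numbers, which is the form a divider needs before any partitioning can be done.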

26 people participated in the study; nine sketches were taken from each user, and 46 features were calculated on each sketch. The authors took a decision tree (statistical partitioning) approach to determine the most useful features. The feature at the root of the decision tree is the most discriminating one. Eight features were identified as being significant. On the training data their eight-feature approach misclassified 10.8% of shapes and 8.8% of text. On the testing data 42.1% of shapes and 21.4% of text were misclassified. Inter-stroke gap is the most important feature in determining the split, followed by the size of the shape. Pressure, intersections, and time were not useful for the purposes of separating the strokes. The authors suggest using an HMM technique to provide more flexibility than their statistical partitioning approach.
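The idea of the root feature can be sketched in a few lines of Python. This is only an illustration of picking the single best split. It uses Gini impurity, which is my choice and may differ from whatever criterion the authors' partitioning software used. The feature names in the usage note are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (weighted impurity, threshold) for the best split of one feature."""
    pairs = sorted(zip(values, labels))
    best = (gini(labels), None)  # no split at all
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best[0]:
            best = (impurity, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best

def root_feature(rows, labels):
    """rows: list of dicts mapping feature name to value, one per stroke.

    Returns (feature name, threshold) for the split a tree would put at its root.
    """
    results = {name: best_threshold([row[name] for row in rows], labels)
               for name in rows[0]}
    name = min(results, key=lambda key: results[key][0])
    return name, results[name][1]
```

Given rows holding, say, a "time_gap" and a "pressure" value per stroke plus "text" / "shape" labels, root_feature() returns whichever feature separates the two classes most cleanly, along with the cut point. A full tree would repeat this on each side of the split.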

Discussion

The paper was interesting because finally in one place there is a listing of a large set of ink features. I know of Rubine's initial set, and Long's follow-up additions, but they don't come close to the 46 given in this paper. The accuracy statistics were disappointing.

Citation

Wednesday, November 7, 2007

Speech and Sketching: An Empirical Study of Multimodal Interaction

by

Aaron Adler and Randall Davis

Summary

Oltmans attempted to use speech as a way to overcome the ambiguities of sketching. The system proposed was limited though. It was very domain specific, and communication only went one way. Adler and Davis hope to create a whiteboard system that incorporates both sketch and speech to aid in early design work. The system should be able to engage the user in natural dialog.

A user study was performed in an almost Wizard-of-Oz setup. The participant was given a variety of sketching goals to accomplish. The experimenter had an identical tablet and engaged the participant in dialog while the participant was attempting to accomplish their goals. The participant was able to change the color of their strokes, and it was noted that strokes fall into one of four categories: creation, modification, selection or writing. The authors were able to make three observations about the participants' speech. First, it was disfluent, with words or phrases often repeated. Second, when prompted with a question from the experimenter, participants often responded with words that were used in the original question. Third, the speech utterances were related to what the user was drawing at the moment. Other observations were that users would often list objects and then sketch them in that order, that they wrote out words they used in their speech, and that participants often paused their speech so as to finish the drawing they were describing.

When prompted with a question from the experimenter the user often gave much more elaborate answers than were necessary. Often they would spot errors or ambiguities when giving these responses. Participants also made comments not related to the sketch, but in relation to the domain. The paper goes on to give details about the connection between the time of the sketch and word/phrase groupings, noting that speech phrases preceded sketching.

Discussion

There were a lot of observations presented that could be incorporated into a system. This paper seems to be the groundwork for an implementation. I question whether having the experimenter in the room caused the participants to interact more. Would the participants be so giving if it was only them and a machine? I liked the use of color as a way to give context about the drawing. There could be additional layers to provide context that wouldn't necessarily need to show up on the sketch. Interested to see where they take this.

Citation

A. Adler and R. Davis. Speech and sketching: An empirical study of multimodal interaction. In Fourth Eurographics Conference on Sketch Based Interfaces and Modeling, Riverside, California, August 2-3 2007.