Sunday, November 25, 2007

Ink Features for Diagram Recognition

by

Patel, R., Plimmer, B., Grundy, J., Ihaka, R.

Summary

The paper begins by noting that there are three approaches to recognition: bottom-up, top-down, or a combination of both. Ink features - measured aspects of an ink stroke - are used by a variety of gesture recognition algorithms (Rubine, Long). Despite their frequent use, no one has studied the effectiveness of the individual features. Patel et al. conducted an experiment to find the ink features that best distinguish text strokes from shape strokes, for use in a divider application (a divider determines whether a given stroke is text or a shape). They experimented with 46 features across seven categories: size, time, intersections, curvature, pressure, operating system recognition values, and inter-stroke gaps. The goal was to find the most significant features for dividing text from shapes.
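
To make the feature categories concrete, here is a rough sketch of how a few of them might be computed from a stroke. The definitions below are my own illustrative stand-ins, not the paper's exact formulas (the paper defines its 46 features precisely):

```python
import math

def ink_features(stroke, prev_stroke_end=None):
    """Illustrative ink features for a stroke given as (x, y, t) tuples."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    ts = [p[2] for p in stroke]

    # Size category: bounding-box width and height.
    width, height = max(xs) - min(xs), max(ys) - min(ys)

    # Time category: total drawing time of the stroke.
    duration = ts[-1] - ts[0]

    # Curvature category: total absolute turning angle along the stroke.
    total_turn = 0.0
    for i in range(1, len(stroke) - 1):
        a1 = math.atan2(ys[i] - ys[i - 1], xs[i] - xs[i - 1])
        a2 = math.atan2(ys[i + 1] - ys[i], xs[i + 1] - xs[i])
        total_turn += abs(math.atan2(math.sin(a2 - a1), math.cos(a2 - a1)))

    # Inter-stroke gap category: distance from the previous stroke's endpoint.
    gap = None
    if prev_stroke_end is not None:
        gap = math.hypot(xs[0] - prev_stroke_end[0], ys[0] - prev_stroke_end[1])

    return {"width": width, "height": height, "duration": duration,
            "curviness": total_turn, "interstroke_gap": gap}
```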

Twenty-six people participated in the study; nine sketches were collected from each participant, and the 46 features were calculated for each sketch. The authors took a decision tree (statistical partitioning) approach to determine the most useful features; the feature at the root of the decision tree is the most discriminative. Eight features were identified as significant. Testing on the training data, their eight-feature approach misclassified 10.8% of shape strokes and 8.8% of text strokes; on held-out test data, it misclassified 42.1% of shapes and 21.4% of text. The inter-stroke gap is the most important feature in determining the split, followed by the size of the stroke. Pressure, intersections, and time were not useful for separating the strokes. The authors suggest an HMM technique could provide more flexibility than their statistical partitioning approach.
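
The partitioning itself can be sketched with an off-the-shelf decision tree. This is not the authors' code or data, just a hypothetical illustration of reading the most discriminative feature off the root split; the feature names and the random data are made up:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

feature_names = ["interstroke_gap", "width", "height", "duration", "curviness"]
X = np.random.rand(200, len(feature_names))  # stand-in for real stroke features
y = np.random.randint(0, 2, 200)             # 0 = shape, 1 = text (stand-in labels)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# The feature chosen at the root node is the single most discriminative split;
# feature_importances_ ranks the rest.
print("root split:", feature_names[tree.tree_.feature[0]])
for name, imp in zip(feature_names, tree.feature_importances_):
    print(f"{name}: {imp:.3f}")
```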

Discussion

The paper was interesting because finally, in one place, there is a listing of a large set of ink features. I know of Rubine's initial set and Long's follow-up additions, but they don't come close to the 46 given in this paper. The accuracy statistics were disappointing.

Citation

Patel, R., Plimmer, B., Grundy, J., & Ihaka, R. (2007). Ink features for diagram recognition. In Fourth Eurographics Conference on Sketch Based Interfaces and Modeling, Riverside, California, August 2-3, 2007.

Wednesday, November 7, 2007

Speech and Sketching: An Empirical Study of Multimodal Interaction

by

Aaron Adler and Randal Davis

Summary

Oltmans attempted to use speech as a way to overcome the ambiguities of sketching, but the system he proposed was limited: it was very domain-specific, and communication only went one way. Adler and Davis hope to create a whiteboard system that incorporates both sketch and speech to aid in early design work. The system should be able to engage the user in natural dialog.

A user study was performed in an almost Wizard-of-Oz setup. Each participant was given a variety of sketching goals to accomplish. The experimenter had an identical tablet and engaged the user in dialog while the user was attempting to accomplish those goals. Participants were able to change the color of their strokes, and the authors noted that strokes fall into one of four categories: creation, modification, selection, or writing. The authors made three observations about the participants' speech. First, participants were disfluent and often repeated words or phrases. Second, when prompted with a question from the experimenter, they often responded with words that were used in the original question. Third, the speech utterances were related to what the user was drawing at the moment. Other observations were that users would often list objects and then sketch them in that order, that they wrote out words they used in their speech, and that participants often paused their speech in order to finish the drawing they were describing.

When prompted with a question from the experimenter, users often gave much more elaborate answers than were necessary, and in giving these responses they would often spot errors or ambiguities. Participants also made comments not related to the sketch itself but to the domain. The paper goes on to give details about the connection between the timing of the sketch and word/phrase groupings, noting that speech phrases preceded the corresponding sketching.
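
One way to picture that timing observation: pair each stroke with the most recent speech phrase that started before it. The data layout and the pairing rule here are my own assumptions, not the authors' representation:

```python
def align(phrases, strokes):
    """phrases: list of (start_time, text); strokes: list of (start_time, id).
    Both sorted by start time. Pairs each stroke with the last phrase that
    began before it, reflecting the speech-precedes-sketch finding."""
    pairs = []
    i = 0
    for s_time, s_id in strokes:
        # Advance to the most recent phrase starting at or before this stroke.
        while i + 1 < len(phrases) and phrases[i + 1][0] <= s_time:
            i += 1
        if phrases and phrases[i][0] <= s_time:
            pairs.append((s_id, phrases[i][1]))
    return pairs

# Hypothetical example: prints [('stroke1', 'draw a wheel'), ('stroke2', 'then the axle')]
print(align([(0.0, "draw a wheel"), (4.2, "then the axle")],
            [(1.5, "stroke1"), (5.0, "stroke2")]))
```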

Discussion

There were a lot of observations presented that could be incorporated into a system; this paper seems to be the groundwork for an implementation. I question whether having the experimenter in the room caused the participants to interact more: would they have been as forthcoming if it was only them and a machine? I liked the use of color as a way to give context about the drawing. There could be additional layers to provide context that wouldn't necessarily need to show up on the sketch. I'm interested to see where they take this.

Citation

Adler, A., & Davis, R. (2007). Speech and sketching: An empirical study of multimodal interaction. In Fourth Eurographics Conference on Sketch Based Interfaces and Modeling, Riverside, California, August 2-3, 2007.

Monday, November 5, 2007

Three Main Concerns in Sketch Recognition and an Approach to Addressing Them

by

James V. Mahoney, Markus P. J. Fromherz

Summary

The paper begins by noting that sketch recognition technology must meet three requirements: it must be able to cope with ambiguity, it must have interactive performance, and it must be extensible. To demonstrate their approach to meeting these requirements, Mahoney and Fromherz took on the task of labeling stick figure drawings (e.g., labeling the arms and legs). They identify three sources of ambiguity: sloppy drawing, articulation, and interaction with a background context. Sloppiness causes problems with segmentation, which makes matching to a model difficult. The authors propose a variety of preprocessing techniques to get around this (proximity linking, virtual junction splitting, spurious segment jumping; a sketch of the first follows). They further deal with the issues involved in subgraph matching and model acquisition.
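
As a concrete illustration of one of those preprocessing steps, here is a minimal sketch of proximity linking: join segments whose endpoints nearly touch, so sloppy, almost-connected strokes form one figure. The threshold and data layout are my assumptions, not the authors' parameters:

```python
import math

def endpoints_touch(a, b, threshold):
    """True if any endpoint of segment a is within threshold of one of b's."""
    return any(math.hypot(p[0] - q[0], p[1] - q[1]) <= threshold
               for p in a for q in b)

def proximity_link(segments, threshold=5.0):
    """segments: list of ((x1, y1), (x2, y2)). Returns index pairs to link."""
    return [(i, j)
            for i in range(len(segments))
            for j in range(i + 1, len(segments))
            if endpoints_touch(segments[i], segments[j], threshold)]

# A nearly-touching pair gets linked into one figure: prints [(0, 1)]
print(proximity_link([((0, 0), (10, 0)), ((12, 1), (20, 5))]))
```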

Discussion

I was expecting something much more from the title, and was a bit disappointed. It all seemed too focused on stick figures.

Citation

Mahoney, J. V., & Fromherz, M. P. J. (2002). Three main concerns in sketch recognition and an approach to addressing them. In AAAI Spring Symposium on Sketch Understanding, pp. 105-112, March 25-27, 2002.