Thursday, December 6, 2007

Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes

by

Jacob O. Wobbrock, Andrew D. Wilson, Yang Li

Summary

The paper begins by stating that gesture recognition has mostly been the domain of AI and pattern-matching experts. Wobbrock et al. believe that the techniques used are too difficult for HCI experts to incorporate into their designs, and thus gesture recognition isn't getting proper attention from the HCI community due to the high learning curve. Undergraduate HCI students often do not have the experience to implement traditional recognizers. In this paper, Wobbrock et al. describe a simple recognizer that is easy to implement and accurate, and to boot they provide the complete algorithm. The recognizer was designed around explicit goals concerning resilience to variations in movement speed and sampling, and ease of creation.

The $1 recognizer is a template-based recognizer: for each symbol to be recognized, at least one sample template must be provided. The first step is to resample the gesture so that it has 64 equidistantly spaced points. Next the indicative angle (the angle from the gesture's centroid to its first point) is found, and the gesture is rotated so that this angle is zero. After rotation the gesture is scaled to a reference square and translated to a reference point. The distance from each point in the candidate gesture to its corresponding point in the template gesture is then computed, and the best match is the template with the minimal average path distance, which is converted into a score between 0 and 1. The indicative angle, though, is not necessarily the optimal angle from which to make the comparison, so Wobbrock et al. fine-tune the rotation of the candidate so that its path distance to the template is minimized. A hill-climbing approach was tried first, and it worked well when the candidate and template were similar (7.2 rotations needed on average), but the number of rotations needed for dissimilar gestures was too high, so Golden Section Search was used in place of the hill climber.
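The whole pipeline is short enough to reconstruct from that description. Below is a minimal Python sketch of the preprocessing steps, assuming the paper's 64-point resampling; the reference square size and all helper names are illustrative choices rather than anything mandated by the algorithm.

```python
import math

N = 64               # resampled point count, per the paper
SQUARE_SIZE = 250.0  # reference square size; an illustrative choice

def path_length(points):
    return sum(math.dist(points[i - 1], points[i]) for i in range(1, len(points)))

def resample(points, n=N):
    """Resample a stroke into n equidistantly spaced points (assumes a non-degenerate stroke)."""
    interval = path_length(points) / (n - 1)
    pts = list(points)
    new_points = [pts[0]]
    acc = 0.0
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= interval and d > 0:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            new_points.append(q)
            pts.insert(i, q)  # the interpolated point starts the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(new_points) < n:  # floating-point rounding can leave us one short
        new_points.append(pts[-1])
    return new_points[:n]

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def rotate_by(points, angle):
    cx, cy = centroid(points)
    c, s = math.cos(angle), math.sin(angle)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in points]

def rotate_to_zero(points):
    """Rotate so the indicative angle (centroid to first point) becomes zero."""
    cx, cy = centroid(points)
    return rotate_by(points, -math.atan2(points[0][1] - cy, points[0][0] - cx))

def scale_to_square(points, size=SQUARE_SIZE):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    w = (max(xs) - min(xs)) or 1e-6  # guard against degenerate 1-D gestures
    h = (max(ys) - min(ys)) or 1e-6
    return [(x * size / w, y * size / h) for x, y in points]

def translate_to_origin(points):
    cx, cy = centroid(points)
    return [(x - cx, y - cy) for x, y in points]

def preprocess(points):
    return translate_to_origin(scale_to_square(rotate_to_zero(resample(points))))
```

Matching then rotates the candidate within a bounded range to find its best angular alignment with each template before taking the minimal path distance. Again a sketch: the ±45° search range and 2° stopping threshold are the values reported in the paper, and the final score normalization follows its formula.

```python
PHI = 0.5 * (math.sqrt(5) - 1)  # golden ratio, used by Golden Section Search

def path_distance(a, b):
    """Average point-to-point distance between two preprocessed gestures."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def distance_at_angle(candidate, template, angle):
    return path_distance(rotate_by(candidate, angle), template)

def distance_at_best_angle(candidate, template,
                           a=-math.radians(45), b=math.radians(45),
                           threshold=math.radians(2)):
    """Golden Section Search for the rotation minimizing path distance."""
    x1 = PHI * a + (1 - PHI) * b
    f1 = distance_at_angle(candidate, template, x1)
    x2 = (1 - PHI) * a + PHI * b
    f2 = distance_at_angle(candidate, template, x2)
    while abs(b - a) > threshold:
        if f1 < f2:   # minimum lies in [a, x2]; reuse x1 as the new x2
            b, x2, f2 = x2, x1, f1
            x1 = PHI * a + (1 - PHI) * b
            f1 = distance_at_angle(candidate, template, x1)
        else:         # minimum lies in [x1, b]; reuse x2 as the new x1
            a, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * a + PHI * b
            f2 = distance_at_angle(candidate, template, x2)
    return min(f1, f2)

def recognize(points, templates):
    """templates: dict of name -> preprocessed points (use a list per name for multiple samples)."""
    candidate = preprocess(points)
    best_name, best_dist = None, float("inf")
    for name, tmpl in templates.items():
        d = distance_at_best_angle(candidate, tmpl)
        if d < best_dist:
            best_name, best_dist = name, d
    half_diagonal = 0.5 * math.sqrt(2 * SQUARE_SIZE ** 2)
    return best_name, 1 - best_dist / half_diagonal  # score in [0, 1]
```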

The $1 recognizer does have some shortcomings. It cannot distinguish gestures whose identity depends on orientation, aspect ratio, or location, since those properties are normalized away. Gestures must be drawn in the style of the template, or multiple templates for each gesture must be provided.

4800 gestures were collected from ten subjects across sixteen gesture types, with each gesture drawn at three different speeds. The $1 recognizer had a 0.98% error rate, and as the number of templates for each symbol increased, the number of errors decreased. The authors note that drawing speed has an impact on errors. The $1 recognizer is faster than Rubine's recognizer and significantly faster than dynamic time warping (DTW). Overall they were able to achieve 99% accuracy, and 97% accuracy with a single template.

Discussion

Why the concern over the three different speeds? I do believe this would have been a good first algorithm to implement, if for no other reason than that there are no unknown thresholds to bother with. Nothing was worse than trying to "find" the thresholds that got certain authors their amazing recognition rates, and constantly wondering why they weren't in the paper.

It is a fine paper; I am just not sure why it was in UIST. It seems more like a paper for CS education than anything else. No real new technique is proposed.

Citation

Wobbrock, J. O., Wilson, A. D., and Li, Y. 2007. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology (Newport, Rhode Island, USA, October 07-10, 2007). UIST '07. ACM, New York, NY, 159-168. DOI= http://doi.acm.org/10.1145/1294211.1294238

3 comments:

Grandmaster Mash said...

The paper's in UIST because they marketed it for quick use in a gesture system. That was actually a great motivation for the paper, since people at UIST are probably more interested in getting a quick gesture system into a prototype than in how accurate or powerful the gesture system actually is.

Paul Taele said...

Thanks to Aaron's comment, part of the mystery about the paper's acceptance was unraveled. As I posted in Josh J.'s blog, I was a bit surprised at the lack of contribution this paper makes to sketch recognition, especially for students learning it. Maybe, as you implied, this paper is a good SR system for people who want to hack one together in a short time. As a serious system, I don't know how it can be improved beyond its simplistic model.

no said...

They are not really faster than DTW. It is simply that they did not use state-of-the-art DTW tricks. They say “On average, DTW took a whopping 128.26 minutes to run the 14,400 tests for a given subject’s 160 gestures made at a given speed,” but using a simple lower-bounding trick (www.cs.ucr.edu/~eamonn/LB_Keogh.htm) reduces this to less than 20 seconds. DTW is amortized linear for large datasets. For a workshop paper (“Everything You Know About Dynamic Time Warping Is Wrong”) we did eight billion DTW calculations.


Eamonn Keogh
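For readers curious about the lower-bounding trick mentioned above, here is a rough one-dimensional sketch of the LB_Keogh idea. This is a generic illustration, not code from either paper; a 2-D gesture would apply it per coordinate, and the function name and window parameter are illustrative.

```python
def lb_keogh(query, candidate, r):
    """O(n*r) lower bound on the DTW distance with warping window r.

    Builds the query's upper/lower envelope over a window of +/- r points
    and sums how far the candidate escapes that envelope. If this cheap
    bound already exceeds the best distance found so far, the full DTW
    computation for this pair can be skipped.
    """
    total = 0.0
    n = min(len(query), len(candidate))
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        upper, lower = max(query[lo:hi]), min(query[lo:hi])
        c = candidate[i]
        if c > upper:
            total += (c - upper) ** 2
        elif c < lower:
            total += (lower - c) ** 2
    return total ** 0.5

# In a nearest-neighbor search, compute lb_keogh for each template first and
# run full DTW only when the bound beats the best-so-far distance; this
# pruning is where the claimed speedup comes from.
```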