Sunday, November 25, 2007

Ink Features for Diagram Recognition

by

Patel, R., Plimmer, B., Grundy, J., Ihaka, R.

Summary

The paper begins by noting that there are three approaches to recognition: bottom-up, top-down, or a combination of both. Ink features, which are measured aspects of an ink stroke, are used by a variety of gesture recognition algorithms (Rubine, Long). Despite their frequent use, no one has studied the effectiveness of the various features. Patel et al. conducted an experiment to find the ink features that distinguish text strokes from shape strokes, for use in a divider application. A divider here is a component that determines whether a stroke is text or a shape. They experimented with 46 features in seven categories: size, time, intersections, curvature, pressure, operating system recognition values, and inter-stroke gaps. The goal was to find the most significant features for dividing text from shape.
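To make "ink feature" concrete, here is a minimal sketch of how a few features from the size, curvature, and inter-stroke gap categories might be computed. It assumes a stroke is a list of (x, y, t) samples; the function names and exact definitions are my own illustration and are not taken from the paper.

```python
import math

# Assumption: a stroke is a list of (x, y, t) tuples, t being a timestamp in ms.

def bounding_box(stroke):
    """Width and height of the stroke's axis-aligned bounding box."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    return max(xs) - min(xs), max(ys) - min(ys)

def path_length(stroke):
    """Sum of the distances between consecutive sample points."""
    return sum(math.dist(a[:2], b[:2]) for a, b in zip(stroke, stroke[1:]))

def total_turning(stroke):
    """Sum of absolute direction changes (radians), a simple curvature measure."""
    total = 0.0
    for a, b, c in zip(stroke, stroke[1:], stroke[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        d = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi
        total += abs(d)
    return total

def time_gap(prev_stroke, stroke):
    """Time between the end of the previous stroke and the start of this one."""
    return stroke[0][2] - prev_stroke[-1][2]

def ink_features(stroke, prev_stroke=None):
    """Collect the hypothetical features above into one dictionary."""
    width, height = bounding_box(stroke)
    return {
        "width": width,
        "height": height,
        "length": path_length(stroke),
        "turning": total_turning(stroke),
        "time_gap": time_gap(prev_stroke, stroke) if prev_stroke else None,
    }
```

A divider would compute a vector like this for every stroke and then decide text versus shape from the values.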

Twenty-six people participated in the study; nine sketches were taken from each user, and 46 features were calculated on each sketch. The authors took a decision tree (statistical partitioning) approach to determine the most useful features. The feature at the root of the decision tree is the one that best splits the data. Eight features were identified as significant. On the training data, their eight-feature approach misclassified 10.8% of shapes and 8.8% of text. On the testing data, it misclassified 42.1% of shapes and 21.4% of text. Inter-stroke gap is the most important feature in determining the split, followed by the size of the shape. Pressure, intersections, and time were not useful for separating the strokes. The authors suggest using an HMM technique to provide more flexibility than their statistical partitioning approach.
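The idea that the root of a decision tree holds the most useful feature can be sketched as a search for the single best split. The code below is an illustration only: it scores candidate splits with Gini impurity, which is my assumption and not necessarily the criterion the authors used, and the feature names and values are made-up toy data.

```python
def gini(labels):
    """Gini impurity of a list of 'text' / 'shape' labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(1 for label in labels if label == "text") / n
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (feature, threshold, impurity) for the best single split.

    rows is a list of dicts mapping feature name to a numeric value.
    Candidate thresholds are midpoints between adjacent distinct values.
    """
    best = None
    n = len(rows)
    for feature in rows[0]:
        values = sorted({row[feature] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
            right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
            # Weighted impurity of the two partitions; lower is better.
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best

# Toy data (invented for illustration): short gaps for text, long gaps for shapes.
rows = [
    {"gap_ms": 100, "width": 40},
    {"gap_ms": 200, "width": 10},
    {"gap_ms": 900, "width": 35},
    {"gap_ms": 1100, "width": 12},
]
labels = ["text", "text", "shape", "shape"]
root_feature, root_threshold, root_impurity = best_split(rows, labels)
```

On this toy data the gap feature separates the two classes cleanly while width does not, so the gap would sit at the root. A full tree would repeat the search inside each partition.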

Discussion

The paper was interesting because it finally lists a large set of ink features in one place. I know of Rubine's initial set and Long's follow-up additions, but they don't come close to the 46 given in this paper. The accuracy statistics were disappointing.

Citation

1 comment:

Grandmaster Mash said...

The accuracy is disappointing in that it isn't nearly up to consumer standards, but the improvement in accuracy is immense. At first glance it looks as if the accuracy on text drops compared to Microsoft's, but then you notice that Microsoft's just classifies everything as text, so it never gets text wrong.

Oh, and it's comment time if you haven't noticed. Expect to get a lot of these suckers tonight.