Wednesday, September 12, 2007

A Domain-Independent System for Sketch Recognition

by

Bo Yu and Shijie Cai

Summary

The paper notes that the two main deficiencies of existing sketch recognition systems are their strict restrictions on how the user sketches and their inability to analyze hybrid or smooth curves and decompose them into primitive shapes. The authors list six attributes of a practical sketch recognition system: it lets users draw naturally, produces consistent recognition results, understands hierarchical relations among graphic objects, tries to guess what the user intended to draw, allows drawings to be modified, and integrates easily into other systems.

The authors' system has a two-stage process: the first stage is imprecise stroke approximation, and the second is post-processing. The system also has three levels of hierarchical output: the lowest is raw data (the original stroke points), the middle is the syntactic level (vertices and primitive shapes), and the highest is the semantic level (recognition of primitive shapes and basic objects). Vertex detection is combined with primitive shape approximation in their system. A stroke is taken and compared to a primitive shape. If the primitive shape is not a match, the stroke is divided at the point of highest curvature, that is, the peak of the curvature curve. The resulting sections are then compared to primitives, and the process repeats. Their system makes no use of the speed of the stroke data.
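The split-and-retry loop described above can be sketched roughly as follows. This is my own minimal illustration, not the authors' code: the only "primitive" tested is a straight line (the hypothetical fits_line helper), the curvature estimate is just the turning angle at each point, and the tolerance value is an arbitrary assumption.

```python
import math

def fits_line(pts, tol=1.0):
    """Stand-in primitive test (assumption): every point lies within
    tol of the chord joining the first and last points."""
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return len(pts) <= 2
    return all(
        abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length <= tol
        for x, y in pts
    )

def turning_angle(pts, i):
    """Crude curvature estimate: absolute change of heading at point i."""
    a = math.atan2(pts[i][1] - pts[i - 1][1], pts[i][0] - pts[i - 1][0])
    b = math.atan2(pts[i + 1][1] - pts[i][1], pts[i + 1][0] - pts[i][0])
    # Wrap the difference into [-pi, pi) before taking the magnitude.
    return abs((b - a + math.pi) % (2 * math.pi) - math.pi)

def segment(pts, tol=1.0):
    """Recursively split a stroke until every piece matches the primitive."""
    if len(pts) < 3 or fits_line(pts, tol):
        return [pts]
    # Divide at the interior point with the highest curvature estimate.
    split = max(range(1, len(pts) - 1), key=lambda i: turning_angle(pts, i))
    # Both halves share the split point, which becomes a vertex.
    return segment(pts[:split + 1], tol) + segment(pts[split:], tol)

# An L-shaped stroke: right along the x-axis, then straight up.
stroke = [(x, 0) for x in range(6)] + [(5, y) for y in range(1, 6)]
pieces = segment(stroke)
print(len(pieces), pieces[0][-1])
```

On the L-shaped stroke this yields two pieces that meet at the corner. A fuller version would try arcs, circles, and other primitives before deciding to split.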

A stroke can be recognized as a line if its direction curve is a horizontal line and, more simply, if its points can be fitted to an actual line. In circle approximation, a stroke plotted on a direction graph tends to form a line with slope (2*pi)/n, where n is presumably the number of points in the stroke. An arc should give a line with a smaller slope than a circle's. As the slope decreases the line becomes flatter, approaching the horizontal direction curve of a line segment. Self-intersecting strokes pose a problem. Their system addresses overlapping by merging the strokes.
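To make the direction-curve idea concrete, here is a rough sketch of how I picture it. It is not the paper's implementation: the function names and both thresholds (flat, full_turn) are assumptions of mine, and the total turning is approximated as the fitted slope times the number of direction samples.

```python
import math

def direction_curve(pts):
    """Heading (radians) of each consecutive point pair, unwrapped so the
    curve stays continuous across the +/-pi boundary."""
    dirs = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        ang = math.atan2(y1 - y0, x1 - x0)
        if dirs:
            ang += 2 * math.pi * round((dirs[-1] - ang) / (2 * math.pi))
        dirs.append(ang)
    return dirs

def fit_slope(values):
    """Least-squares slope of values plotted against their index."""
    n = len(values)
    mean_i = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((i - mean_i) * (v - mean_v) for i, v in enumerate(values))
    den = sum((i - mean_i) ** 2 for i in range(n))
    return num / den

def classify(pts, flat=0.02, full_turn=0.9):
    """Label a stroke from the slope of its direction curve.
    flat and full_turn are illustrative thresholds, not from the paper."""
    dirs = direction_curve(pts)
    if len(dirs) < 2:
        return "line"
    slope = fit_slope(dirs)
    if abs(slope) < flat:
        return "line"            # horizontal direction curve
    total_turn = abs(slope) * len(dirs)
    if total_turn >= full_turn * 2 * math.pi:
        return "circle"          # slope close to 2*pi/n
    return "arc"                 # same shape of curve, shallower slope

n = 40
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n + 1)]
half = circle[: n // 2 + 1]
straight = [(k, 2 * k) for k in range(10)]
print(classify(circle), classify(half), classify(straight))
```

For the sampled circle the fitted slope works out to 2*pi/n, matching the slope mentioned above. The sketch only looks at the slope; it does not check how well the direction curve actually fits a line, which a real recognizer would need to do.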

The authors have created a "module" that can recognize basic objects. This module sounds similar to LADDER in that it describes more complex shapes in terms of simple shapes. Editing strokes are briefly discussed, but there is no real mention of how to switch between modes.

The overall recognition rate of primitive shapes and polylines was near 98%. The rate for arcs alone was 94%, and hybrid shapes with smooth connections between lines and arcs had a recognition rate of 70%.

Discussion

The authors note "another noticeable feature of our system is the convenient and efficient user interface for modifying recognition result." I am still unsure how the system differentiates drawn strokes from strokes used for editing. How does one switch into the modification state?

The idea of recognizing primitives, and thus breaking strokes down until they become a set of primitives, seems very intuitive. The notion of building the recognition module for specific domains from low-level descriptions of objects also seems familiar.

Decent paper, though the 70% doesn't bode well for us. In many of these papers the recognition rates seem a bit inflated, so for them to openly admit 70% makes me conservative about our ability to recognize certain shapes.

Citation

Yu, B. and Cai, S. 2003. A domain-independent system for sketch recognition. In Proceedings of the 1st international Conference on Computer Graphics and interactive Techniques in Australasia and South East Asia (Melbourne, Australia, February 11 - 14, 2003). GRAPHITE '03. ACM Press, New York, NY, 141-146. DOI= http://doi.acm.org/10.1145/604471.604499

2 comments:

rg said...

It isn't clear how you switch from sketch mode to command mode, but it is clear that they use different modes for that input. I'm not sure that this is the best UI decision. It also isn't clear that drawing an arrow at a particular orientation is faster or better in any way than Ctrl-Z or Edit -> Undo.

- D said...

You know, I'm starting to think that 70% is an AWESOMELY GREAT target to shoot for. Especially when I'm looking at about 10% accuracy in my current implementation (too many corners and line segments!). :( Of course, we don't know how well Sezgin himself would do on our dataset, which may be insanely hard.

I think you enter command mode /after/ you draw, at least that's how I envisioned it.

I always get a weird feeling when the papers talk about allowing the user to correct their algorithm's errors. Would this take more time than simply drawing it with CAD-like tools to begin with?