Tuesday, September 4, 2007

MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned From a Single Example Using a Dual-Classifier

by

Brandon Paulson and Tracy Hammond

Summary

The authors propose an algorithm that can recognize unconstrained symbols. Other systems put constraints on the sketches: they must be drawn in a single stroke, in the same orientation, or at the same scale. The authors demonstrate the algorithm in a system where images can be retrieved using sketches the user has previously associated with them. The algorithm has three goals: recognize unconstrained sketch symbols, recognize symbols from only a single training example, and learn from successful queries to improve accuracy over time. The application created to demonstrate this algorithm is MARQS (Media Albums Retrieved by a Query Sketch), which uses the user's search queries over time to train itself.
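A minimal Python sketch of how that learn-over-time loop could work: each confirmed query is stored as an additional example for the matched item, so later searches have more training data. The class and method names here are my own placeholders, not the paper's.

```python
class SketchLibrary:
    """Hypothetical store mapping each media item to its example sketches."""

    def __init__(self):
        self.examples = {}  # item label -> list of example sketches

    def register(self, label, sketch):
        # The sketch the user draws when first linking an item to a query symbol.
        self.examples.setdefault(label, []).append(sketch)

    def confirm_result(self, label, query_sketch):
        # When the user confirms a search result, the query itself becomes
        # another training example, so accuracy can improve over time.
        self.examples.setdefault(label, []).append(query_sketch)
```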

The system uses a linear classifier over a set of global features when enough examples are available. Each drawing's orientation is first normalized by finding the axis through its two farthest-apart points. Four global features are then computed: bounding box aspect ratio, pixel density, average curvature, and the number of perceived corners. When only a single example sketch is available, its features are compared directly to those of the user's query sketch; the error is the difference between feature values, and the stored sketch with the lowest error is chosen.
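A rough Python sketch of these features and the single-example match, assuming sketches are plain lists of (x, y) points. The exact formulas here (approximating pixel density by ink length over bounding-box area, and counting corners with a fixed direction-change threshold) are my own stand-ins, not the paper's definitions.

```python
import math

def normalize_orientation(points):
    # Rotate so the line through the two farthest-apart points lies horizontal.
    best = (0.0, points[0], points[0])
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d > best[0]:
                best = (d, points[i], points[j])
    _, p, q = best
    angle = math.atan2(q[1] - p[1], q[0] - p[0])
    c, s = math.cos(-angle), math.sin(-angle)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def global_features(points):
    points = normalize_orientation(points)
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    w, h = max(xs) - min(xs) or 1e-6, max(ys) - min(ys) or 1e-6
    aspect = w / h  # bounding-box aspect ratio
    # Crude pixel density stand-in: total ink length over bounding-box area.
    length = sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    density = length / (w * h)
    # Average absolute direction change between successive segments.
    angles = [math.atan2(points[i + 1][1] - points[i][1], points[i + 1][0] - points[i][0])
              for i in range(len(points) - 1)]
    turns = [abs(angles[i + 1] - angles[i]) for i in range(len(angles) - 1)]
    curvature = sum(turns) / len(turns) if turns else 0.0
    corners = sum(1 for t in turns if t > 1.0)  # stand-in perceived-corner count
    return [aspect, density, curvature, corners]

def single_example_rank(query, examples):
    # Rank stored sketches by summed feature error against the query sketch.
    qf = global_features(query)
    def error(ex):
        return sum(abs(a - b) for a, b in zip(qf, global_features(ex)))
    return sorted(range(len(examples)), key=lambda i: error(examples[i]))
```

This only covers the single-example case; in the paper, a linear classifier takes over once an item has accumulated enough examples.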

In testing, the single-example classifier was used for 27% of searches and the linear classifier for 73%. The average search rank was 1.7 for the single-example classifier, 1.44 for the linear classifier, and 1.51 for the hybrid system overall.

The paper notes that stroke length and speed are not good features when symbols are drawn in multiple strokes.

Discussion

While the system does not require explicit training, it does require the user to create a sketch for each item they want to search for (whole albums or single images). Users are often hesitant, or simply lazy, about creating metadata to go with their images. The responsibility is also on the user to draw a sketch that they can later mentally connect to the image, which is not an easy task. It could be argued, though, that this makes the system more personal than one that compares a sketch to the actual image.

I was surprised that Brandon and Dr. Hammond were able to get such good performance out of so few features. There would presumably be a tradeoff between how much accuracy additional features could add and the computation time needed to calculate them.

Citation

5 comments:

rg said...

I have to agree about forcing the user to generate metadata. It is a universally failed model. Digital solutions fit best where that process can be streamlined or automated.

Test said...

I also wonder about performance issues, as recalculating features each time can cause a lot of slowdown.

- D said...

Like iTunes using the CD database to retrieve metadata for me when I import a song (legally, of course ;). It seems like a better answer to this particular implementation problem (finding images via a sketch search) would be to match the sketch to the image itself, as you stated. Or, with an album of music, perhaps match a sketch to the album cover art or something.

Grandmaster Mash said...

In the media player application, the issue of creating sketches to query is significant. In a notebook it wouldn't be as much of a problem, but then again users would have to remember what they initially drew.

Paul Taele said...

Like Josh, I made a similar connection between MARQS and iTunes when I read the paper. From my own personal experience, I also have doubts about how successful some people would be at sketching images to retrieve pictures in MARQS, given that those same people are too lazy to create tags for their images. An iPod-and-music analog of what MARQS does with pictures would be a user humming the chorus of a song to retrieve it. It's a neat idea on paper. Of course, Brandon already mentioned the user response to using the algorithm on a collection of pictures during his presentation, so I guess I'm pretty much beating a dead horse...

On your other discussion comment: even though Dr. Hammond did bring up in class the computational-time downsides of MARQS when the system has a lot of gestures and has to resort to the single classifier, I was also surprised at the general-case performance given so few features. There has to be at least some sort of workaround to bypass the downsides Dr. Hammond mentioned and accommodate those worst cases. It would be a waste to abandon the algorithm's strengths, such as requiring no training, over such issues.