The Million Dollar Programming Prize

Posted on May 22, 2009  Comments (3)

The Million Dollar Programming Prize

One of the main areas of collaborative filtering we exploited is the nearest-neighbor approach. A movie’s “neighbors” in this context are other movies that tend to be scored most similarly when rated by the same viewer. For example, consider Saving Private Ryan (1998), a war movie directed by Steven Spielberg and starring Tom Hanks. Its neighbors may include other war movies, movies directed by Spielberg, or movies starring Tom Hanks. To predict a particular viewer’s rating, we would look for the nearest neighbors to Saving Private Ryan that the viewer had already seen and rated. For some viewers, it may be easy to find a full allotment of close neighbors; for many others, we may discover only a handful of neighboring movies.

A second area of collaborative-filtering research we pursued involves what are known as latent-factor models. These score both a given movie and a given viewer according to a set of factors, themselves inferred from patterns in the ratings given to all the movies by all the viewers [see illustration, “The Latent-Factor Approach“]. Factors for movies may measure comedy versus drama, action versus romance, and orientation to children versus orientation to adults. Because the factors are determined automatically by algorithms, they may correspond to hard-to-describe concepts such as quirkiness, or they may not be interpretable by humans at all.

The model may use 20 to 40 such factors to locate each movie and viewer in a multidimensional space. It then predicts a viewer’s rating of a movie according to the movie’s score on the dimensions that person cares about most. We can put these judgments in quantitative terms by taking the dot (or scalar) product of the locations of the viewer and the movie.

We found that most nearest-neighbor techniques work best on 50 or fewer neighbors, which means these methods can’t exploit all the information a viewer’s ratings may contain. Latent-factor models have the opposite weakness: They are bad at detecting strong associations among a few closely related films, such as The Lord of the Rings trilogy (2001–2003).

Because these two methods are complementary, we combined them, using many versions of each in what machine-learning experts call an ensemble approach. This allowed us to build systems that were simple and therefore easy to code and fast to run.

Interesting article. See some other posts on challenge prizes.

Read: posts on programingProblems Programming MathProgrammers (comic)

3 Responses to “The Million Dollar Programming Prize”

  1. Anonymous
    May 22nd, 2009 @ 11:51 am

    So to be clear, this type of filtering based of accumulated data suggests there will no longer be a need for movie reviews as the outcome of a movie (Pre-release) will already have a rating attached to it? If this is indeed the method than we should not be seeing any more horrible movies making it to the theatre. Movies such as “Dumb and Dumberer” the sequel which i had walked out on.

  2. A.Razvan
    May 26th, 2009 @ 8:26 am

    I want to say i love your articles , and your website design also. Good luck in the future. With respect A.Razvan.

  3. Anonymous
    May 28th, 2009 @ 6:31 am

    this is very kool article one must try this , i have a cute little doggy i’ll go for this ASAP

Leave a Reply