The making of the NYT’s Netflix graphic
January 20th, 2010
One of The Times’ recent graphics, “A Peek Into Netflix Queues,” ended up being one of our more popular graphics of the past few months. (A good roundup of what people wrote is here.) Since then, there have been a few questions about how the graphic was made, and Tyson Evans, a friend and colleague, thought it might interest SND members. (I bother Tyson with questions about CSS and Ruby pretty regularly, so I owe him a few favors.)
Most readers are probably interested in the interactive graphic, although I will say that we also ran a lovely full-page graphic in print in the Metropolitan section, which goes out to readers in the New York region. That graphic had a lot of interesting statistical analysis (in fact, it would have been nice to get some of that analysis into the web version; more on that later), but here I will focus mostly on the web version. If there are questions about the print graphic, I will make sure I get Amanda Cox to try to explain cluster analysis to me again.
First is the data itself. Jo Craven McGinty, a CAR reporter, was in contact with Netflix to obtain a database of the top 50 movies in every ZIP code in the country. That’s about 1.9 million records. The database did not include the number of people renting each movie – just the rank. (We [more here: http://www.snd.org/2010/01/nyt-netflix-graphic ]