4 Feb 2011

Good Reads: High-Availability Storage Systems and Recommendation Algorithms

I have three papers today that are relevant to a project I'm working on. They're all under 10 pages and easy to read.

First is a white paper by Siddharth Anand, Netflix’s Transition to High-Availability Storage Systems. This paper makes many of the points about NoSQL that I've been trying to coalesce in my mind and communicate recently, but far more eloquently. I particularly enjoyed the two sections at the end, on best practices and on the challenges of SimpleDB. One really can't go wrong with clear, concise lists such as the ones he has written, and some of the items actually made me snicker out loud as I imagined the consternation they could cause in some DBAs I've known over the years.

While I've been very interested in the changes Netflix has been making in moving to the Amazon AWS cloud, I personally find their movie recommendation system frustrating and annoying, so the second paper I have is on the Amazon.com recommendation algorithm. This paper is older, written in 2003, but still very relevant today. Amazon uses item-to-item collaborative filtering, achieving scalability by pushing the expensive operations into off-line computations and thus simplifying the real-time recommendation look-up. The third paper builds on that idea: it was presented recently and examines the scalable video recommendation system that YouTube.com adopted about a year ago. YouTube computes recommendations off-line with a series of MapReduce computations on the user graph of signals, building up a recommendation store in BigTable for fast real-time retrieval.
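To make the off-line/real-time split concrete, here is a minimal sketch of item-to-item collaborative filtering in Python. It is my own toy illustration, not code from either paper: the function names, the sample customers, and the choice of cosine similarity over binary purchase vectors are all assumptions. The expensive pass over the purchase data runs once off-line, and serving a recommendation reduces to a table look-up.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_similar_items(purchases):
    """Off-line step: precompute an item-to-item similarity table.

    `purchases` maps a (hypothetical) customer id to the set of items bought.
    """
    buyers_per_item = defaultdict(int)  # item -> number of customers who bought it
    co_counts = defaultdict(int)        # (item_a, item_b) -> customers who bought both

    for items in purchases.values():
        for item in items:
            buyers_per_item[item] += 1
        for a, b in combinations(sorted(items), 2):
            co_counts[(a, b)] += 1

    table = defaultdict(dict)
    for (a, b), both in co_counts.items():
        # Cosine similarity between the two binary purchase vectors.
        score = both / sqrt(buyers_per_item[a] * buyers_per_item[b])
        table[a][b] = score
        table[b][a] = score
    return table


def recommend(table, owned, k=3):
    """Real-time step: look up and sum precomputed scores for owned items."""
    scores = defaultdict(float)
    for item in owned:
        for other, score in table.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


# Made-up sample data, purely for illustration.
purchases = {
    "alice": {"book_a", "book_b", "dvd_c"},
    "bob": {"book_a", "book_b"},
    "carol": {"book_b", "dvd_c"},
    "dave": {"book_a", "cd_d"},
}
similar_items = build_similar_items(purchases)
print(recommend(similar_items, {"book_a"}))
```

In a production system the table would live in a persistent store rather than in memory (the YouTube paper describes BigTable playing that role), but the shape is the same: heavy batch work up front, cheap reads at request time.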

[Credit to a post by Greg Linden for providing good food for thought.]
