So I built a Netflix app last weekend…
I had a simple goal. I wanted to find the absolute worst 100 movies on Netflix. You see, I’m a bad movie fan. I’m not talking a little bad. I’m talking the worst of the worst. The unwatchable. The unthinkable. I’ve found the humor in them and they totally crack me up (much, much thanks to Jane). I was also very much inspired by IMDB’s Bottom 100.
So, I set forth on this task. First, I looked at Netflix’s APIs. I started down the path of using their REST API, which wasn’t bad, but had a few drawbacks.
First, you can only search movies by terms, which makes it difficult to search the catalog by rating or by pretty much anything else. They do have a full index, so I set off fetching that. It took a while to download the full catalog (around 300M), and it was a bit of a pain because it sits behind their OAuth APIs. After that, with grep and sed I was able to extract the 76k or so titles from the catalog that were movies (not TV Series or People or Genres). Here is the simple client I wrote with oauth & crack. But the catalog didn’t have user ratings, so I had to go fetch and store all of them, then find the worst.
This was a great chance to play around with redis - it supports sets with scores and it’s wicked simple and fast. So, I whipped up this code to determine the worst movies. I was all set.
Except one problem. Netflix’s API limits are 5,000 requests a day. With 76k or so titles to look up, it would take a little over 15 days of daily runs before I could finish my app. That’s no fun. So, I went back to the drawing board, and looked over at Netflix’s OData APIs. I had never really heard of OData before, but it looked to provide a way to search and filter on the data that I wanted. After finding the ruby odata gem, I had the start of what I needed.
I ended up whipping up this bit of code, which allowed me not only to search and sort by average rating, but also to filter by DVDs available and Netflix Instant.
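To give a rough idea of what that kind of OData query looks like on the wire, here is a minimal sketch that builds the URL by hand instead of going through the odata gem. The `$filter`, `$orderby` and `$top` query options are standard OData; the base URL and the `AverageRating`, `Instant/Available` and `Dvd/Available` field names are assumptions for illustration, not a confirmed copy of the Netflix schema.

```ruby
require "uri"

# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://odata.example.com/Catalog/Titles"

# Builds an OData query URL for the lowest-rated titles.
# The field names in the filters are assumed, not taken from a real schema.
def worst_titles_url(limit: 100, instant: false, dvd: false)
  filters = []
  filters << "Instant/Available eq true" if instant
  filters << "Dvd/Available eq true" if dvd

  # Standard OData system query options: sort ascending, take the first N.
  options = { "$orderby" => "AverageRating asc", "$top" => limit.to_s }
  options["$filter"] = filters.join(" and ") unless filters.empty?

  # Percent-encode the values; spaces become %20 rather than "+".
  query = options.map do |key, value|
    "#{key}=#{URI.encode_www_form_component(value).gsub('+', '%20')}"
  end.join("&")

  "#{BASE_URL}?#{query}"
end

puts worst_titles_url(limit: 100, instant: true)
```

The point of the sketch is that sorting and filtering happen on the server, so one request returns the worst 100 instead of thousands of per-title lookups.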
From there, it was a matter of throwing it together in a Rails app and deploying to Heroku. Heroku is simple and easy for hosting Rails apps, and free for smaller usage.
So, I finished up and deployed it. I then realized it was a bit slow (from looking at New Relic), and decided to add some caching. Heroku provides a free memcached plugin (up to 5M), so I plugged that in and fragment cached the views. Since this data doesn’t seem to change often, that seems like a pretty good compromise.
Challenges, Issues, Final Thoughts
All in all, pretty simple and fun. And not bad for a small weekend project. There are some takeaways. First, I still can’t get this 100% right. Netflix exposes average ratings, but not the number of ratings. So, if there are ties, I can’t sort by number of votes next (which is really more fair and accurate - it’s much harder to maintain a low rating the more votes that come in). Second, I’m still having issues with the ruby odata client filtering by Genre. I know it can be done, but I haven’t been able to get it to work yet with the gem. Ideally, I want the worst for each Genre as well. Overall, Netflix has done an okay job with providing some APIs, but they could really use some love. They last updated the OData “preview api” about 10 months ago, and the REST API 2 years ago. They were hiring 7 months ago, so I’m hoping they got their new devs and will be making some changes soon.
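For what it’s worth, the tie-breaking described above would be a one-liner if a vote count were available. A minimal sketch, assuming a hypothetical `vote_count` field (which Netflix does not expose) and made-up sample titles:

```ruby
# vote_count is hypothetical; the API only provides the average rating.
Title = Struct.new(:name, :average_rating, :vote_count)

# Worst first: lowest average rating, and among ties the title with the
# most votes ranks as worse, since its low rating was harder to maintain.
def rank_worst(titles)
  titles.sort_by { |t| [t.average_rating, -t.vote_count] }
end

# Made-up sample data, for illustration only.
SAMPLE_TITLES = [
  Title.new("Movie A", 1.2, 40),
  Title.new("Movie B", 1.1, 5),
  Title.new("Movie C", 1.2, 900)
]

rank_worst(SAMPLE_TITLES).each do |t|
  puts "#{t.name}: #{t.average_rating} (#{t.vote_count} votes)"
end
```

Here Movie B comes first on rating alone, and Movie C ranks ahead of Movie A because it holds the same 1.2 average across far more votes.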
So, now it’s off to the movies (if you dare).
The other day my teammate said, “I finally found something that stinks about Ruby’s core library.”
What was wrinkling his nose? The Dir class gives you no way to ask for files only, or for directories only. He had a point. The Dir.entries is inflexible. And even the flexible …
I really disagree with Nils Jonsson’s take on Ruby’s standard library, and so does matz. Nils claims that Something stinks about Ruby’s Core Library, and goes on to provide an implementation for Dir.files and Dir.directories. Yes, there might be functionality that is missing here, but it’s trivial to implement. His solution shells out, which is both dangerous and non-portable. It’s also unnecessarily complex. The beauty of Ruby is that it can be changed in very simple ways to solve your problem - even the core library. Here’s my alternative (much shorter and portable) solution:
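class Dir
  # Names of the directories inside path.
  # Note: the entries include "." and "..", so both show up here too.
  def self.directories(path)
    Dir.new(path).entries.select do |e|
      File.directory?(File.join(path, e))
    end
  end

  # Names of the regular files inside path.
  def self.files(path)
    Dir.new(path).entries.select do |e|
      File.file?(File.join(path, e))
    end
  end
end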
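A variation on the same idea, sketched for Ruby 2.5 and later: `Dir.children` returns the entries without "." and "..", so the directory listing only contains real subdirectories. The `DirListing` module name is made up for this example, and it uses a separate module rather than reopening `Dir` only to keep the sketch self-contained.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper module, for illustration only.
module DirListing
  # Names of the subdirectories directly inside path.
  # Dir.children already leaves out "." and "..".
  def self.directories(path)
    Dir.children(path).select { |e| File.directory?(File.join(path, e)) }
  end

  # Names of the regular files directly inside path.
  def self.files(path)
    Dir.children(path).select { |e| File.file?(File.join(path, e)) }
  end
end

# Try it out against a throwaway directory.
Dir.mktmpdir do |tmp|
  FileUtils.mkdir(File.join(tmp, "sub"))
  FileUtils.touch(File.join(tmp, "notes.txt"))

  p DirListing.directories(tmp)
  p DirListing.files(tmp)
end
```

Like the version above, this stays in pure Ruby: no shell, no parsing of `ls` output, and it behaves the same on Windows and unix.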
Now, his two main arguments were:
- if you don’t mind some computing time and space wasted and you want to write as little code as possible
- if you know the contents of the directory in advance
The computing time and space will be much, much worse when forking out to the shell, listing all of the files, then gsubbing out the items with a trailing slash. The Ruby filtering will be much faster. But if performance is a concern, I’m sure my code could be rewritten in C, much like the rest of the Dir class, and it would be even faster. In the non-Windows example, ‘ls -d’ is being used to filter. Most POSIX implementations of ls use opendir(), readdir(), and stat(). So, ls is looking at each file and determining if it is a directory or not. This could potentially be sped up by using scandir(), but that’s a newer POSIX standard, so it might not be available on older unix systems.
Ruby Can’t Scale (Don’t Listen to John Metta)!
This is a follow-up to Ruby Can’t Scale! by John Metta
I want everyone to believe that Ruby can’t scale. I want you to think that Ruby is an awkward, weird little academic language that shouldn’t be used in “real” production deployments. I want you to think that it’s slow, memory hungry, full of security holes and breaks the rules of proper programming.
I don’t want everyone to use Ruby. I want to keep it for myself. I want my company to build software faster, easier, and still be able to go home before it’s dark out while outbuilding you and your product. I want to have my pick of small startups and recruiters breathing down my neck to hire me because I know this weird, obtuse, little language. If you learn it, then I’ll have to work even harder to beat out the next guy.
So beware of Ruby! It has lambdas, class evals, blocks, mixins, and all sorts of scary things. These are weird and strange! Stick to what’s tried and true. Java and C++ aren’t going anywhere. They are stable, and besides, that’s what real companies use. Save Ruby for me.
On Procs and Rubies
Recently, I’ve been playing around with Shoes. I ran into an interesting bug (or side effect?) in Ruby 1.8.
Here’s the first code sample: create a Proc for each item in an array, then call each one.
procs = []
[1, 2, 3].each { | x | procs << Proc.new { puts x } }
procs.each { | p | p.call }
# prints 1, 2, 3
Here’s the same code, but with a local variable x defined before the block is called.
x = nil
procs = []
[1, 2, 3].each { | x | procs << Proc.new { puts x } }
procs.each { | p | p.call }
# prints 3, 3, 3
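So what happened here? Well, as it turns out, in Ruby 1.8, block parameters are not scoped locally to the block. If a local variable named x already exists outside the block, the block parameter x is that same variable. Every Proc therefore closes over one shared x, which is reassigned on each iteration. So, by the time the Procs are finally called, x is 3. This has been fixed in Ruby 1.9, where block parameters are always local to the block. Running the same code in Ruby 1.9 prints 1, 2, 3 for both examples. And now you know.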
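If you only have a newer Ruby around, the old behavior can still be imitated by assigning to the outer variable explicitly. The following is a small sketch for Ruby 1.9 and later; the method names are made up for the example. The first method shows the block parameter shadowing the outer x, the second shows every Proc sharing one variable, which is effectively what Ruby 1.8 did behind the scenes.

```ruby
# The block parameter x shadows the outer x (Ruby 1.9+):
# each Proc captures its own x, and the outer x is left alone.
def shadowed_parameter
  x = nil
  procs = []
  [1, 2, 3].each { |x| procs << Proc.new { x } }
  [procs.map(&:call), x]
end

# Assigning to the outer x inside the block: every Proc closes over
# the same variable, so they all see its final value.
def shared_variable
  x = nil
  procs = []
  [1, 2, 3].each { |v| x = v; procs << Proc.new { x } }
  [procs.map(&:call), x]
end

p shadowed_parameter  # => [[1, 2, 3], nil]
p shared_variable     # => [[3, 3, 3], 3]
```

The takeaway is that a Proc captures the variable, not its value at creation time, so what matters is whether each iteration gets a fresh variable.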

