Wednesday, August 25, 2010

GoGaRuCo 2010: mini-interview with Ilya Grigorik

Ilya Grigorik (@igrigorik) is another GoGaRuCo speaker who's kindly agreed to sit down and work through a short interview with me.  Hopefully this gives you a taste of what you'll be missing if you're not going to the Bay Area's regional Ruby conference.

Machine Learning and Ruby don't leap to mind as a common pairing.  Why is machine learning important to Rubyists?
Ilya I don't think the topic of Machine Learning (ML) can or should be linked to any specific language or runtime - it is much more general than that. Wikipedia provides a good starting point: "Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data". Artificial Intelligence is a close cousin to this definition, and you will find a lot of people using these two terms interchangeably, but I prefer the ML definition because, to me, defining and modelling the learning process is where the work happens, whereas "intelligence" is the ultimate outcome (plus, defining intelligence is a much harder concept to agree on).
With that in mind, I think you could make the argument that Rubyists apply ML to many everyday situations already: sorting algorithms, recommendations, and so on. It is also a truism that by the time any "AI" hits the mainstream, it is usually no longer interpreted as "AI". For example, computers answering telephone calls was the domain of pure science fiction only a few decades ago, whereas now we don't even stop to think about it. How about your iTunes "Genius" playlist? Pandora? You see where I'm going. It's all around us.
Why is Ruby a good fit for machine learning applications?
Ilya Because Ruby commands such a presence in the web development world, I think it naturally finds itself in domains and applications that stand to gain a lot by leveraging the available data in some interesting and novel way. But once again, it's not really a question of language, as much as it is a question of modelling what you know, and applying that data to interesting questions. If Ruby, as a language, allows you to model your data in a faster or easier way, then so much the better.
On the purely practical implementation side, Ruby has a number of great libraries and plugins that allow you to leverage many interesting algorithms: support vector machines, decision trees, Bayes filters, neural nets, and so on. Will those tools scale to million-row matrices? Perhaps not, but they will allow you to iterate through a number of solutions at a minimal cost, which in itself is a big win.
Where doesn't Ruby fit well in the domain?
Ilya It is unlikely that you will be analyzing a multi-terabyte dataset with a Ruby ML algorithm. More likely, you'll work on a scale of a gigabyte (or a few), model and iterate your algorithm on a subset of data with Ruby, and then implement a lower level solution to scale up to larger datasets.
You're pretty well known for deep-diving blog posts about Ruby.  What's your day to day relationship with the language?
Ilya My day to day job is with PostRank, where being CTO/founder means I'm wearing many different hats throughout the day. Having said that, most of our systems are written in Ruby, and we have definitely pushed the limits of the language on many fronts.
My blog is, in many ways, a reflection of the technical challenges we're currently dealing with at PostRank, or technologies we're evaluating to improve our infrastructure. So, while I may not be working on implementing the next feature which is going into our analytics product, I am likely to be involved in the design and deployment of the infrastructure that has to deal with servicing all of the data requests required to make that feature possible (and when you're pushing as much data around as we do at PostRank, that's always a non-trivial challenge). The combination of having an awesome team and a large and exciting problem to work on means there is never a short list of things to write about on my blog!
Other than your own talk, what are you most looking forward to at GoGaRuCo this year?
Ilya To be honest, every single talk on the agenda sounds fascinating to me - it's hard to pick any favorites. Having said that, I have recently been thinking about the topic of "test driven learning" and talking to a few people about it, so I'm really looking forward to the "Test First Teaching" presentation by Sarah Allen and Alex Chaffee. I am really curious how this concept could be applied more broadly, outside of just learning a programming language. For example, could you structure a physics course in the same manner? Arguably some (great) teachers do this already, but I would love to extract and distill some general rules and patterns.

1 comment:

Anonymous said...

Thanks for the interesting interview. (Same goes for all the others, too!)