
Friday, April 01, 2011

Protocol Buffers - Brian Palmer's Take

Here's another in my continuing series of Ruby Protocol Buffers posts. This time, I've got an interview with Brian Palmer. (I interviewed Brian about his participation in a 'Programming Death Match' back in 2006.)

While working at Mozy, Brian worked with Protocol Buffers, and he now maintains the ruby-protocol-buffers library. He left Mozy last September and is now working at a startup called Instructure.

I hope you enjoy Brian's take on Protocol Buffers.


How did you get started using Protocol Buffers?

Brian: At Mozy, the back-end storage system is written in C++. When we started to standardize our messaging, we evaluated a few different libraries, like Thrift, but ultimately settled on Protocol Buffers because of their great performance and minimal footprint. Our servers handle terabytes of new data a day, so we became sort of "I/O snobs" I guess. We hated using any library that handled its own I/O in a way that we couldn't pipe everything through our finely tuned event loop and zero-copy data structures. So Protocol Buffers, where the data structure and wire format are standardized but the surrounding protocol is up to you, was a perfect fit.

What are its primary application domains?

Brian: I'd say Protocol Buffers are a better fit than, say, JSON for more performance-sensitive code, since the wire format is extremely space efficient yet fast to parse and generate. Or if you need a very flexible protocol. For instance, we use protocol buffers for the message header but can just pack the actual file data in a raw byte array in the same message, saving the overhead of having our protocol library parse all those terabytes of data.

Or if you want a flexible data format that still allows a bit more definition than just an arbitrary map of key/value pairs, like JSON. Requiring that fields be defined up front in the .proto file can be really helpful when trying to coordinate communication between different apps internally, especially with the guaranteed backwards compatibility.

Why did you decide to write a Ruby library?

Brian: Mozy's back-end is C++, but we use Ruby for the integration test suite for that system, along with all the web software. So we found that we really needed Protocol Buffers for Ruby. This was 3+ years ago -- at the time, we looked at the existing ruby-protobuf library and it wasn't at all suitable for our needs.

Initially this was just going to be a small internal tool, part of our testing framework. There wasn't any talk of open sourcing the library until we'd already been using it internally for a couple years. I just looked at ruby-protobuf again when you contacted me, and it looks like it's come a long way in both completeness and performance. It makes me a bit sad that I might have muddied the waters with another competing library when neither is a clear winner. Hopefully somebody finds it useful, though.

How much effort did you put into making your library performant? Ruby-ish? What are the trade-offs between the two?

Brian: My main focus was on performance, since we found that Ruby's time spent encoding/decoding protocol buffers was actually a bottleneck in running our integration tests. Internally at Mozy we install the library as a Debian package, rather than a gem, and this includes the C extension that is currently disabled in the gem packaging, which provides another performance boost, especially on Ruby 1.8.7.

I think the library remains pretty ruby-ish though. The only place that doesn't feel very ruby-ish is in the code generation: while in theory you could just write your .pb.rb files by hand without writing a .proto file, it's not very natural in the current implementation. So you'll want to use the code generator. But the runtime API is very natural to use, I think.

Since our main goal was interfacing with an existing C++ infrastructure that uses .proto files extensively, this was a natural trade-off for us to make. It wouldn't take much effort to make a more ruby-ish DSL for the modules, though.

With three competing implementations out there, how do you see the playing field shaking out?

Brian: To be honest, I haven't looked closely at the other ruby offerings in a couple years. But I can say that while generally I'm a big fan of choice, I'm not sure it really makes sense to have three ruby libraries for something as simple as protocol buffers — one library could probably easily be made to serve everybody's purposes. So in that sense, I hope a clear winner is established, if just to avoid fragmentation of effort.

What would you like to see happen in terms of documentation for ruby-protocol-buffers?

Brian: The best documentation for Protocol Buffers in general is definitely going to remain on Google's site. And my online library documentation covers how to use the runtime API pretty well. But it'd be nice to have more explanation of how to get up and running with ruby-protocol-buffers.

Thursday, May 07, 2009

RubyNation Mini-Interview: Russ Olsen

For my second RubyNation interview, I talked with Russ Olsen (@russolsen), one of the organizers. (I've also interviewed Hal Fulton about this upcoming regional Ruby conference.)

Don't forget RubyNation will be held in Reston, VA on June 11-13, so you don't have too much time left to register.

Russ is an awfully bright guy, and a real contributor to the Ruby community. You might also be interested in reading my earlier Russ Olsen Interview, or my book review of On Ruby: Design Patterns In Ruby.


Why did you decide to put RubyNation together?

Russ I think it was actually Gray Herter's idea originally, but I can remember sitting down with Gray and talking about putting on a little one-day, 50-person mini-conference, mostly to raise a little bit of money for our local Ruby users' group, which at the time was running on a shoestring.

From that initial idea it grew and grew, expanding to more than twice that original size, moving to a real conference center and spreading out to two days.

What makes regional Ruby conferences special?

Russ In general the regional conferences are fun because they all have their own feeling, each one putting a special twist on what it means to be part of the Ruby community. I also get a kick out of the fact that this is all coming from the bottom up, driven mostly by people who just want to be involved in the Ruby community.

What makes RubyNation stand out as a regional Ruby conference?

Russ There are really two answers to that question: In a larger sense, RubyNation can build on the thriving Ruby community that has grown up around Washington DC. We have a really diverse bunch of people doing Ruby here, everything from Web 2.0 start-ups to older established companies that are trying to find a better way, to individual, enthusiastic techies.

For me personally, RubyNation is great because I already know a lot of the people, but it's rare to see everyone in the same place at the same time.

What's your favorite memory from last year's RubyNation?

Russ I had just finished helping clean up (trust me, this conference organizing stuff is not all glamor) and happened to find myself walking out with the last two attendees. One of them turned to me and said, "So how long have you guys been doing this?" and I realized that we had pulled it all off with enough professionalism that at least one person thought we had been at it for years. Or he was just being polite. Either way it worked for me.

What are you most looking forward to for this year's RubyNation?

Russ I'm looking forward to hearing Hal Fulton talk about Reia, a new programming language that he has found. Remember, the last language that caught Hal's attention...

[ed. you can learn more about Reia in my interview with Reia developer Tony Arcieri.]

Why should people come to RubyNation?

Russ Come because we have a great line up of speakers. Come because it's not very expensive. Come because the Washington area Ruby community is just bubbling over with enthusiasm. Come because June in Washington is not nearly as bad as August in Washington. Come because you might hear about the next big thing. Come and tell us about your next big thing.


Diamondback Ruby Interview

After the announcement of Diamondback Ruby on ruby-talk a bit ago, I decided to contact the developers to learn more about what they're doing. Two of the team members, Mike Hicks and Mike Furr, and I ended up having quite a conversation that I'm posting as the interview below.


What do you hope to learn from this project?

Mike Hicks There is a long-discussed tension between statically (or explicitly)-typed languages like Java and dynamically (or implicitly)-typed languages like Ruby. My hope has been to discover how to include the best aspects of static and dynamic typing in a single language. We all really like Ruby's design and features, and felt it was the right language to start with. As we go, we'll look to derive general design principles that make sense for most any programming language, to simplify and improve the process of programming generally.

Static types are useful at catching bugs early, and serving as useful documentation (indeed, RDoc includes a pseudo-type for Ruby library methods). But static type systems may reject perfectly correct programs due to their imprecision, and thus can "get in your way," particularly when doing rapid prototyping. On the other hand, dynamic types suffer no imprecision, but delay discovery of some bugs until run-time. It can be particularly annoying to mistype the name of a method in a call deep within your complicated program, and have the program fail after running for a long while with "method not found," when having a type checker would have immediately revealed the mistake. The challenge is to incorporate the best bits of both approaches, e.g., to not reject programs prematurely (as a static type system could) while finding as many certain errors in advance as possible.
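A tiny Ruby example of the failure mode Mike describes (the class and method names here are invented for illustration): a misspelled method name is perfectly legal source code, and plain Ruby only discovers it when the call actually executes, however deep in the program that may be.

```ruby
class Report
  def generate
    "report body"
  end
end

def long_running_job(report)
  # ... imagine hours of work happening before this line ...
  report.genrate  # typo: should be `generate`
end

begin
  long_running_job(Report.new)
rescue NoMethodError => e
  # Ruby only notices the typo at run-time, at this call site.
  puts e.message  # e.g. "undefined method 'genrate' for ..."
end
```

A type checker that knows `Report`'s methods can flag the misspelled call before the program ever runs.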

Mike Furr We had two main hypotheses going into this research. The first hypothesis was that development in dynamic languages doesn't scale. This is a hard thing to measure, but there is a fair amount of anecdotal evidence that seems to support this. For example, just recently a few of the Twitter developers were interviewed about moving their infrastructure from Ruby to Scala, and static typing was mentioned as one of their advantages of Scala over Ruby.

The second hypothesis we had was that most people "think in types" even if they don't write them down while programming. Thus, we imagined that most code written by Ruby programmers would be accepted by a sufficiently advanced static type system. We therefore hoped to design a type system for Ruby that was powerful enough to handle the kind of Ruby code people actually write, but not so complex that it was impossible to use.

What made you choose Ruby as your test implementation instead of Python, Perl, or one of the other widely used dynamically typed languages?

Mike Furr When we were first throwing around ideas about analyzing a scripting language, Ruby seemed to be the language with the most momentum, likely because of the release of Ruby on Rails a few years earlier. Ruby is also a rather young language and its syntax and semantics are continuing to evolve. Ideally, we hope that our research could influence future directions for Ruby, although much of our research would be applicable to other languages as well.

Mike Hicks I really like Ruby's design, particularly the principles that "everything is an object" and "duck typing." We also liked that Ruby was the core of Ruby on Rails, whose popularity was increasing. In our exploration of Perl, an early contender, we became frustrated with the huge number of overlapping language features, and quite surprising behavior in many instances. We didn't see how to write a useful static analysis for Perl programs without a lot of difficulty. We thought about Python only cursorily, and I don't recall any particular downsides that came up.

Why did you choose OCaml as your implementation language?

Mike Hicks Our group at Maryland uses OCaml almost exclusively for writing static analysis tools, particularly for analyzing C code, so it was natural to want to use it for this project, too. OCaml, in my view, is the perfect language for writing a compiler or analysis: its syntax is terse (as compared to Java, say), and features such as first-class functions and pattern matching very much simplify the process of writing tools that analyze or manipulate structured data, like abstract syntax trees. We followed the lead of the CIL (C Intermediate Language) project, also written in OCaml, in designing RIL (Ruby Intermediate Language), the core of DRuby. For example, both CIL and RIL syntax trees are simplified after parsing to make analysis more manageable.

Mike Furr OCaml is my favorite language to program in and I have been using it throughout my time in graduate school. However, I also think it is the right tool for the job. The quintessential example for functional programming languages is writing a compiler, and DRuby is essentially a compiler front-end. OCaml's type system is also a real asset in developing complex code that manipulates abstract syntax trees.

A lot of folks seem to think that you've written a Ruby implementation in OCaml instead of a type analyzer for the existing 1.8 Ruby. Do you think an OCaml implementation of the language would be a good thing? Why or why not?

Mike Hicks This is a hard question to answer. Why might one want to implement an interpreter in one language vs. another? I can imagine several reasons: performance, portability, maintainability, and reliability, among others. Developers often implement interpreters or VMs in C/C++ for reasons of performance and portability. But C and C++ encourage programming "on the edge of safety," so mistakes can lead to crashes, security vulnerabilities, etc., hurting reliability and maintainability. By contrast, coding in a high level language, e.g., Java or OCaml, avoids many reliability problems, thanks to type safety and garbage collection, but at the cost of some performance. (For grain-of-salt interlanguage performance metrics, check out the Computer Language Benchmarks Game, http://shootout.alioth.debian.org.) And OCaml is really well-suited to writing compilers and interpreters, thanks to its rich structured datatypes and pattern matching. In my experience, an OCaml-based compiler or interpreter is much more succinct than one written in Java. So I think it's a good option.

Mike Furr There are already several implementations of Ruby available and so adding another one simply because it was implemented in OCaml doesn't seem like a good idea to me. Maintaining an implementation of Ruby is a lot of work since the language continues to evolve its syntax and semantics from version to version and the API of Ruby's standard library is also tied to particular Ruby versions. Instead, there would need to be a fundamental new feature that an OCaml implementation would provide that developers would find useful. For example, it might be interesting to explore compiling Ruby programs to native code using the OCaml bindings of LLVM and doing type driven optimizations based on Diamondback Ruby's type system. This would be a lot of work and I have no idea if the resulting code would be any faster than some of the newer Ruby virtual machines, but it could be a fun project.

How does static type inference affect the balance between testing and debugging? How does it affect the testing and debugging processes?

Mike Hicks Type inference is a debugging aid, I suppose. It is meant to help identify bugs that could come up, and do so without requiring you to run your program. It is not a replacement for testing, though. Essentially it finds out whether you are programming with a certain level of consistency; if in one place you declare your method to take three arguments but elsewhere call the method with four, that's an inconsistency. But it doesn't prove that your code does "the right thing," e.g., whether you formatted your output string correctly. You need tests for that. Our approach allows one to help the other. When you write tests, DRuby will profile their execution to provide information that helps type inference. And type inference helps you identify some bugs without having to run tests.
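The consistency check Mike describes can be seen in a few lines of Ruby (the method and its name are made up for illustration): the arity mismatch below is evident from the source text alone, but the stock interpreter only reports it when the call runs.

```ruby
def resize(width, height, depth)
  [width, height, depth]
end

# This call passes four arguments to a three-argument method. A type
# checker like DRuby can flag the inconsistency without running the
# program; plain Ruby raises ArgumentError only at run-time.
begin
  resize(640, 480, 32, :rgb)
rescue ArgumentError => e
  puts e.message  # e.g. "wrong number of arguments"
end
```

Note that neither the check nor the error says anything about whether `resize` computes the right result; as Mike says, that still requires tests.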

Mike Furr Static typing is a tool just like testing frameworks and debuggers. All of them are meant to improve the quality of the software being developed, and each has its own advantages and disadvantages. The major advantage of static analysis is that it is able to reason about every path through your program simultaneously (and before it is run). Static types also provide terse, verified documentation. If your method has a type annotation that says it returns a Fixnum, that annotation will never become stale and can be trusted by any other developer who is calling your method.

However, static typing isn't perfect and is not meant to replace other QA techniques such as testing. One of the goals for DRuby is to allow programmers to incrementally add static types to their code bases, allowing them to benefit from extra checking where they want, without requiring changes to the entire code base.

You mentioned that you've found several potential errors in Ruby libraries and programs as a result of type inference analysis. What kinds of problems are you finding? How could the Ruby community take advantage of these kinds of discoveries?

Mike Furr The Ruby community has accepted test driven development as a standard practice and so we didn't expect to find a large number of errors. However, getting 100% testing coverage is often difficult and, not surprisingly, many of the bugs we found were in error handling code that was not exercised by any test cases. Some of these bugs were extremely simple, like misspelling a variable name, or referencing a method that did not exist.

One bug that I found particularly interesting was where a program called a method in the "File" class that didn't exist. This code was covered by the test suite and didn't cause a test failure. The reason for this was because the testing code monkey patched the File class to add the method before running the test suite. Thus, you would only encounter the problem if you executed the code outside of the test suite.
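The bug Mike describes is easy to reproduce. In the sketch below (the helper method is invented for illustration), the test harness monkey patches `File` with a convenience method before the suite runs, so production code that calls the nonexistent method passes every test but raises `NoMethodError` when run on its own:

```ruby
# Production code: calls a File class method that doesn't actually exist
# in the standard library.
def backup_name(path)
  File.basename_without_ext(path) + ".bak"
end

# Test harness: monkey patches File before running the suite, which
# hides the missing method from every test that exercises backup_name.
class File
  def self.basename_without_ext(path)
    basename(path, ".*")
  end
end

backup_name("/tmp/data.csv")  # passes under test => "data.bak"
# Without the patch in place, the same call fails with NoMethodError.
```

A whole-program analysis that looks at the production code in isolation sees the call to a method `File` doesn't define, which is exactly the kind of report DRuby produced.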

We hope that DRuby will develop into a tool that developers can run on their projects as part of their own quality assurance process. In the meantime, we have been filing bug reports for the errors we discovered so that the authors can fix them. For example, we found two errors in the RubyGems package manager that have already been fixed in its latest release.

What kind of feedback are you getting from Rubyists?

Mike Furr We've gotten some very encouraging feedback so far. In fact, despite the legendary flame wars between static and dynamic typing, I haven't received any negative comments about the idea of bringing static types to Ruby. A lot of people are using Ruby these days and so a tool that can help improve their development process through finding bugs or improving documentation is clearly appealing. Diamondback Ruby still needs some polishing so that programmers can begin using it on their own projects, and this is something we are going to continue to work on. Eventually, we'd like to perform some user studies to measure the effectiveness of the various features of Diamondback Ruby, and so its usability is certainly important to us.

How well do Ruby programs perform under Diamondback Ruby?

Mike Furr Diamondback Ruby uses a combination of static and dynamic checks to ensure that Ruby programs are well typed. Programs that can be checked purely statically (which we hope will be most of the time) will have no overhead at all since the programs can be safely run by a traditional Ruby interpreter unchanged. However, if the program does require a runtime check, then individual objects or methods may be instrumented to ensure they don't violate their types. When dynamically checking objects, we instrument the eigenclass of the individual object so that only method calls to that object must be checked (not every object of the same class). Thus the checks are pay-as-you-go: the more objects that require dynamic checks, the higher the overhead. Therefore, it's hard to use a single measurement to quantify the overhead as it can vary from execution to execution of an application. I have run some micro-benchmarks and observed a 15% slowdown in one case, but this data point should be taken with a grain of salt, as it was merely to convince myself that the instrumentation was working and not egregiously slow. An application that rarely uses a dynamically checked object may see almost no overhead, but if the application calls methods on that object in a tight loop, it could be significantly higher.
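A rough sketch of the per-object instrumentation Mike describes (the helper and its simple `is_a?` check are stand-ins for DRuby's real machinery): by redefining a method on one object's eigenclass (its singleton class), only calls on that particular object pay for the check.

```ruby
# Wrap one object's method with a return-type check, installed on the
# object's eigenclass so other instances of the class are untouched.
def add_return_check(obj, meth, expected_class)
  original = obj.method(meth)
  obj.singleton_class.define_method(meth) do |*args|
    result = original.call(*args)
    unless result.is_a?(expected_class)
      raise TypeError, "#{meth} returned #{result.class}, expected #{expected_class}"
    end
    result
  end
end

checked = "hello"
plain   = "world"
add_return_check(checked, :length, Integer)

checked.length  # => 5, verified against Integer on this object only
plain.length    # => 5, uninstrumented, so no run-time overhead
```

This is the pay-as-you-go property: `plain` (and every other String) still uses the original, unwrapped method.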

Are you using the RubySpec framework to drive your implementation?

Mike Furr DRuby includes a dynamic analysis that allows us to reason more precisely about features such as eval(). This analysis requires us to parse the original Ruby program into our intermediate language, add any instrumentation code, and then write the transformed program out to a separate location on disk to be executed by the Ruby interpreter. This whole process was rather tricky to get right, and we used the RubySpec test suite to ensure that our transformations were correct. It was definitely a great help to have such a comprehensive test suite.

We haven't used the RubySpec tests to drive any type analysis for the standard library just yet, but I can definitely see using it in the future as we continue our research.

I'd really like to see OCaml get more play, but I keep seeing books like this and wonder when a good OCaml book for non-Math/CS types is going to hit the shelves. What will it take to get OCaml in front of more developers?

Mike Hicks My observation is that languages take off when library or framework support for some important set of applications hits critical mass. Then developers interested in that application intuit that it's easiest to build that app in a certain language or framework, and then go off and learn what they need to learn. Then those developers start building more libraries and the language is used for other things. I think we can see this trend in Perl (first killer app: text processing), Java (first killer app: applets), Ruby (first killer app: Rails), etc. We're starting to see more adoption of Erlang, thanks to the rise in multi-core and high-availability commercial systems, and we're seeing a growth in Haskell, at least in part because of all of the code you can get for it (though I can't speculate on what its killer app is).

When I first started doing work in static analysis, C/C++ were the languages of choice, oftentimes building on gcc or other existing tools. But then George Necula and his students wrote CIL (C Intermediate Language). Nowadays many, many tools are written using CIL as the front end and intermediate language, by people who never were "functional programming people." CIL was just so much better, clearly, than anything else, that people flocked to it. (As of today there are 297 citations to the CIL paper, according to Google Scholar — esp. noteworthy for an "infrastructure" piece of work.)

Of course, C analysis and other "symbolic computations" on programming languages are a niche area, and not likely to bring in the masses. OCaml still needs that breakthrough use-case of great interest that will push it over the top. It remains to be seen what that will be. But once it's found, the books, tools, etc. will all follow.

Mike Furr I agree with everything that Mike (Hicks) said but would also add that OCaml needs to overcome its branding as an "academic language". I have found that a lot of programmers think of functional programming as a fringe concept, perhaps invoking bad memories of struggling with it as a CS major. At the same time, one of the features people really seem to love about Ruby is blocks, which of course are a functional programming technique. I think that Ruby's syntax plays an important role here: programmers don't have to understand what a higher order function is to be able to use a block and yet they can immediately see the usefulness of the technique. However, a functional programmer might find this syntax restrictive (why only one block per method?). Perhaps the road to OCaml's adoption will be through Ruby, which gives a gentler introduction to some of the same ideas used in ML.
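Mike's point about blocks can be seen in two lines: a Ruby programmer writes a higher-order function without ever naming the concept, and the explicit lambda form of the same call makes the functional machinery visible.

```ruby
# The everyday block form: map takes a function as an argument, but the
# syntax never mentions "higher-order function".
doubled = [1, 2, 3].map { |n| n * 2 }   # => [2, 4, 6]

# The same call with an explicit lambda, closer to how an ML-family
# language like OCaml would write it: List.map (fun n -> n * 2) [1; 2; 3]
double = ->(n) { n * 2 }
doubled_again = [1, 2, 3].map(&double)  # => [2, 4, 6]
```

Both calls are identical in meaning; only the block form hides the fact that a function is being passed as a value.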


Wednesday, May 06, 2009

MWRC Interview Collection

I've done a bunch of interviews about MWRC. Some of these have been quite popular, while others are sort of hidden gems. If you've enjoyed one of them, maybe you'd like to see some of the others. I'll start out with three of my favorites, but there's a complete list down below.

1) My interview with Philippe Hanrigou has become one of the most popular posts on my blog. He has developed a reputation as one of the great MWRC speakers, and I think his interview is a good indicator of why. You might also want to see his talks on What to do when Mongrel stops responding (from 2008) and What The Ruby Craftsman Can Learn From The Smalltalk Master (from 2009).

2) I started an interview with David Brady on Twitter. Before it was over, Kirk Haines and Jim Weirich had joined in. Not only did this interview spawn a great meme, the story got better when David came into the second day of the conference from his sickbed and gave his talk during lunch (we'd already run the replacement talk). Now that's dedication. You should go watch David's presentation on TourBus, Kirk's on Vertebra, or Jim's keynote.

3) I like this last one because it provides a different take on things. I interviewed Josh Susser about his interest in attending MWRC. Maybe most telling about his interest was that he went off and organized GoGaRuCo (hmmm, isn't there something about imitation and flattery — just kidding, Josh and GoGaRuCo are great!).

MWRC 2009


MWRC 2008