Friday, April 01, 2011

Protocol Buffers - Brian Palmer's Take

Here's another in my continuing series of Ruby Protocol Buffers posts. This time, I've got an interview with Brian Palmer. (I interviewed Brian about his participation in a 'Programming Death Match' back in 2006.)

While working at Mozy, Brian worked with Protocol Buffers, and he now maintains the ruby-protocol-buffers library. He left Mozy last September and is now working at a startup called Instructure.

I hope you enjoy Brian's take on Protocol Buffers.


How did you get started using Protocol Buffers?

Brian: At Mozy, the back-end storage system is written in C++. When we started to standardize our messaging, we evaluated a few different libraries, like Thrift, but ultimately settled on Protocol Buffers because of their great performance and minimal footprint. Our servers handle terabytes of new data a day, so we became sort of "I/O snobs," I guess. We hated using any library that handled its own I/O in a way that kept us from piping everything through our finely tuned event loop and zero-copy data structures. So Protocol Buffers, where the data structure and wire format are standardized but the surrounding protocol is up to you, were a perfect fit.

What are its primary application domains?

Brian: I'd say Protocol Buffers are a better fit than, say, JSON, for performance-sensitive code, since the wire format is extremely space-efficient yet fast to parse and generate. Or if you need a very flexible protocol. For instance, we use protocol buffers for the message header but can just pack the actual file data in a raw byte array in the same message, saving the overhead of having our protocol library parse all those terabytes of data.

Or if you want a flexible data format that still allows a bit more definition than an arbitrary map of key/value pairs like JSON. Requiring that fields be defined up front in the .proto file can be really helpful when coordinating communication between different apps internally, especially with the guaranteed backwards compatibility.
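To make that concrete, here's a rough sketch of the kind of .proto definition those two points imply. The message and field names are hypothetical (not from Mozy's actual schemas): a few small header fields declared up front, plus a bytes field that carries the raw file data straight through, so the protobuf layer never has to parse the bulk payload.

    // Hypothetical proto2 definition: small header fields plus an opaque payload.
    // Only the header fields are parsed by the protobuf library; the file
    // contents ride along untouched in the bytes field.
    message FileChunk {
      required string file_path = 1;  // which file this chunk belongs to
      required uint64 offset    = 2;  // byte offset within the file
      optional bytes  data      = 3;  // raw file bytes, passed through as-is
    }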

Why did you decide to write a Ruby library?

Brian: Mozy's back-end is C++, but we use Ruby for the integration test suite for that system, along with all the web software. So we found that we really needed Protocol Buffers for Ruby. This was 3+ years ago -- at the time, we looked at the existing ruby-protobuf library and it wasn't at all suitable for our needs.

Initially this was just going to be a small internal tool, part of our testing framework. There wasn't any talk of open sourcing the library until we'd already been using it internally for a couple of years. I just looked at ruby-protobuf again when you contacted me, and it looks like it's come a long way in both completeness and performance. It makes me a bit sad that I might have muddied the waters with another competing library when neither is a clear winner. Hopefully somebody finds it useful, though.

How much effort did you put into making your library performant? Ruby-ish? What are the trade-offs between the two?

Brian: My main focus was on performance, since we found that the time Ruby spent encoding/decoding protocol buffers was actually a bottleneck in running our integration tests. Internally at Mozy we install the library as a Debian package rather than a gem, and that build includes the C extension that is currently disabled in the gem packaging, which provides another performance boost, especially on Ruby 1.8.7.

I think the library remains pretty ruby-ish though. The only place that doesn't feel very ruby-ish is in the code generation: while in theory you could just write your .pb.rb files by hand without writing a .proto file, it's not very natural in the current implementation. So you'll want to use the code generator. But the runtime API is very natural to use, I think.

Since our main goal was interfacing with an existing C++ infrastructure that uses .proto files extensively, this was a natural trade-off for us to make. It wouldn't take much effort to make a more ruby-ish DSL for the modules, though.
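To give a feel for the runtime side, here's a minimal Ruby sketch using the message DSL that generated .pb.rb files are built on. The class mirrors the hypothetical FileChunk message from earlier, and the method names follow the gem's documented API as I understand it, so treat the details as illustrative rather than authoritative.

    require 'protocol_buffers'  # the ruby-protocol-buffers gem

    # A hand-written message class, roughly what the code generator would
    # emit into a .pb.rb file for the hypothetical FileChunk message.
    class FileChunk < ProtocolBuffers::Message
      required :string, :file_path, 1
      required :uint64, :offset,    2
      optional :bytes,  :data,      3
    end

    # Runtime usage: build a message, serialize it, and parse it back.
    chunk = FileChunk.new(:file_path => "/tmp/example.dat", :offset => 0)
    chunk.data = "raw file bytes"

    encoded = chunk.to_s               # serialize to a binary string
    decoded = FileChunk.parse(encoded) # reconstruct the message
    puts decoded.file_path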

With three competing implementations out there, how do you see the playing field shaking out?

Brian: To be honest, I haven't looked closely at the other Ruby offerings in a couple of years. But I can say that while generally I'm a big fan of choice, I'm not sure it really makes sense to have three Ruby libraries for something as simple as protocol buffers — one library could probably easily be made to serve everybody's purposes. So in that sense, I hope a clear winner is established, if just to avoid fragmentation of effort.

What would you like to see happen in terms of documentation for ruby-protocol-buffers?

Brian: The best documentation for Protocol Buffers in general is definitely going to remain on Google's site. And my online library documentation covers how to use the runtime API pretty well. But it'd be nice to have more explanation of how to get up and running with ruby-protocol-buffers.