Thursday, February 16, 2012

The Art of R: interview and mini-review


The Art of R Programming is an approachable guide to the R programming language. While tutorial in nature, it should also serve as a reference.
Author Norman Matloff comes from an academic background, and this shows through in the text. His writing is formal, well organized, and tends toward a pedagogical style. This is not a breezy, conversational book.
Matloff approaches R from a programmer's perspective, rather than a statistician's. This approach shows through in several of the chapters: Ch 9, Object-Oriented Programming; Ch 13, Debugging; Ch 14, Performance Enhancement; Ch 15, Interfacing R to Other Languages; and Ch 16, Parallel R. I do wish he had spoken to using R with Ruby as well as C/C++ and Python. I also would have liked to see a chapter on Functional Programming with R, especially after the teaser in the Introduction.
I asked Norm and an R-using friend if they could help me get my head around things a little better, and the following mini-interview is the result.

Almost every language has some kind of math support. Why bother with R? Where does it fit in a programmer's toolkit?
Norm: It's crucial to have matrix support, not necessarily in terms of linear algebra operations but at least having good matrix subsetting capability. MATLAB and the Python extension NumPy have this, but I'm not sure how far they go with it. And since MATLAB is not a free product (in fact very expensive) I'm summarily excluding it anyway. :-)
Second, R has a very rich graphics capability, which really sets it apart from the others. You can see some nice examples (with the underlying R code) in The R Graph Gallery.
Third, R is "statistically correct." It was created by top professional statisticians in industry and academia.
Russel: As something of a polyglot, I find that each language comes with something of an attitude of how problems should be approached. The grammatical structure and keyword vocabulary of each language drives a way of thinking about problems, as well as what sorts of libraries must be created to cover what may be base structures and functions in other languages. R has a particularly rich data representation vocabulary which lends itself very nicely to a data-centric problem solving mindset. While many more general-purpose languages can, with appropriate libraries, deal well with data, R reduces the cognitive load required for working with multidimensional data sets. In my (relatively limited) work with R, I've come to think of R as a domain-specific language that happens to have some general-purpose functionality, while other languages such as Ruby, Python, Perl, etc., are general-purpose languages with many domain-specific libraries.
I really feel drawn to the idea that languages drive approaches to problem solving. It reminds me of the ##PragProg idea of a language of the year. With that in mind, what do you think a dynamic language (Perl, Python, Ruby, etc.) programmer is going to find new and different in R? What about a programmer coming from a system programming language (C, C++, etc.)?
Russel: There is much in R which is from the "dynamic language" camp you mentioned: dynamically typed variables, an interactive shell, dynamically loaded libraries, etc. These will be pretty quickly noticeable to a C/C++/Java/C# programmer.
The structure and forced-forethought enforced by those languages are part of their value proposition: they force programmers into design paradigms and ways of thinking that scale up well, while dynamic languages, with their looser syntax rules, do not enforce that sort of engineering discipline on the programmer. For highly organized people who think in very structured ways, dynamic languages are "freeing", while less structured thinking programmers can find that the lack of enforced structure puts a lot of onus upon them to be disciplined in their coding as program sizes get larger. For example, a simple flat namespace is great for a small program with a few dozen lines, but namespacing becomes much more important as your programs come to the thousands of lines and dozens of individual functions or components -- especially as programs become the shared workspace of multiple programmers.
I personally use R as a dynamic language, most of the time not even writing programs in it so much as using it in interpreted mode for data analysis and "analysis prototyping." In that sense, R does for data analysis what dynamic languages do for task automation: it allows you to easily play with scenarios and prototype your thinking about data quickly and easily. You can then codify the best of those techniques into a small (or large) program that can automate that work for various data sets.
Similarly, R has a very powerful and interactive help system. Most packages not only have a quickly available set of API and help documents, but sample data sets built right into the library. From a command line, R users can get examples of how to use almost any library, with sample data included specifically for that particular library.
R has some inconsistencies from its history that can make it feel more "old school" in some ways. For example, there are two object models and the older (S3-style) object model is widely used in older libraries. However, it's nowhere near as "bolted-onto" as languages like Perl or C. R has an extremely rich set of libraries easily available via CRAN (a la CPAN), but the flip side of this wealth is that these libraries work in many ways, expecting data in various formats, etc. Again, it's not as spotty as CPAN or the Python Cheese Shop, or even Pear (most packages are quite good), but it can leave some beginners feeling a little lost when they want to accomplish a certain task. That's pretty common in the open source world, of course, but can be an issue.
R's rich first-class data types build a foundation that is nicely added to by the various libraries and simple interactive shell. Enough libraries are written in native code that performance is generally top notch. For my part, I almost always find that the available libraries far exceed my generally limited statistical needs, so I rarely find myself needing to rewrite some particular statistical code. I'm not a statistician, so I find it quite valuable to not have to worry about that aspect of the work I'm doing in any given project. Additionally, the rich libraries generally spur me on to doing a richer analysis of the data than I would if I did not have such a fully-featured tool available.
Norm, in the Introduction of your book, you talk about R as a functional language. I wish there had been a chapter on this. Can you give some examples of what you mean? Russel, do you have any thoughts about R as an FP language?
Russel: Many languages have recognized the value of functional constructs and added at least simple implementations of lambda and map functions, first-class functions and the like. FP is generally considered to be more easily parallelized, and should thus scale better on modern multi-core and CUDA-like systems. This will be quite advantageous in large data processing jobs.
Norm: Every operation in R is a function. For instance, the operation y = x[5] is really the function call y = "["(x,5). Same for + and so on.
This is brought up throughout the book, starting with the vector chapter.
The biggest implication of this, in my opinion, is in performance. One can often speed up a computation by a factor in the hundreds by exploiting the FP nature of R.
What are some of the things you've done with R that show off its power and/or niche?
Russel: R works beautifully for many types of data analysis problems. I recently used R to generate annotated graphs of Bayesian content filter scorings against timestamps, with lowess smooth and regression line and other enhancements, all built into the graphs without additional effort. This was done for all permutations of the 5 variables used in the study which had tens of thousands of data points. I was using this as a script because of my need to regenerate the graphs repeatedly, but before I'd codified that process, I used R in a "tweak and go" sort of way, as R lends itself well to ad hoc data exploration. Adding and removing data attributes, filtering data, generating data models, regressions, etc., are all easy to do in an on-the-fly manner.
Norm: A fun application I've done is R code to analyze the differences and similarities between the various dialects of Chinese. It can be used as a learning aid for those who know one Chinese dialect but not another. This is an example in my book, in the chapter on data frames.

If you're interested in adding R to your arsenal of programming tools, this is a great way to get started.
Truth in posting—No Starch Press sent me a free copy of this book to review.

Friday, April 01, 2011

Protocol Buffers - Brian Palmer's Take

Here's another in my continuing series of Ruby Protocol Buffers posts. This time, I've got an interview with Brian Palmer. (I interviewed Brian about his participation in a 'Programming Death Match' back in 2006.)

While working at Mozy, Brian worked with Protocol Buffers, and he now maintains the ruby-protocol-buffers library. He left Mozy last September, and is now working at a startup called Instructure.

I hope you enjoy Brian's take on Protocol Buffers.


How did you get started using Protocol Buffers?

Brian: At Mozy, the back-end storage system is written in C++. When we started to standardize our messaging, we evaluated a few different libraries, like Thrift, but ultimately settled on Protocol Buffers because of their great performance and minimal footprint. Our servers handle terabytes of new data a day, so we became sort of "I/O snobs" I guess. We hated using any library that handled its own I/O in a way that we couldn't pipe everything through our finely tuned event loop and zero-copy data structures. So Protocol Buffers, where the data structure and wire format are standardized but the surrounding protocol is up to you, was a perfect fit.

What are its primary application domains?

Brian: I'd say Protocol Buffers are a better fit than say, JSON, for more performance sensitive code, since the wire format is extremely space efficient but fast to parse and generate. Or if you need a very flexible protocol. For instance, we use protocol buffers for the message header but can just pack the actual file data in a raw byte array in the same message, saving the overhead of having our protocol library parsing all those terabytes of data.

Or if you want a flexible data format that still allows a bit more definition than just an arbitrary map of key/value pairs, like JSON. Requiring that fields be defined up front in the .proto file can be really helpful when trying to coordinate communication between different apps internally, especially with the guaranteed backwards compatibility.
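
Brian's point about keeping a small, parsed header next to a large, unparsed block of file data can be sketched in plain Ruby. This is only an illustration under stated assumptions: the framing (a 4-byte header length, the header bytes, an 8-byte payload length, then the raw payload) and the write_frame and read_frame helpers are made up for this example and are not Mozy's actual wire format. The header string stands in for a serialized Protocol Buffers message.

```ruby
require 'stringio'

# Hypothetical framing, invented for illustration:
#   [4-byte big-endian header length][header bytes]
#   [8-byte big-endian payload length][raw payload bytes]
# header_bytes stands in for a serialized Protocol Buffers message.
def write_frame(io, header_bytes, payload)
  io.write([header_bytes.bytesize].pack('N'))
  io.write(header_bytes)
  io.write([payload.bytesize].pack('Q>'))
  io.write(payload) # copied as-is; the payload is never parsed
end

def read_frame(io)
  header_len = io.read(4).unpack('N').first
  header_bytes = io.read(header_len)   # only this part needs decoding
  payload_len = io.read(8).unpack('Q>').first
  payload = io.read(payload_len)       # raw bytes, handed on untouched
  [header_bytes, payload]
end

# String.new creates an empty binary (ASCII-8BIT) buffer.
io = StringIO.new(String.new)
write_frame(io, 'serialized-header', 'raw file contents')
io.rewind
header, payload = read_frame(io)
puts "header: #{header.bytesize} bytes, payload: #{payload.bytesize} bytes"
```

The design point is that only the header passes through the protocol library, so the cost of parsing stays constant no matter how large the payload gets.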

Why did you decide to write a Ruby library?

Brian: Mozy's back-end is C++, but we use Ruby for the integration test suite for that system, along with all the web software. So we found that we really needed Protocol Buffers for Ruby. This was 3+ years ago -- at the time, we looked at the existing ruby-protobuf library and it wasn't at all suitable for our needs.

Initially this was just going to be a small internal tool, part of our testing framework. There wasn't any talk of open sourcing the library until we'd already been using it internally for a couple years. I just looked at ruby-protobuf again when you contacted me, and it looks like it's come a long way in both completeness and performance. Makes me a bit sad that I might have muddied the waters with another competing library when neither is a clear winner, that's unfortunate. Hopefully somebody finds it useful, though.

How much effort did you put into making your library performant? Ruby-ish? What are the trade-offs between the two?

Brian: My main focus was on performance, since we found that ruby's time spent encoding/decoding protocol buffers was actually a bottleneck in running our integration tests. Internally at Mozy we install the library as a debian package, rather than a gem, and this includes the C extension that is currently disabled in the gem packaging, which provides another performance boost, especially on ruby 1.8.7.

I think the library remains pretty ruby-ish though. The only place that doesn't feel very ruby-ish is in the code generation: while in theory you could just write your .pb.rb files by hand without writing a .proto file, it's not very natural in the current implementation. So you'll want to use the code generator. But the runtime API is very natural to use, I think.

Since our main goal was interfacing with an existing C++ infrastructure that uses .proto files extensively, this was a natural trade-off for us to make. It wouldn't take much effort to make a more ruby-ish DSL for the modules, though.

With three competing implementations out there, how do you see the playing field shaking out?

Brian: To be honest, I haven't looked closely at the other ruby offerings in a couple years. But I can say that while generally I'm a big fan of choice, I'm not sure it really makes sense to have three ruby libraries for something as simple as protocol buffers — one library could probably easily be made to serve everybody's purposes. So in that sense, I hope a clear winner is established, if just to avoid fragmentation of effort.

What would you like to see happen in terms of documentation for ruby-protocol-buffers?

Brian: The best documentation for Protocol Buffers in general is definitely going to remain on Google's site. And my online library documentation covers how to use the runtime API pretty well. But it'd be nice to have more explanation of how to get up and running with ruby-protocol-buffers.

Thursday, March 31, 2011

Protocol Buffers - BJ Neilsen's Take

BJ Neilsen (@localshred or at github) is a member of my local Ruby Brigade, and he's hacking with/on Protocol Buffers with Ruby — oh, and he's a fan of Real Salt Lake too.

He works for MoneyDesktop, a startup based in Provo, Utah, where he helped them transition away from a less-than-desirable PHP solution to Rails. They now enjoy an entirely new service-architecture driven by Ruby (and Protobuf). When not working with Ruby, he runs OneSimpleGoal and plays around with iOS and Objective-C.

To get another take on Protocol Buffers, I asked BJ to join me for a quick interview. Enjoy!


How did you get started using Protocol Buffers?

BJ: At the beginning of 2010 I was hired by a startup in Provo to help build out their product offering. The entire application was written in Java, but for the piece I was to be in charge of I was given free rein to choose a platform. Of course I chose Ruby, but it soon became apparent that we needed a solid way to get data from one application to the other.

This need launched a refactor to a more service-oriented approach. Different solutions were researched for dealing with data interchange such as Thrift and the like, but we ended up choosing Protobuf for its simplicity, pedigree, and multi-platform support. No XML, no WSDL, just simple definitions compiled to the language of your choice. Defining a Data Structure and API with one declarative language, and then being able to build the client and server implementations in two different languages was a huge win. We created a Socket-based RPC server on the Java side, and called the endpoints from Ruby. It was very simple.

I'm now with a new company and the new team was very receptive to the idea of a Protobuf Service ecosystem for our service-oriented application. It is currently the primary method of internal data interchange between multiple service applications. At the time of writing, we have over 20 different proto definition files, 63 separate defined data types (including Enums), 15 independent service classes implementing a total of 32 service endpoints.

What do you see as the strengths of the Protocol Buffers data format?

BJ: One of the greatest strengths of Protobuf is its clear data definitions. Open up any .proto file and it's not hard to deduce the structure of the represented Data Types. Defining Service endpoints is similarly simple, meaning all of the ambiguity of wiki-based (or similar) API documentation is immediately eliminated. Clarity is key when building a large system with a team of any size. Being able to clearly understand how and what data is transferred within the system is absolutely essential, especially when you hire beyond your core development team and need to get people contributing quickly.

I've already mentioned the power we gained from being able to tie together a Service architecture with multiple languages in a unified API. The Protobuf project officially supports Java, C++, and Python implementations for the definitions compiler and data serialization code, but they have a ton of third party code listed for many other languages like Objective-C and JavaScript (with support in Node.js as well).

Which Protocol Buffers implementation are you using? How did you end up choosing it?

BJ: The only Ruby project listed on Protobuf's "Third Party" page (at the time) was Mack's Ruby-Protobuf. This was a great start as the compiler was built in YACC. However, once I started integrating the API into our Ruby application, it became clear that the RPC side had been half-baked and just sort of thrown out into the wild. Files were compiled and stubbed in the wrong places, meaning that if I added any code to the stubbed client or server files, subsequent compiles would overwrite my changes. Not good.

By that time we were full-steam ahead on the Protobuf implementation in the other services, so I basically had to go in and rewrite the compiler code generation for each of the services, as well as completely rewrite the entire RPC backend to become compatible with the Protobuf SocketRPC library written for Java. Since that first rewrite at the early part of 2010, I've done another rewrite (late 2010) to use EventMachine as the RPC backend and I can tell you it's lightyears faster, and the DSL is much sexier also, looking much more like an AJAX request with callbacks than a standard socket connection with byte-reading hell. You can get that code on my github fork on the compatability-0.4.0 branch.

What are your plans for your fork of Mack's ruby-protobuf? Will it get wrapped into his distribution or will you go all the way, rename it, and start publishing it as a gem?

BJ: Fantastic question. Currently I've packaged the gem internally for our SOA ecosystem to get around the problem of getting it into a full release with the original code. I've embarked on merge-hell attempting to get my code to work with theirs several times now and each time it just feels like it's not worth it. I've yet to have contact with the original developers (I'm fairly sure they live in Japan) and so I'm not entirely sure they'd accept any patches I'd send anyways.

I've also toyed with the idea that since I've changed a significant chunk of the original code I could just make it my own gem with some witty name (and a reference to the original). The only thing that has kept me from that path is that a) I'd prefer not to insult the original developers, and b) I'm a bit ashamed that there aren't very many tests backing up the RPC backend (the major piece that I wrote from scratch).

Each day we have thousands of successful RPC calls with a virtually non-existent error rate running through the EventMachine RPC code written into this gem, so it has certainly been battle tested in a heavily used production system. Unfortunately it just doesn't have that warm fuzzy feeling (for those who haven't used it yet) that you get when you have 200 green tests behind each class. However, patches with tests are certainly welcome :).

Anyone can pull from my fork on the compatibility-0.4.0 branch (essentially my "master" I build the gem from) and build their own gem if they wish. The current version in my fork is 0.4.0.8. I'd be happy to provide any answers to questions that may arise, and I may even be available to consult with anyone on how to implement Protobuf into your current system.

You gave a presentation on Protocol Buffers at uv.rb. How was it received? Do you see more people starting to use this data format?

BJ: To be honest, I'm not sure my presentation went the way I'd hoped, certainly not well enough to highlight many of the benefits and reasons for using Protobuf. I spent too much time showing the "How" instead of the "Why". I think many people left the meeting intrigued but it was also marred by a drawn-out rant by a few of the developers that were present, debating whether or not it was more prudent to use REST/JSON than a more declarative format like Protobuf.

The argument is moot simply because both styles are great, they just fulfill slightly different needs. When it comes to "Code as Documentation" it's hard to argue against Protobuf, a format that is much easier for devs from other languages to buy into. I've never had a developer come to work on a Protobuf API who, after being shown the .proto files, could not understand how to read or extend the definitions.

I hope that developers will give the format a try because I think it's the next level up from normal web application design. It's the start of understanding that for larger applications, different tools should be considered to help alleviate the pains of a (potentially) larger system and the needs of moving data from one place to another on the fly.

Ok, that's a pretty intriguing statement. What different tools should we be looking at (or developing) to work on larger systems and larger data sets?

BJ: Hopefully I don't get myself into too much hot water with the answer to this question (or go off on a large tangent), but here we go. Keep in mind also that this long-winded answer comes with a grain of salt, because every system will be designed to meet different goals. Therefore, there is no "one true way" as some would tout.

That being said, if you are looking to build a system for growth, there are certain concepts and technologies that should at least be considered from the outset. Service-oriented Architecture (SOA) is a way of designing a system for growth; to me it's the most natural way to begin with the journey in mind. For those new to SOA, a short primer: it involves creating smaller independent applications that are easier to write and maintain because they focus on smaller feature sets, while roped together they give you the benefit of all the systems working as a whole and ready to scale.

In this type of system we never want to share data between service applications directly, such as connecting from Service A to Service B's database to get user data. We share data by creating APIs for each service application (with protobuf of course :)), then publish those APIs for our other services to consume. If one application needs user data, it doesn't connect to the user database, it connects to the internal User service's API to gather the data. Naturally protobuf fits extremely well here, but REST/JSON or SOAP or (insert other transport protocol here) can obviously be used also.

Other "large systems" or so-called "enterprise" technologies that fit well into an SOA system are background jobs (queues) and various types of messaging systems.

Queueing is essential for the speed and scalability of a system as it offloads non-relevant (yet important) processing to separate threads or processes. A simple example of how a queue can give you an increase in speed and usability of a system is sending an email when a user is created. The user generally doesn't care (or know) that you are sending them an email when their account is created, but they do care if it's taking 10 seconds. So rather than tie up the user's process just to send an email, you would queue that "job" for later (even if it's processed milliseconds later) and let the process return the result of the user creation. Workers in other threads or processes will pick up the email job and send the email for you.
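
The email example can be sketched with nothing but Ruby's standard library. This is a toy, in-process version and not Resque: EMAIL_JOBS, SENT, and create_user are hypothetical names invented for the illustration, and the "email" is just a string appended to an array.

```ruby
require 'thread'

# Hypothetical names throughout; this is not Resque's API.
EMAIL_JOBS = Queue.new # thread-safe job queue from the standard library
SENT = []              # written only by the worker, read after it exits

worker = Thread.new do
  # Queue#pop blocks until a job arrives; :stop tells the worker to exit.
  while (job = EMAIL_JOBS.pop) != :stop
    # A real worker would talk to a mail server here.
    SENT << "welcome email to #{job[:email]}"
  end
end

def create_user(name, email)
  user = { :name => name, :email => email }
  EMAIL_JOBS << { :email => email } # enqueue the job and return right away
  user
end

user = create_user('Pat', 'pat@example.com')
EMAIL_JOBS << :stop
worker.join # only so this short script can inspect the result
puts "created #{user[:name]}; worker handled #{SENT.size} job(s)"
```

The caller of create_user gets its user back without waiting on the mail step, which is the whole point. A real system would put the queue outside the process so that workers on other machines can pick up the jobs.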

The main queueing system we use is Github's excellent Resque coupled with my own little resque-remote plugin. Resque-remote gives us the ability to queue a job for another service to consume.

Messaging is such an enormous topic that I'm not sure I'm the one you want to describe its ins and outs. The short of it is that in certain contexts we've found that it can make more sense to use push-based data transfer rather than pull-based. Take the user creation example: when a user is created in my User Service Application, the user service doesn't know about any other systems that may be interested that a user was created, and frankly it shouldn't care. The User Service should only be responsible to post a message (to a message service or bus) that an event occurred in the system, in this case a user was created. Once the event is messaged, user service creation can go about its merry way. Other parts of the system may be listening to the message (event) bus for user creation events and their associated data, and they will receive the data as a push. This specific messaging paradigm is usually referred to as PubSub (Publish/Subscribe). As I've already mentioned, there are many many more types of messaging patterns that can be followed.
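
The publish/subscribe idea in the user creation example can be sketched as a tiny in-process event bus. EventBus, its subscribe and publish methods, and the :user_created topic are all hypothetical names for this illustration; a production system would use a real message broker between separate services.

```ruby
# Hypothetical in-process event bus, for illustration only.
class EventBus
  def initialize
    # Each topic maps to a list of subscriber blocks.
    @subscribers = Hash.new { |hash, topic| hash[topic] = [] }
  end

  def subscribe(topic, &handler)
    @subscribers[topic] << handler
  end

  def publish(topic, payload)
    # fetch with a default avoids creating entries for unheard topics.
    @subscribers.fetch(topic, []).each { |handler| handler.call(payload) }
  end
end

bus = EventBus.new
welcomed = []
audited = []

# Two independent listeners; the publisher knows about neither of them.
bus.subscribe(:user_created) { |user| welcomed << user[:email] }
bus.subscribe(:user_created) { |user| audited << "created #{user[:id]}" }

# The "user service" only announces that the event happened.
bus.publish(:user_created, { :id => 42, :email => 'pat@example.com' })
puts welcomed.inspect
puts audited.inspect
```

Adding a third listener means one more subscribe call and no change to the code that publishes, which is the decoupling BJ is describing.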

These are just a few of the systems we've put in place to manage data transfer complexity in our SOA ecosystem. There's also another branch for data warehousing such as ETL data transfer systems like Pentaho or Jasper. The possibilities are... well, you get the idea.

The coolest part about all of this is that you can use Ruby for 100% of these so-called enterprise situations. We do. You don't have to use Java or .NET to solve "Big Boy" problems. When I first started with Ruby, I wasn't entirely sure of this, but I certainly am now.


So, you've read along this far. What do you think? How are you using Protocol Buffers? Why did you choose to go down this route?

Saturday, March 26, 2011

Ruby and Protocol Buffers, Take One and a Half

In a comment on my previous post on Protocol Buffers, Clayton O'Neill recommended trying out the java protobuf library with jruby. I'll get to that eventually, but his comment made me wonder how jruby and rubinius would do with this little test.
I fired up rvm and looped through my installed versions. Here are the results:
ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-linux]
real 3m11.857s
user 3m11.024s
sys 0m0.124s
jruby 1.5.5 (ruby 1.8.7 patchlevel 249) (2010-11-10 4bd4200) (Java HotSpot(TM) Client VM 1.6.0_24) [i386-java]
real 2m54.035s
user 2m53.355s
sys 0m0.388s
rubinius 1.1.1 (1.8.7 release 2010-11-16 JI) [i686-pc-linux-gnu]
real 1m59.693s
user 2m5.292s
sys 0m0.148s
ruby 1.9.1p378 (2010-01-10 revision 26273) [i686-linux]
real 1m50.293s
user 1m49.811s
sys 0m0.092s
I certainly wouldn't choose a ruby implementation based on this alone, but it's good to see where things stand at the get go. As I keep going with this exploration, I'll try to keep posting timing results.

Update: Evan Phoenix (@evanphx) pointed out that I was using old versions of Rubinius and JRuby. Since I'm boxed into which Ruby implementation I use (1.8.7 on our boxes at work), I wasn't thinking about keeping things up to date in my RVM installation. I've updated JRuby and Rubinius and rerun the test. The results are as follows:
jruby 1.6.0 (ruby 1.8.7 patchlevel 330) (2011-03-15 f3b6154) (Java HotSpot(TM) Server VM 1.6.0_24) [linux-i386-java]
real 1m38.390s
user 1m42.914s
sys 0m0.508s
rubinius 1.2.4dev (1.8.7 536a6eb8 yyyy-mm-dd JI) [i686-pc-linux-gnu]
real 1m58.138s
user 2m1.492s
sys 0m0.144s

Thursday, March 24, 2011

Ruby and Protocol Buffers, Take One

At work, we're moving from XML to protocol buffers.  While we're mostly a Java shop, the operations/sysadmin team I'm on does a lot of Ruby. I was interested in how we might use the same technology for some of our stuff. After a bit of looking, I found two libraries that looked mature enough to investigate:

ruby-protobuf, by MATSUYAMA Kengo (@macks_jp), was straightforward to install and use.  It has a good online tutorial and the README has all I needed to get started.
ruby-protocol-buffers, by Brian Palmer was also easy to install and use.  It seems a bit lacking in the online documentation, but does have some examples to follow.  (If Brian's name rings a bell, it might be because I interviewed him some time ago about winning a programming contest sponsored by Mozy's former incarnation.)
I started out with a very simple proto file:

package bench;

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}

I compiled this with rprotoc for ruby-protobuf and with ruby-protoc for ruby-protocol-buffers. This generated the following (which I edited lightly).  For ruby-protobuf:

### Generated by rprotoc. DO NOT EDIT!
### <proto file: bench.proto>
# package bench;
#
# message Person {
#   required string name = 1;
#   required int32 id = 2;
#   optional string email = 3;
# }
require 'protobuf/message/message'
require 'protobuf/message/enum'
require 'protobuf/message/service'
require 'protobuf/message/extend'

module Bench1
  class Person1 < ::Protobuf::Message
    defined_in __FILE__
    required :string, :name, 1
    required :int32, :id, 2
    optional :string, :email, 3
  end
end

For ruby-protocol-buffers:

#!/usr/bin/env ruby
# Generated by the protocol buffer compiler. DO NOT EDIT!

require 'protocol_buffers'

# Reload support
Object.__send__(:remove_const, :Bench2) if defined?(Bench2)

module Bench2
  # forward declarations
  class Person2 < ::ProtocolBuffers::Message; end

  class Person2 < ::ProtocolBuffers::Message
    required :string, :name, 1
    required :int32, :id, 2
    optional :string, :email, 3

    gen_methods! # new fields ignored after this point
  end
end

Then I pulled out the statistical benchmarking I wrote about a while ago (since no one else has taken the bait, maybe I should bundle up a gem for that).  Instead of quoting the whole thing at you, here are the pertinent loops.  For ruby-protobuf:

msg = Bench1::Person1.new(:name => idx.to_s,
                          :id => idx,
                          :email => idx.to_s)
msg_str = msg.serialize_to_string
msg == msg.parse_from_string(msg_str)

For ruby-protocol-buffers:

msg = Bench2::Person2.new(:name => idx.to_s,
                          :id => idx,
                          :email => idx.to_s)
msg == Bench2::Person2.parse(msg.to_s)

And here are the results:

$ rvm 1.9.2
$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [i686-linux]
$ time ruby ProtoBufBench
testing ruby-protobuff against ruby-protocol-buffer
The deviation in the deltas was 0.021731
The mean delta was 0.198301
max = 0.241761921640842 :: min = 0.154839326146633
ruby-protocol-buffer was better

real 1m50.672s
user 1m50.599s
sys 0m0.092s

$ rvm 1.8.7
$ ruby -v
ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-linux]
$ time ruby ProtoBufBench
testing ruby-protobuff against ruby-protocol-buffer
The deviation in the deltas was 0.009414
The mean delta was -2.205984
max = -2.18715483485056 :: min = -2.22481263341116
There's no statistical difference

real 3m8.131s
user 3m7.996s
sys 0m0.056s

I didn't try compiling the C extension for ruby-protocol-buffers, and I haven't tried any more involved .proto files yet.  I'll work on those in the next couple of days and post results as I see them.
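
For anyone wondering what the "mean delta" and "deviation in the deltas" lines in the output above are measuring, here is a rough sketch of that kind of calculation. It is not the actual benchmarking script: delta_stats is a hypothetical helper, the sample timings are invented, and using the sample standard deviation is an assumption. The rule the real script uses to decide between "was better" and "no statistical difference" isn't shown in this post, so it is left out.

```ruby
# Hypothetical helper: summarize paired timing differences (a - b).
# times_a and times_b are per-run timings, in seconds, for two libraries.
def delta_stats(times_a, times_b)
  deltas = times_a.zip(times_b).map { |a, b| a - b }
  raise ArgumentError, 'need at least two paired runs' if deltas.size < 2

  mean = deltas.inject(0.0) { |sum, d| sum + d } / deltas.size
  # Sample variance (n - 1 in the denominator); an assumption for this sketch.
  variance = deltas.inject(0.0) { |sum, d| sum + (d - mean)**2 } / (deltas.size - 1)

  { :mean => mean,
    :deviation => Math.sqrt(variance),
    :max => deltas.max,
    :min => deltas.min }
end

# Made-up timings purely to exercise the helper.
stats = delta_stats([1.0, 2.0, 3.0], [0.5, 1.0, 1.5])
puts "The deviation in the deltas was %f" % stats[:deviation]
puts "The mean delta was %f" % stats[:mean]
puts "max = #{stats[:max]} :: min = #{stats[:min]}"
```

In practice the two arrays would be filled by timing each library's loop over many runs, for instance with Benchmark.realtime from the standard library.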