Wednesday, July 26, 2006

RubyInline, Making Making Things Faster Easier

Today on ruby-talk, I caught a thread called "For Performance, Write it in C". I'd like to present a slight amendment to that thesis. Let me explain.

It all started with an email at work. Someone was passing around a bunch of prime number printers in various languages (C, Java, C#, Perl, and Python). They all used the same (ugly) algorithm, and were supposed to show just how 'performant the languages were. Since I'm the local Ruby evangelist, I was asked to write a Ruby version. Here's what I came up with (warning, ugly algorithm ahead):


#!/usr/bin/ruby


for num in 1..10_000 do
  is_prime = 1
  for x in 2..(num - 1) do
    if (num % x == 0) 
      is_prime = x
      break
    end
  end
  if is_prime == 1
    puts "#{num} is a prime number"
  else
    puts "#{num} equals #{is_prime} * #{num/is_prime}"
  end
end

How fast is it? Well, time says:


$ time ./primes.rb > /dev/null

real    0m2.905s
user    0m2.716s
sys     0m0.004s
$

Certainly, nothing to write home about, but not too far from Perl or Python either.

Wanting to improve it, and not being able to touch the algorithm (we want to be comparing apples to quinces after all, not apples to oranges). I know my only hope is to find the bottleneck(s) and rewrite it (them?) in C. My first step is to grab Ruby's profiler and see what it says (oh, by the way, I reduced the value of num to 100 so that this would complete in my lifetime ... the profiler is slow).


$ ruby -rprofile primes.rb > /dev/null
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 63.64     0.14      0.14      101     1.39     3.56  Range#each
 13.64     0.17      0.03     1133     0.03     0.03  Fixnum#%
  9.09     0.19      0.02     1233     0.02     0.02  Fixnum#==
  4.55     0.20      0.01      100     0.10     0.20  Kernel.puts
  4.55     0.21      0.01      200     0.05     0.05  IO#write
  4.55     0.22      0.01      248     0.04     0.04  Fixnum#to_s
  0.00     0.22      0.00       74     0.00     0.00  Fixnum#/
  0.00     0.22      0.00      100     0.00     0.00  Fixnum#-
  0.00     0.22      0.00        1     0.00   220.00  #toplevel
$

Which tells me that most of my time is spent in each (well, it's actually spent in the block I sent to each. It's taking a whopping 1.39 msec per call, compared to .0X msec for everything else. What would happen if I rewrote just that block?

Enter RubyInline (A great tool written by zenspider and Eric Hodel). I'm not a C wiz by any stretch of the imagination, but this stuff is pretty easy to bang out. My new code looks like this:


#!/usr/bin/ruby

require "rubygems"
require "inline"

class Primes
  inline do |builder|
    builder.c '
    int prime(int num) {
      int x; 
      for (x = 2; x < (num - 1) ; x++) {
        if (num == 2) {
          return 1;
        }
        if (num % x == 0) {
          return x;
        }
      }
      return 1;
    }'
  end
end

p = Primes.new

for num in 2..10_000 do
  is_prime = p.prime(num)
  if is_prime == 1
    puts "#{num} is a prime number"
  else
    puts "#{num} equals #{is_prime} * #{num/is_prime}"
  end
end

Not too ugly. At least I can still see what's happening. The main loop being in Ruby helps a lot too (especially if this were a much bigger program). How much of a difference does it make? time says:


$ time ./cprimes.rb > /dev/null

real    0m0.328s
user    0m0.288s
sys     0m0.020s

An order of magnitude improvement. Not too shabby.

What's the lesson here? Optimize what you need to (and only what you need to), profile find out what that is (it may be slow, but profiling is your friend), and use the right tools (rewriting a bit of code with RubyInline is way better than rewriting the whole app in C).

UPDATE


I guess I should include a link to RubyInline.

21 comments:

Anonymous said...

Very nice demonstration of rubyinline. Thanks for writing it. And for profiling you might want to use ruby-prof next time. gem install ruby-prof. It is a much faster profiler.

gnupate said...

Hi Ezra,
you're right ruby-prof (and ZenProfile) are great options to profile faster. I was hoping not to stick with the standard library (other than RubyInline) -- and I didn't bother to include a link to that one.

Anonymous said...

What would happen if you passed a Bignum to that function?

(btw. the missing Math.sqrt hurts; *that* would be quite an improvement };-)

Charles Oliver Nutter said...

RubyInline works great for this kind of thing, but it's not for all problems. For example, as one commenter noted, you aren't really using Ruby types anymore. You're using C primitives. If you had used a different numeric type like Bignum, it would not work correctly. If you had overridden Fixnum#+ to increment your bank account on each call, that would no longer happen. RubyInline is great for embedding C code directly into your Ruby app, but that's all it really is...C code embedded in a Ruby app. Useful for some things, but not useful for others.

For what it's worth we'll probably provide some sort of RubyInline functionality in JRuby, so you can have all the joy of embedding whereever you go.

Anonymous said...

One thing I miss in all these RubyInline examples is how to access a Ruby Array.

It's pretty common that the slow part of my program is hitting 10000+ elements in an array; and pretty rare that it's simply working with 2 integers.

Any examples like that out there?

gnupate said...

mfp,
Charles sort of answered your question. In the case of the C I've written it will blow up (I took the easy way out). On the other hand, I can access Ruby objects through C and that should let me "do the right thing". I'll see what I can come up with later today.

Charles,
JRubyInline will be interesting. I assume it would only be possible to inline Java. Would that be right?

Anonymous,
I'll see what I can do about putting together an array example for you.

Anonymous said...

Is the "inlined" C code in the example compiled to binary and somehow linked every time the ruby script is ran? Inlining C code in this way could present a higher performance hit if the C code is larger?

Anonymous said...

Just wondering, is this still classed as a Ruby version of the the prime number printer or a C version? I know, if I worry about this type of thing then clearly I have too much time on my hands ;-)

Anonymous said...

charles: you can always call through to ruby methods and use VALUES. the real cost of ruby is method dispatch.

anonymous1: read the pickaxe or sift through ruby.h and intern.h.

anonymous2: It is compiled and linked the first time (and on subsequent changes to the C code). The cost of the gcc toolchain is negligible for plain C. I've never seen a case where inlining code took longer than the original.

anonymous3: who cares? the point is to get something done and make it fast enough for whatever requirements you have. what label you put on that is up to you. I label it pragmatic.

Anonymous said...

re C/Ruby, the original request was to write a Ruby version. This is a C version inside a Ruby wrapper. Interesting but maybe not a valid comparison against prime printers in other languages. That's assuming the point of the exercise was to see how it would be written in Ruby and/or for speed comparison.

gnupate said...

re: C/Ruby
Anonymous, you kind of missed the point I was trying to make (which probably means I wasn't clear enough). This is a simple case (though real) where RubyInline allowed me to take a Ruby application that was too slow and speed it up by an order of magnitude.

Imagine a much larger Ruby application (thousands of lines, lots of classes, and many methods). I could do the same thing. Profile it to find the one or two methods that are just too slow, try to optimize them algorithmicly, and finally rewrite them in C with RubyInline.

At that point I might have 1500 lines of Ruby and 30 of C. I'd say that's still a Ruby application, wouldn't you?

Anonymous said...

you should mention that the c code gets even faster if you run it a second time. On my machine it ran only a third of the first time run. I guessthe reason is that rubyinline needs time to compile the code on the first run.

Carl Youngblood said...

One problem that I'm trying to wrap my brain around with Ruby Inline is this: what if your algorithms work with ruby data structures? Is it possible to use RubyInline to access Ruby objects? How would you use RubyInline for a tree traversal type of an algorithm, for example?

gnupate said...

Cayblood,
You might want to look at:
RubyInline, Going a Bit Further for an example of accessing Ruby Objects.

Anonymous said...

So you're saying that neither you nor your coworkers knew that you only need to test up to floor(sqrt(n)) for factors?

n-1 is useless to test as a factor because the gcd of n & n-1 can only be 1. Same thing is true with n-2. It means that 2 is a factor. (and is really only true of 4, anyway.) But that will be found out by testing 2.

Since no test could disprove prime for less than 2, the maximum number that could ever need to be tested is n/2. But if n/2 is an integer, than testing for 2 finds that out. Same thing for n/3, n/4, ....

But if 4 succeeds, then 2 should have, so you never need to test 4 or 6, or 8, or 9...just the primes.

It's just a warning about naive algorithms slowing things down as well.

gnupate said...

anonymous, I know, and they knew that you only have to test through the square root of the prime candidate.

I don't know where they found the initial set of programs, but that's the way they were written. Since that's the algorithm I was handed, it's the one I was stuck using. Writing a faster version in pure Ruby wouldn't have been a reasonable comparison.

Brian said...

What were the results of the test in each of the other languages?

Ekkaladevi said...

Your algorithm of finding prime numbers in ruby can be optimized by caching the prime numbers.
Here's an example, it took 3.06sec for 100K.
class MyPrimes
def initialize
@primes = [2, 3, 5]
@is_prime = false
end

def add(n)
@primes << n
end

def build(n)

n.times { |x|
next if (x == 0 or x == 1)
@is_prime = true

@primes.each { |y|
if x%y == 0 then
@is_prime = false
break
end
}
@primes << x if @is_prime
}
end
end

p = MyPrimes.new
t1 = Time.new
p.build(100000)
t2 = Time.new() -t1
puts "#{t2.to_s}"

gnupate said...

Ekkaladevi, yes. As mentioned in comments above and in the article, I was using the same (broken) algorithm that the non-ruby implementations used.

This program made for a nice way to show off rubyinline.

Ekkaladevi said...

The example was just perfect for showing the power of using Rubyinline. Great blog!!

Anonymous said...

Hi I want to use RubyInline for making http requests in C but I am not able to see any solution anywhere