Today on ruby-talk, I caught a thread called "For Performance, Write it in C". I'd like to present a slight amendment to that thesis. Let me explain.
It all started with an email at work. Someone was passing around a bunch of prime number printers in various languages (C, Java, C#, Perl, and Python). They all used the same (ugly) algorithm, and were supposed to show just how 'performant the languages were. Since I'm the local Ruby evangelist, I was asked to write a Ruby version. Here's what I came up with (warning, ugly algorithm ahead):
#!/usr/bin/ruby
for num in 1..10_000 do
is_prime = 1
for x in 2..(num - 1) do
if (num % x == 0)
is_prime = x
break
end
end
if is_prime == 1
puts "#{num} is a prime number"
else
puts "#{num} equals #{is_prime} * #{num/is_prime}"
end
end
How fast is it? Well, time says:
$ time ./primes.rb > /dev/null
real 0m2.905s
user 0m2.716s
sys 0m0.004s
$
Certainly, nothing to write home about, but not too far from Perl or Python either.
Wanting to improve it, and not being able to touch the algorithm (we want to be comparing apples to quinces after all, not apples to oranges). I know my only hope is to find the bottleneck(s) and rewrite it (them?) in C. My first step is to grab Ruby's profiler and see what it says (oh, by the way, I reduced the value of num to 100 so that this would complete in my lifetime ... the profiler is slow).
$ ruby -rprofile primes.rb > /dev/null
% cumulative self self total
time seconds seconds calls ms/call ms/call name
63.64 0.14 0.14 101 1.39 3.56 Range#each
13.64 0.17 0.03 1133 0.03 0.03 Fixnum#%
9.09 0.19 0.02 1233 0.02 0.02 Fixnum#==
4.55 0.20 0.01 100 0.10 0.20 Kernel.puts
4.55 0.21 0.01 200 0.05 0.05 IO#write
4.55 0.22 0.01 248 0.04 0.04 Fixnum#to_s
0.00 0.22 0.00 74 0.00 0.00 Fixnum#/
0.00 0.22 0.00 100 0.00 0.00 Fixnum#-
0.00 0.22 0.00 1 0.00 220.00 #toplevel
$
Which tells me that most of my time is spent in each (well, it's actually spent in the block I sent to each. It's taking a whopping 1.39 msec per call, compared to .0X msec for everything else. What would happen if I rewrote just that block?
Enter RubyInline (A great tool written by zenspider and Eric Hodel). I'm not a C wiz by any stretch of the imagination, but this stuff is pretty easy to bang out. My new code looks like this:
#!/usr/bin/ruby
require "rubygems"
require "inline"
class Primes
inline do |builder|
builder.c '
int prime(int num) {
int x;
for (x = 2; x < (num - 1) ; x++) {
if (num == 2) {
return 1;
}
if (num % x == 0) {
return x;
}
}
return 1;
}'
end
end
p = Primes.new
for num in 2..10_000 do
is_prime = p.prime(num)
if is_prime == 1
puts "#{num} is a prime number"
else
puts "#{num} equals #{is_prime} * #{num/is_prime}"
end
end
Not too ugly. At least I can still see what's happening. The main loop being in Ruby helps a lot too (especially if this were a much bigger program). How much of a difference does it make? time says:
$ time ./cprimes.rb > /dev/null
real 0m0.328s
user 0m0.288s
sys 0m0.020s
An order of magnitude improvement. Not too shabby.
What's the lesson here? Optimize what you need to (and only what you need to), profile find out what that is (it may be slow, but profiling is your friend), and use the right tools (rewriting a bit of code with RubyInline is way better than rewriting the whole app in C).
UPDATE
I guess I should include a link to RubyInline.
21 comments:
Very nice demonstration of rubyinline. Thanks for writing it. And for profiling you might want to use ruby-prof next time. gem install ruby-prof. It is a much faster profiler.
Hi Ezra,
you're right ruby-prof (and ZenProfile) are great options to profile faster. I was hoping not to stick with the standard library (other than RubyInline) -- and I didn't bother to include a link to that one.
What would happen if you passed a Bignum to that function?
(btw. the missing Math.sqrt hurts; *that* would be quite an improvement };-)
RubyInline works great for this kind of thing, but it's not for all problems. For example, as one commenter noted, you aren't really using Ruby types anymore. You're using C primitives. If you had used a different numeric type like Bignum, it would not work correctly. If you had overridden Fixnum#+ to increment your bank account on each call, that would no longer happen. RubyInline is great for embedding C code directly into your Ruby app, but that's all it really is...C code embedded in a Ruby app. Useful for some things, but not useful for others.
For what it's worth we'll probably provide some sort of RubyInline functionality in JRuby, so you can have all the joy of embedding whereever you go.
One thing I miss in all these RubyInline examples is how to access a Ruby Array.
It's pretty common that the slow part of my program is hitting 10000+ elements in an array; and pretty rare that it's simply working with 2 integers.
Any examples like that out there?
mfp,
Charles sort of answered your question. In the case of the C I've written it will blow up (I took the easy way out). On the other hand, I can access Ruby objects through C and that should let me "do the right thing". I'll see what I can come up with later today.
Charles,
JRubyInline will be interesting. I assume it would only be possible to inline Java. Would that be right?
Anonymous,
I'll see what I can do about putting together an array example for you.
Is the "inlined" C code in the example compiled to binary and somehow linked every time the ruby script is ran? Inlining C code in this way could present a higher performance hit if the C code is larger?
Just wondering, is this still classed as a Ruby version of the the prime number printer or a C version? I know, if I worry about this type of thing then clearly I have too much time on my hands ;-)
charles: you can always call through to ruby methods and use VALUES. the real cost of ruby is method dispatch.
anonymous1: read the pickaxe or sift through ruby.h and intern.h.
anonymous2: It is compiled and linked the first time (and on subsequent changes to the C code). The cost of the gcc toolchain is negligible for plain C. I've never seen a case where inlining code took longer than the original.
anonymous3: who cares? the point is to get something done and make it fast enough for whatever requirements you have. what label you put on that is up to you. I label it pragmatic.
re C/Ruby, the original request was to write a Ruby version. This is a C version inside a Ruby wrapper. Interesting but maybe not a valid comparison against prime printers in other languages. That's assuming the point of the exercise was to see how it would be written in Ruby and/or for speed comparison.
re: C/Ruby
Anonymous, you kind of missed the point I was trying to make (which probably means I wasn't clear enough). This is a simple case (though real) where RubyInline allowed me to take a Ruby application that was too slow and speed it up by an order of magnitude.
Imagine a much larger Ruby application (thousands of lines, lots of classes, and many methods). I could do the same thing. Profile it to find the one or two methods that are just too slow, try to optimize them algorithmicly, and finally rewrite them in C with RubyInline.
At that point I might have 1500 lines of Ruby and 30 of C. I'd say that's still a Ruby application, wouldn't you?
you should mention that the c code gets even faster if you run it a second time. On my machine it ran only a third of the first time run. I guessthe reason is that rubyinline needs time to compile the code on the first run.
One problem that I'm trying to wrap my brain around with Ruby Inline is this: what if your algorithms work with ruby data structures? Is it possible to use RubyInline to access Ruby objects? How would you use RubyInline for a tree traversal type of an algorithm, for example?
Cayblood,
You might want to look at:
RubyInline, Going a Bit Further for an example of accessing Ruby Objects.
So you're saying that neither you nor your coworkers knew that you only need to test up to floor(sqrt(n)) for factors?
n-1 is useless to test as a factor because the gcd of n & n-1 can only be 1. Same thing is true with n-2. It means that 2 is a factor. (and is really only true of 4, anyway.) But that will be found out by testing 2.
Since no test could disprove prime for less than 2, the maximum number that could ever need to be tested is n/2. But if n/2 is an integer, than testing for 2 finds that out. Same thing for n/3, n/4, ....
But if 4 succeeds, then 2 should have, so you never need to test 4 or 6, or 8, or 9...just the primes.
It's just a warning about naive algorithms slowing things down as well.
anonymous, I know, and they knew that you only have to test through the square root of the prime candidate.
I don't know where they found the initial set of programs, but that's the way they were written. Since that's the algorithm I was handed, it's the one I was stuck using. Writing a faster version in pure Ruby wouldn't have been a reasonable comparison.
What were the results of the test in each of the other languages?
Your algorithm of finding prime numbers in ruby can be optimized by caching the prime numbers.
Here's an example, it took 3.06sec for 100K.
class MyPrimes
def initialize
@primes = [2, 3, 5]
@is_prime = false
end
def add(n)
@primes << n
end
def build(n)
n.times { |x|
next if (x == 0 or x == 1)
@is_prime = true
@primes.each { |y|
if x%y == 0 then
@is_prime = false
break
end
}
@primes << x if @is_prime
}
end
end
p = MyPrimes.new
t1 = Time.new
p.build(100000)
t2 = Time.new() -t1
puts "#{t2.to_s}"
Ekkaladevi, yes. As mentioned in comments above and in the article, I was using the same (broken) algorithm that the non-ruby implementations used.
This program made for a nice way to show off rubyinline.
The example was just perfect for showing the power of using Rubyinline. Great blog!!
Hi I want to use RubyInline for making http requests in C but I am not able to see any solution anywhere
Post a Comment