Thursday, January 31, 2008

The Rubinius Debugger

When I read the posting about Cylon, the Ruby debugger for Visual Studio by SapphireSteel, I immediately thought about the Rubinius debugger highlighted in InfoQ recently. I decided to give it a try and see what I came up with. I wasn’t able to replicate Dermot’s tests exactly (I don’t know where he set his breakpoints, and Rubinius doesn’t support everything he tested), but here are my initial results:

Line Based
Rubinius no debugger    Rubinius w/debugger, no breakpoints    Ruby
2.039 seconds           2.399 seconds                          0.898 seconds

Call Based
Rubinius no debugger    Rubinius w/debugger, no breakpoints    Ruby
13.441 seconds          13.999 seconds                         4.171 seconds

Rubinius still has a long road ahead in terms of general execution speed, but it’s pretty exciting to see that its debugger is so fast. The Cylon debugger (which smoked the other options in their testing) was 25% slower for line based debugging and 306% slower for call based. By comparison, Rubinius’ debugger added only about 18% for line based and 4% for call based. I’m guessing that had the line based test run through more iterations, the Rubinius debugger would have done even better, since I think most of its time is spent in setup.
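For anyone who wants to check the arithmetic, here is a small sketch that derives the debugger overhead from the Rubinius timings in the tables above. The constant and method names are made up for this illustration.

```ruby
# Timings in seconds, copied from the Rubinius columns of the tables above.
TIMINGS = {
  "line based" => { plain: 2.039,  debugger: 2.399 },
  "call based" => { plain: 13.441, debugger: 13.999 }
}

# Extra run time added by loading the debugger, as a percentage of the
# run without it.
def overhead_percent(plain, with_debugger)
  (with_debugger - plain) / plain * 100.0
end

OVERHEADS = TIMINGS.map do |name, t|
  [name, overhead_percent(t[:plain], t[:debugger]).round(1)]
end.to_h

OVERHEADS.each do |name, percent|
  puts "#{name}: #{percent}% slower with the debugger loaded"
end
```

The line based figure comes out a little under 18% and the call based figure a little over 4%.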

Using the Rubinius debugger requires that you add a call to the debugger to the program:

debugger

def fac(n)
  lvar = n
  n == 1 ? 1 : n * fac(n-1)
  return n
end

count = 0

tstart = Time.new
0.upto(1000000) {fac(5)}               # line based
#0.upto(30000000) {count += 1}         # call based
tend = Time.new  
puts "%10.3f" % tstart
puts "%10.3f" % tend.to_f
diff = tend - tstart
puts "%10.3f" % diff.to_f

When you run this, you’ll get something like this:

$ ../shotgun/rubinius debug_fac.rb
[Debugger activated]

debug_fac.rb:1 [IP:5]

rbx:debug> c
[Resuming program]
1201801369.694
1201801372.001
     2.307
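If you would rather not do the Time arithmetic by hand, the same kind of measurement can be sketched with Ruby's standard Benchmark module. This is only an illustration and not the script used for the numbers above: the iteration counts are scaled down, the constant names are invented here, and fac is written to return the actual factorial.

```ruby
require 'benchmark'

# Recursive factorial; unlike the listing above, it returns the product.
def fac(n)
  n <= 1 ? 1 : n * fac(n - 1)
end

# Iteration counts are scaled down so the sketch finishes quickly.
CALL_ITERATIONS = 100_000
LOOP_ITERATIONS = 1_000_000

count = 0

# Benchmark.realtime returns the elapsed wall-clock time in seconds as a Float.
fac_seconds  = Benchmark.realtime { CALL_ITERATIONS.times { fac(5) } }
loop_seconds = Benchmark.realtime { LOOP_ITERATIONS.times { count += 1 } }

puts "fac(5) x #{CALL_ITERATIONS}: %10.3f seconds" % fac_seconds
puts "count += 1 x #{LOOP_ITERATIONS}: %10.3f seconds" % loop_seconds
```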

There are several commands available:

command   explanation
h         get a listing of commands
b         list breakpoints
b         set a breakpoint at the start of the method
n         step to the next line
ni        step to the next VM instruction
c         continue execution until the next breakpoint
l         list source code around the current breakpoint
d         decode VM bytecode around the current breakpoint
v         display local variables and their values
vs        display the contents of the VM stack

Some of these are pretty cool. Take a look at these examples:

rbx:debug> vs
     VM stack [0-5]:
  sp => 0: TrueClass  true
  fp => 1: Class      Rubinius::VM
        2: String     "" 
        3: String     "debug_fac.rb" 
        4: String     "debug_fac.rbc" 
        5: NilClass   nil
           ...

or:

rbx:debug> d
   Bytecode instructions [5-25] in compiled method script:
           ...
           # line 1
     0005: pop
           # line 3
     0006: push_literal  #
     0008: push_literal  :fac
     0010: push_self
     0011: send_stack    :add_method, 2
     0014: pop
           # line 9
  => 0015: meta_push_0
     0016: set_local     :count
     0018: pop
           # line 11
     0019: push_const    :Time
     0021: send_method   :new
     0023: set_local     :tstart
     0025: pop
           ...

which compares to:

rbx:debug> l
   Source lines [1-18] in debug_fac.rb:
      1: debugger
      2:
      3: def fac(n)
      4:   lvar = n
      5:   n == 1 ? 1 : n * fac(n-1)
      6:   return n
      7: end
      8:
  =>  9: count = 0
     10:
     11: tstart = Time.new
     12: 0.upto(1000000) {fac(5)}               # line based
     13: #0.upto(30000000) {count += 1}         # call based
     14: tend = Time.new
     15: puts "%10.3f" % tstart
     16: puts "%10.3f" % tend.to_f
     17: diff = tend - tstart
     18: puts "%10.3f" % diff.to_f

Like a lot of things in Rubinius, the debugger isn’t quite ready for primetime but it sure shows a lot of promise.

Wednesday, January 30, 2008

MWRC Attendees Interview Josh Susser

Being involved in organizing and putting on a regional Ruby conference means that you get to peek behind the curtain and see some pretty interesting things. One of the things that stands out to me so far is the distance people are willing to travel to come to our little conference. So far, we have attendees from California, Florida, Missouri, New York, Oregon, Virginia, and Washington.

Among the attendees are some pretty interesting names as well. Josh Susser’s name caught my eye. Since he’s both well known in the Ruby and Rails communities and coming in from outside the Mountain West, I asked him a few questions about regional conferences and the MountainWest RubyConf. Here’s what he had to say.


Why are you planning on coming to MountainWest RubyConf this year?

I heard a lot of good things about MWRC last year and was sorry I missed it – especially the implementors summit. I want to check out a couple regional confs this year, and MWRC looks like one of the better ones.

Which of the talks are you most looking forward to?

The Ruby Internals, DataMapper and CouchDB talks.

What will make this conference worth your time and money?

I am not a werewolf. No really… learning something new, making connections, and of course the sexy t-shirts.

What value do you see regional Ruby conferences providing?

There’s only so much room in the program at RubyConf and even RailsConf, so the regionals give other venues to present stuff. Also, spreading things out over the calendar means that you get to see new things throughout the year. Beyond that, they are valuable for local residents to define and energize their regional Ruby communities.

What do you see as the main differences between a regional Ruby conference and RubyConf?

The big conferences have a lot to choose from, but sometimes that’s a drawback. I like a nice, intimate, single-track conference so I don’t feel like I’m missing anything. Also, I get to talk to more people. Yes, that’s how it works – less is more.

Tuesday, January 29, 2008

Ruby Concurrency with Actors

Tony Arcieri (read an interview with Tony here) recently posted an announcement about his new Revactor library, which provides actor style concurrency built atop Ruby 1.9’s new Fiber technology. Since Fibers and Actors haven’t been a part of the Ruby lexicon, I wanted to put some information together to help myself (and anyone else who wants to crib off my notes) get up to speed. Update! You might also want to check out my new Questions Five Ways on Concurrency post.

To start with, I figured it was best to begin with the basics. Quoting from Revactor’s Philosophy page:

The basic operation of an Actor is easy to understand: like a thread, it runs concurrently with other Actors. However, unlike threads it is not pre-emptable. Instead, each Actor has a mailbox and can call a routine named ‘receive’ to check its mailbox for new messages. The ‘receive’ routine takes a filter, and if no messages in an Actor’s mailbox matches the filter, the Actor sleeps until it receives new messages, at which time it’s rescheduled for execution.

Well, that’s a bit of a naive description. In reality the important part about Actors is that they cannot mutate shared state simultaneously. That means there are no race conditions or deadlocks because there are no mutexes, conditions, and semaphores, only messages and mailboxes.
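To make the mailbox-and-filter idea concrete, here is a minimal sketch in plain Ruby. It is not Revactor's API: the Mailbox class and its deliver and receive methods are hypothetical names invented for this illustration. It also uses an ordinary pre-emptive Thread with a Mutex and ConditionVariable instead of Fibers, so it shows only the receive semantics described in the quote: sleep until a message matching the filter arrives.

```ruby
# Hypothetical mailbox, for illustration only (not part of Revactor).
class Mailbox
  def initialize
    @messages = []
    @lock     = Mutex.new
    @arrived  = ConditionVariable.new
  end

  # Put a message in the mailbox and wake up a waiting receiver.
  def deliver(message)
    @lock.synchronize do
      @messages << message
      @arrived.signal
    end
  end

  # Remove and return the first message the filter block accepts.
  # If nothing matches, sleep until another message is delivered.
  def receive(&filter)
    @lock.synchronize do
      loop do
        index = @messages.index(&filter)
        return @messages.delete_at(index) if index
        @arrived.wait(@lock)
      end
    end
  end
end

mailbox = Mailbox.new

# The worker only wants Integers, so the String stays in the mailbox.
worker = Thread.new { mailbox.receive { |m| m.is_a?(Integer) } }

mailbox.deliver("ignored for now")
mailbox.deliver(42)

reply = worker.value
puts "worker received #{reply}"
```

Messages the filter rejects are left in place, so a later receive with a different filter can still pick them up.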

Ok, that’s not so bad. How well can this work in Ruby? I thought I’d go to the horse’s mouth for this one. Joe Armstrong, creator of the Erlang language, which is built on the Actor model, had this to say:

Difficult to say – we always said that performance of a concurrent system depends upon three critical times

  1. process spawning
  2. context switching
  3. message passing times

If you can make these fast it might work.

The main problem with concurrency is isolating processes from each other: one process must not be able to corrupt another process.

Even though message passing, process spawning, etc. seem simple, there is a lot of junk going on in the background that you are not aware of – implementing this is of the order of complexity of implementing an operating system – i.e., there is process scheduling, memory management, garbage collection, etc.

It’s rather easy to add a layer to an existing system that can give you a few thousand of processes but as you get to the hundreds of thousands things get tricky – the overheads per process must be small and so on.

This is tricky stuff – I’m sure you can make it work – but making it fast needs a lot of thinking …
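As a rough way to get a feel for the three times Joe lists, here is a sketch that measures them for bare Ruby Fibers using the standard Benchmark module. It says nothing about Revactor itself; the fiber count is arbitrary and the variable names are made up for this example.

```ruby
require 'benchmark'

FIBER_COUNT = 1_000   # arbitrary size chosen for this sketch

# 1. Spawning: create the fibers. Each one echoes back whatever it is sent.
fibers = nil
spawn_seconds = Benchmark.realtime do
  fibers = Array.new(FIBER_COUNT) do
    Fiber.new do |message|
      loop { message = Fiber.yield(message) }
    end
  end
end

# 2 and 3. Context switching and message passing: resume every fiber with a
# message and collect the value it yields back to us.
replies = nil
round_trip_seconds = Benchmark.realtime do
  replies = fibers.map { |fiber| fiber.resume(:ping) }
end

puts "spawned #{FIBER_COUNT} fibers in %.6f seconds" % spawn_seconds
puts "#{replies.size} round trips in %.6f seconds" % round_trip_seconds
```

The absolute numbers will vary by machine and Ruby version; the point is only that each of the three costs can be measured separately.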

MenTaLguY has spent a lot of time working with Ruby concurrency. He also weighed in on Actors’ place in Ruby:

I think in the long-term actors are likely to become the major means of distributed concurrent programming in Ruby. At the very least I think distributed actors are likely to displace DRb for the things it is used for today.

I don’t know that actors will necessarily dominate for more “local” problems, however. I think the actor model will face competition from e.g. the join calculus, and other approaches like concurrent logic programming which can offer more natural solutions to some problems. Transactional memory will also have a place, although after writing several STM implementations I am not the fan of it that I used to be.

Paul Brannon (a long time Rubyist and all-around good guy) told me that he thinks Actor implementations are important for Ruby:

I think the industry is about to make a shift toward erlang (actor)-style concurrency, because it makes true concurrency transparent and easy for the user. Home computers are being shipped with more and more cores these days, and pretty soon, taking full advantage of the hardware available will necessarily imply concurrent programming.

I also asked around for recommendations about good resources for learning about Actor based concurrency (and concurrency in general). People were unanimous in suggesting that interested programmers spend some time with Erlang to get a handle on it. The book Programming Erlang also got high marks. Ola Bini also recommended Java Concurrency in Practice for those with a Java bent/background.

However you choose to do it, I recommend spending some time with Actors. It looks like a good way to get more out of your programming. Let me just go back to Revactor’s Philosophy page for a second:

Actors rule. You really should use them in your programs. Especially if your programs do a lot of stuff at once. Seriously, whatever you’re doing besides Actors, it probably sucks. Actors are this awesome panacea that will make it all better, I swear. In conclusion: use them, do it!

There, are you convinced now?

Interview With Revactor Developer Tony Arcieri

With the recent release of his Revactor library, I wanted to talk with Tony Arcieri about Ruby, Actors, and Revactor. He was kind enough to sit down for a short interview. Here’s what we talked about.


How did you get started with Ruby?

Tony: Ruby is a language some roommates of mine were using for years and kept raving to me about. Unfortunately, I was a performance-obsessed C programmer and couldn’t really get past the whole “Ruby is slow” stigma. Then in early 2005 Rails started generating a lot of buzz, and I got sucked into using Ruby for web development. A few years later I can look back wondering how I could stand programming in C for so long.

Revactor is an implementation of Actors for Ruby 1.9. Is there a reason you targeted 1.9 instead of Rubinius (with tasks) or another implementation?

Tony: Ruby 1.8 already supports Actors with the Omnibus Concurrency Library and Rubinius supports them in its standard library. I’m not aware of an Actor model implementation for JRuby but it’d be pretty easy to do with a Scala-like thread pool. I chose Ruby 1.9 because I felt that, for the time being, it’s the most practical and performant platform for writing network applications with the Actor model. Revactor is built on a number of Ruby 1.9-specific features, specifically Fibers which provide the underlying concurrency primitive. However, Revactor is also built on top of an event library called Rev whose feature set was tailored for implementing high performance networking within the Actor model (although it can be used as a general purpose event library if you so desire). Ruby 1.9 contains several features which made writing this event library quick and easy with minimal C code. These include things like support for blocking system calls and non-blocking I/O.

However, I definitely feel that down the road Rubinius will be much better suited. Rubinius already supports multiple shared-nothing virtual machines which each run in their own hardware thread and can communicate over an internal message bus. Using that in conjunction with Actors, you can do scatter/gather distributed programming (MapReduce is probably the most famous example of this) which can run a copy of a job on each VM (and thus on its own CPU core) then reduce the results to the final output. With this approach, your program runs N times faster on N CPUs.
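The scatter/gather shape is easy to sketch in plain Ruby. The example below uses ordinary Threads rather than Rubinius VMs or Revactor Actors, and the method name is made up for illustration. On MRI, threads generally do not execute Ruby code in parallel, so treat this as a picture of the structure and not of the speedup.

```ruby
# Scatter: split the input into one slice per worker and process each slice
# in its own thread. Gather: collect the partial results and reduce them.
def sum_of_squares(numbers, workers)
  slice_size = [(numbers.size / workers.to_f).ceil, 1].max

  partials = numbers.each_slice(slice_size).map do |slice|
    Thread.new { slice.inject(0) { |sum, n| sum + n * n } }
  end

  # Thread#value waits for the thread to finish and returns its result.
  partials.map(&:value).inject(0, :+)
end

TOTAL = sum_of_squares((1..100).to_a, 4)
puts TOTAL
```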

What do you think about some of the other approaches to concurrency? (See MenTaLguY’s page for example.)

Tony: Many of the techniques there can go hand in hand with Actors (futures, for example). As far as non-Actor approaches, my favorite is probably join calculus as seen in languages like JoCaml.

MenTaLguY has long been involved in concurrency in Ruby. I see that you’re using his Case gem in Revactor. What other influence has he had on Revactor?

Tony: MenTaLguY has been very helpful in smoothing out the API design and will hopefully be making Revactor thread safe in the near future. He’s pointed out solutions to problems which, in retrospect, were pretty obvious but I just didn’t see at the time. We’re trying to put together something of a standard Actor API and protocol such that a program written using Actors in Ruby isn’t tied to a particular implementation and can run on Omnibus, Rubinius, or Revactor. We’ll also hopefully be putting out a cross-compatible gem which bundles up a lot of the standard Actor functionality so there aren’t 3 different implementations of the same thing floating around.

You’ve got a great introduction to Actors up at your Philosophy page, but it’s a little light on code. Could you give us an example of Revactor at work?

Tony: There’s a number of code examples available on http://doc.revactor.org which go a bit more in depth as to how Actors send and receive messages, but here’s an example of an echo server:


# An example echo server, written using Revactor::TCP
# This implementation creates a new actor for each
# incoming connection.

require 'revactor'

HOST = 'localhost'
PORT = 4321

# Before we can begin using actors we have to call Actor.start
# Future versions of Revactor will hopefully eliminate this
Actor.start do

  # Create a new listener socket on the given host and port
  listener = Revactor::TCP.listen(HOST, PORT)
  puts "Listening on #{HOST}:#{PORT}"

  # Begin receiving connections
  loop do

    # Accept an incoming connection and start a new Actor
    # to handle it
    Actor.spawn(listener.accept) do |sock|
      puts "#{sock.remote_addr}:#{sock.remote_port} connected"

      # Begin echoing received data
      loop do
        begin
          # Write everything we read
          sock.write sock.read
        rescue EOFError
          puts "#{sock.remote_addr}:#{sock.remote_port} disconnected"

          # Break (and exit the current actor) if the connection
          # is closed, just like with a normal Ruby socket
          break
        end
      end
    end
  end
end

This doesn’t demonstrate inter-Actor messaging (although it’s doing it behind the scenes). However, what you do see is that there’s very little disconnect between using Revactor and writing a traditional threaded network server. If you’ve written programs in the past using Thread and Queue, then moving over to Revactor will be easy, and you’ll find Actor mailboxes to be a much more powerful way of processing messages.
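For comparison, here is a minimal sketch of the traditional Thread and Queue style mentioned above, using only the Ruby standard library. The variable names and the :done marker are choices made for this example. Note that a Queue hands messages over strictly in arrival order, whereas an Actor's receive takes a filter.

```ruby
require 'thread'   # Queue lives here on older Rubies; harmless on newer ones

inbox  = Queue.new
outbox = Queue.new

# The worker blocks on inbox.pop until a message arrives, echoes it back,
# and stops when it sees the :done marker.
worker = Thread.new do
  loop do
    message = inbox.pop
    break if message == :done
    outbox << message
  end
end

%w[hello world].each { |line| inbox << line }
inbox << :done
worker.join

# Drain the echoed messages in the order they were produced.
echoed = []
echoed << outbox.pop until outbox.empty?
puts echoed.inspect
```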

Are there any books, blogs, or websites you’d recommend for learning more about concurrency in general or actors in particular?

Tony: Programming Erlang by language creator Joe Armstrong was immensely helpful in understanding Actor-based concurrency, and many of the ideas in Revactor are drawn directly from Erlang. Some of the Erlang portal sites such as planeterlang.org also cover concurrent programming in general, particularly with Actors.

Friday, January 25, 2008

MWRC mini-interview: Patrick Farley

Here’s another mini-interview to whet your MWRC appetite. I asked Patrick Farley of ThoughtWorks about his talk and which MWRC sessions he’s excited about. With speakers like this, how could you not register? $100 for two days of Ruby awesomeness is an incredible deal.


Your session is entitled “Ruby Internals”, why should people be excited to come see it?

Programming in Ruby leads to a lot of head-scratcher moments. “Called id for nil, which would mistakenly be 4”, anyone? In my experience, getting past the head scratching and into a nuanced understanding of what’s what in Ruby is a lot easier than one might think. The big barrier is, of course, C. The thing to keep in mind is that you don’t have to be Brian Kernighan to read and understand some basic C code, particularly when you know the domain well. The domain of Matz’s Ruby Interpreter (MRI) is the Ruby language itself, so Ruby programmers are, by definition, domain experts. The goal of my talk isn’t to turn the audience into overnight Ruby internals experts. As the books will tell you, becoming a guru in any technology takes exactly 21 days. Instead, I aim to give an advanced introduction to some key areas of Ruby internals and at the same time equip folks to do additional exploration on their own.

What’s your Ruby/RoR background?

I’ve been using Ruby since 2005, and working full time on enterprise Ruby and RoR development for close to two years now. I’m lucky enough to work for ThoughtWorks where I’ve helped to put some fairly large Ruby projects into production.

Which session are you most looking forward to seeing?

I’m jazzed about a few of them. Philippe Hanrigou is a colleague and good friend of mine, and his shortcut for Addison Wesley, “Troubleshooting Ruby Processes”, is a fantastic resource so I’m of course looking forward to his talk, “What To Do when Mongrel Stops Responding to Your Requests and Ruby Doesn’t Want to Tell You About It”. Devlin Daley’s “Enough Statistics so that Zed won’t yell at you” also sounds great. I’ve been meaning to dig into the R language for a while, so I’m hoping this will be just the motivation I need. I’m also a bit of a closet DBA, so Jan Lehnardt’s “Next Generation Data Storage with CouchDB” sounds like important stuff that I’m looking forward to hearing about.