On Ruby: January 2008

Thursday, January 31, 2008

The Rubinius Debugger

When I read the posting about Cylon, the Ruby debugger for Visual Studio by SapphireSteel, I immediately thought about the Rubinius debugger highlighted in InfoQ recently. I decided to give it a try and see what I came up with. I wasn’t able to replicate Dermot’s tests exactly (I don’t know where he set his breakpoints, and Rubinius doesn’t support everything he tested), but here are my initial results:

Line Based

Rubinius no debugger	Rubinius w/debugger no breakpoints	Ruby
2.039 seconds	2.399 seconds	0.898 seconds

Call Based

Rubinius no debugger	Rubinius w/debugger no breakpoints	Ruby
13.441 seconds	13.999 seconds	4.171 seconds

Rubinius still has a long performance road ahead of them in terms of general execution, but it’s pretty exciting to see that their debugger is so fast. The Cylon debugger (which smoked the other options in their testing) was 25% slower for line based debugging, and 306% slower for call based. By comparison, Rubinius’ debugger only added about 18% for line based and 4% for call based. (Actually, I’m guessing that had the line based test run through more iterations, the Rubinius debugger would have done better; I think most of its time is spent in setup.)
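The Rubinius percentages come straight from the timings in the two tables above. Here’s a small sketch of the arithmetic; the helper name `overhead_percent` is made up for this illustration and isn’t part of any benchmark script.

```ruby
# Percentage slowdown of a run with the debugger attached,
# relative to the same run without it.
def overhead_percent(without_debugger, with_debugger)
  (with_debugger - without_debugger) / without_debugger * 100.0
end

# Timings (in seconds) copied from the tables above.
line_based = overhead_percent(2.039, 2.399)
call_based = overhead_percent(13.441, 13.999)

puts "line based: %.1f%%" % line_based
puts "call based: %.1f%%" % call_based
```

The same helper works for any pair of timings, so it is easy to rerun if you collect your own numbers.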

Using the Rubinius debugger requires that you add a call to the debugger to the program:

debugger

def fac(n)
  lvar = n
  n == 1 ? 1 : n * fac(n-1)
  return n
end

count = 0

tstart = Time.new
0.upto(1000000) {fac(5)}               # line based
#0.upto(30000000) {count += 1}         # call based
tend = Time.new  
puts "%10.3f" % tstart
puts "%10.3f" % tend.to_f
diff = tend - tstart
puts "%10.3f" % diff.to_f

When you run this, you’ll get something like this:

$ ../shotgun/rubinius debug_fac.rb
[Debugger activated]

debug_fac.rb:1 [IP:5]

rbx:debug> c
[Resuming program]
1201801369.694
1201801372.001
     2.307

There are several commands available:

command	explanation
h	get a listing of commands
b	list breakpoints
b	set a breakpoint at the start of the method
n	step to the next line
ni	step to the next VM instruction
c	continue execution until the next breakpoint
l	list source code around the current breakpoint
d	decode VM bytecode around the current breakpoint
v	display local variables and their values
vs	display the contents of the VM stack

Some of these are pretty cool. Take a look at these examples:

rbx:debug> vs
     VM stack [0-5]:
  sp => 0: TrueClass  true
  fp => 1: Class      Rubinius::VM
        2: String     "" 
        3: String     "debug_fac.rb" 
        4: String     "debug_fac.rbc" 
        5: NilClass   nil
           ...

or:

rbx:debug> d
   Bytecode instructions [5-25] in compiled method script:
           ...
           # line 1
     0005: pop
           # line 3
     0006: push_literal  #
     0008: push_literal  :fac
     0010: push_self
     0011: send_stack    :add_method, 2
     0014: pop
           # line 9
  => 0015: meta_push_0
     0016: set_local     :count
     0018: pop
           # line 11
     0019: push_const    :Time
     0021: send_method   :new
     0023: set_local     :tstart
     0025: pop
           ...

which compares to:

rbx:debug> l
   Source lines [1-18] in debug_fac.rb:
      1: debugger
      2:
      3: def fac(n)
      4:   lvar = n
      5:   n == 1 ? 1 : n * fac(n-1)
      6:   return n
      7: end
      8:
  =>  9: count = 0
     10:
     11: tstart = Time.new
     12: 0.upto(1000000) {fac(5)}               # line based
     13: #0.upto(30000000) {count += 1}         # call based
     14: tend = Time.new
     15: puts "%10.3f" % tstart
     16: puts "%10.3f" % tend.to_f
     17: diff = tend - tstart
     18: puts "%10.3f" % diff.to_f

Like a lot of things in Rubinius, the debugger isn’t quite ready for primetime but it sure shows a lot of promise.

Wednesday, January 30, 2008

MWRC Attendees Interview Josh Susser

Being involved in organizing and putting on a regional Ruby conference means that you get to peek behind the curtain and see some pretty interesting things. One of the things that stands out to me so far is the distance people are willing to travel to come to our little conference. So far, we have attendees from California, Florida, Missouri, New York, Oregon, Virginia, and Washington.

Among the various attendees are some pretty interesting names as well. Josh Susser’s name caught my eye. Since he’s both well known among the Ruby and Rails communities and coming in from outside the Mountain West, I asked him a few questions about regional conferences and the MountainWest RubyConf. Here’s what he had to say.

Why are you planning on coming to MountainWest RubyConf this year?

I heard a lot of good things about MWRC last year and was sorry I missed it – especially the implementors summit. I want to check out a couple regional confs this year, and MWRC looks like one of the better ones.

Which of the talks are you most looking forward to?

The Ruby Internals, DataMapper and CouchDB talks.

What will make this conference worth your time and money?

I am not a werewolf. No really… learning something new, making connections, and of course the sexy t-shirts.

What value do you see regional Ruby conferences providing?

There’s only so much room in the program at RubyConf and even RailsConf, so the regionals give other venues to present stuff. Also, spreading things out over the calendar means that you get to see new things throughout the year. Beyond that, they are valuable for local residents to define and energize their regional Ruby communities.

What do you see as the main differences between a regional Ruby conference and RubyConf?

The big conferences have a lot to choose from, but sometimes that’s a drawback. I like a nice, intimate, single-track conference so I don’t feel like I’m missing anything. Also, I get to talk to more people. Yes, that’s how it works – less is more.

Tuesday, January 29, 2008

Ruby Concurrency with Actors

Tony Arcieri (read an interview with Tony here) recently posted an announcement about his new Revactor library, which provides actor style concurrency built atop Ruby 1.9’s new Fiber technology. Since Fibers and Actors haven’t been a part of the Ruby lexicon, I wanted to put some information together to help myself (and anyone else who wants to crib off my notes) get up to speed. Update! You might also want to check out my new Questions Five Ways on Concurrency post.

To begin, I figured it was best to start with the basics. Quoting from Revactor’s Philosophy page:

The basic operation of an Actor is easy to understand: like a thread, it runs concurrently with other Actors. However, unlike threads it is not pre-emptable. Instead, each Actor has a mailbox and can call a routine named ‘receive’ to check its mailbox for new messages. The ‘receive’ routine takes a filter, and if no messages in an Actor’s mailbox matches the filter, the Actor sleeps until it receives new messages, at which time it’s rescheduled for execution.

Well, that’s a bit of a naive description. In reality the important part about Actors is that they cannot mutate shared state simultaneously. That means there are no race conditions or deadlocks because there are no mutexes, conditions, and semaphores, only messages and mailboxes.
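To make the mailbox idea concrete, here’s a minimal sketch using only Ruby’s standard `Thread` and `Queue`. It is not Revactor’s API: the `ToyActor` class and the `sum_with_actor` helper are invented for this illustration, and because it uses threads rather than Fibers the actors here are pre-emptable. What it does model is the description above: each actor owns a mailbox, and `receive` takes a filter and sleeps until a matching message arrives.

```ruby
# A toy actor: one thread plus a private mailbox. Unlike Revactor's
# Fiber-based actors, these threads are pre-emptable; only the mailbox
# and receive-with-filter behaviour are modelled here.
class ToyActor
  def initialize(&body)
    @mailbox = Queue.new   # thread-safe queue of incoming messages
    @skipped = []          # messages seen by receive but not yet matched
    @thread  = Thread.new { body.call(self) }
  end

  # Drop a message into this actor's mailbox (safe from any thread).
  def <<(message)
    @mailbox << message
    self
  end

  # Return the oldest message accepted by the filter block, sleeping
  # until one arrives. Call this only from inside the actor's own body.
  def receive(&filter)
    filter ||= proc { true }
    index = @skipped.index(&filter)
    return @skipped.delete_at(index) if index

    loop do
      message = @mailbox.pop
      return message if filter.call(message)
      @skipped << message
    end
  end

  def join
    @thread.join
  end
end

# Sum the integers sent to an actor, ignoring everything else.
def sum_with_actor(messages)
  result = Queue.new
  adder = ToyActor.new do |me|
    total = 0
    loop do
      message = me.receive { |m| m == :stop || m.is_a?(Integer) }
      break if message == :stop
      total += message
    end
    result << total
  end

  messages.each { |message| adder << message }
  adder << :stop
  adder.join
  result.pop
end

puts sum_with_actor([1, "ignored", 2, 3])   # prints 6
```

The running total lives inside the actor’s block and is only ever touched by that actor, so the sketch needs no mutex of its own; the only thing shared between threads is the mailbox.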

Ok, that’s not so bad. How well can this work in Ruby? I thought I’d go to the horse’s mouth for this one. Joe Armstrong, creator of the Erlang language, which is built on the Actor model, had this to say:

Difficult to say – we always said that performance of a concurrent system depends upon three critical times

process spawning

context switching

message passing times

If you can make these fast it might work.

The main problem with concurrency is isolating processes from each other – one process must not be able to corrupt another process.

Even though the message-passing process spawning etc. seems simple there is a lot of junk going on in the background that you are not aware of – implementing this is of the order of complexity of implementing an operating system – i.e., there is processes scheduling, memory management garbage collection etc.

It’s rather easy to add a layer to an existing system that can give you a few thousand of processes but as you get to the hundreds of thousands things get tricky – the overheads per process must be small and so on.

This is tricky stuff – I’m sure you can make it work – but making it fast needs a lot of thinking …
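If you want a rough feel for those three times on your own machine, here’s a small measurement sketch built on Ruby’s Fibers. It is only an approximation under my own assumptions: the method names `seconds` and `fiber_costs` are made up, a fiber stands in for a “process”, and with bare Fibers a context switch and a message hand-off are really the same resume/yield mechanism, so the last two numbers will look alike.

```ruby
# Wall-clock seconds taken by the block.
def seconds
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Rough per-fiber costs (in seconds) for `count` fibers.
def fiber_costs(count = 1_000)
  fibers = nil

  # Spawning: create the fibers without running them.
  spawning = seconds do
    fibers = Array.new(count) do
      # Each fiber echoes back whatever it is handed, forever.
      Fiber.new { |message| loop { message = Fiber.yield(message) } }
    end
  end

  # Switching: the first resume starts each fiber and runs it to its
  # first yield, i.e. one switch in and one switch back out.
  switching = seconds { fibers.each { |fiber| fiber.resume(:start) } }

  # Messaging: hand each fiber a value and collect the echoed reply.
  replies = nil
  messaging = seconds do
    replies = fibers.each_with_index.map { |fiber, i| fiber.resume(i) }
  end

  { spawn: spawning / count, switch: switching / count,
    message: messaging / count, replies: replies }
end

costs = fiber_costs
puts "spawn:   %.9f s per fiber" % costs[:spawn]
puts "switch:  %.9f s per fiber" % costs[:switch]
puts "message: %.9f s per fiber" % costs[:message]
```

The absolute numbers will vary a lot between machines and Ruby versions; the point is only to see which of the three dominates as you raise the count.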

MenTaLguY has spent a lot of time working with Ruby concurrency. He also weighed in on Actors’ place in Ruby:

I think in the long-term actors are likely to become the major means of distributed concurrent programming in Ruby. At the very least I think distributed actors are likely to displace DRb for the things it is used for today.

I don’t know that actors will necessarily dominate for more “local” problems, however. I think the actor model will face competition from e.g. the join calculus, and other approaches like concurrent logic programming which can offer more natural solutions to some problems. Transactional memory will also have a place, although after writing several STM implementations I am not the fan of it that I used to be.

Paul Brannon (a long time Rubyist and all-around good guy) told me that he thinks Actor implementations are important for Ruby:

I think the industry is about to make a shift toward erlang (actor)-style concurrency, because it makes true concurrency transparent and easy for the user. Home computers are being shipped with more and more cores these days, and pretty soon, taking full advantage of the hardware available will necessarily imply concurrent programming.

I also asked around for recommendations about good resources for learning about Actor based concurrency (and concurrency in general). People were unanimous in suggesting that interested programmers spend some time with Erlang to get a handle on it. The book Programming Erlang also got high marks. Ola Bini also recommended Java Concurrency in Practice for those with a Java bent/background.

However you choose to do it, I recommend spending some time with Actors; it looks like a good way to get more out of your programming. Let me just go back to Revactor’s Philosophy page for a second:

Actors rule. You really should use them in your programs. Especially if your programs do a lot of stuff at once. Seriously, whatever you’re doing besides Actors, it probably sucks. Actors are this awesome panacea that will make it all better, I swear. In conclusion: use them, do it!

There, are you convinced now?

Interview With Revactor Developer Tony Arcieri

With the recent release of his Revactor library, I wanted to talk with Tony Arcieri about Ruby, Actors, and Revactor. He was kind enough to sit down for a short interview. Here’s what we talked about.

How did you get started with Ruby?

Tony: Ruby is a language some roommates of mine were using for years and kept raving to me about. Unfortunately, I was a performance-obsessed C programmer and couldn’t really get past the whole “Ruby is slow” stigma. Then in early 2005 Rails started generating a lot of buzz, and I got sucked into using Ruby for web development. A few years later I can look back wondering how I could stand programming in C for so long.

Revactor is an implementation of Actors for Ruby 1.9. Is there a reason you targeted 1.9 instead of Rubinius (with tasks) or another implementation?

Tony: Ruby 1.8 already supports Actors with the Omnibus Concurrency Library and Rubinius supports them in its standard library. I’m not aware of an Actor model implementation for JRuby but it’d be pretty easy to do with a Scala-like thread pool. I chose Ruby 1.9 because I felt that, for the time being, it’s the most practical and performant platform for writing network applications with the Actor model. Revactor is built on a number of Ruby 1.9-specific features, specifically Fibers which provide the underlying concurrency primitive. However, Revactor is also built on top of an event library called Rev whose feature set was tailored for implementing high performance networking within the Actor model (although it can be used as a general purpose event library if you so desire). Ruby 1.9 contains several features which made writing this event library quick and easy with minimal C code. These include things like support for blocking system calls and non-blocking I/O.

However, I definitely feel that down the road Rubinius will be much better suited. Rubinius already supports multiple shared-nothing virtual machines which each run in their own hardware thread and can communicate over an internal message bus. Using that in conjunction with Actors, you can do scatter/gather distributed programming (MapReduce is probably the most famous example of this) which can run a copy of a job on each VM (and thus on its own CPU core) then reduce the results to the final output. With this approach, your program runs N times faster on N CPUs.
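The scatter/gather shape Tony describes can be sketched with plain Ruby threads. This is only an illustration of the pattern, not Rubinius’ multi-VM message bus, and the method name `parallel_sum_of_squares` is made up. On MRI the threads mostly take turns rather than running on separate cores, so don’t expect the N-times speedup from this version.

```ruby
# Scatter: split the input into one slice per worker and process each
# slice in its own thread. Gather: reduce the partial results to one.
def parallel_sum_of_squares(numbers, workers = 4)
  slice_size = [(numbers.size / workers.to_f).ceil, 1].max

  partials = numbers.each_slice(slice_size).map do |slice|
    Thread.new { slice.sum { |n| n * n } }   # the "map" step for one slice
  end.map(&:value)                           # wait for each worker's result

  partials.sum                               # the "reduce" step
end

puts parallel_sum_of_squares((1..10).to_a)   # prints 385
```

Each worker only reads its own slice and returns a value, so nothing is shared between them, which is the property that would let the same job run on separate shared-nothing VMs.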

What do you think about some of the other approaches to concurrency? (See MenTaLguY’s page for example.)

Tony: Many of the techniques there can go hand in hand with Actors (futures, for example). As far as non-Actor approaches, my favorite is probably join calculus as seen in languages like JoCaml.

MenTaLguY has long been involved in concurrency in Ruby. I see that you’re using his Case gem in Revactor. What other influence has he had on Revactor?

Tony: MenTaLguY has been very helpful in smoothing out the API design and will hopefully be making Revactor thread safe in the near future. He’s pointed out solutions to problems which, in retrospect, were pretty obvious but I just didn’t see at the time. We’re trying to put together something of a standard Actor API and protocol such that a program written using Actors in Ruby isn’t tied to a particular implementation and can run on Omnibus, Rubinius, or Revactor. We’ll also hopefully be putting out a cross-compatible gem which bundles up a lot of the standard Actor functionality so there aren’t 3 different implementations of the same thing floating around.

You’ve got a great introduction to Actors up at your Philosophy page, but it’s a little light on code. Could you give us an example of Revactor at work?

Tony: There’s a number of code examples available on http://doc.revactor.org which go a bit more in depth as to how Actors send and receive messages, but here’s an example of an echo server:


# An example echo server, written using Revactor::TCP
# This implementation creates a new actor for each
# incoming connection.

require 'revactor'

HOST = 'localhost'
PORT = 4321

# Before we can begin using actors we have to call Actor.start
# Future versions of Revactor will hopefully eliminate this
Actor.start do

  # Create a new listener socket on the given host and port
  listener = Revactor::TCP.listen(HOST, PORT)
  puts "Listening on #{HOST}:#{PORT}"

  # Begin receiving connections
  loop do

    # Accept an incoming connection and start a new Actor
    # to handle it
    Actor.spawn(listener.accept) do |sock|
      puts "#{sock.remote_addr}:#{sock.remote_port} connected"

      # Begin echoing received data
      loop do
        begin
          # Write everything we read
          sock.write sock.read
        rescue EOFError
          puts "#{sock.remote_addr}:#{sock.remote_port} disconnected"

          # Break (and exit the current actor) if the connection
          # is closed, just like with a normal Ruby socket
          break
        end
      end
    end
  end
end

This doesn’t demonstrate inter-Actor messaging (although it’s doing it behind the scenes). However, what you do see is that there’s very little disconnect between using Revactor and writing a traditional threaded network server. If you’ve written programs in the past using Thread and Queue, then moving over to Revactor will be easy, and you’ll find Actor mailboxes to be a much more powerful way of processing messages.

Are there any books, blogs, or websites you’d recommend for learning more about concurrency in general or actors in particular?

Tony: Programming Erlang by language creator Joe Armstrong was immensely helpful in understanding Actor-based concurrency, and many of the ideas in Revactor are drawn directly from Erlang. Some of the Erlang portal sites such as planeterlang.org also cover concurrent programming in general, particularly with Actors.

Friday, January 25, 2008

MWRC mini-interview: Patrick Farley

Here’s another mini-interview to whet your MWRC appetite. I asked Patrick Farley of ThoughtWorks about his talk and which MWRC sessions he’s excited about. With speakers like this, how could you not register? $100 for two days of Ruby awesomeness is an incredible deal.

Your session is entitled “Ruby Internals”. Why should people be excited to come see it?

Programming in Ruby leads to a lot of head scratcher moments. “Called id for nil, which would mistakenly be 4” anyone? In my experience, getting past the head scratching and into a nuanced understanding of what’s what in Ruby is a lot easier than one might think. The big barrier is, of course, C. The thing to keep in mind is that you don’t have to be Brian Kernighan to read and understand some basic C code, particularly when you know the domain well. The domain of Matz’s Ruby Interpreter (MRI) is the Ruby language itself, so Ruby programmers are, by definition, domain experts. The goal of my talk isn’t to turn the audience into overnight Ruby internals experts. As the books will tell you, becoming a guru in any technology takes exactly 21 days. Instead, I aim to give an advanced introduction to some key areas of Ruby internals and at the same time equip folks to do additional exploration on their own.

What’s your Ruby/RoR background?

I’ve been using Ruby since 2005, and working full time on enterprise Ruby and RoR development for close to two years now. I’m lucky enough to work for ThoughtWorks where I’ve helped to put some fairly large Ruby projects into production.

Which session are you most looking forward to seeing?

I’m jazzed about a few of them. Philippe Hanrigou is a colleague and good friend of mine, and his shortcut for Addison Wesley, “Troubleshooting Ruby Processes”, is a fantastic resource so I’m of course looking forward to his talk, “What To Do when Mongrel Stops Responding to Your Requests and Ruby Doesn’t Want to Tell You About It”. Devlin Daley’s “Enough Statistics so that Zed won’t yell at you” also sounds great. I’ve been meaning to dig into the R language for a while, so I’m hoping this will be just the motivation I need. I’m also a bit of a closet DBA, so Jan Lehnardt’s “Next Generation Data Storage with CouchDB” sounds like important stuff that I’m looking forward to hearing about.

Practical Ruby Projects Review

Not too long ago, I interviewed Topher Cyll about his new book, Practical Ruby Projects. Now that I’ve read through it, I wanted to share my thoughts with you.

First things first, this is not a first Ruby book, but once you’ve gotten your feet wet with Ruby, this is a great book to help you get to the next level. It features eight projects in nine chapters (one per chapter, plus an introduction) that should entertain while they teach you more about Ruby programming. I do wish Topher had included a set of exercises or enhancement suggestions in each chapter to point the way to some next steps, but that’s a pretty niggling issue.

The first three project chapters (two through four) all introduce some cool, lesser known Ruby skills:

Chapter 2, Making Music With Ruby, uses the DL library to link against existing MIDI libraries in C.
Chapter 3, Animating Ruby, builds a DSL to create SVG files, rasterize them, and finally build simple animations with them.
Chapter 4, Pocket Change, uses memoization to help speed up largish simulations running in Ruby (this could be a cool place to jump into RubyInline to revisit C if you’re looking for a follow-on project)
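If memoization is new to you, here’s a minimal sketch of the idea in Ruby. This is not code from the book: the `ways_to_make` method and the coin-counting problem are my own stand-ins, chosen only because they show a Hash being used to cache sub-results.

```ruby
# Count the ways to make `amount` from the given coin values, caching
# each (amount, coins) sub-problem in a Hash so it is solved only once.
def ways_to_make(amount, coins, cache = {})
  return 1 if amount.zero?
  return 0 if amount < 0 || coins.empty?

  cache[[amount, coins]] ||=
    ways_to_make(amount - coins.first, coins, cache) +   # use the first coin again
    ways_to_make(amount, coins.drop(1), cache)           # or never use it again
end

puts ways_to_make(100, [25, 10, 5, 1])   # prints 242
```

Without the cache the same sub-problems get recomputed over and over as the amount grows; with it, each distinct (amount, coins) pair is computed once and then looked up.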

The other chapters continue to build Ruby hacking skills with some really cool projects. Chapters five and six build a turn-based strategy game (DinoWars), then use RubyCocoa to put a pretty face on it. Chapter seven dives into genetic algorithms (see also my interview with Sander Land on the same topic). Chapter eight implements Lisp in Ruby. Finally, chapter nine is all about using Ruby to build parsers.

Topher knows his Ruby (in chapter four, he dives into the internals of Ruby’s Hash class), and through this book he helps the reader build a solid base of Ruby knowledge as well. If you’re looking for a way to push your Ruby-fu to the next level, Practical Ruby Projects is a great way to go.

Wednesday, January 23, 2008

RubyZone, RailsSpace, and Ruby DVDs

DZone has recently introduced a set of ‘zones’ with original content, forums, and other things about specific topics to complement their social bookmarking site. One of these is the RubyZone. I’ve been asked to work as a ‘Zone Leader’ there, writing new material and helping build the community. Even before I’d gotten there someone else had written some material, a review of the book RailsSpace: Building a Social Networking Website with Ruby on Rails.

Since there are a lot of people who can do a better job of reviewing Rails books, I’ve been trying to stick to pure Ruby (or other non-Rails) books recently. But I feel like I should point out something about this book. Addison-Wesley is moving their Professional Ruby Series beyond books and ‘digital shortcuts’ and into video with the release of their RailsSpace Ruby on Rails Tutorial.

I’ve lent a copy of this to a friend from church who’s just getting started with Rails. Every week now, he comes up to me and tells me how wonderful Rails is and how excited he is to be working with it. The DVD contains over 5 hours of video spanning 15 tutorial sessions. Judging from my friend’s response, I’d say that this video is a great way to get started with Rails.

Tuesday, January 22, 2008

MWRC Mini-Interview: Jeremy McAnally

Since the MountainWest RubyConf speakers have been announced, I’ve started to do some mini-interviews with each of them. This interview with Jeremy McAnally is the first. By the way, you can read more interviews with Jeremy here or here—you can also pre-order his book, Ruby in Practice, from amazon.com.

Why should people be interested in your ‘Deep Ruby’ talk?

Jeremy: “Deep Ruby” is going to give attendees a solid primer for the rest of the “dynamic feature” focused talks. It’ll be a high level discussion of the dynamic features, the terminology, and give attendees a little of the “flavor of Ruby.”

Which other talks are you most looking forward to?

Jeremy: I’m looking forward to most of the talks, but especially the talk on DSLs and DataMapper. I’m really excited to see more obscure topics getting exposure at the conference level.

What do you see as the big benefit of regional Ruby conferences?

Jeremy: I love the regional conference for a lot of reasons, but the biggest reason is size and community. The small size lets you talk to nearly everyone at the conference, and if there’s someone there that you really want to talk to (for me that was Bruce Tate at the Ruby Hoedown), they’re far more accessible at the regional conference than at something like RailsConf.

Thursday, January 17, 2008

Interview with Sander Land, Developer of charlie

Sander Land recently released charlie, a Ruby library for using genetic algorithms. Since the initial release, he’s made several updates, and posted some performance information for charlie on JRuby 1.1 RC1. I was intrigued by the activity, and asked Sander to join me for a quick interview.

Why the name ‘charlie’?

Sander: It’s a reference to Charles Darwin. The name was originally the project’s code name, and it just stuck.

Everything I know about genetic algorithms, I learned from Wikipedia. I’m guessing a lot of other people are in the same boat. Maybe you could help us all out. What are genetic algorithms?

Sander: Wikipedia is not a bad place to start:

“A genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover.”

The basic idea is that you have a population of solutions, and some kind of fitness ranking. You select solutions such that those with higher fitness (on average) get more children. This makes the solution converge towards some local or global optimum. Crossover and mutation are used to generate more variation in the solutions, so it can explore new parts of the search space.

There are a lot of ways to do selection, crossover and mutation. The basic idea behind charlie is to make it easy to choose these parts, and to be able to compare several of these strategies to see which works the best.
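Here’s a tiny self-contained sketch of that loop in plain Ruby, to show how the pieces fit together. It does not use charlie’s API; every name in it (`fitness`, `select_parent`, `crossover`, `mutate`, `evolve`) is made up for this illustration, and the problem is the toy “maximize the number of 1 bits” one. The selection, crossover and mutation strategies shown are just one simple choice for each.

```ruby
# Fitness: the number of 1 bits. The optimum is a genome of all ones.
def fitness(genome)
  genome.count(1)
end

# Tournament selection: pick two at random and keep the fitter one, so
# fitter solutions get more children on average.
def select_parent(population, rng)
  a = population.sample(random: rng)
  b = population.sample(random: rng)
  fitness(a) >= fitness(b) ? a : b
end

# Single-point crossover: the head of one parent, the tail of the other.
def crossover(mother, father, rng)
  point = rng.rand(1...mother.size)
  mother[0...point] + father[point..-1]
end

# Mutation: flip each bit with a small probability.
def mutate(genome, rng, rate = 0.01)
  genome.map { |bit| rng.rand < rate ? 1 - bit : bit }
end

# Run the loop and return the fittest genome of the final generation.
def evolve(length: 32, population_size: 40, generations: 60, seed: 42)
  rng = Random.new(seed)   # seeded so runs are repeatable
  population = Array.new(population_size) { Array.new(length) { rng.rand(2) } }

  generations.times do
    population = Array.new(population_size) do
      child = crossover(select_parent(population, rng),
                        select_parent(population, rng), rng)
      mutate(child, rng)
    end
  end

  population.max_by { |genome| fitness(genome) }
end

best = evolve
puts "best fitness: #{fitness(best)} of #{best.size}"
```

Swapping in a different `select_parent`, `crossover` or `mutate` is all it takes to try another strategy, which is the kind of comparison charlie is meant to make easy.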

Ok, so what are they good for?

Sander: As Wikipedia says, “optimization and search problems”. That is, any time you need to optimize something or when you can restate your problem as optimizing something. Examples are finding a shortest cycle through a graph, or finding the airplane wing with the most lift.

A big advantage of GA over other optimization algorithms is that you only need a simple representation of your solution (usually an array of floats or a bitstring) and a fitness function which returns a higher value for better solutions. There’s no need for calculating derivatives or knowing specialized algorithms for each problem.

So, is delta¹ a good example of the space that GAs play in?

Sander: It’s not a typical example, but it’s probably possible:

1. Take a bitstring of length (# of lines in original file) as your genotype. Use one/zero at position i to mean “solution does (not) include line i”.

2. Take 1/(# of ones) as the fitness value, but assign a value of 0 to uninteresting files.

3. Initialize all your initial solutions to all ones.

The problem is hard to solve with GA because you’re forcing the uninteresting files to have a fitness value of zero. There is no way to tell ‘how uninteresting’ the file is. This makes step (3) a necessity, instead of being able to use the standard random initialization, because otherwise you would most likely start with a population of uninteresting files and no way to determine which are the better solutions. It also has a negative effect on the convergence of the algorithm, because every small mistake, even when combined with an improvement somewhere else, is going to result in a fitness of zero.

To get back to my earlier, more typical, examples:

All cycles through a graph will have some length.

All wing designs will have some lift force. This could be negative, but even in that case you can tell ‘how wrong’ the solution is.
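The bitstring encoding Sander describes for the delta problem can be sketched as a fitness function. This is my own illustration, not part of charlie or of delta: the `delta_fitness` name is invented, the “interesting” test is supplied by the caller as a block, and I’m assuming a 1 at position i means line i is kept.

```ruby
# genome: array of 0/1 with one entry per line of the original file
#         (assumption: 1 = keep the line, 0 = drop it).
# The block receives the kept lines and returns true when the reduced
# file is still "interesting" (for example, it still triggers the bug).
def delta_fitness(genome, lines, &interesting)
  kept = lines.each_with_index.select { |_line, i| genome[i] == 1 }.map(&:first)
  return 0.0 if kept.empty? || !interesting.call(kept)

  1.0 / kept.size   # fewer lines kept means higher fitness
end

lines = ["a = 1", "b = 2", "raise 'boom'", "c = 3"]
still_fails = proc { |kept| kept.include?("raise 'boom'") }

puts delta_fitness([1, 1, 1, 1], lines, &still_fails)   # prints 0.25
puts delta_fitness([0, 0, 1, 0], lines, &still_fails)   # prints 1.0
puts delta_fitness([1, 1, 0, 1], lines, &still_fails)   # prints 0.0
```

The last call shows the difficulty Sander points out: a file that drops the one line that matters scores exactly zero, no matter how close it otherwise is to a good answer.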

What needs did you have that made you write charlie?

Sander: I had used GAs a few times, using a ~250 line Ruby script which was rather messy (think global variables with a hash of several procs of 5+ lines each). I wanted to make a library which was a lot cleaner and easier to use, but without restricting what genotype/selection/crossover/mutation I could use. I also wanted to learn more about genetic algorithms and genetic programming, and thought writing a library would be a nice way to do this.

Finally, when using a GA to solve a problem there was always the question “which settings are the best?”. Most settings work reasonably well, and it can be hard to determine whether some are actually better, or just ‘lucky’ the single time you tried them. Hence the focus on benchmarks to answer these questions.

What books/websites would you recommend for learning more about genetic algorithms?

Sander: This website has some basic info and a lot of links to news articles, tutorials, etc.

A simple google search for “genetic algorithms tutorial” will also give some decent results, but reading a book is probably the best way to learn more about GA.

Some of the examples included with charlie are from “Evolutionary Computation for Modeling and Optimization” by Daniel Ashlock. He has the book on his website.

Is Ruby really a good language to do things like this in?

Sander: I think so. Ruby certainly makes it easy.

As for performance issues, the fitness function is often the bottleneck for more complex problems. Because you always need to implement this function yourself, it is easy to increase performance by writing this in a lower level language.

Another issue is parallelization. This is not yet implemented anywhere, but I have something planned for multiple parallelly evolving populations (multi-deme GA). I’m not sure how difficult it will be to do this.

Wikipedia talks about a number of programming methods related to genetic algorithms, do you see charlie growing to encompass any of these, or do we need to look for other libraries (and other developers)?

Sander: Some of those programming methods, like Evolution strategies and Evolutionary programming are very similar to GA, and sometimes even nearly indistinguishable, so they are easy to fit in a GA framework. A basic form of Genetic programming is already implemented, as there is a tree-based genotype. I definitely plan to extend charlie to encompass more GP techniques, and to try some selection and crossover techniques more common to ES and EP.

I currently don’t have any plans for extending the library to methods that are not similar to GA.

What have you learned about Ruby while writing charlie?

Sander: Not much, actually. I’ve been programming in Ruby for more than two years now, and picked up most of the metaprogramming skills needed for this library from reading blog posts and making ruby quizzes.

However, with this being my first ‘large’ project, I’ve learned how to use rdoc, rake, hoe and other tools.

How did you end up picking up Ruby as a programming language?

Sander: I was looking for a new programming language, because using C++ or PHP for everything was inconvenient. I first tried Python, which was interesting, but somehow not interesting enough to make me stop searching. A few days later I tried Ruby. With the pickaxe included in the installer it was easy to learn and I liked its consistency. Within a week I joined the ruby-talk mailing list, and made Ruby my language of choice.

You were one of the first people to post benchmarks for JRuby 1.1 RC1. How much work did it take to get charlie running on JRuby?

Sander: No work at all, all my tests succeeded on the first try. Again, great job JRuby team.

Does charlie run on Rubinius or 1.9 yet?

Sander: Ruby 1.9: Yes, as of version 0.6.0 there is partial support, and the recently released version 0.7.0 has full support. This required a few workarounds, since there is still a rather nasty bug in 1.9 which makes superclass lookups fail in some cases. (note: The bug is reported here.)

Rubinius: Just tried it. Over 50% of the tests fail, so I guess that’s a no. But Rubinius’ performance is probably still too low to seriously consider using it for this kind of application.

What’s the coolest thing that someone’s used charlie for thus far?

Sander: I don’t know! Except for you mailing me for the interview, and some people telling me they were going to check the library out, there has been no feedback at all.

What plans do you have for charlie?

Sander: I have some additional features planned, like extra crossovers and genotypes for 2D arrays. I also plan to learn more about genetic algorithms and genetic programming, and include more features in the library as I learn about these things.

¹ ‘Delta assists you in minimizing “interesting” files subject to a test of their interestingness. A common such situation is when attempting to isolate a small failure-inducing substring of a large input that causes your program to exhibit a bug’

Wednesday, January 16, 2008

MountainWest RubyConf 2008 Announces One of Two Keynotes

MountainWest Ruby has announced the first of two keynote speakers for the upcoming MountainWest RubyConf 2008. Jim Weirich will be presenting “Shaving with Ockham”. His presentation explores the search for simple solutions to today’s complex problems.

Jim is the Chief Scientist at EdgeCase LLC, and is an active member of the Ruby community. He’s also a good friend and a great speaker: funny, insightful, and very educational. I’m really looking forward to hearing him talk in March.

An RWB Primer (from 2005)

Since RWB had been relegated to the dust bin of deprecation before I decided to pick it back up and start afresh, I thought it would be worthwhile to post some introductory information about it. This is cribbed from things I wrote at the time of the 0.1.1 release.

Introduction

RWB was written to scratch an itch that ab couldn’t quite reach. I wanted to be able to build a weighted list of urls with which to test a website. RWB will become a little language in which you can write such lists, run tests, and build reports. For now, I’m trying to get the engine and the reporting right. Please use RWB and let me know what else it ought to do. Criticism, suggestions, and especially patches are welcome.

Samples

Here are two quick examples of scripts using RWB and their output. (Note that I’m showing both stdout and stderr in the output.) Here’s a very simple test to start with:

require 'rwb'

urls = RWB::Builder.new()

urls.add_url(10, "http://www.example.com")
urls.add_url(10, "http://www.example.com/nonesuch")
urls.add_url(70, "http://www.example.com/entries")

queries = ['foo+bar', 'bar+baz', 'quux']
urls.add_url_group(10, "http://www.example.com/search?", queries)

tests = RWB::Runner.new(urls, 100, 5)

tests.run
tests.report_header
tests.report_overall([0.5, 0.9, 0.99, 0.999])

This script does a couple of things. It sets up an RWB::Builder object called urls, which contains three Url objects and a UrlGroup object. Each of the three Urls consists of a weight and a URL. The total of the weights doesn’t matter; they’re relative weights, not percentages. The UrlGroup has a weight, a base URL and an array of extensions to be added to that URL.
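To make the point about relative weights concrete, here is a minimal sketch of weighted random selection in plain Ruby. It is not RWB's actual code: the helper name pick_weighted and the list-of-pairs layout are assumptions made up for illustration, and RWB may choose URLs differently internally.

```ruby
# Hypothetical helper (not part of RWB): pick one URL at random,
# with probability proportional to its weight.
def pick_weighted(weighted_urls, rng = Random.new)
  total = weighted_urls.sum { |weight, _url| weight }
  target = rng.rand(total) # integer in 0...total
  weighted_urls.each do |weight, url|
    return url if target < weight
    target -= weight
  end
end

# Same weights as the script above; they add up to 90, not 100.
weighted_urls = [
  [10, "http://www.example.com"],
  [10, "http://www.example.com/nonesuch"],
  [70, "http://www.example.com/entries"]
]

counts = Hash.new(0)
rng = Random.new(42)
9_000.times { counts[pick_weighted(weighted_urls, rng)] += 1 }
counts.each { |url, n| puts "#{url}: #{n}" }
```

Because each URL is chosen in proportion to its share of the total weight, doubling every weight would leave the mix of requests unchanged.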

Once the urls object is set, an RWB::Runner object called tests is built. This object is then used to run the tests and build a report header and an overall performance report. The output of running this script is shown below:

$ ruby vsim.rb
completed 10 runs
completed 20 runs
completed 30 runs
completed 40 runs
completed 50 runs
completed 60 runs
completed 70 runs
completed 80 runs
completed 90 runs
completed 100 runs
Concurrency Level:       5
Total Requests:          100
Total time for testing:  1.29918 secs
Requests per second:     76.9716282578242
Mean time per request:   64  msecs
Standard deviation:      9
Overall results:
        Shortest time:  10 msecs
        50.0%ile time:  62 msecs
        90.0%ile time:  78 msecs
        99.0%ile time:  88 msecs
        99.9%ile time:  88 msecs
        Longest time:   88 msecs
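For readers unfamiliar with the "%ile" lines, here is a small sketch of one common way to compute them, the nearest-rank method. This is an assumption for illustration only: the percentile helper below is hypothetical, and RWB's own calculation may differ in detail.

```ruby
# Hypothetical helper: nearest-rank percentile of a list of timings.
# fraction is between 0 and 1, e.g. 0.9 for the 90th percentile.
def percentile(times, fraction)
  sorted = times.sort
  rank = (fraction * sorted.length).ceil # 1-based rank
  sorted[[rank, 1].max - 1]
end

# Made-up response times in msecs, not real measurements.
times = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

[0.5, 0.9, 0.99, 0.999].each do |fraction|
  puts "%5.1f%%ile time: %d msecs" % [fraction * 100, percentile(times, fraction)]
end
```

With only a handful of samples, the highest percentiles collapse onto the longest time, which is why the 99.0 and 99.9 lines in a short run tend to match the "Longest time" line.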

Here’s a more involved test, which includes a warmup, using a url_group, and some fancier reporting:

require 'rwb'

urls = RWB::Builder.new()

urls.add_url(2, "http://www.example.com")
urls.add_url(3, "http://www.example.com/nonesuch")
urls.add_url(3, "http://www.example.com/entries")

queries = ['foo+bar', 'bar+baz', 'quux']
urls.add_url_group(10, "http://www.example.com/search?", queries)

tests = RWB::Runner.new(urls, 10_000, 50)

tests.sla_levels = [0.5, 0.9]

tests.warm_up(10)

tests.run
tests.report_header
tests.report_by_time
tests.report_urls([0.5, 0.9, 0.99, 0.999])

This generates the following output:

$  ruby minv.rb
warming up with 10 runs
1 2 3 4 5 6 7 8 9 10
completed 1000 runs
completed 2000 runs
completed 3000 runs
completed 4000 runs
completed 5000 runs
completed 6000 runs
completed 7000 runs
completed 8000 runs
completed 9000 runs
completed 10000 runs
Concurrency Level:       50
Total Requests:          10000
Total time for testing:  38.848752 secs
Requests per second:     257.408526276468
Mean time per request:   41  msecs
Standard deviation:      12
Results by time:
results for requests 0 - 2000
       Shortest time:  23 msecs
       50.0%ile time:  39 msecs
       90.0%ile time:  50 msecs
       Longest time:   93 msecs
results for requests 2000 - 4000
       Shortest time:  24 msecs
       50.0%ile time:  42 msecs
       90.0%ile time:  54 msecs
       Longest time:   68 msecs
results for requests 4000 - 6000
       Shortest time:  25 msecs
       50.0%ile time:  41 msecs
       90.0%ile time:  57 msecs
       Longest time:   82 msecs
results for requests 6000 - 8000
       Shortest time:  26 msecs
       50.0%ile time:  38 msecs
       90.0%ile time:  58 msecs
       Longest time:   81 msecs
results for requests 8000 - 10000
       Shortest time:  25 msecs
       50.0%ile time:  33 msecs
       90.0%ile time:  62 msecs
       Longest time:   102 msecs
Results for http://www.example.com :
       Shortest time:  35 msecs
       50.0%ile time:  52 msecs
       90.0%ile time:  67 msecs
       99.0%ile time:  81 msecs
       99.9%ile time:  98 msecs
       Longest time:   102 msecs
Results for http://www.example.com/nonesuch :
       Shortest time:  24 msecs
       50.0%ile time:  34 msecs
       90.0%ile time:  55 msecs
       99.0%ile time:  62 msecs
       99.9%ile time:  80 msecs
       Longest time:   83 msecs
Results for http://www.example.com/entries :
       Shortest time:  25 msecs
       50.0%ile time:  33 msecs
       90.0%ile time:  55 msecs
       99.0%ile time:  61 msecs
       99.9%ile time:  72 msecs
       Longest time:   79 msecs
Results for http://www.example.com/search?:
       Shortest time:  23 msecs
       50.0%ile time:  34 msecs
       90.0%ile time:  54 msecs
       99.0%ile time:  62 msecs
       99.9%ile time:  88 msecs
       Longest time:   93 msecs

If you’d like to grab rwb and give it a try, it’s available as a gem.

RWB Reborn

With Zed’s recent departure from the Ruby community, I’m going to un-deprecate RWB (the Ruby Web Bench). I’d originally stopped development on it for two reasons:

I wasn’t happy with the basic design of the system. I still have concerns about its scalability and its extensibility, but I think both of these are fixable with some redesign/rewriting.
Zed’s rfuzz seemed poised to provide a better solution, but this never really materialized and doesn’t look like it ever will.

I still think there’s a need for a good web benchmarking and testing tool. Something that’s capable of checking multiple URLs, fuzz testing, and providing detailed reports about test results. RWB could be the basis for such a tool.

Before I can do anything else, I need to look into some basic design decisions.

How should I assemble, send, receive, and record HTTP requests and responses?
How can I better parallelize the work when a single process/system isn’t capable of generating the needed load?
What data do I need to keep, and how do I best make it accessible to users wanting to generate their own reports?
Do I need to maintain test information across multiple runs?

The answers to these questions will probably force some additional questioning:

Is Ruby the right language for all/part of this, or should I be looking to Ragel, C, Erlang, etc.?
Should I be providing a single package, or a library and an app that uses it?

So, what does this mean to you? First, if you’re interested in RWB (or are currently using it), I’d like to hear from you. What’s good, bad, and ugly about the current system? Second, if you’re interested in a new and improved website testing/benchmarking tool, what are you looking for? Third, if you’re interested in working on a system like RWB, get in touch with me because I’d love to build a team to do this right.

What am I going to do next? I’m off to read up on httperf and rfuzz; we’ll see where that takes me. Oh, and I’ve got a couple of pages of planning material I wrote before I deprecated RWB for rfuzz. I guess I should give those a good read too.

Tuesday, January 15, 2008

MountainWest RubyConf 2008 Registration Open

MountainWest Ruby, LLC has just opened registration for this year’s MountainWest RubyConf, which will be March 28th and 29th in Salt Lake City, Utah. According to the website:

MountainWest RubyConf is a two day conference costing just $100 which includes lunch both days, t-shirts, and a terrific opportunity to rub elbows with some of the smartest Rubyists around. We have a great list of presenters with compelling presentations.

The list of presenters does look great:

Giles Bowkett – Code Generation: The Safety Scissors Of Metaprogramming
Devlin Daley – Enough Statistics so that Zed won’t yell at you
Patrick Farley – Ruby Internals
Philippe Hanrigou – What To Do when Mongrel Stops Responding to Your Requests and Ruby Doesn’t Want to Tell You About It
Yehuda Katz – Faster, Better, ORM with DataMapper
Jan Lehnardt – Next Generation Data Storage with CouchDB
Jeremy McAnally – Deep Ruby
Joe O’Brien – Domain Specific Languages: Molding Ruby
Tammer Saleh – BDD with Shoulda
Jonathan Younger – Using Amazon’s Web Services from Ruby

While the keynotes aren’t announced yet, I think they’re going to be awesome too. I’m looking forward to the announcements. So, what are you waiting for? Go register—it’s only $100 for what looks like an awesome conference.

Friday, January 11, 2008

Real World Rubinius Performance

Well, I have good and bad news from the Rubinius front. This morning, I built the latest Rubinius from the git repository, and gave LogWatchR another try … and it worked! This is a huge step forward from my perspective, since I’ve had all kinds of weird failures in the past.

The bad news (well, bad is an overstatement, let’s say ‘not so good news’) is that the performance is pretty bad at this point. To be fair, the Rubinius team is still focused on completeness. I’d expect performance to improve once they turn their eye to it.

Here’s my 30 second recap:

My ‘real world performance’ timings run a small Ruby app called LogWatchR, which scans syslog data for cataloged good and bad patterns. I feed it about 73,000 syslog entries (about 20 minutes worth) that I’ve got archived. I measure the execution time against repeated runs, and find the average run time and standard deviation.
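The averaging step can be sketched in a few lines of Ruby. The timings below are made up for illustration, and the sketch assumes the sample standard deviation (dividing by n - 1); a population standard deviation would divide by n instead.

```ruby
# Mean of a list of run times, in seconds.
def mean(values)
  values.sum.to_f / values.length
end

# Sample standard deviation (n - 1 in the denominator); an assumption,
# since the choice between sample and population is not stated above.
def sample_std_dev(values)
  m = mean(values)
  sum_sq = values.sum { |v| (v - m)**2 }
  Math.sqrt(sum_sq / (values.length - 1))
end

# Made-up timings, not actual LogWatchR measurements.
run_times = [142.4, 143.1, 143.6, 142.9, 143.2]

puts "mean:    %.2f s" % mean(run_times)
puts "std dev: %.2f s" % sample_std_dev(run_times)
```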

Rubinius’ first successful run clocked in at an average of 143.03 seconds, with a standard deviation of 0.56 seconds.

Given that things took so long, I thought I’d try a quick profiling run too (I only ran against 1000 log entries though, to save time). Here’s the abbreviated version of what the profiler told me:

../shotgun/rubinius -r profile logwatcher.rb < short_log
Total slices: 160, 2680000 clocks
 % time   slices   name
   8.75       14   String#[]
   8.12       13   Hash#[]
   6.88       11   Array#<<
   6.88       11   Object#kind_of?
   5.00        8   #.single_load
   4.38        7   String#split
   4.38        7   Hash#keys
   4.38        7   Regexp#match_from
   3.75        6   Hash#[]=
   2.50        4   String#substring
   2.50        4   Array#replace

Now for the amazing part: the sampling profiler took almost no extra time, clocking in at 3.009 seconds of real time while a regular run took 2.805 seconds. Wow! I’m looking forward to seeing what the Legion of Rubinius Heroes can do with performance over the next couple of months.

If you’re interested in seeing more performance information, you might want to look at:

JRuby 1.1 RC1 Real World Performance

Real World Performance on Boxing Day

I’ve also written about profiling if you’d like to read more about that.

Russ Olsen Interview

Welcome InformIT readers! You might also like to see my review of Eloquent Ruby. Enjoy!

With the popularity of my recent review of Design Patterns In Ruby and a series of questions on the ruby-talk mailing list, I’ve put together this interview with Russ Olsen. If you’re interested in the story behind the book, or Russ’s thoughts on Ruby and OO thinking, read on.

Some people say that Ruby makes people better programmers and OO thinkers. Do you agree? Why or why not?

Russ I think Ruby can help make you a better OO programmer because it is fundamentally a very simple language. The simplicity starts at the syntax level: Ruby syntax is spare but clear; most of the things that you want to say in Ruby you can say without having to worry too much about fiddling details like braces or semicolons or required but redundant keywords. Ruby is programming syntax cut down just close enough to the bone to make the code concise without making it indecipherable. The simplicity also goes way down deep: Ruby’s system of classes and modules and instances is about as stripped down as you can get and still have a flexible language.

All this simplicity means that the programmer can concentrate on the essentials of getting the job done without being distracted by clutter and trivia. And that, I think, leads to better programming and programmers.

Why are people interested in patterns in Ruby?

Russ Well, patterns are little packaged solutions to the problems that you come across when you build programs. So no matter what language you are using, you will probably be interested in how people solve problems in that language, interested in the design patterns appropriate to that language.

But of course, we also have these 23 specific design patterns, the ones from the GoF book. Why should Ruby programmers be interested in them? Mostly to see how they morph in the transition to Ruby. In writing Design Patterns In Ruby I saw this huge opportunity to explain that with Ruby we can improve on many of the patterns that were presented in Design Patterns. Now Design Patterns is a software engineering classic for some very good reasons, but it was written a long time ago. Many of the specific solutions – the patterns – presented in Design Patterns are very mid-90s C++ oriented. Many of them are not really appropriate to an open, dynamically typed language like Ruby. So in my book I tried to show how you might solve those original problems in a way that is appropriate to Ruby.

How did you pick the patterns you covered in the book?

Russ I started by sitting down and making a list of the design patterns that I had actually seen in production code over the years. I also tried to think hard about the problem that each pattern was trying to solve. Then I went exploring the base of Ruby code out there – I think I read every line of Rails at least once, along with Rake and some lesser known, but really stellar projects like runt. I tried to find the code in those programs that grappled with the problems that the GoF patterns solved.

Turns out that if you go looking for them in real Ruby code, the GoF patterns fall neatly into three classes. First there are the patterns that are relatively unchanged: a builder in Ruby looks pretty much like a builder in Java and a singleton in Ruby works pretty much like a singleton in Java – though in Ruby you can get the singleton job done with just a couple of lines of code.

Second, there were the patterns that morphed to a greater or lesser degree as they passed into Ruby. There were a lot of these: iterators, proxies, decorators, adapters, commands and strategies all come to mind.

The third category was possibly the most interesting: these are the patterns that are common in other programming languages but simply do not get much use in Ruby. For example, if you go trawling for either of the classic GoF factory patterns in the Ruby code base you are going to come up with a pretty empty net. Why are factories so common in Java and yet so rare in Ruby? Have Ruby programmers completely missed the boat? Are the Ruby folks simply ignoring the problems that the factories solve? Or have they found (cue dramatic music) a better way?
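As an aside on the "couple of lines" remark about singletons: Ruby's standard library ships a Singleton module, and the sketch below shows one way to use it. The class name AppConfig and its log_level attribute are made up for illustration and are not taken from the book.

```ruby
require 'singleton'

# Hypothetical class; including Singleton is the "couple of lines".
class AppConfig
  include Singleton

  attr_accessor :log_level

  def initialize
    @log_level = :info
  end
end

first = AppConfig.instance
second = AppConfig.instance
second.log_level = :debug

puts first.equal?(second) # both names refer to the same object
puts first.log_level
```

Including the module makes new private, so AppConfig.instance is the only way to reach the one shared object.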

You introduced three Ruby specific patterns in your book, are there more of these out there?

Russ Well, I took a look at internal Domain Specific Languages, at Metaprogramming and at Convention Over Configuration. I thought that was enough for any one book. But in terms of all of the possibilities that Ruby offers, I made a barely visible scratch in the surface.

Look at it this way: All of the programmers who are coming to Ruby from the C++/Java/C# worlds are used to programming under some pretty restrictive rules. In those more traditional languages object behavior is completely determined by the class. Classes are fixed. There is compile time and run time and things that you can do during the one that you may not do during the other. Ruby removes all of these restrictions. With Ruby you can create an object with methods all of its own, different from those of its class. You can modify just about any Ruby class anytime. With Ruby there is no compile time and run time: it’s all just time.

But with all of this freedom comes some serious responsibility. In programming as in life, just because you can do something does not mean that you should do it. I was around when the industry adopted object-oriented programming and it took us just years and years to go from “Gee, this OO stuff is great!” to really knowing how to apply OO effectively. I suspect that the industry is in the same situation with dynamic, open languages like Ruby – many of us have seen the light, but I think we are very much in the early days of figuring out how to apply these languages effectively. I’m looking forward to seeing the results of all of that figuring out rolled up in more Ruby oriented patterns.
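The two freedoms mentioned in this answer, per-object methods and classes that can be modified at any time, can be sketched briefly. The Greeter class and its methods are invented for this illustration.

```ruby
class Greeter
  def greet
    "hello"
  end
end

plain = Greeter.new
special = Greeter.new

# A method that belongs to one object only, not to its class.
def special.greet
  "hello, with extra flourish"
end

# Reopen the class later; existing instances pick up the new method.
class Greeter
  def farewell
    "goodbye"
  end
end

puts plain.greet
puts special.greet
puts plain.farewell
```

The object named plain was created before farewell existed, yet it responds to it, which is the "it's all just time" point in practice.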

Are there other patterns that you think deserve coverage?

Russ Absolutely. Take ActiveRecord for example: Did you know that along with being the name of the database interface included with Rails that ActiveRecord is also a design pattern identified by Martin Fowler? Not many people realize this. In the book I talk a lot about the patterns that you find inside of the Rails ActiveRecord implementation, but I didn’t really have the space to talk about the ActiveRecord pattern itself.

Another easy example is Model/View/Controller. It’s been talked to death in Java, but how about the implementations of MVC that you find in the various Ruby based web frameworks? There are also the various testing patterns that are emerging from the Ruby community. I could go on and on, but let me just leave it here: We have barely begun.

I loved your ‘pattern in the wild’ examples. How did you choose examples for the patterns in your book?

Russ For folks who haven’t read the book, the ‘Patterns In The Wild’ sections were where I wrote about examples of the design patterns that I found in actual Ruby applications and libraries. Researching the ‘Patterns In The Wild’ was probably the most fun I had in writing the book: Imagine that your job is to go out and find the most interesting, ingenious examples of code that you possibly can.

Mostly I found the examples by simply keeping my eyes open; as I said, early on in the book I was spending a lot of time reading Ruby code and I would just note down things that caught my eye.

With your ‘Design Patterns In Ruby’ and the forthcoming Ruby rewrite of ‘Refactoring’, it looks like Ruby is becoming a new language for expressing the classics. What other books do you see being rewritten in/for Ruby?

Russ I am very much an old school programmer and so the books that I would like to see updated are equally old school. I would really like to see the Ruby equivalent of Knuth’s Art of Computer Programming. Volume 2 of this massive tome has been my constant companion for decades – I think that sometimes my wife is a little jealous of Mr. Knuth.

The other book that I would like to see updated into Ruby is The Mythical Man-Month by Fred Brooks. Doing this translation would be easy because there is virtually no code in the book! Brooks was writing about problems of building software in the late 1960’s, about how to organize technical people to work well together. I think that the book is still instructive because it shows that no matter how much technical progress we make – and I certainly think that languages like Ruby represent real progress – there is a human factor in software engineering that hasn’t changed a bit.

Why did you include an introductory chapter on Ruby in an advanced book on Ruby programming?

Russ The reason that I included the introductory chapter about Ruby in there was to make the book accessible to folks with little or no Ruby background. Now honestly, I don’t think that you could come to my book with no background in Ruby and walk away from it an expert Ruby programmer – it’s not really that kind of introductory book. But I do think that someone with experience in other languages could read my book and come away knowing about Ruby, understanding what all the shouting is about.

I also think there is a deeper point here. I’ve gotten a fair amount of feedback from the Ruby community that my book is an ‘advanced’ Ruby book because I talk about things like open classes, method_missing, DSLs and metaprogramming. I have to say that I think that we Ruby people need to stop describing these techniques as ‘advanced’: all we are doing is scaring people. We need to shout out to the world our little secret: we only use those scary sounding ‘advanced’ techniques because they make our lives easier. We use them because we are a bunch of lazy sods (as all good programmers are) and they let us write programs that work, with less effort.

Take metaprogramming for example: It’s not easy to write that first metaprogramming program. Metaprogramming is one of those techniques that you need to work at when you first try it. Well my son is studying algebra and he occasionally has trouble wrapping his head around ideas like simultaneous equations or open intervals, but that doesn’t make algebra advanced mathematics. Fundamentally the concept of metaprogramming is really quite simple: In order to solve the problem at hand, your program modifies itself at runtime. Anyone who can write code at the keyboard can learn to write programs that write code at runtime. It’s just not that complex. Writing guidance software for a Mars probe, now that’s advanced programming. Metaprogramming, not so much.

This is important because there are a lot of programmers out there who are massively frustrated with trying to solve problems with the traditional programming languages and are looking for an alternative. I think we Ruby folks have a professional responsibility to shout out, “Hey, for many of your problems there is an easier way.” Now that easier way involves some programming techniques that seem strange in the beginning, but once you get used to them they stop being so strange and all you are left with is easier.
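As a small illustration of "programs that write code at runtime", the sketch below uses Ruby's define_method to generate a reader, a writer, and a predicate for each name in a list. The Settings class and its field names are hypothetical and exist only for this example.

```ruby
# Hypothetical class whose accessors are generated at runtime.
class Settings
  FIELDS = [:host, :port, :timeout]

  FIELDS.each do |field|
    define_method(field) { @values[field] }                       # reader
    define_method("#{field}=") { |value| @values[field] = value } # writer
    define_method("#{field}?") { !@values[field].nil? }           # predicate
  end

  def initialize
    @values = {}
  end
end

settings = Settings.new
settings.port = 8080

puts settings.port
puts settings.port?
puts settings.host?
```

Adding a fourth field means adding one symbol to FIELDS; the three methods for it are written by the program itself when the class body runs.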

Thursday, January 10, 2008

Rubinius Momentum

Evan Phoenix doesn’t update his blog very often, but when he does it’s worth reading. His most recent post is all about momentum, and it covers a cascade of big news in the Rubinius world.

It all started with the team finishing the new compiler. This led to an implementation of Kernel#eval, which led to a working irb, and so on …

Another indication is Brian Mitchell bringing up the idea of Io on Rubinius (which reminds me of my post Will rubinius Be An Acceptable Lisp from a year ago). While Brian hasn’t released working code yet, it’s certainly interesting that someone would even consider trying to build an Io on top of a VM designed for Ruby.

Sequel Interview with Sharon Rosner

With the recent release of Sequel, I took some time to interview Sharon Rosner about it. Sequel is a really cool project, and uses some Ruby tools in novel ways. Read on and find out more.

There are already several ORMs out there. Why write another one?

Sharon I wrote Sequel mainly because I tried ActiveRecord and it really didn’t fit what I wanted to do. The first thing that frustrated me was the lack of support for multi-threading. I was messing with Mongrel and writing my own web controller framework, and couldn’t get ActiveRecord to work properly with multiple threads. It leaked memory like a sieve, and it just felt wrong. There was also no support for connection pooling for example.

The other, even more important, issue was that ActiveRecord is great for dealing with individual records, but if you need to work with multiple records or large datasets, it kinda feels awkward. If you need to filter then you have to write raw SQL expressions, and it’s a bit hard to do GROUP BY and that sort of stuff. Also, ActiveRecord loads the entire result set into memory before you can iterate over it. Not very nice if you work with millions of records.

So I started from there and tried to design a Ruby interface for databases that would feel like Ruby code, where I didn’t have to switch between SQL and Ruby. So the basic idea was that you can express an SQL query using Ruby constructs, and then iterate over the results just like you iterate over an array or any Enumerable, fetching each record as a Ruby hash with symbols for keys:

DB[:posts].filter(:category => 'ruby').each {|row| puts row[:title]}

I actually wrote Sequel to be used in this project I was working on (and still am), and added more features as I needed them. The development of Sequel is still very much feature-request driven. People ask for stuff and if it’s a good idea we add it to the library.

Apart from that, I like the fact that there are several different ORM’s for Ruby. Each has a different mindset, each has its pros and cons. It’s very much in the Ruby spirit – one of the things I like most about Ruby is that you can write the same stuff a million different ways. Some people dislike it, but I think it’s brilliant!

What are some of the cooler things that you’ve added to Sequel based on other people’s requests?

Sharon Some people were asking for a way to change table schemas easily, so I came up with a DSL for doing that, so you can do stuff like:


alter_table :items do
  add_column :name, :text, :unique => true
  drop_column :category
end

And Sequel will generate the correct SQL for you. Another thing that was requested is support for accessing values inside arrays, so we came up with a way to specify array subscripts:

DB[:items].filter(:col|1 => 0).sql #=> "SELECT * FROM items WHERE (col[1] = 0)"

What spaces do you think Sequel plays well in? Which spaces should be left to a different approach?

Sharon One of the things that separates Sequel from other ORM’s is that you don’t really have to define model classes. In fact one of the recent changes (in version 0.5) was to divide the code into two separate gems—sequel_core which takes care of connecting to databases and fetching records, and sequel_model which implements an ORM layer on top.

With sequel_core you can fetch records as naked Ruby hashes, which you can also use to insert and update records. So that gives you a lot of freedom in querying virtually any database, without making assumptions on how the schema looks. So Sequel can be used with legacy databases and also as a general purpose tool for writing short scripts that process records. In fact, Sequel also lets you fetch records with arbitrary SQL and iterate over the results using the same interface.

Sequel can be used as a general low level database access library. It is built so you can stick any ORM model implementation on top. I know for example that there was an effort to graft Og on top of Sequel, so this kind of stuff can be done. Sequel already has adapters for ADO, ODBC, PostgreSQL, MySQL, SQLite, and DBI. There are also experimental adapters for Oracle, Informix, DB2, OpenBase and JDBC (on JRuby). In that respect, I believe Sequel can be a good replacement for DBI.

The flip side, however, is that Sequel is not really made for Ruby beginners. The API is very terse and very powerful if you know how to use it, but newcomers might be bewildered by how short everything looks. :-) They better stick with ActiveRecord, Og or DataMapper.

What other DB tools have you looked at, learned from as you’ve worked on Sequel?

Sharon Actually a lot of my inspiration came from two Python frameworks: web.py, which is sort of a micro web-framework, and sqlalchemy, which is in my opinion a brilliant piece of work. I haven’t used it, but I read a lot of the documentation and grabbed some ideas from there, like for example using a URL to specify a database connection, or the separation between a layer that deals with fetching records and a modelling layer.

What have you learned about Ruby while you’ve been working on Sequel?

Sharon I learned a lot about meta-programming. I got a lot from reading the stuff that _why wrote. He’s probably the brightest most genius Ruby programmer in existence today! A lot of parts of Sequel are written using meta-programming techniques. Once you know how to use those, you can do nuclear stuff!

The biggest discovery I made though was ParseTree. It’s a library that takes your code and gives you back a parse tree made from arrays with symbols in them. It’s the ultimate meta-programming tool. I believe it’s really the next step for Ruby programming and should become part of the Ruby core. Unfortunately, it doesn’t work with Ruby 1.9, at least for now.

Sequel uses ParseTree to translate Ruby code into SQL. So for example, the following code:

DB[:items].filter {:score > 100 and :category.in('ruby', 'perl')}

Is translated into the SQL statement “SELECT * FROM items WHERE (score > 100) AND (category IN (‘ruby’, ‘perl’))”

The Ruby to SQL translator (I call it “The Sequelizer”) is also smart enough to evaluate Ruby expressions, so you can also use constants, variables and ivars, and make calculations.

Another important thing I learned was the value of properly written specs and good code coverage. The thing about rspec is that unlike unit testing, when you write specs you tend to repeat a lot of expectations. I found that with specs the code gets a much more thorough work-out than with unit tests. A lot of stuff gets tested multiple times under different scenarios.

RSpec has also proved to be an indispensable tool for debugging, but this really requires a change in how you work. When you work on a bug, instead of just hammering out a solution, you first write a spec that will fail, setting expectations that expose the bug. Only then do you fix your code so the spec will pass. This seems trivial but it’s important when you develop a library used by hundreds of people. When you work this way you can also ensure that the bug doesn’t return when you make changes to your code.

I use RCov in conjunction with RSpec to make sure every line of code is covered by specs and as of version 1.0 we have 100% code coverage!

Are you using other coverage tools (like dcov or heckle)? Why or why not?

Sharon Both dcov and heckle are things I haven’t looked into in depth. Sequel is not very strong on documentation, and we should put a lot more effort into it. As regards heckle, I tried playing with it a few times, but eventually it would always make the specs hang. Some of the Sequel specs really wreak havoc with Ruby threads and loading/unloading of classes and methods, so I don’t know if heckle can really be beneficial for us. As many other people have already observed, 100% code coverage doesn’t mean there are no bugs or that the code is perfect, but it’s still, together with specs, a good indication of code quality.

Often, people ask if you need to know Ruby to use Rails. How much SQL do you need to get started with Sequel and how much SQL do you need to know to really use Sequel well?

Sharon That depends. The abstraction provided by ORM’s is never perfect, and if you’re going to deal with a relational database, you pretty much need to understand SQL. But with Sequel, you don’t need to know how columns, strings and other literal values should be quoted, or worry about SQL injection, or what’s the correct order of SQL clauses.

There’s a #filter method for specifying record filters, an #order method for specifying the order, a #limit method for limiting the size of the result set and so on. Sequel has idioms for column references (using symbols), SQL functions (e.g. :max[:price]), array subscripts (as shown above), qualified column names, column aliasing, etc., so the mapping of Ruby to SQL is pretty much complete. There’s also substantial support for sub-queries, which is, I believe, a unique feature of Sequel. And you can always see what the SQL looks like by calling #sql. Sequel also provides a DSL for defining and changing table schemas. In that respect, you can forget how SQL statements look and manipulate your database using Ruby code.

So if you’re just writing a blog app and need to put some stuff in a database, you don’t really need to know SQL, but if you’re dealing with large data sets or complex relationships then you better have a good understanding of how relational databases work and how records are fetched, and that involves at least some knowledge of SQL.
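The chaining idea Sharon describes (separate #filter, #order and #limit calls that end up as one correctly ordered, correctly quoted SQL statement) can be sketched in plain Ruby. This is a toy illustration only: the ToyDataset class and everything in it is hypothetical, and it is not Sequel's implementation or its real API.

```ruby
# Toy illustration only: NOT Sequel's implementation or API.
# ToyDataset and all of its methods are hypothetical names.
class ToyDataset
  def initialize(table, opts = {})
    @table = table
    @opts = opts
  end

  # Each method returns a new dataset, so calls can be chained.
  def filter(conditions)
    merge(where: (@opts[:where] || {}).merge(conditions))
  end

  def order(column)
    merge(order: column)
  end

  def limit(count)
    merge(limit: Integer(count))
  end

  # Clauses are always emitted in the order SQL expects,
  # no matter how the calls were chained.
  def sql
    parts = ["SELECT * FROM #{@table}"]
    if @opts[:where]
      conds = @opts[:where].map { |col, val| "#{col} = #{literal(val)}" }
      parts << "WHERE #{conds.join(' AND ')}"
    end
    parts << "ORDER BY #{@opts[:order]}" if @opts[:order]
    parts << "LIMIT #{@opts[:limit]}" if @opts[:limit]
    parts.join(" ")
  end

  private

  def merge(extra)
    ToyDataset.new(@table, @opts.merge(extra))
  end

  # Quote strings (doubling embedded single quotes); pass numbers through.
  def literal(value)
    case value
    when Numeric then value.to_s
    else "'#{value.to_s.gsub("'", "''")}'"
    end
  end
end

items = ToyDataset.new(:items)
puts items.limit(10).order(:price).filter(category: "ruby").sql
# prints: SELECT * FROM items WHERE category = 'ruby' ORDER BY price LIMIT 10
```

Even though the calls are chained "backwards" here, the generated statement puts WHERE, ORDER BY and LIMIT in the right order, and the string value is quoted for you. A real library does far more than this, of course.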

What books, websites, etc. would you recommend for people wanting to learn more about SQL?

Sharon I’m really not the type that reads books on programming, I just dive right in. There are though tons of articles on the web about the subject. That’s especially useful if you want to look at how to do more complex stuff like multiple joins, sub-queries, grouped queries etc.

Where can people turn for more information about Sequel?

Sharon There are four good places to look:

the project page which contains some Wiki pages about Sequel.

the API Documentation

the google group dedicated to sequel (very friendly people there)

there are always some people lurking on IRC at #sequel or #merb.

I also encourage people to look at the code, which is here

Is there a Sequel book lurking in the wings?

Sharon Not that I know of, and I’m not really the person to take on that kind of project. But if somebody wants to go ahead and do it, that would be like super cool!

What’s the coolest thing that someone’s done with Sequel?

Sharon I’d have to say Hackety Hack, which is a project by _why to make programming accessible to kids. He wraps Sequel a bit to make it more friendly but you can see a bit of Sequel code right there on the front page!

Wednesday, January 09, 2008

Programming Collective Intelligence Review

Last week I wrote about the three books that O’Reilly sent me, and reviewed Data Visualization. In that review, I promised that I’d be reviewing the other two shortly. Here’s the first of those reviews.

Programming Collective Intelligence isn’t a book that lends itself well to review. Not that it’s a bad book, but because it takes a while to get through and really grok the material. This is a dense volume, which uses a lot of code to get its point across. That’s a good thing, because this is important (if difficult) material.

The book begins with a 5-page introduction and wraps up with a 32-page summary (and two appendices taking up 24 pages), which sandwich 270 pages (10 chapters) of great material covering:

recommendation systems
data clustering
searching and ranking
optimizing collaboration problems
filtering
decision trees
price models
advanced classification
data extraction and characterization
genetic programming

I’ve learned a lot from this book already (I’ve skimmed the whole thing, and read the first three chapters in some depth), but I can tell it’s going to take some sustained work to really get the most from it. I know that members of Seattle.rb are organizing a study group, and are planning on translating the code examples from Python to Ruby as they go. Perhaps we should work on something similar here in Utah, or maybe in a Google group online.

So far, I’m finding the book well written with clear examples. I love the exercises at the end of each ‘content’ chapter. I also think Toby did a great job finding sample problems for each of the topics he covers (and pointing out other ways in which the approach might be applied).

Tuesday, January 08, 2008

JRuby 1.1RC1 Real World Performance Update

With the release of JRuby 1.1RC1 I’ve run a new set of LogWatchR performance tests. This time, I’ve only run a set of 1.0 and 1.1 versions of JRuby. If you’re really interested in seeing more about 1.8 and 1.9 performance, you can always go look at my older post on the topic.

As a bit of clarification up front, this test measures the execution time of a simple minded log analysis tool I wrote/use against a 20 minute sample of syslog output. Contrary to the common assumption, this is not an IO bound process (see here for further details). I’m running this test with the following flags through 1.1b1:

JAVA_OPTS="-Djruby.objectspace.enabled=false \
  -Djruby.jit.enabled=true -server" 
jruby -C

Version 1.1RC1 says that jruby.jit.enabled=true is no longer valid, so I’m using this instead:

JAVA_OPTS="-Djruby.objectspace.enabled=false \
  -Djruby.compile.mode=JIT -server" 
jruby -C

Build	Avg Execution Time (in secs)	Std Dev
JRuby 1.0.2	21.03	0.69
JRuby 1.0.3	21.22	0.39
JRuby 1.1b1	18.8	0.95
JRuby 1.1RC1	15.41	0.22

It looks like the JRuby team is continuing to do great things with performance. This build is a 27% improvement over the 1.0.2 release and an 18% improvement over the first 1.1 beta.
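For anyone who wants to check the arithmetic, here is a small sketch of how those percentages fall out of the averages in the table. The constant and method names are mine, chosen just for this example.

```ruby
# Average execution times (seconds) copied from the table above.
AVG_SECONDS = {
  "1.0.2"  => 21.03,
  "1.0.3"  => 21.22,
  "1.1b1"  => 18.8,
  "1.1RC1" => 15.41
}

# Percentage reduction in run time going from the older build to the newer one.
def improvement(older, newer)
  ((older - newer) / older * 100).round(1)
end

puts improvement(AVG_SECONDS["1.0.2"], AVG_SECONDS["1.1RC1"])  # prints 26.7
puts improvement(AVG_SECONDS["1.1b1"], AVG_SECONDS["1.1RC1"])  # prints 18.0
```

Rounded to whole numbers, that gives the 27% and 18% quoted above.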

I’m not a JRuby or Java performance geek, so if anyone on the team would like to lend me some performance enhancing clues I’d be happy to rerun the tests.

UPDATE: Based on Charles' comments below, I've now tried this with:

+C, which compiles every method
-C, which turns off compilation
the default compilation (no C switch at all) which will compile any method called more than 20 times

Since I'm making fairly small runs, any compilation is a pretty big hit to my run times (e.g., +C takes 25 seconds to complete on average). Maybe this weekend will be a good time to try some much longer runs.
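To make the default mode a little more concrete, here is a rough sketch of what "compile any method called more than 20 times" means as bookkeeping. The ThresholdCompiler class is entirely hypothetical and has nothing to do with how JRuby is actually implemented; it only illustrates the call-count idea.

```ruby
# Hypothetical sketch of threshold-based compilation; not JRuby's code.
class ThresholdCompiler
  attr_reader :compiled

  def initialize(threshold)
    @threshold = threshold
    @calls = Hash.new(0)   # call counts per method name
    @compiled = []         # names of methods "compiled" so far
  end

  # Record one call; "compile" the method once it passes the threshold.
  def record_call(name)
    @calls[name] += 1
    if @calls[name] > @threshold && !@compiled.include?(name)
      @compiled << name
    end
  end
end

jit = ThresholdCompiler.new(20)
21.times { jit.record_call(:parse_line) }     # hot method
3.times  { jit.record_call(:print_summary) }  # rarely called
p jit.compiled  # prints [:parse_line]
```

Under a scheme like this, only the hot methods pay the compilation cost, whereas +C pays it for every method up front.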

Holiday Contest Winners

After looking at the entries in the Holiday Blogging Contest, the judges have picked a winner in both the Ruby and the Ruby on Rails categories. Each winner will receive three Apress books of their choice.

Tamer Salama won the Ruby on Rails prize for his entry How to generate a site-map in Rails.

Carl Leiby won on the Ruby side of the contest with Using Ruby to Get at Your Mac Unix Mail.

Congratulations to both winners, and thanks to Apress for sponsoring this contest.

Friday, January 04, 2008

Ruby In Practice Blogging Contest

Manning has been kind enough to offer up a free MEAP (Manning Early Access Program) copy of Ruby in Practice as a prize for this month’s blogging contest.

Since Jeremy and Assaf are writing about practical Ruby use, I’d like to hear what you guys are doing with Ruby. Write up your Ruby In Practice adventures, post them on your blog, then drop a link in the comments here, and you could win. You can enter as often as you’d like.

I’ll accept entries until midnight (Mountain time) on January 31st. Then Jeremy, Assaf, and I will take some time to review the entries and announce the winner. Good luck!

You can also read my interview with Jeremy and Assaf.

Ola Talks Scala

Ola Bini is one of my favorite blog reads, and recently he’s been writing about Scala, a functional language that lives on the JVM.

Scala has been pretty hot in the blogging world recently and the recent availability of Programming in Scala pre-prints should only increase that interest.

Since I’m working for a Java shop, maybe it’s time for me to spend a little more time getting acquainted with Java, JRuby, Scala, and the other languages that run on the JVM. Doing so would certainly tie in with the ideas of learning a new language each year and becoming a polyglot programmer.

Thursday, January 03, 2008

Data Visualization Review

O’Reilly recently sent me three books to review. They’re all slightly outside my normal Ruby and Linux range, but they’re also all intriguing enough that I had to pick them up and give them a read. I’m planning on reviewing all three over the next week or so since I’m enjoying all of them. (UPDATE I've reviewed the second, Programming Collective Intelligence, here.) The one that most caught my eye though was Visualizing Data by Ben Fry (developer of Processing, the data visualization tool used in the book.)

At first, I was a bit put off that the heavy emphasis on Processing wasn’t part of the title, but was ‘hidden’ in the sub-title and description. The more I read though, the less it seemed to matter. Ben gives good reasons for his use of Processing in the book, and does a good job of teaching the reader about data visualization first, and Processing second.

The first chapter lays out ‘Seven Stages of Visualizing Data’ as a process that is then followed throughout the rest of the book. The next two chapters give a tutorial for Processing. Next are five chapters covering basic kinds of data and data visualization: “Time Series”; “Connections and Correlations”; “Scatterplot Maps”; “Trees, Hierarchies, and Recursion”; and “Networks and Graphs”. The book closes with three chapters that deal with tasks underlying the data visualization: “Acquiring Data”; “Parsing Data”; and “Integrating Processing with Java” (this last one is pretty Processing oriented).

If you’re working in a Java shop where Processing is either used or could be brought in without too much effort, buying this book is an easy choice. If Processing doesn’t look like a good choice for you, this book is still worth a look for the concepts it teaches.

Wednesday, January 02, 2008

Practical Rails Projects Interview

To celebrate the close of the Holiday Practical Ruby and Rails blogging contest I wanted to post this interview with Eldon Alameda. He and I spent some time talking about his new book Practical Rails Projects and other Ruby and Rails topics. I'll be announcing the contest winners soon; until then, enjoy this interview.

Why do you think Rails is a practical framework for Web development?

Eldon I think my answer for this is pretty boring because it’s the common things that we always hear about when describing Rails. Rails not only made it easy for web developers to adopt many of the best practices that we had been giving lip service to for years, but it did it in an attractive and fun way that consistently rewarded you for doing things “the right way”.

For me – Rails made web development feel right again which brought back a lot of the fun that had originally attracted me to becoming a web developer.

What makes a Rails project ‘practical’?

Eldon I guess that depends on your definition of practical ;-) But in the context of the book I was thinking of the projects as “practical” based on the ideals that they were involved in the practice of Rails Development. So for me, what I really wanted to do with this book – was to create a series of projects that would help people learn Rails. I’m a big believer in the idea that the best way to learn a language or framework is to build a variety of projects with it. On my own hard drive I have a huge mess of dummy applications that were built just to play around with a specific feature or plugin, and I think that type of playfulness and experimentation is one of the keys to my success in web development.

How did you settle on the projects in the book?

Eldon Well, like I said, for this book I wanted to try and help the readers by guiding them through the creation of a series of projects designed to explore features of Rails that I thought were important to understand but that might get breezed over in other books. So in that regard – I considered the project practical if I thought it was teaching something that would help build a better understanding of a principle or feature that would be beneficial to Rails developers as they launch into their own projects.

On the other hand though, I really wanted to try and make sure that the readers would actually have the opportunity to engage their own minds while building the projects, so I wanted to avoid too much “spoon-feeding” of the information to the reader. So I also tried to choose projects that might have a fun twist or angle on them – that we could get up and running fairly quickly in the book but leave ample opportunity afterwards for the reader to expand upon the ideas and projects to customize them for their own needs.

I really wanted to help people move from book/head knowledge of Rails to more hands-on experience.

How important do you think it is for a Rails developer to really know Ruby? What about getting involved in the Ruby community?

Eldon I actually think that a Rails developer can get pretty far with Rails before having to dig into Ruby. I like to joke that Rails is the doorway drug to using Ruby, because Rails really does make it possible to build some pretty complex applications with code that can appear fairly simple. Someone building a site in Rails may not even realize that they’re using things like blocks, iterators, etc. So in that regard it helps ease people into Ruby. Of course to provide balance to that statement – there is a pretty significant glass ceiling that comes into play for Rails developers who don’t dig deeper into Ruby (and SQL for that matter).

I do however think that it’s critical to get involved with the Ruby community at least at the local level. I’ve been attending my local Ruby users group for over two years now and I can’t begin to express what a wonderful blessing it has been to me along the way. It’s one of those few meetings that I truly look forward to every month. As I was starting out with Ruby – it was fantastic to be able to interact with Sr. Ruby developers who were kind enough to explain things or point me in the right direction for solving problems.

What did you learn about Rails while you were writing this book?

Eldon Curiously enough, writing a book like this is a double-edged sword. On the one hand, I had to dig deeper into subjects that I wanted to teach in the book, so I have a much deeper understanding of subjects like caching, REST, and ActionWebService than I had before (or probably even need). On the negative side – focusing on the projects in the book meant that I had less time for personal experimentation, which means that I “missed out / fell behind” on some pretty interesting developments within the Rails community this last year. Most notable was the wide adoption of RSpec and the BDD paradigm – so once I’m past the holidays, I’m looking forward to going through the Peepcode screencasts on RSpec and setting up some test projects to experiment with it in more depth.

What can Rails learn from other web frameworks?

Eldon I don’t want to stir up controversy but the only other web frameworks that I’ve built some production applications in are a few of the PHP based ones and the ones that I used really felt like they were behind the curve compared to Rails. Too much configuration, too many common problems left unsolved, etc. In some cases they just felt like they were simply trying to copy Rails rather than adopt the good pieces of Rails and forge their own path.

I haven’t had the opportunity to use Django yet but some people that I really respect are huge fans of it so I’m hoping to dabble in it a little next year—especially now that the official book is out.

Other than that, now that I’m finished with the book, I’ve finally had some more time to spend playing around with some of the up-and-coming alternatives to Rails. This last month I experimented with things like Sinatra, Merb and DataMapper (which isn’t a framework but an alternative ORM). Of those – my favorite so far is DataMapper – even though it still has a ways to go before it would be ready for production – it’s a really fun library.

The interesting thing to me about these new ruby libraries is how reactionary they feel to Rails. There seems to be an emphasis on simplicity with how these new libraries are being developed that is kind of refreshing. Performance with these new libraries feels really good too.

How do you think the Rails community should react to these libraries?

Eldon Honestly I think that the relevant ideas will naturally make their way into Rails over time. While these other libraries have some fun and interesting ideas – I don’t see any pressure to try and push those ideas into Rails. One of the things that has always impressed me about Rails is its ability to glean best practices / ideas and integrate them into its own core in a fairly clean and natural way. Yet at the same time – Rails has maintained its strong opinions about what is the “right” way to develop web applications.

So in the case of DataMapper – I think it’s wonderful that we as developers can have other ORMs that we can use within our Ruby projects, but I don’t think I would ever want to see Rails adapt to support it and Active Record. Choice isn’t always good.

What plans do you have to follow up on the book? (A sequel, a blog/website, or something else?)

Eldon That is the million dollar question isn’t it? Unfortunately I don’t have a solid answer at the moment.

My days lately have been filled with doing contract work for the National Weather Service. It’s not Rails but it’s a lot of fun because of the scale of the web applications there. Those are fun problems to solve and I’ve been having a little extra fun there by introducing Ruby into a lot of the backend processing.

I’ve also been looking around to see if I could find a side Rails project that would interest me and fit with my schedule, but I haven’t found anything that’s really jumped out at me. I don’t have a ton of free time and I don’t want to waste it building “yet-another-social-networking site”.

I have been thinking about writing a sequel to this book as there was a lot of stuff that I didn’t get to cover in this book. I consider this book to be a beginner to intermediate book, so if I was to write another one I would want to take it up to the intermediate – advanced level. To do that I’d probably need to decrease the scope from seven to eight projects down to only two or three but dig much deeper into those few projects.

What projects did you want to include, but didn’t make the cut? Why not?

Eldon Actually, for me the bigger challenge was stopping development on each of the projects when I felt I hit that sweet spot where the project was up and running yet there was still plenty of room to customize it or take it in some fun new directions. But my goal was that if I was having all kinds of fun ideas about where to take the project next and it was hard for me to stop the project then hopefully it would also be hard for a reader to abandon it at that point as well.

That’s not to say that all of the projects I had originally wanted made it into the book. For example – I had planned on a completely different approach for a project that would utilize the new Active Resource library. The original plan was to develop three small applications (an inventory system, a simple wiki and a help desk ticketing system) that were going to be designed to communicate with each other via Active Resource. I think it could have been a fun and cool way to see how multiple applications could intercommunicate their data with each other. I even had built and written the chapters for the simple ticketing system and had some rough stuff started for the wiki and inventory systems—unfortunately I was using the needs of my employer at the time to help shape how that project should work. When I left that company – I had to make the hard decision to also abandon those projects from the book as well.

That’s too bad, it sounds like it would have made a cool set of projects.

Eldon Yeah – it was disappointing because I thought it was going to be a fun set of projects and nice that it was actually solving a problem for some people. In its place I put a project that used Active Resource to pull contact data from 37signals’ Highrise application and display those contacts on a Yahoo map. So I think that even though the projects are different the same principles were demonstrated / taught in a way that a reader could build the original projects if they desired.

What books would you recommend to get people ready for yours? What about as potential follow-ups?

Eldon I tried to write the book so that it could be a good next step for someone who has read an introduction to Rails book. As for a good introduction book I’m still a huge fan of the Agile Web Development book (even if it is a little out of date these days). The key is that the reader should have some basic familiarity with building a Rails application and some general idea of the features that Rails provides.

As for follow-up books – I’m a huge Ruby book junkie (I think I have all the available books) and every book will typically have at least something good in it. But for best value of learning code vs. price – I would recommend these:

Ruby Specific:

Why’s Poignant Guide to Ruby — Just such a fun book to read and a great introduction to Ruby. Free

Ruby By Example by Kevin Baird — is a fun little book which teaches Ruby principles through code examples $34

Design Patterns In Ruby by Russ Olsen — This is a new title but one that I’m really enjoying. $50

Rails Specific:

Money Train by Ben Curtis — PDF only book on building an e-commerce system. Doesn’t show you how to actually build anything but it’s filled with plenty of code examples $12

Rails Code Review by Geoffrey Grosenbach (Peepcode) — Another great PDF book that demonstrates code $9

Advanced Rails Recipes by Mike Clark — Although I’m not normally a big fan of the recipe format for programming books and this isn’t published yet, there seems to be a good amount of helpful tips in this one $40

RailsCasts Screencasts — not a book but the information in these is invaluable. FREE