On Ruby: Cardinal: A Behind the Curtains Look at Parrot

This week (and next), I'm taking some time away from the rubinius serial interview to let those guys get back to work. To keep everyone on their toes, I've got a couple of other bits lined up. This week, I asked Dan Sugalski, the original architect of Parrot, about a design decision that he made. Parrot might seem a bit far afield for a Ruby blog but there is the Cardinal project which is trying to build a Ruby front-end for it, so I don't think we're too far off the path.

Next week? You'll have to come back then and see.

In recent discussions in the Ruby world, the Parrot VM has taken some heat because of its register based design. I know you guys went through the trade-offs, but those discussions are lost in the mists of time. Could you fill everyone in?

Dan: There were a couple of reasons.

Firstly, I just like 'em. Yeah, I know, "taste" is a dodgy reason to do things, but when you're designing stuff like this, you need to do stuff that you're kinda fond of, and I was always comfortable working with register machines. Wrote more assembly than I care to think about for 6502 and 68000 family microprocessors. I liked them, was comfortable writing code for them by hand, so it was a good reason to design a register machine.

Second, there is a lot of literature out there for writing optimizers for register machines. All modern CPUs are register machines, and a lot of people have spent a lot of time figuring out how best to generate code for them. I knew that Parrot wouldn't be able to use it right away, but I was trying to plan for things in 2005, 2010, and 2020 when I started. (This was back in 2000)

Thirdly, register machines are generally more efficient than stack machines. Yeah you might see the bytecode look a little denser (which is certainly important) but there's an amazing amount of bookkeeping with a stack machine. There's an extra store for every opcode, which adds up. For example, storing 1 in register X is:


machine reg = regbase + x
store indirect 1

while a stack machine is:


machine reg = stackbase + 1
store indirect 1
store stackbase

Also on the efficiency front, stack operations are destructive, so something like:


x = x + 1
y = y + 1

requires putting that 1 on the stack twice — on a register machine you can reuse it, saving a full op. (This requires some intelligence on the part of the compiler, but it's actually a pretty simple optimization and easy for anyone past first year compiler construction to do.)

Fourthly, for JITting, we know we're targeting a register architecture — the CPU we're running on, which has registers. It's easy enough to do a naive stack VM -> native machine translation, but it's a stack transform, treating your register hardware as a stack machine, and that's inefficient. Doing a good JIT from a stack to register system is much tougher.

A register VM -> register machine translation is a lot easier. You can, of course, grab a register and use it as your 'vm register' base pointer, which is as naive as the stack transform above, but it's much simpler to get relatively efficient register machine code out of a register VM system, assuming some thought went into desinging the register VM. (Which it did.)

(The usual argument that "well, compilers build up a sort of stack-based intermediate representation before generating code" is interesting but bogus, since almost all the useful information in that intermediate representation is tossed away when the low-level stack ops are created.)

So that's basically it. One part taste, one part lots of free work to bootstrap on, and two parts efficiency.