Monday, January 15, 2007

Serial rubinius Interview: Episode V

This week's episode hits three points; Evan's next steps with rubinius, garnet (the new name for cuby), and Software Transactional Memory (STM). MenTaLguY joins us for the third discussion, and provides a long list of resources to help get up to speed. Happy reading, and happier hacking!

Evan, after hiding out in your secret lair to hack on the newly released GC, what's next on your coding schedule?

Evan: Well, I've just committed some error reporting code. The code is invoked when rubinius crashes and prints out a C backtrace, ruby backtrace, and part of the rubinius stack.

I've got a few tasks that I see as coming up next:

  1. Finish up RNI (or whatever the name will be). It's the VM interface to external C functions. This includes a shim library which has the same API as the 1.8 C interface, so current C extensions will work under rubinius with little or no change.
  2. Work to stabilize the compiler running under shotgun. Currently, lots of people are running into problems with it. The problems lie in the implementation of the core methods that the compiler uses, so fixing the problem would in fact be ironing out the core and kernel of the VM.
  3. Add method lookup caching. This probably would come in a few different forms.
    • A global cache, like the one in 1.8. This is easiest to implement.
    • A per class cache. This cache is faster than a global cache, but requires more complicated logic to support flushing the cache (an important operation)
    • Inline cache. The fastest, but again, the most difficult to implement. This is the caching that the fastest smalltalk implementations use.
  4. Tail-call recursion optimization. After a discussion on #rubinius, I realized how easy it will be to integrate tail-call optimization to the compiler and VM.

Most likely, I'll finish RNI first. I prefer to finish a major task before going on to another task.

Wilson, Evan tells me you're behind the cuby -> garnet name change. What's up with that? And, more importantly, are we going to see any more additions to garnet (say, a tutorial, or some docs) in the next couple of weeks?

Wilson: I have spoken to a (large) number of people about Rubinius since the interviews began. When Cuby comes up, almost everyone says "What? How do you pronounce that?" or similar. Garnet came to mind while trying to think of things 'lower level' than Ruby. Hopefully I can stop explaining how to pronounce it now.

A tutorial will definitely be forthcoming. I'm hoping to have one written up in the next week or so, partly to teach myself what it is good for. Heh.

Evan: Garnet docs will begin to appear soon I hope, as soon as I've gotten the GC stabilized and I begin to actually use Garnet.

Recently, MenTaLguY has been hanging out in #rubinius and talking a lot about STM. What is it, and how does it fit into your plans for Rubinius?

MenTaLguY: STM stands for "Software Transactional Memory". Operations on shared data are grouped into atomic transactions which either succeed or fail as a whole. While one thread is performing a transaction, none of its changes will be made visible to other threads until the transaction is complete. If there is an error during the transaction, or if the transaction ends up conflict with others, its changes are discarded and an appropriate error reported -- at which point the offending thread is free to try again, with the other threads' updates taken into account.

A common misconception about STM is that it doesn't involve locks at all. In reality, as with database transactions, some implementations use locks directly to implement transactions, while others do not. The important thing is that you, the user of the STM API, don't need to deal with locks directly. It's a bit like what automatic garbage collection does for memory management, making it easier to write composable abstractions.

It's worth noting that there are a number of other models competing with STM, including dataflow concurrency (lazy evaluation and futures), actors (a la Erlang), and various realizations of the join calculus (per funnel, Join, etc.), but I think STM and the join calculus are recently the most promising. STM is interesting in that, while it's been an active research topic for quite a while, it didn't really "break through" until 2005, when the paper "Composable Memory Transactions" (Harris, et al) introduced key primitives for blocking (retry) and choice (orElse). [Editor's note: See below for a great list of links about this stuff.]

My personal plans for Rubinius are really pretty modest; I heard Evan and Wilson were considering STM for use in Rubinius, and I volunteered my existing STM-in-Ruby implementation in the hope that it would save them some work. Time permitting, I'd also like to implement the other concurrency models, whether atop STM or in some other way.

Evan: STM is going to be used in rubinius to hopefully make it easier to write multi-threaded code. My hope is that the core libraries will use STM so that out of the box they work properly when used multithreaded. And btw, when I say multithreaded, I mean both green and native threads.

Wilson: MenTaLGuY has covered the options in great detail, and I'm glad to have him on board. I intend to leech him of of concurrency knowledge like some kind of psychic vampire.

I'm a big fan of STM for two main reasons:

  1. It lets you write concurrent library code without worrying that user code will break it. This is a huge problem for fine-grained threading.
  2. The major CPU developers are planning to implement it in hardware in the coming years. That should offer a nice speed boost, and offer yet another excuse to spend money on computer hardware.

MenTaLguY: Hopefully we'll see hardware support for sub-transactions, as that's what allows interesting stuff like 'retry' and 'orElse' to work, and hopefully the OS APIs won't bind transactions to hardware threads (maybe someone needs to get on them about that once such APIs begin to be developed).

MenTaLguY was also kind enough to provide a list of links if you want to read more about STM and its alternative:

This episode is brought to you by: Garbage Collection: Algorithms for Automatic Dynamic Memory Management. .

Previous episodes of the Serial Rubinius Interview are available here:

  • Episode 1, in which we talk about the rubinius community
  • Episode 2, in which we talk about cuby and testing tools
  • Episode 3, in which we talk about rubinius documentation
  • Episode 4, in which we talk about cooperation with the JRuby team and YARV

  • Episode 1, in which Charles, Thomas, and Ola talk about their plans for JRuby.
  • Episode 2, in which Charles, Thomas, and Ola talk about cooperation with the rubinius team and YARV.

1 comment:

Anonymous said...

One minor clarification: sub-transactions aren't required for 'retry' by itself, but you'll need them once you bring choice ('orElse') into the picture.