Tuesday, October 03, 2006

JRuby Interview (Part 2)

Don't miss the first part of this interview.


The JVM gets a lot of flak (from the dynamic language camp) as being a static language VM and not really suited for a dynamic language. Obviously you don't believe that. Why not?

Ola: The JVM is a really great piece of engineering. Of course, not everything is great, but for the most part it has got what it takes to run dynamic languages really well. First of all, hardware gets faster and faster, and it's now practical to have a VM run a VM inside of it, which is our current approach. But the most important parts can be compiled even further, through various tricks.

What have you seen/experienced that supports your point of view?

Thomas: Actually, I think the JVM was designed with only a statically-typed language in mind. That said, the JVM provides features like garbage collection which make writing a high-level language like Ruby much easier. So, using the JVM requires some heavy lifting because the underlying machine does not innately support our language features, but the sophistication of the JVM along with what it does provide is not such a bad place to be. We could have had to implement all features from scratch. As it happens, some thought is being put into making the JVM even more attractive to the dynamically typed crowd, like JSR 292's invokedynamic. It is unclear how close this will get to what we need, but I think it is another signal that Sun is interested in supporting other languages on the JVM.

What have you had to fight with on the JVM to implement Ruby?

Ola: For me personally, the static typing of invocation is the biggest problem. It means that right now we have to jump through hoops to support the dynamic nature of a method call in Ruby. But that seems set to go away with invokedynamic (the new bytecode slated for addition in Java 7).

Apart from that, stack control is the second obstacle. Being able to save a stack frame and continue it later on is what makes Continuations practical. It also allows pretty nice optimizations of closures and other things. A fully featured goto for bytecode would also be nice.

Thomas: We do take advantage of what we can on the JVM, but Ruby has features which do not jibe with the JVM's definition of a class or an object. Take Ruby's open definitions as an example. In Ruby, we can add, remove, or replace pretty much anything in a class definition we want. In Java, once you define your class you are done. To implement JRuby in Java we need to create 'bags' which contain methods and attributes. We cannot just define a method in Java's class format. We have to dynamically manage this stuff one level higher than if the JVM actually had the features to support this. In Gilad Bracha's talk about JSR 292 he mentioned an idea about creating a second class format (flagged with a bit) that when seen by the JVM would allow that class to be mutable with regards to methods and attributes. He even mentioned a handler for the equivalent of method_missing. If that ever happened the JRuby implementation would get a lot simpler.
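The 'bags' Thomas describes can be pictured with a small model. What follows is a minimal sketch in Python, not JRuby's actual code: `MethodBag`, `DynamicObject`, `define`, `remove`, and `send` are names invented for this illustration. It shows a class held as a mutable table of methods that is consulted on every call, with a method_missing-style fallback, which is roughly the "one level higher" bookkeeping he refers to.

```python
class MethodBag:
    """Illustrative stand-in for a Ruby class: a mutable table of methods."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.methods = {}  # method name -> callable(receiver, *args)

    def define(self, method_name, body):
        # Adding or replacing a method is just a table update.
        self.methods[method_name] = body

    def remove(self, method_name):
        self.methods.pop(method_name, None)

    def lookup(self, method_name):
        # Walk the ancestor chain on every call; nothing is bound ahead of time.
        bag = self
        while bag is not None:
            if method_name in bag.methods:
                return bag.methods[method_name]
            bag = bag.parent
        return None


class DynamicObject:
    """Illustrative stand-in for a Ruby object: attributes plus a class bag."""

    def __init__(self, bag):
        self.bag = bag
        self.attributes = {}

    def send(self, method_name, *args):
        body = self.bag.lookup(method_name)
        if body is not None:
            return body(self, *args)
        # Fall back to a method_missing-style handler if one is defined.
        fallback = self.bag.lookup("method_missing")
        if fallback is not None:
            return fallback(self, method_name, *args)
        raise AttributeError(
            f"undefined method '{method_name}' for {self.bag.name}"
        )


greeter = MethodBag("Greeter")
greeter.define("greet", lambda self, who: f"Hello, {who}")
obj = DynamicObject(greeter)
first = obj.send("greet", "JRuby")

# Reopen the "class" after an instance already exists.
greeter.define("greet", lambda self, who: f"Hi, {who}!")
second = obj.send("greet", "JRuby")
```

Because every call goes through `lookup`, redefining `greet` changes the behaviour of objects that already exist. The cost is an extra table lookup per call that a fixed class format would not need.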

Juha Komulainen wrote (on Gilad Bracha's blog post about JSR 292):

Having written a toy-implementation of Scheme on JVM, I can certainly appreciate invokedynamic, but that's really just half of the story: continuations were the real problem.
Since I wanted to support full continuations, I ended up implementing my own stack, which obviously killed performance. Furthermore, while I could call Java objects and implement Java interfaces so Java code could call back to Scheme, the continuations wouldn't work when there was Java code on the stack between the continuation point and current point of execution.

This becomes really interesting though, when you compare the JVM with Parrot which already supports native continuations and was designed specifically for dynamic languages. How do you view the tradeoffs of the JVM vs something like Parrot?

Ola: For me personally, the JVM's major point is that it is here, it is working, and it's working incredibly well in many places. Of course, something like Parrot is really good, and if it gains traction Parrot and Cardinal will be good supplements to the Ruby world. But right now we need something that works well now, and there is no VM (that I know of) that has seen as much work as the JVM. Doing something from scratch, like Parrot, means you have lots of flexibility to include features that make implementing dynamic languages easier, but that flexibility will also make it harder for you to make the VM really fast, since the tendency will be to include as many features as possible in the VM. From that point of view, being constrained by the JVM is actually a good thing.

Charles: There are certainly newer and flashier VMs than the JVM. Some are targeted at dynamic languages, some have additional bytecodes for stack manipulation and tail calls, some are register-based rather than stack-based. Some of them may run specific dynamic languages extremely fast, and of course diversity is always going to be a good thing. The fact is, however, the JVM provides a much larger collection of features and libraries with equal or better performance. No, you can't manipulate the call stack. No, you don't have direct control over thread scheduling. No, you don't have VM-level support for dynamic invocation (yet) or tail call optimization. What you do have is a core set of features that have been examined and re-examined, optimized and re-optimized over a decade by some of the brightest folks in the industry. It supports a narrower set of features, but supports them extremely well; so well, in fact, that missing features can usually be wired together without much trouble. Any set of tools has its own tradeoffs. I can live with the tradeoffs on the JVM because I know I can trust the available features to fill any gaps. I can't say that for any other VM today.

Thomas: I am going to chime in with the boring answer of pragmatism. The JVM has been in a released state for over a decade. Parrot, from what I see, is far from a "1.0" release. Every time I check in on the Parrot project they have totally rewritten some portion of Parrot or other. I think Parrot has great promise and I have little doubt that it will be a competitive VM at some point, but until that point a comparison is not really worth much effort.

What are your plans for continuations?

Charles: There are numerous tricky ways to add continuation support when you don't have control over the stack, and they're all equally nasty. We've considered a few of the options, as well as more complicated designs like making JRuby stackless and using tricks to avoid ever deepening the Java call stack. In the end, however, I fall into the camp that hasn't seen any compelling use of continuations yet. The web application use of continuations, as found in Seaside and friends, is probably the closest thing to being useful. Unfortunately, while it's clever and visibly easy to use, it's much more difficult for anyone to wrap their head around than other more verbose models. I'll put myself up for attack and assert it gains you little over a state machine that's aware of your current position in a sequence of pages, while being far more complicated to support and understand.
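The alternative Charles mentions, a state machine that knows the user's position in a sequence of pages, might look something like the sketch below. It is written in Python for illustration only; the `CheckoutFlow` class, its page names, and its methods are hypothetical and do not come from any framework named in the interview.

```python
class CheckoutFlow:
    """Hypothetical three-page flow tracked by an explicit state machine."""

    PAGES = ["cart", "shipping", "confirm"]

    def __init__(self):
        self.position = 0   # index of the page the user is on
        self.answers = {}   # form data collected so far, keyed by page

    @property
    def current_page(self):
        return self.PAGES[self.position]

    def submit(self, form_data):
        """Record the current page's input and advance to the next page."""
        self.answers[self.current_page] = form_data
        if self.position < len(self.PAGES) - 1:
            self.position += 1
        return self.current_page

    def back(self):
        """Step back one page, as the browser's back button would."""
        if self.position > 0:
            self.position -= 1
        return self.current_page


flow = CheckoutFlow()
page_after_cart = flow.submit({"items": 2})
page_after_back = flow.back()
```

In this style the position and the collected answers are ordinary data that would have to be stored between requests (for example in a session). A continuation-based framework instead captures that state implicitly in the suspended computation.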

At RubyConf 2005, there was an entire session on continuations run by Jim Weirich and Chad Fowler. They did a masterful job of explaining how continuations work and what they're doing, but many attendees were still left scratching their heads. I'd say a single language feature that you can't fully understand in a 30-60 minute session plus workshop probably shouldn't be there in the first place. I'm sure some language mavens will disagree.

Thankfully, the continuation issue is at the moment solved for us, since Matz has declared that Ruby 2.0 will support neither continuations nor green threads. That, coupled with the fact that none of the major Ruby applications use continuations, means we're simply not planning to support them right now.

Ola: I would like to chime in here, being a staunch defender of all things Lispy. Continuations can be very nice for a few simple reasons. Most of the other control structures in Ruby can be implemented incredibly elegantly with continuations. All flow control primitives can actually be mathematically expressed with continuations. This is the reason it will take some time for most people to get used to them, incidentally. If half your language can be expressed in one feature, maybe that feature is qualitatively different from the other features in your language?
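One way to picture the claim that control flow can be expressed with continuations is continuation-passing style (CPS). The sketch below is in Python, which has no first-class continuations, so the continuations are simulated as plain functions; `each_cps` and `first_even` are names invented for this illustration. It expresses a loop's "next" and "break" as two functions the loop body may call.

```python
def each_cps(items, body, done):
    """Iterate in continuation-passing style.

    body(item, next_k, break_k) decides what happens next by calling one of
    the continuations it is handed instead of relying on built-in loop syntax.
    """
    def step(index):
        if index == len(items):
            return done(None)            # ran off the end: finish with no value
        return body(
            items[index],
            lambda: step(index + 1),     # "next": continue with the following item
            done,                        # "break": leave the loop with a value
        )
    return step(0)


def first_even(numbers):
    """Return the first even number, or None, using only continuations."""
    return each_cps(
        numbers,
        lambda n, next_k, break_k: break_k(n) if n % 2 == 0 else next_k(),
        lambda result: result,
    )


found = first_even([1, 3, 4, 5])
```

This simulation recurses once per item, so it is limited by Python's recursion depth. It is meant only to show the shape of the idea, not a practical loop.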

I see that continuations are important for language research, since they allow new control primitives to be implemented quite easily. A good example of where continuations excel is in the Generator library. That's a typical implementation that can't be done in JRuby right now.

I had this issue when porting PyYAML to Ruby. PyYAML's parser is a pretty simple stack machine. To make the algorithm explicit, Kirill Simonov used the Python generator/iterator feature. This made each method extremely simple to understand. Now, a generator can be seen as the simplest kind of coroutine, and coroutines are a typical special case of continuations. My port of the parser didn't use continuations, though, since it was supposed to work on JRuby. The first version did the whole parsing in one go, which obviously was quite memory intensive. After that I did a complete rewrite of the parser, and ended up with a stack-based, table-driven parser. But both of these versions were several degrees removed from the original simple LL(1) definition of the parser, and this decreased readability very much. With continuations the original structure of the parser syntax was apparent in the code. And this is one of these cases where more power means better (and smaller) code.
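The generator style Ola describes can be sketched with a toy grammar. This is not PyYAML's parser: the grammar (`value := SCALAR | '[' value* ']'`), the event names, and the function names are all assumptions made for this illustration. The point is that each grammar rule stays a small function while events are still produced one at a time rather than in one go.

```python
def tokenize(text):
    """Split on whitespace after padding brackets; good enough for a sketch."""
    return iter(text.replace("[", " [ ").replace("]", " ] ").split())


def parse_value(token, tokens):
    """value := SCALAR | '[' value* ']'  (one function per grammar rule)."""
    if token == "[":
        yield ("start_seq",)
        yield from parse_sequence(tokens)
        yield ("end_seq",)
    else:
        yield ("scalar", token)


def parse_sequence(tokens):
    """Consume values from the shared token iterator until the matching ']'."""
    for token in tokens:
        if token == "]":
            return
        yield from parse_value(token, tokens)
    raise ValueError("unterminated sequence")


def parse(text):
    """Lazily yield parse events for every top-level value in the text."""
    tokens = tokenize(text)
    for token in tokens:
        yield from parse_value(token, tokens)


events = parse("[a [b c] d]")
first_event = next(events)       # only as much input as needed is parsed
remaining = list(events)
```

Without generators, the same laziness requires keeping the parser's position in an explicit stack or table, which is the kind of rewrite described above.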

I'm sad to see continuations go from Ruby, since I believe they have a place there, but on the other hand I know the pain of implementing them efficiently (and the current Ruby continuations are _not_ efficient). So, it's probably the right decision, but not because they aren't needed. It's just not practical (pragmatic) to leave them in the language.

Continuation-based web frameworks are neat, and will play a big role in the future of the web. But they are not a silver bullet.

Charles: Ola's point is well-founded, and correct. Continuations provide a mechanism for implementing many other language features. And it's also true that Ruby is a language frequently used to make other languages. However the barrier to understanding and implementing continuations while simultaneously supporting other features people want out of Ruby (improved performance and native-threading, for example) outweighs their utility in the vast majority of applications. I'd go on to say that continuations probably don't fit well into the "making simple problems easy" and "make coding fun" aspects of Ruby, since they're most applicable to the hardest problems and are uncomfortably difficult to digest when compared to the rest of Ruby. Call it a sacrifice of language flexibility for language and implementation utility. I'm sad to see interesting features go away, but I also wasn't looking forward to supporting continuations on the JVM.

Which (C) libraries from the Standard Library have been the hardest to get going on JRuby?

Thomas: YAML was a library that languished with the original pure-Ruby version until Ola wrote his own implementation. It was a large undertaking, which I think made the library fairly intimidating. Socket has been a challenge since it took quite a bit of tinkering by multiple people to get it working with applications like WEBrick. Socket's challenge is trying to emulate its lower-level APIs using Java. Java abstracts some things, which makes matching that module more difficult.

Charles: The hardest libraries to implement have been the ones for which there was no equivalent in Java. YAML, for example, had only one or two primitive implementations available for Java that didn't suit our needs. Ola did a tremendous job porting the Python YAML parser first to Ruby, and then writing a completely new pure Java version. Danny Lagrouw did a great job implementing an HTTP parser library for our Mongrel support as well...which has now enabled Mongrel 0.4 to start running Rails apps under JRuby.

Other libraries have just required wrapping existing Java libraries or functionality, such as strscan, zlib, socket. There's some disconnect between the Java interfaces and what we need to provide to Ruby code, but usually things map up pretty well.

Ola: I would probably say YAML was hardest, since it's so big. I'm actually working right now on the next version of YAML support for JRuby. Most other C extension libraries are pretty small and have been easy to port to Java. The two most important ones were YAML and ZLib in my opinion, since those were real blockers for RubyGems, and RubyGems was a very enabling application for JRuby.

You've been working on mongrel, Rake, Rails, and some other extensions. What other Ruby extensions/applications are in your plans to work on next?

Ola: I haven't really planned any further regarding more extensions and applications. There are obviously lots of fun things to take a look at, but I don't have any plans for that right now.

Thomas: We want openssl and bigdecimal working. These fit into supporting Rails better.

Charles: There are also a few libraries remaining that could prove challenging. The openssl library, for example, is Ruby's only SSL support. It provides a fairly thin wrapper over the OpenSSL C library, so there could be challenges making Java's SSL support appear interface-compatible. There's also the bigdecimal library, which implements both BigDecimal operations and some numerical algorithms against them. Other than the basic BigDecimal support in Java, there's no first-class numerical methods support, so we'll have to implement our own.

There are also many more places we could add new extensions to great benefit. Rake is a good example. When compared to Ant, Rake is a far more attractive tool for performing builds. By writing task extensions to Rake that perform the Ant operations Java developers have come to know and love, existing Ant scripts could quickly be ported to Rakefiles with a resulting shrink in overall size and an increase in build engineer happiness. That's a perfect example of what can result when we marry Ruby's elegance with Java's capabilities. And that's exactly the sort of thing we're working on.

Charles, you've mentioned 'Rubifying' some existing Java tools and libraries. Can you give us some examples?

Charles: A large part of our focus has been trying to fit Ruby into a Java-centric world. There are countless libraries and frameworks out there in Java-land...libraries that would be very useful for Ruby applications like Rails. However the effort required to hand-wrap those libraries in a Ruby lib is sometimes prohibitive; the set of interfaces provided in the Java code can be extensive and not particularly "Rubyish". We seek to make accessing those libraries simpler.

A good example would be EJB. Although Spring and its ilk have made inroads for service or component-based development, EJB still sees wide usage. With the simplifications made to EJB3, it could see an upswing in popularity. Therefore, it would be time well spent to make accessing EJBs from Ruby code simpler. In the case of Rails, this might come in the form of scripts to generate services based on all beans in a given JNDI tree, or create ActiveRecord-like shims on top of javax.persistence entities. To make it as Railsy as possible, there might be a generator to handle this...something like "jruby script/generate scaffold_ejb jndi://myloc/mybeans". The point is to make the Java world accessible to Ruby code in a friendly and Rubyish way.

Other projects are already working on the same issue (Rubifying Java libraries). The Ferret project has done some good work at making a Lucene-like system more Ruby-like, but it has also done a lot of work to speed it up by making algorithmic improvements and rewriting the core in C; in fact, Ferret is now faster than Lucene. When I asked Ferret's developer, David, about interaction with JRuby, he thought the Java developers should stick with Lucene. So, how do you feel about pointing JRuby users at a slower, less Ruby-like library? And how can you help developers like David 'get on the JRuby bandwagon'?

Charles: If Ferret turns out to be faster or easier to use, I don't see why Java developers wouldn't prefer to use it. We have other projects underway or already working that have reimplemented C extensions, so Ferret would be no different in this regard. The other thing to consider is that people will be deploying "plain old" Ruby apps under JRuby that may want to use Ferret. If we want to be a complete solution, we'll need to support as many Ruby apps and libraries as possible...and that includes apps and libraries that use C extensions today. Ferret is just another Ruby library, as far as I'm concerned, and we want to be able to run it.

In the first part of the interview, you talked about using Emma to verify your code coverage. It looks like Emma is a C0 coverage tool (I could be wrong though). What do you think about the sentiment that code coverage (and especially C0 coverage) just isn't enough? How are you mitigating the danger of missing important tests/bugs because Emma tells you things are fine?

Thomas: Code coverage is just an additional tool. If I generate a code coverage report and I see a woefully under-covered area, then I know I should consider making some tests. Seeing a good code coverage report is like seeing an all-green unit test run. You know that even though things look rosy, software is rarely perfect and you will have more problems to deal with. We also have users reporting problems...That goes a long way toward not taking Emma's glowing reports too seriously :)

Charles: Tom's right, we don't put a lot of stock in the fact that our code coverage approaches 70%. Coverage is useful for ensuring you're not breaking that X percentage of code, but as everyone knows the hardest, sneakiest, most dangerous bugs are in the code called only 10% or 5% of the time. Some have remarked that our coverage number seems high; to me it's dangerously low. As far as we know, 30% of the code could be completely broken. Naturally we have other tools to ensure that things are mostly working, but inferring anything more from 70% than "70% of the code is tested regularly" is foolhardy. We can't make *any* assertion about the correctness of that last 30%, and 70% is really, really far from perfect.
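The concern about C0 (statement) coverage raised in the question can be illustrated with a deliberately tiny example. This is a sketch in Python with a made-up function, `scale`, and it is unrelated to JRuby's code or to Emma's output: a single test executes every statement, so a statement-coverage report would look perfect, yet the untested path through the `if` still fails.

```python
def scale(value, use_factor):
    """Deliberately buggy: the default factor of 0 is only safe to divide by
    when the `if` body runs and overwrites it."""
    factor = 0
    if use_factor:
        factor = 2
    return value / factor


# This one call executes every statement in scale(), so a C0 report
# would show the function as fully covered.
covered_result = scale(10, True)

# The path that skips the `if` body was never exercised, and it is broken.
try:
    scale(10, False)
    untested_path_failed = False
except ZeroDivisionError:
    untested_path_failed = True
```

A branch-coverage tool would flag that the false side of the `if` was never taken. Statement coverage alone cannot, which is one reason to treat a high percentage as a floor rather than as evidence of correctness.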

What other Ruby projects are you guys envying, and why?

Ola: Oh. That's a hard question. I usually tend to stick my finger in every project I envy. But well... All projects coming from _why are amazing, and I try to read the code just to learn some new tricks. I'm also very fond of Mongrel. Rails is a given.

Charles: RubyInline is very cool, and if others in the Ruby world would take it seriously it could make building a JIT compiler for C Ruby very easy. Heck, once this JRuby thing is out the door I might just look into doing it myself.

MetaRuby has a lot of promise. If Ryan, Eric, and Evan can really make solid implementations of Ruby builtins and VM features, they'll have a good chance to do some great things. Even better, if we can get our core Ruby interpreter running blazing fast and build out a solid compiler, we could start swapping out our own builtin classes for those MetaRuby provides. Then we lift all boats at the same time.

Of course there's Rails. Rails has the potential to change the whole direction of Ruby, since so much community emphasis is placed upon it. If Rails includes a powerful, digestible solution for the Unicode question, for example, it may signal a change in how Unicode is perceived throughout the Ruby community. The Rails devs have a responsibility as thought leaders in the larger Ruby community, and so far they've handled that responsibility very well.

I'm also interested in other alternative implementations. There are two implementations of Ruby for .NET already in progress, and a number of bridges between the JVM or CLR and the existing Ruby interpreter. All have something to teach us. And of course there's YARV, which many in the Ruby world believe to be the ultimate answer to Ruby's ongoing performance woes. Koichi Sasada has done an impressive job so far, especially considering the near-impossible problems he's been tasked with solving. I hope in the future we can have more cross-pollination between all our projects, to the benefit of the Ruby community at large.

If you could push one thing out of JRuby and into MRI, what would it be?

Charles: I honestly believe that MRI needs to break away from the current extension mechanism and make some tough choices about its internals. It needs a new garbage collector, a new thread scheduler or native threads, and a clear isolation between extension code and internal code. None of those things can easily happen today since so many extensions call freely into Ruby internals. I'm also still not convinced that Matz's multilingualization support in Ruby 2.0 is going to solve today's problems with Unicode. I still believe there's an inherent paradox in providing both a byte array and a character sequence in the same type, though I can't put my finger on what that paradox might be. Other additions would be improvements or enhancements of existing libraries: net/http and friends need an overhaul; rdoc's parsers could use a faster implementation (mainly because they're so slow in JRuby :); and rexml's faults are well known. I wish more folks would step up to the challenge of fixing these problems and contributing to the C implementation, as they have in the JRuby community.

My major concern, however, is the community's almost complete disregard for performance. Yes, Ruby is usually "fast enough". Yes, you can still get as much done with it. And I understand when Ruby developers say "I don't care" to the performance questions; I usually agree. But far too often when a performance-related issue comes up, the stock answer is "write it in C". This completely avoids the issue: that Ruby needs a performance boost today. It needs to run faster and scale across multiple processors. It should be able to run dynamic code at speed comparable to native compiled static code. Saying "write it in C" doesn't put high enough expectations on the core Ruby implementation, and without expectations not enough attention is paid to making Ruby performance better. I don't know about you, but I don't want to "write it in C" or "write it in Java" to get the performance I need...I want Ruby code to be blazing fast all the time. We're working to make sure that JRuby is blazing fast all the time; I hope the users of C Ruby will demand the same.

Ola: As Charles noted, there are many things that MRI could use. The thing MRI needs most, though, is probably the freedom to refactor internals. That's one of the main reasons JRuby is going so strong now. We have started work on changing the internals in major ways that MRI never could.

2 comments:

Steve said...

Don't give up on continuations. I hear Matz says they'll be back in Ruby 2.0 (Rite) asap.

Anonymous said...

Good idea!!!