Friday, September 29, 2006

New JRuby Interview (part 1)

Don't miss the second part of this interview. This was too long to keep in one piece.


We met Charles and Thomas in a previous interview. Ola, would you take a moment to introduce yourself?

Ola: I'm a 24-year-old systems developer working at Karolinska Institutet in Stockholm, Sweden. I have been coding Java since about '96, but I began my coding career with C (in the Scandinavian demo scene). I'm an out-of-the-closet Lisp-lover who tends to find Ruby more pragmatic for most tasks. I got interested in JRuby in the fall of 2005, but didn't begin to pitch in until springtime, when I finally got some free time to do it in. My vision of programming is a platform where you can use the right tools for a task, whatever the tool or task may be. Programmer time is too valuable to emulate closures in Java, hack objects in Perl, search for macros in Ruby or recursion in classic Fortran.

And how did you discover Ruby?

Ola: I don't remember exactly when or where, but I do know it was when I read The Pragmatic Programmer. After that I realized how fed up I was with Java and ASP (which were the major technologies at my workplace at that point). I found the first edition Pick-Axe online, and the rest is history.

Okay, this one's for all of you. How much do you use Ruby in your day to day work and hacking? (How much of that is actually done with JRuby?)

Ola: At this point, after much lobbying, my work is maybe 40% Ruby and the rest Java. I'm trying to increase that figure of course. But on the other hand, I do a fair amount of hacking in my free time, and I tend to prefer Ruby for most things except when I'm working on some of my pet projects involved with getting better languages to the JVM.

My use for JRuby at this moment is mostly confined to making my life as a Java dev easier. I have a jirb session open constantly and try out different paths through the Java API. Interestingly, the one part I most often find myself trying is Java's Regular Expressions library.

I am planning on getting the first Rant/Jake-tasks out the door soon (Rake tasks for Java development), and when I do I'm planning on switching wholesale to JRuby and Rake for my builds.

Thomas: This question may be worth asking again in a few months since I just started working at Sun. At my previous job, I did not get much chance to use JRuby. I did plenty of Java and occasionally I would write a script here or there in JRuby to access/exercise that Java code. I also created small scripts to access databases using JDBC. I also have not had many opportunities to do pure Ruby programming. Between a day job and JRuby hacking after work I did not have much time for pure Ruby development (other than simple data manipulation scripts here and there). I am hoping that changes more now that my day job is JRuby.

Charles: It's a sad truth about working on the JRuby project that up to now we've not really had much chance to write Ruby code. Working deep in the bowels of the interpreter keeps us pretty far removed from day-to-day Ruby work, and until coming to Sun I was a straight-up Java EE architect. I had started to sneak Ruby into the system, however, handling a Bugzilla/CVS/ClearQuest integration layer and starting to write administrative and service management scripts. I'd also been advocating Ruby and Rails to the folks on my team with a little success.

I expect that the Sun move is going to give us more time to focus on Ruby-land tools and code. In particular, there's a lot of work to be done to integrate Ruby apps — including Rails — into a Java world. There are plugins, scripts, and generators to be written, integration scenarios to be tested, and Java-land tools to "Rubify". Now that we can concentrate on JRuby full-time, there should be hours left over to climb outside of the interpreter internals more.

Tom, I'll make a note ... we'll have to try to work in another interview in early 2007. Let's get back to JRuby today. Charles has written that you're starting into some performance work on JRuby. How's it going? What's the process you're using? How are you measuring change?

Charles: Performance work has been a constant throughout my time on the JRuby project. Ruby itself is not known for its blazing speed, and since JRuby was originally just a port of the C code, it suffered from that implementation's issues as well as a number of new ones introduced during the translation. It scored at the bottom of the pack on almost all JVM scripting-language benchmarks and was visibly slow in interactive mode. On top of this, Ruby is a fairly complicated language to implement, and we have never had a bytecode compiler.

In past months, however, we've taken a more radical approach to improving performance. Where the original focus was on correctness above all else, we're now starting to make performance a top priority. As it stands today, JRuby is more correct than it ever has been, and most Ruby scripts now "just run" without modification. Rails is working better every day and odd interpreter failures are fewer and farther between. Our confidence in JRuby working correctly has reached a point where we can start making larger changes to the core interpreter; because we've built a substantial library of tests for core interpreter features, we're in better shape to refactor for performance. Where we used to try to mirror the C code exactly, we now have a much better understanding of what the C code *intends* to do. Because Java code is easily refactored and redesigned, we can take that intent and create more Java-friendly code. Each little step helps performance a bit more, and the current trunk code is already 25% faster than the previous release...while correctness continues to improve.

We're also starting to look at the compiler issue more closely. There are many different levels of compilation we could attempt, and we're going to pull in resources and help from Sun to give us a better idea how to tackle the problem. I've also worked on and off with a partial compiler that's shown 50-75% performance improvements. There's a lot more work to do, but there's now a lot more time in which to do it.

Ola: As Charles reported, progress is good. Really good, actually. I'm tackling performance from my side, by creating extensions in Java instead of Ruby (YAML and ZLib, for example). I'm aiming to continue down this road, taking a look at the Java-integration and see if that can be made faster by porting it to Ruby. I have also taken a sharp look at different parts of the parser (and especially the lexer), to see if we can speed that part of the process up. As of now, there are probably two ways ahead: either switch to lexing bytes instead of chars, or port the lexer into an auto-generated one. I have done a few tests with JFlex and the performance gains seem substantial, but not as good as just lexing on bytes. I usually measure performance pretty ad-hoc. I first find a specific use case that exercises whatever I want to improve performance on, and then make small changes and see what gives. If the test case is Ruby, I prefer to use the benchmark library, otherwise I write a test case in Java with System.currentTimeMillis() or just use the UNIX time-command.
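As a rough illustration of the ad-hoc approach Ola describes, here is a minimal sketch using Ruby's standard benchmark library. The workload (building and sorting an array) and the helper names `workload` and `best_time` are made-up stand-ins, not any of the JRuby team's actual test cases.

```ruby
require 'benchmark'

# Hypothetical workload standing in for "whatever I want to improve":
# build an array of integers and sort it.
def workload(size)
  Array.new(size) { |i| (i * 7919) % size }.sort
end

# Time several runs and keep the fastest one, which reduces noise
# from warm-up and garbage collection.
def best_time(runs, size)
  Array.new(runs) { Benchmark.realtime { workload(size) } }.min
end

elapsed = best_time(5, 100_000)
puts format('best of 5 runs: %.4f s', elapsed)
```

Running the same script before and after a small change, under the same interpreter, gives the kind of "make a change and see what gives" comparison mentioned above.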

Thomas: You have probably seen several entries on Charles's blog about various performance patches + experiments. All in all JRuby keeps speeding up. We know we still have more room to speed things up because we know we are creating too many stacks/lists during execution. Some amount of this data being managed dynamically represents static relationships. Scoping relationships are one example. Another area for exploitation is our method dispatch call chain. Years of ideas have made it longer and more complex than it should be. We also use reflection for dispatch, which adds more time yet. Going over these areas will speed things up quite a bit. All of this is ignoring Charles's compiler work, which is pretty exciting in itself.

Up until the last 4-5 months the focus has mostly been correctness over performance. We found it more important to have a Ruby implementation capable of running real software than some screaming fast interpreter that is incapable of running anything. Had we been the original project authors of JRuby (none of the original authors are currently part of the project) maybe this goal would have been different. In any case we are where we are and we are now focusing more aggressively on performance.

Your answers make me wonder how much of your performance work might be portable back into the C Ruby implementation.

Charles: I truly believe that most of the changes and optimizations we're making could be done to C Ruby as well...but only to a certain extent. Much of what we're changing will alter the way methods are invoked, the way variable scope is managed, and the way we utilize the JVM's systems for object, thread, and IO management. While we're keeping the Ruby side of things as correct as possible (and really trying to continue improving correctness as we go) we're mercilessly refactoring and breaking apart subsystems in the JRuby world. This is a luxury we have...since almost all code that runs on JRuby is part of JRuby proper or wholly isolated from it, we can basically rewrite and redesign much of JRuby's internals without breaking third-party code in hard-to-fix ways. Where Matz and others have emphasized keeping C extensions working and interpreter internals consistent, we have no such restriction. So although these optimizations could certainly be applied to Ruby proper, it would likely be at the cost of backward compatibility.

We also have one other advantage in the Java world that Ruby lacks: Java's extensive refactoring capabilities. In the Java world, if we want to rename a method or extract an interface, we can do so at the touch of a button. Such refactoring techniques have allowed us to pull out whole subsystems and reimplement them without breaking any other code in the process. Most Java IDEs provide support for continuous compilation, and many keep a dynamic in-memory model of all project code. Naysayers call the large size and complexity of an Eclipse or a NetBeans bloat, but they fail to realize the twisted thrill to be had from renaming a method with thousands of invocations or migrating an entire subsystem from direct calls to a factory/interface pattern. None of this is easily possible in the C world. Ruby doesn't have an enormous codebase — it's certainly not more than one or two people could understand completely — but the level of footwork required for even simple refactorings is prohibitive. When you combine this with the fact that many or most C extensions call core (potentially internal) Ruby APIs directly, you can start to understand why Matz intends for Ruby 2.0 to break as much as possible. It's simply much more difficult to evolve a C-based project over time, especially one as easy to extend and with as little isolation as Ruby.

(Tom and Ola...chime in if I'm saying anything that's incorrect)

Thomas: I suspect that the C team could use some stuff we are doing. I am not sure I would spend that time right now. Once we are closer to 1.0 status I think our internals will be quite a bit simpler and this may be a good time to look at how we are doing things. At a minimum, it may generate a fresh set of ideas for them. The sharing of ideas is always good for creating a better piece of software.

On the other hand we delegate as much as we can to the JVM. So, for example, we do not need to worry about garbage collection at all. All the memory management stuff C programmers need to manage is no work for us. Our designs may take features like this for granted and by extension may not be easily ported to C.

Ola: Most of the changes we are doing to the implementation make better use of Java's strong points. This means that changes from our side won't necessarily be good from an MRI [Matz's Ruby Interpreter] point of view. I do hope that the YARV people will pay attention when we start looking at compilation in earnest. The most important point for all implementations will actually be our drive to get some kind of functional spec for Ruby. For example, the Marshalling code could probably be cleaned up if we just had a spec of it. And we aim to create one, if we can.

Okay Charles, since you brought up refactoring tools: you and Thomas are supposed to be looking at programmers' tools (which most people read as NetBeans). What do you think has been holding back refactoring tools for Ruby? More importantly, what can/will you be doing about it? (Ola, I'd love to hear your thoughts on this too.)

Charles: Refactoring a dynamic-typed language is inherently hard for reasons that are pretty easy to understand. If you can't look at a piece of code and know exactly what type a given variable is, an IDE won't be able to either. Even if you can do so with some difficulty, IDEs are out of luck. This makes the most interesting and useful refactorings monumentally more difficult to implement. If you can't know what type a variable is or what type of object it contains, you can't tell until runtime what operations it actually supports. Ruby's fluid typing and support for metaprogramming complicate things further: sometimes even the list of operations on a known type will change during a given run. The more dynamic and fluid the type system, the more difficult it is to write ahead-of-time refactoring and analysis tools. There's research going on and creative potential solutions in the making, but there's still a long way to go.

I like to think of dynlang refactoring as being in the same class of problems as AI for the game of Go. In Go, the spatial relationships on the board and the broad array of possibilities make long-term strategy much more difficult. For a computer, this causes no end of problems. In general, the best game AIs are the ones that can brute-force a problem by trying more combinations than a human. In Go, where there's such variation, that method quickly becomes intractable. Where humans can infer future results from the current position and get a "feel" for where a game is progressing, computers can only make use of hard, tangible information about the game. When we develop in dynamic languages, we're much better than the computer at feeling our way through the code, inferring types based on parameter names or methods we've seen called, or reading backwards to get a more concrete feel for incoming object types. Perhaps it's why dynlang programming starts to feel so right...it's putting trust and control back in our hands.

You might say that dyntyped languages trust the programmer to do the right thing where static typed languages force the programmer to do the right thing. That trust opens up a world of possibilities, but it also takes some responsibility out of the computer's hands — namely, responsibility for exactly that information that's needed to refactor.
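Charles's remark that even the list of operations on a known type can change during a run can be seen in a few lines of plain Ruby. This is only a sketch; the `Report` class and its method names are hypothetical and chosen purely for illustration.

```ruby
# Hypothetical class used only for illustration.
class Report
  def title
    'Quarterly numbers'
  end
end

report = Report.new
before = report.respond_to?(:summary)   # false: no such method yet

# Later, possibly in another file or a plugin, the class is reopened
# and a method is generated from data known only at runtime.
field = %w[summary footer].first
Report.class_eval do
  define_method(field) { "#{title} (#{field})" }
end

after = report.respond_to?(:summary)    # true: same object, new operation

puts before
puts after
puts report.summary
```

A tool reading only the first class definition would conclude that `Report` has no `summary` method, which is exactly the information an ahead-of-time refactoring would need to get right.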

Thomas: I have not had much personal experience with the refactoring issues of Ruby so my thoughts should be taken with a grain of salt. For a more weighty discussion I recommend talking to Jason Morrison or Mirko Stocker. They both have been working towards type-inference/refactoring.

One philosophical barrier I see is that computer scientists like 100% solutions. Refactoring in Ruby is probably always going to be a 90% solution. Handling its open definitions, detecting callers, etc. will only be accurate if you are actually running the code, which is something you really don't want your IDE/editor to do. I think being only a 90% problem makes it a less desirable or more frustrating problem to tackle.

We have worked well with the RDT team in the past and we hope to continue in that vein. In fact, we really want to help support all IDE/Tools makers in any way that makes sense. Originally, for RDT we added better positioning information in our AST, so that constructs could be highlighted/folded. Next we plan on merging the refactoring changes that Mirko, Thomas, and Lukas did during this summer.

Ola: Refactoring is a great subject, and I really believe Ruby could do incredibly much. I believe this since Ruby isn't really more dynamic than Smalltalk, and the best Refactoring tools have traditionally been developed for Smalltalk. The Refactoring browser was a great tool that I would really like to see someone port to Ruby. The big difference between Smalltalk and Ruby lies in the fact that Ruby code is file-based, while Smalltalk always has been memory-based. This is a caveat, but not a big one. If I had the time I would love to write an implementation of it for Ruby. (And of course, this would be for Emacs in the first place. =)

Charles: Ola's point about Smalltalk has a lot of weight as well. Smalltalk's refactoring browsers are fully-capable refactoring environments, and they did refactoring before it was even fashionable to do so. However they line up better with Ruby's interactive shell IRB. IRB is an essential tool for any Ruby programmer in the same way that Smalltalk's browser was an essential tool for Smalltalkers. IRB provides a "live" environment where you can query objects, call methods, and inspect types. This could certainly be extended out to an IDE scale, but there's a new class of problems when you go that direction. You need to ensure that you're not causing side-effects as you traverse code. You need to know when metaprogramming bits are coming into play. Perhaps most importantly, you need to be able to roundtrip an in-memory picture of the system back to code. These features are all absent from Ruby and IRB today, so they represent challenges to building a full-featured refactoring Ruby browser. Even then, Ruby's ability to rewrite itself through a wide variety of eval calls makes refactoring with 100% confidence nearly impossible. I believe that some combination of static analysis and online refactoring will be necessary to sufficiently solve the refactoring question.
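To make the point about eval and 100% confidence concrete, here is a small sketch of two call sites that a purely static "rename method" tool would likely miss. The `Exporter` class and the `export` helper are hypothetical names invented for this example.

```ruby
# Hypothetical class used only for illustration.
class Exporter
  def export_csv
    'a,b,c'
  end

  def export_json
    '["a","b","c"]'
  end
end

# The method name is assembled at runtime, so the text "export_csv"
# never appears at this call site.
def export(exporter, kind)
  exporter.public_send("export_#{kind}")
end

# Code evaluated from a string is just a string literal to a tool
# that only walks the syntax tree.
via_eval = eval('Exporter.new.export_json')

puts export(Exporter.new, :csv)
puts via_eval
```

Renaming `export_csv` or `export_json` in the class body would leave both of these callers broken, and the failure would only show up when the code actually runs.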

Are you guys looking at any tools (like FlawFinder) to verify code correctness? MRI is part of the Coverity open source scanning (although it doesn't seem to be gaining much traction among developers). Are there other kinds of automated testing, verification, whatever from the Java world that you're using to make JRuby better, faster, stronger?

Ola: Well, this isn't really my domain, but I do know that we have a large amount of test-cases written with a mix of miniunit for Ruby and JUnit for Java. We also use Emma to check test coverage.

Charles: I don't really know what other tools out there might help us, so we're open to suggestions. We've got a lot of known issues to clean up in the code before we start running tools, however; we know there's a lot of cruft and we're actively scrubbing right now.

Thomas: We also want to set up CruiseControl. Right now CruiseControl has some issues running through a web proxy (or something like that) that we need to work through. After that is settled we will have a nice generated report and a set of tests to help perform various tasks.

Has Sun's hire of Charles and Thomas (and the publicity around it) increased interest in (and contributions to) JRuby around the 'Net?

Thomas: The hiring sparked a lot of blogging around the net. Some of this translated to new people poking their head on the mailing list. I would speculate that this announcement also helped JRuby/Ruby gain more visibility in the larger Java community. I think most Java developers have heard of Ruby and especially Ruby on Rails. This announcement may influence them to do more than wonder what the big deal is.

I also hope that beyond JRuby/Ruby interest itself, the announcement has also sparked interest in additional languages on the JVM in general. I think Sun hiring us is a strong message that Sun has an interest in fostering additional languages on the JVM. Java is not a one size fits all language and other languages exist that can solve some problems more elegantly than Java.

Charles: I've read just about every blog post that's floated by since the hiring, and they're almost all overwhelmingly positive. Many people commented that they didn't know JRuby existed or had thought it wasn't under development anymore, but that they were very excited to see a major corporation throw some weight behind Ruby. The impact of the hiring was felt throughout the relevant parts of the blogosphere.

Two weeks later I delivered a presentation at RailsConf Europe called "JRuby on Rails", and by then just about everyone knew about the hiring and offered congratulations and best wishes for the future of JRuby. Almost all initial concerns folks had have been put to rest: we're not being shuffled off to NetBeans or other work immediately, we're not going to create JRuby 1.0 Special Edition 4 Release 8 that's incompatible with MRI, and we're not planning to push Sun toward Ruby to the exclusion of all other dynlangs. People are now getting past the little initial FUD and really seeing the vast potential of both Ruby and Rails running in the JVM alongside existing Java code. Contributions have started to tick upwards as we funnel people toward the mailing lists and bug repository, and I expect them to continue climbing as more folks get involved.

Are you seeing interest from internal Sun resources?

Charles: Folks from Sun have been extremely interested in helping, to such a point that we've had to say "we'll get back to you" in a number of cases. Individual contributors keep coming out of the woodwork and most of the major Java engineering teams have stepped up to offer whatever help they can. In a sense we just got lucky; Tom and I are co-leads of "just another Java interpreter" but that interpreter happens to be for the most buzz-worthy language today at exactly the time Sun has decided to make good on their "multilingual JVM" promises. Many folks internal to Sun have known for years that dynlangs are the wave of the future, and there's a lot of excitement internally about the future of Ruby and other languages on the JVM. Now those folks are finally seeing serious progress on not-invented-here language support, and everyone's got ideas on how to put Ruby to work. I'm very pleased at the response; I expected some foot-dragging and sour grapes, but the level of internal excitement has exceeded my expectations.

Thomas: Even at JavaOne, this spring, we had several Sun employees approach us to say how much they like Ruby. Since getting hired we have talked to even more Sun people who like Ruby. So internally at Sun, there is plenty of interest in JRuby/Ruby. We even have offers of help. I think this help will be a huge positive over time versus going it alone (i.e., not being hired by Sun).

How much effort have you put in to making JRuby work with other Java implementations (IBM's, Kaffe (is this one even still viable), GCJ, etc.)? How will Sun's involvement help or hinder this?

Ola: Well, we try to avoid using Sun-specific libraries, and the code is straight Java 1.4, so in theory all 1.4-compliant JREs should be able to run it. From what I've heard, GCJ seems to work fine but I haven't tried it.

Charles: We've stuck to Java 1.4 and written plain old Java code for all our changes. JRuby was originally written in 2001, so there's some old-time lineage there as well. Others have tried it out under GCJ and had no major issues, and Tom and I now use Java 5 almost exclusively for development. The ongoing FUD about Java code not being forward-compatible, really cross-platform, or portable to non-Sun JVMs is really a bunch of nonsense. In the ten years I've worked with Java, it's been trivial to update to a new release or to run across all platforms and implementations. Because we don't do anything to specifically tie JRuby to one platform, it should work fine almost anywhere. That said, we don't go out of our way to test it under every Java VM under the sun, so we're looking to the community for that.

What plans do you have for JRuby over the next six months?

Thomas: It is difficult to know how much progress we will make, but I will make some guesses. We will be running Rails decently soon. Our Marshalling is subpar right now and we still have some smaller bugs. On top of just running Rails we want to help add value to Rails in a J2EE environment. This could mean running many Rails instances and sharing an app server's datasource to the database. There are many interesting options.

Java integration is another area that will see some changes. For example, today you cannot extend a Java class from within Ruby and have Java see the methods that are overridden in Ruby. Kresten Krab Thorup has a patch that will help solve this missing feature in JRuby. Recently, we started discussing the semantic gap we have in regards to Java interfaces. Currently, we can only implement a single interface. We plan on changing the syntax to support implementing multiple interfaces.

People have been asking for more work on the AST so projects like RDT can expand their efforts.

Performance of course!

Charles: It's hard to say how full-time work will affect schedules, but I think it's safe to say you'll see a JRuby 1.0 in the next six months, although we haven't started defining what "1.0" should actually be. We're probably almost there on correctness but we want performance to be greatly improved. We want to claim "full support" for Rails and we want solid integration with Java EE and other frameworks. There may also be a renewed investigation of Ruby grammar and parser work, both to mitigate our own parser woes and to provide a better understanding of Ruby syntax for community projects at large. I'll guarantee at least one major milestone before Javapolis in December, but it's hard to say what milestone it will be. By JavaOne next year we should have JRuby in really good shape.

Ola: Well, Charles and Tom are in charge, of course. But I have my specific needs and wants. I see that right now is the time to start getting JRuby-libraries and applications working. I see Mongrel as a big milestone and I'm also planning on taking charge of the Rant/Jake-ideas. AR-JDBC needs some love too, but that isn't as desperate right now. In 6 months, I would like to see JRuby 1.0, with 30-40% better performance compared to 0.9.0, complete YAML as Java extension, usable Rant-tasks for Java development, Mongrel working "fast enough", and the ability to work with Rails in development mode without too much pain. I would also like the start-up time to be about 100% faster.
