Sunday, May 31, 2009

Code Reading follow-up

I'm glad to see that my Questions Five Ways posts are starting to draw comments and that the discussions are starting to get some legs of their own. Part of a comment on the Code Reading post really stood out to me:

"Your post also prompted me to think about having group code reading sessions, where we can pick up a small open source project or a part of a larger project and then try to understand it in a group. Perhaps like a 'code reading dojo'." -Parag Shah

I've been thinking about something similar. Over on a post about code reading on the Ruby Best Practices blog, I wrote:

"You know what would be fun? A 'code of the week' reading club. I'd love to have a formal excuse to dive into code, learn what others are seeing in it, and get exposed to a wider range of code since I wouldn't be the only one deciding what to read."

So, what do you think? Would code reading be better/more fun/more useful as a group effort? Would you rather participate in person, or on a mailing list/blog?

Saturday, May 30, 2009

Books, Bits, or both?

"Heh, I've been working thru the PDF for a while. Now that the book is here, I'm digging a little deeper."

Jim Weirich

This is a pattern I've seen in myself too. I love ebooks because I can carry a bunch of them around on my laptop, search them easily, and grab the occasional snippet from them.

On the other hand, I find myself not reading them as seriously as 'real' books. There's something about turning pages that keeps me involved. I'm just glad that most publishers are selling paper and ebook combos.

What do you think? Paper, bits, or both?


This post is part of a collection of articles about Publishing, Growing Markets, and Books.

Wednesday, May 27, 2009

Questions Five Ways - Static Code Analysis and Testing

This week in Questions Five Ways I've assembled a group of testers and tool builders to talk about code analysis and code testing. Kevin Rutherford (@kevinrutherford) is the co-author of the upcoming Ruby Refactoring Workbook and the creator of reek. Andy Lester (@petdance) is a longtime proponent of testing in the Perl community. Kay Johansen (@utahkay) is an agile testing guru. Russ Olsen (@russolsen) is the author of Design Patterns in Ruby. I hope you enjoy the discussion, but please take a moment to add your thoughts in the comments.


What is the right interplay between testing code (unit test and the like) and static code analysis (lint, Reek, and their ilk)?

Kevin Tools such as Reek[1], lint, and flay are intended to be helpers for the "refactor" step of TDD. Not everyone is good at finding the smells in their code, and I certainly find that I detect fewer smells in code that I'm familiar with; so having a tool to help my sense of smell can be quite beneficial.

But there are a couple of clear problems with using tools here, and so I'll start off our discussion by trying to phrase them clearly enough for group discussion...

Firstly, tools such as Reek attempt to make a subjective concept (code maintainability) into an objective one. In any codebase there will be code smells we're happy to tolerate, at least in the short term, and yet the tool will continue to nag us about them. (This is why Reek provides a config mechanism that allows certain code elements to be ignored.) On the plus side, Reek sees smells that I miss, and fixing them usually improves my code. But on the minus side, Reek can be more pedantic than I would be, and sometimes I have to spend time doing refactoring just to keep Reek happy. So, is it worth the effort to keep code quality tools quiet?

Andy Not everyone is familiar with these tools. In the Perl world we use perlcritic. Reek and perlcritic are very similar, and yet so very different. Here's what perlcritic tells me about some code (PBP stands for Perl Best Practices):


alester:~/ack : make critic
perlcritic -1 -q -profile perlcriticrc ack-base Ack.pm Repository.pm
Ack.pm: Use character classes for literal metachars instead of escapes at line 919, column 15.  See page 247 of PBP.  (Severity: 1)
Ack.pm: Null statement (stray semicolon) at line 328, column 14.  (no explanation).  (Severity: 3)
Ack.pm: Subroutine with high complexity score (31) at line 183, column 1.  Consider refactoring.  (Severity: 3)
Repository.pm: Subroutine does not end with "return" at line 21, column 1.  See page 197 of PBP.  (Severity: 4)
Repository.pm: Ambiguously named subroutine "close" at line 45, column 1.  See page 48 of PBP.  (Severity: 3)
Repository.pm: Subroutine name is a homonym for builtin function at line 45, column 1.  See page 177 of PBP.  (Severity: 4)

Kevin Good point Andy. Reek's output looks like this:


"samples/optparse.rb" -- 116 warnings:
OptionParser has at least 59 methods (Large Class)
OptionParser#CompletingHash#match/block/block is nested (Nested Iterators)
OptionParser#Completion::complete calls candidates.size multiple times (Duplication)
OptionParser#Completion::complete calls k.id2name multiple times (Duplication)
OptionParser#Completion::complete has approx 23 statements (Long Method)
OptionParser#Completion::complete has the variable name 'k' (Uncommunicative Name)
OptionParser#Completion::complete has the variable name 'v' (Uncommunicative Name)
OptionParser#Completion::complete refers to candidates more than self (Feature Envy)
OptionParser#Switch#RequiredArgument#parse is controlled by argument arg (Control Couple)
OptionParser#Switch#initialize has 7 parameters (Long Parameter List)
...

Pat Let's go back to Kevin's question; it's a tough one. I like seeing all of the warnings from a tool like Reek. It makes me question how or why I've written something. On the other hand, having seen it once and having made a decision about it, continuing to see it becomes annoying. I hadn't known about configuring it to ignore code elements, thanks for pointing it out. Maybe it would be less annoying if tools like this kept some metadata and separated new warnings from the ones you've seen before.
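Pat's idea of separating new warnings from ones already reviewed can be sketched in a few lines of Ruby. This is only an illustration, not a feature of Reek or any other tool: the helper name `partition_warnings` and the idea of keeping acknowledged warnings as a plain set of strings are assumptions, and the sample lines are copied from the Reek output above.

```ruby
require "set"

# Hypothetical helper (not part of Reek): split raw warning lines into
# those already reviewed and those not seen before.
# `acknowledged` is a Set of warning lines someone has decided to tolerate.
def partition_warnings(report_lines, acknowledged)
  cleaned = report_lines.map(&:strip).reject(&:empty?)
  seen, fresh = cleaned.partition { |line| acknowledged.include?(line) }
  { seen: seen, fresh: fresh }
end

# Assumed baseline: one warning that was reviewed and accepted earlier.
acknowledged = Set[
  "OptionParser has at least 59 methods (Large Class)"
]

report = [
  "OptionParser has at least 59 methods (Large Class)",
  "OptionParser#Switch#initialize has 7 parameters (Long Parameter List)"
]

result = partition_warnings(report, acknowledged)
puts "New warnings:"
result[:fresh].each { |warning| puts "  #{warning}" }
puts "Previously reviewed: #{result[:seen].size}"
```

Matching on the exact warning text is the simplest possible choice; it would treat a warning as new whenever its wording changes, for example when a method count moves from 59 to 60.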

Andy It can be harder to manage the results of static analysis, because they're rarely binary, unlike unit tests which are pass/fail. With static analysis results, I like to track trends over time, since I rarely ever have a clean run of Perl::Critic or splint.
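One way to picture tracking trends rather than chasing a clean run is to tally warnings by category on each run and compare the tallies. The Ruby below is a rough sketch under assumptions: `tally_by_label` and `trend` are made-up helper names, the category is taken to be the text in the trailing parentheses (as in the Reek sample above), and the "this run" data is invented for illustration.

```ruby
# Hypothetical helper: count warnings by the label in trailing parentheses,
# e.g. "... (Long Method)" counts toward "Long Method".
def tally_by_label(report_lines)
  report_lines.each_with_object(Hash.new(0)) do |line, counts|
    label = line[/\(([^()]+)\)\s*\z/, 1]
    counts[label] += 1 if label
  end
end

# Change in count per label between two runs; negative means fewer warnings.
def trend(previous, current)
  labels = (previous.keys | current.keys).sort
  labels.map { |label| [label, current.fetch(label, 0) - previous.fetch(label, 0)] }.to_h
end

last_run = [
  "OptionParser#Completion::complete calls candidates.size multiple times (Duplication)",
  "OptionParser#Completion::complete calls k.id2name multiple times (Duplication)",
  "OptionParser#Completion::complete has approx 23 statements (Long Method)"
]

# Invented follow-up run: one duplication warning has gone away.
this_run = [
  "OptionParser#Completion::complete calls candidates.size multiple times (Duplication)",
  "OptionParser#Completion::complete has approx 23 statements (Long Method)"
]

changes = trend(tally_by_label(last_run), tally_by_label(this_run))
changes.each { |label, delta| puts format("%-12s %+d", label, delta) }
```

Applied to the perlcritic output shown earlier, the same pattern would group by strings such as "Severity: 3", which may or may not be the grouping you want.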

Static analysis tools are also hard to manage because their results can often lead to arguments, if only with yourself! Which automated exhortations do we need to follow, and which can we ignore? Is it OK to have this long loop in just this one case? And when you decide to ignore a setting, do you annotate the source code? Source code annotations are subject to the same bit rot as source code itself.

All that said, static analysis is very different with C, where lint, splint and gcc warnings may tell you about actual bugs where memory may get corrupted, not just stylistic improvements.

Russ The thing that you have to keep in mind is that the code analysis tools are optional helpers: They look into your code and tell you things that might be wrong. They might help, they might not and you can take them or leave them. Tests, on the other hand, are life and death: If you don't have tests or good tests or if your code isn't passing the tests that you do have, then you have no idea whether your system works or not. This is true no matter if you are using Java or C#, but it is particularly true when you are working with dynamically typed languages like Ruby.

Having said that, I think that one good thing that static analysis of code can do that is a real help is to enforce a uniform style on your code, so that everyone is coding more or less the same way. In the long term that has real value.

There is also an organizational aspect to this issue. I've worked for a lot of big, bureaucratic organizations, and the mindset that develops in those kinds of situations is that anything that is not required should be forbidden. So you tend to get mandates: Your code shall have test coverage; You shall run this or that code analysis tool. Now it's hard to argue with requirements for test coverage, but mandating the use of this or that secondary tool almost always becomes counterproductive. Of course, the fault here lies not in our tools but in ourselves - or at least in our managers - but it is a factor anyway.

Andy Static analysis is the fun part that lets you refactor. It's the testing code that lets you refactor with confidence.

Kay I find that code written "test-driven" comes out pretty clean. However, I don't always live up to my discipline goals, and of course there's always legacy code to deal with. So I find static analysis tools very helpful in identifying "cleanup" areas.

Kevin My second problem concerns process. I want to remember to run Reek frequently, because whenever I let it slide for a while I do find that my code deteriorates. And so I create a test that runs Reek and fails if there are any smells. But now the existence of this test pushes Reek into the 'Green' step of TDD, and that breaks the TDD micro-process. Using the tool within a test pushes refactoring upstream; no longer can I "do the simplest thing" to get to green, because there's a failing test requiring me to refactor right now. What to do?
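The kind of check Kevin describes, one that fails as soon as any smell is reported, might look roughly like the Ruby below. It is a sketch under assumptions: `check_smells!` and `SmellsFound` are invented names, the report is passed in as text rather than obtained by running a tool, and a warning is taken to be any line ending in a parenthesised label, as in the Reek sample earlier.

```ruby
# Raised when the report contains at least one warning line.
class SmellsFound < StandardError; end

# Hypothetical gate: `report` is the text a smell checker printed.
# Any line ending in a parenthesised label is treated as a warning.
def check_smells!(report)
  warnings = report.lines.map(&:strip).select { |line| line.match?(/\([^()]+\)\z/) }
  raise SmellsFound, "#{warnings.size} smell(s) reported" unless warnings.empty?
  true
end

clean_report  = ""
smelly_report = "OptionParser#Switch#initialize has 7 parameters (Long Parameter List)\n"

puts check_smells!(clean_report)
begin
  check_smells!(smelly_report)
rescue SmellsFound => e
  puts "gate failed: #{e.message}"
end
```

Because the gate is all-or-nothing, a single new smell turns the whole suite red, which is exactly the pressure on the "simplest thing" step that Kevin is asking about.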

Pat It almost seems like we need a different tool here. Testing belongs on the Red/Green border, while analysis belongs on the Green/Refactor border. (Maybe we can talk about Cucumber[2] and similar tools being the Refactor/Red border in another post.) What would happen if there were an umbrella tool that kept us in the specify-test-analyze groove?

Andy What do you mean when you talk about the Red/Green border?

Pat Well, when I think of the Red/Green/Refactor cycle, I see testing as the process that helps us move from "Red" to "Green" by letting us know what code to write and that it makes our existing tests pass. Cucumber is similar in that it lets us know which tests to write so that we can move from refactoring to a red state before we start writing new code.

Kay I prefer to run the static analysis tools on the continuous integration server, freeing my coding time up to focus on the test-code-refactor microcycle. Although to keep me on track, I'll often run a code coverage on my new code before I check in.

I think consistent style and formatting is important too. I do a lot of Java programming, where IntelliJ does a good job of keeping formatting consistent across the project.

1 You can read more about reek in my interview with Kevin.

2 Ben Mabey gave a great presentation on cucumber at the 2009 MountainWest RubyConf.


Thursday, May 21, 2009

Help the One-Click Installer.

Luis Lavena has been doing a huge amount of pretty thankless work on the One-Click Ruby Installer for Windows. But now he's stuck.

Luis is a hacker and, by his own admission, not a designer. The project needs a new home and some new web features. (You can get the details at Luis' call for help.)

If you'd like to help make a difference in the project, go make a donation at pledgie.


Wednesday, May 20, 2009

Questions Five Ways - Code Reading



Welcome, InformIT readers! You might also like to see my review of Eloquent Ruby. Enjoy!

When I came up with the idea of Questions Five Ways, I hoped to create some discussions of lasting value to programmers, posts that would stimulate some meaty discussion. I think this post has fulfilled my vision about as well as anything I could expect.
I owe a big 'thank you' to the participants in this discussion: Steve Yegge, James Gray (@JEG2), Rick DeNatale (@RickDeNatale), Diomidis Spinellis (@DSpinellis), and Don Stewart (@donsbot).
As good as the discussion below is though, it's merely a jumping off point. What do you think about code reading? What code have you read (or are you reading) to become a better programmer? Update: Gregory Brown has posted a great article on code reading.

Several years ago, I worked for someone who would occasionally hand me print outs of part of the company's code base. "Here, read this," he'd say. "I think you'll really learn something from it." Now, I wish I'd taken more time to sit down with him to really learn the art of code reading. Why is code reading so important to a programmer, and how can we become better at it?
Rick Well, I think that the art/craft of programming is much like the art/craft of writing prose or poetry. Just as an author gets better by reading the literature, so does a programmer.
And I think that it's important to read a wide variety of "the literature" not only in your own main programming language but in others as well. I've been a bit of a language polyglot for many years, back to my days in college when I devoured any new language I could get my hands on and play with, from Fortran, APL, PL/I, Lisp, even a little bit of COBOL ... whether it was part of a formal course or not. I've kept this attitude for decades now, through C, Pascal, Smalltalk, Java, and Ruby, and probably other languages I'm failing to remember right now.
So by looking at such a wide range of programs and languages, I've learned a lot. Developed a sense of good style vs. bad style, as well as various approaches to the problems you encounter in your projects.
The trick is to develop a sense of how the features of a language interact and that something you saw in one language or program might or might not be a good idea in another, or at least not without some adaptation to a different environment.
James To me, one of the simplest possible reasons is that we are required, by our jobs, to read code. Maybe it's when we are pair programming, maybe it's a library we're bug hunting inside of, or maybe it's our own code after enough time has passed that it too looks foreign, but we are required to read code. Given that, it only makes sense that we can practice and improve that skill like any other.
Rick I think that reading code while pair programming is not quite the same thing. To me it feels more like an extension of what I do when I'm writing code, not reading someone else's code. Of course you read what you're writing, but that feels different to me, and when I'm pairing, it feels more like that even if I'm not "driving." But maybe that's just me.
James You're right and that probably wasn't a very good example on my part. Pair programming has some unique skills, like when is it or is it not a good idea to speak up. That's a different conversation though.
Diomidis Given that in our work as developers we spend most of our time and effort in maintaining existing code, code reading is the activity that will occupy the majority of our working day. And this is before we take into account that we also read code (our own or that of our teammates) as we gradually build a project from scratch. Finally, we need to read code when we review the work of our colleagues, in real time (as in pair programming) or during a code review to sign off their code.
Rick How do we get better at all this? Well, I'd say it's the old cliche: practice, practice, practice, and learn how best to use your toolset at hand to navigate the code you are looking at. But I'll stop for now on that thought.
James Like most skills, there's no great shortcut for improving your code reading that beats just doing it. It's a little hard to force yourself to get started, but once you get into it you might be surprised by how natural it becomes. One trick I've been using lately is whenever I see a presenter mention a library, I open up the code and start poking around in it. If they show a neat feature, I'll try to find the code that is responsible for that. It's a pretty good way to get a lot of exposure and you can learn a whole lot from it.
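For Ruby libraries, one shortcut for finding the code behind a feature is to ask a method where it was defined. The sketch below uses `Method#source_location`, which is available in Ruby 1.9 and later; `where_is` is a made-up helper name, and `optparse` is used only because it ships with Ruby and appears in the Reek sample above.

```ruby
require "optparse"

# Hypothetical helper: report where an instance method is defined.
# source_location gives [file, line] for methods written in Ruby and
# nil for methods implemented natively (for example, in C).
def where_is(klass, method_name)
  klass.instance_method(method_name).source_location
end

file, line = where_is(OptionParser, :parse!)
puts "OptionParser#parse! is defined at #{file}:#{line}"

# Core methods implemented natively have no Ruby source to point at.
location = where_is(String, :upcase)
puts "String#upcase: #{location.inspect}"
```

From there, opening the reported file at the reported line is a reasonable place to start reading.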
Rick Yes, that's one of the reasons I get upset when I don't have WiFi during conference sessions, like at last year's RubyConf.
Also this reminds me of something I was thinking of saying before. One of the reasons why code-reading is such an important skill is that in a lot of cases it's the only way to understand how to use a library. In this case, I look for the test/spec code, which should be there and should be a valuable asset in understanding how a library is intended to be used.
So in many cases it's RTFC instead of RTFM. <G>
James Absolutely. This is even more true for libraries without documentation or with out-of-date documentation.
Diomidis We can improve our code reading skills in several ways. At the lowest level we need to know our programming language, its coding standards, and its idioms and conventions: the way its features are used together to achieve specific results. We then must have a good enough grasp of higher level building blocks: data structures, abstraction mechanisms, design patterns, architectures. Often, however, code reading is about taking shortcuts, and for this we require a mastery of tools that can aid code reading either on the static code or by interacting with a running program. Finally, nothing beats experience. The more time we spend reading code the better we become at it.
Rick Tools can make it easier. One of the things I really loved about doing Smalltalk was how much the typical Smalltalk IDE aided code reading and comprehension. Smalltalk lets you easily find all of the implementations of a message, or everywhere a message is sent, for example. You can select text in source code and ask for an explanation. All this works at a semantic level because the IDE knows about the relationships between classes, methods, and even instances, since in Smalltalk you run your program in the IDE, and object inspectors and the debugger all work together pretty seamlessly. An old Smalltalker trick was to run a program, and interrupt it to find out where it was executing in the debugger as a way to find out where to begin reading.
I miss all that, but these days I get along in Ruby and TextMate, with friends like the "Ack in project" TextMate extension. It's text-based searching, but most of the time it suffices.
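Ruby's own reflection can approximate the "find all implementations of a message" query in a running program, though nowhere near as smoothly as a Smalltalk browser. The following is only a sketch: `implementors` is an invented helper, and `HtmlReport` and `TextReport` are placeholder classes defined purely for the demonstration.

```ruby
# Hypothetical helper: list every loaded class or module that itself
# defines an instance method with the given name (inherited methods are
# not counted, because instance_methods(false) skips ancestors).
def implementors(method_name)
  ObjectSpace.each_object(Module).select do |mod|
    mod.instance_methods(false).include?(method_name) ||
      mod.private_instance_methods(false).include?(method_name)
  end
end

# Placeholder classes for the demonstration.
class HtmlReport
  def render; "<p>report</p>"; end
end

class TextReport
  def render; "report"; end
end

names = implementors(:render).map(&:to_s).sort
puts names
```

This only sees code that has already been loaded into the process, which is one reason text search across the project is still the everyday tool.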
Diomidis Text-based searching and tools, as exemplified by the Unix tool suite, are extremely important in code reading. They work on any language, and nowadays on most development environments, so you hone the same skills as you jump from one language or platform to the other. The cumulative gain from this experience can be tremendous, especially because there are so many things to learn. There are only a limited number of buttons and menu commands in an IDE you can learn (and re-learn as you change platforms), but the ways you can combine Unix tools to achieve a very specific result are endless.
Rick True enough. Although I don't want to get into an IDE vs. text tools and compilers war <G>, I think that there are things which tools like IDEs can do because they 'understand' both the syntax and semantics of the language, and can as in the case of Smalltalk fold in knowledge of run-time state, that are difficult if not impossible just 'reading' the text. Things like refactoring tools need a level of semantic understanding to be really effective, and Smalltalk showed that such things are possible, even without static types.
That all said, I don't use IDEs much at all lately.
Diomidis An important element that receives scant attention is the fear factor. Developers often look at multi-million line code bases with awe and fear. Yet, it's important to realize that such code is often quite easy to understand and modify, because its creators have explicitly put effort into making it readable[1]. Along the same lines, depending on why we read code (maintenance, evolution, reuse, or inspection) there are specific techniques we can employ to move forward with efficiency toward our goal.
Rick Yes the best developers do that, and hopefully by reading their code and learning to distinguish it from run-of-the-mill code you develop a sense of style and expressiveness in your own code.
And writing understandable code is somewhat influenced by the tools and the language. A lot of Ruby programmers have latched on to Kent Beck's classic "Smalltalk: Best Practice Patterns" as a guide to Ruby style. There's a lot of good stuff there, but some of it needs to be adapted or is inapplicable; some of Kent's suggestions to the Smalltalk programmer are driven by the idea of making the code more understandable using the IDE tools I just went on about.
But that goes back to my first comments about seeking a broad scope of influences and adapting them intelligently to your current needs.
Steve I think Diomidis has pretty much nailed it from my perspective. To put it another way, you get better at reading code by writing a lot of it.
The fear factor has dramatic implications for how a company's code base will unfold. If you can't read someone's code, your first instinct is often that it's crap and needs rewriting. You may waste time rewriting otherwise good code. I did a lot of that, 15 to 20 years ago. Such rewrites are bad for the project, but they're also bad for you -- you're going to learn more by puzzling through exactly how the old code works. You should never conclude that code is crap just because you don't understand it. Even if the code is crap, you may still learn something by reading it carefully, and you'll be able to avoid repeating its mistakes.
Diomidis Absolutely right. There's a nine-year-old essay by Joel Spolsky, describing why it's a big mistake to rewrite a functioning product from scratch (Things You Should Never Do, Part I), and this alone is one of the most important reasons to become good at code reading; it's a skill and an attitude that can make the difference between the sustained growth of a company and its demise.
Rick This reminds me of something from a long time ago in a galaxy far away.
One of the projects I worked on at IBM was a pilot project called Info/Access to let mainframe customers have access to the RETAIN database, which was the IBM internal bug tracking system. IBM used to have a Field Engineering Division made up of IBM employees who were on-site with the customer to fix problems, and folks back home to support them and run RETAIN and other things. The on-site employees who did software support were called program support representatives (PSRs) and spent much of their time reading hex dumps and code (usually on microfiche[2]) so that they could interact with RETAIN to find fixes and file bug reports.
This was a tremendous cost, and IBM had several programs to offload some of this on the customers. While writing the client code to access RETAIN, I worked with an ex-PSR. The sessions with him were painful, because since all he'd looked at was code from a bug-finding perspective, he thought that any design or implementation he'd seen before was going to be horribly flawed, and anything he could dream up on his own, no matter how complicated or unusual would be better.
For the most part the code he had looked at wasn't total crap, it was that he was always dealing with little crappy bits of code.
James I do exactly as Steve said, quite often. I look at a piece of code, think this is crap, rewrite it to remove 10 lines, and am always surprised by what a better grasp the author had on the problem than I did.
That's why I love test driven projects so much. They instantly show all of the assumptions I just violated.
Don Refactoring is interesting.
This is actually one of the reasons I like pure (side effect free) code so much (especially when it has a good type system): I can refactor and be /sure/ I did the right job. Purity gives us a guarantee — in the language — that I can replace definitions with equivalent ones, throughout a program, and be sure things are still correct. This goes a long way to having hackable, fluid code that's a pleasure to hack on.
The "I can't break this even after a few beers" principle. :)
Steve Ultimately you want to get to the point where you can understand any code you look at. It's just like reading natural languages. You don't want to go through life with certain books being inaccessible. You probably won't like them all, but you should be able to read and understand them.
It's much easier to learn to read a programming language than to write in it. You should learn pidgin-X for as many values of X as possible. Pidgin == read it well enough to be able to guess what the code is doing and explain it to others, at least at a high level. If a new language starts to gain popularity (e.g. Scala, Clojure), every programmer should spend enough time reading the docs and tutorials to be able to follow the code and understand the basic facilities offered by the language. This should only take a couple of days at most, even for fairly unusual languages like PostScript, Forth or Prolog. After some practice with this, learning pidgin-X usually only takes an hour or two.
Reading code in an unfamiliar language or style can be a painful experience for us. Nobody likes to be a newbie again. The important thing is to take it slowly, and do the necessary research to ensure that you understand what's going on.
Don I remember as an undergraduate printing off and memorizing sections of the Haskell Prelude (the core library of functions for Haskell). Each function felt a bit like a jigsaw puzzle piece, with the type specifying precisely how it would glue with other pieces. This kind of thing:

  id                      :: a -> a
  id x                    =  x

  (.)       :: (b -> c) -> (a -> b) -> a -> c
  (.) f g x = f (g x)

  const                   :: a -> b -> a
  const x _               =  x

  flip                    :: (a -> b -> c) -> b -> a -> c
  flip f x y              =  f y x
Reading this code was like reading a haiku. Dense, rich, beautiful.
I think our goal as programmers is to become fluent in this language of concepts, to be able to converse directly in it. And I think languages that get at the concepts more directly are closer to that ideal of programmers communicating directly in this computational language. Some languages get at that better than others, and abstraction is a key part of it. I was happy to see a comment on twitter a couple of days ago along the lines of "the #haskell people are just talking in code!"
Diomidis Your reference to the Haskell prelude brings back fond memories; this was indeed beautiful code. Some years earlier (early 1980s), another revelation for me occurred when I looked at the source code of the C library that accompanied the Computer Innovations C86 compiler. That was the first time I realized that one could build a rich (for the standards of those days) library of functions using just the facilities of the C programming language and a few system calls. Similarly, I marveled at how one could write efficient yet structured, readable, and well-documented code when I laid my eyes on the spreadsheet example that came with Borland's Turbo Pascal. Finally, I appreciated how comments and meaningful identifiers could make even the most dense code easy to follow, when I read the listing of the IBM PC BIOS.
I wonder which code today's programmers would use to learn from. With the prevalence of open source software, one problem is that there is so much of it that we're spoiled for choice. For the same reason, I think few appreciate being able to read the code behind a functioning artifact, and therefore don't take the time to read it.
Rick And if they don't they are missing a lot.
One of the great things about open source is that if you have questions, the answers are there for the cost of a bit of code reading. I've been involved lately on ruby-talk in yet another discussion of the subtleties of eigenclass vs. singleton class vs. metaclass. I spent some time today once again reading the C implementation of MRI, and comparing and contrasting what the code says with what has been written in various Ruby books, blogs, and fora and got a better appreciation of how the community arrived where it is on the issue today.
Often when someone asks a question, either directly to me or on an open forum, I find myself scratching my head a bit and going to the code to find the answer. Since I've built up my code reading skills over the years, I can usually find it.
James Rick has a great point here. Code is the language of our shared knowledge. Those who can read it are as helpful as those who could read were in times when literacy was uncommon.

1 Here's another little blurb that didn't quite fit into the discussion, but was worth preserving.
James This is so true.
When I was less experienced, I often wondered, "how can I get this down to one line?" In contrast, yesterday I was hunting around in one of the largest code bases I maintain for areas of too much magic I could unwind into boring code. Incidentally, we did that so our documentation tool could read them correctly.
I think my love of clean code shows my growth more than any other trait. I'm even twice as likely to use a pretty code library I've read, just because I know I can always pop the hood and see how it's doing something or track down a bug if there is one. I guess I'm a code snob now.
2 While it really doesn't fit into the discussion above, I really couldn't resist this little interchange from the mail we exchanged.
Steve Umm.... to be fair, if you find yourself reading hex dumps on microfiche today, then it's probably OK to rewrite it without thinking first. ;)
Rick Well the dumps were printouts, the source code was on fiche. This is back in the days when eliminating goto statements was SOTA. <G>

Tuesday, May 19, 2009

ruby-prof Post Collection

A long time ago, back in 2006 to be exact, I started writing about profiling ruby code with ruby-prof. Since then, these have been some of the most read pages on my blog. To make life a little bit easier on everyone, I've written this page to track all of them. Maybe it will even inspire me to write some updates looking at profiling on Ruby 1.9, and the many improvements and new features in ruby-prof. Here's the list, most recent entries are listed first:

Questions Five Ways - Overview


I think one of the better ideas I've had on this blog is my 'Questions Five Ways' series. For each post, I'll ask a guiding question of five leading hackers, some from the Ruby community and some from outside it. My intent is to build some great resources for everyone who's trying to become a better programmer. If you'd like some more details about the series, you can take a look at the Questions Five Ways Introduction.

As the series grows, I'm going to list all the entries here.

It might also be interesting to list who's taken part in these discussions:

Friday, May 15, 2009

About About.com and Being Nice (or not)

"I used to think of ruby.about.com as just yet another source of dull and uninspired Ruby content designed to churn ad clicks. But that is because I never noticed that they do an evil, Digg-bar style content hijacking on their outbound links. . . ." — Gregory Brown (@seacreature)

Gregory Brown is up in arms about the way about.com handles outgoing links, and he's absolutely right. You can go read about it at About.com Fail. Then you can decide what to do about it. May I recommend a nice email or comment to Amanda Morin at ruby.about.com? If she gets enough feedback, she might be prompted to push for change at about.com.

Thursday, May 14, 2009

RubyNation Mini-Interview: Aaron Bedra

RubyNation (June 11-13 in Reston, VA) is coming up pretty quickly. I'm running a series of mini-interviews with speakers and organizers there to help people get a feel for what RubyNation is going to be like. So far, I've interviewed Hal Fulton, Russ Olsen, and Gray Herter.

This time, I'm talking with Aaron Bedra (@abedra). Aaron's talking about rcov and Ruby 1.9. I think it's going to be an awfully interesting talk.

If you're interested in getting more than this little taste, I'd recommend that you go register soon. I don't think the seats are going to last long.


What makes regional Ruby conferences special?

Aaron A smaller, more intimate setting always creates a unique atmosphere and encourages everyone to get involved. This usually leads to a higher level of interaction and some really fun hacking sessions. This, in my opinion, is the most valuable thing a smaller conference can offer.

Other than your own talk, what are you most interested in seeing at RubyNation this year?

Aaron I am very interested in attending David Black's Ruby 1.9 talk as well as finding out what's coming in Rails 3 from Yehuda.

Rcov is an awesome tool for Ruby, how did you get involved with it?

Aaron I got involved with Rcov last summer when Chad Humphries (@spicycode) pulled the project into Github to work on some fixes related to the recently released Rails 2.1. After getting involved with the project, it just seemed natural to keep helping and pushing the project towards compatibility with Ruby 1.9.

What other tools should people be looking at for improving their Ruby 1.8 and 1.9 code?

Aaron The most important practice anyone can use is peer review. If you don't have a person to sit with you, make sure you put your code up on Github and let others give you feedback. As far as actual development tools are concerned I like to include Flog in my applications, as well as Safe-ERB and Tarantula in my Rails applications.


Wednesday, May 13, 2009

Questions Five Ways - Concurrency

It's time for the first of my Questions Five Ways posts. This time I approached five programmers that do a lot of work with concurrency. Three of them responded: Tony Arcieri (@bascule), Venkat Subramaniam (@venkat_s), and MenTaLguY (@mentalguy). Here are the responses they came up with.

Please help continue this discussion by sharing your thoughts in the comments below.


Which 2-3 languages/approaches should a programmer be studying to move toward a more concurrent future?

Tony Well, besides Reia... :)

1) Erlang: Obviously I believe in the actor model quite a bit, which is the basis of Erlang's approach to concurrency. You can diagram a concurrent system on a whiteboard in terms of different components which talk to each other with messages, and pretty much translate that diagram directly into code. Concurrency is localized to actors, which makes it a lot easier to reason about. The Erlang VM has some issues with scalability on massively multicore systems now but they're being resolved, and the Erlang model makes optimizing concurrency in the VM incredibly easy. All that said, far and away the thing that makes Erlang so incredibly cool is the nearly seamless distribution across multiple systems, which is something Erlang can do far better than any other language in existence. Also, Erlang makes handling faults in concurrent systems comparatively easy, and when it comes to concurrent systems simplified fault handling is invaluable.

2) Haskell: In some regards Haskell is farther along than Erlang when it comes to concurrency, and provides multiple different models for different types of concurrency, whereas Erlang pretty much forces you to use one (the actor model/shared nothing concurrency). However, I prefer Erlang to Haskell as I'm not a fan of pure functional programming/monads and think Erlang's dirty imperative features like its approach to I/O actually make writing programs simpler and more practical. All that said, at Erlang Factory Ulf Wiger pointed out how horribly Erlang performs on shared state concurrency problems like chameneos-redux, which Haskell does great on. I appreciate Haskell providing multiple approaches to modeling concurrency in the same environment.

3) Clojure: uses Software Transactional Memory (STM) for modeling concurrency, which has been around in other places (like Haskell) before but Clojure does some neat stuff as far as simple Lispy syntax for marking sections of the code atomic and also permitting mutable state within the atomic sections. This is great for shared state concurrency problems, but most concurrent problems don't require shared state, and I think reasoning about STM systems is more difficult than actor-based systems because there's no logical mapping between what the system is doing concurrently and how the code is structured. It's sort of like throwing a bunch of queries at a database, and when things start going unexpectedly slow, or break, you're kind of left to wonder what's going on.
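To make the "components which talk to each other with messages" idea a bit more concrete, here is a minimal sketch in Ruby rather than Erlang. It assumes a thread plus a `Queue` is a fair stand-in for an actor and its mailbox; the `CounterActor` class and its message names are made up for illustration, and none of this reflects how Erlang or Reia implement actors.

```ruby
require 'thread'

# A toy "actor": one thread that owns its state and reacts to messages
# taken from a mailbox. Hypothetical names, for illustration only.
class CounterActor
  def initialize
    @mailbox = Queue.new            # thread-safe FIFO queue
    @thread  = Thread.new { run }
  end

  # Fire-and-forget message.
  def increment(by = 1)
    @mailbox << [:increment, by]
  end

  # Request/reply: the caller supplies a queue to receive the answer on.
  def value
    reply = Queue.new
    @mailbox << [:value, reply]
    reply.pop
  end

  # Ask the actor to finish, then wait for its thread to exit.
  def stop
    @mailbox << [:stop]
    @thread.join
  end

  private

  def run
    count = 0                       # state lives only inside the actor's thread
    loop do
      message, arg = @mailbox.pop   # blocks until a message arrives
      case message
      when :increment then count += arg
      when :value     then arg << count
      when :stop      then break
      end
    end
  end
end

counter = CounterActor.new
senders = 4.times.map { Thread.new { 100.times { counter.increment } } }
senders.each(&:join)
puts counter.value
counter.stop
```

Because only the actor's own thread ever touches `count`, the senders never need a lock; the mailbox is the only shared object.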

Venkat Studying a language is not about learning the syntax, but about learning the idioms and beginning to think along the lines of designing applications using them. From a practical point of view, I don't think you, a busy everyday programmer, have the time and energy to study multiple languages at the same time. So, I don't recommend studying 2 to 3 languages.

At any given time, as a professional programmer you should be studying a language.

On one hand, the language you pick must be quite different from the one you are using extensively. In addition to exercising your mental muscle, it helps you not get boxed into the paradigms and idioms promoted by one particular language you're used to.

On the other hand, if you intend to put some of the studying to real use relatively soon, it will help for that language you learn to integrate well with the language or the platform you're working with. However, if you learn continuously, you'll find it easier and quicker to pick a language that has the features you desire and integrates well with your current platform or language. So, focus on one language at a time.

From the point of view of concurrency, rather than learning a language specifically, I suggest learning the different approaches. Rather than assuming a particular approach is the right one (or the wrong one), learn the pros and cons of each. Understand what problem they solve, where they would be useful, and what their limitations are. Don't restrict your learning to the approaches supported by one language, your favorite language, or what's currently creating the buzz. Sometimes what's old becomes new again. Learn about the shared state vs. message passing, Communicating Sequential Processes, the Actor based model, Nested Data Parallelism, Software Transactional Memory,...

MenTaLguY Erlang and Haskell, but I'd like to offer a different sort of justification from those which are usually offered. The thing is, both of these languages sort of force you to take the functional programming beast by the horns. It isn't actually hard, but it requires learning new habits — and those habits actually happen to correspond nicely to the habits required for writing good concurrent programs (at a high level — low-level optimizations are another story).

It does also happen that they each more or less represent the two main modes of concurrent programming — message-passing (as represented by actors in Erlang) versus shared-memory (as represented by STM in Haskell), so you'll get a flavor for each that way. But do realize that there are lots of different ways of looking at message-passing besides just actors; joins as in JoCaml, for instance.
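For a flavor of the shared-memory mode, here is a small Ruby sketch. It swaps in a plain `Mutex` lock rather than STM (Ruby's standard library has no STM), so treat it only as an illustration of threads updating one shared object; the `SharedTally` class is a hypothetical name.

```ruby
require 'thread'

# Shared-memory style: several threads update one object directly, and a
# Mutex (a plain lock, not STM) keeps each read-modify-write step atomic.
class SharedTally
  def initialize
    @lock   = Mutex.new
    @counts = Hash.new(0)
  end

  # Increment the count for a word while holding the lock.
  def record(word)
    @lock.synchronize { @counts[word] += 1 }
  end

  # Return a copy so callers cannot mutate the shared hash unlocked.
  def snapshot
    @lock.synchronize { @counts.dup }
  end
end

tally   = SharedTally.new
words   = %w[erlang haskell erlang clojure]
workers = words.map { |word| Thread.new { 50.times { tally.record(word) } } }
workers.each(&:join)
p tally.snapshot
```

The contrast with message passing is that every thread reaches into the same hash, so correctness depends on every access remembering to take the lock.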

Also, in general, I think the main goal for learning new programming languages should be to stretch your brain and find things to take back to your work in your usual languages. (No matter how different it is from what you're used to — and it should be different if you intend to stretch your brain — there will always be something.) You may find a language you end up falling in love with in the process (Ruby was one such language for me), but it's better not to approach new languages with such high expectations.


Tuesday, May 12, 2009

On Ruby Interview with Pat Eyler

For a bit of a change-up, Sean Carley (@milythael) is running this interview.

Pat has become a well known online author in the Ruby community with frequent book reviews, interviews and posts on various useful topics. When he asked the community who we would like to see interviewed, I turned the tables on him.


You've become a well known blogger in the Ruby community. You're active in organizing the MountainWest RubyConf and have started or helped start multiple Ruby brigades, including Seattle.rb, as well as other programming user groups. What makes you stay so outwardly involved in the software development community?

Pat Long before I was involved in the Ruby community, I'd become involved in the Free Software community. Before that, I was involved in other groups that had cultures of community involvement. Joining a community and then working to improve it had become sort of second nature to me. (It's a case of enlightened self interest though, not altruism.)

Some people give back to the Ruby community by creating cool new libraries (or improving the existing ones). Others by taking new programmers under their wings and helping them develop Ruby skills. I don't really have the programming chops to do that, so I found a niche as a community hacker.

Beyond the general value I derive from a better, stronger, friendlier community, there are also some more specific benefits that accrue. Since I hang out on a bunch of .rb mailing lists, I hear about things that are happening. I've been invited to meet with groups in places I travel to. There's also a sort of vicarious sense of accomplishment when I see a group I've been involved with do something really cool.

You were one of the co-founders of the Seattle Ruby Brigade. What was it like when the group first started?

Pat Well, the skit that Aaron Patterson did in the Seattle.rb presentation at RubyConf last year wasn't completely accurate — it is pretty fun, and not too far off though.

When things actually started, it was Ryan Davis, Doug Beaver, and me. We met at a little cafe the first couple of times, then moved to Seattle Pacific University's library. I think that's where Eric came into the picture.

At first we did a lot of talking. We started hacking together on a Ruby mail application, but it went nowhere fast. I remember meetings where we talked about the ruby debugger, ten cool libraries, and this cool window manager someone was writing in Ruby. There's just an incredible amount of Ruby talent in Seattle.

Having RubyConf in Seattle as we were starting really gave us a jump start. That's where I met Phil Thomson and talked with him about getting the pdx.rb rolling. It also got me thinking about doing something locally. We tried it with 'Ruby in the Rainforest', a local mini-conference we held over on the Olympic Peninsula. I think we had 6 people there, but it was fun.

What did the Seattle.rb do right and what made it so successful?

Pat I think some of the biggest things that the Seattle.rb does right are: they meet regularly, even if not everyone can make it; they combine hacking, learning, and socializing; and they advertise heavily. When you combine these three factors, you find that the ruby community knows that there will be a meeting and that it's going to be a good time.

Of course, it doesn't hurt that they've had some brilliant hackers there over the years. I mean, who wouldn't want to go and hear Evan talk about rubinius, hack with Eric and Ryan, or just hang out with some of the smartest Rubyists anywhere?

What other Ruby groups have you helped start and how have they done?

Pat I've been involved to some degree or another with a lot of groups. Most of them have taken off and flourished and I no longer have any role with them other than hanging out on their mailing list and wishing I could make a meeting. The group I'm most involved with these days is the URUG (Utah Ruby Users Group). This is really an umbrella group and includes the Logan.rb, the Layton.rb, the SLC.rb, and the UtahValley.rb. I used to attend the UtahValley.rb most months. These days, I'm limited to making the occasional hacking lunch. :(

What drew you to Ruby in the first place?

Pat I was consulting at Fidelity and had just finished a fairly large Y2K monitoring app in Perl. Looking at my next opportunities, I decided I needed to become a better Perl programmer by learning another language. Since I got the idea from 'The Pragmatic Programmer', and since Dave and Andy had just released the Pick-axe, it seemed like a good fit.

For a while it seemed to work. I tested my Perl code better, it even became more legible ... Pretty soon though, I found myself not enjoying Perl anymore. I wanted to write things in Ruby instead. I'm sure the other guys I worked with got tired of hearing me praise Ruby, but it just fit the way I think. It seems like it fits the way a lot of other folks think too, Matz did a pretty good job with it.

Do you think of Ruby as a full fledged development language or do you think of it mainly as a web and scripting language?

Pat I actually do very little web programming, so I think of it more as a programming and scripting language that some people use for web stuff.

For me, things kind of sit on a continuum. At one end, there's BASH. I do a lot of quick, one-off stuff here, but I don't like to write anything over 20 or 30 lines in it since it's harder to maintain. On the other end is some kind of fast, compiled language like C. I only go here when I really need the speed, otherwise the pain just isn't worth it. Ruby sits in the middle and is where I'd like to do most of my work.

I know you actively learn other languages than Ruby. What languages currently have your interest and what do you like about them? Do you see other languages competing with Ruby or do you learn them to fill other gaps?

Pat I try to look at languages to learn ideas or approaches that I can use, but I'm also on the lookout for several things:

  • A fast, compiled language that's near-C speed but friendlier and safer. I'm currently looking at OCaml as a possibility here. Haskell is interesting too because it sort of sits on the border between Ruby and C in terms of how I could see myself using it.
  • Another language that fits my mind even better than Ruby. I don't know that I'll find one, but you never know — so far, I've found several that don't.
  • A good concurrent language. There are a bunch of options out there: Erlang, Reia, Scala, maybe OCaml ... I need to learn more and do more with them to figure out where I should be looking hardest.

How do you think working as a system administrator changes your perspective on programming languages?

Pat I don't know that it changes my perspective on languages as much as my goals in programming. I'm a lot more focused on little tools that help me do my job. (Which is probably one reason I don't do much web programming.)

On the other hand, I spend a lot more time with programmers than a lot of other sys admins. I think my exposure to the Ruby community has really helped me there. I understand it when the developers dive into 'Agile mode' and start talking about sprints, unit tests, coverage, and what not. I think that makes me a better infrastructure guy.

It's also kind of fun to work on 'real' programming more as a hobby than a job. If I don't grok something right away, I don't mind filing it away to come back to later since I don't have a job or a deadline looming. I guess this could become a handicap at times too, since I don't always have an incentive to buckle down and figure something out.

How did you get started in technical writing?

Pat I used to do a lot of training, and the writing is just sort of an outgrowth of that. The first article I wrote was about functions and aliases in shell scripting. I just got tired of people asking me about it, so I wrote up an article. I was floored to find that I enjoyed writing it (I used to hate writing in school). And things sort of took off from there.

After writing some stuff for free, I landed some more 'professional' writing gigs. I ended up writing some magazine articles, a few tutorials, even a book (no, it didn't sell very well). Today, I'm pretty happy writing for my blog. Who knows where writing will take me next month or next year though.


Monday, May 11, 2009

RubyNation Mini-Interview: Gray Herter

For my third RubyNation interview, I talked with Gray Herter, one of the organizers. (I've also interviewed Hal Fulton and Russ Olsen about this upcoming regional Ruby conference.)

Don't forget RubyNation will be held in Reston, VA on June 11-13, so you don't have too much time left to register.


Why did you decide to put RubyNation together?

Gray It started with a conversation that I had with Xandy Johnson, the former leader of the Northern Virginia Ruby Users Group. We discussed it as a way to help him raise money for our meeting pizza and sodas fund. He was having trouble getting sponsors every month for that. With the wealth of local Rubyists we have in the DC area, I thought if we could get five or so fairly well known local speakers to present something for a one day event at our normal location, we could charge a small fee and raise enough money to fund the group for the year easily. Russ Olsen and David Bock were up for it almost immediately. Russ, as it turned out, had also been talking to Xandy about it. After getting commitments from a few more well-known speakers, especially Rich Kilmer, and a good core of quality organizers, we decided we should make it a two-day event, using the regional conference model. It seemed like with the level of speakers we had interested in the idea, we needed to do that.

What makes regional Ruby conferences special?

Gray There are a lot of things I love about them. They get local ruby communities together. People get to know each other well since they are generally fairly small. The attendees and organizers feel a real connection to their community. You find out things that you never knew were happening right nearby. Last year, for instance, we had Mike Furr speaking, one of the creators of DiamondBack Ruby, which is a great project at the University of Maryland. You just had an article about it on your blog. He is a Rubyist in our area that is doing some really interesting work, at the Ruby language level, not just a new framework. I had no idea about it until he got involved in our conference. And when you get everyone together in a conference setting it is much easier to attract well-known speakers. Much easier than it is to get them for our user group, for instance. People appreciate that the conference can bring people like Yehuda Katz and David A. Black to our area.

What makes RubyNation stand out as a regional Ruby conference?

Gray I am not sure that we intend to stand out from the other regional conferences. Our goals are the same as the other regionals, I'm sure. That is, to put on a great event for the community. We do plan to run a very professional, high-value regional conference, and I believe RubyNation will be one of the better ones. I think the program committee has done a great job this year in putting together an interesting and relevant program. We have the two talks of David A. Black speaking on Ruby 1.9, and Yehuda Katz speaking on Rails 3.0. And Chad Fowler presenting some of the ideas from his Passionate Programmer book. The quality of speakers is very high. Even the local presenters are high quality. There is a lot going on in the DC area. The breadth of topics is very broad this year. We have several Rails topics, some database ones, user interface topics, and some non-Ruby ones, and so on.

What's your favorite memory from last year's RubyNation?

Gray Honestly, as the chief organizer, I had a grin from ear to ear when it was over and we had pulled it off successfully. It was a ton of work, and it was just a great feeling to have it work out so well. Otherwise, some of the lightning talks were surprisingly fun for me. Bryan Liles did two rather long lightning talks, one each day, that I loved, especially the first one. It was a very quick version of his test all the f-ing time talk that he has since done at a few other conferences. He was our backup speaker last year, and that was a way of getting his talk in. In 15 minutes, he was the star of the conference. Not as PC as it should have been, especially in light of the recent controversies about whether or not we should be keeping the conference talks PG rated, but it was very funny.

What are you most looking forward to for this year's RubyNation?

Gray For specific topics I am interested in the Reia talk. It isn't Ruby, but in my opinion, the talks don't all have to be about Ruby specifically, just of interest to the Ruby and Rails community. I would like to hear what Bruce Tate thinks about it. He was the author of Beyond Java, a book written to explore the notion of what comes after Java. I am also glad to see several talks on user interfaces, like the ActiveScaffold talk, or the one on improving an application's perceived performance, or Bruce's talk on facet-based navigation. The database talks will be interesting, too. There are a bunch of talks that are interesting to me. Come to think of it, the Herding Tigers talk should be really good, too. Daniel sounds like a fun guy. I love the idea of a guy having an alias. He was in a punk band, and called himself Danny Blitz for a while. I need to meet him.

[ed. you can learn more about Reia in my interview with Reia developer Tony Arcieri.]

Why should people come to RubyNation?

Gray It's fun. You can attend a really fun Ruby-centered event, right in our area. We are small enough that everyone can participate. You can ask questions, or talk to the presenters during breaks. They usually all attend the whole thing as participants, too. Or you can give a lightning talk. I would really like to encourage people to do that, especially if they are wondering what it would be like to actually give a regular presentation at an event like this. Maybe it will spark someone to submit a talk proposal next year, or speak at our user group. Growing the local community is what we are here for, after all.


Thursday, May 07, 2009

RubyNation Mini-Interview: Russ Olsen

For my second RubyNation interview, I talked with Russ Olsen (@russolsen), one of the organizers. (I've also interviewed Hal Fulton about this upcoming regional Ruby conference.)

Don't forget RubyNation will be held in Reston, VA on June 11-13, so you don't have too much time left to register.

Russ is an awfully bright guy, and a real contributor to the Ruby community. You might also be interested in reading my earlier Russ Olsen Interview, or my book review of On Ruby: Design Patterns In Ruby.


Why did you decide to put RubyNation together?

Russ I think it was actually Gray Herter's idea originally, but I can remember sitting down with Gray and talking about putting on a little one day, 50 person mini-conference mostly to raise a little bit of money for our local Ruby users' group, which at the time was running on a shoestring.

From that initial idea it grew and grew, expanding to more than twice that original size, moving to a real conference center and spreading out to two days.

What makes regional Ruby conferences special?

Russ In general the regional conferences are fun because they all have their own feeling, each one putting a special twist on what it means to be part of the Ruby community. I also get a kick out of the fact that this is all coming from the bottom up, driven mostly by people who just want to be involved in the Ruby community.

What makes RubyNation stand out as a regional Ruby conference?

Russ There's really two answers to that question: In a larger sense, RubyNation can build on the thriving Ruby community that has grown up around Washington DC. We have a really diverse bunch of people doing Ruby here, everything from Web 2.0 start ups to older established companies that are trying to find a better way, to individual, enthusiastic techies.

For me personally, RubyNation is great because I already know a lot of the people, but it's rare to see everyone in the same place at the same time.

What's your favorite memory from last year's RubyNation?

Russ I had just finished helping clean up (trust me, this conference organizing stuff is not all glamor) and happened to find myself walking out with the last two attendees. One of them turned to me and said, "So how long have you guys been doing this?" and I realized that we had pulled it all off with enough professionalism that at least one person thought we had been at it for years. Or he was just being polite. Either way it worked for me.

What are you most looking forward to for this year's RubyNation?

Russ I'm looking forward to hearing Hal Fulton talk about Reia, a new programming language that he has found. Remember, the last language that caught Hal's attention...

[ed. you can learn more about Reia in my interview with Reia developer Tony Arcieri.]

Why should people come to RubyNation?

Russ Come because we have a great line up of speakers. Come because it's not very expensive. Come because the Washington area Ruby community is just bubbling over with enthusiasm. Come because June in Washington is not nearly as bad as August in Washington. Come because you might hear about the next big thing. Come and tell us about your next big thing.


Diamondback Ruby Interview

After the announcement of Diamondback Ruby on ruby-talk a bit ago, I decided to contact the developers to learn more about what they're doing. Two of the team members, Mike Hicks and Mike Furr, and I ended up having quite a conversation that I'm posting as the interview below.


What do you hope to learn from this project?

Mike Hicks There is a long-discussed tension between statically (or explicitly)-typed languages like Java and dynamically (or implicitly)-typed languages like Ruby. My hope has been to discover how to include the best aspects of static and dynamic typing in a single language. We all really like Ruby's design and features, and felt it was the right language to start with. As we go, we'll look to derive general design principles that make sense for most any programming language, to simplify and improve the process of programming generally.

Static types are useful at catching bugs early, and serving as useful documentation (indeed, RDoc includes a pseudo-type for Ruby library methods). But static type systems may reject perfectly correct programs due to their imprecision, and thus can "get in your way," particularly when doing rapid prototyping. On the other hand, dynamic types suffer no imprecision, but delay discovery of some bugs until run-time. It can be particularly annoying to mistype the name of a method in a call deep within your complicated program, and have the program fail after running for a long while with "method not found," when having a type checker would have immediately revealed the mistake. The challenge is to incorporate the best bits of both approaches, e.g., to not reject programs prematurely (as a static type system could) while finding as many certain errors in advance as possible.
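The mistyped-method scenario is easy to reproduce in plain Ruby. The sketch below uses a hypothetical `Report` class; the typo sits on a path that the first call never reaches, so nothing complains until the second call actually executes the bad line.

```ruby
# A misspelled method name is only discovered when the line runs.
# `Report` is a made-up class for illustration.
class Report
  def initialize(rows)
    @rows = rows
  end

  def summary
    return "no rows" if @rows.empty?
    "#{@rows.lenght} rows"          # typo: should be `length`
  end
end

# This path never reaches the typo, so it works fine.
puts Report.new([]).summary

# This path does reach it, and fails at run-time.
begin
  Report.new([1, 2, 3]).summary
rescue NoMethodError => e
  puts "failed at run-time: #{e.class} for `#{e.name}`"
end
```

A static checker that knows `@rows` is an Array could flag `lenght` without running either call, which is the kind of early feedback described above.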

Mike Furr We had two main hypotheses going into this research. The first hypothesis was that development in dynamic languages doesn't scale. This is a hard thing to measure, but there is a fair amount of anecdotal evidence that seems to support this. For example, just recently a few of the Twitter developers were interviewed about moving their infrastructure from Ruby to Scala, and static typing was mentioned as one of their advantages of Scala over Ruby.

The second hypothesis we had was that most people "think in types" even if they don't write them down while programming. Thus, we imagined that most code written by Ruby programmers would be accepted by a sufficiently advanced static type system. Thus we hoped to be able to design a type system for Ruby that was powerful enough to handle the kind of Ruby code people actually write, but not be so complex that it was impossible to use.

What made you choose Ruby as your test implementation instead of Python, Perl, or one of the other widely used dynamically typed languages?

Mike Furr When we were first throwing around ideas about analyzing a scripting language, Ruby seemed to be the language with the most momentum, likely because of the release of Ruby on Rails a few years earlier. Ruby is also a rather young language and its syntax and semantics are continuing to evolve. Ideally, we hope that our research could influence future directions for Ruby, although much of our research would be applicable to other languages as well.

Mike Hicks I really like Ruby's design, particularly the principles that "everything is an object" and "duck typing." We also liked that Ruby was the core of Ruby on Rails, whose popularity was increasing. In our exploration of Perl, an early contender, we became frustrated with the huge number of overlapping language features, and quite surprising behavior in many instances. We didn't see how to write a useful static analysis for Perl programs without a lot of difficulty. We thought about Python only cursorily, and I don't recall any particular downsides that came up.

Why did you choose OCaml as your implementation language?

Mike Hicks Our group at Maryland uses OCaml almost exclusively for writing static analysis tools, particularly for analyzing C code, so it was natural to want to use it for this project, too. OCaml, in my view, is the perfect language for writing a compiler or analysis: its syntax is terse (as compared to Java, say), and features such as first-class functions and pattern matching very much simplify the process of writing tools that analyze or manipulate structured data, like abstract syntax trees. We followed the lead of the CIL (C Intermediate Language) project, also written in OCaml, in designing RIL (Ruby Intermediate Language), the core of DRuby. For example, both CIL and RIL syntax trees are simplified after parsing to make analysis more manageable.

Mike Furr OCaml is my favorite language to program in and I have been using it throughout my time in graduate school. However, I also think it is the right tool for the job. The quintessential example for functional programming languages is writing a compiler, and DRuby is essentially a compiler front-end. OCaml's type system is also a real asset in developing complex code that manipulates abstract syntax trees.
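To give a rough idea of what "simplifying a syntax tree after parsing" can mean, here is a toy sketch, written in Ruby rather than OCaml to match the rest of this blog. The node shapes and the `simplify` function are assumptions made up for illustration and are not taken from RIL or CIL; the one rewrite shown (turning `unless` into `if` with a negated condition) just demonstrates how a later analysis ends up with fewer node kinds to handle.

```ruby
# Nodes are plain arrays whose first element names the node kind, e.g.
#   [:unless, cond, body, else_body]
#   [:if, cond, then_branch, else_branch]
#   [:not, expr], [:call, name], [:lit, value]
# These shapes are hypothetical, not RIL's real representation.
def simplify(node)
  return node unless node.is_a?(Array)   # leaves: symbols, numbers, nil

  kind, *children = node
  children = children.map { |child| simplify(child) }

  case kind
  when :unless
    cond, body, else_body = children
    [:if, [:not, cond], body, else_body] # one conditional form instead of two
  else
    [kind, *children]
  end
end

tree = [:if, [:lit, true],
        [:unless, [:call, :done?], [:call, :work], nil],
        nil]
p simplify(tree)
```

After this pass an analysis only needs a case for `:if`, never `:unless`, which is the sort of saving a simplified intermediate form is meant to provide.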

A lot of folks seem to think that you've written a Ruby implementation in OCaml instead of a type analyzer for the existing 1.8 Ruby. Do you think an OCaml implementation of the language would be a good thing? Why or why not?

Mike Hicks This is a hard question to answer. Why might one want to implement an interpreter in one language vs. another? I can imagine several reasons: performance, portability, maintainability, and reliability, among others. Developers often implement interpreters or VMs in C/C++ for reasons of performance and portability. But C and C++ encourage programming "on the edge of safety," so mistakes can lead to crashes, security vulnerabilities, etc., hurting reliability and maintainability. By contrast, coding in a high level language, e.g., Java or OCaml, avoids many reliability problems, thanks to type safety and garbage collection, but at the cost of some performance. (For grain-of-salt interlanguage performance metrics, check out the Computer Language Benchmarks Game, http://shootout.alioth.debian.org.) And OCaml is really well-suited to writing compilers and interpreters, thanks to its rich structured datatypes and pattern matching. In my experience, an OCaml-based compiler or interpreter is much more succinct than one written in Java. So I think it's a good option.

Mike Furr There are already several implementations of Ruby available and so adding another one simply because it was implemented in OCaml doesn't seem like a good idea to me. Maintaining an implementation of Ruby is a lot of work since the language continues to evolve its syntax and semantics from version to version and the API of Ruby's standard library is also tied to particular Ruby versions. Instead, there would need to be a fundamental new feature that an OCaml implementation would provide that developers would find useful. For example, it might be interesting to explore compiling Ruby programs to native code using the OCaml bindings of LLVM and doing type driven optimizations based on Diamondback Ruby's type system. This would be a lot of work and I have no idea if the resulting code would be any faster than some of the newer Ruby virtual machines, but it could be a fun project.

How does static type inference affect the balance between testing and debugging? How does it affect the testing and debugging processes?

Mike Hicks Type inference is a debugging aid, I suppose. It is meant to help identify bugs that could come up, and do so without requiring you to run your program. It is not a replacement for testing, though. Essentially it finds out whether you are programming with a certain level of consistency; if in one place you declare your method to take three arguments but elsewhere call the method with four, that's an inconsistency. But it doesn't prove that your code does "the right thing," e.g., whether you formatted your output string correctly. You need tests for that. Our approach allows one to help the other. When you write tests, DRuby will profile their execution to provide information that helps type inference. And type inference helps you identify some bugs without having to run tests.
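To make the three-versus-four-arguments case concrete, here is a minimal sketch in plain Ruby. The method names (format_entry, report) are hypothetical and not taken from DRuby or the interview. The point is only that the interpreter reports the mismatch when the offending line actually runs, which is the kind of inconsistency a static check could flag beforehand.

```ruby
# Hypothetical method: declared with three parameters.
def format_entry(name, date, amount)
  "#{name} #{date} #{amount}"
end

# Hypothetical caller: the verbose branch passes four arguments.
def report(verbose)
  if verbose
    format_entry("rent", "2009-05-01", 500, "USD") # inconsistent with the definition
  else
    format_entry("rent", "2009-05-01", 500)
  end
end

puts report(false) # the common path runs without complaint

begin
  report(true)     # the mismatch only surfaces when this branch executes
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

A test suite that never calls report(true) would pass, even though the inconsistency is visible in the source.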

Mike Furr Static typing is a tool just like testing frameworks and debuggers. All of them are meant to improve the quality of the software being developed, and each has its own advantages and disadvantages. The major advantage of static analysis is that it is able to reason about every path through your program simultaneously (and before it is run). Static types also provide terse, verified documentation. If your method has a type annotation that says it returns a Fixnum, that annotation will never become stale and can be trusted by any other developer who is calling your method.

However, static typing isn't perfect and is not meant to replace other QA techniques such as testing. One of the goals for DRuby is to allow programmers to incrementally add static types to their code bases, allowing them to benefit from extra checking where they want, without requiring changes to the entire code base.

You mentioned that you've found several potential errors in Ruby libraries and programs as a result of type inference analysis. What kinds of problems are you finding? How could the Ruby community take advantage of these kinds of discoveries?

Mike Furr The Ruby community has accepted test driven development as a standard practice and so we didn't expect to find a large number of errors. However, getting 100% testing coverage is often difficult and, not surprisingly, many of the bugs we found were in error handling code that was not exercised by any test cases. Some of these bugs were extremely simple, like misspelling a variable name, or referencing a method that did not exist.
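As an illustration of that class of bug, here is a small sketch of a misspelled variable hiding in an error handler. The parse_port method is hypothetical and not drawn from any of the libraries the interview mentions.

```ruby
# Hypothetical helper: the rescue branch misspells `text` as `txt`.
def parse_port(text)
  Integer(text)
rescue ArgumentError
  raise "invalid port: #{txt}" # typo: fails with NameError, but only on bad input
end

puts parse_port("8080") # the path most tests exercise

begin
  parse_port("http")    # the error-handling path
rescue NameError => e
  puts "#{e.class} raised inside the error handler"
end
```

Unless a test feeds parse_port a malformed value, the typo never executes and the suite stays green.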

One bug that I found particularly interesting was where a program called a method in the "File" class that didn't exist. This code was covered by the test suite and didn't cause a test failure. The reason was that the testing code monkey patched the File class to add the method before running the test suite. Thus, you would only encounter the problem if you executed the code outside of the test suite.
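The shape of that bug can be sketched as follows. The interview doesn't name the missing method, so File.slurp here is a stand-in chosen because Ruby's File class doesn't define it, and load_template and outcome are hypothetical helpers.

```ruby
# Hypothetical application code: File.slurp is not defined by Ruby's File class.
def load_template(path)
  File.slurp(path)
end

# Report what happens when the application code runs.
def outcome(path)
  load_template(path)
rescue NoMethodError => e
  e.class.to_s
end

# Outside the test suite, nothing has patched File yet.
OUTSIDE_TESTS = outcome("template.txt")

# Hypothetical test setup: reopen File and add the missing method.
class File
  def self.slurp(path)
    "stubbed contents of #{path}"
  end
end

# Inside the test suite, the same call now succeeds.
INSIDE_TESTS = outcome("template.txt")

puts OUTSIDE_TESTS
puts INSIDE_TESTS
```

The application code is "covered" in the second run, yet the coverage says nothing about how it behaves without the patch.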

We hope that DRuby will develop into a tool that developers can run on their projects as part of their own quality assurance process. In the meantime, we have been filing bug reports for the errors we discovered so that the authors can fix them. For example, we found 2 errors in the Rubygems package manager that have already been fixed in their latest release.

What kind of feedback are you getting from Rubyists?

Mike Furr We've gotten some very encouraging feedback so far. In fact, despite the legendary flame wars between static and dynamic typing, I haven't received any negative comments about the idea of bringing static types to Ruby. A lot of people are using Ruby these days and so a tool that can help improve their development process through finding bugs or improving documentation is clearly appealing. Diamondback Ruby still needs some polishing so that programmers can begin using it on their own projects, and this is something we are going to continue to work on. Eventually, we'd like to perform some user studies to measure the effectiveness of the various features of Diamondback Ruby, and so its usability is certainly important to us.

How well do Ruby programs perform under Diamondback Ruby?

Mike Furr Diamondback Ruby uses a combination of static and dynamic checks to ensure that Ruby programs are well typed. Programs that can be checked purely statically (which we hope will be most of the time) will have no overhead at all since the programs can be safely run by a traditional Ruby interpreter unchanged. However, if the program does require a runtime check, then individual objects or methods may be instrumented to ensure they don't violate their types. When dynamically checking objects, we instrument the eigenclass of the individual object so that only method calls to that object must be checked (not every object of the same class). Thus the checks are pay-as-you-go: the more objects that require dynamic checks, the higher the overhead. Therefore, it's hard to use a single measurement to quantify the overhead as it can vary from execution to execution of an application. I have run some micro-benchmarks and observed a 15% slowdown in one case, but this data point should be taken with a grain of salt, as it was merely to convince myself that the instrumentation was working and not egregiously slow. An application that rarely uses a dynamically checked object may see almost no overhead, but if the application calls methods on that object in a tight loop, it could be significantly higher.
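For readers who haven't worked with eigenclasses, here is a rough sketch of per-object instrumentation in plain Ruby. This is not DRuby's actual instrumentation code; the Account class and the check_return_type helper are hypothetical, and the sketch assumes a Ruby version that provides define_singleton_method. It only shows why a check attached to one object's eigenclass costs nothing for other instances of the same class.

```ruby
# Hypothetical class used only for illustration.
class Account
  def initialize(balance)
    @balance = balance
  end

  def balance
    @balance
  end
end

# Hypothetical helper: wrap one method on ONE object with a return-type check.
def check_return_type(obj, method_name, expected_class)
  original = obj.method(method_name)
  # define_singleton_method puts the wrapper on obj's eigenclass only.
  obj.define_singleton_method(method_name) do |*args, &block|
    result = original.call(*args, &block)
    unless result.is_a?(expected_class)
      raise TypeError, "#{method_name} returned #{result.class}, expected #{expected_class}"
    end
    result
  end
  obj
end

suspect = check_return_type(Account.new("100"), :balance, Integer)
trusted = Account.new(100)

begin
  suspect.balance # goes through the wrapper and fails the check
rescue TypeError => e
  puts e.message
end

puts trusted.balance                      # plain method call, no wrapper involved
puts trusted.singleton_methods.inspect    # nothing was added to this object
```

Only calls on the instrumented object pay for the check, which matches the pay-as-you-go behavior described above.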

Are you using the RubySpec framework to drive your implementation?

Mike Furr DRuby includes a dynamic analysis that allows us to reason more precisely about features such as eval(). This analysis requires us to parse the original Ruby program into our intermediate language, add any instrumentation code, and then write the transformed program out to a separate location on disk to be executed by the Ruby interpreter. This whole process was rather tricky to get right, and we used the RubySpec test suite to ensure that our transformations were correct. It was definitely a great help to have such a comprehensive test suite.

We haven't used the RubySpec tests to drive any type analysis for the standard library just yet, but I can definitely see using it in the future as we continue our research.

I'd really like to see OCaml get more play, but I keep seeing books like this and wonder when a good OCaml book for non-Math/CS types is going to hit the shelves. What will it take to get OCaml in front of more developers?

Mike Hicks My observation is that languages take off when library or framework support for some important set of applications hits critical mass. Then developers interested in that application intuit that it's easiest to build that app in a certain language or framework, and then go off and learn what they need to learn. Then those developers start building more libraries and the language is used for other things. I think we can see this trend in Perl (first killer app: text processing), Java (first killer app: applets), Ruby (first killer app: Rails), etc. We're starting to see more adoption of Erlang, thanks to the rise in multi-core and high-availability commercial systems, and we're seeing a growth in Haskell, at least in part because of all of the code you can get for it (though I can't speculate on what its killer app is).

When I first started doing work in static analysis, C/C++ were the languages of choice, oftentimes building on gcc or other existing tools. But then George Necula and his students wrote CIL (C Intermediate Language). Nowadays many, many tools are written using CIL as the front end and intermediate language, by people who never were "functional programming people." CIL was just so much better, clearly, than anything else, that people flocked to it. (As of today there are 297 citations to the CIL paper, according to Google Scholar — esp. noteworthy for an "infrastructure" piece of work.)

Of course, C analysis and other "symbolic computations" on programming languages are a niche area, and not likely to bring in the masses. OCaml still needs that breakthrough use-case of great interest that will push it over the top. It remains to be seen what that will be. But once it's found, the books, tools, etc. will all follow.

Mike Furr I agree with everything that Mike (Hicks) said but would also add that OCaml needs to overcome its branding as an "academic language". I have found that a lot of programmers think of functional programming as a fringe concept, perhaps invoking bad memories of struggling with it as a CS major. At the same time, one of the features people really seem to love about Ruby is blocks, which of course are a functional programming technique. I think that Ruby's syntax plays an important role here: programmers don't have to understand what a higher order function is to be able to use a block and yet they can immediately see the usefulness of the technique. However, a functional programmer might find this syntax restrictive (why only one block per method?). Perhaps the road to OCaml's adoption will be through Ruby which gives a gentler introduction to some of the same ideas used in ML.
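A short sketch of both halves of that point: a block used without any talk of higher order functions, and the workaround needed once a method wants two function arguments. The select_then_map method is a hypothetical example, not something from the interview.

```ruby
# A block is a function handed to the method; no special vocabulary required.
doubled = [1, 2, 3].map { |n| n * 2 }
puts doubled.inspect

# Hypothetical method that needs TWO functions. A method accepts only one
# block, so both functions are passed as explicit lambda arguments instead.
def select_then_map(items, test, transform)
  items.select { |x| test.call(x) }.map { |x| transform.call(x) }
end

is_even = lambda { |n| n.even? }
square  = lambda { |n| n * n }

puts select_then_map([1, 2, 3, 4], is_even, square).inspect
```

In ML-family languages both arguments would simply be ordinary function values, which is the restriction a functional programmer notices in the block syntax.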


Wednesday, May 06, 2009

MWRC Interview Collection

I've done a bunch of interviews about MWRC. Some of these have been quite popular, while others are sort of hidden gems. If you've enjoyed one of them, maybe you'd like to see some of the others. I'll start out with three of my favorites, but there's a complete list down below.

1) My interview with Philippe Hanrigou has become one of the most popular posts on my blog. He has developed a reputation as one of the great MWRC speakers, and I think his interview is a good indicator of why. You might also want to see his talks on What to do when Mongrel stops responding (from 2008) and What The Ruby Craftsman Can Learn From The Smalltalk Master (from 2009).

2) I started an interview with David Brady on Twitter. Before it was over, Kirk Haines and Jim Weirich had joined in. Not only did this interview spawn a great meme, the story got better when David came into the second day of the conference from his sickbed and gave his talk during lunch (we'd already run the replacement talk). Now that's dedication. You should go watch David's presentation on TourBus, Kirk's on Vertebra, or Jim's keynote.

3) I like this last one because it provides a different take on things. I interviewed Josh Susser about his interest in attending MWRC. Maybe most telling about his interest was that he went off and organized GoGaRuCo (hmmm, isn't there something about imitation and flattery — just kidding, Josh and GoGaRuCo are great!).

MWRC 2009


MWRC 2008

Questions Five Ways, an introduction

Next week, on Wednesday, I'll be starting a new feature here at On Ruby, Questions Five Ways. I asked a friend of mine, Andrew Young, to put together a small graphic that I could use for these posts. I really like the way it turned out.

The idea is that I'll invite five different hackers to join a conversation centered around a guiding question. Once we've had a couple of days to kick things around, I'll summarize the discussion and post it here.

Hopefully the discussion won't stop just because I've posted it here. I'd love to see it open up and blossom with a wider audience, so please add your thoughts in the comments each week. If you'd like to follow along on Twitter, let's use the #q5w hashtag.

I've got a bunch of questions, and a number of potential victims, err, participants in mind.

Just to whet your appetite, here are a couple of the questions I plan on asking:

  • What is the right interplay between testing code (unit tests and the like) and code analysis (lint, reek, and their ilk)?
  • What kinds of activities are most important to building (or keeping up) a healthy users group?

I'm sure I don't have a corner on good ideas though. If you'd like to share your requests below, I'll try to work the best of them into the cycle.
