Wednesday, May 20, 2009

Questions Five Ways - Code Reading

Welcome, InformIT readers! You might also like to see my review of Eloquent Ruby. Enjoy!

When I came up with the idea of Questions Five Ways, I hoped to create discussions of lasting value to programmers, posts that would stimulate some meaty follow-up conversation. I think this post has fulfilled my vision about as well as anything I could expect.
I owe a big 'thank you' to the participants in this discussion: Steve Yegge, James Gray (@JEG2), Rick DeNatale (@RickDeNatale), Diomidis Spinellis (@DSpinellis), and Don Stewart (@donsbot).
As good as the discussion below is, though, it's merely a jumping-off point. What do you think about code reading? What code have you read (or are you reading) to become a better programmer? Update: Gregory Brown has posted a great article on code reading.

Several years ago, I worked for someone who would occasionally hand me printouts of part of the company's code base. "Here, read this," he'd say. "I think you'll really learn something from it." Now, I wish I'd taken more time to sit down with him to really learn the art of code reading. Why is code reading so important to a programmer, and how can we become better at it?
Rick Well, I think that the art/craft of programming is much like the art/craft of writing prose or poetry. Just as an author gets better by reading the literature, so does a programmer.
And I think that it's important to read a wide variety of "the literature" not only in your own main programming language but in others as well. I've been a bit of a language polyglot for many years, back to my days in college when I devoured any new language I could get my hands on and play with, from Fortran, APL, PL/I, Lisp, even a little bit of COBOL ... whether it was part of a formal course or not. I've kept this attitude for decades now, through C, Pascal, Smalltalk, Java, and Ruby, and probably other languages I'm failing to remember right now.
So by looking at such a wide range of programs and languages, I've learned a lot, and developed a sense of good style vs. bad style, as well as various approaches to the problems you encounter in your projects.
The trick is to develop a sense of how the features of a language interact, and to recognize that something you saw in one language or program might or might not be a good idea in another, or at least not without some adaptation to a different environment.
James To me, one of the simplest possible reasons is that our jobs require us to read code. Maybe it's when we are pair programming, maybe it's a library we're bug hunting inside of, or maybe it's our own code after enough time has passed that it too looks foreign, but one way or another we have to read code. Given that, it only makes sense to practice and improve that skill like any other.
Rick I think that reading code while pair programming is not quite the same thing. To me it feels more like an extension of what I do when I'm writing code, not reading someone else's code. Of course you read what you're writing, but that feels different to me, and when I'm pairing, it feels more like that even if I'm not "driving." But maybe that's just me.
James You're right, and that probably wasn't a very good example on my part. Pair programming involves some unique skills, like knowing when it is or isn't a good idea to speak up. That's a different conversation, though.
Diomidis Given that in our work as developers we spend most of our time and effort in maintaining existing code, code reading is the activity that will occupy the majority of our working day. And this is before we take into account that we also read code (our own or that of our teammates) as we gradually build a project from scratch. Finally, we need to read code when we review the work of our colleagues, in real time (as in pair programming) or during a code review to sign off their code.
Rick How do we get better at all this? Well, I'd say it's the old cliché: practice, practice, practice, and learn how best to use the tools at hand to navigate the code you are looking at. But I'll stop there for now.
James Like most skills, there's no great shortcut for improving your code reading that beats just doing it. It's a little hard to force yourself to get started, but once you get into it you might be surprised by how natural it becomes. One trick I've been using lately is whenever I see a presenter mention a library, I open up the code and start poking around in it. If they show a neat feature, I'll try to find the code that is responsible for that. It's a pretty good way to get a lot of exposure and you can learn a whole lot from it.
Rick Yes, that's one of the reasons I get upset when I don't have WiFi during conference sessions, like at last year's RubyConf.
Also this reminds me of something I was thinking of saying before. One of the reasons why code-reading is such an important skill is that in a lot of cases it's the only way to understand how to use a library. In this case, I look for the test/spec code, which should be there and should be a valuable asset in understanding how a library is intended to be used.
So in many cases it's RTFC instead of RTFM. <G>
James Absolutely. This is even more true for libraries without documentation or with out-of-date documentation.
Diomidis We can improve our code reading skills in several ways. At the lowest level we need to know our programming language, its coding standards, and its idioms and conventions: the way its features are used together to achieve specific results. We then must have a good enough grasp of higher level building blocks: data structures, abstraction mechanisms, design patterns, architectures. Often, however, code reading is about taking shortcuts, and for this we require a mastery of tools that can aid code reading either on the static code or by interacting with a running program. Finally, nothing beats experience. The more time we spend reading code the better we become at it.
Rick Tools can make it easier. One of the things I really loved about doing Smalltalk was how much the typical Smalltalk IDE aided code reading and comprehension. Smalltalk lets you easily find all of the implementations of a message, or everywhere a message is sent, for example. You can select text in source code and ask for an explanation. All this works at a semantic level because the IDE knows about the relationships between classes, methods, and even instances, since in Smalltalk you run your program in the IDE, and the object inspectors and the debugger all work together pretty seamlessly. An old Smalltalker trick was to run a program and interrupt it, then use the debugger to see where it was executing as a way to find out where to begin reading.
I miss all that, but these days I get along in Ruby and TextMate, with friends like the "Ack in project" TextMate extension. It's text-based searching, but most of the time it suffices.
Diomidis Text-based searching and tools, as exemplified by the Unix tool suite, are extremely important in code reading. They work on any language, and nowadays on most development environments, so you hone the same skills as you jump from one language or platform to another. The cumulative gain from this experience can be tremendous, especially because there are so many things to learn. An IDE has only a limited number of buttons and menu commands you can learn (and re-learn as you change platforms), but the ways you can combine Unix tools to achieve a very specific result are endless.
Rick True enough. Although I don't want to get into an IDE vs. text tools and compilers war <G>, I think there are things that tools like IDEs can do because they 'understand' both the syntax and semantics of the language (and, as in the case of Smalltalk, can fold in knowledge of run-time state) that are difficult, if not impossible, to do by just 'reading' the text. Things like refactoring tools need a level of semantic understanding to be really effective, and Smalltalk showed that such things are possible, even without static types.
That all said, I don't use IDEs much at all lately.
Diomidis An important element that receives scant attention is the fear factor. Developers often look at multi-million line code bases with awe and fear. Yet it's important to realize that such code is often quite easy to understand and modify, because its creators have explicitly put effort into making it readable[1]. Along the same lines, depending on why we read code (maintenance, evolution, reuse, or inspection), there are specific techniques we can employ to move efficiently toward our goal.
Rick Yes, the best developers do that, and hopefully, by reading their code and learning to distinguish it from run-of-the-mill code, you develop a sense of style and expressiveness in your own code.
And writing understandable code is somewhat influenced by the tools and the language. A lot of Ruby programmers have latched on to Kent Beck's classic "Smalltalk: Best Practice Patterns" as a guide to Ruby style. There's a lot of good stuff there, but some of it needs to be adapted or is inapplicable: some of Kent's suggestions to the Smalltalk programmer are driven by the idea of making the code more understandable using the IDE tools I just went on about.
But that goes back to my first comments about seeking a broad scope of influences and adapting them intelligently to your current needs.
Steve I think Diomidis has pretty much nailed it from my perspective. To put it another way, you get better at reading code by writing a lot of it.
The fear factor has dramatic implications for how a company's code base will unfold. If you can't read someone's code, your first instinct is often that it's crap and needs rewriting. You may waste time rewriting otherwise good code. I did a lot of that, 15 to 20 years ago. Such rewrites are bad for the project, but they're also bad for you -- you're going to learn more by puzzling through exactly how the old code works. You should never conclude that code is crap just because you don't understand it. Even if the code is crap, you may still learn something by reading it carefully, and you'll be able to avoid repeating its mistakes.
Diomidis Absolutely right. There's a nine-year-old essay by Joel Spolsky describing why it's a big mistake to rewrite a functioning product from scratch (Things You Should Never Do, Part I), and this alone is one of the most important reasons to become good at code reading; it's a skill and an attitude that can make the difference between the sustained growth of a company and its demise.
Rick This reminds me of something from a long time ago in a galaxy far away.
One of the projects I worked on at IBM was a pilot project called Info/Access to let mainframe customers access the RETAIN database, which was IBM's internal bug tracking system. IBM used to have a Field Engineering Division made up of IBM employees who worked on-site with customers to fix problems, plus folks back home who supported them and ran RETAIN and other things. The on-site employees who did software support were called program support representatives (PSRs), and they spent much of their time reading hex dumps and code (usually on microfiche[2]) so that they could interact with RETAIN to find fixes and file bug reports.
This was a tremendous cost, and IBM had several programs to offload some of it onto the customers. While writing the client code to access RETAIN, I worked with an ex-PSR. The sessions with him were painful: since he had only ever looked at code from a bug-finding perspective, he assumed that any design or implementation he'd seen before was horribly flawed, and that anything he could dream up on his own, no matter how complicated or unusual, would be better.
For the most part, the code he had looked at wasn't total crap; it's just that he was always dealing with the little crappy bits of it.
James I do exactly as Steve said, quite often. I look at a piece of code, think this is crap, rewrite it to remove 10 lines, and am always surprised by what a better grasp the author had on the problem than I did.
That's why I love test driven projects so much. They instantly show all of the assumptions I just violated.
Don Refactoring is interesting.
This is actually one of the reasons I like pure (side effect free) code so much (especially when it has a good type system): I can refactor and be /sure/ I did the right job. Purity gives us a guarantee — in the language — that I can replace definitions with equivalent ones, throughout a program, and be sure things are still correct. This goes a long way to having hackable, fluid code that's a pleasure to hack on.
The "I can't break this even after a few beers" principle. :)
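Don doesn't show a concrete refactoring, so here is a minimal sketch of the kind he describes, assuming GHC with only the base library; the function names `countNonSpacesV1` and `countNonSpacesV2` are hypothetical. Because `length` and `filter (/= ' ')` are pure, the equation `map f . map g == map (f . g)` lets us fuse two list traversals into one without changing the result:

```haskell
-- Hypothetical example: count the non-space characters in each string.

-- Original version: two passes over the list.
countNonSpacesV1 :: [String] -> [Int]
countNonSpacesV1 = map length . map (filter (/= ' '))

-- Refactored version: one pass, justified by the equation
--   map f . map g == map (f . g)
-- which holds because f and g are pure functions.
countNonSpacesV2 :: [String] -> [Int]
countNonSpacesV2 = map (length . filter (/= ' '))

main :: IO ()
main = do
  let inputs = ["a b", "  ", "hello world"]
  print (countNonSpacesV1 inputs)                             -- [2,0,10]
  print (countNonSpacesV2 inputs)                             -- [2,0,10]
  print (countNonSpacesV1 inputs == countNonSpacesV2 inputs)  -- True
```

A single sample input only spot-checks the rewrite; the guarantee comes from the equation itself, and in a real project you would probably back it with a property test (for example, using QuickCheck).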
Steve Ultimately you want to get to the point where you can understand any code you look at. It's just like reading natural languages. You don't want to go through life with certain books being inaccessible. You probably won't like them all, but you should be able to read and understand them.
It's much easier to learn to read a programming language than to write in it. You should learn pidgin-X for as many values of X as possible. Pidgin == read it well enough to be able to guess what the code is doing and explain it to others, at least at a high level. If a new language starts to gain popularity (e.g. Scala, Clojure), every programmer should spend enough time reading the docs and tutorials to be able to follow the code and understand the basic facilities offered by the language. This should only take a couple of days at most, even for fairly unusual languages like PostScript, Forth or Prolog. After some practice with this, learning pidgin-X usually only takes an hour or two.
Reading code in an unfamiliar language or style can be a painful experience for us. Nobody likes to be a newbie again. The important thing is to take it slowly, and do the necessary research to ensure that you understand what's going on.
Don I remember as an undergraduate printing off and memorizing sections of the Haskell Prelude (the core library of functions for Haskell). Each function felt a bit like a jigsaw puzzle piece, with the type specifying precisely how it would glue with other pieces. This kind of thing:

  id                      :: a -> a
  id x                    =  x

  (.)                     :: (b -> c) -> (a -> b) -> a -> c
  (.) f g x               =  f (g x)

  const                   :: a -> b -> a
  const x _               =  x

  flip                    :: (a -> b -> c) -> b -> a -> c
  flip f x y              =  f y x
Reading this code was like reading a haiku. Dense, rich, beautiful.
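To see how these pieces glue together by type, here is a small sketch. It assumes GHC and the standard Prelude, whose `id`, `(.)`, `const`, and `flip` behave like the definitions above; the helper names `twice`, `secondArg`, and `subtractFrom` are hypothetical:

```haskell
-- Hypothetical helpers built only from id, (.), const and flip
-- (the Prelude's own versions of the functions quoted above).

-- Apply a function two times in a row: just (.) with itself.
twice :: (a -> a) -> a -> a
twice f = f . f

-- flip const ignores its first argument and returns the second.
secondArg :: a -> b -> b
secondArg = flip const

-- Subtraction with its arguments swapped: subtractFrom x y == y - x.
subtractFrom :: Int -> Int -> Int
subtractFrom = flip (-)

main :: IO ()
main = do
  print (twice (+ 3) 10)              -- 16
  print (secondArg "ignored" 'x')     -- 'x'
  print (subtractFrom 1 10)           -- 9
  print (map (const 0) "abc")         -- [0,0,0]
  print ((id . show) 42)              -- "42"
```

Each helper is a one-liner because the types leave little room for anything else: `flip const` has type `a -> b -> b`, so (ignoring non-termination) the only thing it can do is return its second argument.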
I think our goal as programmers is to become fluent in this language of concepts, to be able to converse directly in it. And I think languages that get at the concepts more directly are closer to that ideal of programmers communicating directly in this computational language. Some languages get at that better than others, and abstraction is a key part of it. I was happy to see a comment on Twitter a couple of days ago along the lines of "the #haskell people are just talking in code!"
Diomidis Your reference to the Haskell prelude brings back fond memories; this was indeed beautiful code. Some years earlier (early 1980s), another revelation for me occurred when I looked at the source code of the C library that accompanied the Computer Innovations C86 compiler. That was the first time I realized that one could build a rich (for the standards of those days) library of functions using just the facilities of the C programming language and a few system calls. Similarly, I marveled at how one could write efficient yet structured, readable, and well-documented code when I laid my eyes on the spreadsheet example that came with Borland's Turbo Pascal. Finally, I appreciated how comments and meaningful identifiers could make even the most dense code easy to follow, when I read the listing of the IBM PC BIOS.
I wonder what code today's programmers would use to learn from. With the prevalence of open source software, one problem is that there is so much of it that we're spoiled for choice. For the same reason, I think few appreciate being able to read the code behind a functioning artifact, and therefore don't take the time to read it.
Rick And if they don't they are missing a lot.
One of the great things about open source is that if you have questions, the answers are there for the cost of a bit of code reading. I've been involved lately on ruby-talk in yet another discussion of the subtleties of eigenclass vs. singleton class vs. metaclass. I spent some time today once again reading the C implementation of MRI, and comparing and contrasting what the code says with what has been written in various Ruby books, blogs, and fora and got a better appreciation of how the community arrived where it is on the issue today.
Often when someone asks a question, either directly to me or on an open forum, I find myself scratching my head a bit and going to the code to find the answer. Since I've built up my code reading skills over the years, I can usually find it.
James Rick has a great point here. Code is the language of our shared knowledge. Those who can read it are as helpful as those who could read were in times when literacy was uncommon.

1 Here's another little blurb that didn't quite fit into the discussion, but was worth preserving.
James This is so true.
When I was less experienced, I often wondered, "how can I get this down to one line?" In contrast, yesterday I was hunting around in one of the largest code bases I maintain for areas of too much magic I could unwind into boring code. Incidentally, we did that so our documentation tool could read them correctly.
I think my love of clean code shows my growth more than any other trait. I'm even twice as likely to use a pretty code library I've read, just because I know I can always pop the hood and see how it's doing something or track down a bug if there is one. I guess I'm a code snob now.
2 While it doesn't really fit into the discussion above, I couldn't resist this little interchange from the mail we exchanged.
Steve Umm.... to be fair, if you find yourself reading hex dumps on microfiche today, then it's probably OK to rewrite it without thinking first. ;)
Rick Well, the dumps were printouts; the source code was on fiche. This was back in the days when eliminating goto statements was state of the art. <G>


Parag Shah said...

Hi, I enjoyed reading this post a lot. Looking forward to more posts in your '5 questions' series.


gnupate said...

Thanks Parag. I'm glad to hear people are enjoying it. Did you learn anything from the discussion?

Parag said...

Hi Pat,

Diomidis' comment about how we should consciously try to understand the lower levels of the programming language, then work towards understanding its idioms, coding standards, and how different parts of the language play with each other, and then work our way up to the higher level constructs, was very helpful.

Though I do not think this is a linear activity; perhaps all of these will happen simultaneously. At the same time, I think it will be helpful if we look out for holes in our knowledge (at all of these levels) and fill them as we work with a language (either by reading or writing code).

Even though I have been trying to do this as I learn Groovy and Grails, I have not applied this concept well enough, and I feel like it is the one thing that can help a developer master a language and programming in general.

Your post also prompted me to think about having group code reading sessions, where we pick up a small open source project or a part of a larger project and try to understand it as a group. Perhaps like a 'code reading dojo'.

tatiC said...

Great post! :) Reading code is precious! I learn a lot from this practice! Cheers!