Wednesday, June 10, 2009

Reading and Testing Your Legacy Code

In this week's installment of Questions Five Ways, we're talking about testing and code reading as tools for dealing with legacy code. This week's participants are Cory Foy (@cory_foy), Dave Thomas (@PragDave), Antonio Cangiano (@acangiano), and Tim Bray (@timbray).


What is the relationship between testing and code reading when dealing with legacy code, and how can we use the two processes to improve each other (and the code we're dealing with)?

Cory When Pat first ran the question by me, I really liked it. I think primarily because it goes to the heart of what we've been saying for years, what Bob Martin says in Clean Code, and some of the core of the Craftsmanship movement — we have to be good at code, which means we need to be good at reading it.

Dave Sometimes I look at legacy code because I have to change it. Sometimes I look at it because I like to learn interesting new techniques (or learn not to do "interesting" new techniques). In both of those circumstances, I find code reading skills and unit testing to be complementary and mutually supportive.

Code reading lets me grok the structure of the code: to see how it hangs together, what the structure is, the idioms used by the developer, and so on. But all that research really just leads to a set of assumptions. And whenever I find myself building up a backlog of assumptions, I like to validate and verify. And that's where the tests come in. If the code permits, I'll slap in a quick test to document and validate an assumption I've made. If I think to myself ("OK, so she's storing dates as ISO strings" I'll see if I can write a test to check dates are indeed stored that way.)

If my reason for looking at the code is simply research and interest, I probably won't go much further than putting the occasional test stake in the ground. If instead I'm planning to make some change, I'll go further, and attempt to isolate the area where I'll be making the change with a ring of tests before making any changes. This is often tricky (sometimes to the point where there's a pragmatic decision to relax the standards a bit—there are few times where it's worth refactoring 10,000 lines of code simply so you can test a change that turns a field from green to blue). But every test you do manage to inject is one more fixed point. And the more fixed points there are, the fewer permutations you have of things that can go wrong.

You ask you you can use testing and code reading to improve each other: I think the answer is to use them in this way, iterating between the two, each time around the loop reducing your uncertainty.

Antonio Code reading is a required step to approach and understand any codebase. The existence of tests greatly simplifies the process of investigating features and how they work, while aiding the evolution of the existing code, thus providing some degree of confidence. In the excellent book "Working Effectively with Legacy Code", Michael Feathers characterizes legacy code as hard to work with, particularly due to the lack of regression tests. In my experience, while this isn't always the case, I've found that it's fairly common to deal with code that has no form of automated testing. In such situations, code reading and testing complement each other particularly well.

In the same book, Feathers defines "characterization testing" as the process of observing and capturing the current behavior of the application, so that tests can be added when none are already in place. Reading the code, running it, and creating some sort of safety net through testing allows programmers to gain insight into the codebase while attempting the arduous task of refactoring and improving legacy code. As Dave mentioned, it's a fundamental way to validate and verify assumptions.

Cory The interesting thing is that it is so natural to make the assumption that the skillset exists that we rarely mention it. I'm glad Antonio brought up Characterization Tests from WELC, because that was my first thought. I can come into a team that has a big heap of legacy code dropped on them and quickly show them how to write characterization tests. But if they don't already have the knowledge of how to read code, how to make the inferences, how to scan code blocks and branching structures - then we are going to have a much harder time.

The big joke is that Perl is "write-only" code. But I think too often any code we come across is write-only because we don't want to take the time to master the skill of reading code. Which really is the core of the NIH (Not Invented Here) syndrome - "I don't understand this code in 23 seconds of looking at it, therefore I reject it and declare we need to rewrite the entire module"

In fact, in my current team I fought the team hard against rewriting one of our applications. Oh, I agreed it needed to be rewritten - it was a monstrosity. But embedded in that monstrosity were business decisions and logic that had been built up over the past 6-7 years - domain knowledge that came from people that were no longer around.

I feel pretty confident that if our team had been much more skilled on the art of reading code - and not just producing it - we could have been able to be much more productive.

This is getting a bit long winded, but I want to tie this back to one other thing. In Lean Manufacturing (perhaps cliche, but bear with me) the concept of "stop the line" relies on people being able to know enough about what is supposed to be produced to recognize when something is wrong. I think we give lip service sometimes to "stop the line" in the software world, because we aren't calling it on other people's code. And while some of that is dysfunction of not wanting to challenge each other, some of it is because /reading/ code is considered a chore, and isn't nearly as glamorous as writing it.

You can't test what you don't understand, and you can't understand without reading and comprehending.

Dave I think that a lot of developers have that mindset, which in unfortunate, bordering on unprofessional. Perhaps that should be a qualifying interview question: "read any good code lately?"

Tim First, I haven't seen anyone echoing the so-true cliché: Reading code is harder than writing it. Furthermore, it gets harder as code gets older and grows a coat of defenses against corner cases and insane user input and so on; these obscure the original design, which is often hard enough to see through the code.

Dave I'm starting to learn the more formal side of music: I can just about read sheet music, but I read it beat by beat: it's slow and stuttery work. My oldest son is a really good pianist, and he sight reads very well. I get the impression that he kind of sucks in the shape of the whole page, and then just uses the notes to fill in the gaps. That's where fluency lies.

I think the same kind of fluency is there for the taking when it comes to reading code. When you first start, you're reading line by line. But when you get good at it, the program kind of speaks to you. You look at the shape of it, the indentation, the module and method structures, and somehow get a good feeling for how it all hangs together. The individual lines can then be read a lot faster, just as my son reads music.

Cory I think the music analogy is quite good. Great musicians can site read just about anything, but still take the time to learn it well when they need to actually do something with it.

I remember in high school our band competitions — both individual and as a group — were based on both rehearsed pieces and site reading pieces. An interesting exercise might be to pit individuals and teams in recreating logic in their language of choice — and then making the logic work in a language they've never seen before.

Tim How about reading test code? I think that's a big deal. In fact this very week I found myself forced to spelunk in among some of my own not-that-old code and had that familiar sickening "WTF-was-I-thinking" sensation. Fortunately, those particular classes had good test coverage, and ../test/* helped me a lot in figuring out why on earth, at the top of a really important loop, I'd said

next if path == '/'

Sometimes test code is easier to read than code code. Many of us go to a lot of work to eliminate duplicate logic and parameterize algorithms and encode behavior in data structures, and these are often good things to do, but they're not necessarily friends to readability. On the other hand a lot of us also tend to write unit tests in a fairly brute-force linear abstraction-free manner which while less elegant is maybe more obvious to the casual eye; and the eye is always casual, first time through.

Dave I personally read test code when I want to know how an API works. But generally, test code isn't satisfying to read, because you lose the context. You're not looking at the whole story of the program: instead you're looking at individual sentences, out of context. As such, I think test code has to be read very differently to the program itself think it's a more focussed, analytical kind of reading.

Antonio On a slight tangent to the subject, I think it's also worth considering the effect that testing has on the code that others (or ourselves three months later) will have to read. Code that was incrementally developed following a scrupulous TDD approach looks quite different from code that was developed without the aid of any unit tests, no matter how skilled the programmer.

Among many other obvious benefits, TDD leads to code that is far more modular and readable. This greatly benefits anyone involved with a code reading or code reviewing session. Reading tests surely aids in learning how an API works, but the sheer presence of good tests is a preliminary indication of the readability of the code itself.

Dave There are lot of code reading tricks. Ward Cunningham likes to load all the source into Word, shrink the font down to 2 points, and just look at the shape. I like to scroll it by really fast and just look at the patterns. But ultimately, I think the key to mastery is simply doing it. Choose code and just read it. But read it actively, not passively. Challenge yourself to answer questions about it, or find bugs in it, or look for refactorings, or whatever. But the more you do it, the better you'll get.

Cory I think the key is just do it. An adage that has been extremely helpful to me has been, "If something hurts, do it more". That holds true for TDD, for automation of tests, and for reading code. Code reading is something we all find painful, yet we don't bother to invest the time to make it right - which is as simple as looking at other code.

In fact, another helpful adage has been to learn a new programming language every year. By default you end up reading a lot of code to learn the new syntax and semantics, by default making you a better code reader.

Click here to Tweet this

No comments: