On Ruby: 2012

Thursday, November 01, 2012

Reviewing Annual Reviews

Almost everyone seems to hate end-of-year performance reviews. Done correctly, though, they could be celebrations of your accomplishments. What would it take to make them more exciting, more interesting, or at least less painful?

How about this for an annual review?

Maybe we're not going to see videos with voice-over announcers, screaming fans, or a pulsating soundtrack. Surely we can do better than a drab document that briefly mentions the couple of things that we remember from the last year, right?

Here are some thoughts that might make your annual review a more positive experience:

Keep records during the year. What have you done? What have people said about it? What did it mean to your org? The more data you have, the easier it will be to create a year-end review that shines.
You might not be the one writing the review, but you can create something to send to your boss to help her remember what you've accomplished this year.
Think about how you want to organize your review. You don't have to write things chronologically. Maybe you'd rather pull out some specific themes and follow them?
Remember to prioritize your entries by impact too — 2nd, 3rd, 1st is a nice order to help keep the big wins 'top of mind'.
Include comments from others. Quoting an email or notes from a one-on-one is a great way to reinforce the value that others see in your contribution.
Work from established goals. Both yours and your organization's.
Use last year's review and your records from the current year to hold personal retrospectives periodically. Make sure you're on track and excited to move forward.

And, who knows, a soundtrack couldn't hurt, right?

Monday, June 11, 2012

This is a book that I wish were on my son's required reading list. Not that his code is hard to read (for someone in their first programming class), but there are all kinds of bad habits that wouldn't need to be broken if he and his classmates spent some time learning what good code looks like before they started to write their own.

The Art of Readable Code from O'Reilly is a quick, easy read with a lot of useful ideas for new programmers. It weighs in at 180 pages, but there's a lot of well-used whitespace and a number of (mostly on-topic) comic panels in those pages, making it seem shorter.

Part one covers naming, code layout, and writing comments; parts two and three cover the meat of refactoring; and part four discusses testing and gives an example of applying the ideas in the book to a small coding project.

The book's examples are in C++, Python, Java, and JavaScript. I would have appreciated seeing some examples in other languages as well (Haskell or Scala might be good candidates), especially where that language might obviate or change the advice given.

Truth in advertising note: O'Reilly sent me a free copy of this book to review.

Thursday, June 07, 2012

The Linux Command Line

As a long-time, professional Unix/Linux sysadmin, I spend a lot of time on the command line. I've grown pretty familiar with it, but I often find that junior teammates don't have the same familiarity. They often grew up in a world of windows and GUIs. That means I spend a lot of time helping them learn the ropes.

When I saw that No Starch Press had published The Linux Command Line, I wanted to take a look and see if it would make a good primer for the new guys on my team. Then, the good folks at No Starch sent me a review copy and I figured I'd better act on it.

This is a decent-sized book, weighing in at 430 pages and 36 easily digestible chapters. The writing and layout are pretty good, and there are a ton of examples.

Part One is 10 chapters (~110 pages) that cover the basics: built-ins and simple commands; redirection; command line shortcuts (for bash in emacs mode); permissions; and processes.

Part Two is 3 chapters that will jump-start your move beyond basic command line use. These cover setting and using environment variables and startup files; using vim; and making the command prompt more useful.

Part Three is a bit meatier. It contains 10 chapters covering tools for: package management; networking; working with storage media; regular expressions; working with text; and more. A couple of chapters in here seem less than necessary (e.g., Chapter 23, 'Compiling Programs'), but if you don't need them, you can always skip over them.

Part Four dives into scripting, where the true power of the command line comes out. There are 13 chapters which cover: designing scripts; conditionals; looping; parameters; and more.

My biggest complaint about this book is its bash-centricity. It would have been nice to throw in a chapter and some sidebars talking about how other shells differ, or mentioning those parts that only bash provides. Since dash is becoming more common on Linux systems, it would seem like a natural comparison to make in the text.

Still, this is a good book on using the Linux Command Line. If you're just getting started with Linux, or you have a friend and/or coworker that needs some help, this is a good place to start.

Thursday, February 16, 2012

The Art of R: interview and mini-review

The Art of R Programming is an approachable guide to the R programming language. While tutorial in nature, it should also serve as a reference.
Author Norman Matloff comes from an academic background, and this shows through in the text. His writing is formal, well organized, and tends toward a pedagogical style. This is not a breezy, conversational book.
Matloff approaches R from a programmer's perspective, rather than a statistician's. This approach shows through in several of the chapters: Ch 9, Object-Oriented Programming; Ch 13, Debugging; Ch 14, Performance Enhancement; Ch 15, Interfacing R to Other Languages; and Ch 16, Parallel R. I do wish he had spoken to using R with Ruby as well as C/C++ and Python. I also would have liked to see a chapter on Functional Programming with R, especially after the teaser in the Introduction.
I asked Norm and an R-using friend if they could help me get my head around things a little better, and the following mini-interview is the result.

Almost every language has some kind of math support. Why bother with R? Where does it fit in a programmer's toolkit?
Norm: It's crucial to have matrix support, not necessarily in terms of linear algebra operations but at least having good matrix subsetting capability. MATLAB and the Python extension NumPy have this, but I'm not sure how far they go with it. And since MATLAB is not a free product (in fact very expensive) I'm summarily excluding it anyway. :-)
Second, R has a very rich graphics capability, which really sets it apart from the others. You can see some nice examples (with the underlying R code) in The R Graph Gallery.
Third, R is "statistically correct." It was created by top professional statisticians in industry and academia.
Russel: As something of a polyglot, I find that each language comes with something of an attitude of how problems should be approached. The grammatical structure and keyword vocabulary of each language drives a way of thinking about problems, as well as what sorts of libraries must be created to cover what may be base structures and functions in other languages. R has a particularly rich data representation vocabulary which lends itself very nicely to a data-centric problem solving mindset. While many more general-purpose languages can, with appropriate libraries, deal well with data, R reduces the cognitive load required for working with multidimensional data sets. In my (relatively limited) work with R, I've come to think of R as a domain-specific language that happens to have some general-purpose functionality, while other languages such as Ruby, Python, Perl, etc., are general-purpose languages with many domain-specific libraries.
I really feel drawn to the idea that languages drive approaches to problem solving. It reminds me of the PragProg idea of a language of the year. With that in mind, what do you think a dynamic language (Perl, Python, Ruby, etc.) programmer is going to find new and different in R? What about a programmer coming from a system programming language (C, C++, etc.)?
Russel: There is much in R which is from the "dynamic language" camp you mentioned: dynamically typed variables, an interactive shell, dynamically loaded libraries, etc. These will be pretty quickly noticeable to a C/C++/Java/C# programmer.
The structure and forced-forethought enforced by those languages are part of their value proposition: they force programmers into design paradigms and ways of thinking that scale up well, while dynamic languages, with their looser syntax rules, do not enforce that sort of engineering discipline on the programmer. For highly organized people who think in very structured ways, dynamic languages are "freeing", while less structured thinking programmers can find that the lack of enforced structure puts a lot of onus upon them to be disciplined in their coding as program sizes get larger. For example, a simple flat namespace is great for a small program with a few dozen lines, but namespacing becomes much more important as your programs come to the thousands of lines and dozens of individual functions or components -- especially as programs become the shared workspace of multiple programmers.
I personally use R as a dynamic language, most of the time not even writing programs in it so much as using it in interpreted mode for data analysis and "analysis prototyping." In that sense, R does for data analysis what dynamic languages do for task automation: it allows you to easily play with scenarios and prototype your thinking about data quickly and easily. You can then codify the best of those techniques into a small (or large) program that can automate that work for various data sets.
Similarly, R has a very powerful and interactive help system. Most packages not only have a quickly available set of API and help documents, but sample data sets built right into the library. From a command line, R users can get examples of how to use almost any library, with sample data included specifically for that particular library.
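For readers who haven't seen this in action, here is a minimal sketch of what Russel is describing, using only functions that ship with base R. The choice of lm and the mtcars data set is mine, purely for illustration; Russel didn't name any particular function or data set.

```r
# Open the help page for a function (the same as typing ?lm at the prompt).
help("lm")

# Run the worked examples that appear at the bottom of that help page.
example("lm")

# List the sample data sets bundled with a package; "datasets" ships with R.
data(package = "datasets")

# Load one of those bundled data sets and look at its first few rows.
data("mtcars")
head(mtcars)
```

At an interactive prompt the help and data listings open in a pager or viewer; run as a script, they are simply printed.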
R has some inconsistencies from its history that can make it feel more "old school" in some ways. For example, there are two object models, and the older (S3-style) object model is widely used in older libraries. However, it's nowhere near as "bolted-onto" as languages like Perl or C. R has an extremely rich set of libraries easily available via CRAN (a la CPAN), but the flip side of this wealth is that these libraries work in many ways, expecting data in various formats, etc. Again, it's not as spotty as CPAN or the Python Cheese Shop, or even PEAR (most packages are quite good), but it can leave some beginners feeling a little lost when they want to accomplish a certain task. That's pretty common in the open source world, of course, but can be an issue.
R's rich first-class data types build a foundation that is nicely added to by the various libraries and simple interactive shell. Enough libraries are written in native code that performance is generally top notch. For my part, I almost always find that the available libraries far exceed my generally limited statistical needs, so I rarely find myself needing to rewrite some particular statistical code. I'm not a statistician, so I find it quite valuable to not have to worry about that aspect of the work I'm doing in any given project. Additionally, the rich libraries generally spur me on to doing a richer analysis of the data than I would if I did not have such a fully-featured tool available.
Norm, in the Introduction of your book, you talk about R as a functional language. I wish there had been a chapter on this. Can you give some examples of what you mean? Russel, do you have any thoughts about R as an FP language?
Russel: Many languages have recognized the value of functional constructs and added at least simple implementations of lambda and map functions, first-class functions, and the like. FP is generally considered to be more easily parallelized, and should thus scale better on modern multi-core and CUDA-like systems. This will be quite advantageous in large data processing jobs.
Norm: Every operation in R is a function. For instance, the operation y = x[5] is really the function call y = "["(x,5). The same goes for + and so on.
This is brought up throughout the book, starting with the vector chapter.
The biggest implication of this, in my opinion, is in performance. One can often speed up a computation by a factor in the hundreds by exploiting the FP nature of R.
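To make Norm's point concrete, here is a small sketch of my own (it is not taken from the book). The first half shows indexing and addition written as ordinary function calls; the second half compares an explicit loop with a single vectorized call. The variable names and the vector size are arbitrary, and I'm not claiming any particular speedup for this toy case.

```r
# Indexing and arithmetic operators are ordinary functions
# that can also be called by name.
x <- c(10, 20, 30, 40, 50)

y1 <- x[5]        # the usual indexing syntax
y2 <- "["(x, 5)   # the same operation written as a function call
identical(y1, y2)

s1 <- 2 + 3
s2 <- "+"(2, 3)   # addition is a call to the function named "+"
identical(s1, s2)

# Because "[" is a function, it can be handed to another function.
# Here it is applied to each list element to pull out the second entry.
rows <- list(c(1, 2, 3), c(4, 5, 6))
seconds <- sapply(rows, "[", 2)

# Illustrative comparison: squaring every element with an explicit loop
# versus one vectorized call to "^" on the whole vector.
n <- 1e5
v <- as.numeric(seq_len(n))

loop_squares <- numeric(n)
for (i in seq_len(n)) {
  loop_squares[i] <- v[i]^2
}

vector_squares <- v^2   # a single call that operates on the whole vector

all.equal(loop_squares, vector_squares)
```

Wrapping each version in system.time() is an easy way to see how the two compare on your own machine.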
What are some of the things you've done with R that show off its power and/or niche?
Russel: R works beautifully for many types of data analysis problems. I recently used R to generate annotated graphs of Bayesian content filter scorings against timestamps, with lowess smooth and regression line and other enhancements, all built into the graphs without additional effort. This was done for all permutations of the 5 variables used in the study, which had tens of thousands of data points. I was using this as a script because of my need to regenerate the graphs repeatedly, but before I'd codified that process, I used R in a "tweak and go" sort of way, as R lends itself well to ad hoc data exploration. Adding and removing data attributes, filtering data, generating data models, regressions, etc., are all easy to do in an on-the-fly manner.
Norm: A fun application I've done is R code to analyze the differences and similarities between the various dialects of Chinese. It can be used as a learning aid for those who know one Chinese dialect but not another. This is an example in my book, in the chapter on data frames.

If you're interested in adding R to your arsenal of programming tools, this is a great way to get started.
Truth in posting—No Starch Press sent me a free copy of this book to review.
