Tuesday, October 31, 2006

Book Review: Javascript Phrasebook

I've been reading through the new Javascript Phrasebook from Sams recently. I'm not much of a Javascript guy (which is why I wanted to get the book), but I'm liking the style, content, and concept of the book.

Sams' phrasebooks, if you haven't seen one yet, are a cross between a pocket guide and a cookbook. The basic unit of the book is a section (a phrase or recipe), with most sections being composed of a snippet of code with some narrative context to help you make sense of it and a larger script showing the snippet in situ. The approach works pretty well.

The one thing I wish they'd covered more was the use of Javascript outside of web development. That's a pretty esoteric topic though, so I'm not surprised that it's not well covered. In fact, I'm kind of surprised that there was any coverage of the prototype-based OO system that makes Javascript interesting from this perspective.

I thought the sections were well written and useful. The value of the text coupled with the portability and the US$15 price tag make this a very attractive little book. If you're getting into web development, or just need to sharpen up your javascript, this book is a nice little investment.

New At RubyCorner

The hackers behind RubyCorner have put together a nice little Google Custom Search tool. It sits unobtrusively in the upper left corner of the main section of their page, waiting for you to try it out. Not only have they put together a good list of major Ruby and Rails resources, but every blog registered there is part of the search domain too.

The Ruby aggregator field is getting a bit crowded, and I really haven't decided which of the various aggregators I like best, but RubyCorner is leading the pack from my perspective. I like the look they've developed, and they've put some effort into a number of nice little features (like filters based on language, favorites and blacklists, and a bit of user preference tweakability).

One thing I'd really like to see them add is a remem-style cross index. I've brought it up to Anibal and Edgar, but it's a big project, and it's not like I'm contributing code to make it happen. In the meantime, I'll just keep wishing.

Monday, October 30, 2006

Book Review: Mongrel

While I was at RubyConf, I picked up a copy of an Addison-Wesley shortcut on Mongrel by Zed Shaw and Matt Pelletier. This one is a bit longer than Rubyisms in Rails, and costs a bit more ($15). It is every bit as good though, and well worth the small investment.

One thing surprised me about this book. I'd assumed that Rubyisms in Rails was produced in a landscape format because of its heritage as a presentation, but Mongrel is laid out the same way. I guess this is a design decision by Addison-Wesley, but I'm not sure why they went this way.

My favorite part of the book is the inclusion of Zed Shaw's opinions on programming. They're (mostly) bundled into little sections called "Zed Says", and they're gems! I also appreciated the (repeated) warning that deploying an app into production is not something you're going to do in a day — not if you want to do it right. I've been a systems engineer for a long time, and I'm amazed at how many people forget about everything outside of the codebase when it comes to making a web-based application work.

Mongrel users are also going to appreciate the information about making it work with a variety of web servers, extending it to do more stuff, and generally making it do all the tricks you're looking for in an application server.

If you hang around the Mongrel (or Ruby or Rails) community for a while, you'll soon realize that Zed can swear like a sailor. If that worries you, rest assured, this shortcut is fairly tame (you'll read worse language on Tim Bray's web page). It's like Zed said in his RubyConf presentation: "swearing is so 2005".

So, what are you still hanging out here for? Go grab the Mongrel shortcut and get reading.

Using Microformats

I recently finished a shortcut from O'Reilly as well, Using Microformats. Weighing in at only 45 pages, this one is packed with all kinds of good information. As I read it, a number of little things (pardon the pun) sort of popped into place. I picked it up because I was interested in learning enough about microformats to start thinking about them for some personal applications (book reviews, interviews, and genealogical data). While I'm not ready to start designing yet, I certainly feel a lot smarter.

Three sections of the book that I found especially useful were "Elemental Microformats Catalog", "Microformat Design Patterns", and "Compound Microformats Catalog". Having these examples cemented my understanding of the rest of the text. The information in "The Value of Semantic Markup" and "Styling Microformats" should also prove useful as I expand my own use of microformats.

Using Microformats is a good little book on the topic. If you're looking at adding some structured information to your HTML, it will make a great guide as you get started.

Ad Hocracy

In between my other reading, I'm still looking at Software Creativity 2.0 (I also wrote about it here). The third chapter, "Optimizing vs Satisficing", is all about knowing how far to chase perfection. Robert Glass comes down squarely in the camp of 'good enough' software. Then I hit the third essay in that chapter "Ad Hocracy" — 'What's this doing here?' I asked myself. It doesn't really fit cleanly with the first two sections. It's a rant of a different color (or something like that).

I read through the essay carefully. I kept hitting up against the difference between this semantic battle to restore the good name of ad hoc approaches, and the more noble cause of shipping 'good enough' software. Then it hit me, right at the end. He's really talking about the same thing. It's the same fight, just on an even larger scale. The hunt for the perfect, generalized answer is the ultimate in optimizing — when we can give that up and write software that 'satisfices' by solving the problem at hand, we can ship 'good enough', ad hoc software.

It's essays like this that help me understand why the first edition of Software Creativity became a cult classic. I'm really looking forward to getting my hands on a 'real' copy of Software Creativity 2.0.

Hacker Interview: Aaron Patterson

Aaron, thanks for agreeing to this interview. Would you please introduce yourself?

Aaron: Sure! First off, thanks for interviewing me. I first started programming to meet women, and that didn't work out. Fortunately I found that programming was something I loved to do. I started doing mod_perl in 1999 for Vehix.com (it was AutomallUSA when I started), then moved up to Seattle in 2001 to work for Classmates.com. When I'm not programming, I like to play piano, attend hip-hop dance class, cook, and participate in activities required of me as a member of the Puget Sound Mycological Society.

How did you find your way to Ruby?

Aaron: When I started at Classmates, we were an all mod_perl shop, and we switched to Java. I liked programming in Perl much better because I prefer dynamic languages. Basically, I became a Java programmer that was hoping and praying for Perl 6 to come out. A little over a year ago, I was talking to one of my friends about Perl 6, and he suggested I check out Ruby. I read a bit online, picked up a copy of the pickaxe, and I was hooked.

How much do you use Ruby on a day to day basis?

Aaron: Unfortunately not as much as I would like! Currently at work I use ruby for testing web applications I'm using. When I am developing an application, and I need to do something repetitive, I'll put together a Mechanize program that will do that stuff for me. At home I'll use it for just weird one-off projects, for instance analyzing KEXP's play list for a given month. I also put together a Rails site with google maps integrated so that I could keep track of where I've found edible mushrooms, that way I can return next season.

I've just accepted a new job doing Rails programming, so I should be programming Ruby all day soon!

What other languages are you using?

Aaron: At work, I use Perl (we still have some mod_perl stuff), Ruby, C, and Java. Too much Java. At home I do almost exclusively Ruby programming with some C once in a while. I've switched all of my little scripts at home to use Ruby instead of Perl, just so I could code more Ruby.

Have you looked at JRuby at all? If so, what makes it work (or makes it not work) for you?

Aaron: I have looked at JRuby. JRuby is very cool, but we don't use it at work for a number of reasons. First our application is highly tuned, and JRuby doesn't have the performance we need. I don't really see that as too much of an issue though because we can get around things like that with clever optimizations and more hardware. The main problem is that very few of the developers I work with know ruby. This can lead to *lots* of confusion when working with our code base.

What projects are you working on these days?

Aaron: Currently I've been working on Mechanize, which is a library for navigating web sites. It will manage cookies for you, you can use it to follow links, or fill out and submit forms. You can even have it upload files to websites for you. I'm also working on an iTunes client written in Ruby called Net::DAAP::Client. It will let you interface with iTunes shares, and anything that implements DAAP. Unfortunately the release of iTunes 7 has totally broken my client! I've also been working on a BetaBrite LED sign library, which lets you manipulate BetaBrite LED signs. I think it will work on any sign that Adaptive Micro Systems makes, but I don't have the money to buy all the signs so that I can test! Oh, if only Adaptive Micro Systems would set up a web cam for me!

As far as mechanize goes, could you give us a short example of how you use it?

Aaron: Sure. Recently I needed to populate a test school with lots of users so that I could do some testing, and here is the script I used:


require 'rubygems'
require 'mechanize'
require 'logger'

agent = WWW::Mechanize.new
agent.max_history = 10

1000.times do
  page = agent.get('http://localhost:8080/registration/registration.jsp?cType=school&communityId=15613381&location=Classmates%20Test%20High%20School')

  page.form('school_info') { |f|
    f.year                = '1999'
    f.birthYear           = '1980'
    f.currentFirstName    = "#{rand_string}"
    f.currentLastName     = 'asdfasdf'
    f.affiliationLastName = 'asdfasdfasdf'
    f.emailAddress        = "#{ rand_string}@classmates.com"
    f.emailAddressConfirm = f.emailAddress
    f.postalCode          = rand_zip
  }.submit

  agent.cookie_jar.clear!
end
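
The script calls rand_string and rand_zip, which the interview doesn't show. Here is a minimal sketch of what such helpers might look like. The method names come from the script, but the bodies (eight lowercase letters, a five-digit ZIP) are my assumptions, not Aaron's actual code:

```ruby
# Hypothetical helpers; the interview does not show Aaron's versions.

# Returns a random string of lowercase ASCII letters (assumed length: 8).
def rand_string(length = 8)
  letters = ('a'..'z').to_a
  (1..length).map { letters[rand(letters.size)] }.join
end

# Returns a random five-digit, zero-padded string shaped like a US ZIP code.
# It is not guaranteed to be a ZIP code that actually exists.
def rand_zip
  format('%05d', rand(100_000))
end
```

Defined above the 1000.times loop, helpers like these would let the script run as shown, provided the registration form accepts arbitrary five-digit codes.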

You did a great job at RubyConf in your lightning talk on BetaBrite. Other than getting some more hardware (Is anyone at Adaptive Micro Systems reading this? Throw this man a bone!), what's on the BetaBrite horizon?

Aaron: Well, I've tried contacting Adaptive Micro Systems, but was unable to get through the customer service barrier to someone that might be able to help me. However, I have found a place to buy the new BetaBrite hardware which supports more colors and transitions. The new hardware is serial over USB, and hopefully that means faster communication with the sign. It's a good thing I have a day job to buy new hardware so I can test my code!

Coming out of RubyConf, I always find myself with a big pile of new ideas. Which talks inspired you? And, what new ideas do you think you'll be chasing because of them?

Aaron: The presentations at the RubyConf this year gave me more ideas about re-factoring my code than brand new projects. I am just starting a new project that will be a native extension, and the presentation on mkrf made me particularly happy. I have written a couple small native extensions before, and I honestly thought I was the only one who found mkmf hard to use!

Another example of refactoring ideas came from Ryan Davis' presentation on hoe. I am currently going through my projects and converting them to hoe. I have definitely been seeing my Rakefiles get out of sync, and I'm glad to see that someone has taken the time to solve that problem.

Gregory Brown's presentation on Ruport definitely inspired me to try reporting on some of our data at work using Ruport. I'm not part of the reporting group here at Classmates, but I wanted to see how easy it would be to reproduce one of our reports using Ruport. I ran in to a couple snags, but I was able to sort them out and impress a couple guys on the reporting team.

The Seattle.rb seems to have an abundance of really strong Ruby hackers, and really amazing Ruby projects. How do you think being part of that community affects you and your projects?

Aaron: I feel very privileged to be part of a group that has so many strong Rubyists. Having people so talented around me inspires me to constantly improve my skills in Ruby, which I hope gets reflected in the projects I work on. Also, having people like that around is an invaluable resource. I try to attend each weekly meeting so that I can ask questions, get new ideas, and hopefully give other people new ideas. Seattle.rb is definitely one of the best user groups I have ever been a part of!

Outside of your own, what are your five favorite Ruby libraries?

Aaron: My top five has definitely changed since RubyConf! In no particular order:

  • RubyOSA. I have been playing with this library every day since RubyConf. I love it!
  • mkrf. I was dreading coming up with an mkmf script for the project I just started and mkrf made my life much easier.
  • hoe. I am in the process of converting my projects to hoe. Hoe is another library that is making my life easier.
  • heckle. Ryan Davis gave a presentation on this at RejectConf. It is like Jester, but for Ruby. This is very cool stuff.
  • rcov. This library encourages me to test all of my code. I really like utilities that encourage me to test. Especially if they involve progress bars and percentages. I like making the progress bars go up, and that improves my testing, so we all win!

Speaking of rcov, what do you think about the claim some people are making that code coverage gives developers/projects an unrealistic sense of security in their tests?

Aaron: I think that relying purely on code coverage to determine whether your software is tested or not is a problem. Code coverage is not an indicator of good testing. Just because the code was run doesn't mean it was tested, or even tested correctly! That is not to say that code coverage isn't important. I find it is a useful tool in my testing toolbox, but it is not my only tool. As long as developers know not to rely just on code coverage, then there isn't a problem. Fortunately, I haven't met any developers like that, so this seems like a non-issue to me.

Code coverage can't guarantee you are fully tested. But neither can fuzz testing, or heckling, or any other method. I don't think you can ever be sure that your code is completely tested, but using all of these techniques together can help increase your confidence.
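
To make Aaron's point concrete, here is a small sketch. The method names are made up for illustration and no real test framework is involved. Both "tests" run every line of the method under test, so a line-coverage tool would score them the same, but only one of them can fail:

```ruby
# A deliberately buggy method: it should return the larger of two numbers.
def larger(a, b)
  a < b ? a : b # bug: this returns the smaller value
end

# Executes every line of larger, so line coverage would be complete,
# but it checks nothing and therefore can never fail.
def coverage_only_test
  larger(1, 2)
  larger(2, 1)
  :passed
end

# Checks the result, so it exposes the bug.
def asserting_test
  larger(1, 2) == 2 ? :passed : :failed
end

puts coverage_only_test # prints "passed"
puts asserting_test     # prints "failed"
```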

From your perspective, what should the Ruby-core team be focusing on in the upcoming year?

Aaron: Unicode. Tim Bray gave an excellent speech at RubyConf which convinced me that Unicode is one of the most important things the Ruby-core team should be working on. I don't agree with removing methods like upcase or downcase, but I definitely think we need better Unicode support.

What's next for you?

Aaron: I am currently working on adding Javascript support to mechanize. This is something that I know a lot of people want (including myself), and I want to get something out there. I know this is a very ambitious project, but I've been toying around with a few ideas about how to get it accomplished. I will support Javascript at some point, hopefully sooner than later!

Friday, October 27, 2006

MountainWest RubyConf 2007: Call For Papers

The MountainWest RubyConf 2007, scheduled for the middle of March (we'll publish the final dates in the next week or two), is opening a call for presenters. We will have about nine 45-minute slots for talks (with 15 minutes of Q&A/bathroom breaks between them). Ideally, we'd like to see a good mixture of Ruby and Ruby on Rails talks, with an emphasis on doing useful stuff and best practices.

Proposals should be submitted to pat.eyler@gmail.com in November and December of 2006 -- midnight MST (hey, no complaints, that's where we live) on Dec 31st is the cutoff. No late proposals can be accepted, no exceptions. A proposal should identify the target audience, give a short blurb about the presenter, and have a short abstract of the talk itself. Proposals should provide enough information for a small panel of judges to use as a basis for selection.

The selection process will take place in early January, and all submitters will be notified by email by the end of January. We hope to publish our conference schedule in February.

Feel free to suggest ideas here, but *please* send submissions to me instead of just leaving a comment.

More on Continuations in Ruby

Continuations and Ruby seem to be the meme around the 'Net. Chad Fowler posted a small piece rebutting the rumor that continuations are getting the axe (permanently), which led to Patrick Logan's Ruby Sucks post. Lots of other bloggers have posted about it (blogsearch). It's even come up in email between the various implementers.

It seems a lot of the people who are doing the implementation work on the alternative Ruby implementations don't see the value of continuations if they're going away (even temporarily). I won't pretend to be able to make the case one way or the other, but maybe I can organize some thoughts that will let other folks do so more effectively:

  • Continuations are a part of Ruby 1.8, and will be part of some (probably not .1 or .2) version of 1.9
  • Continuations are hard to implement on some VMs
  • Not a lot of people are using continuations in Ruby (but some are)
  • One key aim of the alternative implementations is to codify what Ruby 1.8 is and build that
  • Most of the implementations are going to differ from Ruby in some ways (e.g., JRuby doesn't get to use OpenSSL, so they'll have to fake it).

So, what to do?

  • If you're a continuations devotee, show us why they're important! Use them in a really cool app where not having them would make things suck. (RHDL and expat might be good examples, Seaside certainly is[1] — heck, even Rails uses continuations (not much though).) Even better, write an implementation for them for the VM you most want to use Ruby on.
  • If you want to see them go away, show us why continuations are just a bunch of performance chewing overhead! Replace them in apps that use them (or in little mockups), make your case.

In either case, talking about it on this blog (or others), or on mailing lists will only go so far. At some point, code needs to be written (or shown off if it's already there), decisions need to be made, and we'll all need to move along.

[1]Borges is/was a Ruby version of Seaside, it's basically died because Ruby's continuations were neither serializable nor performant (if I've got my history straight).
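
For readers who haven't used them, here is a minimal sketch of what a continuation does in Ruby. The drain method and its queue are made up for illustration. callcc is built in on Ruby 1.8, and as far as I know later versions of the standard interpreter need the require line, so the sketch tries it and carries on if it isn't there:

```ruby
begin
  require 'continuation' # assumed necessary on 1.9 and later
rescue LoadError
  # Ruby 1.8: callcc is already available, nothing to load
end

# Moves every item from queue into a new array without writing a loop:
# the method keeps jumping back to a saved point until the queue is empty.
def drain(queue)
  seen = []
  checkpoint = callcc { |k| k }  # capture "the rest of this method"
  seen << queue.shift
  # Re-enter the captured point; callcc "returns" again, this time with
  # the continuation we pass in, so checkpoint stays usable.
  checkpoint.call(checkpoint) unless queue.empty?
  seen
end

p drain(%w[a b c]) # prints ["a", "b", "c"]
```

The jump backwards is the part that's hard to implement on some VMs: the interpreter has to be able to save and restore the whole call stack at an arbitrary point.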

Thursday, October 26, 2006

More Implementers' Progress

Wow! Charles Nutter is indefatigable. Since the implementers summit, he's joined the YARV mailing list and been asking questions and making comments there, he's gotten the RubySpec wiki fired back up, been plugging the rubytests project, and now he's announced that he's working on implementing a stack-based machine based on YARV bytecodes on the JVM. This experimental work should help clear the way to a fast, stackless bytecode interpreter for JRuby.

I've also seen Kevin Tew cropping up in discussions about non-cardinal implementations. I'm looking forward to seeing more examples of this as the various implementers get more used to one another and as the communications channels get a bit more established. The future looks bright!

Wednesday, October 25, 2006

Just a quick Ruby Implementers post

It looks like Charles Nutter is taking another step in his quest for world domination. He's stepped in to help out on the Ruby Grammar Project. You can see the announcement here. You go Charles!

Tuesday, October 24, 2006

RubyConf 2006: Implementers Summit

Ruby Implementers Summit Attendees

To me, one of the highlights of RubyConf 2006 was the implementers summit we held. The idea was that we'd grab a quick dinner together, eat and chat, then sit down to some serious discussion of what it means to implement Ruby and how to get the various projects working together.

We had a number of people attend: Paul Brannan, Nathaniel Talbott (Test::Unit), Charles Nutter (JRuby), Evan Phoenix (rubinius), John Lam (RubyCLR), Nick Sieger, Devlin Daley, Kevin Tew (cardinal), matz (Ruby), Koichi (YARV), Eric Hodel (metaruby), Tim Bray (not pictured), Zed Shaw (RFuzz, not pictured), Ryan Davis (not pictured, metaruby), and I. There are five people in the photo who hung out to listen, Cease Larry and Doug Tolton came in from Utah, and I know another is Jack Wohr (from Dr. Dobbs) — if you can identify the others, please, let me know.

Since this was the first time we've tried something like this, a lot of the discussion was centered around "What kind of problems are you having?" and "How are you dealing with that?". This actually proved very useful in pointing out common ground for the more valuable (to me at least) part of the discussion: how the various projects could work together. (You can see an (ugly) pile of notes at wiki.rubygarden.org/Ruby/page/show/RubyImplemntersSummit2006Nov.)

There were a couple of specific tasks that came out of the meeting. First, there was consensus around the need for a 1.8.5+ spec that the various groups can work against. Charles Nutter has already put together a Wiki (and matz has already contributed to it), but it's down due to a hardware problem at the moment.

The second task was to build a common testing suite. Ryan and Eric are contributing their BFTS code and tests, which Kevin and Evan are moving into the rubytests project at Rubyforge. The other projects will all add their own tests into this new repository. I'll also be adding Legion to it, and Zed will be helping build a fuzzing framework for the implementations to share.

We also agreed to hold more of these summits going forward. I'm hoping we'll see them as a semi-annual event.

Ruby Implementers Summit Attendees

Photographs courtesy of Tim Bray who, oddly enough, doesn't appear in either of them ;). Tim also mentioned the summit in this post.

**Doug Tolton has identified himself (and corrected my spelling of Ceaser's name).**

Monday, October 23, 2006

Author Interview: Joshua Smith

Last week, I posted an interview with Matt Wade, the editor of Practical OCaml. Here's my interview with Joshua Smith, the author.


Joshua, most of the readers here don't know you, would you mind introducing yourself?

Joshua: Sure. My name is Joshua Smith. I am a writer and consultant living in the suburbs of Washington, DC. I was a Unix Sysadmin for several years, and have also been a programmer. Most of my professional experience has been in the financial industry (specifically trading and options clearing). I've been writing in Ocaml for a few years and have done extensive work in Ocaml (obviously), Perl, Python, and Java. I've also dabbled in quite a few non-mainstream languages like Prolog, Forth, and a few LISPs.

Why functional languages?

Joshua: Well, for starters learning a functional language makes you a better programmer. While that may seem somewhat startling, I feel that learning any programming language makes you a better programmer (except BASIC, perhaps). This is also one of the reasons that some languages borrow features and idioms from other languages. Expressiveness and clarity are as important to writers in artificial languages as they are to writers in natural languages.

Another thing is that functional languages allow the programmer a different (and in some cases better) way to reason about programs. Some functional languages (like Ocaml) are also constant languages, which means that they do not have variables in the same way that languages like Ruby have variables. While this can make some things more complicated (keeping state, for example) it creates a situation where you cannot have a variable get "stepped on". Writing programs without mutable values means you have to think about solving problems differently than if you have mutable values.
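
Since most readers here are Rubyists, the "stepped on" problem can be sketched in Ruby itself. This is only an illustration with made-up method names, and Ruby's freeze is a runtime check rather than something a compiler enforces:

```ruby
# Appends to its argument in place, so every holder of the string sees it.
def shout!(text)
  text << '!'
end

# Builds and returns a new string; the argument is left untouched.
def shout(text)
  text + '!'
end

name = String.new('matz')
shout!(name)
puts name          # prints "matz!": the caller's value was stepped on

fixed = 'matz'.freeze
puts shout(fixed)  # prints "matz!"
puts fixed         # prints "matz": the original is unchanged

begin
  shout!(fixed)    # mutating a frozen object raises an error
rescue StandardError
  puts 'mutation rejected'
end
```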

Why OCaml?

Joshua: Ocaml offers (in my opinion) the best features of functional programming but does not require all of your programming to be purely functional. Ocaml offers a robust object system, for example, so you can use functions or methods depending on which solves your problem more effectively.

Ocaml is also very, very fast. The native-code compiler generates executables that run nearly as fast as C on many platforms. The native-code programs also can be distributed without a runtime, which can greatly simplify deployments.

Why a book on it now?

Joshua: There really haven't been any English-language books on Ocaml published by a mainstream press recently. I think there has been a lot of interest in Ocaml for some time; just look at the ICFP contest winners and you'll see a lot of Ocaml teams. I think the language and the community are now big enough that people are asking "Is there a book on this?"

Functional programming languages don't seem to have spawned many 'mainstream' (non-academic) books. Why do you think your book will succeed in this space?

Joshua: I'm not sure I agree with you. LISP is, by far, the most popular functional programming language I can think of and there are several non-academic books on Lisp. Functional languages have gotten a lot of bad press in the past. There is also the problem that popularity begets popularity. So, when a programmer thinks about starting a new project in a new language she is more likely to start that project in a language that is popular.

To more directly answer your question: I think the book will succeed because there is a demand. When people see things like the ICFP contest and see Ocaml entries they get interested. However, when they look at the documentation available and see the most recent book on the language is very old, they start to rethink: maybe this is just an academic thing. Now there is a book, it's a practical book and I think people are going to be very happy about it.

What will it take to see more 'mainstream' books on functional languages in general and OCaml in specific?

Joshua: I think this will come when more people become less dogmatic about design issues. One of the things that I think is missing from a lot of developers' minds is the concept of ecology. Programs exist and run in an ecology of systems, more so now than ever. This also means that programmers should have a freer hand to choose languages based on how they help them fill an ecological niche or function, rather than how they fill a political function. I accept that I am an Idealist.

If the demand is there for Practical OCaml will you be doing a second book? If so, what aspect(s) of OCaml would you like to write about?

Joshua: That is a very good question. At this point there has been no talk of another book. But if there is talk, I would be happy to write another book. One of the things I would like to do is create a book length program and annotate it. This is more in the style of Knuth's MIX machine and language, which is pretty ambitious. But I think that more programmers would benefit from that kind of work than a cookbook or similar.

If you were to imagine your ideal book on OCaml, what would it be about? Who would write it?

Joshua: Probably the book I mentioned above. A book length program that is fully annotated, documented, and described. This would include everything from design to distribution and maybe even end-of-life. One of the things I feel a lot of programming books leave out is the "OK, what now?" part of programming. Think of it this way: simply being literate in English does not mean you can write a novel. In order to create in any language you need more than simple literacy, you need exposure to other works (perhaps great works), you need to know more. This "more" is what I would like to see in a programming book.

I tried to include some of that "more" in Practical Ocaml, but there is only so much one book can do.

Why should people be interested in OCaml?

Joshua: It's a great language. Learning a language like Ocaml is also a great way to deepen your understanding of why types are important. Type-safety is, I know, a real mine-field of an issue for a lot of people. Personally, I used to think that type safety was a waste. Then I found that strong, static checks were a great way to have better (and more reliable) code. I've never looked back.

Another thing that people would be interested in is tools for creating and maintaining Domain Specific Languages (or DSLs). I have seen some meta-programming in Ruby, but Ocaml has facilities that are well beyond what most people think of when they think of "extending" a given language. Camlp4 is a tool that can let you alter or extend the syntax of Ocaml programs. This goes way, way beyond macros and template systems found in other languages. Ocaml also has a more traditional Lex/Yacc toolkit that allows you to build languages, processors and whatnot from scratch.

Most of the readers here are probably Ruby hackers. What interest does Practical OCaml hold for them?

Joshua: Ocaml has a very different approach to solving problems than Ruby does. Ocaml is strongly and statically typed. Ocaml has objects, but really is more of a functional language. Ruby includes some functional programming idioms, like Lambdas, but is really an OOP language.

The type issue (I think) would be the biggest draw. Ruby supports operator overloading and runtime type identification. Both of these allow for rapid development and some very nice idioms for code. They can also allow for some very strange errors and other problems (many of which can be addressed by Unit testing, but not all). It's also important to point out that Ocaml has a notion of type and type safety that is worlds away from C or Java. I've seen a lot of people use C/C++ as their measure for static typing and that doesn't work well when talking about Ocaml. Types are inferred in Ocaml (meaning you don't have to declare a string to be a string, or a float to be a float) automatically. That's not all, but type is a pretty big deal in Ocaml.
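
The kind of "strange error" Joshua mentions is easy to sketch in Ruby. The total method below is hypothetical. The same + call means addition or concatenation depending on what arrives at runtime, and a mixed list fails only when that line actually executes:

```ruby
# Adds up a list of amounts. Nothing says what type the amounts must be.
def total(amounts)
  amounts.inject { |sum, amount| sum + amount }
end

puts total([1, 2, 3])       # prints 6
puts total(%w[1 2 3])       # prints 123: String#+ concatenates instead

begin
  total([1, '2', 3])        # a number plus a string fails only at runtime
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```

A unit test that happens to feed total a mixed list would catch this, which is the trade-off Joshua describes: the check exists, but you have to write it and run it yourself.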

As an ex-systems administrator, what role do you see for OCaml in that space?

Joshua: Ocaml is a really fantastic language for admins. The ability to generate native code is great, Ocaml interfaces very well with C and has great tools for dealing with text. Top all of this off with the ability to easily create and maintain DSLs and you've got a big win for admins. For example, in the book there is an application that processes complex (multi-line) log files and the definition of a tripwire-like utility and associated language. I've seen (and done) these kinds of things in Perl and Python and it is so much better in Ocaml that it's not even funny.

Ocaml is also a great systems programming language because it focuses so much on safety. Having a security flaw in an administrative utility can be a very large problem. Especially since most security flaws are really just bugs. Ocaml eliminates a lot of the things that make secure programming hard while not handcuffing you in the process.

Could you give us some (short) examples of OCaml code?

Joshua: Sure, I've attached a small program that prints a report of how much disk is being used by what person on a unix system. It includes documentation.

disk_hogs.ml


(** Program to return the disk usage (by uid) down a given path.  This 
application recurses through directories, but does not traverse symlinks.
@author Joshua B. Smith 
@version 0.3
*)


(** {2 Text Output of this program } 

{v
josh\@bebop: ocamlopt unix.cmxa disk_hogs.ml execfile.ml
josh\@bebop: sudo ./a.out /home/
ftp uses 0 MB
www-data uses 27 MB
root uses 16198 MB
Unknown uses 3 MB
Unknown uses 5 MB
josh uses 10874 MB
v}
*)

(** {2 The Code } *)

(** [traverse dlist acc] traverses all of the directories listed in dlist, and recurses into subdirectories. 

@param dlist a list of strings containing directory names
@param acc accumulator
@return a (string * Unix.LargeFile.stats) list which is (filename,lstat)
@raise Unix.Unix_error Can be raised
@raise Sys_error can also be raised
*)
let rec traverse dlist acc = match dlist with
    [] -> acc
  | h :: t -> let listing = Array.to_list 
      (Array.map (fun x -> let fname = Filename.concat h x in
      (fname,Unix.LargeFile.lstat fname)) (Sys.readdir h))
    in
    let dirs = List.filter 
      (fun (fname,f_stat) -> f_stat.Unix.LargeFile.st_kind = Unix.S_DIR)
      listing
    in
      traverse
        (t @ (List.map (fun x -> (fst x)) dirs))
        (listing @ acc);;

(** [calcvals statlists acc] Takes a list of (filename,lstat) (usually from {!Disk_hogs.traverse}) and totals the file sizes for each owner.
@param statlists (string * Unix.LargeFile.stats) list
@param acc accumulator
@return a (string * int64) list which is a list of (username, disk usage) tuples
*)
let rec calcvals statlists acc = match statlists with
    [] -> List.map (fun (id,size) -> let name = try 
        (Unix.getpwuid id).Unix.pw_name
      with Not_found -> "Unknown" 
      in (name,size)) acc
  | h :: t -> let (u,other) = List.partition 
      (fun (y,x) -> x.Unix.LargeFile.st_uid = (snd h).Unix.LargeFile.st_uid) 
      statlists in
    let subtotal =
      List.fold_left
        (fun acc (_,elt) -> Int64.add elt.Unix.LargeFile.st_size acc)
        Int64.zero
        u
    in
      calcvals other (((snd h).Unix.LargeFile.st_uid,subtotal) :: acc);;

(** [runreport startdir] Builds a report using {!Disk_hogs.calcvals} and
    displays the result neatly.

    @param startdir Directory to start from
    @return unit
*)
let runreport startdir = List.iter
  (fun (i,s) ->
     Printf.printf "%s uses %Li MB\n" i
       (Int64.div (Int64.div s 1024L) 1024L))
  (calcvals (traverse [startdir] []) []);;

execfile.ml


let _ = Disk_hogs.runreport Sys.argv.(1);;

Are there any blogs or websites that someone interested in OCaml should be watching?

Joshua: Absolutely,

  • First there's the main OCaml site. Then there is www.ocaml-programming.de, which is home to a lot of utilities and the GODI OCaml distribution. There is also the Apress website, which will have all the source code from my book, and forums too.
  • comp.lang.functional and comp.lang.ml are both USENET groups that cover a lot of functional programming and ML (including OCaml) topics. Oh, and Jon Harrop's web site (www.ffconsultancy.com/index.html) and Markus Mottl's website (www.ocaml.info) also include a bunch of stuff.
  • I would also be remiss if I didn't include a link to my technical editor for the book, Richard Jones (www.merjis.com). He helped out a great deal, and he is a _very_ sharp guy.
I'm sure I'm forgetting someone...

What technical books are on your list to read right now? What about non-technical books?

Joshua: Skyttner's "General Systems Theory" and "The Art of Error Correcting Coding" by Robert H. Morelos-Zaragoza are the next two technical books on my list, after I'm done reading "Stumbling On Happiness" by Daniel Gilbert and a re-read of "Slaughterhouse 5".

What books (technical or not) have influenced you the most?

Joshua: This is a really difficult question. I'm a voracious reader and I have been lucky enough to have many books influence my life. On the technical side, I would have to say "Object Oriented Software Construction" by Meyer changed my programming. I didn't "get" OOP until I read that book and it changed how I wrote programs. On the other hand, Holland's "Adaptation in natural and artificial systems" changed the way I solve problems. Personally, it would probably be a dead heat between "Getting To Yes" and "Pirke Avot" as to which book had the greatest and most broad influence on how I live, but that's a longer story.

If you ask me tomorrow you might get a different list on the technical books. While I'm a lover of books and the printed page, I am an inconstant one.

Outside of OCaml, what programming languages should a Rubyist, who probably already knows a bit of C and/or Java, learn?

Joshua: Honestly, I would probably say Forth. I know, it's a strange choice. But Forth, as a stack-based language, is fundamentally different from the ones you listed. It also can help a programmer better understand using stacks for computation and storage. So many applications have a stack component to them that a deeper understanding of stacks and queues is always a good thing. Besides, it's a super easy language to learn.
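For readers who haven't seen a stack-based language, here is a small sketch of the idea, written in OCaml to match the rest of this interview. It is my own toy, not real Forth: the word set ("+", "-", "*", "dup", "swap") is a tiny illustrative subset, and `step` and `eval` are names I made up. Each word either pushes a number or pops its operands off the stack and pushes the result.

```ocaml
(* One evaluation step: the head of the list is the top of the stack. *)
let step stack word =
  match word, stack with
  | "+", y :: x :: rest -> (x + y) :: rest
  | "-", y :: x :: rest -> (x - y) :: rest
  | "*", y :: x :: rest -> (x * y) :: rest
  | "dup", x :: rest -> x :: x :: rest
  | "swap", y :: x :: rest -> x :: y :: rest
  | ("+" | "-" | "*" | "dup" | "swap"), _ ->
      failwith ("stack underflow at " ^ word)
  | n, _ -> int_of_string n :: stack   (* anything else must be a number *)

(* Split the program into words and fold them over an empty stack. *)
let eval program =
  String.split_on_char ' ' program
  |> List.filter (fun w -> w <> "")
  |> List.fold_left step []

let () =
  (* "2 3 + 4 *" computes (2 + 3) * 4 *)
  match eval "2 3 + 4 *" with
  | [result] -> Printf.printf "%d\n" result
  | _ -> print_endline "unexpected stack"
```

There are no variables and no operator precedence here; the order of the words and the state of the stack carry everything, which is the shift in thinking Joshua is pointing at.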

If not Forth, I would say Python. Python is definitely one of my preferred languages (sorry Ruby).

RubyConf: Day Two

The second day of RubyConf 2006 was also my last. I needed to fly out late Saturday night and ended up missing both matz's keynote and the RejectConf — not to mention all of Sunday's talks. I was glad to see Saturday's talks though, and to get another day of meeting up with people.

My favorite talks were Nathaniel Talbott's and Tim Bray's. I'd decided to step out during Laurent Sansonetti's, since I'm not a Mac user, but regretted it when I stepped back in toward the end of his talk to hear him talking about some changes to Ruby's core that he's experimenting with — I think he should have been at Friday's implementers' summit.

The lightning talks were great. (Josh Susser's talk had to be cancelled due to illness, so they filled the slot with an impromptu lightning round.) I thought about doing one, but the nine available slots were filled before I could sign up. Hopefully next year will feature a scheduled lightning round.

I was interviewed by Jack Wohr, of Dr. Dobb's. I haven't seen the interview show up at his blog covering RubyConf 2006 yet, but there are a lot more good interviews there. It was pretty interesting to be on the other side of the mic. Hopefully I can use some of the things I saw him doing in my own interviews. Speaking of which, I was able to line up several interviews for the near future. It was really cool to be able to ask people in person, instead of having to coordinate everything by email. A couple of people also asked me about doing some of the interviews as podcasts. I'm a bit apprehensive about it (I hate the way I sound 'on tape' and I'm not really ready to sink a bunch of money into audio gear), but maybe it will happen.

We (Mike Moore and I) also got some more work done towards our planned MountainWest RubyConf in the spring. We might have lined up a couple of major sponsors, which would help keep everyone's cost down. I also got some good feedback on the implementers' summit; it looks like people are interested in a reprise in the spring. I need to get a call for papers put together in the next week or so, so that I can post it.

Finally, I'd just like to tip my hat to two people:

  • James Gray: He's been a good friend by email and IM, and it was great to finally meet him. James, I'm looking forward to seeing you again at next year's RubyConf.
  • Hal Fulton: I've interviewed him, and have been reading an advance copy of his new book. When I caught up with him at RubyConf he signed the advance copy for my son, who's been wanting his own Ruby book for a couple of months. Hal, you really made my son's day. Thanks!

Saturday, October 21, 2006

RubyConf 2006: Day One

I did take notes, but Kevin Tew has been doing a better job than I did, so I sent him a copy of my notes to combine with his and create a better report. Nick Sieger also did an incredible job of note taking, so you might look there as well (I'll be linking to Nick's reports until Kevin gets our notes posted). So, instead of a detailed report of various talks, I'd just like to point out some of my thoughts and impressions.

Although all the talks were good, three really stood out to me on the first day: Takahashi-san's (yes, that Takahashi), Evan Phoenix's, and Zed Shaw's.

First of all, it was incredible to see Takahashi's presentation: 306 slides for a ~40 minute talk! Getting the back story of Ruby was pretty cool too. I loved the (translated) chat log of the naming of Ruby. Who knew it might have been 'coral' instead? (Going with Ruby was the right choice, matz.)

Evan's talk about rubinius was something I'd been looking forward to, and he didn't disappoint. rubinius is an implementation of Ruby in Ruby that translates itself into C. This allows all kinds of research opportunities, and should be leveragable (is that a word?) by all of the other Ruby implementations.

Earlier in the day, Zed mentioned to me that he wasn't going to swear during his talk ... I wanted to see it just to see if he could make it. He did (nothing stronger than minor scatology) — he also gave a great talk about the basics of Fuzz testing and RFuzz. Among other things, he mentioned my defunct RWB project and that he is in fact pulling some of the reporting ideas from it into the next release of RFuzz. He also convinced me that I need to fuzz r43, and learn more statistics so that I can make real sense of the results.

The coolest thing about RubyConf is meeting all the people though, old friends and new. John Lam and Hal Fulton were both actively taking pictures throughout the day (what else would you expect?). Eric Hodel and Ryan Davis were there with a large contingent from the Seattle.rb. A big group of us (matz, Koichi, John Lam, Charles Nutter, Evan Phoenix, Ryan Davis, Eric Hodel, Kevin Tew, Nathaniel Talbott, Paul Brannan, Tim Bray, Zed Shaw, and I) held a Ruby Implementers Summit over dinner. Lots of interesting ground was covered, and some shared objectives were laid out (especially around testing and writing a specification).

I'm really looking forward to today's activities.

Friday, October 20, 2006

Before RubyConf 2006

It's really early right now, but it's also the first day of RubyConf, so I thought I'd make a quick post about days -1 and 0, so the deck would be clear to blog about all the cool stuff that happens at the conference.

Day -1 (Wednesday the 18th of October) was the day of matz's presentations at BYU. Things kicked off with a ~1 hour colloquium where matz talked about the background behind Ruby, and why it's a good language to use for many tasks. He made a number of great points that I plan on mining for future posts, but you can watch it here (sorry, it's a wmv file); there were about 250 people in attendance. Then we moved on to a special URUG meeting (URUG is the umbrella group for the BYU-RUG, the UtahValley.rb, and the SLC.rb) where about 50 people spent an hour in a question and answer session with matz. Finally, about 15 of us took matz out for pizza and talked about Ruby and related stuff late into the night.

Day 0 (Thursday the 19th of October) proved to be both fun and full too. Kevin Tew and I got to play tour guide for matz in Salt Lake City. Other than my nearly killing matz with a revolving door, things went pretty well. We even snuck in another hour of Ruby discussion during a brown bag Q & A at my day job. Then it was off to the airport to make sure matz got onto his plane to Denver and RubyConf, and to meet up with more URUG Rubyists for our flights (URUG ended up sending 12 people to RubyConf this year; I think that number is amazing!).

I'd hoped to finish off a book review for Hal Fulton's wonderful The Ruby Way. Sadly, I hit my seat on the plane and fell asleep before we got off the ground. The review will have to wait, but you don't need to ... It's a great book, you should go buy a copy now. (If you're at RubyConf, I think Hal brought some copies to sell.) You can even read a preview and an accompanying article over at InfoQ.

Wednesday, October 18, 2006

Author Interview: Joshua Smith (Prequel)

Matt Wade is an editor in the open source line at Apress. As an editor, he's had the privilege to work on titles such as Beginning GIMP and Pro PHP XML and Web Services. He's also a freelance web developer, database analyst, and all around technology junkie. Matt lives in Jacksonville, FL with his wife and three children. You can find some of his musings at opensource.apress.com.

I recently interviewed Joshua Smith, the author of Practical OCaml. During the course of our discussion, he brought up his editor and experience with Apress. So, I decided to take a couple of minutes to talk to Matt as well. Here's what we talked about:


What made you decide the time was right for a functional programming book? And an OCaml book specifically?

Deciding what to publish on and what not to publish on is a bit of a tricky process. Sure, there are places where you know you'll be publishing: PHP, .NET, MySQL, Oracle, etc. The key to success in publishing, though, is to find that gem in the rough that is on the verge of making waves. Functional programming has seen a rise in popularity over the past year or so. OCaml is one of the forerunners in that popularity. The rise of F# is another item that tells us that functional programming is gaining ground. Given those factors and a bit of a 'gut' feeling, we thought it would be a good time to publish on OCaml.

What will it take to see another OCaml book from Apress?

Buy this one and ask all your friends to buy it as well :). Seriously, publishers are here to make money. Sure, we love to publish books on great technology and enjoy knowing that people can learn and further their careers from the books we publish, but when it comes to push and shove we need to make a profit from a book. If we find that publishing in the OCaml space does that for us, we will certainly continue doing so.

What about a book on another functional language (like Haskell)?

We are actively looking at other functional languages and considering them for publication. James Huddleston, another editor at Apress, has brought on Robert Pickering to write 'Foundations of F#' for us. You can find more information about that book at its page on the Apress website. It's expected to publish early next year. James and I would love to look over any proposals for functional programming books. If you'd like to pitch an idea to us, send an email to editorial@apress.com.

To you, what's the best thing about OCaml?

I think one of the best things about OCaml, when compared to other functional languages, is the libraries available. I hate reinventing the wheel! OCaml has a huge number of libraries available to cover just about anything you'd want to do and they are all easily found.

Why should the average non-OCaml programmer pick up a copy of this book?

Pick up this book if you want to learn how to program a functional language. You'll find a variety of practical projects that you can apply immediately to help you create your own applications. Personally, I find programming books that have nothing but academic examples boring and tedious. I want practical, real world examples that I can put to use. This book offers that.

(Look for my interview with Joshua on Friday.)

Tuesday, October 17, 2006

Book Review: Rubyisms in Rails

A little while ago, I interviewed Jacob Harris about the Rubyisms in Rails shortcut he wrote for Addison-Wesley. This time, I'd like to let you know a little more about the book (or booklet, or whatever we're supposed to call these things).

Let me start out with some praise up front. This is a great little book, and a good investment for learning more about the Ruby idioms that undergird Rails. Jacob's writing is clear and concise, and his examples are well chosen and well explained. Buying this book is $10 well spent.

Jacob starts out with a quick run at the Philosophy of Ruby, and the nuts and bolts mechanics of the language. This section takes up about the first quarter of the book. He follows this up with solid explanations of Duck Typing and Symbols. From there he moves on to a section on Blocks and another on Metaprogramming. Finally, he moves on to a discussion of Domain-Specific Languages.

At each step along the way, Jacob starts with simple examples to cement the reader's understanding before moving on to show code from Rails itself that exhibits the idiom he's explaining. It's a pattern that works quite well.

This shortcut actually started life as a presentation Jacob made to the NYC.rb — a heritage that still shows through in the slide-like landscape pages of the book and Jacob's laid-back writing style.

Given my positive experience with this title, I'm looking forward to reading more of Addison-Wesley's Ruby Shortcuts.

Monday, October 16, 2006

A Peek ahead at Two Soon-to-be Released Books

I recently got pre-press copies of two books that I've been looking forward to for a while, The Ruby Way (by Hal Fulton) and Software Creativity 2.0 (by Robert Glass). I recently interviewed both authors (Hal and Bob) which only whetted my appetite for these books.

These look like two great books, although of very different types. The Ruby Way, 2nd Ed. is a great compendium of Ruby knowledge: it's part reference, part cookbook, and part guide to the dao of Ruby. Software Creativity 2.0 is a language agnostic look at the creative tension that needs to exist to make software development great. Since these are previews, I'm not going to do a full review yet ... but I'd like to share something about each book that made me grin.

In his description of local Ruby groups, Hal uses my preferred term (Ruby Brigade). This strikes close to home, since I made up the term. You see, Doug Beaver, Ryan Davis, and I were hanging out at the first Seattle meeting trying to figure out what to call ourselves. I tossed out the idea of the Seattle.rb (to match the foo.pm Perl groups). Doug and Ryan both liked it, but we needed to come up with an expansion for the rb acronym. 'R' was easy, but the 'B' had us stumped for a little while. Eventually I hit on the idea of a Ruby Brigade and things just went from there. So, (cue my Paul Harvey voice) now you know the rest of the story.

In essay 1.7 (The Strange Case of the Proofreader's Pencil), I got a glimpse into Bob's early (pre-programming) days. There were a couple of times during our interview that I felt like I was the wrong person to be doing the interview. Bob's forgotten more about programming than I know, and I felt very much like the young whippersnapper getting involved with his betters (not Bob's fault at all, he was incredibly gracious to put up with an upstart like me). Reading about Bob's interactions with a boss and being able to put myself in the same role as I read the essay made me feel a little bit closer to him. Maybe there's hope for me yet.

Thursday, October 12, 2006

Author Interview: Jeremy McAnally

Today's interview is with Jeremy McAnally, the author of the Humble Little Ruby Book. Jeremy, would you mind introducing yourself?

Jeremy: My name is Jeremy McAnally, and I'm a 20-something developer, designer, and author. My wife and I live in Knoxville, TN, where we are studying the Bible and Theology at Johnson Bible College. I've been programming for about 10 years now, starting with Visual Basic 3 (who didn't start that way, right?) and moving through the language web up to dynamic languages most recently. You can visit my blog (which I try to update at least once per millennium), the book's page, and my rarely used but ever present "personal" page.

How did you discover Ruby?

Jeremy: I was working on a project for my last staffed development position; it needed to take data received from fob scanners that were hooked into the serial port and behave differently depending on which scanner was scanned (e.g., some were posted by doors that they needed to unlock while others needed to simply note the scan in a database). The existing version was in Java, very annoying, and impossible to maintain. I started rewriting in C#, but ran into the hurdle of a dire lack of reliable serial libraries, mostly because I was using Mono rather than the "official" .NET and opening the serial port as a file was unreliable. I looked at a few languages, and then ran into Python. The dynamic language paradigm struck a chord with me, and so I looked into a language that didn't require whitespace. I love Python, but I was very frustrated by the prospect of having tabs delimiting my code. I found Ruby and toyed with it, but at the time the speed was a hindrance to actual use (i.e., having 800+ people scanning across four scanners within two to three minutes was a daily occurrence; speed was essential). I settled on Python, but didn't forget Ruby. After I finished that project, I was shopping around for a PHP alternative, found Rails, and never looked back.

What role does Ruby play in your day to day work?

Jeremy: I develop Rails applications for contract clients and my college. I've developed a number of Rails applications (including some for myself) and a good deal of Ruby applications of varying complexity (all for contract clients).

Recently I've been using Ruby in the graphics design shop where I work here on campus, namely using acts_as_taggable to create an image tagging database. We have thousands of images floating around, and the only way we can navigate them is the filesystem (which is very one-dimensional, and very static). I decided to create a richer interface for locating images, and Tagbit was born. I've also used Ruby to automate a number of processes down there, such as processing our yearbook mugshots and a few other little things that were mostly 5 minute hacks to save my fingers.

How have other folks reacted to having Ruby brought into the mix?

Jeremy: Great, actually. Most of them aren't tech savvy and never notice, but those that do notice are really fascinated by the syntax. "It's like I can read it!" It's neat to be able to explain what's going on and they actually understand it.

How did you come to write a book about Ruby?

Jeremy: When I learned Ruby and Rails, I had to pony up for the PragProg books at $40 each. That was expensive, especially to a struggling college student. I had been rolling around the idea of creating eBooks and using Lulu to print them, but I had never taken it seriously until I realized the potential impact of a low-price, beginner-minded Ruby book. I asked a few people about interest, received a very positive response, and began work immediately.

Why did you choose to self publish?

Jeremy: First and foremost was cost. I wanted to keep my book affordable to keep the entry barrier low for those who wanted to learn. I get frustrated when I see books (or even more appalling, Bibles) that are $40 because I know that production isn't that expensive in most cases, but then again, I also must weigh the fact that they have a full-blown staff behind that book that has edited, typeset, and designed all of it. So, I realize the reasons for the cost of traditionally published books, I just don't think they're completely necessary (especially for some projects).

Secondly, I liked the idea for a community controlled and, essentially, produced book. I e-mailed a number of people and asked what they would like to see in another Ruby book, what could I do differently? I put up a bugtracker during the "beta" stage where people could open issues and I would fix them. I proposed a number of font and cover options to development-savvy designers. Nearly every part of the book (outside of actually writing it) was a collaborative effort with someone, and that made me confident that I was producing something that people would be interested in. I couldn't do that sort of thing with a traditional publisher. Of course, I also left the security of an advance and promise of marketing, but I felt that if people were so willing to invest themselves in it, then they would probably invest a small bit of money and time to promote it also.

Another factor I considered was agility. With self-publishing, you get the luxury of being in full, flexible control of every part of the process. Obviously, this means that you can write a book on a very timely subject, such as the new features in Rails 1.2 or Ruby 2.0, and still get published (because you choose the books you publish), but it also means that you can publish these books and keep them up to date. You can't do that with a typical traditional publisher. If something changes with what you wrote on, no problem; simply update the book and it's up to date. You avoid hassles with production channels, schedules, etc. Of course, existing print copies won't be up to date, but the e-books are. And using most POD services like Lulu, future purchases of print books are up to date, too. This flexibility is a God-send if you have errata (and who doesn't?) that you want to fix. Fix the PDF, update it on the access channels, and you're done.

How do you feel about the growing competition in this free to low cost niche?

Jeremy: I say bring it! :) Really, though, I think this is the future of tech publishing for the most part. I think it's awesome to see people weighing between the community and money, and choosing the community. Even so, people are smart, and much more so than publishers give many of them credit for. People are now turning to this area, this self-published, do-it-for-the-love-of-the-language niche, and realizing that they can contribute without the approval of a publisher and without a middle man. I think it's great to see more people like Geoffrey Grosenbach and the Little Book of Ruby folks producing stuff that is top-quality, but doing it without the "help" of publishers.

Regardless of the popularity of these books, there will still be the bastions of technical writing that are the huge publishing houses with meticulously reviewed books and very shiny graphics, but I think that this low cost, grass roots produced way of doing things is much closer to the heart of the Ruby and Rails communities. The postmodern culture shift has brought about sites like Digg and Reddit, where everything is community driven; I really think that is where technical publishing in general is (and should be) drifting.

Traditional publishers seem to balance the desire to fix problems with the ability to maintain versioned editions/printing of their books. How do you plan on dealing with that, or is it a non-factor to you (and if so, why?)?

Jeremy: I really think that's not that much of a factor for me; the way I understand traditional publishing, they print books in huge quantities at a time and then serve orders and distributors from a warehouse. That's fine for them; they get faster order service and distributors can have immediate access to their books. The huge stock, though, is the biggest reason that publishers push for infrequent updates and fewer editions (I believe); you have to make sure that the book you're printing is something you can sell 10,000 copies of (or whatever). Publishers need to minimize errata and problems at each printing. If they pop up, they have to wait until the next printing because they don't want to shovel out the cash to do 15-20 printings a year just to fix little things. Using POD, though, if big things or little things pop up, they can be fixed. You're afforded the flexibility to be self-published and self-sufficient, which is essential since most of us don't have a full editorial staff and typesetting department in our garage.

Of course, I also see the merit in saying, "Well that was a problem in the first edition, but the second edition fixed that" or "That information was added in version 2." It keeps some sanity in trying to deal with an error here and an update there, but I think that in general, most books don't have enough errors or updates (without a significantly huge update such as from Rails pre-1.0 to Rails 1.2) to really warrant that sort of structure. I plan on setting up a place very soon that lists errors and when they were fixed (I've only had one pointed out to me, thankfully, but surely there are more). The biggest reason I see is the money, time, and effort, and it's a very understandable concern for them.

What sets your book apart, why should people buy it?

Jeremy: I wrote my book with Kernighan & Ritchie in mind: be concise, get the idea across, and explain concepts. In covering the language, I covered only what was necessary and left the syntactic wizardry to the blogs and reference manuals. I tried to cover libraries that I found to be adaptable; that is, libraries that I thought once you learn these, you can build a lot of applications that use them. I avoided providing too much in that regard, given that there are websites such as http://www.rubymanual.org/ and http://www.ruby-doc.org/ that essentially serve that sole purpose. In laying everything out, I've tried my best to have every chapter cover only the essentials and continue to build on the previous chapter, which builds on the one before it, and the one before it, and so on. I spent probably the first month thinking about and laying out all of the pedagogics of the book before I even wrote a single word, so I hope people find it to be a very refined, very intuitive learning experience.

I want to provide something that the community can fall back on when the PickAxe is out of someone's range, whether in comprehension or price. I don't think I necessarily compete with the PickAxe; they're two different volumes, with two different targets, and two different voices. They're targeting more of a reference-manual-seeking seasoned developer who has every printing of the Perl Cookbook on his shelf simply because he liked to pick out the errata. On the other hand, I see myself as targeting the person who knows a little about programming maybe but might be weak in some other areas. I'm targeting the guy who just doesn't get object orientation right out of the box and needs just a bit more coddling. I also target those who already know how to program, but my primary audience is those who don't or have never programmed in a dynamic language before. I try to spend at least a little bit of time explaining concepts behind what is going on, something I feel is lost in some of today's technical writing.

Lastly, I tried to make my book fun. A lot of people mistake my writing style for _why's (and I do kind of see the similarity), but I've been writing "like that" for a long time, long before I even encountered Ruby. My apologies to the conspiracy theorist who accused me of actually being _why.

Now, back to the style. I love _why's book. A lot. I used to sleep with the first page of it tucked under my pillow, hoping that some of its delicious knowledge would seep into my brain. I learned a lot from it when I first started picking up on Ruby. But a lot of what I heard when I initially asked about writing this book is that his book is _too_ fun, that there's _too_ much chunky bacon (like that's even possible), and people wanted to see a "middle ground." So, I've tried my best to strike a good signal vs. noise ratio in being fun while also being informative.

So, if you're looking for a concise, well-paced book with a fun feel that resembles a fiesta at dawn featuring Ricky Martin, Ricky Ricardo, and Ricky Scaggs, then buy it. If you're looking for a bean burrito, then go to Taco Bell.

I love the anachronistic 'feel' of the book (I wonder whether I could bind a copy in a slightly ratty hardcover like you'd find in a thrift store).

Jeremy: Haha, that would be awesome. I think I might try that when I get my copy.

How and why did you settle on it?

Jeremy: Well, I knew that I wanted something that wasn't your typical tech book. I slapped a centered image on there with some plain text and looked at it and decided it didn't fit with me and definitely didn't fit with the text. I wanted something that created some visual and mental tension to make it interesting and set apart. Who learns to program from a book that looks like it came from the 1600's? So I put together a few typesetting comps and started showing them to people. People really liked the "OldStyle" font that I picked initially, but I found it to be pretty hard to read at the text size, so I changed it to Lido. Lido is a great font because it's readable and bold without being too generic and too "in your face." It's a good balance.

You mentioned the cover design was a collaborative work, can you tell us the story of how it came to be?

Jeremy: Sure. I came up with the idea to do something "old" as I mentioned above. I threw together something really quick from some medieval woodcarvings and other various medieval images that I found in a stock library and some I scanned from old theology books we have in our library. I liked it OK, but I decided that maybe I wanted something a little "prettier." Tan and black aren't 'pretty' (even though I do plan to use that cover for the final binding of the Ruby AND Rails book, but for this one I wanted something different). So, I made a few concepts and posted them to various forums and lists and let people give me input. As people gave me input and ideas, the concept started taking its own shape. "What about how old biology diagrams kind of have those 'tag' things that come in from the side?" "Paisley is an old looking pattern...that's my favorite classical pattern." "Use an older looking set of Garamond fonts to get a better effect." As we tweaked it together, it finally took shape into its current form. It turned into something that was built by a great number of people together with me.

If a publisher were to approach you to do a 2nd edition of your Humble Little Ruby book, would you do it? What do you think are the tradeoffs that would affect your choice?

Jeremy: The financial security would be great; I put my wife and me through a great deal of financial duress during the writing of this book. I actually only worked half-days during the summer, which put us in a tight spot. Now that the book is out and selling, it's been slightly easier, but it's never easy when you're both in college and you have bills.

It would also be nice to actually let an editorial team go to work on the text. I always freak out that I've made a bunch of errors, so having someone finally make that decision would be a great comfort. :)

Of course, the notoriety and resume fodder would be a great benefit. Being young with no degree but having experience hasn't gotten me far (at least in the jobs I've been able to apply for); having a book credit under my belt with a publisher would make it tempting.

But even with all of that, I would never give this book over to a publisher. I want this to be something for the community, and I feel like most publishers would take that away.

What other books are you reading right now?

Jeremy: Nothing Ruby related right now, I'm sad to say. I can't really afford them new, and my local used book store hasn't stocked any Ruby books for a long, long time. I just finished Ender's Game, which I read in 2 days, and I've about finished Speaker for the Dead (which I started yesterday). I'm not sure why, but I'm on this strange sci-fi kick now; I haven't read sci-fi since middle school. Now I'm reading Speaker for the Dead, Foundation, and I, Robot. I'm also reading some interesting books on Paul's epistles and re-reading Donald Miller's book on Christian spirituality, Blue Like Jazz.

Since you're studying the Bible, what verse(s) have impacted you the most as a developer, designer, writer?

Jeremy: Man, that's a tough question. Always on my mind is 2 Thessalonians 3:10: "For even when we were with you, we gave you this rule: 'If a man will not work, he shall not eat.'" ;)

All joking aside, I think that the most impacting is probably Romans 12. It talks about being a living sacrifice, and giving your whole being and life as worship to God. That permeates everything I do; I feel like Moses, "God, I can't do anything!" But somehow, God makes it happen.

This verse also reminds me to be humble, no matter how much praise I receive or how great I am. My work is merely a piece of the body of Christ, moving in unison for him; I'm no better than another Christian who writes or one who can't. We're all the same in His eyes.

What are your favorite five Ruby libraries?

Jeremy: I think Rails is a given, so I'll name five others. My most recent toys have been a few linguistic libraries, the newly released text and the Ruby Linguistics Framework. Both are really nice; I'm a bit of a linguaphile, so this is pretty cool.

In that same line, I just picked up rparsec this week and played with it. I want to do some interesting stuff there...maybe next summer when I really have time to build something of use. ;)

RMagick is quickly becoming one of my favorites; it's much more intuitive than many of the GDI+ functions I'm used to from C#. I'm interested to see what I can do with it when used with one of the GUI frameworks.

Ruva, the pure Ruby JVM, knocked my socks off the other day. It's not something I'll really _use_, but it's something that has been teaching me a good bit about how to construct a simple VM. My lack of formal CS education cripples me at times such as this year's ICFP competition. It's really cool to see people pushing the limits of the language, breaking out of the box, and doing something totally off the wall but still beneficial.

Lastly is probably _why's sandbox. I love that thing. I've been using it on a new project I'm working on that is supposed to help people learn Ruby better; I'm integrating it in as part of the course software.

What's next for you?

Jeremy: Right now I've got a bunch of projects competing for attention; we'll see which one comes out on top. First, I'm writing the Humble Little Rails Book of course. Next in line is a book about the other Ruby and Rails "stuff" that no one really thinks about when learning the language, like testing, deployment, metaprogramming, etc. It will be targeted at people who know the language, but may not understand why this other stuff is important.

I'm also working on the above mentioned Ruby learning project; I'm developing a college-targeted Ruby course with illustrated workbooks, course software (which may be driven by a Rails site...not sure yet), and lesson by lesson coursework. It will be based off the Humble Little Ruby Book, but more "academic."

What do you think is next for Ruby?

Jeremy: I'm very interested to see where it goes in the enterprise. People have been sort of flirting with dynamic languages for a while now, and I think that things like Rails are pushing it even further into the mainstream. Hopefully we'll hear about a Ruby-solution soon, but I don't think that will happen until we 1) get a definitive learning course down and certification program, 2) teach people the power of dynamism as opposed to static languages (metaprogramming, flexibility, and so on), and 3) be patient. :) I think things like this take time; look at how C++ and then Java took over. Their rise to power was slow and hard fought against the onslaught of COBOL, FORTRAN, and C zealots. I think we are at a tipping point, but we just have to be patient and see if the world takes to Ruby like we have.

I think Ruby can also be used in other areas, such as embedded scripting. Python has been used this way in a number of areas, but I think that Ruby would be better suited in places where people's technical knowledge is almost guaranteed to be weak. Ruby's syntax is a bit easier to grasp than Python's, so I think in areas like scripting graphics design applications and office applications, you could use Ruby to get the power of a decent language (rather than a crappy BASIC variant) but still have something that's usable and very teachable.

I think the language itself is about as beautiful and elegant as you can make it, but then again, I look at the upcoming changes in 1.9 and it makes me drool. I always hated the verbosity of C# and PHP, so to have a language that wants nothing but to be usable and elegant is a God-send.

Tuesday, October 10, 2006

Improving Ruby Performance, One Library at a Time

I've been looking at performance lately, and several threads are starting to come together for me:

Zed: The whole process is really just the scientific method. Since I have limited information from Ruby about performance I have to just test, evaluate, adjust, and repeat until the measurements improve. What really helps is using statistical tests to confirm that each change made a difference, or at least didn't hurt things. Without these tests I could make changes that seemed to improve things but actually made no difference.
Zed Shaw

Dave: I spend a lot more time thinking about the algorithms than anything else. I use gprof to find the bottlenecks in my code and try to rework the algorithm so that that part of the code gets called less. Then I may try and optimize the code but only in extreme cases. The other tools I really like for C development are gdb and valgrind. For those who don't know, valgrind is a debugging and profiling tool which is particularly good for finding memory related errors in C programs. I usually use it for debugging rather than profiling and I don't know how I lived without it. Unfortunately it doesn't play nice with Ruby as Ruby's garbage collector throws up a lot of red flags so I've had to overcome this by building a pretty large suppression file to get valgrind to ignore all of the Ruby errors. I still worry that I'm also suppressing errors that could be raised by Ferret but it seems to be doing a good job. Another tool I'm really starting to like is gcov which is great for checking test coverage as well as profiling.
Dave Balmain

zenspider: people get so myopically focused on using C to make things faster that they don't bother looking at their algorithms or data-structures. It is sad. Ruby may be slow for method dispatch, but bad code can be slow in ANY language. . . . C doesn't make ruby fast. Avoiding method dispatch makes ruby fast. You can do that using pure ruby quite a bit of the time by applying your noodle.
zenspider

John: What this really tells me is simple ... algorithms matter...
John Duimovich

Ruby isn't the fastest language on the block, but it's fast enough for me. Does that mean it's fast enough? Probably not. There are three main places that Ruby could be improved: in my code, in the libraries that I use, and in the Ruby core. John, zenspider, Dave, and Zed all have some good advice, but it all boils down to John's — algorithms matter, and where they're used matters too. I'm most able to change my own code, but the greatest effect comes from making changes at the most core code we can.
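To make "algorithms matter" a bit more concrete, here's a toy sketch. It isn't taken from any of the libraries or people quoted above; the method names and data are made up for illustration. Both methods find the values two arrays have in common. The first rescans the second array for every element, while the second builds a Set once and does hash lookups after that.

```ruby
require 'benchmark'
require 'set'

# Quadratic: Array#include? scans `right` once for every element of `left`.
def common_items_slow(left, right)
  left.select { |item| right.include?(item) }
end

# Roughly linear: build a Set once, then each membership test is a hash lookup.
def common_items_fast(left, right)
  lookup = Set.new(right)
  left.select { |item| lookup.include?(item) }
end

left  = (1..3_000).to_a
right = (2_000..5_000).to_a

# Print timings for both versions; the exact numbers depend on your machine.
Benchmark.bm(6) do |bm|
  bm.report('slow:') { common_items_slow(left, right) }
  bm.report('fast:') { common_items_fast(left, right) }
end
```

Neither version touches C. The gap comes from doing less work per element, which is the kind of change zenspider is arguing for.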

Have you looked at the performance of the libraries you rely on? Maybe you should. If you find ways they could be improved, contribute a patch, or (at least) talk to the implementor. Consider it a call to action. If every Ruby user just made one small improvement, think of the effect it would have on the language as a whole — sure, it costs a bit more, but it's worth it!
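If you want to try Zed's test-and-measure loop on a library you use, something like the following is a minimal starting point. It's a sketch under my own assumptions, not Zed's actual tooling: the helper names are invented, the two blocks being timed are placeholders for "before" and "after" versions of real code, and Welch's t statistic is only one of several tests you could pick.

```ruby
# Time a block `runs` times and return the elapsed seconds for each run.
def sample_times(runs)
  Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

def mean(values)
  values.sum.to_f / values.size
end

# Unbiased sample variance (divides by n - 1).
def variance(values)
  m = mean(values)
  values.sum { |v| (v - m)**2 } / (values.size - 1)
end

# Welch's t statistic for two independent samples.
def welch_t(a, b)
  (mean(a) - mean(b)) / Math.sqrt(variance(a) / a.size + variance(b) / b.size)
end

# Placeholder workloads standing in for the old and new code paths.
before = sample_times(20) { (1..20_000).map { |i| i.to_s }.join }
after  = sample_times(20) { (1..20_000).to_a.join }

puts format('before: %.6fs  after: %.6fs  t = %.2f',
            mean(before), mean(after), welch_t(before, after))
```

As a rough rule of thumb for samples of this size, a t value whose magnitude is well above 2 suggests the difference is real rather than noise; a proper test would turn the statistic into a p-value.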

If you found this post helpful, you might want to look at my ruby-prof post collection.

Thursday, October 05, 2006

Gaaah!

Just a short rant, sorry for the interruption.

What is it with people categorizing Ruby as a web programming language?

THERE'S MORE TO RUBY THAN RAILS!

(Sorry for the shouting, I'll try to restrain myself from here on out.) Addison-Wesley, whose Ruby books and shortcuts I'm really looking forward to after a very positive experience with Rubyisms in Rails, has things confused. They list Rubyisms in Rails under the "Design and Creative Media/Flash" and "Internet and Web/General" groupings. But they're not alone . . . .

IBM has just announced their new Web Development center, which "features technical resources for . . . Ruby, as well as Web development frameworks such as . . . Rails". Great, the Rails part I understand being in a 'Web Framework Development Center'. Why Ruby though?

Ruby does more than just build websites ‐ even James Gosling has figured that out by now!

Ruby Hacker Interview: Dave Balmain

Dave, thanks for agreeing to do this interview. Before you start, could you introduce yourself?

Dave: I grew up on a sheep and cattle farm about 4 hours south of Sydney. I didn't become interested in computers until relatively late when I started studying mechanical engineering at Sydney University in 1996. I was mostly interested in mathematics and theoretical computer science rather than software engineering until third year when I was lucky enough to have Rob Pike as a lecturer and tutor. I wrote my final thesis on natural language parsing and have maintained an interest in natural language processing ever since.

After university I worked as a consultant, implementing J2EE applications. In 2004 I quit my job and moved to Japan to practice Judo. I worked as an English teacher for a year before starting training full-time. This left me a lot of free time to work on whatever I wanted to, leading to the birth of Ferret.

Do you think your Judo training has affected the approach you take to software development? If so, how?

Dave: Interesting question. Let me first say that I practice Judo more as a sport than as a way of life, so I'm not really into the philosophical aspects of it. As with any athletic endeavor, the most important thing I take from my training is the art of self discipline. Self discipline is a really important part of software development, whether it is the fortitude to stick with a problem until you find a solution or the discipline to write your unit tests first and avoid code duplication.

Another principle that I believe carries over from Judo is that you need to be a jack of all trades and a master of one. In Judo, there are an endless number of problems that you may face so the more techniques you know, the better. But to be a really great Judo competitor you need to have one great technique to beat them all. This is known as your "tokui waza". I think the same applies in software development. "The Pragmatic Programmer" lists "jack of all trades" as one of the characteristics of a pragmatic programmer. I think it is also important to have one or two power tools under your belt that you have mastered and can use to solve the majority of your problems. However, it's important to remember that you don't need to stick with that tool for the rest of your life. You should always be looking for something better.

Leaving the sporting aspect behind, Judo actually means gentle ("ju") way ("do") and is supposed to be a way of life. Dr. Jigoro Kano developed Judo from jujitsu (meaning gentle art) at the end of the 19th century. The two principles he wanted every student to learn were "maximum efficiency" and "mutual benefit". It's pretty obvious how maximum efficiency applies to software development. As for the second, "mutual benefit" is the reason I write open source software and I think it is something most open source developers understand well. By freely releasing my software I gain the benefit of a large community of testers and contributors and they in turn benefit from the use of my software in their projects. This may lead to them having more time to work on their own open source projects which I may benefit from in future. This also has a direct impact on the first principle of "maximum efficiency" as there are fewer solutions developed for each problem. I think that there are still a lot of projects out there that would have a lot to gain by going open source.

How did you discover Ruby?

Dave: In my last year of university, one of my courses required each student in the class to present a book on software engineering. I was lucky enough to be assigned "The Pragmatic Programmer" by Dave Thomas and Andy Hunt and it has remained my favourite book on software engineering ever since.

When I quit consulting I wanted to start building some of my own web applications and I thought there must be something easier than the J2EE stack I'd been using (Struts/EJBs). I started with WebWork and Hibernate, reading a book called "Java Open Source Programming: with XDoclet, JUnit, WebWork, Hibernate". Funnily enough the source code I downloaded for the book actually included some Ruby code to graph the Java classes. I wondered what these strange ".rb" files were and I was intrigued by the succinctness and beauty of the code.

A quick Google search turned up the blog of some Danish guy talking about how quickly he had built this "Basecamp" application in Ruby. I became even more interested when I discovered that Dave Thomas and Andy Hunt were big Ruby fans. While it seemed almost perfect, there were unfortunately two problems: Rails had yet to be released and there was no Apache Lucene equivalent in Ruby, which was essential for the work I wanted to do. I made a brief foray into Python for a couple of months before my first problem with Ruby was solved with the first public release of Rails. I decided to solve the second problem myself.

What other languages do you use, and what's the mix of Ruby to other stuff?

Dave: Most of the code (~80%) that I write these days is in C. I'd love to use Ruby for everything. However, I'm a great believer in using the right tool for the right job and no single programming language will be a good fit for all tasks. I quickly learned that Ruby was no good for the kind of data processing that I needed to do but at the same time, it was very easy to extend Ruby with C and the combination of the two is extremely powerful. As a consultant I did a lot of Java programming. Other than that I'm always looking at new languages. This year I've been doing a lot of playing around with Lisp and I'm fascinated by Lua, particularly the fact that it's implemented in one third the number of lines of code as Ferret.

Have you been reading Ola Bini's posts about the intersection of Lisp and Ruby? Do you think he's on to something?

Dave: I have read them and there were some interesting views in the comments section. Going back to an earlier post, he generated a bit of heat for saying:

"But it's not until Ruby entered the common programmers mind that Metaprogramming actually starts to become common place."

I can see why this upset people but I understand what he is saying. Prior to Rails, Ruby was a little known language and I think the Ruby community was mostly made up of the inquisitive type of user who is more likely to experiment with advanced language features like meta-programming. For this reason, meta-programming seems to be a little more common than in some of the already popular meta-programming-friendly languages like Perl and Python. Then Rails comes along and you get all these users coming to Ruby for the Rails framework rather than Ruby's language features and a lot of these users are starting to play with meta-programming for the first time.

Now going back to your question, the Lisp community is still made up of the advanced types. Most users are scared away by the "ugly" syntax (which you quickly get used to). Once you get over this small hurdle it is a small jump, thanks to the syntax, to understanding and using macros. You see them everywhere in Lisp and Lisp programmers generally know how to use them. Adding Lisp-style macros to Ruby in a way that they fit seamlessly into the language would be very difficult and I can't see it happening although I'd like to be proved wrong. Perhaps this is a feature best left to an add-on library.

Can you tell us a bit about Ferret? (What is it? Why did you decide to write it?)

Dave: Ferret is a powerful information retrieval library much in the same vein as Java's Apache Lucene. As I said earlier, one of my initial reservations with Ruby was the lack of a really good search library. Python already had two ports of Lucene: Lupy, a pure Python port, and PyLucene, which uses SWIG to bind a gcj compiled version of Lucene. Anyway, I decided what better way to learn Ruby than to jump in the deep end by porting Lucene. I knew right off the bat that there were performance problems with Lupy, so I'd have the same trouble in Ruby, but I thought I could simply apply the 80/20 rule and rewrite the bottleneck in C. The initial port to Ruby took me about a month (a major credit to Ruby considering I was new to the language) and covered about 80% of the Lucene API. Unfortunately the 80/20 rule didn't quite work out for me as I'd hoped. After rewriting about 40% of the code in C, I was only able to achieve a modest 4x speed up. Hence, the next incarnation of Ferret involved a full rewrite in C. This time I got the performance I was looking for. However, by this stage I had been using Ruby long enough to see that the Ferret API was decidedly Java-like. Also, after 2 full ports of Lucene, I started to see areas in the algorithm that could be improved. This and other reasons led to a departure from the Lucene file format to create Ferret as it now stands.

Can you give us an example of the kind of interface changes you're talking about?

Dave: Well, think of how documents are added to the index in Lucene.


 Document doc = new Document();
 doc.add(new Field("path", filePath,
         Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO));
 doc.add(new Field("content", fileData,
         Field.Store.YES, Field.Index.UN_TOKENIZED));
 writer.addDocument(doc);

So this is how Ferret initially looked.

 doc = Document.new
 doc.add(Field.new("path", file_path,
         Field::Store::YES, Field::Index::TOKENIZED, Field::TermVector::NO))
 doc.add(Field.new("content", file_data,
         Field::Store::YES, Field::Index::UNTOKENIZED))
 writer << doc

The first change I made was to get rid of the constants. They are overkill for defining the properties of something; Symbols work a lot better. Another change I made was to actually make the index less dynamic by setting up the fields before they are added. This may seem like a strange way to go in a Ruby library but it actually makes things a lot tidier.


 # this gets run once to create the index
 field_infos = FieldInfos.new(:term_vector => :no)
 field_infos[:content] = FieldInfo.new(:index => :untokenized)

 # now simply add fields like this
 writer << {:path => file_path, :content => file_data}
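For readers who haven't seen the pattern, here is a rough sketch of how symbol-based field settings with up-front declaration can work. It is a hypothetical toy (the class name ToyFieldInfos, the option names, and the allowed values are all assumptions chosen to mirror the snippet above), not Ferret's real code, which lives in C. It also takes plain hashes where Dave's example uses FieldInfo objects.

```ruby
# Hypothetical toy class; Ferret's real implementation differs.
class ToyFieldInfos
  DEFAULTS = { :store => :yes, :index => :tokenized, :term_vector => :no }.freeze
  ALLOWED  = {
    :store       => [:yes, :no],
    :index       => [:tokenized, :untokenized],
    :term_vector => [:yes, :no]
  }.freeze

  # Index-wide defaults are given once, when the index is created.
  def initialize(defaults = {})
    @defaults = validate(DEFAULTS.merge(defaults))
    @fields = {}
  end

  # Declare per-field settings up front; anything unspecified uses the defaults.
  def []=(name, options)
    @fields[name] = validate(@defaults.merge(options))
  end

  # Settings for a field; undeclared fields fall back to the defaults.
  def [](name)
    @fields.fetch(name, @defaults)
  end

  private

  # Symbols are checked in one place, so a typo fails loudly.
  def validate(options)
    options.each do |key, value|
      allowed = ALLOWED.fetch(key) { raise ArgumentError, "unknown option #{key.inspect}" }
      raise ArgumentError, "bad value #{value.inspect} for #{key.inspect}" unless allowed.include?(value)
    end
    options
  end
end

infos = ToyFieldInfos.new(:term_vector => :no)
infos[:content] = { :index => :untokenized }
p infos[:content]
p infos[:path]
```

Because the settings are declared once, adding a document can shrink to a plain hash of field names and values, which is where the tidiness Dave mentions comes from.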

You said that you'd seen areas where the Lucene algorithm could be improved, which led to your new file format. Can you give us some insight into the kinds of changes you made to the internals and how they affected performance?

Dave: Firstly, for some background on Lucene's indexing algorithm, check out Doug Cutting's (creator of Lucene) description of the algorithm (from his blog).

The important part to note is that as each document is added to the Lucene index a small in-memory index segment is created for that particular document. Now this seems to make sense as the index will store the data in a very compressed format so you will be able to index more documents in memory before having to do a merge. But this isn't necessarily true as a term occurring in each segment needs to be stored once in each segment. Also, merges are quite expensive so they should be avoided. Instead I have a single hash which I can add new documents to without having to do any merges and I can actually store the same number of documents in memory due to the fact that once a term is seen, it is only stored once. This one optimization made Ferret 5 times faster for some indexing operations. The straight C version of Ferret seems to be consistently an order of magnitude faster than Lucene and sometimes up to 2 orders of magnitude. Unfortunately a lot of this performance difference disappears with the Ruby bindings but Ferret is still consistently faster.
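The single-hash idea can be sketched in a few lines of Ruby. This is only an illustration of the data structure as described above, with invented names; it is not Ferret's algorithm, and it leaves out everything that makes a real index hard (term frequencies, positions, compression, flushing to disk).

```ruby
# Toy in-memory inverted index: one hash for all documents, so each distinct
# term is stored once no matter how many documents contain it.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = [] }
    @doc_count = 0
  end

  # Tokenize naively on word characters and record the document id under
  # every distinct term it contains. Returns the new document id.
  def <<(text)
    doc_id = @doc_count
    @doc_count += 1
    text.downcase.scan(/\w+/).uniq.each { |term| @postings[term] << doc_id }
    doc_id
  end

  # Ids of the documents containing the term (empty if it was never seen).
  def search(term)
    @postings.fetch(term.downcase, [])
  end

  # Number of distinct terms held in memory.
  def term_count
    @postings.size
  end
end

index = ToyIndex.new
index << 'Ruby is fast enough'
index << 'Ferret makes Ruby search fast'
p index.search('ruby')   # prints [0, 1]
```

Adding a document never triggers a merge here: it appends ids to existing posting lists or creates new ones, and a term shared by many documents takes up one key.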

Now the interesting question is, what if I built Ferret in pure Ruby using the same algorithm? Actually, C really shines in this task, not because of its execution speed but because of the fine grained control you have on memory allocation. I don't think my algorithm would translate as successfully back to Java either. Having said that, I do think it would be possible to build a search library in pure Ruby that comes close (within about 5 times speed difference) to Lucene. Throw in a bit of RubyInline and you would have a very nice little library.

To do this though, it's not just a matter of finding a great algorithm; it's important to find an algorithm that fits Ruby well.

With the guts of Ferret written in C, it's not going to be accessible to JRuby. Any thoughts about how to port/maintain a JRuby branch of Ferret?

Dave: I think that one of the advantages of using JRuby is that you have access to Java libraries so you may as well use Lucene. Or perhaps you could set up a Ferret index server using DRb. On that point, I'm thinking about building an object database which uses Ferret internally for its indexes. This would ideally be accessible to a number of different languages including possibly even Java (and therefore JRuby).
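As a sketch of what the DRb idea might look like, the following wraps a stand-in search object in a DRb service. ToySearchService is a made-up placeholder with a naive substring search; in Dave's scenario a Ferret index would sit behind the front object instead. Whether JRuby can talk to such a server depends on its DRb support, which this sketch doesn't test.

```ruby
require 'drb/drb'

# Hypothetical stand-in for a real index; a Ferret index object could sit
# behind the same kind of front object.
class ToySearchService
  def initialize
    @docs = []
  end

  # Store a document and return its id.
  def add(text)
    @docs << text
    @docs.size - 1
  end

  # Ids of the documents whose text contains the word (case-insensitive).
  def search(word)
    @docs.each_index.select { |id| @docs[id].downcase.include?(word.downcase) }
  end
end

# Port 0 asks the OS for any free port; DRb.uri reports the one chosen.
DRb.start_service('druby://127.0.0.1:0', ToySearchService.new)
server_uri = DRb.uri

# A client only needs the URI to get a proxy for the front object.
remote_index = DRbObject.new_with_uri(server_uri)
remote_index.add('Ferret is a search library')
remote_index.add('JRuby runs on the JVM')
hits = remote_index.search('jruby')
p hits

DRb.stop_service
```

In a real setup the server and client would be separate processes, with the server on a fixed, known port and the client calling DRbObject.new_with_uri with that URI.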

Since speed is obviously important to you, would you tell us a bit about your approach to code optimization? What tools and approaches are you using?

Dave: I spend a lot more time thinking about the algorithms than anything else. I use gprof to find the bottlenecks in my code and try to rework the algorithm so that that part of the code gets called less. Then I may try and optimize the code but only in extreme cases. The other tools I really like for C development are gdb and valgrind. For those who don't know, valgrind is a debugging and profiling tool which is particularly good for finding memory related errors in C programs. I usually use it for debugging rather than profiling and I don't know how I lived without it. Unfortunately it doesn't play nice with Ruby as Ruby's garbage collector throws up a lot of red flags so I've had to overcome this by building a pretty large suppression file to get valgrind to ignore all of the Ruby errors. I still worry that I'm also suppressing errors that could be raised by Ferret but it seems to be doing a good job. Another tool I'm really starting to like is gcov which is great for checking test coverage as well as profiling.

Have you seen the work Mauricio and Jamis have done with GDB and Ruby? Is driving Ruby from GDB (or vice versa) likely to be something you add to your toolkit?

Dave: It's something I'm already playing around with. It's a great way to explore Ruby's internals, although with Jamis's gdb.rb extension you can use gdb without knowing much at all about Ruby's internals. It's really clever the way Jamis used pipes to communicate with gdb. I'll definitely be looking for places to use that technique in the future.
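The pipe technique itself is easy to try out. The sketch below is not Jamis's gdb.rb and doesn't run gdb at all; to stay self-contained it drives a tiny Ruby child process that answers each line it receives, standing in for an interactive tool. The class name PipeDriver and the child script are inventions for this example.

```ruby
require 'open3'
require 'rbconfig'

# Stand-in for an interactive tool such as gdb: reads commands line by line
# and answers each one on stdout.
CHILD_SCRIPT = '$stdout.sync = true; while (line = $stdin.gets); puts "ok: #{line.chomp}"; end'

class PipeDriver
  # Start the child with pipes attached to its stdin and stdout.
  def initialize(*command)
    @stdin, @stdout, @wait_thread = Open3.popen2(*command)
  end

  # Send one command down the pipe and read back one line of reply.
  def ask(command)
    @stdin.puts(command)
    @stdin.flush
    @stdout.gets.chomp
  end

  # Closing stdin lets the child see end-of-file and exit; then wait for it.
  def close
    @stdin.close
    @stdout.close
    @wait_thread.value
  end
end

driver = PipeDriver.new(RbConfig.ruby, '-e', CHILD_SCRIPT)
reply = driver.ask('print 1 + 1')
status = driver.close
puts reply
```

A real debugger doesn't answer with exactly one line per command, so a driver for gdb would need to read until it sees the prompt (or use a machine-oriented output mode) rather than calling gets once.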

Have you looked at all at the Rubyland versions of these kinds of tools (rcov, ruby-prof, etc.)? Are there other development tools you'd like to see in the Ruby environment?

Dave: ruby-prof is great. When I implemented the first version of Ferret in pure Ruby I tried using the standard profile library but it was way too slow. Finding ruby-prof was a godsend, it is light-years faster and a lot more accurate when profiling code with extensions. I haven't done much with rcov yet but finding it on Mauricio's blog was what actually led me to find gcov. I'll definitely be making use of it in the future.

Interested in sharing your valgrind suppressions file? I know Zed Shaw and I (among others) have both been looking at using Valgrind with Ruby.

Dave: Sure, it's stored with Ferret in my subversion repo, though it isn't very portable at all, as it refers to my version of glibc and ld. I haven't looked into it yet but it may be possible to write a more portable version using regular expressions or something.

Have you looked at profiling-guided compilation for Ferret? Do you think this is a good approach for someone building it themselves?

Dave: No. My gut feeling is that the performance gains wouldn't be worth my trouble as I'm still too busy working on the code and I'm not releasing a binary anyway. As far as other users go, I think in most cases they'd be better off spending their time on the Ferret mailing list working out how best to set up their index for optimal performance. The one situation where I think profiling-guided compilation would be worth the trouble is in a desktop application. I had considered developing a desktop search application similar to OS X's "Spotlight" for Windows but Google beat me to the punch with Google Desktop.

Which project or projects out in the Ruby community do you envy, and why?

Dave: That's an easy question. I really envy the Rails community because of its success and the number of developers they have working on the core of Rails. I'd like to think it is due to the nature of the project (web-app versus information retrieval library) but I have to admit that a lot of the success of Rails also comes from the excellent marketing skills of DHH. I think marketing is a very important skill to have as an open source developer because you are going to have to do all of the marketing yourself. For example, I'm also a big fan of the Nitro/Og framework but I don't think it will ever see the success of Rails. Not that it needs to, but it is important to attract enough attention to the project so that if the lead developer decides to run off and join the circus, there will be someone to take the reins. I'm not so sure that would happen with Ferret yet (so the circus will have to wait).

What are your 5 favorite libraries for Ruby?

Dave: I'm a big fan of Ryan Davis's work, especially RubyInline and ParseTree. Studying these libraries is another great way to learn about Ruby's internals. I really like Why's HTML parser hpricot. It's still in the early stages of development but it is the perfect companion to Ferret when it comes to scraping and indexing websites. RMagick is another great library. Lastly, (I should include a pure Ruby library) I'm currently looking at Jamey Cribbs's persistent storage library Mongoose. Databases are overkill for a lot of applications people are using them for these days so Mongoose is definitely something worth looking into.

What do you think is the next big thing for Ruby?

Dave: Hopefully Ruby 2.0. I'd like to see it sooner rather than later although I think it is still a long way off. I think the performance improvement will really boost Ruby in the eyes of some of its detractors; I just hope no one is expecting Java-like performance. Speaking of Java, JRuby is starting to look like an attractive alternative now that Sun is getting behind it and Charles Nutter and Thomas Enebo are working on it full time.

What's next on the horizon for you?

Dave: I'm really keen to implement an object database in C with built-in full-text search based on Ferret. A lot of the problems people are currently having with Ferret are due to the problems with keeping the index in synch with the database. The current solution isn't very DRY since you are storing data in two different places, the database and the Ferret index. Combining the two would make life a lot easier for developers using Ferret, not to mention the performance improvements that you could get with a good object database bound to Ruby. I just need to raise the funds. ;-) I'm also currently working on another very interesting project with Benjamin Krause although I'm not at liberty to say what that is just yet.
