Monday, October 23, 2006

Author Interview: Joshua Smith

Last week, I posted an interview with Matt Wade, the editor of Practical OCaml. Here's my interview with Joshua Smith, the author.


Joshua, most of the readers here don't know you, would you mind introducing yourself?

Joshua: Sure. My name is Joshua Smith. I am a writer and consultant living in the suburbs of Washington, DC. I was a Unix Sysadmin for several years, and have also been a programmer. Most of my professional experience has been in the financial industry (specifically trading and options clearing). I've been writing in Ocaml for a few years and have done extensive work in Ocaml (obviously), Perl, Python, and Java. I've also dabbled in quite a few non-mainstream languages like Prolog, Forth, and a few LISPs.

Why functional languages?

Joshua: Well, for starters learning a functional language makes you a better programmer. While that may seem somewhat startling, I feel that learning any programming language makes you a better programmer (except BASIC, perhaps). This is also one of the reasons that some languages borrow features and idioms from other languages. Expressiveness and clarity are as important to writers in artificial languages as they are to writers in natural languages.

Another thing is that functional languages allow the programmer a different (and in some cases better) way to reason about programs. Some functional languages (like Ocaml) are also constant languages, which means that they do not have variables in the same way that languages like Ruby have variables. While this can make some things more complicated (keeping state, for example) it creates a situation where you cannot have a variable get "stepped on". Writing programs without mutable values means you have to think about solving problems differently than if you have mutable values.
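To make the point about variables concrete, here is a minimal sketch (not taken from the book; the names total, show_total, counter, and bump are made up for illustration). It shows that a plain let binding cannot be reassigned, only shadowed, and that mutable state has to be requested explicitly with a ref.

```ocaml
(* Bindings introduced with let are immutable: there is no assignment. *)
let total = 10

(* This function captures the binding above. *)
let show_total () = total

(* A second let creates a NEW binding that shadows the old one.
   show_total still sees the value it captured. *)
let total = total + 5

(* Mutable state is opt-in and visible in the type (int ref). *)
let counter = ref 0
let bump () = counter := !counter + 1; !counter

let () =
  Printf.printf "total = %d, show_total () = %d\n" total (show_total ());
  let first = bump () in
  let second = bump () in
  Printf.printf "bump twice: %d then %d\n" first second
```

Running this prints "total = 15, show_total () = 10" and then "bump twice: 1 then 2": the earlier binding was never stepped on, and the only value that changed is the one declared as a ref.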

Why OCaml?

Joshua: Ocaml offers (in my opinion) the best features of functional programming but does not require all of your programming to be purely functional. Ocaml offers a robust object system, for example, so you can use functions or methods depending on which solves your problem more effectively.

Ocaml is also very, very fast. The native-code compiler generates executables that run nearly as fast as C on many platforms. The native-code programs also can be distributed without a runtime, which can greatly simplify deployments.

Why a book on it now?

Joshua: There really haven't been any English-language books on Ocaml published by a mainstream press recently. I think there has been a lot of interest in Ocaml for some time; just look at the ICFP contest winners and you'll see a lot of Ocaml teams. I think the language and the community are now big enough that people are asking "Is there a book on this?"

Functional programming languages don't seem to have spawned many 'mainstream' (non-academic) books. Why do you think your book will succeed in this space?

Joshua: I'm not sure I agree with you. LISP is, by far, the most popular functional programming language I can think of and there are several non-academic books on Lisp. Functional languages have gotten a lot of bad press in the past. There is also the problem that popularity begets popularity. So, when a programmer thinks about starting a new project in a new language she is more likely to start that project in a language that is popular.

To more directly answer your question: I think the book will succeed because there is a demand. When people see things like the ICFP contest and see Ocaml entries they get interested. However, when they look at the documentation available and see the most recent book on the language is very old, they start to rethink: maybe this is just an academic thing. Now there is a book, it's a practical book and I think people are going to be very happy about it.

What will it take to see more 'mainstream' books on functional languages in general and OCaml in specific?

Joshua: I think this will come when more people become less dogmatic about design issues. One of the things that I think is missing from a lot of developers' minds is the concept of ecology. Programs exist and run in an ecology of systems, more so now than ever. This also means that programmers should have a freer hand to choose languages based on how they help fill an ecological niche or function, rather than how they fill a political function. I accept that I am an idealist.

If the demand is there for Practical OCaml will you be doing a second book? If so, what aspect(s) of OCaml would you like to write about?

Joshua: That is a very good question. At this point there has been no talk of another book. But if there is talk, I would be happy to write another book. One of the things I would like to do is create a book-length program and annotate it. This is more in the style of Knuth's MIX machine and language, which is pretty ambitious. But I think that more programmers would benefit from that kind of work than a cookbook or similar.

If you were to imagine your ideal book on OCaml, what would it be about? Who would write it?

Joshua: Probably the book I mentioned above. A book-length program that is fully annotated, documented, and described. This would include everything from design to distribution and maybe even end-of-life. One of the things I feel a lot of programming books leave out is the "OK, what now?" part of programming. Think of it this way: simply being literate in English does not mean you can write a novel. In order to create in any language you need more than simple literacy; you need exposure to other works (perhaps great works), you need to know more. This "more" is what I would like to see in a programming book.

I tried to include some of that "more" in Practical Ocaml, but there is only so much one book can do.

Why should people be interested in OCaml?

Joshua: It's a great language. Learning a language like Ocaml is also a great way to deepen your understanding of why types are important. Type-safety is, I know, a real minefield of an issue for a lot of people. Personally, I used to think that type safety was a waste. Then I found that strong, static checks were a great way to have better (and more reliable) code. I've never looked back.

Another thing that people would be interested in is tools for creating and maintaining Domain Specific Languages (or DSLs). I have seen some meta-programming in Ruby, but Ocaml has facilities that are well beyond what most people think of when they think of "extending" a given language. Camlp4 is a tool that can let you alter or extend the syntax of Ocaml programs. This goes way, way beyond macros and template systems found in other languages. Ocaml also has a more traditional Lex/Yacc toolkit that allows you to build languages, processors and whatnot from scratch.

Most of the readers here are probably Ruby hackers. What interest does Practical OCaml hold for them?

Joshua: Ocaml has a very different approach to solving problems than Ruby does. Ocaml is strongly and statically typed. Ocaml has objects, but really is more of a functional language. Ruby includes some functional programming idioms, like Lambdas, but is really an OOP language.

The type issue (I think) would be the biggest draw. Ruby supports operator overloading and runtime type identification. Both of these allow for rapid development and some very nice idioms for code. They can also allow for some very strange errors and other problems (many of which can be addressed by unit testing, but not all). It's also important to point out that Ocaml has a notion of type and type safety that is worlds away from C or Java. I've seen a lot of people use C/C++ as their measure for static typing and that doesn't work well when talking about Ocaml. Types are inferred automatically in Ocaml (meaning you don't have to declare a string to be a string, or a float to be a float). That's not all, but type is a pretty big deal in Ocaml.
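As a rough illustration of inferred types and the absence of operator overloading, here is a small sketch (the function names greet, average, and pair_up are invented for this example and do not come from the book). None of the definitions carries a type annotation; the types in the comments are what the compiler works out on its own.

```ocaml
(* No annotations anywhere: the compiler infers every type below. *)
let greet name = "Hello, " ^ name          (* string -> string *)
let average a b = (a +. b) /. 2.0          (* float -> float -> float *)
let pair_up x y = (x, y)                   (* fully polymorphic pair builder *)

(* Arithmetic operators are not overloaded: + works on int and +. on float,
   so mixing them is rejected at compile time instead of failing at run time.
   Adding an int to a float, as in 1 + 2.5, would not compile. *)

let () =
  print_endline (greet "Ruby hackers");
  Printf.printf "%.1f\n" (average 3.0 4.0);
  let (n, s) = pair_up 1 "one" in
  Printf.printf "%d %s\n" n s
```

Loading the same definitions into the ocaml toplevel is a quick way to see the inferred signatures printed back.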

As an ex-systems administrator, what role do you see for OCaml in that space?

Joshua: Ocaml is a really fantastic language for admins. The ability to generate native code is great, Ocaml interfaces very well with C and has great tools for dealing with text. Top all of this off with the ability to easily create and maintain DSLs and you've got a big win for admins. For example, in the book there is an application that processes complex (multi-line) log files and the definition of a tripwire-like utility and associated language. I've seen (and done) these kinds of things in Perl and Python and it is so much better in Ocaml that it's not even funny.

Ocaml is also a great systems programming language because it focuses so much on safety. Having a security flaw in an administrative utility can be a very large problem. Especially since most security flaws are really just bugs. Ocaml eliminates a lot of the things that make secure programming hard while not handcuffing you in the process.

Could you give us some (short) examples of OCaml code?

Joshua: Sure, I've attached a small program that prints a report of how much disk is being used by what person on a unix system. It includes documentation.

disk_hogs.ml


(** Program to return the disk usage (by uid) down a given path.  This 
application recurses through directories, but does not traverse symlinks.
@author Joshua B. Smith 
@version 0.3
*)


(** {2 Text Output of this program } 

{v
josh\@bebop: ocamlopt unix.cmxa disk_hogs.ml execfile.ml
josh\@bebop: sudo ./a.out /home/
ftp uses 0 MB
www-data uses 27 MB
root uses 16198 MB
Unknown uses 3 MB
Unknown uses 5 MB
josh uses 10874 MB
v}
*)

(** {2 The Code } *)

(** [traverse dlist acc] traverses all of the directories listed in dlist, and recurses into subdirectories. 

@param dlist a list of strings containing directory names
@param acc accumulator
@return a (string * Unix.LargeFile.stats) list which is (filename,lstat)
@raise Unix.Unix_error Can be raised
@raise Sys_error can also be raised
*)
let rec traverse dlist acc = match dlist with
    [] -> acc
  | h :: t -> let listing = Array.to_list 
      (Array.map (fun x -> let fname = Filename.concat h x in
      (fname,Unix.LargeFile.lstat fname)) (Sys.readdir h))
    in
    let dirs = List.filter 
      (fun (fname,f_stat) -> f_stat.Unix.LargeFile.st_kind = Unix.S_DIR)
      listing
    in
      traverse
        (t @ (List.map (fun x -> (fst x)) dirs))
        (listing @ acc);;

(** [calcvals statlists acc] Takes a list of (filename,lstat) (usually from {!Disk_hogs.traverse}) and totals the file sizes per owning uid.
@param statlists (string * Unix.LargeFile.stats) list
@param acc accumulator
@return a (string * int64) list which is a list of (username, disk usage) tuples
*)
let rec calcvals statlists acc = match statlists with
    [] -> List.map (fun (id,size) -> let name = try 
        (Unix.getpwuid id).Unix.pw_name
      with Not_found -> "Unknown" 
      in (name,size)) acc
  | h :: t -> let (u,other) = List.partition 
      (fun (y,x) -> x.Unix.LargeFile.st_uid = (snd h).Unix.LargeFile.st_uid) 
      statlists in
    let subtotal = 
      List.fold_left (fun acc (_,elt) ->
          Int64.add elt.Unix.LargeFile.st_size acc)
        Int64.zero
        u
    in
      calcvals other (((snd h).Unix.LargeFile.st_uid,subtotal) :: acc);;

(** [runreport startdir] Builds a report using {!Disk_hogs.calcvals} and
    displays the result neatly.

    @param startdir Directory to start from
    @return unit
*)
let runreport startdir = List.iter (fun (i,s) ->
    Printf.printf "%s uses %Li MB\n" i
      (Int64.div (Int64.div s 1024L) 1024L))
  (calcvals (traverse [startdir] []) []);;

execfile.ml


let _ = Disk_hogs.runreport Sys.argv.(1);;

Are there any blogs or websites that someone interested in OCaml should be watching?

Joshua: Absolutely,

  • First there is the main Ocaml site. Then there is www.ocaml-programming.de, which is home to a lot of utilities and the GODI Ocaml distribution. There is also the Apress website, which will have all the source code from my book and forums, too.
  • comp.lang.functional and comp.lang.ml are both USENET groups that cover a lot of functional programming and ML (including Ocaml) topics. Oh, and Jon Harrop's web site (www.ffconsultancy.com/index.html) and Markus Mottl's website (www.ocaml.info) also include a bunch of stuff.
  • I would also be remiss if I didn't include a link to my technical editor for the book, Richard Jones (www.merjis.com). He helped out a great deal, and he is a _very_ sharp guy.
I'm sure I'm forgetting someone...

What technical books are on your list to read right now? What about non-technical books?

Joshua: Skyttner's "General Systems Theory" and "The Art of Error Correcting Coding" by Robert H. Morelos-Zaragoza are the next two technical books on my list, after I'm done reading "Stumbling On Happiness" by Daniel Gilbert and a re-read of "Slaughterhouse 5".

What books (technical or not) have influenced you the most?

Joshua: This is a really difficult question. I'm a voracious reader and I have been lucky enough to have many books influence my life. On the technical side, I would have to say "Object Oriented Software Construction" by Meyer changed my programming. I didn't "get" OOP until I read that book and it changed how I wrote programs. On the other hand, Holland's "Adaptation in natural and artificial systems" changed the way I solve problems. Personally, it would probably be a dead heat between "Getting To Yes" and "Pirke Avot" as to which book had the greatest and most broad influence on how I live, but that's a longer story.

If you ask me tomorrow you might get a different list on the technical books. While I'm a lover of books and the printed page, I am an inconstant one.

Outside of OCaml, what programming languages should a Rubyist, who probably already knows a bit of C and/or Java, learn?

Joshua: Honestly, I would probably say Forth. I know, it's a strange choice. But Forth, as a stack based language, is fundamentally different than the ones you listed. It also can help a programmer better understand using stacks for computation and storage. So many applications have a stack component to them, that a deeper understanding of stacks and queues is always a good thing. Besides, it's a super easy language to learn.
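For readers who have not seen a stack-based language, the idea of using a stack for computation can be sketched in a few lines. The example below is written in OCaml rather than Forth, to stay with the language used elsewhere in this interview. It is only an illustrative toy (the names step and eval and the tiny word set are assumptions, not a real Forth), evaluating a postfix program with a list as the stack.

```ocaml
(* A toy postfix evaluator. The stack is an int list with the top at the head.
   Supported words: integers, addition, subtraction, multiplication, dup, swap. *)
let step stack word =
  match word, stack with
  | "+", b :: a :: rest -> (a + b) :: rest
  | "-", b :: a :: rest -> (a - b) :: rest
  | "*", b :: a :: rest -> (a * b) :: rest
  | "dup", a :: rest -> a :: a :: rest
  | "swap", b :: a :: rest -> a :: b :: rest
  | ("+" | "-" | "*" | "dup" | "swap"), _ ->
      failwith ("stack underflow at " ^ word)
  | n, _ -> int_of_string n :: stack   (* anything else must be a number *)

(* Split the program on spaces and feed each word through step. *)
let eval program =
  String.split_on_char ' ' program
  |> List.filter (fun w -> w <> "")
  |> List.fold_left step []

let () =
  match eval "2 3 + dup *" with
  | [result] -> Printf.printf "%d\n" result
  | _ -> print_endline "unexpected stack"
```

The program "2 3 + dup *" pushes 2 and 3, adds them, duplicates the 5, and multiplies, so the sketch prints 25. Every operation reads its arguments from the stack and leaves its result there, which is the habit of mind a stack-based language teaches.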

If not Forth, I would say Python. Python is definitely one of my preferred languages (sorry Ruby).

8 comments:

Anonymous said...

It is great to see a recent book on OCaml! I'll definitely buy the book. One thing I regret however seeing the code posted and the one downloadable from the APress site, is the poor layout of the code. I would have expected the code shown as an example for others to follow. As an example, here is my take of the "disk_hogs" program -- both clearer and more concise I think. (Sorry for the messy formatting but < pre > is not accepted.)

(** Program to return the disk usage (by uid) down a given path. This
application recurses through directories, but does not traverse
symlinks.
*)

(** {2 Text Output of this program }
{v
shell$ ocamlopt -o disk_hogs.com unix.cmxa disk_hogs.ml
shell$ sudo ./disk_hogs.com /home/
ftp uses 0 MB
www-data uses 27 MB
root uses 16198 MB
Unknown uses 3 MB
Unknown uses 5 MB
josh uses 10874 MB
v}
*)

(** {2 The Code} *)

module U = Unix.LargeFile
open Printf

(** [traverse acc fname] given a filename or a directory [fname],
returns a list of pairs (filename,lstat), recursing into
subdirectories.

@param acc a list of the results of the processed entries so far.
@raise Unix.Unix_error Can be raised
@raise Sys_error can also be raised
*)
let rec traverse acc fname =
let lstat = U.lstat fname in
let acc = (fname, lstat) :: acc in
if lstat.U.st_kind <> Unix.S_DIR then acc
else
let d = Array.map (fun f -> Filename.concat fname f) (Sys.readdir fname) in
Array.fold_left traverse acc d


(** [calcvals acc statlists] Takes [statlists], a list of
(filename,lstat) (usually from {!Disk_hogs.traverse}), and returns
a list of (username, disk usage) tuples.

@param acc a list of the results of the processed entries so far.
*)
let rec calcvals acc = function
| [] -> acc
| ((_,st) :: _) as statlists ->
let id = st.U.st_uid in
let name = (try (Unix.getpwuid id).Unix.pw_name
with Not_found -> "Unknown") in
let (u,other) = List.partition (fun (_,s) -> s.U.st_uid = id) statlists in
let subtotal =
List.fold_left (fun sum (_,elt) -> Int64.add sum elt.U.st_size) 0L u in
calcvals ((name, subtotal) :: acc) other

(** [runreport startdir] Builds a report using {!Disk_hogs.calcvals} and
displays the result neatly.

@param startdir Directory to start from
@return unit
*)
let runreport startdir =
let print (i,s) =
printf "%s uses %Li MB\n" i (Int64.div (Int64.div s 1024L) 1024L) in
List.iter print (calcvals [] (traverse [] startdir))



let () = runreport Sys.argv.(1)

Anonymous said...

I have put a copy of the code here.

Anonymous said...

How does one pronounce the name of the language?

Oh-Camel?

Ock-ah-mal?

Anonymous said...

It's "Oh-Camel".

ML -> Caml -> Objective Caml ("O'Caml").

gnupate said...

Chris, thanks for the comments and the code. I'm not much of an OCaml jock (that's why I'm getting the book), but I'm asking Joshua to take a look at your code and comment back here.

Anonymous said...

I agree with Chris. It's unfortunate that the code does not adhere to the Caml programming guidelines.

Anonymous said...

Great to see another book published about OCaml. For anyone who is interested, I published OCaml for Scientists last year. My book had rave reviews and is selling like hot cakes.

Anonymous said...

It's interesting... I've been 'writing' OCaml code for over 5 years and I'd only consider myself a 'dabbler'... but I really need to say two things:

1. OCaml is a LFSP.
2. The author of this book should only be allowed to write after actually learning how to write.

(P.s. - I'm keeping the book anyway.)