Monday, December 22, 2008

Author Interview: Relax with CouchDB

O'Reilly is at it again, getting an open content book out there to cover an emerging technology. This time it's Relax with CouchDB by Chris Anderson (@jchris on Twitter), Jan Lehnardt (@janl on Twitter), and Noah Slater (@nslater on Twitter). The book is also available as a rough cut, if you'd like to support O'Reilly and the book.

Last spring, Jan came out to MountainWest RubyConf to talk about CouchDB, and I've been interested ever since. Now that the book is on the way, I asked Chris, Jan, and Noah to sit down for a quick interview. Here's how it went:


I know this is the title of your first chapter, but I feel like we should start things out with it here too: Why CouchDB?

Noah I'm a bit of a hypertext fetishist. Serving up documents via HTTP is great, but I became interested in CouchDB because it lets me serve up documents, and then lets me edit them too! There are a few other technologies, like WebDAV, that let you do that over HTTP, but here is this amazingly elegant solution that doesn't require protocol extensions or any other annoying cruft.

Of course, CouchDB also happens to be a very powerful database! A document database isn't for everyone, but if your application revolves around organising and serving up documents, CouchDB hits a real sweet spot.

Chris I'm pumped about CouchDB because it has the potential to create a bunch of new freedom. Its replication feature makes offline databases at least as powerful as hosted data. Because everyone already knows the API (whether they've heard of it or not), writing apps on it is (will be) child's play. I say "will be" because right now we're on the edge of it. CouchDB is not yet 1.0, so we are still learning how to write apps against it. I am focussed on finding the simplest path from user to documents. Ajax seems to be the answer.

Jan I gave a few presentations on CouchDB. In the introduction, I usually ask who has built database-backed web applications. I usually get 80-90% hands. I follow up with "... and who likes it and doesn't have any issues with the database?". Most hands go down and I get a few laughs.

This is a nice skit to start a presentation with and win an audience's sympathies, but it also gives RDBMSes a bad reputation, which they do not really deserve.

Today, relational databases are used everywhere on the web (where I'm coming from). For a long time they were the only sensible choice for solving a lot of the problems you get with concurrent access to your data under high traffic. But as demand grows, relational databases begin to be used in ways they were not intended for. JOINs are broken up, data is de-normalized. Disk space and insert speed are sacrificed for concurrent read speed, and so on. They are no longer the ideal technological choice.

CouchDB is built for the web scenario. Storing huge amounts of semi-structured data is the default case in CouchDB. It makes excellent use of modern multi-core machines and multi-machine setups. CouchDB views are built using Map/Reduce, the concept that made Google. Replication allows a user to take data offline and work with it locally, without a network connection; and it can be used to synchronize machines in a load-balanced or highly-available setup or both. And the HTTP REST API makes talking to CouchDB as easy as opening a browser.

And it is written in Erlang, which gives us high concurrency (20k requests per second on a single machine?), fault tolerance and live code-upgrades. Erlang is worth an entire interview in itself.

Oh hey, that isn't exactly a concise answer and I didn't even tell you half of the good stuff :-)
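To make the Map/Reduce idea Jan mentions a little more concrete, here is a minimal sketch in plain Python. It only imitates the shape of a view: a map function emits key/value pairs per document, and a reduce function collapses the values for each key. The documents, the function names (`map_tags`, `reduce_sum`, `run_view`) and the simplified reduce signature are all assumptions made for illustration; this is not CouchDB's actual view API, and CouchDB view functions are normally written in JavaScript.

```python
from collections import defaultdict

# Hypothetical documents; in CouchDB these would be JSON documents in a database.
docs = [
    {"_id": "post-1", "type": "post", "author": "jan", "tags": ["couchdb", "erlang"]},
    {"_id": "post-2", "type": "post", "author": "chris", "tags": ["couchdb", "ajax"]},
    {"_id": "comment-1", "type": "comment", "author": "noah"},
]


def map_tags(doc):
    """Map step: emit one (tag, 1) pair for every tag on a post document."""
    if doc.get("type") == "post":
        for tag in doc.get("tags", []):
            yield tag, 1


def reduce_sum(values):
    """Reduce step (simplified): add up the values emitted for one key."""
    return sum(values)


def run_view(documents, map_fn, reduce_fn):
    """Run map over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Keys are sorted so the result is ordered by key.
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


tag_counts = run_view(docs, map_tags, reduce_sum)
print(tag_counts)  # {'ajax': 1, 'couchdb': 2, 'erlang': 1}
```

The comment document emits nothing, which mirrors how a map function can simply ignore documents it does not care about.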

You're approaching this book in a very open fashion (the book is online and available as you write it). How did you convince O'Reilly to go that way?

Jan We are developing CouchDB. Writing documentation comes with development. We figured that if we have to do the writing anyway, we don't want to do it twice.

O'Reilly is pretty open about (heh) open books. They have the Subversion book that everybody knows and recently released a book on Haskell. They have experience doing this kind of thing and they have seen good results in the past. When we asked about the possibilities, they just said "sure".

Chris As Jan says, it wasn't hard. O'Reilly understands the value of open information, and the value proposition around publishing, which is part of why they've done so much to enter the wider arena of sharing ideas.

Jan Noah ultimately pushed us in the direction of writing the book in the open. Once decided, I couldn't see it done any other way and I am glad for the pushing.

Noah We had been talking about doing a book for a while and I was always very adamant that whatever we did would be released under a free license.

The biggest deficiency in free operating systems is not in the software—it is the lack of good free manuals that we can include in these systems. Many of our most important programs do not come with full manuals. Documentation is an essential part of any software package; when an important free software package does not come with a free manual, that is a major gap. We have many such gaps today.
   —Free Software and Free Manuals, Richard Stallman

Since Stallman wrote this essay, the technology publishers have started to wake up a little bit. A growing number of manuals and books are being released under free licenses, and this is absolutely marvellous. O'Reilly is really leading the way on this, so we were very lucky to get a deal with them.

Our editor told us a surprising rule of thumb: releasing a good book under a free license makes it sell more copies, and releasing a bad book under a free license makes it sell fewer copies. Let's hope that we are the former!

What benefits do you expect from this approach?

Chris For me, the hardest part is knowing what people just learning CouchDB will need to know. The authors mostly communicate with people who already understand and use CouchDB, but I'd like the book to be capable of drawing you in, even if you're new, so that you feel comfortable building Couch apps.

Noah I am very lucky to be writing the book with two people a lot smarter than me. When I make mistakes or write something a little silly, I get corrected. Similarly, when you develop something in the open, you get all these really bright people reading your stuff and picking you up on things that you've missed or didn't think about properly. Open collaboration is hugely beneficial like that.

Jan That the world fixes our typos. Oh wait, no! Actually, that's what our editor will take care of. You saw in my answer to the "Why CouchDB" question that it is not a one-liner. Add a very diverse potential readership and you get even more ways to put things. We can't cater to everybody, but we are trying to make this book worth a read for a lot of people, from developers to administrators, system architects and hardcore RDBMS lovers.

By opening the writing process, we get decent feedback very early on and as a result will produce a much better book. Besides, we have been open source developers since forever; we just couldn't do it any other way. Release early, release often.

So far there's been a lot of feedback on the book's mailing list. How is that affecting your writing process?

Jan In early drafts you know that there are passages that try to explain an idea and that these passages don't do a good job. If you know the topic at hand pretty well, you might not notice immediately. Public feedback will tell you pretty quickly what works and what doesn't, and that helps with the review process. The writing process, at least mine, is not really affected.

Noah I think the most significant thing I have taken away from this process is the level of confusion caused when trying to compare CouchDB with traditional databases. I mean, fundamentally, this comparison makes about as much sense as comparing cheese and bread. Sure, you can eat both of them, and one may be applicable in one situation and not in the other, but they also taste really nice when you put them together. That's kind of how I see the CouchDB versus RDBMS debate. Meaningless, essentially. CouchDB isn't a panacea, nor are relational databases, and both of them have their uses. I think we're going to work on improving this clarification in the book.

Having said all that, we've only been taking comments for five days now and the response has been just overwhelming. Who knows what other issues, or points of confusion we're going to find along the journey? I'm pretty sure whatever happens, it's going to be just as rewarding. Thanks for the input everyone!

Chris I've been happy about how people are reacting to the figures. I basically hand-sketched some basic drafts of technical drawings for the book. I was thinking, "what's the simplest thing that could possibly work?" and some of the drawings were even captured with the iPhone built-in camera.

I was surprised because people seemed to like the hand-made feel. I've started refining that style, with the rough drafts for the code-example sections. I'm a big fan of the Head Rush books, so I'm hoping that when O'Reilly brings their artists in, they can approach it with the same playful style.

You've posted three chapters of the book already. What should we be looking for next, and how soon should we be seeing it?

Noah We're aiming to submit another six chapters to O'Reilly in early January, so you should see things trickle through to the website as we complete that process.

Jan We are aiming some time in January for the next batch. We are trying to do the book sequentially and the ToC is online already, so you can see what's coming :o)

Chris We're working on a description of the example blog application we'll be releasing. The app right now is pure-Ajax, but by the time CouchDB hits 1.0, we'll be able to do major portions of it without JavaScript at all. CouchDB is getting HTML generation capabilities, so expect that to change some of how we build applications.

What kinds of projects should developers be looking at CouchDB for?

Jan After my talks or in discussions with developers I often get asked what "niche" CouchDB fits in. I usually say "The Web" and that's me trying to be funny again because the web is nowhere near a niche.

Everything that stores messy data that users (no offence) submit (Facebook, Flickr, YouTube etc.) and needs efficient access to that data. Systems that handle document structures (CMSes, blogs ...). Situations where offline work is preferable. The internet is available everywhere, except when the Wi-Fi is not working or the hotel charges $5 a minute. Not using The Net to get to your data is usually faster, too.

Finally, databases are a fairly boring topic (among non-database nerds). CouchDB makes databases fun again.

Chris Yes and +1 to that, Jan!

Look at replication, and think about the new opportunities it opens up. Think about applications with the power of the desktop (and location) but written in the language of the web. HTML, JavaScript, REST, JSON: these have become essentially the lowest common denominator for web services. When offline mode is not a service downgrade, and the source code is at the desktop, people have new affordances. I'm hoping Couch apps become the new Excel macro.

While CouchDB is written in Erlang, there are a lot of libraries to use it from other languages. Which ones have you dealt with? What's good and bad about them? Which ones will show up in your book?

Chris We're concentrating on the HTTP API for the book, so most of our examples are written in JavaScript or curl. However, there are good libraries in most languages these days, so to the extent a language handles dynamic JSON-like objects well (and can access HTTP), it's a good fit for CouchDB.
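As a rough illustration of what "JSON objects over HTTP" looks like from a client, here is a small Python sketch that builds, but does not send, the requests for storing and fetching a document. It assumes a CouchDB server on its default port 5984; the database name `blog`, the document ID `hello-world`, the document contents and the helper names are made up for this example.

```python
import json
import urllib.request

# Assumption: a local CouchDB server on its default port. Nothing is sent here.
BASE_URL = "http://127.0.0.1:5984"


def build_put_document(db, doc_id, doc):
    """Build a PUT request that would store `doc` as JSON under /db/doc_id."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_get_document(db, doc_id):
    """Build a GET request that would fetch the document at /db/doc_id."""
    return urllib.request.Request(f"{BASE_URL}/{db}/{doc_id}", method="GET")


# Hypothetical database, document ID and content, for illustration only.
put_req = build_put_document("blog", "hello-world", {"title": "Hello", "body": "Relax."})
get_req = build_get_document("blog", "hello-world")

print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

Passing either request to `urllib.request.urlopen` would actually send it, which requires a running server and an existing database.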

Jan I think I wrote three PHP libraries and only one didn't completely suck, but those were the early days. There is the excellent (documented, unit-tested) PHPillow that has been extracted from a real-world project, which is always a good thing.

I have used couchdb-python by our very own contributor Christopher Lenz for a number of projects now and it is very solid. If you are more of a Twisted person (heh), David Reid hosts Paisley.

Good and bad? CouchDB introduces new paradigms and we are still in the process of finding out what client-library abstractions work best (hint, ActiveRecord doesn't work well at all). Like all pioneering, this includes some stabbing in the dark. But in general, the libs I've seen are at least well suited for a single job or class of jobs. Maybe there is no one library to rule them all. We'll find out.

If you want to get into CouchDB (and Erlang) what's the best way to learn?

Jan For CouchDB, you don't need to know any Erlang. If you are familiar with the web, you are good to go. Start out by reading through the wiki and of course our book (*cough*). For more specific and tutorial-style documentation, check Planet Couch; it aggregates CouchDB-related blog posts and already includes a wealth of information.

If you still want to get into Erlang, there is Thinking in Erlang, a free 30-page PDF that gives you a rough overview. For deeper diving, Joe Armstrong, Erlang's inventor, wrote an excellent book, Programming Erlang, which is a fun read and packed with all you need to set out writing your first (and second) applications.

Chris I learned Erlang by working my way into CouchDB's source code, based on my experience with the HTTP API. The HTTP callbacks are easy to find, and since you already know what they do, it's a good way to learn Erlang. There are plenty of ways to get into CouchDB without touching Erlang. We're working on a feature that allows you to write arbitrary controller logic in JavaScript, so really, you can customize CouchDB, as well as use it, by programming in any language that understands JSON.
