Showing posts with label CouchDB. Show all posts

Tuesday, January 27, 2009

CouchDB Contest

Ok, it's contest time again!

O'Reilly has offered up two free keys to the rough-cut edition of Relax with CouchDB, so Jan and I decided to find a cool way to give them out.

Here's the plan: we want to hear about your CouchDB project ideas. You can write about them in the comments here, or just link to your own blog. Just an idea isn't enough, though; let us know why you think CouchDB would be a good fit for it. You can submit ideas until February 13th, after which Jan and I will look them over and pick two winners.

Have fun, and good luck!

Friday, January 23, 2009

Reader Interview: Relax with CouchDB

As I was preparing to do a second interview with the writers of Relax with CouchDB, I decided to approach the community that's already reading the book (even though only 4 chapters have been released so far). Chris, Jan, and Noah are writing this book in a very open fashion to try to improve the final product and to build a community around it before it hits the shelves. I thought it would be interesting to see what some of the readers thought. I was lucky enough to get Rich Morin, @rdmorin, to respond (I've been a fan of his since the Prime Time Freeware days, see below). Rich has been participating in the Freeware community for a long time, and I think he brings some real wisdom to the table.


What attracted you to CouchDB?

Rich I've been looking around for a while for a way to handle the mixture of unstructured, semi-structured, and formally structured data that I encounter in mechanized documentation projects. I've designed a few schemas that make my DBM friends grow pale, but none that looked very easy to use, given that I'd have to encode my queries in SQL.

Ontiki is the current incarnation of a long-term mechanized documentation project. I'm planning to use CouchDB for it, along with Erubis, Git, and Merb. It should stretch the traditional notions of wikis quite a bit.

I heard about CouchDB in a talk by Ezra Zygmuntowicz (of Engine Yard and Merb fame) and decided to look into it. Although it's still a Work In Progress, CouchDB looks extremely promising. I particularly like the fact that it uses Ruby-friendly data structures (eg, lists, hashes, and scalars) and that it should scale extremely well.

Why are you participating in this kind of open review/development of Relax with CouchDB?

Rich As a frequent buyer of technical books, I generally look for books on topics of interest. I've bought WIP books before and have no problem with seeing material that still needs work. In fact, I like the fact that I can help to influence which questions get answered, etc.

What do you think this level of openness will do for the book, good or bad?

Rich Several years ago, I was a small-scale publisher of book/CD combinations (Prime Time Freeware, for anyone who remembers :). When we did the book on MacPerl, we actively solicited user input and review (though we didn't charge for access to the PDFs). The responses ranged from nitpicking to well-reasoned arguments about pedagogical style. Almost all of them were useful (sometimes extremely so) in improving the book.

Author Interview: Relax with CouchDB (Round 2)

I had a great opportunity to trade emails with Jan Lehnardt in a second interview about Relax with CouchDB. This time, we touched on TDD, refactoring, and of course, the book.


The initial chapters have been available for over a month now, gathering feedback. What's been the biggest change you've made due to feedback?

Jan We still have things to integrate, but we took a lot of notes. The biggest thing we've seen is where we tried to explain concepts in CouchDB by contrasting them to how things are done in the RDBMS world. Production systems often do not follow theory by the book for performance reasons (denormalization comes to mind). So we are saying that in CouchDB your data is denormalized, thus fast, and actually true to the "CouchDB Theory", but now people are (rightfully) pointing out that the RDBMS systems have been used wrongly. Fact is: We don't want to say bad things about the RDBMS world. We just tried to explain things by comparison, and a lot of people coming to CouchDB have an RDBMS background, so we thought it was a good idea to contrast them.

We learned that this is not the best approach and we are moving things a little towards explaining CouchDB on its own instead of comparing it to relational databases in the first chapters. Again, I'm not saying anybody is more right or wrong here, it was just a poor choice on our part because we didn't know we'd cause such a ruckus :)

PS: CouchDB is not a relational database and we all support the idea of using the right tool for the job. This is sometimes an RDBMS and sometimes CouchDB :)

As I looked over Chapter 4, one blurb stood out to me: Applications "live" inside design documents. You can replicate design documents just like everything else in CouchDB. Because design documents can be replicated, whole CouchApps can be replicated. Can you explain this in a little more depth?

Jan CouchDB is an application server in disguise. It can host HTML+CSS+JavaScript applications like any other web server, but it also provides an HTTP API to a database system. This is the perfect basis to write standalone applications using the web-technologies everybody knows.

CouchDB's replication becomes a distribution channel not only for data (what books do you have in your library?) but also for entire applications (I enhanced the library application to also handle my board game collection, do you want my patch?). Think of GitHub, but for applications and peer-to-peer distribution.

You can also read more about this topic in a series of blog posts by Chris.
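
To make the idea of replicating an application a little more concrete, here is a minimal sketch. It assumes a hypothetical design document named `_design/library` with one view, and it only builds (it does not send) a request to CouchDB's `_replicate` endpoint. The database names, the host `friend.example.com`, and the helper `buildReplicationRequest` are illustrative assumptions, not taken from the book or the interview.

```javascript
// Hypothetical design document for the "library" app mentioned above.
// The names (library, by_title) are illustrative assumptions.
const designDoc = {
  _id: "_design/library",
  views: {
    by_title: {
      // View functions are stored as strings of JavaScript source.
      map: "function(doc) { if (doc.title) { emit(doc.title, null); } }"
    }
  }
};

// Build (but do not send) the HTTP request that asks a CouchDB server
// to replicate one database into another. Because the design document
// is just another document in the source database, it travels along
// with the data.
function buildReplicationRequest(source, target) {
  return {
    method: "POST",
    path: "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source: source, target: target })
  };
}

const request = buildReplicationRequest(
  "library",
  "http://friend.example.com:5984/library" // hypothetical remote database
);
console.log(request.method, request.path, request.body);
```

Sending that request with any HTTP client would, under these assumptions, copy both the book records and the application code stored in the design document.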

Refactoring is on my mind a lot right now, and with that comes testing. How testable are CouchDB apps? What kinds of tools or frameworks exist to do testing?

Jan We are currently working with TDD experts to find a good solution to allow CouchApp developers to test their applications inside-out.

Since this is all web-technology, we expect we can re-use some of the existing tools. We just want to go the extra mile and make it really easy for the developer.

What about refactoring proper, what's the state of the art in CouchDB refactoring?

Jan That depends a bit on what you mean. Refactoring CouchApps has not been tackled yet. But CouchDB is schema-free so you can just play around and change things. Documents (that includes the design documents that hold your application) are versioned, so you can go back to an old revision (not forever, but for a little while) if you screwed up.

About refactoring your data: Say you have an app that stores user profiles and you started out with separate fields for first- and last name. But user-feedback and UI-design found out that a single `name` field is better suited for your app. Your map function to get all first and last names originally looked like this:


 function(doc) {
   emit([doc.firstname, doc.lastname], null);
 }

And your new one looks like this:


 function(doc) {
   emit(doc.name, null);
 }

You can consolidate both to support legacy data:


 function(doc) {
   if(doc.name) {
     emit(doc.name, null);
   } else {
     emit(doc.firstname + " " + doc.lastname, null);
   }
 }

You change your UI-code to deal with a single `name` value and this view will consolidate old and new documents.
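
As a rough illustration of how the consolidated view treats old and new documents, the sketch below runs the same map logic outside CouchDB. The `emit` function here is a local stand-in that collects rows into an array, and the two sample documents are made up for the example.

```javascript
// Consolidated map function from the interview, unchanged in behavior.
function mapName(doc) {
  if (doc.name) {
    emit(doc.name, null);
  } else {
    emit(doc.firstname + " " + doc.lastname, null);
  }
}

// Stand-in for CouchDB's emit(): collects rows into an array.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Made-up sample documents, one in each shape.
const docs = [
  { _id: "a", firstname: "Ada", lastname: "Lovelace" }, // legacy shape
  { _id: "b", name: "Grace Hopper" }                     // new shape
];
docs.forEach(mapName);

// CouchDB returns view rows ordered by key; a plain string comparison
// only approximates that ordering here.
rows.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
console.log(rows.map(r => r.key)); // [ 'Ada Lovelace', 'Grace Hopper' ]
```

Both documents come out with a single string key, so code reading the view does not need to know which shape a document was stored in.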

Yes, this is a little dirty, but also pretty neat. At some point, you'd want to clean up all your data and get rid of the special cases. Our offhand suggestion is that for minor versions, where you want to add features quickly and make updates painless, you use the duck-typing above, and for major versions you take the time to consolidate the cruft, update your data properly, and prune your map and reduce functions.
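
The "update your data properly" step for a major version could look something like the following sketch. The helper `migrateProfile` is hypothetical; it only transforms a document in memory, and writing the result back to CouchDB is left out. It keeps `_id` and `_rev` untouched so that the migrated document could be saved as a new revision of the same document.

```javascript
// Hypothetical one-off migration: convert a legacy profile document
// (firstname/lastname) to the single-name shape.
function migrateProfile(doc) {
  if (doc.name) {
    return doc; // already in the new shape, nothing to do
  }
  const migrated = Object.assign({}, doc); // shallow copy, keeps _id and _rev
  // Join whichever name parts are present with a single space.
  migrated.name = [doc.firstname, doc.lastname].filter(Boolean).join(" ");
  delete migrated.firstname;
  delete migrated.lastname;
  return migrated;
}

const legacy = { _id: "a", _rev: "1-abc", firstname: "Ada", lastname: "Lovelace" };
console.log(migrateProfile(legacy));
```

Once every document has gone through a step like this, the map function no longer needs the legacy branch.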

This is good advice (hopefully), but we might be able to provide you with tools and libraries that handle the dirty work for you so you can concentrate on improving your app instead of fussing with the database. After all, this should be relaxing!

Monday, December 22, 2008

Author Interview: Relax with CouchDB

O'Reilly is at it again, getting an open content book out there to cover an emerging technology. This time it's Relax with CouchDB by Chris Anderson (@jchris on twitter), Jan Lehnardt (@janl on twitter), and Noah Slater (@nslater on twitter) -- the book is also available as a rough cut, if you'd like to support O'Reilly and the book.

Last spring, Jan came out to MountainWest RubyConf to talk about CouchDB, and I've been interested ever since. Now that the book was on the way, I asked Chris, Jan, and Noah to sit down for a quick interview. Here's how it went:


I know this is the title of your first chapter, but I feel like we should start things out with it here too: Why CouchDB?

Noah I'm a bit of a hypertext fetishist. Serving up documents via HTTP is great, but I became interested in CouchDB because it lets me serve up documents, and then lets me edit them too! There are a few other technologies, like WebDAV, that let you do that over HTTP, but here is this amazingly elegant solution that doesn't require protocol extensions or any other annoying cruft.

Of course, CouchDB also happens to be a very powerful database! A document database isn't for everyone, but if your application revolves around organising and serving up documents, CouchDB hits a real sweet spot.

Chris I'm pumped about CouchDB because it has the potential to make a bunch of new freedom. Its replication feature makes offline databases at least as powerful as hosted data. Because everyone already knows the API (whether they've heard of it or not), writing apps on it is (will be) child's play. I say "will be" because right now we're on the edge of it. CouchDB is not yet 1.0 so we are still learning how to write apps against it. I am focussed on finding the simplest path from user to documents. Ajax seems to be the answer.

Jan I gave a few presentations on CouchDB. In the introduction, I usually ask who has built database-backed web applications. I usually get 80-90% hands. I follow up with "... and who likes it and doesn't have any issues with the database?". Most hands go down and I get a few laughs.

This is a nice skit to start a presentation with and win an audience's sympathies, but it is also giving RDBMSes a bad reputation, which they do not really deserve.

Today, relational databases are used everywhere on the web (where I'm coming from). For a long time they were the only sensible choice to solve a lot of the problems you get with concurrent access to your data in a high-traffic manner. But as demand grows, relational databases begin to be used in ways they were not intended for. JOINs are broken up, data is de-normalized. Disk space and insert-speed are sacrificed for concurrent read-speed, etc. They are no longer the ideal technological choice.

CouchDB is built for the web scenario. Storing huge amounts of semi-structured data is the default case in CouchDB. It makes excellent use of modern multi-core machines and multi-machine setups. CouchDB views are built using Map/Reduce, the concept that made Google. Replication allows a user to take data offline and work with it locally, without a network connection; and it can be used to synchronize machines in a load-balanced or highly-available setup or both. And the HTTP REST API makes talking to CouchDB as easy as opening a browser.

And it is written in Erlang which gives us high concurrency (20k requests per second on a single machine?), fault tolerance and live code-upgrades. Erlang is worth an entire interview in itself.

Oh hey, that isn't exactly a concise answer and I didn't even tell you half of the good stuff :-)
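
For readers who have not seen a Map/Reduce view before, here is a small in-memory sketch of the idea Jan mentions. It is not how CouchDB executes views internally; `runView` is a made-up stand-in for a grouped view query, the `type` field is an assumed convention, and unlike a real CouchDB map function (which calls a global `emit`), `mapByType` receives `emit` as a parameter so the example can run on its own.

```javascript
// Map: emit one row per document, keyed by a hypothetical `type` field.
function mapByType(doc, emit) {
  if (doc.type) {
    emit(doc.type, 1);
  }
}

// Reduce: sum the emitted values. Summing also gives the right answer
// when the inputs are partial sums, so `rereduce` needs no special case.
function reduceCount(keys, values, rereduce) {
  return values.reduce((total, n) => total + n, 0);
}

// Tiny in-memory stand-in for a view query that groups rows by key.
function runView(docs) {
  const groups = new Map();
  docs.forEach(doc => {
    mapByType(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  });
  const result = {};
  groups.forEach((values, key) => {
    result[key] = reduceCount(null, values, false);
  });
  return result;
}

const counts = runView([
  { type: "book", title: "Programming Erlang" },
  { type: "book", title: "Relax with CouchDB" },
  { type: "game", title: "Carcassonne" },
  { title: "untyped document" } // skipped by the map function
]);
console.log(counts); // { book: 2, game: 1 }
```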

You're approaching this book in a very open fashion (the book is online and available as you write it). How did you convince O'Reilly to go that way?

Jan We are developing CouchDB. Writing documentation comes with development. We figured that if we have to do the writing anyway, we don't want to do it twice.

O'Reilly is pretty open about (heh) open books. They have the Subversion book that everybody knows and recently released a book on Haskell. They have experience doing this kind of thing and they have seen good results in the past. When we asked about the possibilities, they just said "sure".

Chris As Jan says, it wasn't hard. O'Reilly understands the value of open information, and the value proposition around publishing, which is part of why they've done so much to enter the wider arena of sharing ideas.

Jan Noah ultimately pushed us in the direction of writing the book in the open. Once decided, I couldn't see it done any other way and I am glad for the pushing.

Noah We had been talking about doing a book for a while and I was always very adamant that whatever we did would be released under a free license.

The biggest deficiency in free operating systems is not in the software—it is the lack of good free manuals that we can include in these systems. Many of our most important programs do not come with full manuals. Documentation is an essential part of any software package; when an important free software package does not come with a free manual, that is a major gap. We have many such gaps today.
   —Free Software and Free Manuals, Richard Stallman

Since Stallman wrote this essay, the technology publishers have started to wake up a little bit. A growing number of manuals and books are being released under free licenses, and this is absolutely marvellous. O'Reilly is really leading the way on this, so we were very lucky to get a deal with them.

Our editor told us a surprising rule of thumb: releasing a good book under a free license makes it sell more copies, and releasing a bad book under a free license makes it sell fewer copies. Let's hope that we are the former!

What benefits do you expect from this approach?

Chris For me, the hardest part is knowing what people just learning CouchDB will need to know. The authors mostly communicate with people who already understand and use CouchDB, but I'd like the book to be capable of drawing you in, even if you're new, so that you feel comfortable building Couch apps.

Noah I am very lucky to be writing the book with two people a lot smarter than me. When I make mistakes or write something a little silly, I get corrected. Similarly, when you develop something in the open, you get all these really bright people reading your stuff and picking you up on things that you've missed or didn't think about properly. Open collaboration is hugely beneficial like that.

Jan That the world fixes our typos. Oh wait, no! Actually, that's what our editor will take care of. You saw in my answer to the "Why CouchDB" question that it is not a one-liner. Add a very diverse potential readership and you get even more ways to put things. We can't cater to everybody, but we are trying to make this book worth a read for a lot of people, from developers to administrators, system architects and hardcore RDBMS lovers.

By opening the writing process, we get decent feedback very early on and as a result will produce a much better book. Besides, we have been open source developers since forever, we just couldn't do it any other way. Release early, release often.

So far there's been a lot of feedback on the book's mailing list. How is that affecting your writing process?

Jan In early drafts you know that there are passages that try to explain an idea and that these passages don't do a good job. If you know the topic at hand pretty well, you might not notice immediately. Public feedback will tell you pretty quickly what works and what doesn't, and that helps with the review process. The writing process, at least mine, is not really affected.

Noah I think the most significant thing I have taken away from this process is the level of confusion caused when trying to compare CouchDB with traditional databases. I mean, fundamentally, this comparison makes about as much sense as comparing cheese and bread. Sure, you can eat both of them, and one may be applicable in one situation and not in the other, but they also taste really nice when you put them together. That's kind of how I see the CouchDB versus RDBMS debate. Meaningless, essentially. CouchDB isn't a panacea, nor are relational databases, and both of them have their uses. I think we're going to work on improving this clarification in the book.

Having said all that, we've only been taking comments for five days now and the response has been just overwhelming. Who knows what other issues, or points of confusion we're going to find along the journey? I'm pretty sure whatever happens, it's going to be just as rewarding. Thanks for the input everyone!

Chris I've been happy about how people are reacting to the figures. I basically hand-sketched some basic drafts of technical drawings for the book. I was thinking, "what's the simplest thing that could possibly work?" and some of the drawings were even captured with the iPhone built-in camera.

I was surprised because people seemed to like the hand-made feel. I've started refining that style, with the rough drafts for the code-example sections. I'm a big fan of the Head Rush books, so I'm hoping that when O'Reilly brings their artists in, they can approach it with the same playful style.

You've posted three chapters of the book already. What should we be looking for next, and how soon should we be seeing it?

Noah We're aiming to submit another six chapters to O'Reilly in early January, so you should see things trickle through to the website as we complete that process.

Jan We are aiming some time in January for the next batch. We are trying to do the book sequentially and the ToC is online already, so you can see what's coming :o)

Chris We're working on a description of the example blog application we'll be releasing. The app right now is pure-Ajax, but by the time CouchDB hits 1.0, we'll be able to do major portions of it without JavaScript at all. CouchDB is getting HTML generation capabilities, so expect that to change some of how we build applications.

What kinds of projects should developers be looking at CouchDB for?

Jan After my talks or in discussions with developers I often get asked what "niche" CouchDB fits in. I usually say "The Web" and that's me trying to be funny again because the web is nowhere near a niche.

Everything that stores messy data that users (no offence) submit (Facebook, flickr, YouTube etc.) and needs efficient access to that data. Systems that handle document structures (CMSes, Blogs ... ). Situations where offline-work is preferable. The internet is available everywhere, except when the Wifi is not working or the hotel charges $5 a minute. Not using The Net to get to your data is usually faster, too.

Finally, databases are a fairly boring topic (among non-database nerds). CouchDB makes databases fun again.

Chris Yes and +1 to that, Jan!

Look at replication, think about the new opportunities it opens up. Think about applications with the power of the desktop (and location) but written in the language of the web. HTML, JavaScript, REST, JSON, these have become essentially the lowest-common denominator for web services. When offline mode is not a service downgrade, and the source code is at the desktop, people have new affordances. I'm hoping Couch apps become the new Excel macro.

While CouchDB is written in Erlang, there are a lot of libraries to use it from other languages. Which ones have you dealt with? What's good and bad about them? Which ones will show up in your book?

Chris We're concentrating on the HTTP API for the book, so most of our examples are written in JavaScript or Curl. However, there are good libraries in most languages these days, so to the extent a language handles dynamic JSON-like objects well (and can access HTTP) it's a good fit for CouchDB.
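
Since the API is plain HTTP, a client mostly needs to build URLs and exchange JSON. The sketch below shows one way to assemble a view query URL. The helper `viewUrl` is hypothetical, and the path layout (`/db/_design/ddoc/_view/name`) is the form used by current CouchDB releases; the version available at the time of this interview may have used a different layout. The database, design document, and view names are made up.

```javascript
// Hypothetical helper: build the URL for querying a view over HTTP.
// Every parameter value is JSON-encoded, which suits key, startkey and
// endkey (and is harmless for numbers and booleans such as limit or
// include_docs). Parameters that expect a bare word would need
// different handling.
function viewUrl(base, db, designDoc, view, params) {
  const query = Object.keys(params || {})
    .map(name =>
      encodeURIComponent(name) + "=" +
      encodeURIComponent(JSON.stringify(params[name])))
    .join("&");
  const path = [db, "_design", designDoc, "_view", view]
    .map(encodeURIComponent)
    .join("/");
  return base + "/" + path + (query ? "?" + query : "");
}

const url = viewUrl("http://localhost:5984", "profiles", "app", "by_name", {
  key: "Grace Hopper"
});
console.log(url);
// http://localhost:5984/profiles/_design/app/_view/by_name?key=%22Grace%20Hopper%22
```

The same URL could be pasted into a browser or passed to Curl, which is part of why a language only needs HTTP and JSON support to be a reasonable fit.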

Jan I think I wrote three PHP libraries and only one didn't completely suck, but those were the early days. There is the excellent (documented, unit-tested) PHPillow that has been extracted from a real-world project, which is always a good thing.

I used couchdb-python by our very own contributor Christopher Lenz for a number of projects now and it is very solid. If you are more of a Twisted person (heh), David Reid hosts Paisley.

Good and bad? CouchDB introduces new paradigms and we are still in the process of finding out what client-library abstractions work best (hint, ActiveRecord doesn't work well at all). Like all pioneering, this includes some stabbing in the dark. But in general, the libs I've seen are at least well suited for a single job or class of jobs. Maybe there is no one library to rule them all. We'll find out.

If you want to get into CouchDB (and Erlang) what's the best way to learn?

Jan For CouchDB, you don't need to know any Erlang. If you are familiar with the web, you are good to go. Start out by reading through the wiki and of course our book (*cough*). For more specific and tutorial-style documentation, check Planet Couch; it aggregates CouchDB-related blog posts and already includes a wealth of information.

If you still want to get into Erlang, there is Thinking in Erlang, a free 30-page PDF that gives you a rough overview. For deeper diving, Joe Armstrong, Erlang's inventor, wrote an excellent book, Programming Erlang, which is a fun read and packed with all you need to set out writing your first (and second) applications.

Chris I learned Erlang by working my way into CouchDB's source code, based on my experience with the HTTP API. The HTTP callbacks are easy to find, and since you already know what they do, it's a good way to learn Erlang. There are plenty of ways to get into CouchDB without touching Erlang. We're working on a feature that allows you to write arbitrary controller logic in JavaScript, so really, you can customize CouchDB, as well as use it, by programming in any language that understands JSON.