CouchDB Assignment

Video

Overview

The purpose of this assignment is to learn about a database called CouchDB.

The code developed in this course relies only on the ability of CouchDB to function as a key-value store. As a result, the code can be more easily modified to work with any number of datastores that provide simple key-value functionality.

A key-value datastore is a map that associates keys with values. In CouchDB, the keys are strings and the values are JSON documents, which are also strings. Because JSON strings represent objects in JavaScript and other languages, you can think of a key-value store as an object store. If you know the key for an object, you can easily look up the object. Key-value stores give you the following basic operations, expressed in pseudocode.

value get(key)
void  put(key, value)
void  delete(key)
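
To make the three operations concrete, here is a minimal sketch of an in-memory key-value store in JavaScript. It is only an illustration built on a Map, not how CouchDB is implemented; the names store, put, get and del are made up for this example.

```javascript
// A toy in-memory key-value store. Values are kept as JSON strings,
// mirroring the description of CouchDB values as JSON documents.
const store = new Map();

function put(key, value) {
  store.set(key, JSON.stringify(value)); // serialize the object to a JSON string
}

function get(key) {
  const json = store.get(key);
  return json === undefined ? undefined : JSON.parse(json);
}

function del(key) { // named del because delete is a reserved word in JavaScript
  store.delete(key);
}

put('a', { x: 1 });
console.log(get('a')); // { x: 1 }
del('a');
console.log(get('a')); // undefined
```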

Compared to relational databases, key-value stores are simple and fast, but they do not provide a simple query mechanism. As a result, it is difficult to answer questions about the data, such as "How many customers spent more than 30 dollars?"
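
The following sketch shows why such a question is awkward with only get, put and delete: without a query mechanism, the application has to visit every value and test it itself. The customer records and the function name countSpentMoreThan are hypothetical.

```javascript
// Hypothetical customer records, keyed by customer id and stored as JSON strings.
const customers = new Map([
  ['c1', JSON.stringify({ name: 'Ann', spent: 45 })],
  ['c2', JSON.stringify({ name: 'Bob', spent: 12 })],
  ['c3', JSON.stringify({ name: 'Cy', spent: 31 })],
]);

// With no query language, answering the question means scanning every value.
function countSpentMoreThan(store, amount) {
  let count = 0;
  for (const json of store.values()) {
    if (JSON.parse(json).spent > amount) count += 1;
  }
  return count;
}

console.log(countSpentMoreThan(customers, 30)); // 2
```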

CouchDB provides multi-master replication. When a Web application writes to any one of the database servers, the write operation finishes quickly on the chosen server and eventually propagates to all the other servers. If writes to different database servers conflict, the conflict will eventually be detected and must be resolved at the application level.
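
As a rough illustration of the idea, the sketch below simulates two servers that each accept writes locally and later synchronize. It is a toy model and not CouchDB's replication protocol: every name in it (makeReplica, write, sync) and the way revision ids are built are assumptions made for this example. A conflict is reported when neither server's history for a key is a continuation of the other's.

```javascript
// Each replica keeps, per key, the current value and the list of revision ids
// that led to it.
function makeReplica(name) {
  return { name, counter: 0, docs: new Map() };
}

// A write finishes immediately on the chosen replica.
function write(replica, key, value) {
  const doc = replica.docs.get(key);
  const history = doc ? doc.history : [];
  replica.counter += 1;
  const revId = `${history.length + 1}-${replica.name}${replica.counter}`;
  replica.docs.set(key, { value, history: [...history, revId] });
}

function isPrefix(shorter, longer) {
  return shorter.length <= longer.length &&
         shorter.every((rev, i) => rev === longer[i]);
}

const NONE = { value: undefined, history: [] };

// Propagate the newer version if one history extends the other;
// otherwise both sides changed independently and we report a conflict.
function sync(a, b, key) {
  const da = a.docs.get(key) || NONE;
  const db = b.docs.get(key) || NONE;
  if (isPrefix(da.history, db.history)) {
    if (db !== NONE) a.docs.set(key, db);
    return 'ok';
  }
  if (isPrefix(db.history, da.history)) {
    b.docs.set(key, da);
    return 'ok';
  }
  return 'conflict';
}

const serverA = makeReplica('A');
const serverB = makeReplica('B');
write(serverA, 'cart', { items: 1 });
console.log(sync(serverA, serverB, 'cart')); // ok
write(serverA, 'cart', { items: 2 });
write(serverB, 'cart', { items: 3 });
console.log(sync(serverA, serverB, 'cart')); // conflict
```

In this toy model the conflict is only reported; deciding which value should win is left to the caller, which matches the point that conflicts must be resolved at the application level.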

Multi-master replication provides a means to increase availability and responsiveness in applications but enables inconsistencies that must be handled by application code. We will not cover these topics any further in this course. However, I'm interested in this problem, so let me know if you want to do an independent study or master's project in this area.

This assignment uses curl at the command line to explore how CouchDB works.

Reading

To better understand this assignment, you will need to read about CouchDB and about databases in general. Here are some resources that I used.

Assignment Folder

Create a directory named couch for the work you do for this assignment. At the end of the assignment, this folder will contain the following files.

Instructions

Install and Run CouchDB

Install CouchDB and start it. When CouchDB starts, it opens a tab in your browser and loads the main page of Futon, the browser-based interface to the database. The instructions in this assignment do not use this interface, but it is worth learning.

I can think of 4 ways to interact with a CouchDB database.

Define a Server Admin User

When you first install and run CouchDB, there are no database users, so the CouchDB server runs every operation submitted to it. To close this security hole, we will define a server admin user, which is a user who can perform server administration tasks such as creating and deleting databases. We will use the following operations, which only a server admin user can perform.

The instructions in this assignment assume that you create a server admin user named admin with password 1234. You can create this user through the Futon interface or by issuing the following command.

curl -X PUT http://localhost:5984/_config/admins/admin -d "\"1234\""
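
The doubled quoting in the -d argument is there because the request body is a JSON string, so the password has to arrive with its own double quotes. A small sketch in node shows what the body looks like once the shell has removed the outer quotes and backslashes.

```javascript
// The body sent by the curl command is the JSON encoding of the password.
const password = '1234';
const body = JSON.stringify(password);

console.log(body);        // "1234"
console.log(body.length); // 6 (four digits plus two double quotes)
```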

Creating a server admin user keeps non-admin users from performing server administration tasks. To see that unauthenticated users cannot create databases, try to create a database named test with the following command.

curl -X PUT http://localhost:5984/test

To create a database, you can provide the username and password of a server admin in the HTTP request URL. The following shows how to do this for our example.

curl -X PUT http://admin:1234@localhost:5984/test
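
When a username and password are embedded in the URL like this, curl sends them using HTTP Basic authentication. The sketch below, run in node, pulls the credentials out of the example URL and builds the corresponding Authorization header value. Note that Base64 is an encoding, not encryption, so anyone who sees the header can recover the password.

```javascript
// Split the example URL into its parts using node's built-in URL class.
const url = new URL('http://admin:1234@localhost:5984/test');
const credentials = `${url.username}:${url.password}`;

// HTTP Basic authentication sends "username:password" encoded in Base64.
const header = 'Basic ' + Buffer.from(credentials).toString('base64');

console.log(url.host); // localhost:5984
console.log(header);   // Basic YWRtaW46MTIzNA==
```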

After running the above command, run the following to verify that a database named test was created.

curl -X GET http://admin:1234@localhost:5984/_all_dbs

The above command returns an array of names of the databases hosted by the server. Observe that test is in the list.

Run the following to delete database test.

curl -X DELETE http://admin:1234@localhost:5984/test

Insert a Security Object

We already learned about the concept of a server admin user. There are also database member users and database admin users. Database member users can read and write documents in the database, but cannot write design documents. Database admin users can read and write all documents. Design documents provide additional capabilities beyond key-value store operations, such as providing a means to query the data. However, we will not cover design documents in this course.

The member and admin users for a database are specified in a database security object. If no users are specified in the security object in a database, then unauthenticated clients can access the database. To avoid this, we will insert a security object that enables access by only our existing server admin user.
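
The rule described above can be sketched as a small function. This is a deliberately simplified model, assuming a security object with only a list of member names; the real CouchDB check also involves roles and admin lists, and the function name canAccess is made up for this example.

```javascript
// Simplified rule: if no member names are listed, anyone can access the
// database; otherwise the user must appear in the list.
function canAccess(security, userName) {
  const names = (security.members && security.members.names) || [];
  if (names.length === 0) return true; // no members specified: open access
  return names.includes(userName);
}

const restricted = { members: { names: ['admin'] } };

console.log(canAccess({}, null));            // true
console.log(canAccess(restricted, null));    // false
console.log(canAccess(restricted, 'admin')); // true
```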

Use what you learned in the previous section to create a database named test.

After creating the test database, run the following command to verify that we can access information about it.

curl -X GET http://localhost:5984/test

Run the following command to insert a security object into the test database that only allows server admin access. Note the use of '\' for line continuation.

curl -X PUT http://admin:1234@localhost:5984/test/_security  \
     -H "Content-type: application/json"                     \
     -d "{\"members\":{\"names\":[\"admin\"]}}"

On Windows, use '^' instead of '\' for line continuation.

Now, verify that we can no longer access the test database without specifying the server admin credentials.

curl -X GET http://localhost:5984/test

From this point on, we need to provide the username and password of the server admin user in the HTTP request as shown in the following.

curl -X GET http://admin:1234@localhost:5984/test

The commands illustrated in this assignment rely on unencrypted HTTP. In a real-world application, you would need to use encrypted HTTPS; otherwise, your server admin username and password could be intercepted by a malicious user. Configuring CouchDB to support HTTPS requires some work, which this course does not cover.

If you are not replicating data across a public network and the connection between your Web application server and CouchDB server runs over localhost or a private network, you can turn off authentication and use HTTP to reduce overhead. If you do this, make sure that only your Web server can reach the port CouchDB is listening on.

Insert Database Documents

In this section, we will practice inserting, replacing and deleting documents from a database named test. Use what you learned in the previous section to create a database named test.

Use the following command to verify that the test database contains no documents.

curl -X GET http://admin:1234@localhost:5984/test/_all_docs

CouchDB stores documents in the form of JSON strings. To see this, consider the following JavaScript object.

var obj = { _id: 'a', x: 1 };

Suppose we want to store the current state of this object in the test database. To do this, we need to describe the object using JSON syntax. We can either write the JSON string manually or let the JSON.stringify function produce it for us. To try the second approach, run the node command line interpreter.

node

Then, enter the following lines.

obj = { _id: 'a', x: 1 };
JSON.stringify(obj);

The node command line interpreter displays the value of the expression entered. The value of JSON.stringify(obj); is the following string.

'{"_id":"a","x":1}'

The outer single quotes show that the value is a string; they are not part of the JSON representation of obj.
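
The following lines, which can be pasted into node or saved to a file, confirm that the JSON string starts with a brace rather than a quote, and that JSON.parse turns the string back into an equivalent object.

```javascript
const obj = { _id: 'a', x: 1 };
const json = JSON.stringify(obj);

console.log(json);        // {"_id":"a","x":1}
console.log(json.length); // 17
console.log(json[0]);     // {

// JSON.parse reverses JSON.stringify for plain data like this.
const copy = JSON.parse(json);
console.log(copy.x);      // 1
```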

Use curl as follows to send the JSON representation of obj to the test database. Note that the value for the -d argument needs to be enclosed in double quotes, so each double quote inside the JSON string is escaped with a backslash.

curl -X POST http://admin:1234@localhost:5984/test  \
     -H "Content-type: application/json"            \
     -d "{\"_id\":\"a\",\"x\":1}"
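
Writing the escaped string by hand is error-prone, so it can help to let node produce it. The helper below is a hypothetical convenience, not part of CouchDB or curl. It only escapes double quotes, which is enough for the simple documents in this assignment; values containing backslashes, dollar signs or backticks would need more care.

```javascript
// Hypothetical helper: build the quoted value for curl's -d argument by
// escaping each double quote in the JSON string with a backslash.
function toCurlData(value) {
  const json = JSON.stringify(value);
  return '"' + json.replace(/"/g, '\\"') + '"';
}

console.log(toCurlData({ _id: 'a', x: 1 })); // "{\"_id\":\"a\",\"x\":1}"
```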

Use the following command to verify that the test database contains the document.

curl -X GET http://admin:1234@localhost:5984/test/_all_docs
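
The response to this command is itself a JSON string that lists each document's id along with its current revision. The sketch below parses a sample response of that shape. The sample text is written out by hand for illustration, so your actual output may differ in its details; the helper name revisionOf is made up.

```javascript
// A hand-written sample of an _all_docs response containing one document.
const response = '{"total_rows":1,"offset":0,"rows":[' +
  '{"id":"a","key":"a","value":{"rev":"1-0785e9eb543380151003dc452c3a001a"}}]}';

// Look up the revision string reported for a given document id.
function revisionOf(allDocsJson, id) {
  const row = JSON.parse(allDocsJson).rows.find((r) => r.id === id);
  return row ? row.value.rev : undefined;
}

console.log(JSON.parse(response).total_rows); // 1
console.log(revisionOf(response, 'a'));       // 1-0785e9eb543380151003dc452c3a001a
console.log(revisionOf(response, 'z'));       // undefined
```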

Insert JSON representations of the following additional JavaScript objects.

{ _id: 'b', x: 2 }
{ _id: 'c', y: 1 }

Write a Database Creation Script

When developing an application that relies on a database, it is very useful to have a script that resets the application's database to a specific state. This allows you to test specific functionality that you are in the process of developing. It also allows you to easily delete old data that may not be consistent with recent changes to application logic. Create a script file named createdb.bat (if on Windows) or createdb.sh (if on OS X or Linux) that deletes the test database and then recreates it with the security object and example documents used in this assignment.

When developing test cases, it is convenient to read JSON documents from files rather than including them directly in the database creation script. Let's see how to do this.

Create a file named d.json with the following contents.

{ "_id": "d", "y": 8 }

Modify your database creation script to insert the contents of d.json into the database. Do this with the following command.

curl -X POST http://admin:1234@localhost:5984/test  \
     -H "Content-type: application/json"            \
     -d @d.json
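
Before posting a file, it can be worth checking that it really contains valid JSON with an _id field. The sketch below does this in node. To stay self-contained it first writes a stand-in copy of d.json to a temporary directory; in your own work you would point readDocument (a hypothetical helper) at the real file in your couch directory.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a stand-in copy of d.json to a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'couch-'));
const file = path.join(dir, 'd.json');
fs.writeFileSync(file, '{ "_id": "d", "y": 8 }\n');

// Read a document file and check it before sending it to the database.
function readDocument(fileName) {
  const doc = JSON.parse(fs.readFileSync(fileName, 'utf8')); // throws on invalid JSON
  if (typeof doc._id !== 'string') {
    throw new Error('document has no string _id');
  }
  return doc;
}

console.log(readDocument(file)); // { _id: 'd', y: 8 }
```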

Verify that your script creates a database that contains documents with ids a through d, and that access to the database is restricted to the server admin user.

Test Update

This section is not discussed in the video but is needed for later assignments.

CouchDB maintains a revision string in the JSON documents it stores. The revision string enables the database server to detect conflicting writes.

Updating a document in CouchDB involves replacing the existing document with a new one. This operation requires that a revision string be provided, so that the CouchDB server knows which version of the document you are trying to replace. If the document has been updated by another process since you acquired the revision string, then your revision string will not match the current revision string and the update will fail.
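
The revision check can be pictured with a small in-memory model. This is a sketch under stated assumptions, not CouchDB's implementation: the way the toy revision string is computed here (a generation number plus an MD5 hash of the content) only imitates the shape of real revision strings, and the names docs, nextRev and put are made up.

```javascript
const crypto = require('crypto');

const docs = new Map();

// Toy revision string: a generation number, a dash, and a hash of the content.
function nextRev(generation, body) {
  const hash = crypto.createHash('md5').update(JSON.stringify(body)).digest('hex');
  return `${generation}-${hash}`;
}

// Replace the document only if the caller supplies the current revision string.
function put(id, body, rev) {
  const current = docs.get(id);
  if (current && current._rev !== rev) {
    return { error: 'conflict' }; // the caller's revision string is stale
  }
  const generation = current ? parseInt(current._rev, 10) + 1 : 1;
  const _rev = nextRev(generation, body);
  docs.set(id, { ...body, _id: id, _rev });
  return { ok: true, id, rev: _rev };
}

const first = put('a', { x: 1 });               // creates revision 1
const second = put('a', { x: 100 }, first.rev); // succeeds, creates revision 2
const stale = put('a', { x: 999 }, first.rev);  // fails, first.rev is out of date

console.log(second.ok);       // true
console.log(stale.error);     // conflict
console.log(docs.get('a').x); // 100
```

CouchDB itself reports this situation as an HTTP 409 Conflict response.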

In this section, you will write a script that updates a document in the database.

Run the database creation script and look at the rev field in the response to the command that inserts document a. Observe that the revision string for the document under id a is the following.

"1-0785e9eb543380151003dc452c3a001a"

Create a script named test_update.sh (or .bat if on Windows) that includes the following commands.

echo Update document a.
curl -X PUT http://admin:1234@localhost:5984/test/a                        \
     -H "Content-type: application/json"                                   \
     -d "{\"_id\":\"a\",\"x\":100,\"_rev\":\"1-0785e9eb543380151003dc452c3a001a\"}"
echo
echo Display document a.
curl -X GET http://admin:1234@localhost:5984/test/a
echo
echo Try an update with an old revision string.  Note a conflict.
curl -X PUT http://admin:1234@localhost:5984/test/a                        \
     -H "Content-type: application/json"                                   \
     -d "{\"_id\":\"a\",\"x\":999,\"_rev\":\"1-0785e9eb543380151003dc452c3a001a\"}"
echo
echo Display document a.  Note that x is still 100.
curl -X GET http://admin:1234@localhost:5984/test/a

Run the script and study the output.