Couchdb-dump

Tools to dump, modify, and load documents in CouchDB from the command line. (Same basic concept as mysqldump, but for CouchDB, and with a bit more.)


couchdb-dump

A set of three command-line tools that perform the following functions:

  • cdbdump outputs all documents (including any attachments) in a CouchDB database.
  • cdbmorph lets you provide a function that can modify the documents in that output.
  • cdbload takes that output as input and loads it back into a CouchDB database.

Data is read and written via stdin and stdout, respectively. The output of cdbdump is a JSON document containing a "docs" array that holds the CouchDB database documents. The cdbmorph command takes the output of cdbdump and lets you modify the documents in it by passing each of them through a function you supply. The cdbload command takes input in exactly the same format as the output of cdbdump or cdbmorph and writes every document in it to the target database.
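
Concretely, the dumped JSON looks roughly like this (the documents shown are made up for illustration; a real dump contains whatever is in your database):

{
  "docs": [
    { "_id": "doc1", "somekey": "somevalue" },
    { "_id": "doc2", "someotherkey": "someothervalue" }
  ]
}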

Installation

npm install -g couchdb-dump
See also couchdb-dump on npm.

Usage Examples

The following will dump the contents of a CouchDB database called myhugedatabase running on port 5984 on localhost. The output is written to a file called myhugedatabase.json.

cdbdump -d myhugedatabase > myhugedatabase.json

If you are doing this for archiving purposes, you could pipe the output through gzip like this ...

cdbdump -d myhugedatabase | gzip > myhugedatabase.json.gz
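
To restore from such an archive later, you could decompress and pipe straight into cdbload (gunzip -c writes the decompressed data to stdout; the database name here follows the earlier example):

gunzip -c myhugedatabase.json.gz | cdbload -d myhugedatabase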

See the project README for more on the usage of the cdbdump command.


Both of the following command examples will load all the documents in the myhugedatabase.json file into a CouchDB database called myhugeduplicate.

cdbload -d myhugeduplicate < myhugedatabase.json
OR
cat myhugedatabase.json | cdbload -d myhugeduplicate

You can even do this ...

cdbdump -d myhugedatabase | cdbload -d myhugeduplicate

... which streams all the docs from one CouchDB database into a second one. While this works well, you should probably take a look at using CouchDB's awesome built-in replication features instead.
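
As a rough idea of that built-in alternative, a one-off replication can be triggered through CouchDB's _replicate endpoint (database names here match the example above; depending on your CouchDB version you may need full URLs for source and target, and the target database may need to exist already):

curl -X POST http://localhost:5984/_replicate \
  -H "Content-Type: application/json" \
  -d '{"source": "myhugedatabase", "target": "myhugeduplicate"}'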

See the project README for more on the usage of the cdbload command.


But suppose you need to manipulate the documents in the cdbdump output before you load them into CouchDB with cdbload. You can do that with the included cdbmorph command.

Let's assume you need to add a new key and value to all documents that meet certain criteria, and to delete documents that meet some other criteria. You could write a function like the following and save it in a file called morph.js:

// Add a key to documents matching one condition and mark
// documents matching another as deleted.
module.exports = function(doc, cb){
  if(doc.somekey && doc.somekey === 'somevalue'){
    doc.anotherkey = 'anothervalue';
  }
  if(doc.someotherkey && doc.someotherkey === 'someothervalue'){
    // CouchDB treats documents with _deleted set to true as deletions when loaded.
    doc._deleted = true;
  }
  // Hand the (possibly modified) document back to cdbmorph.
  cb(null, doc);
};
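
If you want to sanity-check the function before pointing it at real data, you can call it directly with Node from the directory containing morph.js (the sample document here is made up):

node -e "require('./morph.js')({ somekey: 'somevalue' }, function(err, doc){ console.log(doc); })"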

Now you can run the documents in the myhugedatabase.json file from the example above through this function and feed the output into a file or directly back into a CouchDB database as follows ...

cat myhugedatabase.json | cdbmorph -f ./morph.js | cdbload -d myhugemodified
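
Or, since cdbmorph writes its result to stdout, redirect it to a file instead of loading it right away:

cat myhugedatabase.json | cdbmorph -f ./morph.js > myhugemodified.json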

You can also pipe output directly from cdbdump to cdbmorph to cdbload like this ...

cdbdump -d myhugedatabase | cdbmorph -f ./morph.js | cdbload -d myhugemodified

You can even put modified documents back into the source database by loading the stream of changed documents straight back into it, simulating in-place updates, like this ...

cdbdump -k -d myhugedatabase | cdbmorph -f ./morph.js | cdbload -d myhugedatabase

See the project README for more on the usage of the cdbmorph command.


Release History