
Fun with Riak

23 July 2012

I started work on a new project recently and we needed a key-value store for quotes. We don't know how many quotes we'll have up front (more than 10 million, fewer than a billion). We wanted a key-value store that would be resilient, easy to expand, and perform well... I can't think of a better fit for us than Riak.

So what is Riak, and why is it awesome?

Riak is a key-value store with a unique take on the CAP theorem. CAP stands for Consistency, Availability and Partition tolerance, and it states that a distributed system can guarantee only two of the three at any one time (pick any two). Most database products can then be classified by the two guarantees they have chosen. For example, your typical relational database is a CA system: Consistent (or atomic) and Available -- in a replicated environment you can always access some set of the data, but if nodes lose communication with each other they fall out of sync. To guarantee Consistency and Partition tolerance instead, you would have to sacrifice Availability, because your database would have to say "sorry, until I can verify that all nodes are walking the same walk I can't guarantee the data is correct".

CAP Venn diagram (borrowed from Couchbase without permission)

Riak has a different approach. It still obeys the CAP theorem, but it is tunable per request. If you really don't care that the data may have changed on another node, you can say so and access the store as an Available and Partition-tolerant (AP) store. In a different request, where you absolutely have to be sure, you can be Consistent and Partition tolerant (CP). This gives Riak a great deal of flexibility.

That's cool, but even without it Riak would be an awesome product... Let's look at how Riak manages its partitions...
The Riak Ring

The Riak ring is the mechanism Riak uses to locate data. When a datum is received, its key is hashed to a location on the ring. The ring is a 160-bit address space which vnodes claim; the diagram above illustrates a 4-node cluster mapping onto that space. What's important here is that the ring is partitioned between the vnodes, and that when you write data you can specify the number of nodes to write it to. In a typical write you would replicate to n nodes, where n = (total nodes) / 2 + 1, so that the loss of a single vnode will not lose you data. This is a little bit like a RAID array, but with a lot more flexibility.
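The hashing above can be sketched in a few lines of Python. This is a simplified model, not Riak's actual claim algorithm: I'm assuming SHA-1 (a 160-bit hash, which is what Riak actually uses) and a hypothetical 64-partition ring.

```python
import hashlib

RING_SIZE = 2 ** 160   # the ring is a 160-bit address space
NUM_PARTITIONS = 64    # hypothetical partition count for this sketch

def partition_for(bucket: str, key: str) -> int:
    """Hash a bucket/key pair to a point on the ring, then to a partition."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    point = int.from_bytes(digest, "big")          # position on the ring
    return point * NUM_PARTITIONS // RING_SIZE     # which partition owns it

def preference_list(bucket: str, key: str, n: int = 3) -> list[int]:
    """The n partitions that should hold replicas: the home partition
    plus the next n - 1 partitions clockwise around the ring."""
    home = partition_for(bucket, key)
    return [(home + i) % NUM_PARTITIONS for i in range(n)]
```

Because the hash is deterministic, every node can run this computation locally, which is why any node can route any request without consulting a central directory.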

It's important to note that you don't care which nodes are written to, only that you are writing to a particular number of nodes.

When a request for data is received, the key is hashed and looked up on the ring map. This tells Riak that the information should be in vnodes 1, 2, 5, 17, etc. Riak can then ask those vnodes for the information and return it to you.

This is where the CAP theorem comes into play again. Say you have some information to store (quotes) and you don't really care whether you get the most up-to-date version (quotes change all the time): you can ask Riak to stop looking as soon as it finds one vnode holding the data. You still store the quote in multiple places for resilience and performance, but you can read dirty if you want to. On the other hand, you could just as easily say "I need a consistent answer" and demand that it come from multiple vnodes. It's up to you.
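The arithmetic behind this choice is simple: with n replicas, a write acknowledged by w of them and a read that waits for r of them are guaranteed to overlap in at least one node whenever r + w > n. A minimal sketch (my own illustration, not Riak code):

```python
def overlaps(n: int, w: int, r: int) -> bool:
    """True if every r-node read set must share a node with every
    w-node write set, i.e. the read cannot miss the latest write."""
    return r + w > n

def quorum(n: int) -> int:
    """A majority of n replicas."""
    return n // 2 + 1

# With 3 replicas: quorum reads against quorum writes always overlap...
consistent = overlaps(3, quorum(3), quorum(3))
# ...but a fast r=1 read against w=2 writes may return stale data.
dirty_read_possible = not overlaps(3, 2, 1)
```

That's the whole trade-off: r=1 is the fast, dirty read; r=quorum is the consistent one, and you pick per request.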

Each node maintains a copy of the ring and knows how to communicate with the other nodes; it's a shared-nothing architecture. This means that if a node goes offline, the remaining nodes can redistribute its data among themselves. This is awesome and a very good thing.

Adding more capacity is as simple as spooling up another Riak node and pointing it at any node currently in the cluster. The new node joins the cluster and starts receiving data and answering queries. It is free to answer queries as soon as it joins because you (the user) control the desired consistency of each response. If you are OK with missing a piece of data while a new node spools up, fine; if not, just state that you need a quorum of nodes to provide your answer.
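As a toy illustration of what joining means for partition ownership, here is a deliberately naive round-robin claim (nothing like Riak's real claim algorithm, which works to minimise reshuffling): when a fifth node joins a 64-partition ring, it picks up a share of the partitions and their data is handed off to it.

```python
def claim(num_partitions: int, nodes: list[str]) -> dict[int, str]:
    """Naive claim: deal partitions out to nodes round-robin."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}

before = claim(64, ["node1", "node2", "node3", "node4"])
after = claim(64, ["node1", "node2", "node3", "node4", "node5"])

# Partitions that changed owner must be handed off to their new node.
moved = sum(1 for p in before if before[p] != after[p])
```

The point is that ownership is a pure function of the ring and the member list, so every node agrees on who owns what without any coordination beyond gossiping the membership change.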

So if you lose a node (or a few nodes), your system will still work with no data loss! Recovering from such a disaster is as simple as firing up some more nodes... Assuming you are behind a load balancer, it's entirely possible that your application won't even notice when a disgruntled sysadmin takes a shotgun to your servers.

Riak is also embarrassingly easy to set up. It took a few hours on a CentOS 6.3 box to learn the ropes and get a Riak cluster up and running. We're now writing Salt scripts that will let us spool up a new server in minutes.

It also has a RESTful HTTP interface.

And it implements a Solr-compatible search handler on each node...

In short, Riak is awesome! Give it a try.

Read more about Riak here:
