Friday, July 30, 2010

Starting to write a Cassandra app in Java

I’m going to explore using Java to create an application that uses Cassandra as a datastore. To do this I’m going to implement the Bloggy App that is described in Arin Sarkissian’s introduction to Cassandra:

WTF is a SuperColumn? An Intro to the Cassandra Data Model

Creating the keyspace

Assuming you’ve got Cassandra up and running, you’ll need to create the keyspace for the app. The keyspace describes the column families and other config (such as sorting options on the columns). Arin’s page has more detail, but here is the config from that page that needs to be added to storage-conf.xml to create the keyspace. You’ll need to do this on each node in your cluster, and you’ll need to restart Cassandra on each node for the keyspace to be created. Add this to the Keyspaces section of the file:

<Keyspace Name="BloggyAppy">

<!-- other keyspace config stuff -->
<!-- This is a test app from : -->

<!-- CF definitions -->
<ColumnFamily CompareWith="BytesType" Name="Authors"/>

<ColumnFamily CompareWith="BytesType" Name="BlogEntries"/>
<ColumnFamily CompareWith="TimeUUIDType" Name="TaggedPosts"/>
<ColumnFamily CompareWith="TimeUUIDType" Name="Comments" CompareSubcolumnsWith="BytesType" ColumnType="Super"/>


<!-- Number of replicas of the data -->


<!--
~ EndPointSnitch: Setting this to the class that implements
~ AbstractEndpointSnitch, which lets Cassandra know enough
~ about your network topology to route requests efficiently.
~ Out of the box, Cassandra provides org.apache.cassandra.locator.EndPointSnitch,
~ and PropertyFileEndPointSnitch is available in contrib/.
-->
</Keyspace>


Writing data to the keyspace

I’m planning on using Java to create my application, so I’ll need a way to connect to the database. Cassandra uses Thrift as an API, but I’ll use a higher-level client, in this case Hector. Download the latest version from Hector Downloads and make sure the files are on your classpath. There are a couple of example files on the GitHub wiki (the code here is very heavily based on these examples). Also look in the test section of the source code on GitHub for more examples.

More info on Hector is here

Connecting to the database

Connecting to the database is nice and easy: get a pool instance and borrow a client:
CassandraClientPool pool = CassandraClientPoolFactory.INSTANCE.get();
CassandraClient client = pool.borrowClient("xxx.yy.36.151", 9160);

Remember to release the connection once you’re done with it.
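The borrow-then-release advice is easiest to follow with a try/finally block, so the client goes back to the pool even when the work in between throws. The sketch below only shows the shape of that pattern: `SimplePool`, `Client` and `withClient` are hypothetical stand-ins written for this example, not Hector classes. With Hector the equivalent step would be handing the borrowed client back to the pool it came from in the `finally` block.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

// Hypothetical stand-ins that only mimic the borrow/release shape; NOT Hector classes.
class BorrowReleaseSketch {

    /** Hypothetical pooled connection to one host and port. */
    static class Client {
        final String host;
        final int port;
        Client(String host, int port) { this.host = host; this.port = port; }
    }

    /** Hypothetical single-host pool that counts how many clients are out on loan. */
    static class SimplePool {
        private final String host;
        private final int port;
        private final Deque<Client> idle = new ArrayDeque<>();
        private int borrowed = 0;

        SimplePool(String host, int port) { this.host = host; this.port = port; }

        Client borrowClient() {
            // Reuse an idle client if there is one, otherwise create a new one.
            Client c = idle.isEmpty() ? new Client(host, port) : idle.pop();
            borrowed++;
            return c;
        }

        void releaseClient(Client c) {
            borrowed--;
            idle.push(c);
        }

        int borrowedCount() { return borrowed; }
    }

    /** Runs the given work with a borrowed client and always hands the client back. */
    static <T> T withClient(SimplePool pool, Function<Client, T> work) {
        Client client = pool.borrowClient();
        try {
            return work.apply(client);
        } finally {
            pool.releaseClient(client); // runs even if work throws
        }
    }

    public static void main(String[] args) {
        SimplePool pool = new SimplePool("localhost", 9160);
        String target = withClient(pool, c -> c.host + ":" + c.port);
        System.out.println(target + " borrowed=" + pool.borrowedCount());
    }
}
```

Putting the release in `finally` rather than at the end of the method means a failed insert cannot leak a connection.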


Writing an entry

Before we can write anything to Cassandra we need to set the keyspace we are going to use. In this case we are going to use our blog application keyspace, BloggyAppy:

Keyspace ks = client.getKeyspace("BloggyAppy");

Suppose we want to add a “record” (to borrow from RDBMS terms); in this case let’s add an author record to the Authors column family. First get a column path to the Authors column family:

ColumnPath columnPath = new ColumnPath("Authors");

What we want to do is add a number of “fields” (which are name-value pairs) to our “record”. Suppose our “record” is going to look like this:

Tel == 01555 XXXXX
Email ==
Address == Blogspot

“Andy” is going to be our key, and each of Tel:data, Email:data and Address:data is a column in that key. So to add the Andy key with an email address:

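String key = "Andy";
String columnName = "Email";
String value = "";

// point the column path at the Email column before inserting
columnPath.setColumn(columnName.getBytes());
ks.insert(key, columnPath, value.getBytes());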

So here we point the column path at a column (Email) and then insert a value for that column under the key (Andy). Note that the value is stored as an array of bytes. We can go on like this to set the telephone number:

columnName = "Tel";
value = "01555 XXXXX";
// re-point the column path at the Tel column
columnPath.setColumn(columnName.getBytes());
ks.insert(key, columnPath, value.getBytes());
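Since every value goes in as a byte array, it is worth being explicit about how strings become bytes. The calls above use `getBytes()` with no argument, which relies on the platform default charset. A minimal sketch of a safer pair of helpers, assuming Java 7 or later for `StandardCharsets` (the `toBytes` and `fromBytes` names are made up for this example):

```java
import java.nio.charset.StandardCharsets;

class ByteValueSketch {

    /** Encodes a column value with an explicit charset instead of the platform default. */
    static byte[] toBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes bytes read back from a column using the same charset. */
    static String fromBytes(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = toBytes("01555 XXXXX");
        System.out.println(stored.length + " bytes -> " + fromBytes(stored));
    }
}
```

Using the same charset on the write and read sides keeps values readable when the writer and reader run on machines with different defaults.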

If we want to add a new “record” (say for Joe), just change the key (key = "Joe") and start adding “fields”. Note that we haven’t defined how many fields a key has or what the fields are. They are added as needed and not all of them need to be present. This is a major difference from a traditional RDBMS. One last thing: our bloggy app (as defined in Arin’s article) needs a pubDate in a Blog Entry key. This needs to be stored as unixtime. We can do that like this:

columnName = "pubDate";
long now = System.currentTimeMillis(); // milliseconds since the epoch
value = Long.toString(now);
columnPath.setColumn(columnName.getBytes());
ks.insert(key, columnPath, value.getBytes());
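The snippet above stores the timestamp as the bytes of its decimal string, so reading it back means decoding the string and parsing it into a long again. A small round-trip sketch of that idea, with `encode` and `decode` as made-up helper names and no Cassandra involved. Note that `System.currentTimeMillis()` counts milliseconds, whereas classic Unix time counts seconds, so whichever unit is chosen has to be used consistently by readers and writers.

```java
import java.nio.charset.StandardCharsets;

class PubDateSketch {

    /** Encodes a timestamp as the bytes of its decimal string. */
    static byte[] encode(long timestamp) {
        return Long.toString(timestamp).getBytes(StandardCharsets.UTF_8);
    }

    /** Parses the decimal string back into a long when the column is read. */
    static long decode(byte[] raw) {
        return Long.parseLong(new String(raw, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        long nowMillis = System.currentTimeMillis(); // milliseconds since the epoch
        long nowSeconds = nowMillis / 1000L;         // classic Unix time counts seconds
        System.out.println(decode(encode(nowMillis)) == nowMillis);
        System.out.println(nowSeconds);
    }
}
```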

The important thing is that we convert the long value now to a string before inserting it under the key.
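To picture why rows do not need to share the same fields, it can help to think of a column family as a map from row key to a sorted map of column names to byte values. The sketch below is only an in-memory illustration of that mental model, not how Cassandra stores data; the class name, method names and the example email address are all made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory picture only; no Cassandra or Hector involved.
class SparseRowsSketch {

    // row key -> (column name -> value bytes), columns kept sorted by name
    private final Map<String, TreeMap<String, byte[]>> rows = new HashMap<>();

    /** Adds or overwrites one column in the row for the given key. */
    void insert(String key, String columnName, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<>())
            .put(columnName, value.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns the column names present in a row, in sorted order. */
    List<String> columnNames(String key) {
        TreeMap<String, byte[]> row = rows.get(key);
        return row == null ? new ArrayList<>() : new ArrayList<>(row.keySet());
    }

    public static void main(String[] args) {
        SparseRowsSketch authors = new SparseRowsSketch();
        authors.insert("Andy", "Tel", "01555 XXXXX");
        authors.insert("Andy", "Address", "Blogspot");
        authors.insert("Joe", "Email", "joe@example.com"); // placeholder value
        System.out.println(authors.columnNames("Andy"));
        System.out.println(authors.columnNames("Joe"));
    }
}
```

Each row carries only the columns that were actually written to it, which is the behaviour described for the Andy and Joe “records”.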

Next time, starting to get some of this info out of Cassandra

