Friday, September 17, 2010

Moving from version 0.6 of Cassandra to version 0.7

In order to use the latest build of Hector you need to be running a cluster based on version 0.7 of Cassandra. This gave me some problems. For a start, I’m running my cluster on Windows-based machines, and it’s been running fine. However, beta 1 of Cassandra 0.7.0 does not include the necessary tools for Windows to convert the config files and read the schema.

So to get my cluster updated I added a Linux box, installed Cassandra 0.6.0 on it and joined it to the cluster. I then installed 0.7.0 on that machine and attempted to use config-converter to convert storage-conf.xml to cassandra.yaml. For some reason that got into a horrible mess, so it was back to the original cassandra.yaml and importing the settings manually. For my simple configuration this wasn’t too painful.

Once that was done I upgraded the Windows machines to version 0.7.0.

The next step is to run schematool. Without running this, your cluster will not have any schemas in the database. This needs to be done from the Linux command line:

schematool 134.36.xx.yyy 8080 import

does the job (here 8080 is the JMX port).

See http://www.riptano.com/blog/live-schema-updates-cassandra-07 and http://wiki.apache.org/cassandra/LiveSchemaUpdates for more details.

Friday, August 20, 2010

ConsistencyLevel in Hector and Cassandra

I started playing with failover in Cassandra the other day and rapidly found myself in a bit of a pickle. I had a 2-node development environment allowing me to play with Cassandra and start developing Hector programs. So I decided to turn one machine off and see if my program would carry on as normal.

It didn’t. Not much of a failover, I thought. To make matters worse, if I attached via the Cassandra CLI client, I could retrieve data from my single node. What was going on?

Turns out this was all to do with the consistency settings of my cluster. Take a look at the consistency section of:

http://wiki.apache.org/cassandra/API

There are multiple levels of consistency available in a Cassandra cluster, and they can be different for read and write operations. What I hadn’t realized is that Hector defaults to a consistency of QUORUM whereas the CLI defaults to a consistency of ONE (I believe). So with a two-node cluster, Hector will fail on reads if one node goes down.
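The arithmetic makes the failure obvious: QUORUM needs a majority of the replicas for a key, i.e. (replication factor / 2) + 1 of them, to respond. A quick sketch of the sums (plain Java, illustration only):

```java
public class QuorumMath {
    // QUORUM requires a majority of the replicas for a key:
    // floor(replicationFactor / 2) + 1 nodes must respond.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        // With a replication factor of 2, quorum is 2:
        // losing either replica fails QUORUM reads.
        System.out.println(quorum(2)); // 2
        // With a replication factor of 3, quorum is 2:
        // one node can go down and QUORUM reads still succeed.
        System.out.println(quorum(3)); // 2
    }
}
```

So with two replicas, QUORUM gives no headroom at all, which is exactly the behaviour I was seeing.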

Adding Nodes

One way to get round the problem is to add more nodes and up the replication level in the conf files to 4. I added two nodes that were not part of the seeding process and used the AutoBootstrap option in the conf file to join them to the cluster. This has kind of solved the problem. I can now read from one of the bootstrapped nodes if any of the other nodes goes down, but not from one of the seed nodes. I think more work on the configuration and layout of the cluster is needed by me.

Changing the consistency in Hector

You can of course change the consistency in Hector. The code here refers to Hector 0.6.15 and above (for now). The first thing we need to do is implement a ConsistencyLevelPolicy. Here’s an example based on the default consistency implementation in Hector:

import me.prettyprint.cassandra.model.*;
import org.apache.cassandra.thrift.ConsistencyLevel;

public final class MyConsistencyLevel implements ConsistencyLevelPolicy {

   @Override
   public ConsistencyLevel get(OperationType op) {
      switch (op) {
         case READ: return ConsistencyLevel.QUORUM;
         case WRITE: return ConsistencyLevel.ONE;
         default: return ConsistencyLevel.QUORUM; // just in case
      }
   }

   @Override
   public ConsistencyLevel get(OperationType op, String cfName) {
      return ConsistencyLevel.QUORUM;
   }
}

In this example we set the READ consistency to QUORUM and the WRITE consistency to ONE. (I’ve also included a default consistency just in case!) Once we’ve got this class we can set the consistency for the keyspace like this:

ConsistencyLevelPolicy mcl = new MyConsistencyLevel();
ko.setConsistencyLevelPolicy(mcl); // ko is our Keyspace instance

And that’s it!

Many thanks to Ran Tavory and Colin Vipurs for their help on this.

As ever comments and so on gratefully received.

Wednesday, August 18, 2010

A brief note about using Clusters in Hector V2 API

Recently Hector, the Java library for access to Cassandra, was updated to version 2. Now it’s time to explore moving jBloggyAppy over to the new API. In this blog I’m briefly going to look at Hector V2’s clustering options, which greatly improve on version 1. To create a cluster we use getOrCreateCluster from the Hector factory (HFactory, in me.prettyprint.cassandra.model). Ran Tavory recommends a static import to reduce typing:

import static me.prettyprint.cassandra.model.HFactory.*;


Once we’ve done that we create the cluster
Cluster c = HFactory.getOrCreateCluster("MyCluster", "154.36.xx.yyy:9160");

And that’s it!

However, note that we are now connected only to the machine in the cluster that we have named. We can get a list of all machines in the cluster like this:

Set<String> hosts = c.getClusterHosts(true);
for (String host : hosts) {
   System.out.println(host);
}

The problem here is that we haven't got the port number of each machine in the cluster, nor can we find out about the topology. Thanks to the Hector mailing list folks for pointing this out.

If we want to know the cluster’s name:
System.out.println(c.describeClusterName());

Finally should we want to use the cluster with a V1 pool:
CassandraClient client =c.borrowClient();

and release it in a finally clause:
} finally {
   c.releaseClient(client);
}

Tuesday, August 10, 2010

Using Java reflection to render a JSON feed

In today’s post we are going to look at a method for rendering JSON output from our Java web app. It’s important to note that what I’m going to show relies on a couple of things:

1: I like to use Java beans to encapsulate data between elements of the MVC model. If more than one element is needed the beans are added to a list.

2: These beans use standard accessor methods starting with “get” to access their private member variables.

3: Our controller, a servlet, adds the beans to the request by setting an attribute and then forwards to the view (usually a JSP page) using a RequestDispatcher.

If you look at the code at github you’ll see that we have 4 of these bean stores with different accessor methods and private variables. And herein lies the problem. We could write a JSON encoder that worked on each bean, but that’s going to duplicate a lot of work. The answer is to use Java reflection to “look” into the bean and extract its accessor methods. This will allow us to write one servlet that can process any bean or list of beans and convert it to JSON notation.

For more information on reflection look at Java Reflection

We will use the JSON library from JSON.org to make life easy.

Our servlet that generates the JSON will not know what type of bean is being sent to it, or whether it is a list of beans. So rather than deal with a distinct class we just deal with the Object class. We can get the beans and test for a list like this:

Object temp=request.getAttribute("Data");
Class c = temp.getClass();
String className=c.getName();
if (className.compareTo("java.util.LinkedList")==0){
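As an aside, comparing class names ties the check to one specific List implementation; an instanceof test would accept any List. A small sketch of the idea (stdlib only; ListCheck is a hypothetical helper, not part of the servlet):

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class ListCheck {
    // True for LinkedList, ArrayList, or any other List implementation,
    // all of which a string comparison against "java.util.LinkedList" would miss.
    static boolean isList(Object attribute) {
        return attribute instanceof List;
    }

    public static void main(String[] args) {
        System.out.println(isList(new LinkedList<String>())); // true
        System.out.println(isList(Arrays.asList("a", "b")));  // true
        System.out.println(isList("just a string"));          // false
    }
}
```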

If it’s not a linked list we will want to take the Object, find all its methods and call the accessor methods. Here’s our method for doing this, which returns a JSONObject containing only values from the bean that actually have data. In this example “Value” has been obtained from the request using getAttribute.

private JSONObject ProcessObject(Object Value) { // Value has been passed to the servlet
   JSONObject Record = new JSONObject();
   try {
      Class c = Value.getClass();
      Method[] methlist = c.getDeclaredMethods();
      for (int i = 0; i < methlist.length; i++) {
         Method m = methlist[i];
         String mName = m.getName();
         if (mName.startsWith("get")) {
            String Name = mName.replaceFirst("get", "");
            Class[] partypes = new Class[0];
            Method meth = c.getMethod(mName, partypes);
            Object rt = meth.invoke(Value);
            if (rt != null) {
               System.out.println(Name + " Return " + rt);
               try {
                  Record.put(Name, rt);
               } catch (Exception JSONet) {
                  System.out.println("JSON Fault" + JSONet);
                  return null;
               }
            }
         }
      }
   } catch (Throwable e) {
      System.err.println(e);
   }
   return Record;
}
Dealing with a linked list of objects is just a case of iterating through the list and calling the above method for each bean we want to encode:
if (className.compareTo("java.util.LinkedList") == 0) { // deal with a linked list
   List Data = (List) request.getAttribute("Data");
   JSONObject JSONObj = new JSONObject();
   JSONArray Parts = new JSONArray();
   Iterator iterator = Data.iterator();
   while (iterator.hasNext()) {
      Object Value = iterator.next();
      JSONObject obj = ProcessObject(Value);
      try {
         Parts.put(obj);
      } catch (Exception JSONet) {
         System.out.println("JSON Fault" + JSONet);
      }
   }
   try {
      JSONObj.put("Data", Parts);
   } catch (Exception JSONet) {
      System.out.println("JSON Fault" + JSONet);
   }
   if (JSONObj != null) {
      PrintWriter out = response.getWriter();
      out.print(JSONObj);
   }
}
Our simple code could be extended. Although it will deal with simple types stored in beans (strings, ints, longs, dates etc.), if your bean stores more complex data (arrays or lists) then handling the reflection will need to be a lot more complicated!
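Stripped of the JSON library, the reflection core can be sketched in plain Java. Here Person is a hypothetical stand-in for one of the bean stores, and the getter values are collected into a Map instead of a JSONObject:

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

public class BeanToMap {
    // A hypothetical bean standing in for the stores used in the post.
    public static class Person {
        public String getName() { return "Andy"; }
        public String getEmail() { return null; } // null values are skipped
        public int getPosts() { return 3; }
    }

    // Call every public no-argument "get" method and keep the non-null
    // results, keyed by the property name with the "get" prefix removed.
    static Map<String, Object> toMap(Object bean) throws Exception {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Method m : bean.getClass().getMethods()) {
            if (m.getName().startsWith("get")
                    && m.getParameterCount() == 0
                    && !m.getName().equals("getClass")) { // skip Object.getClass
                Object value = m.invoke(bean);
                if (value != null) {
                    record.put(m.getName().substring(3), value);
                }
            }
        }
        return record;
    }

    public static void main(String[] args) throws Exception {
        // Email is dropped because it is null.
        System.out.println(toMap(new Person()));
    }
}
```

Note the getClass exclusion: unlike getDeclaredMethods, getMethods also returns inherited methods, so Object.getClass would otherwise sneak in as a "Class" property.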

Monday, August 9, 2010

Hector V2 API announced

One of the problems of using “cutting edge” software is that things are always evolving and occasionally the work you are doing can get left behind. This has kind of happened with the code I’ve been working on at http://github.com/acobley/jBoggyAppy

I’ve been using Hector as the API between Java and Cassandra, building models to encapsulate data stored in the database. However, Hector has just undergone a major overhaul, upgrading the interface to version 2. This is great and really to be appreciated; Ran and the team have made things easier and moved away from the complexities of Thrift.

I’ve decided I’ll carry on with the code using the V1 API and then come back and redo it in V2 later. This should be relatively easy: I’m keeping all the Hector code in one place (Java bean connectors) and with luck I’ll just override the current methods with new V2 methods. The only changes I may need to make to the controllers are in the DB connection method. The V2 API implements clustering (which in itself is a useful step forward) but this may change how my code needs to interact.

I’m looking forward to using V2 once I’ve got comments implemented in my jBloggyAppy code!

Converting a long to a byte array and back again

One thing that may not be immediately obvious is that Cassandra stores all values as byte arrays. While this is not really important if you are storing a string (converting to a byte array is relatively easy), what if you need to store a date? Dates in Java are essentially of type long and represent the number of milliseconds since 1970 (or thereabouts). If you want to store a date you need to convert it to a byte array and back again when going into and out of the key store. You could use serialization to achieve this, but as this article:

http://java.sun.com/developer/technicalArticles/Programming/serialization/

points out, serialization can be a slow process. So here I present two methods to convert a date to and from a byte array. I welcome comments on these; I’m sure they can be sped up and generally improved.

Converting to a byte array

Longs are stored as 8 bytes, so converting them is a case of masking off the bottom byte, storing it in the array and then shifting the original number right 8 bits.

private byte[] longToByteArray(long value) {
   byte[] buffer = new byte[8]; // longs are 8 bytes
   for (int i = 7; i >= 0; i--) { // fill from the right
      buffer[i] = (byte) (value & 0xff); // get the bottom byte
      value = value >>> 8; // shift the value right 8 bits
   }
   return buffer;
}

Converting back from a byte array to a long


Converting back the other way, we use a multiplier to convert each byte to its correct value. This multiplier is shifted left 8 bits each time round the loop.

private long byteArrayToLong(byte[] buffer) {
   long value = 0;
   long multiplier = 1;
   for (int i = 7; i >= 0; i--) { // start from the right
      value = value + (buffer[i] & 0xff) * multiplier; // add the byte value times the multiplier
      multiplier = multiplier << 8;
   }
   return value;
}
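For comparison, java.nio.ByteBuffer in the standard library does the same big-endian packing that the loops above do by hand; a sketch:

```java
import java.nio.ByteBuffer;

public class LongBytes {
    // Pack a long into 8 big-endian bytes (low byte last),
    // the same layout the hand-rolled loop produces.
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    // And back again.
    static long bytesToLong(byte[] buffer) {
        return ByteBuffer.wrap(buffer).getLong();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // The value round-trips without loss.
        System.out.println(now == bytesToLong(longToBytes(now))); // true
    }
}
```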

A simple test case

The following code shows these examples in use, converting a Date to a byte array and back again while maintaining its value:
long tempnow = System.currentTimeMillis();
Date tempDate= new Date(tempnow);
System.out.println("now "+tempnow);
System.out.println("Native Date "+tempDate);
        
//Convert to Byte Array and print
byte btempnow[]=longToByteArray(tempnow);
System.out.println();
System.out.print("Byte Array ");
displayByteArrayAsHex(btempnow);
        
//and Convert it back again
long converted =byteArrayToLong(btempnow);
tempDate=new Date(converted);
System.out.println("converted now "+converted);
System.out.println("converted  Date "+tempDate);

Friday, August 6, 2010

jBloggyAppy on Github

I've started work integrating the Cassandra code into an actual app. All the code will be available on github:

jBloggyAppy

The goal of the exercise is:

1: To write a Cassandra based app in Java
2: The App should use the MVC programming model
3: The app should implement a Restful interface
4: There will be no CSS (this is not a design exercise) and only minimal Javascript for any AJAX calls.
5: Use OpenID to facilitate logins

The code already deals with creating a new user, listing all users and listing the details for one user.

Wednesday, August 4, 2010

Reading column name value pairs from a Supercolumn for the BloggyAppy

In the last blog post we looked at reading from normal column families; in this part we are going to look at reading from supercolumns. We are still following the BloggyAppy design from Arin Sarkissian. In this design we have a Comments column family. Each entry has a key (the title slug from before). Each supercolumn uses a TimeUUID as its key/name and the details of the comment are stored as columns underneath that. Here’s the structure:

Comments: {
 Blog-Slug: {
  Time_UUID_1: {
   Comment: A Comment,
   Email: andy@abc.com
  },
  Time_UUID_2: {
   Comment: A Comment,
   Email: andy@abc.com
  }
 }
}


The first job is to get the comment keys for a particular title slug. First of all, remember that we are going to get all the Time_UUIDs, which will be the supercolumns. The process is very similar to getting columns in a normal column family. There is only one major difference: instead of getting a range slice from the keyspace we will use the getSuperRangeSlices method. The KeyRange and slice predicate work in exactly the same way as before. So to get the map of supercolumns we use:


Keyspace ks = client.getKeyspace("BloggyAppy");
ColumnParent columnParent = new ColumnParent("Comments");
          
SlicePredicate slicePredicate = new SlicePredicate();
SliceRange supercolumnRange = new SliceRange();
             
supercolumnRange.setStart(new byte[0]);
supercolumnRange.setFinish(new byte[0]);
supercolumnRange.setReversed(true);
supercolumnRange.setCount(1000);
slicePredicate.setSlice_range(supercolumnRange);
KeyRange titlesRange = new KeyRange(200); 
titlesRange.setStart_key("First-Blog");
titlesRange.setEnd_key("First-Blog");
              
Map<String, List<SuperColumn>> supermap =ks.getSuperRangeSlices(columnParent, slicePredicate, titlesRange);

Now we have the map we can step through the keys (in our example there will be only one) and, for each, get the list of supercolumns underneath it. The only thing to remember here is to convert the column name to type UUID:

for (String key : supermap.keySet()) {
 List<SuperColumn> columns = supermap.get(key);

 System.out.println("Key " + key);
 for (SuperColumn column : columns) {
  // print the supercolumn names
  java.util.UUID Name = toUUID(column.getName());
  System.out.println("Name " + Name);
 }
}


Finally we want to get the names and values of the columns inside the supercolumn. The trick here is to get the actual SuperColumn from the keyspace. Here’s one way to do it (note that column.getName will return a timeUUID in our case):

ColumnPath cp = new ColumnPath("Comments");
cp.setSuper_column(column.getName());
SuperColumn sc = ks.getSuperColumn(key, cp);

Now we have the Supercolumn we can just get a list of columns it contains and iterate through them:

List<Column> cols = sc.getColumns();
Iterator<Column> itr = cols.iterator();
while (itr.hasNext()) {
 Column col = itr.next();
 System.out.println("\t\t" + string(col.getName()) + "\t ==\t" + string(col.getValue()));
}


We should now have all we need to read columns and supercolumns from the keyspace. The next step will be to encapsulate all this into models for our web application to use. We’ll start that in the next post.

Tuesday, August 3, 2010

Implementing the comments: Writing a comment as a supercolumn

So far we have looked at simple column families, but now it’s time to tackle supercolumns. We are still working with the BloggyAppy design from Arin Sarkissian. We are using some code from Hector’s test suite.

Following Arin’s specification, the Comments column family is going to look like this:

Comments: {
 Blog-Slug: {
  Time_UUID_1: {
   Comment: A Comment,
   Email: andy@abc.com
  },
  Time_UUID_2: {
   Comment: A Comment,
   Email: andy@abc.com
  }
 }
}

Let’s be clear: Blog-Slug will be the slug entry for the blog that is being commented on. This is the key we will be looking for. Under that are the supercolumn entries, each with a key of type UUID. Each contains a number of columns that are the entries for the comment. So to add a comment we can do this:
Keyspace ks = client.getKeyspace("BloggyAppy");
ColumnPath cp = new ColumnPath("Comments");
java.util.UUID timeUUID=getTimeUUID();              
cp.setSuper_column(asByteArray(timeUUID));
cp.setColumn(bytes("email"));
ks.insert(slugValue, cp, bytes("andy@abc.com"));
cp.setColumn(bytes("Comment"));
ks.insert(slugValue, cp, bytes("AComment"));


slugValue is essentially the title. Note that getTimeUUID is defined as:

public static java.util.UUID getTimeUUID() {
   return java.util.UUID.fromString(new com.eaio.uuid.UUID().toString());
}

Reading keys and columns from a simple column family in a Cassandra dB

In the first part of these posts I’m going to look at retrieving data from a Cassandra keyspace. We are working with the BloggyAppy design from Arin Sarkissian and the code is heavily based on the Hector examples on the wiki.

First up, let’s look at getting the author details from the Authors column family. So assuming we have a pool of connections from the Hector pool called “client”, we can set the column family like this:

Keyspace ks = client.getKeyspace("BloggyAppy");
//retrieve sample data
ColumnParent columnParent = new ColumnParent("Authors");


We are going to use the getRangeSlices method from the Keyspace class to search for a single author or to list all author details. This requires a ColumnParent (set above), a SlicePredicate and a KeyRange. The KeyRange is used to limit the number of keys that are returned and to limit the key search to a certain range. The code looks like this:

KeyRange keyRange = new KeyRange();
keyRange.setStart_key("");
keyRange.setEnd_key("");


These settings will get all Keys in the Keyspace. If you want to limit the number you can put a numerical value in the KeyRange constructor:

KeyRange keyRange = new KeyRange(1);


This will get just one result. If you want to look for one Key only (such as the Author Andy) then you can do this:

KeyRange keyRange = new KeyRange(1);
keyRange.setStart_key("Andy");
keyRange.setEnd_key("");


Hopefully you can see that:

KeyRange keyRange = new KeyRange(100);
keyRange.setStart_key("Andy");
keyRange.setEnd_key("Dave");


will get at most 100 keys that are between “Andy” and “Dave”.
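Under an order-preserving partitioner this behaves like taking a slice of a sorted map, with both end keys inclusive; a rough stdlib analogy (illustration only, not the Hector API, and note that under the random partitioner keys are not ordered lexically like this):

```java
import java.util.TreeMap;

public class KeyRangeSketch {
    public static void main(String[] args) {
        // A sorted map standing in for keys under an order-preserving partitioner.
        TreeMap<String, String> keys = new TreeMap<>();
        keys.put("Andy", "...");
        keys.put("Bob", "...");
        keys.put("Dave", "...");
        keys.put("Zoe", "...");

        // Keys from "Andy" to "Dave", both ends inclusive: Andy, Bob, Dave.
        System.out.println(keys.subMap("Andy", true, "Dave", true).keySet()); // [Andy, Bob, Dave]
    }
}
```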

So we can see how to restrict the number of keys that are returned. We now need to look at limiting the number of columns that are returned from the requested key. Suppose we have the following:

Andy
 Tel  == 01382 345078
 Email  == andy@r2-dvd.org
 Address  == QMB

We may only need the first column (Tel), or all columns, or a slice range in between. We will use a slice range (which looks a lot like the KeyRange!):

SliceRange columnRange = new SliceRange();
columnRange.setCount(4);
columnRange.setStart(new byte[0]);
columnRange.setFinish(new byte[0]);
columnRange.setReversed(true);


Some differences to note here: we can change the order in which the columns are returned using setReversed, and the start and finish of the column range are byte arrays (so the columns need not be strings in the DB). If you want to search for a string (if that makes sense in your app) you can of course do this:


String start="Email";
byte bStart[]=start.getBytes();
columnRange.setStart(bStart); 


Finally we can create a SlicePredicate from the columnRange and get the keys and columns from the keyspace:

SlicePredicate slicePredicate = new SlicePredicate();
slicePredicate.setSlice_range(columnRange);
Map<String, List<Column>> map = ks.getRangeSlices(columnParent, slicePredicate, keyRange);


Now we’ve got the map, we’ll just read through it and display the columns:

for (String key : map.keySet()) {
 List<Column> columns = map.get(key);
 // print key
 System.out.println(key);
 for (Column column : columns) {
  // print columns with values
  System.out.println("\t" + string(column.getName()) + "\t ==\t" + string(column.getValue()));
 }
}


This should be all we need to get “records” from a simple column family inside a keyspace.

If we want to get all posts by a particular author, all we need to do is use our AuthorPosts column family and use the above code to display the details for each post. Here’s my complete code for displaying all posts for author “Andy”:

public class ReadAuthorPosts {
 public static void main(String[] args) throws Exception{
        CassandraClientPool pool = CassandraClientPoolFactory.INSTANCE.get();
         CassandraClient client = pool.borrowClient("xxx.yy.36.151", 9160);
       
         try {
             Keyspace ks = client.getKeyspace("BloggyAppy");
             //retrieve sample data
             ColumnParent columnParent = new ColumnParent("AuthorPosts");

             SlicePredicate slicePredicate = new SlicePredicate();

             /**
              * this affects how many columns we want to retrieve
              * also check slicePredicate.setColumn_names(java.util.List<byte[]> column_names)
              * .setColumn_names(new ArrayList<byte[]>()); retrieves no columns at all
              */
             SliceRange columnRange = new SliceRange();
             String Start="s";
             //For these beware of the reversed state
             //columnRange.setStart(Start.getBytes());  //Sets the first column name to get
             columnRange.setStart(new byte[0]);  //We'll get them all.
             columnRange.setFinish(new byte[0]); //Sets the last column name to get
             //effect on columns order
             columnRange.setReversed(false); //Changes order of columns returned in keyset
             columnRange.setCount(1000); //Maximum number of columns in a key
 
             slicePredicate.setSlice_range(columnRange);

             //count of max retrieving keys
             KeyRange keyRange = new KeyRange(1);  //Maximum number of keys to get
             keyRange.setStart_key("Andy");
             keyRange.setEnd_key("");
             Map<String, List<Column>> map = ks.getRangeSlices(columnParent, slicePredicate, keyRange);

             //printing keys with columns
             for (String key : map.keySet()) {
                 List<Column> columns = map.get(key);
                 //print key
                 System.out.println(key);
                 for (Column column : columns) {
                     //print columns with values
                  java.util.UUID Name=toUUID(column.getName()) ;
              
                     System.out.println("\t" + Name + "\t ==\t" + string(column.getValue()));
                    DisplayPost(string(column.getValue()));
                 
                 }
             }

             // This line makes sure that even if the client had failures and recovered, a correct
             // releaseClient is called, on the up to date client.
             client = ks.getClient();

         } finally {
             pool.releaseClient(client);
         }
 }
 
 
 public static java.util.UUID toUUID( byte[] uuid )
    {
    long msb = 0;
    long lsb = 0;
    assert uuid.length == 16;
    for (int i=0; i<8; i++)
        msb = (msb << 8) | (uuid[i] & 0xff);
    for (int i=8; i<16; i++)
        lsb = (lsb << 8) | (uuid[i] & 0xff);

    com.eaio.uuid.UUID u = new com.eaio.uuid.UUID(msb,lsb);
    return java.util.UUID.fromString(u.toString());
    }
  
  
  private static void DisplayPost(String sKey){
   CassandraClientPool pool = CassandraClientPoolFactory.INSTANCE.get();
   CassandraClient client=null;
   try{
        client = pool.borrowClient("xxx.yy.36.151", 9160);
   
        HashMap hm = new HashMap();
        hm.put("pubDate", "");

        
            Keyspace ks = client.getKeyspace("BloggyAppy");
            //retrieve sample data
            ColumnParent columnParent = new ColumnParent("BlogEntries");

            SlicePredicate slicePredicate = new SlicePredicate();

            /**
             * this affects how many columns we want to retrieve
             * also check slicePredicate.setColumn_names(java.util.List<byte[]> column_names)
             * .setColumn_names(new ArrayList<byte[]>()); retrieves no columns at all
             */
            SliceRange columnRange = new SliceRange();
            String Start="s";
            //For these beware of the reversed state
            //columnRange.setStart(Start.getBytes());  //Sets the first column name to get
            columnRange.setStart(new byte[0]);  //We'll get them all.
            columnRange.setFinish(new byte[0]); //Sets the last column name to get
            //effect on columns order
            columnRange.setReversed(false); //Changes order of columns returned in keyset
            columnRange.setCount(10); //Maximum number of columns in a key

            slicePredicate.setSlice_range(columnRange);

            //count of max retrieving keys
            KeyRange keyRange = new KeyRange(200);  //Maximum number of keys to get
            keyRange.setStart_key(sKey);
            keyRange.setEnd_key(sKey);
            Map<String, List<Column>> map = ks.getRangeSlices(columnParent, slicePredicate, keyRange);

            //printing keys with columns
            for (String key : map.keySet()) {
                List<Column> columns = map.get(key);
                //print key
                System.out.println(key);
                for (Column column : columns) {
                    //print columns with values
                 String Name=string(column.getName()) ;
             
                    System.out.println("\t" + Name + "\t ==\t" + string(column.getValue()));
                
                }
            }

            // This line makes sure that even if the client had failures and recovered, a correct
            // releaseClient is called, on the up to date client.
            client = ks.getClient();
        }catch(Exception et){
     System.out.println("Can't connect to server "+et);
     return;
    }
       
         try{
          pool.releaseClient(client);
         }catch(Exception et){
          System.out.println("Can't release pool "+et);
         }
       
        
  }
 
}
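Incidentally, the string round-trip in the toUUID helper above isn’t strictly necessary: java.util.UUID can be built straight from the two longs. A stdlib-only sketch of bytes-to-UUID and back (UuidBytes is an illustrative helper, not part of the app):

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidBytes {
    // 16 bytes -> UUID, as in toUUID above, but constructing directly
    // from the most/least significant longs instead of via a string.
    static UUID toUuid(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        return new UUID(bb.getLong(), bb.getLong());
    }

    // UUID -> 16 bytes, the inverse operation.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        // The UUID survives the round trip through its byte representation.
        System.out.println(original.equals(toUuid(toBytes(original)))); // true
    }
}
```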

Saturday, July 31, 2010

Adding an Author index to our Bloggy App

There’s one more point about Arin’s design (WTF is a SuperColumn? An Intro to the Cassandra Data Model): it allows you to search by tag, or for all posts by searching on a default tag (“__notag__”). But what if (as is likely) we want to get all posts by one author? The answer is to add a new column family to our keyspace that looks the same as the TaggedPosts column family but uses the author’s name as the tag. So this will look like:
AuthorPosts : { // CF
     // blog entries created by “Andy"
      Andy: {  // Row key is the tag name
          // column names are TimeUUIDType, value is the row key into BlogEntries
           timeuuid_1 : i-got-a-new-guitar,
           timeuuid_2 : another-cool-guitar,
       },
     _AllAuthors_: {  // Row key is the tag name
          // column names are TimeUUIDType, value is the row key into BlogEntries
           timeuuid_1 : i-got-a-new-guitar,
           timeuuid_2 : another-cool-guitar,
       }
}
We’ve used a made-up tag for a row that’s going to store all posts from all authors. And in the conf file we add a column family definition like this:
<ColumnFamily CompareWith="TimeUUIDType" Name="AuthorPosts"/> 
We can add the post indexes to our column family like this:
ColumnPath authorsColumnPath = new ColumnPath("AuthorPosts");

authorsColumnPath.setColumn(asByteArray(timeUUID));
ks.insert(authorValue, authorsColumnPath, slugValue.getBytes());
//And do it for all others
ks.insert("_All-Authors_", authorsColumnPath, slugValue.getBytes());
Here authorValue is a string containing the author’s name that we have used earlier in the code. timeUUID was created earlier, when we added the TaggedPosts columns. See the previous post for details of creating this value.

The interesting thing about this is that we are using column families as indexes. In traditional SQL we would simply have done something like “Select * from Posts where Author like ‘Andy’ order by postdate”. Here in Cassandra we are creating indexes in column families, so predetermining how we can search the data. Careful design is needed, I think!

Creating the TaggedPost Column Family

Now it’s time to deal with the TaggedPosts column family. I like to think of this as the indexing mechanism for our application; it’s this column family that allows us to get all posts, or posts with a particular tag. Because the column names are TimeUUIDType, Arin (whose design we are working from, remember: WTF is a SuperColumn? An Intro to the Cassandra Data Model) points out that getting the latest 10 entries is going to be very efficient.

So our entries for this Column family are going to look like:

Tag:{
TimeofPost: TitleofPost,
TimeofPost:TitleofPost,
}

Also remember that Arin’s design has denormalised the tags in the blog entry so they look like Tag1,Tag2,Tag3. In our test code we’ll use an array of tags for our test entry.

First up we are going to need a ColumnPath for this Column family:

ColumnPath tagsColumnPath = new ColumnPath("TaggedPosts");

So here’s the code:

String Tags[] = {"Daily", "Ramblings", "_No-Tag_"};
columnName = "tags";
value = "";
for (int i = 0; i < Tags.length; i++) {
   value = value + Tags[i] + ",";
   String tagKey = Tags[i];
   tagsColumnPath.setColumn(asByteArray(timeUUID));
   ks.insert(tagKey, tagsColumnPath, slugValue.getBytes());
}


The only point to note here is that slugValue has been stored earlier in the code and is essentially the title of the post.
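One small wrinkle: the loop above builds value with a trailing comma ("Daily,Ramblings,_No-Tag_,"). If that bothers you, here is a tiny standalone sketch of joining the tags without it (the class name is just for illustration, it isn't part of the blog app):

```java
// Build the denormalised tag string (e.g. "Daily,Ramblings,_No-Tag_")
// without the trailing comma the naive loop leaves behind.
public class TagJoiner {
    public static String join(String[] tags) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tags.length; i++) {
            if (i > 0) {
                sb.append(",");  // separator only between entries
            }
            sb.append(tags[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tags = {"Daily", "Ramblings", "_No-Tag_"};
        System.out.println(join(tags)); // prints Daily,Ramblings,_No-Tag_
    }
}
```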

Now, there is one major point to note: the timeUUID. There are some problems creating this value, which is essentially the time of the post. For details of the problems see:

http://wiki.apache.org/cassandra/FAQ#working_with_timeuuid_in_java

Essentially, to create this UUID we are going to use Johann Burkard's UUID library, available from http://johannburkard.de/software/uuid/, and some of the code detailed in the Apache Cassandra FAQ. So our timeUUID is generated as:

java.util.UUID timeUUID=getTimeUUID();

Where getTimeUUID() is taken from the Cassandra FAQ:

public static java.util.UUID getTimeUUID()
{
     return java.util.UUID.fromString(new com.eaio.uuid.UUID().toString());
}
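The asByteArray() helper used in the inserts above also comes from the Cassandra FAQ; it packs the UUID's two longs into the 16-byte big-endian array Cassandra expects for TimeUUIDType column names. A version along the lines of the FAQ's (check the FAQ page for the canonical code):

```java
import java.util.UUID;

public class UuidBytes {
    // Pack a java.util.UUID into 16 big-endian bytes: the most
    // significant long first, then the least significant long.
    public static byte[] asByteArray(UUID uuid) {
        long msb = uuid.getMostSignificantBits();
        long lsb = uuid.getLeastSignificantBits();
        byte[] buffer = new byte[16];
        for (int i = 0; i < 8; i++) {
            buffer[i] = (byte) (msb >>> 8 * (7 - i));
        }
        for (int i = 8; i < 16; i++) {
            buffer[i] = (byte) (lsb >>> 8 * (7 - i));
        }
        return buffer;
    }

    public static void main(String[] args) {
        byte[] b = asByteArray(UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8"));
        System.out.println(b.length); // prints 16
    }
}
```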

And that's all we need to create the TaggedPosts column family.

Friday, July 30, 2010

Trouble with Time UUIDs and Java

This is a placeholder; I'm having problems generating time UUIDs and passing them to Cassandra. Currently I'm looking at:

http://wiki.apache.org/cassandra/FAQ#working_with_timeuuid_in_java

for an answer.

Starting to write a Cassandra app in Java

I'm going to explore using Java to create an application that uses Cassandra as a datastore. To do this I'm going to implement the Bloggy App that is described in Arin Sarkissian's introduction to Cassandra:

WTF is a SuperColumn? An Intro to the Cassandra Data Model

Creating the keyspace

Now, assuming you've got Cassandra up and running, you'll need to create the keyspace for the app, which describes the column families and other config (such as sorting options on the columns). You'll need to read Arin's web page for more detail, but here, from that page, is the config that needs to be added to storage-conf.xml to create the keyspace. You'll need to do this on each node in your cluster, and restart Cassandra on each node, for the keyspace to be created. Add this to the Keyspaces section of the file:

<Keyspace Name="BloggyAppy">

    <!-- other keyspace config stuff -->
    <!-- This is a test app from : http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model -->

    <!-- CF definitions -->
    <ColumnFamily CompareWith="BytesType" Name="Authors"/>
    <ColumnFamily CompareWith="BytesType" Name="BlogEntries"/>
    <ColumnFamily CompareWith="TimeUUIDType" Name="TaggedPosts"/>
    <ColumnFamily CompareWith="TimeUUIDType" Name="Comments"
                  CompareSubcolumnsWith="BytesType" ColumnType="Super"/>

    <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>

    <!-- Number of replicas of the data -->
    <ReplicationFactor>2</ReplicationFactor>

    <!--
    ~ EndPointSnitch: Setting this to the class that implements
    ~ AbstractEndpointSnitch, which lets Cassandra know enough
    ~ about your network topology to route requests efficiently.
    ~ Out of the box, Cassandra provides org.apache.cassandra.locator.EndPointSnitch,
    ~ and PropertyFileEndPointSnitch is available in contrib/.
    -->
    <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>

</Keyspace>

Writing data to the keyspace


I'm planning on using Java to create my application, so I'll need a way to connect to the database. Cassandra uses Thrift as an API, but I'll use a higher-level client, in this case Hector. Download the latest version from Hector Downloads and make sure the files are in your classpath. There are a couple of example files (and the code here will be very heavily based on these examples) on the GitHub wiki. Also look in the test section of the source code on GitHub for more examples.

More info on Hector is here http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/

Connecting to the database

Connecting to the database is nice and easy: get a pool instance and borrow a client.
CassandraClientPool pool = CassandraClientPoolFactory.INSTANCE.get();
CassandraClient client = pool.borrowClient("xxx.yy.36.151", 9160);

remember to release the connection once you’re done with it.

pool.releaseClient(client);

Writing an entry

Before we can write anything to Cassandra we need to set the keyspace we are going to use. In this case it's our blog application keyspace, BloggyAppy:


Keyspace ks = client.getKeyspace("BloggyAppy");

Suppose we want to add a "record" (to borrow an RDBMS term); in this case let's add an author record to the Authors column family. First get a column path to the Authors column family:

ColumnPath columnPath = new ColumnPath("Authors");

So what we want to do is add a number of "fields" (which are name-value pairs) to our "record". Suppose our "record" is going to look like this:


Andy
    Tel == 01555 XXXXX
    Email == andy@blogspot.org
    Address == Blogspot

"Andy" is going to be our key, with Tel, Email and Address as columns under that key. So to add the Andy key with an email address:

String key = "Andy";
String columnName = "Email";
String value = "andy@blogspot.org";

columnPath.setColumn(columnName.getBytes());
ks.insert(key, columnPath, value.getBytes());

So, here we set the column path (Email) and then add that column, with a value, to the key (Andy). Note that the value is stored as an array of bytes. We can go on like this to set the telephone number:

columnName = "Tel";
value = "01555 XXXXX";
columnPath.setColumn(columnName.getBytes());
ks.insert(key, columnPath, value.getBytes());

If we want to add a new "record" (say for Joe) just change the key (key="Joe") and start adding "fields". Note we haven't defined how many fields a key has or what the fields are. They are added as needed, and not all of them need be present. This is a major difference from a traditional RDBMS. One last thing: our bloggy app (as defined in Arin's article) needs a pubDate in a BlogEntries key. This needs to be stored as unixtime. We can do that like this:

columnName = "pubDate";
long now = System.currentTimeMillis();
value = Long.toString(now);  // no need to box into a Long first
columnPath.setColumn(columnName.getBytes());
ks.insert(key, columnPath, value.getBytes());

The important thing is that we convert the long now value to a String before inserting it into the key.
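Going the other way, when we eventually read the entry back out, we just parse the string back to a long. A quick sketch of the round trip, assuming the value was stored as a decimal string as above (the class and method names are just for illustration):

```java
public class PubDate {
    // Store the time as a String (as we do in the column), then parse it back.
    public static long roundTrip(long millis) {
        String value = Long.toString(millis);  // what we insert into Cassandra
        return Long.parseLong(value);          // what we recover on read
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(roundTrip(now) == now); // prints true
    }
}
```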

Next time, starting to get some of this info out of Cassandra

Thursday, July 29, 2010

A very simple 2 node cassandra cluster on windows XP

Today I've been setting up a tiny Cassandra cluster in our teaching lab. I'm (for my sins) running this on a couple of Windows XP boxes. This means that, for now, Cassandra needs to be run from the command prompt and the machine left running. Setting up on Windows is fine, just make sure your JAVA_HOME is set correctly before running. To get more than one machine talking to the others, do the following.

Open the storage-conf.xml file and look for ListenAddress. Change this to the IP address of the machine you're working on:

<ListenAddress>xxx.yyy.36.151</ListenAddress>
<!-- internal communications port -->
<StoragePort>7000</StoragePort>

Do this on both machines. Now look for the Seeds config entry. Change this so it lists the IPs of both machines.

<Seeds>
<Seed>xxx.yyy.36.151</Seed>
<Seed>xxx.yyy.36.150</Seed>
</Seeds>
I also changed the number of replicas of the data to match the number of machines, though to be frank I'm not quite sure I needed to.

<ReplicationFactor>2</ReplicationFactor>

One other thing: before you are tempted to start either machine, change the ClusterName. I've found that trying to change the cluster name after starting Cassandra can cause problems.

Tuesday, July 27, 2010

Reset vs Cancel in HTML forms

I've been in an interesting discussion today on Twitter about the use of the reset button in forms. The questioner asked whether buttons should be [cancel][submit] or [submit][cancel]. My objection is that the cancel button should actually be [reset]. The questioner countered that all OS dialog boxes have a cancel button, not a reset. That may be true, but when creating web pages we are not dealing with OS dialog boxes. The difference is simple: with an OS dialog box you close the box when you hit cancel; when you hit an HTML reset button it clears the form.

Jakob Nielsen has a post dating from 2000, Reset and Cancel Buttons, in which he argues that the reset button is bad and shouldn't be used. His main problem is that most designers put the reset button next to the submit button, so it can be hit by mistake. He also argues that the reset button isn't really needed: who needs to clear an entire form and start again? There is also a chance that having the reset button there will slow users down. Reset does seem unneeded for most cases, provided the user can return each element of a form to its default state.

But what about an explicit Cancel button that closes the form and returns the user to a default page? This would be the equivalent of an OS dialog box, so it would typically be followed by an "Are you sure? yes/no" dialog, which would need to repopulate the form if "no" was selected. In my opinion a cancel button is useful for:

1: Pop-out forms, where the cancel button just closes the form.
2: Multi-form pages with the user filling in a lot of information. A confirm-cancellation button is really important here.

So reset buttons should be used sparingly: does a user really need to clear the entire form? Cancel buttons should be used to make sure the transaction/form is really cancelled and to return the user to the default / last non-form page.

Tuesday, July 20, 2010

A few Cassandra links

Introduction to Apache Cassandra:

http://www.nosqldatabases.com/main/2010/7/13/introduction-to-apache-cassandra.html

Cassandra: Principles and Application (pdf paper)

Cassandra: Principles and Application

A Quick Introduction to the Cassandra Data Model:

http://maxgrinev.com/2010/07/09/a-quick-introduction-to-the-cassandra-data-model/

Do You Really Need SQL to Do It All in Cassandra?

http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/

Update Idempotency: Why It is Important in Cassandra Applications

http://maxgrinev.com/2010/07/12/update-idempotency-why-it-is-important-in-cassandra-applications-2/

And of course the famous WTF is a SuperColumn? An Intro to the Cassandra Data Model

http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model

A collection of excellent articles on using Java with Cassandra:

http://www.sodeso.nl/?p=80


Cassandra: Fact vs fiction

Distributed deletes in the Cassandra database

Please add more in the comments

How bad cache data in the DNS affected iPhones

This isn't really about web programming, but it's the sort of annoying thing that might affect any website you're running in the craziest way.

Imagine the problem: all your clients can see your websites, or so you think. Then you notice that if an iPhone connects via 3G your website is inaccessible. Everything works on wireless, though. This happened to me recently, and it wasn't just my iPhone. Other websites (including ones in the parent domain to mine) were all accessible. It was crazy!

So I downloaded an app, iNetFactory, that allowed me to do nslookup. This showed that the site in question could not be resolved in the DNS. Other sites running off the same DNS server were resolving just fine. This was puzzling.

To cut a long story short, after peering at the DNS configuration (it's a Windows DNS server) I decided to look at the server's cache. To do this on a Windows DNS box you'll need to open dnsmgmt and click on View / Advanced. Drilling down to my parent domain, I noticed that the cache contained an entry for my domain (which it shouldn't). The entries in it were valid, but incorrect. I decided to nuke the cache (right-click on Cached Lookups and choose Clear Cache). This cured the problem.

It's not pretty, and I'm not sure how the bad entry got in there, but I've had reports from other iPhone users that the sites are now back and accessible. It's possible this was a case of cache poisoning, or possibly a machine with an old entry (it did resemble an old entry) had managed to do it.

I'm keeping an eye on it !

Tuesday, July 6, 2010

Final step for handling PUT data: URL decoding

As we saw earlier, we can get the PUT data from the body of the HTTP request and decode it into name-value pairs. However, our values are URL encoded. That is, spaces have been converted to "+" characters and other characters are encoded in %FF hex style. See:

URL encoding at Wikipedia

We need to decode this into plain text. Fortunately the standard java.net package has a URLDecoder class that will do the job for us:

URLDecoder docs at Sun.com

This has two decode methods. The simpler one (taking only the string to decode as an argument) has been deprecated, so we'll not use it. The second takes the string to be decoded and a string naming the character encoding, which is usually (but not always) UTF-8. So our code to decode the PUT values is now:

URLDecoder dc = new URLDecoder();
System.out.println("String was "+dc.decode((String)hm.get("Software"),"UTF-8"));

Remember from last time the name value pair is stored in a hashmap (here hm). "Software" is the name of the field we are going to retrieve.
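To see the decoding on its own, outside the servlet, here is a small self-contained example (the class and helper method are just for illustration; the wrapper only exists to swallow the checked UnsupportedEncodingException, since UTF-8 is always present):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeDemo {
    // Undo the URL encoding: "+" -> space, %XX -> the escaped character.
    public static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("SQL+Server")); // prints SQL Server
        System.out.println(decode("a%26b%3Dc"));  // prints a&b=c
    }
}
```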

One last thing to do before sending this off for storing in a database: in order to avoid cross-site scripting attacks we should escape any HTML in the value field. This is to stop users putting text such as <script>alert("test")</script> into the input. We'll use the Commons Lang StringEscapeUtils class to deal with this:

String escape utils

Our code for dealing with the name value pairs now looks like:

String Software = org.apache.commons.lang.StringEscapeUtils.escapeHtml(
        URLDecoder.decode((String) hm.get("Software"), "UTF-8"));

Decoding PUT data

As we saw yesterday, for an HTTP PUT the data arrives in the body of the request. A simple way to read that data is:

InputStream is = request.getInputStream();
char ch;
for (int i = 0; i < request.getContentLength(); i++) {
    ch = (char) is.read();
    System.out.print(ch);
}

which will just read the data and send it to stdout. However, what we want to do is get the data and use it as if it had been sent over as standard parameters. If you look at the data output from the above code you'll see it is sent as name-value pairs delimited by &. So if our data is being sent from the jQuery ajax call as follows:

data: { Module: $('#Module').val(), Software: $('#Software').val()}

This will be encoded as (for example)

Module=ac31004&Software=SQL+Server

Notice that spaces have been encoded as + characters.
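You can see the same encoding from the Java side with java.net.URLEncoder, which is the mirror image of what our servlet has to undo (again the class and helper are just for illustration):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeDemo {
    // Spaces become "+", while characters like "&" and "=" become %XX escapes.
    public static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("SQL Server")); // prints SQL+Server
    }
}
```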

We need to decode this, turning the input into name-value pairs which can be sent to our database update code. There are probably a lot of ways to do this, some more efficient than others, but we'll look at one way using a HashMap.

In your servlet code create a HashMap instance variable (remember to import java.util.HashMap):

private HashMap hm = new HashMap();

Now in our servlet's init method add entries whose keys are the names of the input fields in the original HTML form:

hm.put("Module", "");
hm.put("Software", "");

In our doPut method, read the contents of the request body into a byte array and turn it into a String:

InputStream is = request.getInputStream();
byte Buffer[] = new byte[request.getContentLength()];
is.read(Buffer);
String input = new String(Buffer);

We can now split this into name-value pairs by tokenising on the & character:

StringTokenizer st = new StringTokenizer (input,"&");

We can now read through all these pairs and tokenise each on the "=" character. We can then assume that the first token of the pair is the name and the second the value. Using the HashMap we created earlier, we look to see if the name is in the HashMap and, if it is, replace the value with the one we've just got from the name-value pair. Doing this, we restrict the input to only those fields we defined in the init method when we set up the HashMap. Here's the code:

StringTokenizer st = new StringTokenizer(input, "&");
while (st.hasMoreTokens()) {
    String inputPair = st.nextToken();
    StringTokenizer st2 = new StringTokenizer(inputPair, "=");

    // First token should be the name of the input field
    String name = st2.nextToken();
    String var = st2.nextToken();
    if (hm.containsKey(name)) {
        hm.put(name, var);
    }
}


Finally to use these values we can just get them from the hashmap.

String Software=(String)hm.get("Software");
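To try the tokenising logic outside a servlet, here it is pulled into a small standalone method. The class name is made up and the field names and body string are just the examples from above; the logic mirrors the servlet code:

```java
import java.util.HashMap;
import java.util.StringTokenizer;

public class PutBodyParser {
    // Split "name=value&name=value" into a map, keeping only the
    // field names we registered up front (as in the servlet's init()).
    public static HashMap<String, String> parse(String input) {
        HashMap<String, String> hm = new HashMap<String, String>();
        hm.put("Module", "");
        hm.put("Software", "");
        StringTokenizer st = new StringTokenizer(input, "&");
        while (st.hasMoreTokens()) {
            StringTokenizer st2 = new StringTokenizer(st.nextToken(), "=");
            String name = st2.nextToken();
            // Guard against a name with no "=value" part
            String var = st2.hasMoreTokens() ? st2.nextToken() : "";
            if (hm.containsKey(name)) {
                hm.put(name, var);
            }
        }
        return hm;
    }

    public static void main(String[] args) {
        HashMap<String, String> hm = parse("Module=ac31004&Software=SQL+Server");
        System.out.println(hm.get("Module"));   // prints ac31004
        System.out.println(hm.get("Software")); // prints SQL+Server (still URL encoded)
    }
}
```

Note that the values are still URL encoded at this point; decoding them is the subject of the next post.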

Monday, July 5, 2010

HTTP PUT, jquery and Java Servlets

If you are trying to create a RESTful interface then you need to implement the HTTP PUT method to allow updates. Now, browsers will not allow PUT (or DELETE) as a form method, so the easiest thing to do is use Ajax (XMLHttpRequest, actually) to send over the data. You could hand-roll the XMLHttpRequest, but that's reinventing the wheel. Instead we can use jQuery:

http://api.jquery.com/jQuery.ajax/

So a PUT can be done as follows (Module and Software are the ids of input fields in our HTML; $("a") is attaching this to an <a href> element):

$(document).ready(function() {
    // do stuff when DOM is ready
    $("a").click(function() {
        $.ajax({
            type: 'PUT',
            url: "/Courses/Software",
            processData: true,
            data: { Module: $('#Module').val(), Software: $('#Software').val() },
            error: function(data) {
                $('.result').html(data);
                alert('Error in put. ' + data);
            },
            success: function(data) {
                $('.result').html(data);
                alert('Load was performed with ' + data);
            }
        });
        alert($('#Module').val() + " : " + $('#Software').val());
    });
});


The problem is how to handle this in the Java servlet. You are probably aware that you can use a doPut(HttpServletRequest request, HttpServletResponse response) method in the servlet to handle the HTTP PUT. The problem is how to get at the data. My first attempt was to just get the parameters:

System.out.println("Software:doPut"+request.getParameter("Module"));
System.out.println("Software:doPut"+request.getParameter("name"));

But that doesn't work. For PUT the data is in the body of the request. You can see this by looking at the content length:

System.out.println("Content length "+request.getContentLength());

So we need to read the body in our servlet's doPut like this (simple example):

InputStream is = request.getInputStream();
char ch;
for (int i = 0; i < request.getContentLength(); i++) {
    ch = (char) is.read();
    System.out.print(ch);
}

I'll leave decoding this into name-value pairs until next time.
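One caveat on the snippet above: reading one character at a time works for a demo, but when reading into a buffer, InputStream.read() may return fewer bytes than requested, so a loop is safer. A small self-contained sketch, using a ByteArrayInputStream to stand in for the servlet's request stream (the class and method names are just for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadBody {
    // Read up to len bytes from the stream, looping because read()
    // may return fewer bytes than asked for in a single call.
    public static String readBody(InputStream is, int len) {
        byte[] buffer = new byte[len];
        int off = 0;
        try {
            while (off < len) {
                int n = is.read(buffer, off, len - off);
                if (n < 0) break; // stream ended early
                off += n;
            }
        } catch (IOException e) {
            throw new RuntimeException(e); // won't happen for an in-memory stream
        }
        return new String(buffer, 0, off);
    }

    public static void main(String[] args) {
        String body = "Module=ac31004&Software=SQL+Server";
        InputStream is = new ByteArrayInputStream(body.getBytes());
        System.out.println(readBody(is, body.length())); // prints the body string
    }
}
```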