Monday, February 21, 2011

Cassandra’s data model as records and lists

I have to admit I’ve never really been happy with Cassandra’s data model, or to be more precise, I’ve never really been happy with my understanding of the model. However, I’ve realized that if we think of two use cases for column families then things may become a bit clearer. For me, column families can be used in one of two ways: either as a record or as an ordered list.

Columns used as a record
If we place name/value pairs holding different attributes under the row key, then we can consider the columns a classic database record. So if we want to store the details of a user, the columns might be:

Name (key)

  • Email:
  • Twitter: TwitterUser
  • Phone: 01 000 345678

In this schema the order of the columns is not important because the names are not related. However, unlike a relational database, there is no definition of the “fields” in the record; we define them at runtime in the application. This gives us the flexibility to add new fields, provided our application can handle missing “fields”.
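As a rough illustration of the record idea, here is a minimal Python sketch that uses plain dictionaries as an in-memory stand-in for a column family. It does not talk to Cassandra at all; the row keys `alice` and `bob`, the `Website` column and the `get_field` helper are all made up for the example.

```python
# Hypothetical in-memory stand-in for a column family used as records:
# row key -> {column name: value}. No schema is declared anywhere.
users = {
    "alice": {"Twitter": "TwitterUser", "Phone": "01 000 345678"},
    # bob has a "field" that alice does not; nothing has to be migrated.
    "bob": {"Phone": "01 000 345678", "Website": "example.org"},
}


def get_field(row_key, column, default=None):
    """Read one column, tolerating rows (or columns) that were never stored."""
    return users.get(row_key, {}).get(column, default)


print(get_field("alice", "Twitter"))
print(get_field("alice", "Website", "n/a"))
```

The point of `get_field` is the last sentence above: because the "fields" only exist at runtime, every read has to cope with a column that is simply not there.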

Columns used as a list
If every name/value pair holds the same kind of attribute then we can consider the columns an ordered list. In this use case the ordering of the columns is important, and the ordering type needs to be carefully thought out. For example, if we want to store messages from a user, and we want to be able to get the most recent ones, then we will store them as:

Author (key)

  • Timeuuid: Message
  • Timeuuid: Message
  • Timeuuid: Message

This ordered list can be thought of as an index of records. The records would be stored in another column family.
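A small Python sketch of this "index of records" idea, again using in-memory structures rather than a real Cassandra client. Integer timestamps stand in for the TimeUUID column names, and the names `timeline`, `messages`, `post` and `latest` are assumptions made for the example.

```python
import bisect

# Hypothetical stand-ins for two column families.
messages = {}  # message id -> record (the "other column family")
timeline = {}  # author (row key) -> list of (timestamp, message id), kept sorted


def post(author, timestamp, message_id, text):
    """Store the record, then add an entry to the author's ordered list."""
    messages[message_id] = {"Message": text}
    # Tuples compare by timestamp first, so the list stays ordered by time.
    bisect.insort(timeline.setdefault(author, []), (timestamp, message_id))


def latest(author, count):
    """Return the `count` most recent records for an author, newest first."""
    if count <= 0:
        return []
    newest = reversed(timeline.get(author, [])[-count:])
    return [messages[message_id] for _, message_id in newest]


post("andy", 20, "m2", "second message")
post("andy", 10, "m1", "first message")
post("andy", 30, "m3", "third message")
print(latest("andy", 2))
```

Even though the messages are posted out of order, the list is kept sorted on insert, so getting the most recent entries is just a slice from the end.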

Supercolumns as a list of records
Even better, we can use supercolumns to create a combination of lists and records. Normally we would make the supercolumns the ordered list and the columns the record. In our messaging system, we want to get the latest messages from a user:

Author (Key)

  • Timeuuid: (Supercolumn name)
    • Message: Message Text
    • Time: Time of message
    • Picture: Binary picture data
  • Timeuuid: (Supercolumn name)
    • Message: Message Text
    • Time: Time of message
    • Picture: Binary picture data

The supercolumns are ordered by time; the order of the columns under each supercolumn is not important.
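The same layout can be sketched in Python as a time-ordered list whose entries are records. As before this is only an in-memory model, not Cassandra code: integer timestamps stand in for the TimeUUID supercolumn names, and `wall`, `add_message` and `latest_messages` are hypothetical names.

```python
import bisect

# Hypothetical stand-in for a super column family:
# author (row key) -> list of (timestamp, record), sorted by timestamp.
wall = {}


def add_message(author, timestamp, text, picture=b""):
    """Insert one record (the columns) under a time-ordered supercolumn."""
    record = {"Message": text, "Time": timestamp, "Picture": picture}
    entries = wall.setdefault(author, [])
    # Search on the timestamps only, because the record dicts are not comparable.
    index = bisect.bisect_right([stamp for stamp, _ in entries], timestamp)
    entries.insert(index, (timestamp, record))


def latest_messages(author, count):
    """Return the `count` most recent records for an author, newest first."""
    if count <= 0:
        return []
    return [record for _, record in reversed(wall.get(author, [])[-count:])]


add_message("andy", 20, "second message")
add_message("andy", 10, "first message")
add_message("andy", 30, "third message")
for record in latest_messages("andy", 2):
    print(record["Time"], record["Message"])
```

Compared with the plain list, the whole record comes back from a single row here, so there is no second lookup in another column family.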

As ever I look forward to comments about this post.

Wednesday, February 16, 2011

Dundee Hackday 2011 begins.

So yesterday we kicked off this year's Dundee Hackday with YDN and Mozilla. The timetable for this year is:

  • Tuesday 15th, Video conference with Murray Rowan and Steve Marshall from YDN! and Christian Heilmann from Mozilla
  • Wednesday 16th Start to assemble into teams and get your ideas together.
  • Tuesday March 1st Post your groups and ideas to Entry form
  • We will then take a look at your idea and give you feedback.
  • Start building your Hack.
  • March 25: Show your Hack in the QMB Street to University staff, Yahoo! developers and Christian Heilmann
  • March 25: The best Hacks are announced and prizes are given. All retire to the student union for a well-earned rest!

Christian Heilmann has published his thoughts on yesterday's video conference: “Introducing Mozilla technology and ideas to students for a hack day”.

TwitterTag #uhackdundee11
FlickrTag uhackdundee11