Adventures in Web Development: June 2012

One of the things I’m interested in is using tiny Raspberry Pi computers to teach database and network administration to undergraduate and MSc students. In the first instance I’ve been looking at building a large cluster of these devices to run Apache Cassandra database servers. I’m in no way expecting these to get anywhere near the performance of real servers or even VM installations but, for me at least, they give a feeling of working with real hardware. The first thing I’m doing is running stress tests with various configurations, but I’m limited by the availability of the devices. I started out with a cluster of 3 and have just managed to add another node. The stress test uses the stress command Cassandra provides in the tools directory of a standard installation (some distributions omit that directory, so you may need to get the source and build the stress tool yourself). After we’ve looked at the chart, I’ll look a little at the process of adding a new node to a Cassandra cluster. For the record, the commands I used to stress the cluster are as follows:

Insert:

./stress -d 192.168.1.10,192.168.1.11,192.168.1.12 -o insert -I DeflateCompressor

Read:

./stress -d 192.168.1.10,192.168.1.11,192.168.1.12 -o read

For the 4 node test I added the new node to the list of hosts. Note also that I’m using DeflateCompressor, as I’ve not yet managed to get the Snappy compressor compiled for the Pi. I used a MacBook Air to drive the stress test over a WiFi connection to the cluster, which is connected via a Netgear 10Meg switch; that should handle the data rates from a Pi.
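If you run the insert and read tests against several cluster sizes, it can help to generate the command lines rather than retype the host list each time. This is only a sketch: stress_command is a hypothetical helper of my own, not part of Cassandra, and it uses nothing beyond the -d, -o and -I flags shown above. It assumes the fourth node is 192.168.1.13, as in the ring output further down.

```python
import shlex

def stress_command(hosts, operation, compressor=None):
    """Build the argument list for Cassandra's stress tool.

    Only the flags used in this post appear here: -d (comma-separated
    node list), -o (insert or read) and -I (compression class).
    """
    args = ["./stress", "-d", ",".join(hosts), "-o", operation]
    if compressor is not None:
        args += ["-I", compressor]
    return args

three_nodes = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
four_nodes = three_nodes + ["192.168.1.13"]  # the node added for the 4 node test

for hosts in (three_nodes, four_nodes):
    # shlex.join (Python 3.8+) turns an argument list into a shell command line.
    print(shlex.join(stress_command(hosts, "insert", compressor="DeflateCompressor")))
    print(shlex.join(stress_command(hosts, "read")))
```

To actually launch a run you could pass one of these lists to subprocess.run with cwd set to the directory containing the stress tool; here they are only printed.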

Here then is a graph combining inserts and reads for 3 and 4 node clusters:

One thing I do want to note here: for both the 3 and 4 node clusters the insert performance drops suddenly towards the end of the run. I’m not sure why that happens. In both cases the clusters were balanced, with each node running at 90% CPU. Here’s the ring information for the two cluster arrangements (obtained with the nodetool command ./nodetool -h 192.168.1.10 ring):

Address       DC           Rack   Status  State   Load      Effective-Owership  Token
                                                                                113427455640312821154458202477256070485
192.168.1.11  datacenter1  rack1  Up      Normal  14.67 MB  33.33%              0
192.168.1.10  datacenter1  rack1  Up      Normal  14.42 MB  33.33%              56713727820156410577229101238628035242
192.168.1.12  datacenter1  rack1  Up      Normal  14.51 MB  33.33%              113427455640312821154458202477256070485

pi@raspberrypi:/home/space/apache-cassandra-1.1.0/bin$ ./nodetool -h 192.168.1.12 ring
Address       DC           Rack   Status  State   Load      Effective-Owership  Token
                                                                                127605887595351923798765477786913079296
192.168.1.11  datacenter1  rack1  Up      Normal  11.24 MB  25.00%              0
192.168.1.10  datacenter1  rack1  Up      Normal  11.24 MB  25.00%              42535295865117307932921825928971026432
192.168.1.12  datacenter1  rack1  Up      Normal  11.38 MB  25.00%              85070591730234615865843651857942052864
192.168.1.13  datacenter1  rack1  Up      Normal  11.1 MB   25.00%              127605887595351923798765477786913079296
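To check that a ring is balanced you can recompute each node's share from the tokens. As I understand nodetool's output, each node owns the range from the previous node's token (exclusive) up to its own token, wrapping round at the end of the ring, which is why the last token is also printed on its own above the first node. The sketch below assumes the RandomPartitioner token space of 0 to 2**127 and picks node lines out with a simple IPv4 pattern; parse_ring and ownership are hypothetical names. It computes raw token ownership, which should match the Effective-Owership column only when each row is stored once (a replication factor of 1), as the 33.33% and 25% figures here suggest.

```python
import re

RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space: 0 .. 2**127
IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")

def parse_ring(output):
    """Return {address: token} from `nodetool ring` text.

    Node lines are assumed to start with an IPv4 address and end with
    the token; the header and the lone wrap-around token line are skipped.
    """
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and IPV4.fullmatch(fields[0]):
            nodes[fields[0]] = int(fields[-1])
    return nodes

def ownership(nodes):
    """Percentage of the ring owned by each node, rounded to 2 places.

    A node owns (previous token, own token], so its share is
    (token - previous) mod RING_SIZE; the first node wraps to the last.
    """
    if len(nodes) == 1:
        return {address: 100.0 for address in nodes}
    ordered = sorted(nodes.items(), key=lambda item: item[1])
    shares = {}
    for i, (address, token) in enumerate(ordered):
        previous = ordered[i - 1][1]  # ordered[-1] when i == 0
        shares[address] = round(100 * ((token - previous) % RING_SIZE) / RING_SIZE, 2)
    return shares

FOUR_NODE_RING = """\
Address DC Rack Status State Load Effective-Owership Token
127605887595351923798765477786913079296
192.168.1.11 datacenter1 rack1 Up Normal 11.24 MB 25.00% 0
192.168.1.10 datacenter1 rack1 Up Normal 11.24 MB 25.00% 42535295865117307932921825928971026432
192.168.1.12 datacenter1 rack1 Up Normal 11.38 MB 25.00% 85070591730234615865843651857942052864
192.168.1.13 datacenter1 rack1 Up Normal 11.1 MB 25.00% 127605887595351923798765477786913079296
"""

for address, percent in ownership(parse_ring(FOUR_NODE_RING)).items():
    print(f"{address}: {percent:.2f}%")
```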

Moving from 3 to 4 nodes.

Here’s the procedure I used to move from 3 to 4 nodes. Provided your cluster is already balanced, with initial_token set correctly in each node's cassandra.yaml file, you can add the new node with its correct token as its initial_token. Once it has bootstrapped, you can use nodetool move on each of the other nodes to change that node's token, something like:

sudo ./nodetool -h 192.168.1.10 move 42535295865117307932921825928971026432

Do this on each node that needs to be moved; that excludes the first node, with a token of 0, and the new node you've just added with the correct initial token. After a node has been moved, run cleanup on it to delete any data it no longer needs:

./nodetool -h 192.168.1.10 cleanup
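Working out which nodes need a move is mechanical, so here is a sketch that plans it from the current tokens. It assumes, as in my setup, that the ring is already balanced under RandomPartitioner, that existing nodes keep their order around the ring and that the new node takes the last slot. plan_expansion is a hypothetical helper that only prints the nodetool commands used above; it doesn't run them. Since every existing node (even the one that stays at token 0) ends up holding data for ranges it no longer owns, it lists cleanup for each of them after the moves.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space

def balanced_tokens(n):
    """Evenly spaced tokens for an n-node ring (same formula as the token script)."""
    return [i * RING_SIZE // n for i in range(n)]

def plan_expansion(current):
    """Plan the nodetool commands for growing a balanced ring by one node.

    current maps address -> token. Existing nodes keep their ring order and
    take the first len(current) balanced tokens; the returned initial token
    (the last slot) is what the new node's cassandra.yaml should use.
    """
    ordered = sorted(current, key=current.get)
    targets = balanced_tokens(len(ordered) + 1)
    commands = [
        f"./nodetool -h {address} move {target}"
        for address, target in zip(ordered, targets)
        if current[address] != target  # e.g. the node at token 0 stays put
    ]
    # After all moves, every pre-existing node holds data it no longer owns.
    commands += [f"./nodetool -h {address} cleanup" for address in ordered]
    return targets[-1], commands

three_node_ring = {
    "192.168.1.11": 0,
    "192.168.1.10": 56713727820156410577229101238628035242,
    "192.168.1.12": 113427455640312821154458202477256070485,
}
initial_token, commands = plan_expansion(three_node_ring)
print("initial_token for the new node:", initial_token)
for command in commands:
    print(command)
```

Run the moves one at a time, letting each finish before starting the next, and only then the cleanups; prefix them with sudo, as in the move command above, if your install needs it.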

Here is a simple Python script you can use to calculate the tokens (this version courtesy of a good friend on Twitter):

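import sys

# RandomPartitioner tokens: spread num nodes evenly over 0..2**127.
if len(sys.argv) > 1:
    num = int(sys.argv[1])
else:
    num = int(input("How many nodes? :"))
for i in range(num):
    # // keeps the token an exact integer (plain / gives a float in Python 3).
    print('node %d: %d' % (i, i * (2**127) // num))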

I’m looking forward to going beyond 4 nodes soon!

Getting more memory on the Pi

The Pi is a little short on memory for this type of server. The situation isn’t helped by some of the memory being reserved for the GPU, 64 MB by default. You can reduce this to 32 MB by replacing the start.elf file:

Change to /boot on the Pi (cd /boot)
Copy start.elf to start.elf.old (sudo cp start.elf start.elf.old)
Copy arm224_start.elf to start.elf (sudo cp arm224_start.elf start.elf)

Reboot. You can use the top command to see the performance of your Pi and how much memory it has. See http://elinux.org/RPi_Advanced_Setup for more information on the elf files available and how much memory the GPU uses for each.
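top is the quickest way to look, but if you want to check the new split from a script (say, on every node in the cluster), the kernel reports total usable memory as MemTotal in /proc/meminfo. This is a small sketch with a hypothetical helper, mem_total_mb. Expect the figure to come out somewhat below the 224 MB given to the ARM, since the kernel reserves some memory for itself.

```python
import os

def mem_total_mb(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in MB (1 MB = 1024 kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) // 1024  # the value is reported in kB
    raise ValueError("no MemTotal line found")

if os.path.exists("/proc/meminfo"):  # Linux only, e.g. Raspbian on the Pi
    with open("/proc/meminfo") as f:
        print("MemTotal: %d MB" % mem_total_mb(f.read()))
```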

A Pic of the setup
Just for completeness, here's a picture of the 4 Raspberry Pis running Apache Cassandra.

There’s quite rightly been a lot of talk about the Raspberry Pi, and quite some discussion on what it’s good for. Whether it will succeed in its mission to create a new army of programmers is anyone’s guess at this point, but for me it’s already succeeding. Not for programming, though, but for teaching computer administration. I’ve got a small cluster of Pis (three, two of them borrowed at the moment) and I’ve been having a lot of fun configuring Apache Cassandra on them. So for less than £100 I’ve got a Linux cluster I can blow away at any moment and start again. I can reconfigure the Cassandra settings, start the cluster again and run stress tests on it.

I take backups of the SD cards once in a while so I can go back to a previous configuration at any point, which is quite easy. On a Mac (or Linux), just put the card in a card reader and use the following command:

dd bs=1m if=/dev/rdisk1 of=disk.img

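where /dev/rdisk1 is the raw device for the SD card (which you will have identified when creating your first image) and disk.img is the file you want to create. On Linux the card will appear as something like /dev/sdb instead, and GNU dd expects the block size as bs=1M.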
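If you would rather script the backups, the same block-by-block copy is easy to sketch in Python, recording a SHA-256 checksum at the same time so you can later confirm an image hasn't been corrupted. image_card is a hypothetical helper; the 1 MB block size mirrors bs=1m, and reading a raw device such as /dev/rdisk1 will usually need sudo.

```python
import hashlib

def image_card(device, image, block_size=1024 * 1024):
    """Copy a raw device (or any file) to an image file in 1 MB blocks.

    Works like `dd bs=1m if=DEVICE of=IMAGE`, and also returns the number
    of bytes copied plus a SHA-256 hex digest of the data.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(device, "rb") as src, open(image, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # end of device/file
                break
            dst.write(block)
            digest.update(block)
            copied += len(block)
    return copied, digest.hexdigest()

# Example (run with sudo on a Mac, device name as identified earlier):
# copied, checksum = image_card("/dev/rdisk1", "disk.img")
```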

As a teacher, having cheap hardware like this around is going to let us give students machines to play with, set up and muck around with, with little or no chance of damage. Our undergraduate networking course is going to get a whole lot more hands-on! Sure, you could do it with VMs, but that just won’t feel as real as plugging a cluster together. Once we have our Data Science MSc up and running, I’m hoping we can give the students access to a 2 data center cluster of 20 to 30 machines, all for around 100. Doing that with VMs is possible but a lot of work (although you can of course automate it) and would probably cost a lot more to set up!

Looks like the Pi can do a lot more than teach programming.


Saturday, June 16, 2012

3 Node / 4 Node Cassandra Stress test on a Raspberry Pi cluster

Moving from 3 to 4 nodes.

Getting more memory on the Pi

Thursday, June 7, 2012

Raspberry Pi, not just for teaching programming
