One of the things I’m interested in is using tiny Raspberry Pi computers for teaching database and network admin to undergraduate and MSc students. In the first instance I’ve been looking at building a large cluster of these devices to run a cluster of Apache Cassandra database servers. I’m in no way expecting these to get anywhere near the performance of real servers or even VM installations but, for me at least, they give a feeling of working with real hardware. The first thing I’m doing is conducting stress tests with various configurations, but I’m limited by the availability of the devices. I started out with a cluster of 3 and have just managed to add another node. The stress test uses the stress command Cassandra provides in the tools directory of a standard installation (some distributions omit the directory, so you may need to get the source and build the stress tool yourself). After we’ve looked at the chart, I’ll look a little at the process of adding a new node to a Cassandra cluster. For the record, the commands I used to stress the cluster are as follows:
Insert:
./stress -d 192.168.1.10,192.168.1.11,192.168.1.12 -o insert -I DeflateCompressor
Read:
./stress -d 192.168.1.10,192.168.1.11,192.168.1.12 -o read
For the 4 node test I added the new node to the list of hosts. Note also that I’m using DeflateCompressor as I’ve not yet managed to get the Snappy compressor compiled for the Pi. I used a MacBook Air to drive the stress test over a wifi connection to the cluster, which is connected via a Netgear 10Meg switch that should handle the data rates from a Pi.
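Since the only thing that changes between the 3 and 4 node runs is the host list, the command lines can be generated rather than typed. The following is a minimal sketch in Python; the helper name `stress_command` is hypothetical, and it only uses the `-d`, `-o` and `-I` options shown in the commands above.

```python
def stress_command(hosts, operation, compressor=None):
    """Build the argument list for Cassandra's stress tool.

    hosts      -- list of node addresses passed to -d
    operation  -- value for -o, e.g. "insert" or "read"
    compressor -- optional value for -I, e.g. "DeflateCompressor"
    """
    args = ["./stress", "-d", ",".join(hosts), "-o", operation]
    if compressor:
        args += ["-I", compressor]
    return args


three_nodes = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
four_nodes = three_nodes + ["192.168.1.13"]

# Print the insert and read commands for the 4 node test.
print(" ".join(stress_command(four_nodes, "insert", "DeflateCompressor")))
print(" ".join(stress_command(four_nodes, "read")))
```

The argument list can be handed straight to `subprocess.call` if you want to launch the runs from the script instead of copying the printed lines.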
Here then is a graph combining inserts and reads for 3 and 4 node clusters:
One thing I do want to note here: for both the 3 and 4 node clusters the insert performance drops suddenly towards the end of the run. I’m not sure why that happens. The clusters were in both cases balanced, with each node running at 90% CPU. Here’s the ring information for the cluster arrangements (obtained from the nodetool command ./nodetool -h 192.168.1.10 ring)
Address         DC          Rack   Status  State   Load      Effective-Owership  Token
                                                                                 113427455640312821154458202477256070485
192.168.1.11    datacenter1 rack1  Up      Normal  14.67 MB  33.33%              0
192.168.1.10    datacenter1 rack1  Up      Normal  14.42 MB  33.33%              56713727820156410577229101238628035242
192.168.1.12    datacenter1 rack1  Up      Normal  14.51 MB  33.33%              113427455640312821154458202477256070485
pi@raspberrypi:/home/space/apache-cassandra-1.1.0/bin$ ./nodetool -h 192.168.1.12 ring
Address         DC          Rack   Status  State   Load      Effective-Owership  Token
                                                                                 127605887595351923798765477786913079296
192.168.1.11    datacenter1 rack1  Up      Normal  11.24 MB  25.00%              0
192.168.1.10    datacenter1 rack1  Up      Normal  11.24 MB  25.00%              42535295865117307932921825928971026432
192.168.1.12    datacenter1 rack1  Up      Normal  11.38 MB  25.00%              85070591730234615865843651857942052864
192.168.1.13    datacenter1 rack1  Up      Normal  11.1 MB   25.00%              127605887595351923798765477786913079296
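The ownership percentages in the ring output follow directly from the tokens: each node owns the span between the previous token and its own, wrapping round at the end of the ring. Here is a minimal sketch that recomputes them, assuming a token range of 0 to 2**127 (the same range the token script in this post uses); the function name `ownership` is hypothetical.

```python
RING_SIZE = 2 ** 127  # assumed token range, matching the 2**127 used in this post


def ownership(tokens):
    """Return the percentage of the ring owned by each token, in sorted order."""
    ordered = sorted(tokens)
    shares = []
    for index, token in enumerate(ordered):
        previous = ordered[index - 1]  # index -1 wraps round to the last token
        # A single node owns the whole ring, hence the "or RING_SIZE".
        span = (token - previous) % RING_SIZE or RING_SIZE
        shares.append(100.0 * span / RING_SIZE)
    return shares


three_node = [0,
              56713727820156410577229101238628035242,
              113427455640312821154458202477256070485]
four_node = [0,
             42535295865117307932921825928971026432,
             85070591730234615865843651857942052864,
             127605887595351923798765477786913079296]

print(["%.2f%%" % share for share in ownership(three_node)])
print(["%.2f%%" % share for share in ownership(four_node)])
```

Both token lists are copied from the ring output above, so the printed values should line up with the Effective-Owership column.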
Moving from 3 to 4 nodes.
Here’s the procedure I used to move from 3 to 4 nodes. Provided your cluster is already balanced, with initial_token correctly set in the cassandra.yaml file, you can add the new node with its correct token. Once it has bootstrapped, use nodetool move on each of the other nodes to change that node’s token, something like:
sudo ./nodetool -h 192.168.1.10 move 42535295865117307932921825928971026432
Do this on each node that needs to be moved, so neither the first node with a token of 0 nor the new node you've just added with the correct initial token. After a node is moved you will need to run cleanup to delete any data that the node no longer needs:
./nodetool -h 192.168.1.10 cleanup
Here’s a simple Python script you can use to calculate the tokens (this version courtesy of a good friend on Twitter):
import sys

# Number of nodes comes from the command line, or from a prompt if omitted.
if len(sys.argv) > 1:
    num = int(sys.argv[1])
else:
    num = int(input("How many nodes? :"))

# Spread the tokens evenly over the 0..2**127 token range.
for i in range(num):
    print('node %d: %d' % (i, i * (2**127) // num))
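The same token calculation can be used to work out which existing nodes need a nodetool move when one node is added. This is only a sketch under two assumptions: the existing ring is already balanced, and the new node takes the last token in the ring (as 192.168.1.13 did here). The names `balanced_tokens` and `move_plan` are hypothetical.

```python
RING_SIZE = 2 ** 127  # same token range as the script above


def balanced_tokens(num_nodes):
    """Evenly spaced tokens for a ring of num_nodes nodes."""
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


def move_plan(hosts_in_ring_order):
    """Plan the growth of a balanced ring by one node.

    Returns (moves, new_node_token): moves is a list of (host, new_token)
    pairs for the existing nodes whose token changes, and new_node_token is
    the initial_token for the node being added at the end of the ring.
    """
    old = balanced_tokens(len(hosts_in_ring_order))
    new = balanced_tokens(len(hosts_in_ring_order) + 1)
    moves = [(host, new_token)
             for host, old_token, new_token in zip(hosts_in_ring_order, old, new)
             if old_token != new_token]
    return moves, new[-1]


# Hosts listed in token order, as in the 3 node ring output.
moves, new_token = move_plan(["192.168.1.11", "192.168.1.10", "192.168.1.12"])
print("initial_token for the new node: %d" % new_token)
for host, token in moves:
    print("./nodetool -h %s move %d" % (host, token))
    print("./nodetool -h %s cleanup" % host)
```

For the 3 to 4 node case this leaves the node with token 0 alone and prints move and cleanup commands for the other two.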
I’m looking forward to going beyond 4 nodes soon !
Getting more memory on the Pi
The Pi is a little short on memory for this type of server. The situation isn’t helped by some of the memory being shared with the GPU, the default being 64M. You can move this down to 32M by changing the start.elf file:
Change to /boot on the Pi
Copy start.elf to start.elf.old (sudo cp start.elf start.elf.old)
Copy arm224_start.elf to start.elf (sudo cp arm224_start.elf start.elf)
Reboot
You can use the top command to see the performance of your Pi and how much memory it has. See http://elinux.org/RPi_Advanced_Setup for more information on the elf files available and how much memory the GPU uses for each.
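If you would rather check the memory from a script than from top, the total visible to Linux is also reported on the MemTotal line of /proc/meminfo. A minimal sketch follows; the function name `mem_total_kb` is hypothetical, and the file is only read if it exists.

```python
import os


def mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    return None


# /proc/meminfo only exists on Linux, so skip quietly elsewhere.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        print("MemTotal: %s kB" % mem_total_kb(handle.read()))
```

Running it before and after swapping the start.elf file should show the extra memory handed back from the GPU.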
A Pic of the setup
Just for completeness, here's a pic of 4 Raspberry Pis running Apache Cassandra
Hi, thanks a lot for the information. What tool did you use to build the graph?
I used EXCEL, nice and simple !
Hi Andy,
This is a great article and thanks.
I have a project that needs some Cassandra skills and thus I'm looking for someone to configure and admin a cassandra cluster for me. Can you recommend anyone please?
Andi
(a n d i AT m c b u r n i e DOT c o m)