<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Geek on the Loose &#187; Programming</title>
	<atom:link href="http://www.geekontheloose.com/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.geekontheloose.com</link>
	<description>Just another girl-geek weblog</description>
	<lastBuildDate>Sat, 28 May 2011 23:36:14 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Cassandra 0.7 and Hector for Noobs, Part 2 Amazon</title>
		<link>http://www.geekontheloose.com/programming/java/cassandra-hector-4-noobs-part2-amazon/</link>
		<comments>http://www.geekontheloose.com/programming/java/cassandra-hector-4-noobs-part2-amazon/#comments</comments>
		<pubDate>Sat, 22 Jan 2011 21:31:42 +0000</pubDate>
		<dc:creator>joulie</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://www.geekontheloose.com/?p=295</guid>
		<description><![CDATA[This is part two of the series about bringing up a 3 node Cassandra cluster running on Amazon EC2  instances with a modest Java program using the Hector client and the  cluster. The first part was an overview covering a variety of topics, including bringing up Cassandra for the first time, the Cassandra [...]]]></description>
			<content:encoded><![CDATA[<p>This is part two of the series about bringing up a 3-node Cassandra cluster on Amazon EC2 instances and running a modest Java program that uses the Hector client against that cluster. The <a title="Cassandra and Hector Part 1" href="http://www.geekontheloose.com/programming/java/cassandra-0-7-and-hector-for-noobs/">first part was an overview</a> covering a variety of topics, including bringing up Cassandra for the first time, the Cassandra command-line interface, and creating a first keyspace and column family and putting some data into it. This second post will delve into the details of bringing up your first Cassandra cluster on Amazon EC2.</p>
<p>First caveat - this post is not advice for a production Cassandra cluster. It is just my experience playing around with Cassandra and Amazon EC2 to create a small 3-node cluster to tinker with for fun. It gives me an opportunity to record my experiences, the traps I fell into, and how you might avoid or work around them. Second caveat - just to be clear, Amazon EC2 is not free. It is a paid service and you are responsible for any costs you may incur. You will incur costs if you try to replicate my steps. Third caveat - I am a noob at Cassandra and Hector, so please fact-check the material I'm presenting here. I will give links to point you to authoritative information wherever possible.</p>
<h2>Documentation for Cassandra</h2>
<p>Since my last post on Cassandra 0.7, new and very helpful documentation has cropped up that you might want to peruse:</p>
<ul>
<li><a title="Cassandra documentation" rel="nofollow" href="http://www.datastax.com/docs/0.7/index">Cassandra documentation at Datastax</a>, formerly Riptano</li>
<li><a title="Cassandra wiki documentation" rel="nofollow" href="http://wiki.apache.org/cassandra/FrontPage">Cassandra wiki</a></li>
</ul>
<h2>Documentation for Amazon EC2</h2>
<p>Here are the links to the main documentation for Amazon EC2 that I will reference throughout this article:</p>
<ul>
<li><a title="Getting Started with Amazon EC2" rel="nofollow" href="http://docs.amazonwebservices.com/AWSEC2/latest/GettingStartedGuide/">Getting Started with EC2 Guide</a> - a tutorial to get your first EC2 instance up and running</li>
<li><a title="Amazon EC2 Guide" rel="nofollow" href="http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/">The Amazon EC2 Guide</a> - a comprehensive guide to EC2 that is loaded with information and easy to use</li>
</ul>
<h2>Basic Steps for an Experimental Amazon EC2 Cassandra Cluster</h2>
<p>The basic steps I'll cover in this post are as follows:</p>
<ol>
<li>Set up your Amazon account, get familiar with EC2 and EBS</li>
<li>Choose your base AMI</li>
<li>Set up your base system</li>
<li>Storing your base system as an EBS-backed AMI</li>
<li>Configuring and launching instances</li>
<li>Testing your Cassandra cluster and some basic interactions with it</li>
<li>STOP your instances (you don't want to forget this step)</li>
</ol>
<p>This is the method I've devised for my own personal investigation of Amazon and EC2, entirely for learning purposes. As I've worked my way through this process, I've discovered that there are many ways to achieve this same result. I link to some of them below, but keep in mind that there are lots of ways to do this, and mine is definitely not the most refined.</p>
<h2>Amazon EC2 Setup</h2>
<p>If you haven't already, head over to <a title="Amazon EC2 account sign-up" rel="nofollow" href="http://aws.amazon.com/ec2/">aws.amazon.com</a> to set up your Amazon EC2 account and be sure to <a title="Amazon EC2 pricing" rel="nofollow" href="http://aws.amazon.com/ec2/pricing/">check out the pricing</a>. Then work your way through the entire <a title="Getting Started with Amazon EC2" rel="nofollow" href="http://docs.amazonwebservices.com/AWSEC2/latest/GettingStartedGuide/"><em>Getting Started with EC2 guide</em></a>, including bringing up your first instance. At the point where you're given a choice between Linux and Windows, this post covers the Linux option, but of course you should choose whatever you prefer.</p>
<p>Now you should be up to speed on launching an Amazon EC2 instance from a provided AMI. EBS is Amazon's Elastic Block Store, and we will use it to store our own AMI, from which we will launch our Cassandra instances. You do not need to set up a separate EBS account or create special keys for EBS.</p>
<p>Through the remainder of this post, you can find more detail on all of the Amazon topics covered by referencing <a title="Amazon EC2 Guide" rel="nofollow" href="http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/">the EC2 guide</a>. It's loaded with information. You should use that guide as your main source of information and anything I write in the remainder of this post should be taken with a grain of salt and fact-checked against the guide.</p>
<h2>Choose Your Base AMI</h2>
<p>Now that you have an idea of the AMIs available by default from Amazon after running through the tutorial, it's time to see the plethora of AMIs that are public and choose one to start with. From within your management console you can browse them by clicking the "Launch Instance" button and going to the "Community AMIs" tab. From there you can see thousands of AMIs. As a word of caution, you should do research on any AMI you are thinking of launching to be sure it's safe and reliable and not pre-infected with worms, root kits, etc.</p>
<p>For my own use, I like and am familiar with Ubuntu, so I'm using an Ubuntu AMI as my base. I chose my AMI after perusing the Lucid Lynx list, because Lucid is the most recent long-term support release of Ubuntu. You can see the <a title="Ubuntu Lucid Lynx EC2 AMIs" rel="nofollow" href="http://uec-images.ubuntu.com/releases/10.04/release/">Ubuntu AMI list here</a>. I chose a Small instance from the list since I wanted to keep the costs down while I was just experimenting.</p>
<p>Instance types come in many flavors: Standard, Micro, High-CPU, High-Memory, Cluster Compute and Cluster GPU. Within each flavor are several specific offerings at varying price levels. I'm using a Standard - Small, but you should note that Small is just barely sufficient for Cassandra and doesn't have a 64-bit option. Standard - Small has these characteristics:</p>
<ul>
<li>1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)</li>
<li>Memory: 1.7 GB</li>
<li>160 GB instance storage (150 GB plus 10 GB root partition)</li>
<li>Platform: 32-bit</li>
<li>Moderate I/O</li>
</ul>
<p>The <a title="Cassandra on EC2" rel="nofollow" href="http://wiki.apache.org/cassandra/CassandraHardware">Cassandra documentation says this about EC2</a>:</p>
<blockquote><p>On EC2, the best practice is to use L or XL instances with local  storage.  I/O performance is proportionately much worse on S and M  sizes, and EBS essentially doubles your dependence on the  already-overcrowded EC2 network...</p></blockquote>
<p>For more thoughts on EC2 with Cassandra, <a title="Presentation on Cassandra on Amazon EC2" rel="nofollow" href="http://www.slideshare.net/davegardnerisme/running-cassandra-on-amazon-ec2">I found this slideshare</a>.</p>
<p>Armed with this information, give some thought to what interests you, go out and do a little research, decide on an AMI to use, and take note of its ID. You will want to choose an EBS-backed AMI rather than an S3-backed AMI, since this post goes down the EBS path. If you prefer S3-backed AMIs, then you'll need to make your own adjustments as you follow along with this post. To simplify matters, rather than hunting down the perfect AMI at this point, you can always base your system on one of the AMIs in the tutorial you followed in the first step.</p>
<h2>Set Up Your Base System</h2>
<p>Now that you've chosen an AMI and know its ID, you can launch it in the EC2 Management Console. Note that it appears to be possible to do everything via the command line that you can do in the console, but we'll stick to the console for now. I'm not going to cover the details of launching the instance from the AMI ID. Just click the "Launch Instance" button and then follow the wizard. If you have questions, the EC2 guide should be able to answer them.</p>
<p>Alternatively, if you want a more streamlined approach, have a look at this <a title="Cassandra EC2 on Debian tutorial" rel="nofollow" href="http://wiki.apache.org/cassandra/CloudConfig">tutorial on the Cassandra wiki</a> (and <a title="Cassandra on Ubuntu and EC2" rel="nofollow" href="http://www.coreyhulen.org/?p=277">here's yet another approach</a>, though older). My approach is not nearly as streamlined, but my goal was more in the way of educating myself about each step, whereas the tutorial seems to be more oriented toward getting you up and running as quickly as possible. Choose the approach that works best for you. Please note that Debian packages are not always the most recent version of software and Debian has a tendency to store files all over the place (also true for Ubuntu), so you may need to do some hunting to locate your configuration files.</p>
<p>Assuming you're sticking with this guide and not the "quick start", we'll continue on with the experiment. After your instance is launched, then you will want to do the following:</p>
<ul>
<li>Log onto your instance and make sure it's what you wanted</li>
<li>Run any OS updates that might be needed. In Ubuntu you would do this using apt-get.</li>
<li>Install Java 6</li>
<li>Download and install Cassandra</li>
<li>Configure Cassandra - basic, not multi-node</li>
<li>Start Cassandra to verify it works</li>
<li>Clean up Cassandra files!</li>
</ul>
<p>You can log onto a Linux instance using ssh, using a command similar to this:</p>
<pre>ssh -i /home/myname/.gnupg/mynameEC2key.pem ubuntu@ec2-123-23-123-123.compute-1.amazonaws.com</pre>
<p>Here "ec2-123-23-123-123.compute-1.amazonaws.com" is the host name given to your instance after launch. You can view the hostname and IP address in the management console. "ubuntu" is the default user for the Ubuntu AMI. The "-i /home/myname/.gnupg/mynameEC2key.pem" option specifies the key you chose when launching your instance.</p>
<p>Log on, take a look around, and verify that the instance you've launched is what you were expecting from the AMI you chose.</p>
<p>Next, follow whatever procedure you would normally follow to be sure you have an up-to-date system. In Ubuntu and Debian Linux systems, you would refresh the package lists first and then upgrade the installed packages, using commands such as these:</p>
<pre>sudo apt-get update
sudo apt-get upgrade</pre>
<p>Next, install Java 6, if  it's not already there, using your usual mechanism for installing Java on your chosen system. I would recommend not trying to use OpenJDK or  GCJ. In Ubuntu, you would use a command like "sudo apt-get install sun-java6-jdk".</p>
<p><a title="Download Cassandra" rel="nofollow" href="http://cassandra.apache.org/download/">Load the Cassandra release</a> onto your instance, using whatever method you prefer, and choose a location to install it. Be sure to verify your download against the PGP signature or the MD5 or SHA1 checksum published alongside it. Then extract the tar.gz file.</p>
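<p>If you would rather script the checksum comparison than eyeball it, here is a minimal sketch in Python. The function names are my own invention for illustration, and the digest you compare against is whatever is published next to your download. Keep in mind that a matching checksum only tells you the file arrived intact. Checking the PGP signature is a separate step.</p>
<pre>import hashlib

def file_digest(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of the file at path, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex, algorithm="sha1"):
    """Compare a local file against a digest copied from the download page."""
    return file_digest(path, algorithm) == published_hex.strip().lower()</pre>
<p>You would call it with the name of the tarball you downloaded and the published digest, for example matches_published("apache-cassandra-0.7.0-bin.tar.gz", digest_from_download_page, "md5"), where both arguments are placeholders for your own values.</p>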
<p>The IP addresses you should use will come from the instance info for each instance you will launch later, so for this configuration, set Cassandra up as a single-node cluster. Please refer to Datastax instructions for <a title="Setting up a single-node Cassandra cluster" rel="nofollow" href="http://www.datastax.com/docs/0.7/getting_started/index#setting-up-a-single-node-cluster">setting up a single-node Cassandra cluster</a>, making sure to set the token to 0.</p>
<ul>
<li>Note that <a title="MX4J project" rel="nofollow" href="http://mx4j.sourceforge.net/">mx4j-tools.jar</a> does not come with your Cassandra download, so download that separately and store it in the Cassandra lib folder. You will want it!</li>
</ul>
<p>What we are doing is starting Cassandra up and verifying that it works, but then we will clean it up afterward so that we can launch multiple instances in a clean state. This is important because Cassandra stores information about nodes when it first starts up, and since we want to have a clean AMI that we can launch repeatedly, we don't want this data hanging around, see <a title="Node data is cached with Cassandra" rel="nofollow" href="http://wiki.apache.org/cassandra/FAQ#cloned">this discussion</a>.</p>
<p>Start Cassandra in the foreground with "bin/cassandra -f" so that its log is printed to your terminal. You can then use the Cassandra client to try creating a keyspace, to verify that it works. Since Cassandra's log output is occupying your first ssh connection, make a new ssh connection to do this.</p>
<pre>bin/cassandra-cli -host localhost -port 9160
create keyspace Keyspace1 with replication_factor = 1 and placement_strategy = org.apache.cassandra.locator.RackUnawareStrategy;</pre>
<p>View the log messages in your first window to observe that the keyspace is created.</p>
<p>Now, stop Cassandra and clean up the files:</p>
<pre>sudo rm -rf /var/lib/cassandra/data/
sudo rm -rf /var/lib/cassandra/commitlog/
sudo rm -rf /var/lib/cassandra/saved_caches/</pre>
<p>Now it will once again be as if Cassandra had never been started on your system. This is now your functional base instance.</p>
<h2>Store EBS-Backed AMI</h2>
<p>Now we're ready to store this baby so that it will be available to launch instances that are ready to run with just a little bit of configuration. Go back to your Amazon EC2 management console and see that below Instances there is an Elastic Block Store section. My current understanding is that when we create an EBS-backed AMI, it will create a snapshot, and when we launch each instance from that AMI, a volume will be created from the snapshot and attached to that instance. I'm new to this, so I recommend reading the guide yourself and following the instructions in these sections:</p>
<ul>
<li>AWS Documentation » Amazon EC2 » User Guide » Using Amazon EC2 » Using AMIs » Creating Your Own AMIs » Creating Amazon EBS-Backed AMIs</li>
<li>AWS Documentation » Amazon EC2 » User Guide » Using Amazon EC2 » Using Amazon EBS-Backed AMIs and Instances</li>
</ul>
<h2>Configuring and Launching Instances</h2>
<p>Now that you have an EBS-backed AMI with Java and Cassandra installed on it, we will launch three instances. The first one will be the seed node.</p>
<p>In the management console, in the Instance interface, click the Launch Instance button. Go to the My AMIs tab and select your new AMI and follow the wizard. Some helpful notes:</p>
<ul>
<li>Be sure to give each instance a tag, such as name=Cassandra, so they will be easily identifiable. For the seed node, you could use name=CassandraSeed to easily find the seed.</li>
<li>Be sure to assign the same security group to all three instances. After you launch an instance you cannot change which security group it belongs to, but you can change the details of the security group. If you want them all to be able to communicate with each other using the internal IP addresses (cheapest option), then having the same security group is important.</li>
<li>Be sure to launch them in the same availability zone, such as us-east-1a</li>
</ul>
<p>Launch 3 nodes, one as the seed and two as non-seed nodes.</p>
<p>Now connect to each node using ssh to configure Cassandra. For configuring Cassandra as a multi-node cluster, I want to point you to the Datastax configuration page for <a title="Setting up a multi-node Cassandra cluster" rel="nofollow" href="http://www.datastax.com/docs/0.7/getting_started/index#setting-up-a-multi-node-cluster">setting up a multi-node Cassandra cluster</a> and add a few extra comments. See also <a title="Cassandra cloud configuration" rel="nofollow" href="http://wiki.apache.org/cassandra/CloudConfig">this wiki page</a>.</p>
<p>Configure the IP addresses and seeds as in the Datastax instructions, using the management console to get your internal IP addresses, and being careful to be consistent about which one you chose to be the seed.</p>
<p>The token for the seed should be 0. The tokens for the other nodes are calculated using this formula: i * (2**127 / N) for i = 0 .. N-1, where i is the node number starting from 0, and N is the total number of nodes (3 in this case). Using that formula, and rounding to the nearest whole number, I calculated these values:</p>
<pre>token node1 (seed) = 0
token node2 = 56713727820156410577229101238628035243
token node3 = 113427455640312821154458202477256070485
the value of 2**127 is 170141183460469231731687303715884105728</pre>
<p>You can read more about the <a title="Token configuration for Cassandra" rel="nofollow" href="http://wiki.apache.org/cassandra/Operations">tokens at the Cassandra wiki</a>.</p>
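<p>The formula is easy to run for any cluster size. Here is a small sketch in Python. The function name is my own, and it uses integer division rather than rounding, so the last digit of each non-zero token comes out one lower than the values I listed above. A difference of one on a ring of 2**127 positions makes no practical difference to the balance.</p>
<pre>def initial_tokens(node_count):
    """Evenly spaced tokens: i * (2**127 / N) for i = 0 .. N-1."""
    ring_size = 2 ** 127
    # Integer division keeps every token an exact whole number
    return [i * (ring_size // node_count) for i in range(node_count)]

for index, token in enumerate(initial_tokens(3), start=1):
    print("token node%d = %d" % (index, token))</pre>
<p>Change the 3 to the number of nodes you plan to run, and give each node one of the printed values as its initial token.</p>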
<p>Now you're ready to start Cassandra on your nodes. Start the seed node first, using the familiar command that allows you to tail the log: bin/cassandra -f. Once the seed is up, start the other two nodes the same way.</p>
<h2>Testing the Cassandra Cluster</h2>
<p>Now open a new ssh session to one of the three nodes that you've launched and change directory to your Cassandra install.</p>
<p>You can use nodetool to verify that your nodes are all working together (this is where that mx4j-tools.jar file starts to come in handy).</p>
<pre>ubuntu@domU-123-123-123:/opt/apache-cassandra-0.7.0$ bin/nodetool -host localhost ring
Address         Status State   Load            Owns    Token
113427455640312821154458202477256070485
10.207.6.65     Up     Normal  10.59 KB        33.33%  0
10.207.3.113    Up     Normal  10.43 KB        33.33%  56713727820156410577229101238628035243
10.214.47.207   Up     Normal  10.63 KB        33.33%  113427455640312821154458202477256070485</pre>
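<p>If you want to check the ring from a script instead of reading it by eye, the output above can be parsed with a few lines of Python. This is only a sketch based on the exact layout shown in my output, and the function names are mine. Other Cassandra versions, or nodes in other states, may print rows with a different number of fields, and those rows would simply be skipped here.</p>
<pre>RING_OUTPUT = """\
Address         Status State   Load            Owns    Token
113427455640312821154458202477256070485
10.207.6.65     Up     Normal  10.59 KB        33.33%  0
10.207.3.113    Up     Normal  10.43 KB        33.33%  56713727820156410577229101238628035243
10.214.47.207   Up     Normal  10.63 KB        33.33%  113427455640312821154458202477256070485
"""

def parse_ring(text):
    """Return one dict per node row in the ring output shown above."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Node rows have 7 fields; the header has 6 and the lone token line has 1
        if len(fields) != 7 or fields[0] == "Address":
            continue
        address, status, state, load, unit, owns, token = fields
        nodes.append({"address": address, "status": status, "state": state,
                      "load": load + " " + unit, "owns": owns,
                      "token": int(token)})
    return nodes

def unhealthy(nodes):
    """Addresses of nodes that are not both Up and Normal."""
    return [node["address"] for node in nodes
            if (node["status"], node["state"]) != ("Up", "Normal")]

nodes = parse_ring(RING_OUTPUT)
print(len(nodes), "nodes,", len(unhealthy(nodes)), "unhealthy")</pre>
<p>For the output pasted above this reports 3 nodes and 0 unhealthy. In practice you would feed it the text captured from your own nodetool run rather than a hard-coded string.</p>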
<p>I've had some problems with nodetool that I'll list here. I don't know if I have a buggy version of it or am missing some crucial bit of information as to how to use and configure it. I think the problems are in some way connected to the way EC2 instances are configured.</p>
<ul>
<li>I've had problems getting nodetool to work using localhost as the hostname and had to resort to an IP address for one of the nodes, and not the node I am ssh'ed into</li>
<li>I've had problems with the nodetool not working after I shut the instances down and then restarted them later</li>
</ul>
<p>You can now use the cassandra-cli to create a keyspace and column family and observe the log messages as you do that. The command for the keyspace is above.</p>
<h2>Stop Your Instances</h2>
<p>Be sure to stop your instances when you are done to stop the clock on charges.</p>
<h2>Issues</h2>
<ul>
<li>When you stop your instances, the next time you start them up again, you will need to reconfigure all of the IP addresses. The workaround for this is to pay for static IPs, which is out of the scope of this article.</li>
<li>Don't forget to stop your instances, since you pay for the time when they are running.</li>
</ul>
<h2>Conclusion</h2>
<p>This was the tedious part - getting the infrastructure ready. In the next post(s), I'll start working with Hector, using <a title="Define Cassandra schema using Hector" rel="nofollow" href="https://github.com/rantav/hector/blob/master/core/src/test/java/me/prettyprint/hector/api/ApiV2SystemTest.java">the Hector example code to define my schema programmatically</a>, and then the real fun will begin. I plan to eventually write a web application and Android app that are going to use this simple Cassandra cluster.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.geekontheloose.com/programming/java/cassandra-hector-4-noobs-part2-amazon/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Cassandra 0.7 and Hector for Noobs</title>
		<link>http://www.geekontheloose.com/programming/java/cassandra-0-7-and-hector-for-noobs/</link>
		<comments>http://www.geekontheloose.com/programming/java/cassandra-0-7-and-hector-for-noobs/#comments</comments>
		<pubDate>Wed, 06 Oct 2010 20:55:50 +0000</pubDate>
		<dc:creator>joulie</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://www.geekontheloose.com/?p=250</guid>
		<description><![CDATA[I've been fiddling around with Cassandra 0.7 Beta2 and the Hector Java client (because I must be a closet masochist messing with a beta [update Cassandra 0.7 is no longer in beta, so please give it a try!]). Since documentation for these is seriously lacking [but getting better], I decided to write up my discoveries [...]]]></description>
			<content:encoded><![CDATA[<p>I've been fiddling around with Cassandra 0.7 Beta2 and the Hector Java client (because I must be a closet masochist messing with a beta [update Cassandra 0.7 is no longer in beta, so please give it a try!]). Since documentation for these is seriously lacking [but getting better], I decided to write up my discoveries and observations in the hopes of helping out other noobs like myself. My ultimate goal, by the end of this post, is to bring up a 3-node Cassandra cluster running on Amazon EC2 instances and have a modest Java program use the Hector client to talk to that cluster.</p>
<p>Before you read any further, I want to stress that the Cassandra documentation has very minimal detail and the same goes for Hector, and as of yet, I've found no online tutorials at other web sites for version 0.7. This adventure is not for the faint of heart!</p>
<p>If you've decided to plunge ahead anyway, then you should first study the following documents very carefully, because going forward, I'm assuming you're not a Java noob and you're up-to-speed with at least this much introductory material:</p>
<ul>
<li>Cassandra Thrift API for 0.7: <a title="Cassandra Thrift API for 0.7" rel="nofollow" href="http://wiki.apache.org/cassandra/API07">http://wiki.apache.org/cassandra/API07</a></li>
<li>Cassandra data model: <a title="Cassandra data model" href="http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model">http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model</a></li>
<li>Hector documentation via Riptano [update Riptano changed to DataStax, so URLs are all updated now]: <a title="Hector client for Cassandra documentation" href="http://www.datastax.com/sites/default/files/hector-v2-client-doc.pdf">http://www.datastax.com/sites/default/files/hector-v2-client-doc.pdf</a> [pdf] and also take a look at this blog post: <a title="Hector API v2, Java Cassandra client " href="http://prettyprint.me/2010/08/06/hector-api-v2/">http://prettyprint.me/2010/08/06/hector-api-v2/</a></li>
<li>Hector Java Cassandra client code: <a title="Hector Java Cassandra client" href="http://github.com/rantav/hector">http://github.com/rantav/hector</a></li>
<li>Amazon EC2 for our Cassandra cluster in the cloud: <a title="Amazon EC2 for our Cassandra Cloud Cluster" rel="nofollow" href="http://aws.amazon.com/ec2/">http://aws.amazon.com/ec2/</a></li>
</ul>
<p>If you find newer and/or better documents online, please list them in the comments. Thanks!</p>
<p>After you've gotten up to speed with those, then the fun begins!</p>
<p>Download and install Cassandra 0.7 Beta2: <a title="Download and install Cassandra 0.7 Beta2" rel="nofollow" href="http://cassandra.apache.org/download/">apache-cassandra-0.7.0-beta2-bin.tar.gz</a> and run it as a single node on your localhost if possible. The code examples we'll try first make the assumption you're running on localhost and using the default port of 9160.</p>
<p>I recommend spending a little time in the cassandra-cli command line interface tool, experimenting with the commands and becoming a little more familiar with Cassandra. The help command inside the tool is your best bet at the moment for discovering the commands you can try. The cassandra.yaml configuration file gives you the name of the default keyspace, Keyspace1, and the column families to try querying. (When I did this, I found my queries didn't seem to be working as I'd expected, but I stubbornly moved along to the next step anyway.)</p>
<p>After getting Cassandra running, the first thing we can try is to get <a title="zznate Hector Cassandra client example Java code" href="http://github.com/zznate/hector-examples">zznate's Hector example code</a> from github and have a go at making those run. Either download the zip file or "git clone" the repository.</p>
<p>[Note... I use Eclipse and have the <a title="Maven 2 Eclipse plugin, m2eclipse" rel="nofollow" href="http://m2eclipse.sonatype.org/installing-m2eclipse.html">Maven 2 m2eclipse plugin</a> installed, and am developing on Ubuntu Lucid Lynx. I have 4 GB of memory with a dual core CPU, running on a notebook computer. You may choose to use whatever hardware, tools and OS you prefer, but my observations are based on this environment. Cassandra wants to use a lot of memory, so please take that into account when configuring your development environment.]</p>
<p>I imported the mavenized project, hector-examples, into Eclipse. Because I'm using Maven and <a title="Maven repository for Hector at Riptano" href="http://mvn.riptano.com/content/repositories/riptano/me/prettyprint/hector/">Riptano has graciously provided a maven repository</a> for Hector [update: as of 1/11/2011 Hector is on Maven Central repository and the Riptano repository is deprecated], lots of magic happens at this point. Once Maven finished doing its thing, I immediately found 2 issues that needed to be resolved:</p>
<ol>
<li>I needed to update the hector dependency from 0.7.0-17 to 0.7.0-18 in pom.xml</li>
<li>The DeleteBatchMutate class didn't compile due to using org.apache.cassandra.thrift.Clock, which has been removed from Thrift, so I needed to change the code to use long instead of Clock.</li>
</ol>
<p>Hopefully, by the time you read this, those issues will already be fixed.</p>
<p>Next, I set out to get all of the examples running and then spend some time modifying them to learn more about how Cassandra and the client work. I quickly found that the examples did not run because no keyspace, column families or super column families were configured. How did I determine that? After observing the error messages when running the examples, I found the PID of Cassandra and then ran jconsole and inspected the MBeans and saw I had nothing configured.</p>
<p>[Note: Cassandra does not come with the mx4j-tools.jar already included in the lib folder, and Cassandra happily notifies you of that when you start it up, so you'll have to go and download that jar file yourself and drop it in the lib folder to make use of jconsole.]</p>
<p>My understanding was that everything in the cassandra.yaml configuration file should have  been created the very first time my localhost Cassandra node was run,  but any changes thereafter would need to be made via Thrift or JMX. [update: that was an incorrect assumption, <a title="No default keyspaces in Cassandra 0.7" rel="nofollow" href="http://wiki.apache.org/cassandra/FAQ#no_keyspaces">the default keyspace will not be created as of Cassandra 0.7</a>] So, I thought I should have seen my column families and super column families, but they weren't there. I don't know if I'm just misunderstanding or I missed a step somewhere. One possible explanation is that  before Beta2 of Cassandra was released, I had installed a nightly build (post beta1)  that matched up with a Hector build, and that nightly build might have had  the initialization step broken. I didn't delete everything to start with a clean slate before upgrading  to the 0.7 Beta2 version. I don't know if starting from a clean beta2 would have helped or not, but I suspect it would not have made a difference.</p>
<p>Anyway, I viewed this as an opportunity to try my hand at creating my own keyspace, column families and super  column families. If you happen to already have these  installed, then you're in good shape for running the example code, but sooner or later you're going to need to learn how to create these yourself, so let's walk through that now.</p>
<p>Start up cassandra-cli and configure the keyspace:</p>
<pre>create keyspace Keyspace1 with replication_factor = 1</pre>
<p>For a single node on your localhost, replication_factor has to be 1.</p>
<p>[Note: the documentation says to include the placement strategy in the command, such as "placement_strategy = org.apache.cassandra.locator.RackUnawareStrategy", but that doesn't work and the cli expects an integer. I haven't yet found a mapping of placement strategies to integers, but am still looking.]</p>
<p>Next configure the 2 column families following the <a title="rudimentary Cassandra CLI insructions" href="http://wiki.apache.org/cassandra/LiveSchemaUpdates">rudimentary instructions given on this page of the Cassandra wiki</a>.</p>
<pre>[default@unknown] use Keyspace1
Authenticated to keyspace: Keyspace1
[default@Keyspace1] create column family Standard1 with column_type = 'Standard' and comparator = 'BytesType'
922a9664-bb01-11df-a919-e700f669bcfc
[default@Keyspace1] create column family Standard2 with column_type = 'Standard' and comparator = 'UTF8Type' and rows_cached = 10000
99ed2115-bb01-11df-a919-e700f669bcfc</pre>
<p>Please take note of the identifier data that is spewed out after running a successful create command and save those values somewhere (example: 922a9664-bb01-11df-a919-e700f669bcfc). I have not yet found a way to list those values back out, and you may need them in the future.</p>
<p>The "describe keyspace" command now shows the 2 column families:</p>
<pre>[default@unknown] use Keyspace1
Authenticated to keyspace: Keyspace1
[default@Keyspace1] describe keyspace Keyspace1
Keyspace: Keyspace1
  Replication Factor: 1
  Column Families:
    Column Family Name: Standard2 {
      Column Family Type: Standard
      Column Sorted By: org.apache.cassandra.db.marshal.UTF8Type
    }
    Column Family Name: Standard1 {
      Column Family Type: Standard
      Column Sorted By: org.apache.cassandra.db.marshal.BytesType
    }</pre>
<p>We also need super column families for the example code. After finding no documentation anywhere on how to create a super column family, trial and error led me to this command:</p>
<pre>create column family Super1 with column_type=Super and comparator=BytesType</pre>
<p>Now the describe command shows enough to get going on running the examples:</p>
<pre>describe keyspace Keyspace1
Keyspace: Keyspace1
  Replication Factor: 1
  Column Families:
    Column Family Name: Super1 {
      Column Family Type: Super
      Column Sorted By: org.apache.cassandra.db.marshal.BytesType
    }
    Column Family Name: Standard2 {
      Column Family Type: Standard
      Column Sorted By: org.apache.cassandra.db.marshal.UTF8Type
    }
    Column Family Name: Standard1 {
      Column Family Type: Standard
      Column Sorted By: org.apache.cassandra.db.marshal.BytesType
    }</pre>
<p>At this point, I suggest running all of the Java examples and spending some time on each one, making little modifications until you feel comfortable with them. Also take some time to cross-reference them with the Thrift API, because that will aid your overall understanding. The example code closely resembles the examples in the <a title="Hector client for Cassandra documentation" href="http://www.riptano.com/sites/default/files/hector-v2-client-doc.pdf">PDF documentation file for Hector</a>, so you can read the description of each operation as you try to run and understand it.</p>
<p>I see that this post is becoming really long, so I'm going to break it up into multiple posts. I think at this point we have enough information to start experimenting with some working code examples. In a future post I'll cover the steps I'm going to follow to create a very simple application that uses a small 3-node Cassandra cluster.</p>
<ul>
<li>See part 2, <a title="3 node Cassandra cluster on Amazon EC2" href="http://www.geekontheloose.com/programming/java/cassandra-hector-4-noobs-part2-amazon/">bringing up 3 Cassandra nodes on Amazon EC2 here</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.geekontheloose.com/programming/java/cassandra-0-7-and-hector-for-noobs/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Parallelism and Abstraction in Java</title>
		<link>http://www.geekontheloose.com/programming/java/parallelism-and-abstraction-in-java/</link>
		<comments>http://www.geekontheloose.com/programming/java/parallelism-and-abstraction-in-java/#comments</comments>
		<pubDate>Thu, 18 Feb 2010 07:05:48 +0000</pubDate>
		<dc:creator>joulie</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://www.geekontheloose.com/?p=158</guid>
		<description><![CDATA[Here's an interesting parallel programming interview with Intel's Paul Guermonprez covering threads, JSR166y, and Hadoop. I particularly enjoyed the Hadoop discussion at the end.
The discussion focused on efforts to separate Java programming from the nitty-gritty details of threads. This separation allows a greater number of developers to successfully program for parallel environments by removing [...]]]></description>
			<content:encoded><![CDATA[<p>Here's an interesting <a title="Parallel programming with Intel" rel="nofollow" href="http://software.intel.com/en-us/blogs/2010/01/28/parallel-programming-talk-61-parallel-java-with-intels-paul-guermonprez/">parallel programming interview with Intel's Paul Guermonprez</a> covering threads, JSR166y, and <a title="Apache Hadoop" rel="nofollow" href="http://hadoop.apache.org/">Hadoop</a>. I particularly enjoyed the Hadoop discussion at the end.</p>
<p>The discussion focused on efforts to separate Java programming from the nitty-gritty details of threads. This separation allows a greater number of developers to successfully program for parallel environments by shifting the focus away from the technical details, and thus reducing the knowledge required to write the code. There's also some coverage of the benefits of the higher level of abstraction that functional programming offers, and of how the functional programming style is being incorporated into the Java concurrency model. The text includes this line:</p>
<blockquote><p>The future will be functional programming or won't be at all.</p></blockquote>
<p>Intellectually, I applaud these efforts. Emotionally, I feel some loss.</p>
<p>My first introduction to threads was in a systems programming class, using the C language and the Pthreads library. Pthreads blew my mind, or maybe it was the lack of quality in the lectures. Either way, determined not to be defeated by Pthreads, I went out and bought a stack of books on Pthreads and threads in general and set out to wrap my mind around them. It worked, and along the way I learned that I loved the challenge, so I embraced concurrency and parallelism with much enthusiasm. The knowledge I've accumulated will always be of great value, but as I move toward programming threads at higher and higher levels of abstraction, I'll lose that close connection to the internals, and I'm a little saddened by that.</p>
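<p>To make that abstraction a bit more concrete, here's a minimal sketch of the fork/join style that JSR166y introduced. It's my own illustration rather than anything from the interview. It assumes the version of the framework that ships in java.util.concurrent (Java 8 or later, since it uses the common pool), and the class name ParallelSum is made up for the example. Notice that no thread is ever created, started, or joined by hand:</p>

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelSum extends RecursiveAction {
    // Below this many elements, splitting costs more than it saves, so just loop.
    private static final int THRESHOLD = 1_000;

    private final long[] values;
    private final int from; // inclusive
    private final int to;   // exclusive
    private long result;

    ParallelSum(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            result = sum;
            return;
        }
        // Split the range in half and let the pool decide which worker runs each half.
        int mid = from + (to - from) / 2;
        ParallelSum left = new ParallelSum(values, from, mid);
        ParallelSum right = new ParallelSum(values, mid, to);
        invokeAll(left, right); // returns once both halves have completed
        result = left.result + right.result;
    }

    /** Sums the array using the JVM-wide common fork/join pool. */
    public static long sum(long[] values) {
        ParallelSum task = new ParallelSum(values, 0, values.length);
        ForkJoinPool.commonPool().invoke(task);
        return task.result;
    }

    public static void main(String[] args) {
        long[] values = new long[100_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(sum(values)); // prints 5000050000
    }
}
```

<p>The only concurrency decision left to the programmer is where to split the work and when a chunk is small enough to process sequentially; scheduling and synchronization belong to the pool. Compare that with the Pthreads version of the same thing and you can see both what's gained and what's lost.</p>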
]]></content:encoded>
			<wfw:commentRss>http://www.geekontheloose.com/programming/java/parallelism-and-abstraction-in-java/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Martin Odersky and Josh Suereth at Scala BASE Tonight</title>
		<link>http://www.geekontheloose.com/programming/scala/martin-odersky-and-josh-suereth-at-scala-base-tonight/</link>
		<comments>http://www.geekontheloose.com/programming/scala/martin-odersky-and-josh-suereth-at-scala-base-tonight/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 15:46:59 +0000</pubDate>
		<dc:creator>joulie</dc:creator>
				<category><![CDATA[Scala]]></category>

		<guid isPermaLink="false">http://www.geekontheloose.com/?p=162</guid>
		<description><![CDATA[I'm looking forward to a Scala BASE meeting tonight that is headlined by Martin Odersky and Josh Suereth.
From the announcement email:
Professor Martin Odersky is the director of the LAMP group at EPFL, the creator of the Scala programming language, and author of Programming in Scala.
Josh Suereth hosted the first Scala Lift Off East in Reston, [...]]]></description>
			<content:encoded><![CDATA[<p>I'm looking forward to a Scala BASE meeting tonight that is headlined by Martin Odersky and Josh Suereth.</p>
<p>From the announcement email:</p>
<blockquote><p>Professor Martin Odersky is the director of the LAMP group at EPFL, the creator of the Scala programming language, and author of Programming in Scala.</p>
<p>Josh Suereth hosted the first Scala Lift Off East in Reston, VA and has been involved with lots of Scala projects including <a rel="nofollow" href="http://scala-tools.org/" target="_blank">scala-tools.org</a>, scala-arm, scala-io, scala-jigsaw, and scala-lolz.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.geekontheloose.com/programming/scala/martin-odersky-and-josh-suereth-at-scala-base-tonight/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scala Cheat Sheet Created</title>
		<link>http://www.geekontheloose.com/programming/scala/scala-cheat-sheet-created/</link>
		<comments>http://www.geekontheloose.com/programming/scala/scala-cheat-sheet-created/#comments</comments>
		<pubDate>Sun, 26 Apr 2009 01:14:26 +0000</pubDate>
		<dc:creator>joulie</dc:creator>
				<category><![CDATA[Scala]]></category>

		<guid isPermaLink="false">http://www.geekontheloose.com/?p=105</guid>
		<description><![CDATA[I started teaching myself Scala this week because I want to speed up coding of new projects without sacrificing quality, readability, reliability and performance. Scala seems like a good choice for this. I've also been interested in learning more about functional programming, and exploring the concurrency benefits that functional programming can bring to the table.
I've [...]]]></description>
			<content:encoded><![CDATA[<p>I started teaching myself <a title="Scala functional programming language" rel="nofollow" href="http://www.scala-lang.org/">Scala</a> this week because I want to speed up coding of new projects without sacrificing quality, readability, reliability and performance. Scala seems like a good choice for this. I've also been interested in learning more about functional programming, and exploring the concurrency benefits that functional programming can bring to the table.</p>
<p>I've read halfway through a book on <a title="Erlang Functional Programming Language" rel="nofollow" href="http://erlang.org/">Erlang</a>, another functional programming language, and am finding many similarities to Scala. (That book, BTW, <a title="Programming Erlang: Software for a Concurrent World" rel="nofollow" href="http://www.pragprog.com/titles/jaerlang/programming-erlang">"Programming Erlang: Software for a Concurrent World", by Joe Armstrong</a>, is excellent and I highly recommend it.)</p>
<p>So far, I'm liking Scala a lot. It's comfortable because of its close ties to Java and the fact that it runs on the JVM. It's also compatible with much of the Java code I've written in my life, so that's a huge plus!</p>
<p>My only complaint at this point is with the documentation I've found. There's a lot of great online documentation, but the quick start articles I've read thus far seem to skip all over the place and leave a lot out, so I find myself flipping from one to the other trying to make sense of what I'm reading. It would be a lot easier if I had a concise reference to glance at.</p>
<p><a title="Scala Cheat Sheet" href="http://www.geekontheloose.com/wp-content/uploads/2010/02/Scala_Cheatsheet.pdf"><img class="alignleft" style="border: 0pt none; margin: 10px;" title="Scala Cheat Sheet" src="http://www.geekontheloose.com/wp-content/uploads/2010/02/Screenshot-Scala_Cheatsheet-thumb.jpg" alt="Scala Cheat Sheet" width="300" height="233" /></a>I decided that what's needed is a <a title="Scala Cheat Sheet" href="http://www.geekontheloose.com/wp-content/uploads/2010/02/Scala_Cheatsheet.pdf">cheat sheet</a> (PDF), but I couldn't find one, and because necessity is the mother of invention, I've written my own. [Update: the most recent <a title="Scala Cheat Sheet" href="http://github.com/joulieboolie/scala-cheatsheet">cheat sheet version is now available on GitHub</a>.]</p>
<p>Now beware, this cheat sheet was created by a newborn, 3-day-old Scala programmer, so there may be a <strong>lot of corrections needed</strong>, and I'm certain it needs more information added to it, but I think it's good enough to serve as a quick start guide.</p>
<p>[Update: I've made several improvements and added a version number that I'll increment every time I update it. (Note: instead, the <a title="Scala cheat sheet repository" href="http://github.com/joulieboolie/scala-cheatsheet">Scala cheat sheet has been added to GitHub</a>.)]</p>
<p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/"><img style="border-width: 0;" src="http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png" alt="Creative Commons License" /></a><br />
<span>Scala Cheat Sheet</span> by <a rel="cc:attributionURL" href="http://www.geekontheloose.com/programming/scala/scala-cheat-sheet-created/">Julie Bovee Hill</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/">Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License</a></p>
<hr /><p><em><strong>References for the quick start cheat sheet:</strong></em></p>
<p>The Busy Developers' Guide to Scala series:</p>
<ul>
<li><a rel="nofollow" href="http://www.ibm.com/developerworks/java/library/j-scala03268.html?S_TACT=105AGX02&amp;S_CMP=EDU">“Don't Get Thrown for a Loop”, IBM developerWorks</a></li>
<li><a rel="nofollow" href="http://www.ibm.com/developerworks/java/library/j-scala02198.html?S_TACT=105AGX02&amp;S_CMP=EDU">“Class action”, IBM developerWorks</a></li>
<li><a rel="nofollow" href="http://www.ibm.com/developerworks/java/library/j-scala01228.html?S_TACT=105AGX02&amp;S_CMP=EDU">“Functional programming for the object oriented”, IBM developerWorks</a></li>
</ul>
<p>Scala Reference Manuals:</p>
<ul>
<li><a rel="nofollow" href="http://www.scala-lang.org/sites/default/files/linuxsoft_archives/docu/files/ScalaOverview.pdf">“An Overview of the Scala Programming Language” (2. Edition, 20 pages), scala-lang.org</a></li>
<li><a rel="nofollow" href="http://www.scala-lang.org/sites/default/files/linuxsoft_archives/docu/files/ScalaTutorial.pdf">A Brief Scala Tutorial, scala-lang.org</a></li>
<li><a rel="nofollow" href="http://www.scala-lang.org/node/104">“A Tour of Scala”, scala-lang.org</a></li>
</ul>
<p><a rel="nofollow" href="http://blogs.sun.com/sundararajan/entry/scala_for_java_programmers">"Scala for Java programmers", A. Sundararajan's Weblog, blogs.sun.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.geekontheloose.com/programming/scala/scala-cheat-sheet-created/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Adobe Flex and Linux</title>
		<link>http://www.geekontheloose.com/programming/adobe-flex-and-linux/</link>
		<comments>http://www.geekontheloose.com/programming/adobe-flex-and-linux/#comments</comments>
		<pubDate>Sun, 04 Jan 2009 23:40:25 +0000</pubDate>
		<dc:creator>joulie</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.geekontheloose.com/computers/adobe-flex-and-linux/</guid>
		<description><![CDATA[Dear Adobe, Please Support Linux....

I prefer Linux over Windows and have preferred Linux for many years. The notebook computer that I use most of the time is running Ubuntu Linux. Linux has the tools I need. Linux, in particular, has the programming and networking tools that I need. Linux is just simply a superb [...]]]></description>
			<content:encoded><![CDATA[<h4>Dear Adobe, Please Support Linux....</h4>
<p>
I prefer Linux over Windows and have preferred Linux for many years. The notebook computer that I use most of the time is running <a title="Ubuntu Linux" href="http://www.ubuntu.com/" rel="nofollow">Ubuntu Linux</a>. Linux has the tools I need. Linux, in particular, has the programming and networking tools that I need. Linux is just simply a superb environment for programming.
</p>
<p>
Adobe Flex is ActionScript programming. The <a title="Adobe Flex Builder" href="http://www.adobe.com/products/flex/" rel="nofollow">Adobe Flex Builder</a>, an IDE for building Flex applications, is based on <a title="Eclipse IDE" href="http://www.eclipse.org/" rel="nofollow">Eclipse</a>. Eclipse is an IDE that runs on many platforms, including Linux. Flash runs just fine on Linux (though Adobe has historically been somewhat slow to keep the Linux version up-to-date). I use Eclipse every day. I use Linux every day. I'm new to Flex, but find it fascinating and want to use Flex Builder.
</p>
<p>
To sum this up, Linux is an ideal programming platform, the tools required for Flex development (Flash and Eclipse) run happily under Linux independently, so why oh why is the Flex Builder not available for Linux?
</p>
<p>
I downloaded the trial version of Flex Builder to play around with, but to use it, I'm running VMware and have it installed in a Windows XP VM on my Ubuntu laptop – a painful, excruciating situation. So I ask Adobe to please, please, please let Flex Builder run on Linux, too.
</p>
<p>
As soon as Adobe adds Linux support, I promise to be first in line to purchase Flex Builder for Linux.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.geekontheloose.com/programming/adobe-flex-and-linux/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>memcached</title>
		<link>http://www.geekontheloose.com/programming/memcached/</link>
		<comments>http://www.geekontheloose.com/programming/memcached/#comments</comments>
		<pubDate>Sat, 17 May 2008 23:38:55 +0000</pubDate>
		<dc:creator>joulie</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.geekontheloose.com/?p=68</guid>
		<description><![CDATA[
memcached 


From the site: "memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load."


My first software job was for a company that wanted to cache the Intranet for the enterprise, a sort of indirect competitor for Akamai. I [...]]]></description>
			<content:encoded><![CDATA[<p>
<a title="memcached" href="http://www.danga.com/memcached/" rel="nofollow">memcached </a>
</p>
<p>
From the <a title="memcached" href="http://www.danga.com/memcached/" rel="nofollow">site</a>: "memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load."
</p>
<p>
My first software job was for a company that wanted to cache the Intranet for the enterprise, a sort of indirect competitor to <a title="Akamai Technologies" href="http://www.akamai.com/" rel="nofollow">Akamai</a>. I loved the concept of what we were trying to do. It was fascinating and exciting. Sadly, we lost focus and never really found what the potential clients were looking for. I've never stopped wishing we could have succeeded, though. I think if we had been clever enough to invent memcached, we might have had more success, as opposed to spinning our wheels developing a custom file system. Ironically, just as we were fizzling out, memcached was coming to life.</p>
<p>Reference: <a title="Web Caching on Wikipedia" href="http://en.wikipedia.org/wiki/Web_cache" rel="nofollow">Web Caching on Wikipedia</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.geekontheloose.com/programming/memcached/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Java Women Group Proposal</title>
		<link>http://www.geekontheloose.com/programming/java/java-women-group-proposal/</link>
		<comments>http://www.geekontheloose.com/programming/java/java-women-group-proposal/#comments</comments>
		<pubDate>Sun, 02 Sep 2007 23:01:08 +0000</pubDate>
		<dc:creator>joulie</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://www.geekontheloose.com/?p=41</guid>
		<description><![CDATA[
This week I received a forwarded email announcement about a new Java Women group being formed as part of the OpenJDK group. Being a Java devotee myself, it sounds perfect for me, so I'll be checking it out.


Here's the mission of the group as given on their wiki page: "The Java Women Network promotes collaboration [...]]]></description>
			<content:encoded><![CDATA[<p>
This week I received a forwarded email announcement about a new Java Women group being formed as part of the <a title="openJDK" href="http://openjdk.java.net/" target="_self" rel="nofollow">OpenJDK</a> group. Being a Java devotee myself, it sounds perfect for me, so I'll be checking it out.
</p>
<p>
Here's the mission of the group as given on their <a title="Java Women Network" href="http://wiki.java.net/bin/view/JDK/JavaWomen" target="_self" rel="nofollow">wiki page</a>: "The Java Women Network promotes collaboration amongst women who develop and use Java technology to increase the visibility of women's contribution, mentoring opportunities and professional networking."</p>
]]></content:encoded>
			<wfw:commentRss>http://www.geekontheloose.com/programming/java/java-women-group-proposal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

