Fun With QR Codes and Android Phones
I couldn't resist playing around with QR codes today after a new bit.ly QR code generator hit the news.
This image is the QR code for my most recent post to this blog.
My Android phone did an incredibly good job of reading the code and then opening the page in the browser. Such fun! Then I got a co-worker with an Android phone playing around with it, too, and we scanned every bar code we could find in our cubes, impressing our cube-mates with the nifty technology. I used the Barcode Scanner app by the ZXing Team.
Cassandra 0.7 and Hector for Noobs
I've been fiddling around with Cassandra 0.7 Beta2 and the Hector Java client (I must be a closet masochist to mess with a beta [update: Cassandra 0.7 is no longer in beta, so please give it a try!]). Since documentation for both is seriously lacking [but getting better], I decided to write up my discoveries and observations in the hope of helping other noobs like myself. My ultimate goal, by the end of this series of posts, is to bring up a 3-instance cluster of Cassandra nodes on Amazon EC2 and run a modest Java program that uses the Hector client to talk to the cluster.
Before you read any further, I want to stress that the Cassandra documentation is very thin, as is Hector's, and so far I haven't found any online tutorials for version 0.7 on other web sites. This adventure is not for the faint of heart!
If you've decided to plunge ahead anyway, first study the following documents very carefully, because from here on I'm assuming you're not a Java noob and that you're up to speed with at least this much introductory material:
- Cassandra Thrift API for 0.7: http://wiki.apache.org/cassandra/API07
- Cassandra data model: http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
- Hector documentation via Riptano [update: Riptano is now DataStax, and the URLs have been updated]: http://www.datastax.com/sites/default/files/hector-v2-client-doc.pdf [pdf]; also take a look at this blog post: http://prettyprint.me/2010/08/06/hector-api-v2/
- Hector Java Cassandra client code: http://github.com/rantav/hector
- Amazon EC2 for our Cassandra cluster in the cloud: http://aws.amazon.com/ec2/
If you find newer and/or better documents online, please list them in the comments. Thanks!
After you've gotten up to speed with those, then the fun begins!
Download and install Cassandra 0.7 Beta2 (apache-cassandra-0.7.0-beta2-bin.tar.gz) and, if possible, run it as a single node on your localhost. The first code examples we'll try assume that you're running on localhost and using the default Thrift port, 9160.
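If the examples can't connect, a quick first check is whether anything is listening on port 9160 at all. Below is a minimal sketch using only the JDK's java.net.Socket; the class name PortCheck and the 2-second timeout are my own illustrative choices, and a successful connection only shows that some TCP listener is there, not that it is Cassandra.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP listener accepts connections on host:port.
// It does not speak Thrift, so it cannot prove the listener is Cassandra.
class PortCheck {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;   // TCP handshake succeeded
        } catch (IOException e) {
            return false;  // refused, timed out, or host unreachable
        }
    }

    public static void main(String[] args) {
        // 9160 is the default Thrift port the examples expect
        boolean up = isListening("localhost", 9160, 2000);
        System.out.println(up ? "Something is listening on localhost:9160"
                              : "Nothing is listening on localhost:9160");
    }
}
```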
I recommend spending a little time in the cassandra-cli command line interface tool, experimenting with the commands and becoming a little more familiar with Cassandra. The help command inside the tool is your best bet at the moment for discovering the commands you can try. The cassandra.yaml configuration file gives you the name of the default keyspace, Keyspace1, and the column families to try querying. (When I did this, I found my queries didn't seem to be working as I'd expected, but I stubbornly moved along to the next step anyway.)
After getting Cassandra running, the first thing to try is zznate's Hector example code from GitHub. Either download the zip file or "git clone" the repository, and have a go at making the examples run.
[Note: I use Eclipse with the m2eclipse Maven 2 plugin installed, and I develop on Ubuntu Lucid Lynx on a notebook computer with a dual-core CPU and 4 GB of memory. You may use whatever hardware, tools, and OS you prefer, but my observations are based on this environment. Cassandra wants a lot of memory, so take that into account when configuring your development environment.]
I imported the mavenized project, hector-examples, into Eclipse. Because I'm using Maven and Riptano has graciously provided a Maven repository for Hector [update: as of 1/11/2011, Hector is in the Maven Central repository and the Riptano repository is deprecated], lots of magic happens at this point. Once Maven finished doing its thing, I immediately found 2 issues that needed to be resolved:
- I needed to update the hector dependency from 0.7.0-17 to 0.7.0-18 in pom.xml
- The DeleteBatchMutate class didn't compile because it used org.apache.cassandra.thrift.Clock, which has been removed from the Cassandra Thrift API, so I changed the code to use a long timestamp instead of Clock.
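For the Clock change, the fix boils down to passing a plain long wherever a Clock used to go. Cassandra just compares these values, and the usual convention is microseconds since the Unix epoch. The sketch below is a hypothetical helper (MicrosClock is my name, not part of Hector or Thrift) that derives such a value from the JDK clock and nudges it forward so two calls in the same JVM never return the same value.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: produces long timestamps in microseconds since the Unix epoch,
// the common convention for Cassandra 0.7 column timestamps now that Clock is gone.
class MicrosClock {
    private static final AtomicLong last = new AtomicLong();

    // Millisecond wall-clock time scaled to microseconds, bumped by 1 whenever
    // needed so that successive calls in this JVM are strictly increasing.
    static long nextMicros() {
        long now = System.currentTimeMillis() * 1000L;
        while (true) {
            long prev = last.get();
            long candidate = Math.max(now, prev + 1);
            if (last.compareAndSet(prev, candidate)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) {
        long ts = nextMicros();
        // Where the old example code built a Clock, pass a plain long like this instead
        System.out.println("timestamp = " + ts);
    }
}
```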
Hopefully, by the time you read this, those issues will already be fixed.
Next, I set out to get all of the examples running so I could spend some time modifying them and learning more about how Cassandra and the client work. I quickly found that the examples failed because no keyspace, column families, or super column families were configured. How did I determine that? After reading the error messages from the examples, I looked up Cassandra's PID, attached jconsole to it, inspected the MBeans, and saw that nothing was configured.
[Note: Cassandra does not ship with mx4j-tools.jar in its lib folder, and it happily tells you so when it starts up. That jar adds MX4J's HTTP interface to the same MBeans; jconsole connects over standard JMX, so you only need to download the jar and drop it into the lib folder if you want the web interface.]
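jconsole is one way to do that inspection; the same MBean query can also be made from code through the JDK's javax.management API. This is a minimal sketch under a few assumptions: the class name MBeanLister is mine, the node exposes JMX over RMI at the standard jmxrmi path, and you supply its host and JMX port yourself (check your node's startup configuration; I'm not asserting a default here). Without arguments it lists the local JVM's own java.lang MBeans, so it can be tried without a running node.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class MBeanLister {
    // Returns the names of all MBeans matching the pattern, sorted for readability.
    static Set<String> list(MBeanServerConnection conn, String pattern)
            throws IOException, MalformedObjectNameException {
        Set<String> names = new TreeSet<>();
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            names.add(name.toString());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            // Remote mode: args are host and JMX port of the node (assumed RMI connector)
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + ":" + args[1] + "/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                // Domain wildcard: matches org.apache.cassandra.db, etc.
                list(connector.getMBeanServerConnection(), "org.apache.cassandra.*:*")
                    .forEach(System.out::println);
            }
        } else {
            // Local mode: list this JVM's own java.lang MBeans as a stand-in
            list(ManagementFactory.getPlatformMBeanServer(), "java.lang:*")
                .forEach(System.out::println);
        }
    }
}
```

If the remote listing comes back with no column family entries, that matches what I saw in jconsole: nothing configured yet.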
My understanding was that everything in the cassandra.yaml configuration file would be created the very first time my localhost Cassandra node ran, and that any later changes would have to be made via Thrift or JMX. [update: that was an incorrect assumption; as of Cassandra 0.7, the default keyspace is not created automatically.] So I expected to see my column families and super column families, but they weren't there. I don't know whether I was misunderstanding something or missed a step somewhere. One possible explanation is that before Cassandra Beta2 was released, I had installed a nightly build (post beta1) that matched a Hector build, and that nightly build might have had a broken initialization step. I didn't delete everything and start with a clean slate before upgrading to 0.7 Beta2. I don't know whether starting from a clean Beta2 would have helped, but I suspect it would not have made a difference.
Anyway, I viewed this as an opportunity to try my hand at creating my own keyspace, column families and super column families. If you happen to already have these installed, then you're in good shape for running the example code, but sooner or later you're going to need to learn how to create these yourself, so let's walk through that now.
Start up cassandra-cli and configure the keyspace:
create keyspace Keyspace1 with replication_factor = 1
For a single node on your localhost, replication_factor has to be 1, since there is only one node to hold a copy of each row.
[Note: the documentation says to include the placement strategy in the command, such as "placement_strategy = org.apache.cassandra.locator.RackUnawareStrategy", but that doesn't work and the cli expects an integer. I haven't yet found a mapping of placement strategies to integers, but am still looking.]
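To see why a single node means replication_factor = 1, it helps to picture how replicas are placed. The toy model below is my own illustration of rack-unaware placement on a token ring, not Cassandra's code: each node owns a token, a key's first replica goes to the first node whose token is at or after the key's token (wrapping around), and further replicas go to the next nodes clockwise. The MD5-based tokenFor helper is in the spirit of the RandomPartitioner but is an assumption, as are all the names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy token ring with rack-unaware placement (illustration only, not Cassandra's code).
class ToyRing {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(BigInteger token, String node) {
        ring.put(token, node);
    }

    // Assumed MD5-based token, similar in spirit to RandomPartitioner
    static BigInteger tokenFor(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // First replica: first node with token >= key token (wrapping); then walk clockwise.
    List<String> replicasFor(BigInteger token, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        // This toy model can't place more distinct replicas than there are nodes
        int count = Math.min(replicationFactor, ring.size());
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(token);
        if (entry == null) {
            entry = ring.firstEntry(); // wrap around the ring
        }
        while (replicas.size() < count) {
            replicas.add(entry.getValue());
            entry = ring.higherEntry(entry.getKey());
            if (entry == null) {
                entry = ring.firstEntry();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        ToyRing single = new ToyRing();
        single.addNode(BigInteger.ZERO, "localhost");
        // In this toy model, a one-node ring can hold only one replica per key
        System.out.println(single.replicasFor(tokenFor("jsmith"), 3)); // [localhost]
    }
}
```

With three nodes, the same walk spreads each key's replicas over up to three machines, which is where a higher replication_factor starts to mean something.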
Next configure the 2 column families following the rudimentary instructions given on this page of the Cassandra wiki.
[default@unknown] use Keyspace1
Authenticated to keyspace: Keyspace1
[default@Keyspace1] create column family Standard1 with column_type = 'Standard' and comparator = 'BytesType'
922a9664-bb01-11df-a919-e700f669bcfc
[default@Keyspace1] create column family Standard2 with column_type = 'Standard' and comparator = 'UTF8Type' and rows_cached = 10000
99ed2115-bb01-11df-a919-e700f669bcfc
Take note of the identifier printed after each successful create command (for example, 922a9664-bb01-11df-a919-e700f669bcfc) and save it somewhere. I have not yet found a way to list those values back out, and you may need them in the future.
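Those identifiers appear to be version-1 (time-based) UUIDs, judging by the 1 that starts their third group, so the JDK's java.util.UUID can at least parse them and recover when they were generated. A minimal sketch, with SchemaIdInspector as a hypothetical name; it does not list ids back out of Cassandra, it only decodes one you've saved.

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical helper for inspecting the ids printed by cassandra-cli.
class SchemaIdInspector {
    // Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns units
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    static Instant creationTime(UUID id) {
        if (id.version() != 1) {
            throw new IllegalArgumentException("not a time-based UUID: " + id);
        }
        // UUID.timestamp() counts 100 ns intervals since the UUID epoch
        long unix100ns = id.timestamp() - UUID_EPOCH_OFFSET;
        return Instant.ofEpochSecond(unix100ns / 10_000_000L, (unix100ns % 10_000_000L) * 100L);
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("922a9664-bb01-11df-a919-e700f669bcfc");
        System.out.println("version " + id.version() + ", created " + creationTime(id));
    }
}
```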
The "describe keyspace" command now shows the 2 column families:
[default@unknown] use Keyspace1
Authenticated to keyspace: Keyspace1
[default@Keyspace1] describe keyspace Keyspace1
Keyspace: Keyspace1
Replication Factor: 1
Column Families:
Column Family Name: Standard2 {
Column Family Type: Standard
Column Sorted By: org.apache.cassandra.db.marshal.UTF8Type
}
Column Family Name: Standard1 {
Column Family Type: Standard
Column Sorted By: org.apache.cassandra.db.marshal.BytesType
}
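The "Column Sorted By" lines show the comparator for each column family, which decides the order of column names within a row. As a rough illustration (not Cassandra's source), a BytesType-style ordering compares the raw bytes of two names as unsigned values, one position at a time, with a shorter prefix sorting first; the class and method names are mine.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Rough model of a BytesType-style ordering of column names (illustrative only).
class BytesOrder {
    // Compare byte by byte as unsigned values (0..255); if one name is a prefix
    // of the other, the shorter one sorts first.
    static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Encode names as UTF-8, sort the raw bytes, and decode back for display
    static List<String> sortAsBytes(List<String> names) {
        List<byte[]> raw = new ArrayList<>();
        for (String s : names) {
            raw.add(s.getBytes(StandardCharsets.UTF_8));
        }
        raw.sort(UNSIGNED_LEXICOGRAPHIC);
        List<String> out = new ArrayList<>();
        for (byte[] b : raw) {
            out.add(new String(b, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        // Uppercase ASCII (0x41..0x5A) sorts before lowercase (0x61..0x7A)
        System.out.println(sortAsBytes(List.of("email", "Name", "age", "Zip")));
        // prints [Name, Zip, age, email]
    }
}
```

For plain ASCII names this puts uppercase before lowercase, which can be surprising when you read columns back from a BytesType column family such as Standard1.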
We also need a super column family for the example code. After finding no documentation anywhere on how to create one, trial and error led me to this command:
create column family Super1 with column_type=Super and comparator=BytesType
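Conceptually, a super column family just adds one more level of nesting: instead of row key to column name to value, it is row key to super column name to column name to value. In Cassandra the comparator given in the command orders the super column names, and a column family definition also has a separate subcomparator for the columns inside them. The sketch below models only the nesting with plain Java maps (names and sample data are hypothetical); TreeMap's String ordering stands in for the comparators, and rows sit in a HashMap because, with the RandomPartitioner, rows are not kept in key order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model only (not how Cassandra stores data):
//   standard CF: row key -> column name -> value
//   super CF:    row key -> super column name -> column name -> value
class DataModelSketch {
    // Standard1-style column family; columns sorted within each row
    final Map<String, TreeMap<String, String>> standard = new HashMap<>();
    // Super1-style column family: one extra level of sorted maps
    final Map<String, TreeMap<String, TreeMap<String, String>>> superCf = new HashMap<>();

    void insertStandard(String rowKey, String column, String value) {
        standard.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    void insertSuper(String rowKey, String superColumn, String column, String value) {
        superCf.computeIfAbsent(rowKey, k -> new TreeMap<>())
               .computeIfAbsent(superColumn, k -> new TreeMap<>())
               .put(column, value);
    }

    // Returns null when the row, super column, or column is missing
    String getSuper(String rowKey, String superColumn, String column) {
        TreeMap<String, TreeMap<String, String>> row = superCf.get(rowKey);
        if (row == null) {
            return null;
        }
        TreeMap<String, String> sc = row.get(superColumn);
        return sc == null ? null : sc.get(column);
    }

    public static void main(String[] args) {
        DataModelSketch m = new DataModelSketch();
        m.insertStandard("jsmith", "name", "John Smith");
        m.insertSuper("jsmith", "workAddress", "city", "Mount Pleasant");
        m.insertSuper("jsmith", "homeAddress", "city", "Lansing");
        System.out.println(m.getSuper("jsmith", "workAddress", "city")); // Mount Pleasant
        System.out.println(m.superCf.get("jsmith").keySet()); // [homeAddress, workAddress]
    }
}
```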
Now the describe command shows enough to get going on running the examples:
describe keyspace Keyspace1
Keyspace: Keyspace1
Replication Factor: 1
Column Families:
Column Family Name: Super1 {
Column Family Type: Super
Column Sorted By: org.apache.cassandra.db.marshal.BytesType
}
Column Family Name: Standard2 {
Column Family Type: Standard
Column Sorted By: org.apache.cassandra.db.marshal.UTF8Type
}
Column Family Name: Standard1 {
Column Family Type: Standard
Column Sorted By: org.apache.cassandra.db.marshal.BytesType
}
At this point, I suggest running all of the Java examples and spending some time on each one, making small modifications until you feel comfortable with it. Also take some time to cross-reference them with the Thrift API, because that will aid your overall understanding. The example code closely resembles the examples in the Hector PDF documentation, so you can read the description of each operation as you run it and work to understand it.
This post is becoming really long, so I'm going to break it up into multiple posts. At this point, we have enough information to start experimenting with some working code examples. In a future post, I'll describe the steps I'm following to create a very simple application that uses a small 3-node Cassandra cluster.
- See part 2, bringing up 3 Cassandra nodes on Amazon EC2 here
Another Ada Lovelace Day Post – CS Role Model
I posted earlier for Ada Lovelace Day about LinuxChix.org as a great resource for women in technology, and now I'm getting into the groove and want to add another post, this time about my first female role model in computer science.
Dr. Neelima Shrikhande is a professor of computer science at Central Michigan University. When I was working on my MS, she was the only female professor in the department. I never saw any indication that she viewed herself as a role model for the few women studying computer science there, but she is definitely one.
She's a super intelligent and focused woman for whom I have a lot of respect. According to the cmich.edu website, she "is an authority on computer vision and artificial intelligence. She studies how to make computers capable of seeing things and understanding pictures."
I had her for only one class, my compiler class, but she really opened up the world of computer science for me with it. It was a hard, life-consuming class, but I loved it more than any other and even used what I learned in my thesis. Because of that class, I have a lifelong fascination with compilers and virtual machines, and I still have my dragon book. I never thought about it at the time, but I imagine that class was at least as hard to teach as it was to take, and she rose to the challenge seamlessly.
Thanks, Dr. Shrikhande, for being such a sharp, successful role model in computer science.
Shameless plug: the CMU CS department is a great place for an education!
Celebrating Ada Lovelace Day and Women in Technology
March 24th has been designated as Ada Lovelace Day and is an opportunity to celebrate the achievements of women in science and technology. FindingAda.com is encouraging women to blog about this today.
I've not paid much attention to her aside from being aware of her and knowing she's the namesake of the Ada programming language, but I've benefited tremendously from her contributions and the contributions of other women in technology for most of my life.
I read up on Ada at Wikipedia and learned this bit of trivia today, "she was the only legitimate child of the poet Lord Byron and Anne Isabella Milbanke."
I won't spend a lot of time dwelling on her interesting life, because Wikipedia does that far better than I could, but I'll take this opportunity to make mention of some current-day pioneering women in technology, the women who are advocating and teaching other women about Linux, computers, and other free software via LinuxChix.
LinuxChix is a community for women who like Linux and for anyone who wants to support women in computing. We are an international group of Free Software users and developers, founded in 1999 with the aim of "supporting women in Linux." Founder Deb Richardson described it as an alternative to the "locker room atmosphere" found in some online technical forums and gave LinuxChix two core rules: "be polite" and "be helpful." LinuxChix is now many things to many people, but it remains primarily a group for supporting women in computing, specifically in Open Source/Free Software/Software Libre computing.
If you're a woman in need of help or able to offer some help to others, check out LinuxChix!
A Loosely Coupled Cloud
"Build loosely coupled systems." That was one recurring nugget of advice given last night by Jorge Noa, CTO of HyperStratus, when he spoke at a meet-up titled "Amazon EC2 Cloud Computing and Application Design" held at HackerDojo (see the slides here [pdf]; I also found the same slide show online here as an O'Reilly Media SlideShare).
After reviewing and comparing various IaaS, PaaS and SaaS services, the talk focused on the details of Amazon's overall cloud offering. Finally, he closed the presentation with a discussion of software development best practices, the primary reason I attended. More time spent on software development would have been a big plus in my view, but I can understand that he felt the need to get everyone in the room up to speed on Amazon's platform. It was a big crowd.
Cloud Computing Development Best Practices
The ten best practices Jorge espoused were:
- Build cloud apps, not apps in the cloud
- Virtualize the application stack
- Design for failure and nothing fails
- Design for scalability
- Loose coupling lets you maximize plug and play
- Design for dynamism
- Build Security into every component
- Leverage native cloud storage options
- Leverage best cloud Management Tools
- Don't fear cloud constraints
Of those ten, the two points that gave me the most pause for contemplation were to "build loosely coupled systems" and to "build security into every component."
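One common way to get that loose coupling is to have components talk through a message queue instead of calling each other directly. A minimal in-process sketch, assuming a java.util.concurrent BlockingQueue as a stand-in for a hosted queue service such as Amazon SQS; the producer and consumer share only the queue, so either side could be swapped out or scaled without the other knowing. The class name and the sentinel value are my own illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Producer and consumer share only a queue of messages; neither holds a reference
// to the other, so either side can be replaced independently.
class LooselyCoupled {
    static final String DONE = "__done__"; // sentinel telling the consumer to stop

    static List<String> run(List<String> jobs) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = new ArrayList<>();

        // Consumer: knows only the message format, not who produced the messages
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String msg = queue.take();
                    if (DONE.equals(msg)) {
                        break;
                    }
                    processed.add(msg.toUpperCase(Locale.ROOT));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producer: just drops work on the queue and moves on
        for (String job : jobs) {
            queue.put(job);
        }
        queue.put(DONE);
        consumer.join(); // join makes the consumer's writes to processed visible here
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("resize image", "send email")));
        // prints [RESIZE IMAGE, SEND EMAIL]
    }
}
```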
Build Loosely Coupled Systems
"Build loosely coupled systems" brought a flash from the past, triggering a memory of a distributed operating systems class I had in the 1990s. The concept of loosely coupled systems was new for me back then and made a big impression, so I dug out my old textbook (yes, I kept them all!) to refresh my memory. The textbook was "Modern Operating Systems" by Andrew S. Tanenbaum.

