Wednesday, October 14, 2015

Cassandra anyone?

It has been some time since my last post. Blogging has not been an option for the past few weeks with other commitments going on. But you cannot stop doing what you love the most, so here I am, back again and fervently tapping away on the keyboard. I was analyzing the use of Cassandra for one of the projects I am working on. This post will focus on getting started with Cassandra, mainly on Windows, because fortunately or unfortunately that is the operating system I am stuck with at the moment (no pun intended).

The latest Cassandra distribution can be downloaded from here.

I wanted to be able to run Cassandra as a Windows service. Here are the steps to get that done:

  • Download Apache commons daemon from here
  • Extract the Cassandra distribution to a location of your choice
  • Go into the “bin” directory of the Cassandra distribution you just extracted
  • Create a folder called “daemon” inside the “bin” directory
  • Extract the Apache commons daemon distribution to a location of your choice
  • Copy the “prunsrv.exe” related to your architecture (32bit/64bit) to the “daemon” directory created inside the “bin” directory in your Cassandra distribution extract
  • Go into your “bin” directory via the command line and execute the following command:

 cassandra.bat install

That’s it. Now you have Cassandra installed as a Windows service, which you can set to start automatically if that is what you prefer.

By default, authentication and authorization are disabled on Cassandra. Enabling them is a breeze with the Cassandra YAML configuration file. Go to the “conf” directory of your Cassandra distribution, where you will find a file named “cassandra.yaml”. Open it with your favourite text editor (Notepad++ is awesome. Just saying). Make the following changes to the existing configuration:

  • authenticator: org.apache.cassandra.auth.PasswordAuthenticator
  • authorizer: org.apache.cassandra.auth.CassandraAuthorizer

Now that you have enabled authentication and authorization, restart Cassandra so that the changes take effect, and then you can log in. The super user in Cassandra has the following default credentials, which you must use to log in:

  • User name : cassandra
  • Password : cassandra

To log in, let us go back to the “bin” directory of the Cassandra distribution, where you will find a batch file called “cqlsh”. Bring up your command line again and execute the following command:

cqlsh -u cassandra -p cassandra

Now that you are in, you can change your super user password if required. One of the advantages of Cassandra for me is how similar its syntax is to that of relational databases. Cassandra has its own query language called CQL, and because its commands closely resemble traditional SQL, you can get up and running in a very short amount of time.

Next up, I was looking for a convenient GUI to interact with Cassandra. Browsing around, the most user-friendly interface I found was DBeaver, which you can download from here.

In the next article, I will focus on Cassandra's functionality and on how your thinking needs to change when working with it, as designing for Cassandra is not the same as designing a traditional database schema.

Sunday, February 22, 2015

Performance monitoring and profiling – Part 1

Performance monitoring and profiling are two different things. The former is more of a proactive measure, whereas the latter is a reactive approach. In my experience, performance is an afterthought in most cases rather than being built into the software development life cycle. I see the same thing with security as well, but let us not go there today as that deserves a post of its own.

Performance monitoring

In order to monitor your application, you first need to understand what the end users of your application expect in terms of performance. This would include throughput, response time, uptime and so on. After you collate this information, the next step is to create scripts for load testing your application. Coming primarily from a Java background, I normally use JMeter, which comes with a user-friendly UI for creating load tests to suit your use case, be it a database, a Web Service, JMS etc.
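To make the two headline numbers concrete, here is a minimal sketch of how throughput and mean response time could be derived from the raw timings a load test gives you. The class and method names are my own for illustration and are not part of JMeter, which reports these figures for you.

```java
final class LoadStats {

    /** Requests completed per second over the whole test window. */
    static double throughputPerSecond(int completedRequests, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        return completedRequests * 1000.0 / windowMillis;
    }

    /** Arithmetic mean of the individual response times, in milliseconds. */
    static double meanResponseMillis(long[] responseMillis) {
        if (responseMillis.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        long sum = 0;
        for (long sample : responseMillis) {
            sum += sample;
        }
        return (double) sum / responseMillis.length;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: 500 requests completed in a 10 second window.
        System.out.println("Throughput (req/s): " + throughputPerSecond(500, 10_000));
        System.out.println("Mean response (ms): " + meanResponseMillis(new long[] {100, 200, 300}));
    }
}
```

Comparing these figures against what your users expect tells you whether the load test is passing or whether you need to dig deeper.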

So you have your load testing scripts and they are running fine. What next?

Well, now that you have the load scripts in place, the next step is to note down which parameters you should monitor. Generally you would start out by monitoring CPU and memory usage while your load test is in progress.

When you monitor CPU, you are looking at the CPU usage while the application is being load tested. If CPU usage is high, you need to drill down further to find out what is causing it. If CPU usage is low, you can try increasing the load to find the level at which the CPU is used efficiently.

If you are on a Windows machine, your first stop will be the Task Manager, where the Performance tab shows what your CPU is up to. A screenshot of mine is as follows:

I’m running on a Core i5 with 8GB of RAM. In the upper right-hand corner you can see the CPU usage history. The four boxes here represent the four logical CPUs. You will get the same value if you run Runtime.getRuntime().availableProcessors() within a Java program.
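As a quick check, the call mentioned above can be wrapped in a tiny program. This is only a sketch; the class name is my own, and the number printed depends on your machine.

```java
final class CpuCount {

    /** Number of logical processors available to the JVM. */
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Logical processors visible to the JVM: " + logicalProcessors());
    }
}
```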

By default it only shows the total CPU time, but you can easily add the kernel CPU time by clicking on View -> Show Kernel Times. You will then be presented with the following graphs, which now include a red line:

The space between the red line and the green line is the user CPU time. So what is the difference between the user and the kernel CPU time, you might ask?

User CPU Time : The amount of time the CPU spends on running your application code.

Kernel CPU Time : The amount of time the CPU spends on operating system related activities. For example, if your application does a lot of reads/writes to the disk, you will see high kernel CPU time, because every one of those operations has to go through the operating system.
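The same split can be observed from inside a Java program. The sketch below uses the standard ThreadMXBean to read the total and user CPU time of the current thread and treats the difference as kernel time. The class and method names are my own, and the timer granularity varies by platform, so treat the figures as approximate.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class CpuTimeSplit {

    /**
     * Returns {userNanos, kernelNanos} consumed by the current thread so far,
     * or null if the JVM cannot measure thread CPU time.
     */
    static long[] currentThreadCpuSplit() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            return null;
        }
        long total = threads.getCurrentThreadCpuTime();
        long user = threads.getCurrentThreadUserTime();
        if (total < 0 || user < 0) {
            return null; // -1 is returned when CPU time measurement is disabled
        }
        // Kernel time is approximated as total minus user, clamped at zero.
        return new long[] {user, Math.max(0, total - user)};
    }

    public static void main(String[] args) {
        // Burn some user-mode CPU so there is something to measure.
        long checksum = 0;
        for (int i = 0; i < 50_000_000; i++) {
            checksum += i % 7;
        }
        long[] split = currentThreadCpuSplit();
        if (split == null) {
            System.out.println("Thread CPU time is not available on this JVM");
            return;
        }
        System.out.println("checksum = " + checksum);
        System.out.printf("user: %d ms, kernel: %d ms%n",
                split[0] / 1_000_000, split[1] / 1_000_000);
    }
}
```

A pure computation like this loop should show up almost entirely as user time, whereas code doing heavy file I/O would push the kernel figure up.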

But what if you need further information on what your CPU is doing at any given moment? PerfMon to the rescue. Open the Run dialog, type in perfmon, and you will be presented with the following screen:

You can add any counter you are interested in monitoring by right-clicking on the graph area and selecting the Add Counters menu option. If you want to monitor the user CPU and kernel CPU, select the Processor performance object, under which you will find the % User Time and % Privileged Time (kernel CPU time) counters.

Another key indicator to watch out for when monitoring CPU is the CPU scheduler run queue. This is where all the lightweight processes that are ready to run are queued up waiting for the CPU. As a rule of thumb, if your run queue depth is more than four times the number of virtual processors available on your machine, you need to investigate your application further to see what can be done.

In the era of high memory machines, people often overlook the importance of writing efficient algorithms and data structures, as there is plenty of memory available on high end servers. The issue is that CPU is often the limited resource. If your algorithms and data structures do not scale well with added load, you will end up overloading your CPU, and the only alternative left is to scale your application out across different servers. So if you are having issues with the scheduler run queue depth, it is always best to first explore whether certain sections of your application code can be written more efficiently so as to utilize your CPU better. The run queue depth can also be monitored using PerfMon by selecting Add Counters -> System -> Processor Queue Length.
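The rule of thumb above is easy to encode. The following is a minimal sketch, assuming you read the queue depth yourself (for example from the Processor Queue Length counter) and pass it in; the class name, method name and constant are my own.

```java
final class RunQueueCheck {

    /** Rule-of-thumb multiplier: queue depth beyond 4x the processor count is a warning sign. */
    static final int QUEUE_DEPTH_FACTOR = 4;

    /** Returns true when the observed run queue depth exceeds four times the virtual processors. */
    static boolean needsInvestigation(int runQueueDepth, int virtualProcessors) {
        if (runQueueDepth < 0 || virtualProcessors < 1) {
            throw new IllegalArgumentException("depth must be >= 0 and processors >= 1");
        }
        return runQueueDepth > (long) QUEUE_DEPTH_FACTOR * virtualProcessors;
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        int observedDepth = 20; // hypothetical reading taken from PerfMon
        System.out.println("Investigate further: " + needsInvestigation(observedDepth, processors));
    }
}
```

On a machine with four virtual processors, a depth of 16 is still within the threshold while 17 is not.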

Now that we have covered some of the important factors to look into in terms of CPU when monitoring your application, in the next post we will look at what aspects we need to consider when monitoring the memory usage.

Please do leave your comments and suggestions, which are much appreciated as always. I love learning from the experience of others, which I consider invaluable.