Scaling Overnight

We, the nokta technology team, are responsible for the infrastructure of nokta media projects, the most prominent being izlesene.com, the largest Turkish video-sharing website. We believe we are in a unique position: stationed in a developing country with a reputation for internet censorship, trying to make a living through an internet business. And, dare i say, one of the hardest kinds: video as the product and a revenue stream based on advertising.

If this post somehow gets attention from the tech folks living in the bay area, where funding can be found lying on the street and every business owns a datacenter with 10k machines, let me start with a few friendly reminders: in this part of the world, no datacenter is that big, any seven-figure funding round makes headlines, and advertising budgets are abysmal. While it may be the norm for businesses out there to outsource some operations to CDNs and cloud providers to cut costs and focus on their business, it’s generally the other way around here. This means the list of things our team is responsible for includes networking and cabling, physical servers and their components, virtual machines, storage systems, databases, data warehouses, video transcoding, video streaming, static-content delivery, load balancers, firewalls, real-time analytics and so on. We spend our average day monitoring, troubleshooting, improving and designing the new versions of these things. March 27th was not an average day.

On March 27th, around 1500 UTC, YouTube was banned by the Turkish government. ISPs applied the ban by DNS spoofing, so users trying to watch a cat diving into a box or listen to Turkish pop music were reading a legal notice instead. While there is a lot to say about censorship, this post is about something else. Such a ban does not stop the average user from reaching her goal. She simply goes back to the search results and clicks on the second link. That would be us.

We are used to a similar situation whenever YouTube has some kind of outage: the same user behavior applies and our traffic doubles, but those events don’t last beyond an hour. This was a total ban and it was here to stay. We had no idea how high the traffic would climb, or when the ban would be lifted for that matter. Now, after two months, the ban is still in effect and we know where we are: we are riding traffic six times bigger than what we had just a few months ago.

It wouldn’t be a story if we were serving 10 requests a day before and 60 requests now. So here are some numbers comparing the first weeks of March and May:

[image: table comparing traffic numbers for the first weeks of March and May]

And this is a direct screenshot of one of our zabbix graphs, covering Feb 16th to May 18th.

[image: zabbix graph covering Feb 16th to May 18th]

We know none of these numbers are enter-your-popular-web-giant-here grade, but they are not enter-your-average-website-here grade either. The interesting part for us is the change in the numbers. We think this kind of leap is not possible in a healthy internet ecosystem; it’s the result of being in the unique position mentioned earlier. The same position that leaves a team of five with just over a hundred machines to keep things running. And it’s sad.

It wouldn’t be honest to say that we were totally unprepared though. A good part of the last year was spent deploying and running a private openstack cloud, a storage cluster running ceph, a data processing cluster running hadoop and storm, several haproxy, nginx and varnish installations, zabbix to monitor it all and puppet to keep us sane, all on ubuntu linux. When the day came, our endeavors paid off and all of the systems kept their promises. Such a solution would simply be impossible with any proprietary system, where a standard support request doesn’t go through in less than a month and the advanced support we require is non-existent, not here anyway.

While each of these deserves a post of its own, here is a short overview of what we did after March 27th and how these systems helped us.

  • Just when we knew we needed more machines, our previously purchased hardware was delivered the day after March 27th. A usual hardware purchase-to-delivery cycle takes about a month for us, so this was lucky.

  • We deployed new machines for video streaming. With cobbler and puppet in place, it takes about an hour to get a machine from bare metal to ready state.

  • We launched additional web servers. Openstack creates an instance in seconds, and the required environment is set up with puppet in a few minutes.

  • We exhausted available compute resources for openstack and started adding new compute nodes. No problem.

  • Adding new storage nodes to ceph is business as usual for us. It didn’t require further attention.

  • Our search index, solr, was a single instance, and under roughly 12x the load it failed. We had to rebuild it as a solr cloud, which didn’t take more than a few hours. Then we kept adding nodes.

  • Our log collector, flume, needed a few configuration changes in order to pump more data into hadoop and kafka. Storm topologies have been running flawlessly after some minor tuning.

  • Some user-facing servers required kernel tuning with sysctl; a rough idea of the kind of settings involved follows this list.
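
We won’t go into the exact values here, but for busy user-facing boxes the usual sysctl suspects look something like this (the values below are illustrative, not our production settings):

# Illustrative values only, not the actual production settings.
# Larger accept queue and packet backlog for connection bursts:
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 4096
# More ephemeral ports and faster FIN-WAIT recycling:
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_fin_timeout = 15
# Plenty of file descriptors for all those sockets:
fs.file-max = 1000000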

We’d like to close by thanking open source communities of all kinds, for this story would not have been possible without them. We’ll be posting more technical details soon.

Note: Since this post ended up on my personal blog, i have to write about the team:

  • Hakan Kocakulak (@hakankocakulak): Team Leader
  • Caglar Bilir (@caglarbilir): Systems
  • Ahmet Kandemir (@ahbikan): Systems
  • Selcuk Tunc (@ttselcuk): Software
  • Erdem Agaoglu (@agaoglu): Software

Note: At the time of writing, there are rumors about the YouTube ban being lifted. Our metrics still say otherwise.

Nginx and weird 416 responses

We have been experiencing lots of 416 responses on our nginx-based video servers recently. Well… not that recently in fact. We have been seeing them since we saw our first access log, but it looked like one client getting tens of them every second for a few seconds, so we thought it might be a problematic flash player or something like that. Deeper inspection proved otherwise. But first, WTF is 416?

416 is a client-error HTTP response code defined as “Requested Range Not Satisfiable”. It means the client requested some (byte) range of a resource, but that range does not apply to the resource. A simple example: the file is 1MB, but the client requested the portion between its 2nd and 3rd megabytes. The server cannot reach the 2nd MB of a 1MB file, so it responds with 416. All the details are in rfc2616 section 14.35.

In our case, it didn’t actually make much sense, since most of our video is played on flash-based players and, as far as we know, the flash player is unable to dispatch an HTTP request with a Range header (i don’t know much about flash so i am not actually sure about this). But it happens, generates a lot of wasted traffic, and probably some unhappy clients.

So, after some clever but dirty tcpdumping, we saw those requests were not like the simple example i just mentioned but actually carried invalid Range headers. Out of that dirty tcpdump output we saw things like:

Range: bytes=7259-7258
Range: bytes=10513-10512
Range: bytes=0--1

There is a pattern here: some client, some browser, some code with an off-by-one error causes the last-byte-position to be one less than the first-byte-position, which makes the Range header syntactically invalid as the rfc puts it. That is the client’s problem. But the next sentence in the rfc says the recipient (nginx) MUST ignore such headers, and AFAICT if you ignore a header you should process the request as if it was never there, responding 200 with the whole content. It seems nginx has a bug there.

Our first course of action in this kind of situation is to google the hell out of the problem, since we generally believe we are not alone. But this time we found nothing. Next, we tried apache and lighttpd with the same request and they responded 200 as the rfc suggests. So we replaced our nginxes with apache… just kidding.
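
For anyone who wants to repeat the comparison, a quick check along these lines is enough; the URL is obviously a placeholder:

import java.net.HttpURLConnection;
import java.net.URL;

public class InvalidRangeCheck {
  public static void main(String[] args) throws Exception {
    // Point this at any static file on the server you want to test.
    URL url = new URL("http://server.under.test/some/video.flv");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // A syntactically invalid range: last-byte-pos is smaller than first-byte-pos.
    conn.setRequestProperty("Range", "bytes=0--1");
    // Per rfc2616 the header should be ignored and the answer should be 200;
    // our nginx instances answer 416 instead.
    System.out.println(conn.getResponseCode() + " " + conn.getResponseMessage());
    conn.disconnect();
  }
}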

We thought it would be possible to work around this with some lua in our config, so here is what we came up with:

header_filter_by_lua '
	if ngx.var.http_range then
		local brange = string.sub(ngx.var.http_range, 7)
		local start = tonumber(brange:sub(1, brange:find("-")-1))
		local stop = tonumber(brange:sub(brange:find("-")+1))
		if stop and start and stop < start then
			ngx.req.set_header("Range", nil)
		end
	end
';

Without any prior knowledge of lua, and having to search for every basic operation, this is as good as it gets. We strip the bytes= part first, then cast the parts before and after the first dash into numbers. If those numbers describe a syntactically invalid range (the end is smaller than the start), we remove the header. Sometimes all that string manipulation and casting will fail and the start or stop variables will be nil; if that happens, we do not touch anything and let the nginx core do its work.

At the time of writing, we haven’t moved this piece into production yet, but some local tests showed it’s OK. Just paste it into the server section of your nginx conf and you should be good to go.

Connecting to Mongo with PHP

I don’t get along with PHP, and i don’t get along with Mongo either. It is also not my habit to rewrite something that has already been written hundreds of times all over the internets. But it seems there is a gap on a few very basic points here (probably only in terms of Turkish resources). I’ll keep this very short.

If your reasoning is “it’s said to be faster than MySQL, we need to switch to it instead of our master-slave MySQL”, i assume you have already done the MongoDB replica-set setup, which is “something like master-slave”. Now it’s time to replace your read-mysql and write-mysql connections:

Stop. You will not make separate connections for reads and writes. From now on, you need to do both your reads and your writes over a single database connection. Thanks to a simple setting, this connection will send your reads to the slaves (the secondaries) and your writes to the master (the primary). All you need to do is add your replica-set name (you defined it during setup) and a setting called readPreference to your connection string. That is, instead of

$writelar = new MongoClient("mongodb://192.168.1.11");
$readler = new MongoClient("mongodb://192.168.1.12");

you now write

$mongo = new MongoClient("mongodb://192.168.1.11,192.168.1.12?readPreference=secondaryPreferred", array('replicaSet' => 'rs'));

And with that, you are rid of the “Cannot run command XXX: not master” error.

Now we come to a silly-looking error: “MongoClient::__construct(): php_network_getaddresses: getaddrinfo failed: Name or service not known”. Don’t say “but i put IPs in the connection string, what does name resolution have to do with anything”; unfortunately, you are no longer using a makeshift master-slave setup. A MongoDB replica set, like all distributed systems, wants a properly working DNS infrastructure. But as we all know, properly working DNS is hard to find these days. If you are one of the “did we even have DNS in the old days”, “what is the hosts file for, then”, “names slow the system down, use IPs” or “fine, if it’s really necessary we’ll put it in hosts, that won’t slow things down that much” crowd, you already know what to do: define the names you used during setup in the hosts files of all your machines. If your machines are named something like mongomaster and mongoslave:

192.168.1.11  mongomaster
192.168.1.12  mongoslave

There, all better. You can now safely go around ranting “this mongo thing is terrible, you can’t do anything with it man, memcache was better”.

GPUs, Hadoop and Testing Scalability

As i have told numerous times before, i am currently trying to get a GPU-powered image processing application to run on Hadoop. In the development phase we were using a cluster of 12 machines with one Nvidia GTX 480 each, but since we are launching in a few months, we had to do some tests on our production cluster of 25 machines with two Nvidia Tesla M2050s each. In this post i’ll try to sum up the testing process; technical details will come later.

First, some reminders about our architecture. The image processing application (IPA) receives an array of images and returns an array of double results. A reduceless (map-only) MapReduce application divides the images in HBase into chunks and passes those chunks to the IPA. Simply put, while it’s improbable for a single IPA to process thousands of images at once, the whole system is able to process millions of images in parallel.
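
Our actual job is more involved, but a rough, hypothetical sketch of such a map-only mapper (the class name, the column family/qualifier and the IpaClient interface are all made up for illustration; the real job also needs the usual TableMapReduceUtil setup and zero reduce tasks) could look like this:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;

public class ImageChunkMapper extends TableMapper<Text, DoubleWritable> {

  /** Placeholder for the GPU-backed image processing application. */
  public interface IpaClient {
    double[] process(List<byte[]> images);
  }

  private static final int CHUNK_SIZE = 256;               // assumed chunk size
  private final List<byte[]> chunk = new ArrayList<byte[]>();
  private final List<Text> rowKeys = new ArrayList<Text>();
  private IpaClient ipa;

  @Override
  protected void setup(Context context) {
    // The real job would connect to the GPU-backed service here;
    // a stub stands in so the sketch is complete.
    ipa = new IpaClient() {
      public double[] process(List<byte[]> images) { return new double[images.size()]; }
    };
  }

  @Override
  protected void map(ImmutableBytesWritable key, Result row, Context context)
      throws IOException, InterruptedException {
    // Buffer the image cell of each row until we have a full chunk.
    chunk.add(row.getValue(Bytes.toBytes("data"), Bytes.toBytes("image")));
    rowKeys.add(new Text(key.get()));
    if (chunk.size() >= CHUNK_SIZE) {
      flush(context);
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    flush(context); // process the last, possibly smaller chunk
  }

  private void flush(Context context) throws IOException, InterruptedException {
    if (chunk.isEmpty()) return;
    double[] results = ipa.process(chunk);                 // one GPU call per chunk
    for (int i = 0; i < results.length; i++) {
      context.write(rowKeys.get(i), new DoubleWritable(results[i]));
    }
    chunk.clear();
    rowKeys.clear();
  }
}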

What matters (on our end) is the number of images the IPA received and how much time it took to return a result set. Using those, we calculate a basic metric: speed, as the number of images processed per second (ipps). We also calculate the same speed for the whole cluster, to see whether we reach something like n·x ipps when a single IPA runs at x ipps and the cluster runs n IPAs in parallel (spoiler… we can!).

To show this in numbers, we measured the base IPA speed on the GTX 480. While the CPU on the system also affects it a bit, it runs at 19.46 ipps on average. On the other hand, our cluster with 12 GTX 480s runs at a total speed of 231 ipps, which is extremely close to 12 x 19.46 = 233.52 ipps! Looking at these numbers we assumed our system scales linearly, so if we increase the number of GPUs to, say, 24 we should get 231 x 2 = 462 ipps.

With this assumption in mind, we measured the base IPA speed on the Tesla M2050, which is 14.80 ipps (yes, the Tesla M2050 is about 24% slower than the GTX 480), and expected a speed around 14.80 x 50 = 740 ipps on our production cluster with 50 Teslas. Our first result, 518 ipps, was nowhere near that. We started investigating…

After some lousy ideas putting the blame on the IPA folks and node configurations, we took a step back and started questioning our way of testing. We knew there were IO and Hadoop task-management overheads, but they were negligible… for jobs containing large numbers of images. We missed that the definition of large differs between a cluster with 12 GPUs and one with 50 (!). We were testing both with 100,000 images, which could well be a small number for the latter. We slowly increased the number of images to one million and…

we got close enough to the expected speed of 740 ipps, measuring 709 ipps. MapReduce jobs in our system will process millions of images in production, which means the cluster will be fully utilized. If there were only a hundred thousand images, a large portion of the investment would have been wasted.

Lesson learned in scalability: You have to cut your coat according to your cloth. Or you shouldn’t buy more cloth than what would be necessary to cut your coat. Or … Whatevs, you got the point.

Lesson learned in testing: always test your systems, then test the hell out of them, and when the results don’t satisfy you, change your tests and test again. It might cost some time but it will save money.

Java web services without (explicit) code generation - with exception handling

Finally… it’s been a few busy weeks: i sat in front of the computer constantly in the first one and constantly moved around in the others. Finally i found some space to finish what i started. But there isn’t much space, so i’ll keep this short.

Previously i talked about some funny web services stuff and finished with a problem concerning exception handling. The short form: SOAP cannot transparently handle Java exceptions, so you cannot throw something in the server and expect the client to catch the same thing. You need some transformation.

The longer form: SOAP has a thing called a soapfault, which is the closest thing you have to a java exception, but in order to use it you have to accept some rules. First of all, your exception should be a checked one. Second, a soapfault is basically XML, so it can only carry things that can be parsed/rendered into XML. This means you have to wrap your exception information in a JavaBean so it can be easily transformed into XML and back. Looks good, with one little problem: what if you can’t get your exception information into a bean? Maybe you keep your errors in an enum or, even worse, you use an interface to describe your error codes. Well, JAX-WS has nothing to offer.
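
To make those rules concrete, the conventional pattern is a checked exception wrapping a fault bean, roughly like this (the names are made up for illustration; this is not the JaxWsExceptionCatcher snippet discussed below):

import javax.xml.ws.WebFault;

// The fault bean: a plain JavaBean that JAX-WS can render into the soapfault detail.
class MyFaultInfo {
  private String code;
  private String message;

  public MyFaultInfo() { }                      // no-arg constructor for (un)marshalling
  public String getCode() { return code; }
  public void setCode(String code) { this.code = code; }
  public String getMessage() { return message; }
  public void setMessage(String message) { this.message = message; }
}

// The checked exception wrapping that bean, the way JAX-WS expects it.
@WebFault(name = "MyFault")
class MyException extends Exception {
  private final MyFaultInfo faultInfo;

  public MyException(String message, MyFaultInfo faultInfo) {
    super(message);
    this.faultInfo = faultInfo;
  }

  public MyFaultInfo getFaultInfo() {           // JAX-WS looks for this accessor
    return faultInfo;
  }
}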

Another thing JAX-WS does not offer is decent developer documentation; you have to waste some hours debugging to see which method gives you what. Except for typesafe exception handling, that is… because here goes:

MyException is the custom checked exception. For simplicity it carries two String parameters, but you can create your interface-implementing custom object, enum or whatever you need; just change the object instantiation in line 78 to suit you. Usage of this snippet is simply:

MyService port = getPort(new QName(MY_NS, "MYServicePort"), MyService.class);
port = JaxWsExceptionCatcher.catchOn(port);

And you can catch and process your exceptions just as if the service were in your classpath, still without code generation.

Java web services without (explicit) code generation

I don’t know about you, but i hate code generation. Bytecode generation may sometimes be useful, but it kills debugging capabilities so it should be avoided most of the time. Source-code generation, on the other hand, i simply fail to understand the necessity of. If some 3rd party library is going to write the code i will run, why can’t i simply let the library do whatever it needs behind some sort of API?

Anyways, we all know the story. If you are making use of an external SOAP web service, you are kinda forced to generate (source) code. But most of us extend this approach and generate code for SOAP web services between modules of the same project, which is extremely unnecessary as of JAX-WS 2.0 (i guess, not sure about the version). Instead, we can give the plain-old-java-interface of our service and the WSDL url to JAX-WS and make it work for us.

import java.net.URL;

import javax.xml.namespace.QName;
import javax.xml.ws.Service;

class MyService extends Service {
  public MyService() throws Exception {
    super(new URL("http://path/to/service?wsdl"),
        new QName("http://service.my.org/", "MyService"));
  }

  public My getMyPort() {
    return getPort(new QName("http://service.my.org/",
        "MyPort"), My.class);
  }
}

The code above shows what’s necessary on the client side. The Service we extend is a class in the JAX-WS framework, and My is the interface of the service we are trying to use. This is the simplest example, the one you will find when you google “JAX-WS without code generation”. But as always, no one is trying to make a living with hello-world applications.

Every module uses custom beans (complex types) in its communication, so a single interface will not be enough (it will be if there are no complex types). JAX-WS will auto-generate the transport classes but will not touch business-specific beans. So what i came up with is to make the service-providing module publish a jar with the necessary beans and the web service interface. The service-consuming module declares a dependency on that artifact and goes on with its life. The jar actually contains half of the stuff JAX-WS would generate, but now it’s not ugly as in generated by some magic library; it’s ugly because some module developer wrote it ugly, so you can push him/her around. Another upside is that now that you have written the instantiator code (above), you can write it any way you like and, say, dependency-inject it using guice.
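
Such a shared jar contains little more than the annotated interface and the beans, something along these lines (the Person bean and the method are made up for illustration):

import javax.jws.WebMethod;
import javax.jws.WebService;

// A business-specific bean (complex type) shared by provider and consumer.
class Person {
  private String name;
  private int age;

  public Person() { }                      // no-arg constructor for JAXB
  public String getName() { return name; }
  public void setName(String name) { this.name = name; }
  public int getAge() { return age; }
  public void setAge(int age) { this.age = age; }
}

// The plain-old-java-interface of the service, annotated so JAX-WS can map it.
@WebService(name = "My", targetNamespace = "http://service.my.org/")
interface My {
  @WebMethod
  Person findPerson(String name);
}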

The story does not end here though. Now that you have (almost) isolated yourself from SOAP mechanics (using guice and all), you may want your service provider’s exceptions untouched. Hold tight for the second post.

Using ivy and maven together

It’s not logical, highly unnecessary and probably expensive. But we found ourselves in that environment anyhow. The problem stemmed from the eclipse/RCP dependency system being incompatible with virtually everything out there. We were using ant/ivy and were pretty happy with it, but our UI side found no easy way of headless-building their application with it. Eclipse is trying to make use of maven 3 with a thing called tycho, but that’s another story. The point is, they were practically forced onto maven, and so was i (were we).

The problem is that the eclipse project (M1), which is built using maven, depends on a project (I1) which is built using ivy. Since these projects are constantly evolving, the dependency is on a SNAPSHOT version. Add the further oversight of choosing nexus as the artifact repository manager, and we ended up unable to publish SNAPSHOTs with ivy and depend on them with maven.

We set M1’s updatePolicy to always and expected it to re-download the snapshot artifact of I1 on every change, but it turns out there is more than one way to track changes. Ivy relies on artifact timestamps, while maven uses external metadata to decide whether a SNAPSHOT artifact has changed. But ivy knows nothing about an external metadata file during publish/deploy, so there is nothing for maven to use. (1)

Nexus can actually repair missing metadata files, but i think (not sure) it requires the artifacts to be deployed with uniqueVersions (those funny timestamp-like strings replacing SNAPSHOT). Of course, ivy has no idea about those either. (2)

OK, then we can disable uniqueVersions and get “SNAPSHOT” without the funny timestamps. But no, because maven 3 got rid of that functionality and uniqueVersion is always on. (3)

Adding up (1), (2) and (3), we had a huge incompatibility problem on our hands. After some research came back negative (maven blaming nexus, nexus blaming ivy, ivy asking why maven?…), we fell back to disabling our ivy-publish steps and using mvn deploy:deploy-file instead. We reconfigured our jenkins accordingly and finally evaded the problems.
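
For reference, such a deploy-file invocation looks roughly like this; the coordinates, repository id and URL below are placeholders, not our actual setup:

mvn deploy:deploy-file \
  -Dfile=build/I1-1.0-SNAPSHOT.jar \
  -DgroupId=com.example \
  -DartifactId=I1 \
  -Dversion=1.0-SNAPSHOT \
  -Dpackaging=jar \
  -DrepositoryId=nexus-snapshots \
  -Durl=http://nexus.example.com/content/repositories/snapshots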

Bottom line: don’t use ivy and maven together; it’s not logical, highly unnecessary and probably expensive.

Apache ODE and CLOB issue

I took over some responsibilities from a recently departed colleague, and with them i was kinda forced to turn back to the JEE world. Not exactly the same technologies and frameworks i am used to, but once you hate some part of something, it is likely that you won’t enjoy the other parts either.

Anyways, the first assignment was to move some WS-BPEL processes from glassfish to Apache ODE. It sounds like it should be easy, since WS-BPEL is a standardized and well-acknowledged specification, but only an inexperienced and/or naive developer believes that. Standards are never that standard. Only the simplest hello-world can be deployed to more than one (two at most) containers without a problem, your JPA application will never port from hibernate to toplink, and your standards-compliant webpage will never look the same in IE. Not without some unknown number of hours/days of hard work, that is.

But this time i got lucky. The hard part was already done and documented (1|2|3) by Hilal Tarakci (still not tweeting!), with whom i’ve been working more closely now. The last problem was the easiest one, but it helped me steal all the credit. ODE, by default, works on a derby database which doesn’t like CLOBs larger than a certain size and barfs like this when it encounters one:

java.sql.SQLException: An unexpected exception was thrown
	...
Caused by: java.sql.SQLException: An unexpected exception was thrown
	...
Caused by: java.sql.SQLException: Java exception: 'A truncation error 
  was encountered trying to shrink CLOB '' to length 1048576.:
  org.apache.derby.iapi.services.io.DerbyIOException'.
	...
Caused by: org.apache.derby.iapi.services.io.DerbyIOException:
  A truncation error was encountered trying to shrink CLOB ''
  to length 1048576.

I guess this was somewhat expected, because there is a small tutorial in the ode installation docs showing how to configure it to work on a mysql db. The distribution package also contains DDLs for Oracle, but if you’re already running a postgresql server and don’t want another link in the chain, you’re (not) alone. Without further ado, here are the things you should do:

  1. Create the database you wish to use on the server (you wish to use).
  2. Get this SQL piece and execute it on that database.
  3. Take this context snippet and place it into the <Host> part of your $TOMCAT_HOME/conf/server.xml after modifying it as necessary (a rough example of what such a snippet looks like follows this list).
  4. Get a JDBC driver jar from postgresql and place it into $TOMCAT_HOME/lib.
  5. Get this properties file and place it into $TOMCAT_HOME/webapps/ode/WEB-INF/conf.
  6. Start tomcat.
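
For reference, such a context definition looks roughly like the following; the database name, credentials and JNDI resource name here are placeholders rather than the exact values from the linked snippet:

<Context path="/ode" docBase="ode">
  <Resource name="jdbc/ode" auth="Container" type="javax.sql.DataSource"
            driverClassName="org.postgresql.Driver"
            url="jdbc:postgresql://localhost:5432/ode"
            username="ode" password="odepassword"
            maxActive="20" maxIdle="5"/>
</Context>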

And yes, i mostly got this from the original tutorial. The only thing i did was to edit the SQL into a form postgresql would understand. For those of you running something bigger than tomcat, it should be even easier to define a JDBC connection over JNDI.

Hadoop MapReduce job statistics (a fraction of them)

Well, this has been on my backlog for a while. The problem is extremely simple actually: when did a MapReduce job start processing? I need this info to report to the clients of my API, meaning redirecting them to the JobTracker's web interface is not an option.

Everyone who has used hadoop for some time knows 0.20 is the version to use, and everyone who has developed something other than a WordCount knows it’s a PITA. The API is hard to use at best, misleading and incomplete most of the time. You might wonder how hard it can get to extract a basic (and easily accessible over the web interface) piece of information such as the start time of a job; all i can say is: very.

Without further ado, while i would expect something like Job.instance("JOB_ID").getStartTime(), here is the piece of crappy code i found to be working:

long startTime(String jobIDString) throws Exception {
  Configuration conf = new Configuration();
  JobClient jobClient = new JobClient(new JobConf(conf)); // deprecation WARN
  JobID jobID = JobID.forName(jobIDString);               // deprecation WARN
  RunningJob runningJob = jobClient.getJob(jobID);
  Field field = runningJob.getClass()
      .getDeclaredField("status"); // reflection !!!
  field.setAccessible(true);
  JobStatus jobStatus = (JobStatus) field.get(runningJob);
  return jobStatus.getStartTime(); // finally
}

As noted above, JobConf and JobID are deprecated, but since there is no way of working with anything non-deprecated, we reluctantly accept that. What we may not accept is working with reflection, but well… I couldn’t find any other way (please point me to one if you know it). It is actually funny that the information is right there in the status field of runningJob but inaccessible, simply because there is no getStartTime() method reading from it. (BTW v0.21 is closer to what i expect, but it is largely unusable for various reasons.)

On the other hand, that wasn’t exactly my requirement. There may be a delay between the time i submit a job and the time it starts processing, mostly because the cluster is busy. What i needed was when the job actually started processing, meaning the time the first task was fired on a task tracker. Now i would expect something like Job.instance("JOB_ID").getTasksOrderedByStartDate().get(0).getStartTime(), but i know i won’t get what i expect. Instead:

long actualStartTime(String jobIDString) throws Exception {
  Configuration conf = new Configuration();
  JobClient jobClient = new JobClient(new JobConf(conf)); // deprecation WARN
  JobID jobID = JobID.forName(jobIDString);               // deprecation WARN
  RunningJob runningJob = jobClient.getJob(jobID);
  TaskID firstCompletedTaskID =                           // deprecation WARN
      runningJob.getTaskCompletionEvents(0)[1].getTaskAttemptId().getTaskID();
  for (TaskReport tr : jobClient.getMapTaskReports(jobID)) {
    if (tr.getTaskID().equals(firstCompletedTaskID)) {
      return tr.getStartTime(); // search !!!
    }
  }
  throw new IllegalStateException("first completed task not found in map task reports");
}

The first task completion event belongs to the SETUP task, which runs at job submission time no matter how busy the cluster is. That’s why i’m getting the second element of the array using [1].

One small problem is that i’m using task completion events, not task starting events, so i am assuming the first task to finish is also the first task to start. This is usually correct in my case, but i know it will not apply to everyone.

I haven’t been able to find a way to get a job’s finish date yet, so i’m using job.end.notification.url for that. Hadoop sends a GET to a servlet when a job finishes, so i simply record the time the servlet was called. It may not be perfectly accurate but, again, it works for me.
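
A minimal sketch of such a servlet, assuming the job is submitted with something like job.end.notification.url=http://reporthost/jobdone?jobId=$jobId (host and path are placeholders; $jobId is hadoop’s own substitution variable):

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Records the time hadoop calls back when a job finishes; that callback time
// is what we treat as the job's "finish" time.
public class JobEndServlet extends HttpServlet {

  private static final Map<String, Long> finishTimes =
      new ConcurrentHashMap<String, Long>();

  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    String jobId = req.getParameter("jobId");
    if (jobId != null) {
      finishTimes.put(jobId, System.currentTimeMillis());
    }
    resp.setStatus(HttpServletResponse.SC_OK);
  }

  public static Long finishTimeOf(String jobId) {
    return finishTimes.get(jobId);
  }
}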

In light of these difficulties, i am thinking about a simple application that serves easily parseable job information. It would probably be rendered obsolete when 0.22 is out, but it might still be useful for consuming such info from languages other than Java.

Scalatra result announcer w/ various datastores

In a previous post, i talked about an examination result announcer application in scalatra/scalate/scalaquery. I mentioned i would try the app with different datastores another day and post the results. That day has arrived at last.

Since i’ve talked about the system before, I’ll keep things short this time and jump straight to the results.

Datastore            RPS    p90 (ms)
In memory:           591    238
Voldemort (0.90):    552    256
Mongo (1.8.2):       548    255
Redis (2.2.12):      523    265
Cassandra (0.8.2):   504    273
HBase (0.90.3):      471    285
MySQL (5.5.15):      453    346

In-memory storage means i used a scala.collection.mutable.Map object in my controller to collect the results. The result above was measured with scala 2.9 parallel collections; without them, the numbers were slightly lower.

All 3rd party storage solutions were running on localhost with their default configurations, and all of them were accessed through their preferred or well-known drivers.

I did not try to optimize my code, as another one of my goals was to measure how easy it is to get good performance with little effort.

As seen in the chart, voldemort and mongo are virtually the same in terms of these simple performance measurements. I guess this is because both storage systems work directly off memory: since everything is configured to defaults, voldemort was using in-process BDB-Java and mongo was using memory-mapped files (i guess). On the other hand, redis, while being an in-memory system too, missed their performance by a small amount. I suspect some configuration is required there.

Cassandra and HBase, with their BigTable-like storage mechanics, lag behind the others by small margins. I don’t know much about Cassandra, but running a pseudo-distributed HDFS and HBase (on the same computer as the application and jmeter, no less) is highly discouraged. And i guess i know enough about HBase to say that it is not the perfect fit for this example application; the results simply reflect that.

Since my linux has updated itself several times in the last two months, i re-ran the mysql tests in order to keep things fair. You might have noticed that its performance also increased, but not to the point where it can compete with the others.

All the code is on github. Switch branches for different stores.

NOTE: Those system updates affected my couchdb too, upgrading it to 1.1.0. Long story short, my application running on it outperformed everything on the list with 649 requests per second, and 90% of requests were under 201ms.

NOTE: Another thing i discovered was that using rewrites in couchdb hurts performance, dropping it to 550 RPS and 230ms p90. Interesting…