GPUs, Hadoop and Testing Scalability

As i told numerous times before, i am currently trying to get some GPU powered image processing application to run on Hadoop. In development phase we were using a cluster of 12 machines with one Nvidia GTX 480s each, but since we are launching in a few months, we had to do some tests on our production cluster of 25 machines with two Nvidia Tesla M2050s each. In this post, i’ll try to sum up the process of testing, technical details will come later.

First some reminders about our architecture. Image processing application (IPA) receives an array of images and returns an array of results of doubles. A reduceless MapReduce application divides the images in HBase into chunks, and passes those chunks to IPA. Simply put, while it’s improbable for a single IPA to process thousands of images at once, whole system is able to process millions of images in parallel.

What matters (on our end) is the number of images IPA received and how much time did it take to return a resultset. Using those, we calculate a basic metric: speed in number of images processed per second (ipps). We also calculate the same speed for whole cluster, to see if we can reach a speed like nx ipps when our IPA runs at x ipps and cluster runs n IPAs in parallel (spoilers … we can!).

To show this in numbers, we measured base IPA speed on GTX 480. While the CPU on the system also effects it a bit, its runs at 19.46 ipps on average. On the other hand, our cluster with 12 GTX 480s runs at a total speed of 231 ipps which is extremely close to 12 x 19.46 = 233.52 ipps! Looking at this numbers we assumed our system scales linearly so when we increase the number of GPUs to, say 24 we’ll have 231 x 2 = 462 ipps.

With this assumption in mind, we measured base IPA on Tesla M2050, which is 14.80 ipps (yes, Tesla M2050 is about 24% slower than GTX 480) and expected to have a speed like 14.80 x 50 = 740 ipps on our production cluster with 50 Teslas. Our first results with 518 ipps was nowhere near that. We started investigating…

After some lousy ideas putting the blame on IPA folks and node configurations, we took a step back and started questioning our ways of testing. We knew there were IO and Hadoop task management overheads but they were omissible … for jobs containing large amounts of images. We missed that the definition of large would differ amongst clusters such as one with 12 GPUs and another with 50 (!). We were testing both using 100.000 images and it could’ve been a small number for the latter. We slowly increased the number of images to one million and…

we got close enough to expected speed of 740 ipps with 709 ipps. MapReduce jobs in our system will process millions of images in production which means the cluster will be fully utilized. If there were only a hundred thousand images a large portion of the investment would have been wasted.

Lesson learned in scalability: You have to cut your coat according to your cloth. Or you shouldn’t buy more cloth than what would be necessary to cut your coat. Or … Whatevs, you got the point.

Lesson learned in testing: Always test your systems, then test the hell out of them and when those don’t satisfy you change your tests and test again. It might cost some time but it will save money.