In the last post, we discussed Robert Pirsig’s seminal book Zen and the Art of Motorcycle Maintenance and its relation to application load testing. In short, Pirsig argues that any experiment (or test run, in our case) is a failure only when nothing is learned from the outcome. In this post, we’ll talk about some concrete steps to avoid falling into this trap. We’ve successfully used this approach many times in the past when load testing Facebook apps, and hope that you’ll find it useful.
Before running any tests, you’ll need to set some goals. The first and most important goal is the number of simultaneous users the application must be able to service at any time. We’ll call this SimulGoal. If your application is already deployed, you can get a sense of this number from the current statistics. Otherwise, you need to do a little magic mixed with some marketing projections. Every application is going to be different, and there are going to be loads of variables to determine the right number. This is an interesting topic in itself that we hope to cover sometime in the future, but one very rough rule of thumb that you may use is that for an application with a million Monthly Active Users (MAU) you’ll need to be able to service a couple thousand simultaneously (YMMV!!).
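To make that rule of thumb concrete, here is a minimal sketch in Python. The function name and the default of 2,000 simultaneous users per million MAU are our own illustrative assumptions (we read "a couple thousand" as 2,000); swap in whatever ratio your own statistics suggest.

```python
def estimate_simul_goal(monthly_active_users, simultaneous_per_million=2000):
    """Very rough SimulGoal estimate from MAU.

    simultaneous_per_million is an assumed ratio taken from the rule of
    thumb above; it is not a measured value for any real application.
    """
    return round(monthly_active_users * simultaneous_per_million / 1_000_000)


print(estimate_simul_goal(1_000_000))  # 2000
print(estimate_simul_goal(250_000))    # 500
```

Treat the result as a starting point for discussion, not a capacity plan.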
The next goal you’ll need to define is the number of simultaneous users that can be serviced by a single application server. We’ll call this ServerGoal. Coming up with ServerGoal can be, and has been, done in lots of different ways:
- Bottom up: Figure out the computational power of a single application server. Then, test and determine how much computational power is necessary to process a single transaction. Do some arithmetic, and you’ve got the number.
- Top down: Find out your budget for server equipment, and determine the number of servers you’ll be able to run for this application. Divide SimulGoal by this number to determine ServerGoal. This may not have any basis in reality, but it’s a start.
- Typical: Pull a number out of one of your body’s orifices. Note: this is the method that most people use, so don’t be ashamed to admit it.
Note that ServerGoal is a “soft” goal that can and should change as more is learned about how the application responds under load.
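The arithmetic behind the first two methods can be sketched as follows. This is only an illustration: the function names, the choice of CPU milliseconds as the unit of "computational power", and all of the sample numbers are assumptions we made up for the example.

```python
import math


def server_goal_top_down(simul_goal, affordable_servers):
    """Top down: users each server must carry if the budget fixes the server count."""
    return math.ceil(simul_goal / affordable_servers)


def server_goal_bottom_up(cores, cpu_ms_per_transaction, transactions_per_user_per_minute):
    """Bottom up: users one server can carry, counting CPU time only.

    Assumes CPU is the only bottleneck, which is rarely true in practice.
    """
    cpu_ms_available_per_minute = cores * 60_000
    cpu_ms_needed_per_user_per_minute = (
        cpu_ms_per_transaction * transactions_per_user_per_minute
    )
    return cpu_ms_available_per_minute // cpu_ms_needed_per_user_per_minute


# Hypothetical numbers: SimulGoal of 2000 and a budget for 5 servers.
print(server_goal_top_down(2000, 5))      # 400

# Hypothetical numbers: 8 cores, 20 ms of CPU per transaction,
# and each user issuing 30 transactions per minute.
print(server_goal_bottom_up(8, 20, 30))   # 800
```

If the two methods disagree wildly, as they do here, that is useful to know before the first test is ever run.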
Okay, now that you’ve set some goals, let’s start testing:
The first thing you’ll need to decide for your first test is how many virtual users it should contain. Of course your instinct is going to tell you to try and run SimulGoal users through a full deployment infrastructure. Resist this instinct as it is a very bad idea. Why? Well, let’s look at the situation through a Zen lens. You’re not really testing any hypothesis. When you run the test, it is sure to fail (because the first one always fails). When it does fail, what have you learned? Nothing – the test has ended as a true failure and you’ve wasted your time.
A much better approach is this: Define a “Minimal Viable Server Infrastructure” (MVSI), which is the smallest infrastructure you can run and still service requests. Often this simply means a single application server, plus a database server if necessary. If you typically use a load balancer, throw one into the mix.
Now, run a test against this MVSI with ServerGoal + (a few extra) virtual users. Our hypothesis will be that when the test reaches ServerGoal users, the application infrastructure will begin to fail. When the test completes, you’ll definitely have learned something, namely ServerReal, which is the actual number of simultaneous users a single server can support. What to do next will be determined by the state you’re in:
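One way to read ServerReal off the results of such a ramp-up test is sketched below. Everything here is an assumption for illustration: the sample format, the function name, and especially the thresholds that define "begin to fail" (more than 1% errors or more than 2 seconds of latency). Your load testing tool will report its own metrics, and you should pick failure criteria that match your application.

```python
def find_server_real(samples, max_error_rate=0.01, max_latency_ms=2000):
    """Return the highest user count reached before the first unhealthy sample.

    samples: iterable of (virtual_users, error_rate, latency_ms) tuples.
    The thresholds are assumed failure criteria, not universal values.
    """
    server_real = 0
    for users, error_rate, latency_ms in sorted(samples):
        if error_rate > max_error_rate or latency_ms > max_latency_ms:
            break  # the infrastructure has started to fail
        server_real = users
    return server_real


# Made-up measurements from a ramp toward a ServerGoal of 400 users.
ramp = [
    (100, 0.000, 180),
    (200, 0.000, 210),
    (300, 0.002, 340),
    (400, 0.030, 1900),
    (450, 0.120, 4200),
]
print(find_server_real(ramp))  # 300
```

In this made-up run the server starts failing at 400 users, so ServerReal is 300, which is below ServerGoal.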
ServerReal < ServerGoal
This is probably where you’re going to be. You’ll need to rethink how you came up with ServerGoal (e.g. did you use the orifice method?) and perhaps modify it. Maybe ServerReal really is the limit that your application can handle. If you’re okay with that, set ServerGoal = ServerReal and continue on to Test #2.
However, if you’re thinking “Something’s wrong – this server must be able to handle more users than that”, then you’re in for some work. Look through the server logs and try to find evidence of something the application is doing that isn’t efficient. Look through the code and see if you can find it there. Look at how the servers that run the application are configured, and see if they can be tweaked to get better performance. In short, come up with some idea for how to squeeze more performance out of the system.
Once you’ve done all of that, you have a hypothesis for how to improve performance. Test this hypothesis by re-running Test #1 and see if anything gets better. Continue this iteration cycle until ServerReal just can’t get any better.
ServerReal > ServerGoal
Wow, you’re in really good shape. First, go talk to your developers and thank them for writing such a great application. Next, reset ServerGoal = ServerReal. If you’re satisfied with that, continue to Test #2; otherwise, iterate and try to get even better.
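The decision after each run of Test #1 can be summarized in a few lines of Python. This is just our reading of the two cases above, with a hypothetical function name; the case where ServerReal exactly equals ServerGoal isn’t discussed above, so treating it as "goal met" is an assumption.

```python
def next_step(server_goal, server_real, satisfied):
    """Return (new ServerGoal, what to do next) after a run of Test #1.

    satisfied: whether you accept ServerReal as good enough.
    """
    if server_real >= server_goal:
        # Goal met or beaten (equality is our assumption): raise the bar.
        new_goal = server_real
    else:
        # Goal missed: lower it only if you accept the measured limit.
        new_goal = server_real if satisfied else server_goal
    action = "continue to Test #2" if satisfied else "re-run Test #1"
    return new_goal, action


print(next_step(400, 300, satisfied=True))   # (300, 'continue to Test #2')
print(next_step(400, 300, satisfied=False))  # (400, 're-run Test #1')
print(next_step(400, 520, satisfied=False))  # (520, 're-run Test #1')
```

Either way, every pass through the loop ends with something learned, which is the whole point.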
In the next post, we’ll talk about what happens in Test #2 where we’ll determine just how scalable the application is.