

Making ‘census’ of load testing

Now that the dust has settled on one of the more recent public outages in the news – the Census – we thought we would share some of the methodologies from our Application Testing Solution that we use elsewhere.

High traffic volumes are not unique to the Census: the Canadian Immigration site was recently inundated with visitors during the election, Ticketek was overwhelmed by the Adele ticket sale frenzy, and there was the Click Frenzy failure a few years ago.

This article is not implying that the testing of the Census was incomplete – it is simply a timely opportunity to share our experience, as more and more sites are being subjected to loads beyond their capacity.

The use case:

ACME Corporation is about to release a new and improved website. The new site includes lots of new features and functionality, and has been designed to handle increased load from a brand new marketing campaign or an upcoming significant event. To make the scenario realistic, the site goes live in two days and no performance testing has been done. To add further realism, there is no budget left for testing, so it needs to be as cheap and quick as possible.

This is also not a plug for any tools or solutions – use your own if they can generate the required traffic. This is simply a methodology and a risk-reduction exercise.

The type of test is what we would call a 1-arm test, where the load-testing equipment interacts with the actual web application. (A 2-arm test is where both clients and servers are emulated around the network under test – a firewall / IPS scalability test, for example.)

The deployment is a 2-tier application, with a web layer and a database layer. We will also assume perimeter protection with a next-generation firewall. The firewall itself could have a suite of standalone tests, which won’t be covered in this article.

Expected traffic volumes are in the realm of 50,000 concurrent users at any one time, based on current network data and projected load expectations.

This article doesn’t go into the actual design of the application – we’ll leave that to the folks who know it – but it assumes redundancy in the different layers, such as web and database. You might, however, want to compare different web servers (nginx vs Apache vs xyz) while leaving everything else consistent – a valid reason to rerun the scenario below.

On ACME Corporation’s new website, www.thiswontfail.com, our test takes into account what a unique visitor would do, based on previous metrics supplied by ACME Corporation:

  • login to the site
  • read a blog
  • upload a file
  • download a data-sheet
  • search for content
  • add items to a cart

This mix of actions exercises the web-to-database connection, as opposed to hitting the home page and being served static content. A connection will stay open for at least a minute. We won’t do a fabricated connections test, where connections are set up and torn down without downloading any data.
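As a sketch of how that journey mix could drive a test, the snippet below models each virtual user as a weighted sequence of the actions listed above. The weights are illustrative assumptions, not ACME’s real metrics, and the session function is a stand-in for real HTTP calls:

```python
import random

# Hypothetical action mix for a virtual user on www.thiswontfail.com.
# These weights are illustrative only, not ACME's actual visitor metrics.
ACTION_WEIGHTS = {
    "login": 30,
    "read_blog": 25,
    "upload_file": 5,
    "download_datasheet": 10,
    "search": 20,
    "add_to_cart": 10,
}

def pick_actions(n_actions, rng=random):
    """Pick a weighted sequence of actions for one virtual-user session."""
    actions = list(ACTION_WEIGHTS)
    weights = list(ACTION_WEIGHTS.values())
    return rng.choices(actions, weights=weights, k=n_actions)

def run_session(user_id, n_actions=5):
    """Simulate one user session; a real load tool would issue HTTP
    requests (login POST, blog GET, file upload, etc.) at each step."""
    return {"user": user_id, "actions": pick_actions(n_actions)}

if __name__ == "__main__":
    for session in (run_session(i) for i in range(3)):
        print(session["user"], session["actions"])
```

In a real run the per-action weights would come from the site’s existing analytics, and each virtual user would keep its connection open for the duration of the session, as described above.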

One key to successful testing and isolating performance issues is real-time monitoring of each device under test (DUT) in the architecture. In this scenario we would expect resources monitoring the firewalls, web servers and database tier, along with network traffic statistics.

An example of the metrics for each platform – your subject matter experts can add more:

  • Firewall
    • open connections
      • (an interesting observation: some firewalls will count two connections for every unique user – therefore doubling the theoretical performance metrics)
    • connections table / state table
      • should always be flushed after each test, as you don’t want previous connections to impact your next test
    • bandwidth measurements
      • incoming and outgoing bandwidth – at some stage your test will fill the pipe, and that is the aim
    • CPU performance
      • this is how you will know when it is under load
  • Web Tier (per VM)
    • CPU
    • open or concurrent connections
  • Database Tier
    • Query throughput / execution performance
    • Connections
    • Buffer pool usage

When doing an end-to-end test there are many cogs in the wheel that can break, so you need to be monitoring as much as possible in real time.

In an ideal scenario you would reboot each element after every load test to keep results consistent – at the very least any element that was driven to failure. This also depends on how much time you have available.

ACME is going for a static deployment with four web servers and four database servers, all behind load balancers (the same test scenario could be used if the web and database layers autoscaled):

Here is our list of fundamental tests:

  • Baseline connectivity test – 1,000 users.
    • Nothing is expected to break with this test. It is just a good sanity check for when things suddenly stop working – you’ll often hear “OK – let’s do a baseline test and prove something has been changed”.
  • Webserver validation
    • 1 webserver – 4 database servers
      • The objective here is to break the web server under load.
      • What is the maximum number of simultaneous users needed to break a single web server? Let’s call this value $1wsbreak.
      • Signs that a web server is under duress include an increase in response times, an increase in HTTP error codes and unsuccessful transactions.
    • 2 webservers – 4 database servers
      • Take $1wsbreak and multiply by 2. If you reach this value, the performance of your web servers is linear and predictable – which is what you want.
    • 4 webservers – 4 database servers
      • Take $1wsbreak and multiply by 4. The same linear result is expected – record it.
    • Linear scaling may fail to occur because of other devices: the cause could be database related, firewall related, or the load-balancing algorithm may need tweaking.
  • Database validation
    • 4 webservers – 1 database
      • Ideally you can break the database under load here, similar to the web server testing. Let’s call this break value $1dbbreak.
    • 4 webservers – 2 databases
      • Take $1dbbreak and multiply by 2. If you reach this value you are getting linear performance.
    • 4 webservers – 4 databases
      • Take $1dbbreak and multiply by 4. Record the results.
    • As with the web server testing, you might identify other issues that aren’t database related.
  • If you haven’t reached your expected traffic loads and you’ve ruled out web server and database performance, you could be looking at firewall scaling issues, load balancer scaling problems or bandwidth restrictions.
  • A tweak to this test for autoscaling environments is to drive the individual applications to the ‘x%’ at which autoscaling occurs. Does the autoscaling handle your expected performance, and does it kill off servers when the load has dropped off?
  • Once the above suite of tests has been performed, it is time to run negative tests at this load to understand what happens during:
    • Firewall failover
    • Loss of power
    • A simultaneous DDoS attack
    • (Emulating traffic from different GeoIP locations would have been done as part of the tests above)
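The linearity argument in the web and database validation steps above can be checked mechanically: take the single-server break point ($1wsbreak or $1dbbreak), scale it by the server count, and compare it with the measured break point. A sketch with made-up numbers – the break values and the 95% tolerance below are illustrative assumptions:

```python
def scaling_efficiency(single_break, n_servers, measured_break):
    """Ratio of measured capacity to perfectly linear capacity.

    1.0 means the tier scales linearly; well below 1.0 suggests another
    device (firewall, load balancer, the other tier) is the bottleneck.
    """
    return measured_break / (single_break * n_servers)

def regression_load(max_users, fraction=0.70):
    """Suggested regression-test load: a fraction of the maximum measured."""
    return int(max_users * fraction)

if __name__ == "__main__":
    one_ws_break = 15_000                  # hypothetical $1wsbreak
    measured = {2: 29_000, 4: 55_000}      # hypothetical measured break points
    for n, broke_at in measured.items():
        eff = scaling_efficiency(one_ws_break, n, broke_at)
        flag = "ok" if eff >= 0.95 else "investigate other devices"
        print(f"{n} web servers: {eff:.0%} of linear -> {flag}")
    print("suggested regression load:", regression_load(max(measured.values())))
```

The same check applies unchanged to the database tier with $1dbbreak, and the regression figure feeds straight into the 70% rule used for future change validation.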

By completing the above tests you will know the performance of your application deployment, as well as where any bottlenecks lie.

For future testing, take 70% of the maximum metric and use it as your regression-test load for future changes and validation.

For further information about this test methodology, or for information on other testing methodologies, reach out to info@matrium.com.au.

