Matthew O'Riordan

This is where I record my rants, comments, quotes and thoughts on things. I welcome your input, so please fire away. Find out more about me at: http://mattheworiordan.com

How elastic are Amazon Elastic Load Balancers (ELB)? Not very, it seems

Update: Since I made this post, Amazon have in fact been in touch and have been extremely helpful looking into this issue and running numerous tests to ensure ELB is performing as it should.  In short, many of the tests in this post are wrong.

Please go to Part 2: How elastic are Amazon Elastic Load Balancers (ELB)? Very.

Recently I’ve been working on a project that will feasibly need to respond to 10,000+ SSL web socket connection requests per second, and I’ve been racking my brain to figure out what type of infrastructure I would need to cope with that.

I did some initial tests and determined that a single CPU instance on EC2 (with 1 EC2 Compute Unit) can cope with a measly 25 SSL handshakes per second.  Based on my 10k target, that means I would need 400 CPU instances on EC2 just to deal with handshakes.  I soon learned that the certificate key size is paramount: dropping from the normal 2048 bit key down to 1024 bit improves performance fivefold (to roughly 125 handshakes per second per instance), which means I would only need 80 CPU instances.  That is still way too many instances and, to be honest, not a piece of architecture I particularly want to look after either, as I would need to build a system that can auto-scale on demand.
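To make the arithmetic above explicit, here is a minimal Python sketch. The function name `instances_needed` is purely illustrative, and the fivefold speed-up for the 1024 bit key is taken from my rough measurement rather than from any documented figure.

```python
import math


def instances_needed(target_per_sec, handshakes_per_instance):
    """Instances required to absorb a given SSL handshake rate."""
    return math.ceil(target_per_sec / handshakes_per_instance)


TARGET = 10_000           # SSL handshakes per second the project must handle
RATE_2048 = 25            # measured handshakes/sec on one instance, 2048 bit key
RATE_1024 = RATE_2048 * 5 # assumed fivefold improvement with a 1024 bit key

print(instances_needed(TARGET, RATE_2048))  # 400
print(instances_needed(TARGET, RATE_1024))  # 80
```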

So I started experimenting with Amazon’s Elastic Load Balancers on the premise that they are elastic. I ran some quick ApacheBench tests against an ELB and unsurprisingly got a throughput of 25 SSL handshakes per second with a 2048 bit key.  From that I deduced that an ELB, when it starts out, is simply a single CPU instance on EC2.  I then tried to run ApacheBench for 30 minutes to see how ELB scales, but quickly discovered that this type of test is quite contrived, because ELB scales out dynamically using DNS.  A single load test client would only ever hit one ELB instance, so ELB may not scale at all when only one IP is hitting it, and even if it did, the client would never be able to benefit from that scaling.

So I went ahead and built a load testing farm of daemons that distribute their “attack” on the ELB from loads of different IP addresses (I spun up 40+ micro instances on EC2). The daemons keep querying DNS to see when ELB scales, and then distribute traffic across the new IP addresses that ELB returns for its public DNS name.
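The daemons themselves are not reproduced in this post, so the following is only a rough Python sketch of the DNS-following idea, not the actual code. The names `resolve_ips` and `TargetPool` are hypothetical, and it assumes that each daemon simply round-robins over whatever IPv4 addresses the ELB hostname currently resolves to.

```python
import itertools
import socket


def resolve_ips(hostname, port=443):
    """Return the sorted IPv4 addresses currently published for hostname."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


class TargetPool:
    """Round-robins over the addresses behind a DNS name (hypothetical helper)."""

    def __init__(self, hostname, resolver=resolve_ips):
        self.hostname = hostname
        self.resolver = resolver  # injectable so it can be faked in tests
        self.ips = []
        self._cycle = iter(())

    def refresh(self):
        """Re-query DNS; return True if the set of addresses changed."""
        ips = self.resolver(self.hostname)
        if ips == self.ips:
            return False
        self.ips = ips
        self._cycle = itertools.cycle(ips)
        return True

    def next_target(self):
        """Return the next IP address to send a request to."""
        if not self.ips:
            raise RuntimeError("call refresh() before next_target()")
        return next(self._cycle)
```

A daemon would call `refresh()` on a timer (every few seconds, say) and `next_target()` before each request. Note that the operating system or an intermediate resolver may cache DNS answers, so a real implementation would need to make sure lookups are not served from a stale cache.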

My findings were interesting, and disappointing.  It seems that ELB does in fact scale, but I am hitting a hard limit of some sort.  Perhaps Amazon have set that limit deliberately, perhaps I need to have something enabled on my account, or perhaps ELB is just not as elastic as it’s claimed to be.  I’ll only find out if Amazon get in touch with me to explain ;)

Here is a graph demonstrating the attempted requests per second from my load testing daemons, which scaled up from 50 requests per second in minute one to 6,000 requests per second in the 120th minute.  As you can see, other than a few small upward blips, ELB pretty much flattens out at around 2,000 requests per second with a 1024 bit key and ELB spread across 3 zones (and thus 3 instances).
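For anyone wanting to reproduce a similar ramp, here is a small Python sketch of the schedule. It assumes the ramp is linear between the two end points given above, which is an assumption made for illustration only; the function name `target_rate` is hypothetical.

```python
def target_rate(minute, start=50, end=6000, duration=120):
    """Attempted requests/sec for a given minute, ASSUMING a linear ramp."""
    if not 1 <= minute <= duration:
        raise ValueError("minute must be between 1 and duration")
    step = (end - start) / (duration - 1)  # increase per minute
    return round(start + step * (minute - 1))


print(target_rate(1))    # 50
print(target_rate(120))  # 6000
```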

If anyone has any suggestions or feedback on how to make ELB scale as it should, please do get in touch.  I’ve posted my findings in the Amazon forums here and here, but I have little hope that Amazon themselves will respond (see the update at the top of this post: Amazon did get in touch and have been brilliant!).

In the meantime, it looks like I’ll be finding another solution.