Matthew O'Riordan

Mar 13

How elastic are Amazon Elastic Load Balancers (ELB)? Not very it seems

Update: Since I made this post, Amazon have in fact been in touch and have been extremely helpful, looking into this issue and running numerous tests to ensure ELB is performing as it should. As soon as we have completed all the tests and reached conclusive findings, I will write a follow-up post detailing the results.

Recently I’ve been working on a project that could feasibly need to handle 10,000+ SSL WebSocket connection requests per second, and I’ve been racking my brains trying to figure out what kind of infrastructure I would need to cope with that.

I did some initial tests and determined that a single-CPU instance on EC2 (with 1 EC2 Compute Unit) can cope with a measly 25 SSL handshakes per second. Based on my 10k target, that means I would need 400 CPU instances on EC2 just to deal with handshakes. I soon learned that the certificate key size is paramount: dropping from the normal 2048 bit key down to a 1024 bit key improves performance five-fold, so I would now need only 80 CPU instances. That is still far too many instances, and to be honest, not a piece of architecture I particularly want to look after either, as I would need to build a system that can auto-scale on demand.
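
The sizing arithmetic above fits in a tiny helper. The handshake rates are the ones measured in this post (25 per second with a 2048 bit key, roughly five times that with a 1024 bit key); the function itself is just an illustrative sketch.

```javascript
// Back-of-the-envelope sizing: how many single-CPU instances are needed
// to sustain a target number of SSL handshakes per second.
// instancesNeeded is a hypothetical helper, not part of any AWS tooling.
function instancesNeeded(targetHandshakesPerSec, handshakesPerInstance) {
  // Round up: a fraction of an instance still means one more instance.
  return Math.ceil(targetHandshakesPerSec / handshakesPerInstance);
}

const target = 10000;           // target SSL handshakes per second
const perInstance2048 = 25;     // measured with a 2048 bit key
const perInstance1024 = 25 * 5; // roughly five-fold faster with a 1024 bit key

console.log(instancesNeeded(target, perInstance2048)); // 400
console.log(instancesNeeded(target, perInstance1024)); // 80
```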

So I started experimenting with Amazon’s Elastic Load Balancers on the premise that they are elastic. I ran some quick Apache Bench (ab) tests against an ELB and, unsurprisingly, got a throughput of 25 SSL handshakes per second with a 2048 bit key. From this I deduced that an ELB, when it starts out, is simply a single-CPU instance on EC2. I then tried running Apache Bench for 30 minutes to see how ELB scales, but quickly discovered that this type of test is quite contrived: ELB scales out dynamically using DNS, so a single load-testing client would only ever hit one ELB instance. As a result, ELB may not scale at all, since only one IP is hitting it, and even if it did, the client would never be able to benefit from that scaling.

So I went ahead and built a load testing farm of daemons that distribute their “attack” on the ELB from loads of different IP addresses (I spun up 40+ micro instances on EC2), and will cleverly keep querying DNS to see when ELB scales and distribute traffic to the new IP addresses that ELB responds to for its public DNS name.
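
The core idea behind those daemons (keep re-resolving the ELB’s DNS name and spread requests across every address it hands out) can be sketched in Node.js. This is an illustrative sketch, not the daemons used for these tests; the TargetPool class, the watchDns function and the 30 second refresh interval are all assumptions.

```javascript
// Sketch of a load-test client that keeps re-resolving the ELB's DNS name
// and spreads requests over every IP address returned so far.
// TargetPool, watchDns and the refresh interval are hypothetical.
const dns = require("dns").promises;

class TargetPool {
  constructor() {
    this.ips = []; // every IP address seen for the load balancer
    this.next = 0; // round-robin cursor
  }

  // Merge newly resolved addresses, keeping ones we already know about.
  merge(resolved) {
    for (const ip of resolved) {
      if (!this.ips.includes(ip)) this.ips.push(ip);
    }
    return this.ips.length;
  }

  // Pick the next IP in round-robin order (undefined if none known yet).
  pick() {
    if (this.ips.length === 0) return undefined;
    const ip = this.ips[this.next % this.ips.length];
    this.next = (this.next + 1) % this.ips.length;
    return ip;
  }
}

// Periodically re-resolve the hostname so new ELB nodes are picked up as it scales.
function watchDns(hostname, pool, intervalMs = 30000) {
  const refresh = async () => {
    try {
      const known = pool.merge(await dns.resolve4(hostname));
      console.log(`${hostname}: ${known} known address(es)`);
    } catch (err) {
      console.error(`DNS lookup failed: ${err.message}`);
    }
  };
  refresh();
  return setInterval(refresh, intervalMs);
}
```

Each request would then be sent to `pool.pick()`. Addresses are never dropped from the pool in this sketch; if ELB scales back down and its old IPs get recycled, you would want to expire stale entries.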

My findings were interesting, and disappointing. It seems that ELB does in fact scale, but I am hitting a hard limit of some kind. Perhaps Amazon have set a hard limit deliberately, perhaps I need to have something enabled on my account, or perhaps ELB is just not as elastic as it’s claimed to be. I’ll only find out if Amazon get in touch with me to explain ;)

Here is a graph showing the attempted requests per second from my load-testing daemons, which scaled up from 50 requests per second in the first minute to 6,000 requests per second in the 120th minute. As you can see, other than a few small upward blips, ELB pretty much flattens out at around 2,000 requests per second with a 1024 bit key and ELB spread across 3 zones (and thus 3 instances).
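
The ramp described above can be expressed as a small schedule function. The post gives only the start and end points, so the straight-line shape here is an assumption, as are the helper names.

```javascript
// Sketch of the load ramp: 50 requests/second in minute 1 rising to
// 6,000 requests/second in minute 120. A straight-line ramp is an
// assumption; targetRate and ratePerDaemon are hypothetical helpers.
function targetRate(minute, startRate = 50, endRate = 6000, totalMinutes = 120) {
  if (minute <= 1) return startRate;
  if (minute >= totalMinutes) return endRate;
  const fraction = (minute - 1) / (totalMinutes - 1);
  return Math.round(startRate + fraction * (endRate - startRate));
}

// Split the per-second target evenly across the load-testing daemons.
function ratePerDaemon(minute, daemons) {
  return Math.ceil(targetRate(minute) / daemons);
}
```

Under this linear assumption, the ramp would reach the ~2,000 requests per second plateau around minute 40 (`targetRate(40)` is 2000), and with 40 daemons each one sends 150 requests per second by minute 120.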

If anyone has any suggestions or feedback on how to make ELB scale as it should, please do get in touch. I’ve posted my findings on the Amazon forums here and here, but I have little hope that Amazon themselves will respond (as the update at the top of this post explains, Amazon did get in touch and have been brilliant!).

In the meantime, it looks like I’ll be finding another solution.


Feb 22

MaxCDN versus Amazon Cloudfront

Following on from my bad experience with MaxCDN in November, I thought it would be worth a follow up post for those who are interested in how MaxCDN stacks up against Amazon Cloudfront.  All my experience is based on production issues I have had with easyBacklog, a simple backlog management tool designed for agencies.

On a positive note, after the outage in November and my numerous support requests, MaxCDN got in touch, apologised for not responding and resolving the issue in a sensible amount of time, and in fact offered me a credit note for all the money I had spent with them to date.  +1 for them caring and doing something about it.

However, things did not get better from that point forward. As the CDN had gone down and taken most of my site with it, I thought I should add monitoring for the CDN. I must admit, I never thought one would need to monitor a CDN that is designed to achieve five 9s (99.999%) uptime, but I was clearly not willing to take that chance again.

So I set up two kinds of monitors:

The reason I took this approach is that whilst it’s great that a CDN may be delivering cached assets efficiently, I also want to be sure that each time I do a deploy, it will load up the new assets efficiently as well.
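
A monitor along these lines can be sketched in Node.js: one check requests an asset as normally served (usually a cache hit), the other adds a unique querystring so the CDN has to go back to the origin, much as it would for a freshly deployed asset. This assumes Node 18+ (global fetch and AbortController) and a CDN that includes the querystring in its cache key; checkAsset, withCacheBuster, runMonitors and the `nocache` parameter are hypothetical.

```javascript
// Sketch of the two CDN monitors with a 10 second timeout.
// Measures the time until the response headers arrive.
async function checkAsset(url, { timeoutMs = 10000, fetchImpl = globalThis.fetch } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return { url, ok: res.ok, status: res.status, ms: Date.now() - started, timedOut: false };
  } catch (err) {
    // An aborted signal means our timeout fired rather than a network error.
    const timedOut = controller.signal.aborted;
    return { url, ok: false, status: null, ms: Date.now() - started, timedOut, error: err.message };
  } finally {
    clearTimeout(timer);
  }
}

// Only meaningful for CDNs that include the querystring in the cache key.
function withCacheBuster(url) {
  const sep = url.includes("?") ? "&" : "?";
  return `${url}${sep}nocache=${Date.now()}`;
}

async function runMonitors(assetUrl, options) {
  return Promise.all([
    checkAsset(assetUrl, options),                  // cached asset
    checkAsset(withCacheBuster(assetUrl), options), // forces an origin fetch
  ]);
}
```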

And this is where it all went wrong with MaxCDN, see the graph I sent to their support team below:

All of those red dots mean their CDN server did not respond within 10s. Yes, 10 whole seconds. This is a CDN for god’s sake, what is going on here? I ran simultaneous tests directly against the origin server for the same URL and the response times were never more than 750ms, so for newly deployed assets the CDN could simply time out and never serve them.

I spoke with MaxCDN numerous times about this issue, and whilst they were doing their best to help, they never managed to resolve the issue.  The only thing they did at one point was to move me onto a new type of infrastructure that did not support querystring cache invalidation, and in the process they took my site down for another hour.  Thanks MaxCDN.

So here is why I thought I should write a blog post. Previously I was using Jammit and MaxCDN with Rails 3.0, which appends a querystring to asset URLs to ensure stale versions are not served from a cache. Unfortunately Amazon Cloudfront ignores the querystring, so assets are never refreshed; hence I had to use MaxCDN, as it does support querystrings for assets. So I bit the bullet, upgraded to Rails 3.2, removed Jammit and moved to the asset pipeline (Sprockets). All assets are now precompiled with fingerprints in the URL, such as /assets/my-image-au32h121wehqweq.png, meaning I can use Cloudfront.
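
The fingerprinting idea can be sketched in a few lines of Node.js: a digest of the file’s contents is baked into the file name, so a changed file gets a new URL and no querystring is needed. The MD5 choice and the fingerprintedPath helper are assumptions for illustration; this does not reproduce the exact digest Sprockets computes.

```javascript
// Sketch of content fingerprinting in the spirit of the Rails 3.2 asset pipeline.
// fingerprintedPath is hypothetical; MD5 of the raw contents is an assumption.
const crypto = require("crypto");
const path = require("path");

function fingerprintedPath(fileName, contents) {
  const digest = crypto.createHash("md5").update(contents).digest("hex");
  const ext = path.extname(fileName);        // ".png"
  const base = path.basename(fileName, ext); // "my-image"
  return `/assets/${base}-${digest}${ext}`;
}

console.log(fingerprintedPath("my-image.png", "hello"));
// /assets/my-image-5d41402abc4b2a76b9719d911017c592.png
```

Because the URL itself changes whenever the contents change, a CDN that ignores querystrings can still cache each version indefinitely.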

So I changed over to Amazon Cloudfront yesterday and here is how it stacks up:

CloudFront is bloody amazing: not a single spike above a few hundred milliseconds for non-cacheable assets, and regardless of location it’s working brilliantly and super fast.

So in the debate of MaxCDN vs Amazon Cloudfront here is my take:

MaxCDN

Pros

Cons

Amazon Cloudfront

Pros

Cons

Get in touch if your experience differs from mine.  So far, I am very pleased with Cloudfront.

Nov 28

MaxCDN review - seriously unreliable

* Update 22 Feb 2012: Read my follow on post MaxCDN versus Amazon Cloudfront * 

I know, this post will probably come across as a rant, but it’s not. I have recently launched my business (Easybacklog - http://easybacklog.com) into public beta and every customer and client I attract is valuable to me. So when my CDN goes down, my site is effectively unusable because none of the stylesheets, Javascript or images load, and I get upset because I might lose customers.

To give you some background, I recently looked around for a CDN to help speed up the loading of my site’s assets. I looked into numerous options, and found MaxCDN to be my number one choice (with Amazon CloudFront in second place). And to be fair, setting up was reasonably painless (other than issues with payment, which meant my account was locked), and I was up and running in no time at all.

And then yesterday it all went wrong, and it stayed like that.  At around 11am UK time I found that the site’s asset hosting was no longer working.  Firstly the assets were not being served, and secondly the SSL certificate being issued was the wrong one, meaning there was zero chance of my assets being served.  Mistakes happen, and I can handle that.

What frustrates me, and has motivated me to post this blog post so as to warn others about MaxCDN’s terrible customer service, is that I could not get hold of anyone at MaxCDN.  On their home page, they state 24/7/365 support (screenshotted below):

As my site was down and this was pretty critical, I endeavoured to get in touch with MaxCDN by:

Eventually, 10 hours later, I managed to get hold of someone using their live chat feature (still no response to any of my emails), and that was the icing on the cake.  He told me two things in our live chat:

So I warn you all, if you want a CDN that is reliable, do not even dream of using MaxCDN. I am absolutely flabbergasted that a company touting itself as a CDN would find it acceptable to not fix an issue until the following day.  (They have fixed my issue now, 20 hours later, and without an apology)

So who would I recommend as an alternative?  
Well, I’m looking into that now and hope to post an update here once I have had some experience using the competitors. The ones I am trying support pay-per-usage models (i.e. no fixed subscription) and are as follows:

Do tell me if you also experience bad support with MaxCDN so that hopefully others considering MaxCDN will not fall into the same trap that I did.


Nov 22

URL regular expression for links with or without the protocol

I’ve just come across a pretty common requirement: converting any text that looks like a link into a link within some HTML text. Strangely, after searching for a good 15 minutes for a regular expression, all I could find was either a regular expression that detects URLs with a protocol, such as http://mattheworiordan.com/, or one that detects URLs without a protocol, such as www.mattheworiordan.com. Why the hell I could not find one that does both is beyond me, so here I am posting a solution for anyone else to use.

Here is the holy grail:

/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www\.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-]*)?\??(?:[-\+=&;%@.\w]*)#?(?:[.\!\/\\\w]*))?)/

Here is a nice example of this regular expression in action http://jsbin.com/eqocuh/5/edit#source

Please feel free to modify this JSBin, add examples, and update this regular expression, and I will update within this blog post.

Here is an explanation of the regular expression for those who care

(
  (                                 # brackets covering match for protocol (optional) and domain
    ([A-Za-z]{3,9}:(?:\/\/)?)       # match protocol, allow in format http:// or mailto:
    (?:[\-;:&=\+\$,\w]+@)?          # allow something@ for email addresses
    [A-Za-z0-9\.\-]+                # anything looking at all like a domain, non-unicode domains
  |                                 # or instead of above
    (?:www\.|[\-;:&=\+\$,\w]+@)     # starting with something@ or www.
    [A-Za-z0-9\.\-]+                # anything looking at all like a domain
  )
  (                                 # brackets covering match for path, query string and anchor
    (?:\/[\+~%\/\.\w\-]*)?          # allow optional /path
    \??(?:[\-\+=&;%@\.\w]*)         # allow optional query string starting with ?
    #?(?:[\.\!\/\\\w]*)             # allow optional anchor #anchor
  )?                                # make URL suffix optional
)
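
A small linkify function built on this regular expression (with the g flag added so every match is replaced) could look like the sketch below. Prefixing bare www. addresses with http:// and bare something@ addresses with mailto: is an assumption about how you want links to behave, and the input is assumed to contain no HTML that needs escaping.

```javascript
// Sketch: wrap anything matching the URL regular expression in an <a> tag.
// The href-prefixing rules are assumptions, not part of the original regex.
const URL_REGEX = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www\.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-]*)?\??(?:[-\+=&;%@.\w]*)#?(?:[.\!\/\\\w]*))?)/g;

function linkify(text) {
  return text.replace(URL_REGEX, (url) => {
    let href = url;
    if (!/^[A-Za-z]{3,9}:/.test(url)) {
      // No protocol: www. addresses get http://, something@ addresses get mailto:
      href = /^www\./i.test(url) ? "http://" + url : "mailto:" + url;
    }
    return `<a href="${href}">${url}</a>`;
  });
}
```

For example, `linkify("Visit www.mattheworiordan.com")` returns `Visit <a href="http://www.mattheworiordan.com">www.mattheworiordan.com</a>`. Note that a full stop directly after a URL at the end of a sentence will be swallowed into the link, because `.` is allowed in the domain part.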

Oct 11

Infinite scrolling tab system for JQuery

Google Docs Spreadsheets has a useful worksheet tab system whereby if there are more tabs than the space available, a scroller device automatically appears.  Whilst building the new sprints functionality for my backlog management tool easyBacklog, I went on a search for a scrolling tab system and couldn’t find one.

So I’ve built a JQuery plugin inspired by the Google Docs Spreadsheet tab system, along with the rounded tabs solution by Chris Coyier.

Check out the plugin at https://github.com/mattheworiordan/jquery.infinite.tabs

Comments / feature requests / bug reports welcome.

Sep 01

Setting up a Jenkins (old Hudson) Continuous Integration Server

Following on from my recent full stack integration work on easyBacklog, the Agile Project Management app I am building, I have set up a Continuous Integration server which will run both unit tests and integration tests across the entire app each time I push a commit.  In order to do this, I needed full support for Selenium and Capybara-Webkit, along with Cucumber and RSpec.

I looked at various CI solutions:

So I opted for Jenkins and went about setting up a CI server on my Mac Mini with OS X Lion.

As I couldn’t find any easy guides on what to do, I loosely documented the steps I followed to get my CI server up and running so that hopefully others trying to do the same thing will have a good starting point.  Here is a simple chronological log of what I did:

Note: I am pulling my working directory from a local Git repository.  If you want to use Github, then install the Github plugin (see below) and figure that part out yourself!

Log into your Mac with your admin account

Install Homebrew: /usr/bin/ruby -e "$(curl -fsSL https://raw.github.com/gist/323731)"

If you have Homebrew installed already, then I would advise you to run `brew update` to get the latest recipes

Install Qt (needed by Capybara-Webkit): brew install qt --build-from-source
Note, this will take over an hour to build, so make yourself a nice cuppa and wait.
Whilst this is happening, I would go ahead and install Firefox which is needed for the standard Selenium install. 

Install Jenkins using the native installer, http://mirrors.jenkins-ci.org/

The Jenkins server should now be running; if not, reboot the machine, as on Mac OS X Jenkins installs itself as a LaunchDaemon.

Go to http://localhost:8080, Manage, Manage Plugins, and install Jenkins Git, Hudson Ruby and Jenkins ruby metrics

I would also advise you set up security for your Jenkins server.  Go to Manage Jenkins, Configure System, choose “Jenkins’s own user database” under “Access Control” and the rest is reasonably self-explanatory.

If you want email notifications when builds succeed or fail, then you will either need an SMTP server you can use, or you can configure OS X to run postfix.  See the following articles to get postfix running on Mac OS X:

Set up a user called jenkins in OSX (login jenkins).

Modify the file /Library/LaunchDaemons/org.jenkins-ci.plist: remove the GroupName key and its daemon value, and then change UserName to jenkins

You will now need to change the permissions on a folder Jenkins uses, run the following:
sudo chown -R jenkins:wheel /Users/Shared/Jenkins

Now reboot, and make sure Jenkins is still up and running (http://localhost:8080)

Log into your OS X box with the jenkins account

Install RVM: bash < <(curl -s https://rvm.beginrescueend.com/install/rvm)

Install ruby: rvm install 1.9.2

Ensure 1.9.2 is the default: rvm --default use 1.9.2

Reload your console, and type rvm list.  Ruby 1.9.2 should be selected.  If not, your default Ruby 1.9.2 is not working.

Now install the Bundler gem, `gem install bundler`

Now, before you set up your first job in Jenkins, I would advise that you make sure the jenkins user account has everything needed to actually run Rake and RSpec. To do this, simply go to the console, create a working folder and run git clone git://your-git-repository-url.git to get a working copy of the files onto the server.
Run `bundle install && RAILS_ENV=test bundle exec rake db:migrate && RAILS_ENV=test bundle exec rake`.  Your Cucumber tests (and Unit tests if you have some) should run.  If they don’t pass, figure out why not before you get Jenkins to start automatically building for you.

To set up your CI job, go to Jenkins home page (http://localhost:8080), click New Job, and set the following:

#!/bin/bash -e
export RAILS_ENV=test
source "$HOME/.rvm/scripts/rvm"
rvm use 1.9.2
bundle install --deployment
bundle exec rake db:migrate spec

You can then manually start the build by going to the home page, clicking on the title of your build, and clicking Build Now.  You should see the status of the build in the Build History (bottom left), and if you click on the Build in progress, and then on Console Output, you can monitor the build as it happens.

Now that you have SCM polling, each time you push a change to your Git repository, a new build should automatically fire off.  

Do give me feedback if I’ve missed out any steps.  Unfortunately I wrote down a lot of these steps retrospectively.

Aug 24

A library for simulating a drag event with a JQuery UI Sortable widget

 

I have been working on thorough integration testing of my new Agile Project Management tool called easyBacklog, and came across a problem whereby it seems there is no easy way to test / simulate drag and drop events with a JQuery UI Sortable widget.

Whilst jquery.simulate.js provides functionality to simulate many JQuery and JQuery UI events, it is unable to simulate a drag event for a JQuery UI Sortable widget due to the intricate behaviour needed to make the widget fire the correct events.

So I’ve developed a library to solve this problem: it allows you to simulate Drag events for a JQuery UI Sortable, and supports a number of useful features.

Go to the Github repository jquery.simulate.drag-sortable.js for the code and the tests.

Aug 23

Testing :focus with JQuery and Selenium or Capybara-Webkit

Recently whilst trying to write integration tests for a JQuery heavy front end application, I came across a very strange issue when testing to see if an element has focus.

What I found is that if an element has focus, yet the actual browser or tab does not have focus, then using the pseudo selector :focus from JQuery (or Sizzle) will not find the focussed item.  Putting it another way, the browser and web page must have the focus in the OS for the :focus selector to work.  

A simple way to replicate this issue is to go to http://jsbin.com/abozas, open up your Javascript console, and type the following commands

$('#inpt').focus();
$('#inpt').is(':focus');

As you have just put the focus on #inpt, you would expect it to retain focus and thus pass a JQuery test of .is(':focus'). However, because the Javascript console has the focus, and not the web page, .is(':focus') will always return false.

Now generally this is not an issue, as your web page actually has focus, but unfortunately I discovered that in Rails, when using Cucumber, Capybara and Capybara-Webkit and/or Selenium and testing for :focus, all tests will fail because the browser window does not have the focus of the OS at the time the tests run.

So firstly I logged this as an issue with Capybara-Webkit, and in the meantime I’ve worked up a workaround by poking around in JQuery and Sizzle (the CSS selector engine that powers not only JQuery, but also Prototype, Dojo, MochiKit and TinyMCE). I discovered that the issue is in fact not with the Sizzle selector logic, but lies in some optimizations that Sizzle uses for performance reasons. In the function matchesSelector, Sizzle first tries to use the native method webkitMatchesSelector() to match a selector, and only if this fails does it fall back to its own selector logic. The problem is that webkitMatchesSelector is OS aware and thus treats an element as unfocused when the browser does not have focus, whereas Sizzle is unaware of the browser’s focus status and thus matches successfully using the following code:

function( elem ) {
  return elem === elem.ownerDocument.activeElement;
}

So to fix this problem, I have written a small Javascript file which you should include when you are running a test or cucumber Rails environment.  This script will then deactivate the native matches selector and query selector for your browser.

The code to fix the :focus issue can be found at https://gist.github.com/1166821 and simply does the following:

/* Prevent use of native find selector */
document.querySelectorAll = false;

/* Prevent use of native matches selector */
document.documentElement.matchesSelector = false;
document.documentElement.mozMatchesSelector = false;
document.documentElement.webkitMatchesSelector = false;
document.documentElement.msMatchesSelector = false;
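
If you would rather not disable the native selectors, a test step could instead make the same comparison Sizzle’s fallback makes, checking the element against document.activeElement directly. The isFocused helper below is a hypothetical sketch for your own test support code, not part of JQuery or Sizzle.

```javascript
// Sketch: like Sizzle's fallback for :focus, only compare the element with
// document.activeElement, so OS-level window focus is not consulted.
// isFocused is a hypothetical test helper.
function isFocused(elem) {
  return !!elem && elem === elem.ownerDocument.activeElement;
}

// Usage in a browser test step:
// $('#inpt').focus();
// isFocused($('#inpt')[0]);
```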

If you would like to replicate this issue, review this Gist.  An example set of tests where the issue is resolved can be seen in this Gist.

I hope others find this useful.

Aug 19

Cucumber and Capybara-Webkit automatic screenshots


This post has been superseded by a Gem I have written to take automatic screenshots for Capybara (not just Cucumber; it also supports RSpec and MiniTest). Please go to https://github.com/mattheworiordan/capybara-screenshot



Writing Cucumber front end tests (AJAX & Javascript) can be pretty damned painful and slow, especially when a failure does not come with any useful error message.  Fortunately I’ve been using Capybara-Webkit recently, which brings two very useful features for debugging issues:

This code is up in a Gist on Github, so feel free to fork away.

Jul 26

Rate limiting Javascript requests

I recently needed to rate limit requests to a particular function in Javascript, and was surprised nothing obvious came up in Google searches so here is my contribution for others who need this.

Note that I am a big fan of Underscore.js and have thus simply extended _ to include my new rate limiting function. Also, many people immediately think that _.throttle will achieve rate limiting, but _.throttle is in fact a destructive rate limiting function, in that it discards calls that arrive within the specified time threshold. If you want to ensure that all calls are still executed, but never more than once every X milliseconds, then I suggest you use my method.

The code is uploaded to a Gist which you can fork at https://gist.github.com/1084831, and is embedded below.

// Rate limit ensures a function is never called more than once every [rate]ms
// Unlike underscore's _.throttle function, function calls are queued so that
//   requests are never lost and simply deferred until some other time
//
// Parameters
// * func - function to rate limit
// * rate - minimum time to wait between function calls
// * async - if async is true, we won't wait (rate) for the function to complete
//           before queueing the next request
//
// Example
// function showStatus(i) {
//   console.log(i);
// }
// var showStatusRateLimited = _.rateLimit(showStatus, 200);
// for (var i = 0; i < 10; i++) {
//   showStatusRateLimited(i);
// }
//
// Dependencies
// * underscore.js
//
_.rateLimit = function(func, rate, async) {
  var queue = [];
  var currentlyEmptyingQueue = false;

  var emptyQueue = function() {
    if (queue.length) {
      currentlyEmptyingQueue = true;
      _.delay(function() {
        // take the next call off the queue straight away, so that emptyQueue()
        // below sees the correct queue length even when the call is deferred
        var next = queue.shift();
        if (async) {
          _.defer(next);
        } else {
          next();
        }
        emptyQueue();
      }, rate);
    } else {
      currentlyEmptyingQueue = false;
    }
  };

  return function() {
    // get arguments into an array
    var args = _.toArray(arguments);
    // bind the current context and arguments so the call can be replayed later
    queue.push(_.bind.apply(_, [func, this].concat(args)));
    if (!currentlyEmptyingQueue) { emptyQueue(); }
  };
};

You can see a working version at http://jsbin.com/upadif/8/edit#preview
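
To make the difference from _.throttle concrete, here is a deterministic model of when queued calls run under _.rateLimit, next to a model that simply drops calls arriving too soon. The drop model is a simplified stand-in for _.throttle (which, depending on the Underscore version, may also fire one trailing call), and both helpers are illustrative only; times are in milliseconds and function run time is ignored.

```javascript
// Queued rate limiting: every call runs, spaced at least `rate` ms apart.
function rateLimitSchedule(arrivals, rate) {
  const runs = [];
  let lastRun = -Infinity;
  for (const t of arrivals) {
    // Each call waits `rate` ms after it arrives or after the previous call
    // ran, whichever is later, mirroring the _.delay in emptyQueue.
    lastRun = Math.max(t, lastRun) + rate;
    runs.push(lastRun);
  }
  return runs;
}

// Destructive rate limiting: calls arriving within `rate` ms are lost.
function dropSchedule(arrivals, rate) {
  const runs = [];
  let lastRun = -Infinity;
  for (const t of arrivals) {
    if (t - lastRun >= rate) { // far enough apart: run immediately
      lastRun = t;
      runs.push(t);
    }                          // otherwise the call is simply dropped
  }
  return runs;
}

console.log(rateLimitSchedule([0, 0, 0, 0], 200)); // [ 200, 400, 600, 800 ]
console.log(dropSchedule([0, 0, 0, 0], 200));      // [ 0 ]
```

Four calls fired at once all run with the queueing approach, one every 200ms, whereas only the first survives when calls are dropped.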