Matthew O'Riordan

This is where I record my rants, comments, quotes and thoughts on things. I welcome your input, so please fire away. Find out more about me at: http://mattheworiordan.com

Capybara Screenshot 1.0.0

Three years ago, as a newbie to Ruby and Rails, I found the acceptance test workflow tedious, specifically when it came to debugging failing full-stack browser tests. Inspired by a gem by Joel Chippindale, I created Capybara Screenshot with the aim of providing automatic screenshots for any testing framework using Capybara.

Today, I am proud to say the Capybara Screenshot gem has finally reached a stable 1.0.0 release, and supports RSpec, Cucumber, Spinach, MiniTest and TestUnit. Impressively, according to RubyGems, it has now had more than 550,000 downloads to date, equivalent to roughly 1% of Rails' downloads.

If those numbers are at all accurate, and this gem is helping even 1% of the Rails community, then I am a happy man for contributing something back.
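If you want to try it out, getting going with RSpec is a one-line require in your spec_helper (from memory; the README covers the equivalent requires for Cucumber, Spinach, MiniTest and TestUnit):

# spec/spec_helper.rb
require 'capybara/rspec'
require 'capybara-screenshot/rspec'  # saves a screenshot and HTML dump automatically when a Capybara spec fails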

View the changelog for the 1.0.0 release.

Testing the locals, why is this not possible in RSpec?

I've been an advocate of using locals over instance variables within Rails controllers for some time because it's more predictable and explicit. I've lost count of the number of times I've been surprised because an instance variable was not set, or worse, had no idea how an instance variable got set because it happens in a mixin or an inherited controller. Read the various articles on the matter for more background.

So I've come up with a solution in the form of a Gist. It's not particularly elegant, and it really shouldn't be monkey patching the Rails controller, but I haven't got time for a PR on the RSpec Rails repo. For now, though, if anyone else wants to stop using instance variables but keeps hitting the wall that (as far as I can tell) you cannot test locals within an RSpec controller spec, here's a workable solution.

Example Controller test using assigns_local:


describe FooController do
  describe :index do
    it 'should render locals' do
      get :index
      expect(response).to render_template('index')
      expect(assigns_local(:foos)).to eq(Foo.all)
    end
  end
end
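For context, the (hypothetical) controller action under test passes its data to the view as explicit locals rather than setting instance variables:

class FooController < ApplicationController
  def index
    # no @foos instance variable; the view receives foos as an explicit local
    render 'index', locals: { foos: Foo.all }
  end
end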

Code needed to make this work:


# Inside your RSpec.configure block: patch the controller before every controller spec
config.before(:each, type: :controller) do
  patch_controller_render_to_support_locals controller
end

# Helper to read back a local captured by the patched render
def assigns_local(key)
  controller.instance_variable_get('@patched_locals')[key]
end

# Redefines render on this controller instance so that any :locals passed to it are captured
def patch_controller_render_to_support_locals(controller)
  def controller.render(options = nil, extra_options = {}, &block)
    [options, extra_options].select { |o| o.kind_of?(Hash) }.each do |option_hash|
      @patched_locals = option_hash[:locals] if option_hash.include?(:locals)
    end
    super(options, extra_options, &block)
  end
end

For the full implementation details, see my assigns_local example gist.

Alfred plugin for Remember the Milk

I have found a few Remember the Milk plugins to date, but I have had very little luck with them because they either did not work, or they did not provide any feedback when a task was added.

All I want from an RTM plugin for the brilliant Alfred is to be able to quickly add an RTM task by pressing alt-Space and typing 'rtm my task text here'. This plugin delivers exactly that: an extremely quick way to add a task with minimal keystrokes.

Dependencies

  • You must have Ruby installed (a system Ruby ships with OS X by default); Rubies installed with rbenv or RVM also work
  • You must install the rumember gem and authenticate with RTM:
    Step 1, install the gem: `gem install rumember`
    Step 2, authenticate with RTM: `ru`

Note: This workflow supports the system-installed Ruby as well as RVM and rbenv installed Rubies. However, please make sure that whichever Ruby you are running has the rumember gem (which provides the `ru` command) installed and authenticated.
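Under the hood the workflow does little more than hand your text to rumember. A rough sketch of the idea (this assumes `ru` accepts the task text as arguments; the real workflow script also deals with locating your rbenv/RVM Ruby):

#!/usr/bin/env ruby
# Illustrative only: pass the text typed after 'rtm' in Alfred straight to the ru command
require 'shellwords'

task = ARGV.join(' ').strip
if task.empty?
  puts 'Usage: rtm <task text>'
else
  puts `ru #{task.shellescape}`  # rumember's ru command sends the task to Remember the Milk
end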

Download the plugin

Running Sidekiq concurrently on a single worker with Heroku

I've just finished reading the article Concurrency on Heroku Cedar, which describes a simple method to run a Resque queue concurrently with your web process on Heroku, i.e. one (free) dyno runs both your website and your Resque queue. Whilst I would not recommend this approach in production, as the dyno can be shut down at any point and you'll probably have a battle for resources, it is a fantastic solution for your staging or testing environments, where it's a real pain to have to pay for an extra dyno just to run your queue.

As we use Sidekiq, I thought I'd quickly share the code I use to run Sidekiq concurrently with Unicorn on a single dyno on the Heroku Cedar stack. Note this applies only to our staging environment, but it's pretty self-explanatory how to add support for additional environments in the code below.

Insert the following into your /config/unicorn.rb file:

run_sidekiq_in_this_thread = %w(staging).include?(ENV['RAILS_ENV'])
worker_processes (run_sidekiq_in_this_thread ? 2 : 3)
# ... whatever else you had in your unicorn.rb file ...
@sidekiq_pid = nil

before_fork do |server, worker|
  # ... whatever you had in your before_fork ...
  if run_sidekiq_in_this_thread
    # spawn Sidekiq once from the Unicorn master, with a concurrency of 2
    @sidekiq_pid ||= spawn("bundle exec sidekiq -c 2")
    Rails.logger.info("Spawned sidekiq #{@sidekiq_pid}")
  end
end

Note that I have set Sidekiq's concurrency to no more than 2, and set Unicorn's worker_processes to 2 when running Sidekiq, giving no more than 4 concurrent processes. From experience, adding more Unicorn workers will probably result in memory failures; adding more Sidekiq concurrency will probably work, but you may run into ActiveRecord connection pool issues unless you have a large enough pool. See this Stack Overflow response on how to adjust your connection pool size with Heroku.
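For reference, the usual way to bump the ActiveRecord pool size on Heroku is an initializer along these lines (a sketch based on Heroku's documented approach; DB_POOL is an environment variable you would set yourself):

# config/initializers/database_connection.rb
# Re-establish the ActiveRecord connection with a larger pool when the dyno boots
Rails.application.config.after_initialize do
  ActiveRecord::Base.connection_pool.disconnect!

  ActiveSupport.on_load(:active_record) do
    config = Rails.application.config.database_configuration[Rails.env]
    config['pool'] = ENV['DB_POOL'] ? ENV['DB_POOL'].to_i : 5
    ActiveRecord::Base.establish_connection(config)
  end
end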

Automatically retrying failing non-deterministic Cucumber feature tests

With my Cucumber test suite unfortunately now taking around 45 minutes to run all the tests for easyBacklog, I find it extremely annoying that sometimes one test out of the hundreds will fail intermittently for no apparent reason. When I rerun the test on the CI server it passes, and if I run it again locally it passes, so I can only put it down to a non-deterministic fault caused by the server resources being momentarily overloaded.

So I now have a solution where all the tests are run, and any failing tests are then run again; this has pretty much eliminated the problems I have had with the CI server to date.

I looked at a solution proposed by Edwin Cruz, but like many others, had problems getting it to work.

Here is my very simple version instead.

Modify your cucumber.yml file to contain the following:

<%
# first_try records any failing scenarios to tmp/cucumber-rerun.txt using the rerun formatter
# second_try reruns only those recorded failures, or a harmless no-op tag if nothing failed
rerun_tests = File.file?('tmp/cucumber-rerun.txt') ? IO.read('tmp/cucumber-rerun.txt') : ""
rerun_opts = if rerun_tests.to_s.strip.empty?
  "--tags @none_so_will_pass --strict"
else
  "--format #{ENV['CUCUMBER_FORMAT'] || 'pretty'} --strict #{rerun_tests}"
end
first_try = "--format rerun --out tmp/cucumber-rerun.txt"
%>
first_try: <%= first_try %> features
second_try: <%= rerun_opts %>

And on your CI server run the following two commands:

cucumber --profile first_try
cucumber --profile second_try

The first run's exit status should be ignored; the second run's exit status indicates whether the suite passed or not, and it outputs only the previously failing tests. Simple!
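If your CI tool only takes a single command, you can wrap the two runs so that only the second run's exit status counts. A minimal sketch as a Rake task (the profile names match the cucumber.yml above; the task name is my own invention):

# lib/tasks/cucumber_retry.rake
namespace :cucumber do
  desc 'Run the whole suite, then rerun any failures; only the rerun decides pass/fail'
  task :with_retry do
    system('bundle exec cucumber --profile first_try')           # failures are recorded, exit status ignored
    exit(1) unless system('bundle exec cucumber --profile second_try')
  end
end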

Another advantage of this approach is that your CI server logs will now only contain (pretty) output for the tests that failed on the first pass; all passing tests are omitted.

jQuery :focus pseudo-selector fails with Selenium, Capybara-webkit or any headless browser

I blogged about this issue a good while ago in my post Testing Focus with JQuery and Selenium or Capybara-Webkit, where I detailed a solution to the problem: using the :focus selector in integration environments generally does not work, because :focus relies on the browser itself having focus as well as the element. Obviously, when spinning up headless browsers or Selenium, the browser does not have focus, so all :focus tests fail.

Unfortunately jQuery 1.8.* broke my previous hack, so I have found a new solution, which I believe is more robust anyway as it digs into Sizzle and adds a custom selector.

The code and tests to replicate the issue can be found in the JQuery Focus Selenium Webkit Fix repository.

Happy days, tests are working again with jQuery 1.8!

Part 2: How elastic are Amazon Elastic Load Balancers (ELB)? Very!

For those of you who have not read part 1 of this post, I recommend you take a look at the original ELB post, where I briefly described my experience testing ELB to see how elastic it truly is before we roll it out into production. We are building a highly concurrent socket and WebSocket based system, and we expect upwards of 10k new SSL connection requests per second (SSL is the important bit). We thought about building our own HAProxy + SSL termination solution, but to be honest the work involved in building an auto-scaling, resilient, multi-region load balancer is no mean feat, and something we'd prefer to avoid.

To understand the significance of the expected volume of 10k SSL requests per second: a single Amazon EC2 small instance can terminate approximately 25 2048-bit SSL connections per second. If we were to terminate these connections ourselves using small instances, we would therefore need at least 400 of them running simultaneously, with an even spread of traffic through a DNS load balancer. Personally, I don't like the idea of managing 400 CPUs (the equivalent of a small instance each) at each of our locations.

Hence we turned to ELB and started running our tests to see whether it would scale as needed and deliver. One thing to bear in mind with ELB is that it does not officially support WebSockets, so in order to use ELB you have to configure it to balance TCP sockets on ports 80 and 443, which works fine except that the client's source IP is not available to your servers. I am told Amazon are working on a solution, but at present none exists, and it's not such a big problem for us anyway.

With the help of Spencer at Amazon, who was incredibly helpful throughout the entire process (which lasted more than 2 weeks), we managed to get to a point where we knew the tests were accurate, any bottlenecks we found were indeed in ELB and not caused by other factors, and we could replicate our tests easily. Spencer set up CloudFormation templates (awesome stuff) so that we could spin up the servers and the load testing clients in the same configuration each time, and I worked on building a simple WebSocket server, a WebSocket testing client, and a controller app (based on Bees With Machine Guns) so that we could spin up any number of load testing clients and start a progressive, realistic attack on the load balancers. On GitHub you can find my core repositories for the WebSocket load testing client and server, and the modified Bees With Machine Guns that does not rely on SSH connections, as I found these to be unreliable when you spin up more than 10 load testing machines (we went up to 80 c1.medium instances).
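To give a feel for the server side, a minimal WebSocket echo server looks something like the following (an illustrative sketch using the em-websocket gem, not the actual code from the repositories above; SSL is assumed to be terminated at the ELB, so the instance listens over plain ws://):

require 'em-websocket'

# Listens behind the ELB listener; each new connection is what the load test is counting
EM.run do
  EM::WebSocket.run(host: '0.0.0.0', port: 8080) do |ws|
    ws.onopen    { ws.send('connected') }
    ws.onmessage { |msg| ws.send(msg) }  # echo the payload straight back
    ws.onclose   { }
  end
end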

In summary, the results were very encouraging, but certainly not perfect. In my original post I alleged that ELB was flatlining at around 2k requests per second; unfortunately this was wrong, and was down to the way my testing clients were designed. Once I made the controller more reliable, made the load testing client not wait for connections to close, and replicated more realistic traffic growth, I easily got to 10k requests per second running the tests myself in my EC2 cloud, and so did Spencer (Amazon) with his tests using CloudFormation and the libraries described above. What we found is that ELB does seem to have unlimited scaling capability; however, it doesn't necessarily scale as fast as you want it to all the time. If you look at the graphs below, you will see plenty of dips where we would expect ELB to scale out horizontally to cope with more demand, yet it doesn't until some time passes. Eventually ELB does catch up, but it's worth bearing in mind that if you get spikes in traffic, latency will increase and potentially some connections may drop while ELB plays catch-up.

My tests using ELB and EC2 showing expected versus actual in terms of new SSL connections per second

Amazon’s (Spencer) test on ELB where we expect a smooth bicubic curve

So to summarise:

  • ELB is pretty damn awesome and scales incredibly (pretty much limitlessly it seems)
  • ELB has its problems and is not perfect. It's designed to scale as necessary, but your traffic growth may outpace ELB's natural scale-out algorithm, meaning you will experience increased latency or dropped connections for short periods.
  • We’re going to be using ELB as we believe it’s the best solution out there by a long stretch (we considered hardware load balancing & SSL termination, we considered Rackspace, GoGrid, and a bunch of others).
  • Amazon have been amazingly helpful throughout this exercise, way beyond what I could have expected.

How elastic are Amazon Elastic Load Balancers (ELB)? Not very it seems

Update: Since I made this post, Amazon have in fact been in touch and have been extremely helpful looking into this issue and running numerous tests to ensure ELB is performing as it should.  In short, many of the tests in this post are wrong.

Please go to Part 2: How elastic are Amazon Elastic Load Balancers (ELB)? Very.

Recently I've been working on a project that will feasibly need to respond to 10,000+ SSL WebSocket connection requests per second, and I've been racking my brain trying to figure out what type of infrastructure I would need to cope with that.

I did some initial tests and determined that a single CPU instance on EC2 (with 1 EC2 compute unit) can cope with a measly 25 SSL handshakes per second. Based on my 10k target, that means I would need 400 CPU instances on EC2 just to deal with handshakes. I soon learnt that the certificate key size is paramount, and dropping from the normal 2048-bit key down to a 1024-bit key improves performance five-fold, so I would only need 80 CPU instances. That is still way too many instances and, to be honest, not a piece of architecture I particularly want to look after either, as I would need to build a system that can auto-scale on demand.

So I started experimenting with Amazon's Elastic Load Balancers on the premise that they are elastic. I ran some quick Apache Bench tests against an ELB and, unsurprisingly, got 25 SSL handshakes per second of throughput with a 2048-bit key. From that I deduced that an ELB, when it starts out, is simply a single CPU instance on EC2. I then tried to run Apache Bench for 30 minutes to see how ELB scales, but quickly discovered that that type of test is quite contrived, because ELB scales out dynamically using DNS. A single load test client will therefore only ever hit one ELB node, and as such ELB may not scale out as only one IP is hitting it; even if it did, the client would never benefit from that scaling.

So I went ahead and built a load testing farm of daemons that distribute their "attack" on the ELB from lots of different IP addresses (I spun up 40+ micro instances on EC2), and that keep querying DNS to see when ELB scales out, distributing traffic to the new IP addresses that ELB returns for its public DNS name.
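The DNS-watching part of that is simple enough to sketch with Ruby's standard library (the hostname below is hypothetical):

require 'resolv'

# Poll the ELB's public DNS name and print the set of IPs it currently resolves to,
# so you can see new ELB nodes appear as it scales out
elb_host = 'my-elb-1234567890.eu-west-1.elb.amazonaws.com' # hypothetical

loop do
  ips = Resolv::DNS.new.getaddresses(elb_host).map(&:to_s).sort
  puts "#{Time.now.utc} #{elb_host} -> #{ips.join(', ')}"
  sleep 60
end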

And my findings were interesting, and disappointing. It seems that ELB does in fact scale, but I am hitting a hard limit of some kind. Perhaps Amazon have set a hard cap, perhaps I need to have something enabled on my account, or perhaps ELB is just not as elastic as it's claimed to be. I'll only find out if Amazon get in touch with me to explain ;)

Here is a graph showing the attempted requests per second from my load testing daemons, which scaled up from 50 requests per second in the first minute to 6,000 requests per second by the 120th minute. As you can see, other than a few small upward blips, ELB pretty much flattens out at around 2,000 requests per second, with a 1024-bit key and the ELB spread across 3 availability zones (and thus 3 instances).

If anyone has any suggestions or feedback on how to make ELB scale as it should, please do get in touch. I've posted my findings in the Amazon forums here and here, but I have little hope that Amazon themselves will respond (see below; Amazon did get in touch and have been brilliant!).

In the meantime, it looks like I'll be finding another solution.

Update: Since I made this post, Amazon have in fact been in touch and have been extremely helpful looking into this issue and running numerous tests to ensure ELB is performing as it should.  In short, many of the tests in this post are wrong.

Please go to Part 2: How elastic are Amazon Elastic Load Balancers (ELB)? Very.

MaxCDN versus Amazon Cloudfront

Following on from my bad experience with MaxCDN in November, I thought it would be worth a follow-up post for those who are interested in how MaxCDN stacks up against Amazon CloudFront. All my experience is based on production issues I have had with easyBacklog, a simple backlog management tool designed for agencies.

On a positive note, after the outage in November and my numerous support requests, MaxCDN got in touch, apologised for not responding to and resolving the issue in a sensible amount of time, and in fact offered me a credit note for all the money I had spent with them to date. +1 for them caring and doing something about it.

However, things did not get better from that point forward. As a result of the CDN having gone down and taken most of my site with it, I decided I should add monitoring to the CDN. I must admit I never thought one would need to monitor a CDN, which is designed to achieve five nines (99.999%) uptime, but clearly I was not willing to take that chance again.

So I set up two kinds of monitors:

  • Monitors from 3 global locations against cacheable assets, i.e. assets the CDN should serve from its cache without contacting my origin server at all.
  • Monitors from 3 global locations against non-cacheable assets, i.e. assets the CDN has to fetch from the origin for every HTTP request.

The reason I took this approach is that whilst it’s great that a CDN may be delivering cached assets efficiently, I also want to be sure that each time I do a deploy, it will load up the new assets efficiently as well.
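To make that concrete, each monitor is essentially a timed GET with a hard 10-second cut-off, one against an asset the CDN should have cached and one it always has to fetch from the origin. A rough sketch of the equivalent check (hypothetical URLs; my actual monitors run from an external monitoring service):

require 'net/http'
require 'uri'
require 'benchmark'

# One asset the CDN should serve from cache, one it must pull from the origin every time (hypothetical URLs)
urls = %w(
  https://cdn.easybacklog.example/assets/application.css
  https://cdn.easybacklog.example/assets/application.css?nocache=12345
)

urls.each do |url|
  uri = URI.parse(url)
  elapsed = Benchmark.realtime do
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
                    open_timeout: 10, read_timeout: 10) do |http|
      http.get(uri.request_uri)
    end
  end
  puts format('%s responded in %.0fms', url, elapsed * 1000)
end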

And this is where it all went wrong with MaxCDN; see the graph I sent to their support team below:

Each of those red dots means their CDN server did not respond within 10s. Yes, 10 whole seconds. This is a CDN, for god's sake; what is going on here? I ran simultaneous tests directly against the origin server for the same URL and the response time was never more than 750ms, meaning that for newly deployed assets the CDN can simply time out and never serve them.

I spoke with MaxCDN numerous times about this issue, and whilst they were doing their best to help, they never managed to resolve it. The only thing they did at one point was to move me onto a new type of infrastructure that did not support querystring cache invalidation, and in the process they took my site down for another hour. Thanks MaxCDN.

So here is why I thought I should write this blog post. Previously I was using Jammit and MaxCDN with Rails 3.0, which appends a querystring to asset URLs to ensure stale versions are not cached. Unfortunately Amazon CloudFront ignores the querystring, so assets were never refreshed; hence I had to use MaxCDN, as they do support querystrings on assets. So I bit the bullet, upgraded to Rails 3.2, removed Jammit and moved to the asset pipeline (Sprockets). All assets are now precompiled with fingerprints in the URL, such as /assets/my-image-au32h121wehqweq.png, meaning I can use CloudFront.
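For anyone making the same move, the Rails side of the switch is just pointing asset_host at the CloudFront distribution (a sketch; the distribution hostname below is hypothetical):

# config/environments/production.rb
config.action_controller.asset_host = 'd1234abcdefgh.cloudfront.net' # hypothetical CloudFront distribution
config.assets.digest = true # fingerprint filenames, e.g. /assets/my-image-au32h121wehqweq.png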

So I changed over to Amazon CloudFront yesterday and here is how it stacks up:

CloudFront is bloody amazing: not a single spike above a few hundred milliseconds for non-cacheable assets, and regardless of location it's working brilliantly and is super fast.

So in the debate of MaxCDN vs Amazon CloudFront, here is my take:

MaxCDN

Pros

  • Simple to set up and priced right
  • Allows the use of custom aliases such as easybacklog.netdna-ssl.com
  • Supports SSL
  • Feature rich, supports querystring invalidation amongst other things
  • Nice admin interfaces

Cons

  • Unreliable; I have experienced a number of outages
  • Support is not great when needed
  • Performance is sub-par

Amazon Cloudfront

Pros

  • Simple to set up and priced right
  • Supports SSL
  • Unbelievably reliable
  • Performance is incredible

Cons

  • Ignores querystrings on assets, so you need fingerprinted asset URLs (hence my move to the Rails asset pipeline)

Get in touch if your experience differs from mine. So far, I am very pleased with CloudFront.

MaxCDN review - seriously unreliable

* Update 22 Feb 2012: Read my follow on post MaxCDN versus Amazon Cloudfront * 

I know this post will probably come across as a rant, but it's not. I have recently launched my business (Easybacklog - http://easybacklog.com) into public beta and every customer and client I attract is valuable to me. So when my CDN goes down, and my site is effectively unusable because no stylesheets, JavaScript or images load, I get upset because I might lose customers.

To give you some background, I recently looked around for a CDN to help speed up the loading of my site's assets. I looked into numerous options and found MaxCDN to be my number one choice (with Amazon CloudFront a close second). And to be fair, setting up was reasonably painless (other than issues with payment that left my account locked), and I was up and running in no time at all.

And then yesterday it all went wrong, and it stayed that way. At around 11am UK time I found that the site's asset hosting was no longer working. Firstly, the assets were not being served, and secondly, the SSL certificate being served was the wrong one, meaning there was zero chance of my assets loading. Mistakes happen, and I can handle that.

What frustrates me, and has motivated me to write this post to warn others about MaxCDN's terrible customer service, is that I could not get hold of anyone at MaxCDN. On their home page, they state 24/7/365 support (screenshotted below):

So, as my site was down and this was pretty critical, I endeavoured to get in touch with MaxCDN by:

  • Calling them (no answer; I got a voicemail which disconnected and wouldn't let me leave a message)
  • Emailing them (no response)
  • Tweeting them (no response)
  • Using their contact form (no response)
  • Using their live chat feature (no response)

Eventually, 10 hours later, I managed to get hold of someone using their live chat feature (still no response to any of my emails), and that was the icing on the cake.  He told me two things in our live chat:

  • Sorry, but it’s the weekend, so we don’t have any engineers available.  Someone will get back to you on Monday.
  • Sorry, but MaxCDN is lower priority than NetDNA (I assume their parent / provider), so we'll get to your issue when we can.

So I warn you all: if you want a CDN that is reliable, do not even dream of using MaxCDN. I am absolutely flabbergasted that a company touting itself as a CDN would find it acceptable not to fix an issue until the following day. (They have now fixed my issue, 20 hours later, and without an apology.)

So who would I recommend as an alternative?
Well, I'm looking into that now and hope to post an update here once I have had some experience using the competitors. The ones I am trying support pay-per-usage models (i.e. no fixed subscription) and are as follows:

Do tell me if you also experience bad support with MaxCDN so that hopefully others considering MaxCDN will not fall into the same trap that I did.

* Update 22 Feb 2012: Read my follow on post MaxCDN versus Amazon Cloudfront *