Aggressive Timeouts on External API Calls

Posted by val
on Sunday, March 30

One of the challenges of writing a Facebook or Bebo application is staying within the time limit the platform gives you to respond before it shows the "Application Did Not Respond" page to the user. A content-rich application that calls external APIs such as Amazon or YouTube, whose response times are beyond your control, has to keep those calls short to leave time for its own processing. We usually wrap them in aggressive timeouts with a retry. As an example, here is code from the Ruby Amazon E-Commerce REST Service API gem, rewritten to limit each attempt to a one-second connection timeout and a two-second read timeout, with one retry.

Original Code
module Amazon  
  class Ecs

    def self.send_request(opts)
      request_url = prepare_url(opts)

      res = Net::HTTP.get_response(URI::parse(request_url))
      unless res.kind_of? Net::HTTPSuccess
        raise Amazon::RequestError, "HTTP Response: #{res.code} #{res.message}"
      end
      Response.new(res.body)
    end

  end
end
Modified Code
require 'net/http'
require 'timeout'

module Amazon
  class Ecs

    class EmptyResponse
      def items; []; end
      def total_pages; 0; end
    end

    def self.send_request(opts)
      request_url = prepare_url(opts)

      res = timed_try(request_url, 2) do |url|
        uri = URI::parse(url)
        req = Net::HTTP.new(uri.host, uri.port)

        # Aggressive timeouts (in seconds)
        req.open_timeout = 1
        req.read_timeout = 2

        # request_get expects the path and query, not the full URL
        req.start { |http| http.request_get(uri.request_uri) }
      end

      res.kind_of?(Net::HTTPSuccess) ? Response.new(res.body) : EmptyResponse.new

    end

    # send_request is a class method, so the helper must be one as well;
    # a bare `private` does not apply to methods defined with `def self.`.
    def self.timed_try(url, attempts, &block)
      attempt = 1
      begin
        block.call(url)
      rescue Timeout::Error
        if attempt >= attempts
          RAILS_DEFAULT_LOGGER.warn "[amazon_api] gave up after attempt ##{ attempt } to get data from #{ url }"
          nil
        else
          RAILS_DEFAULT_LOGGER.warn "[amazon_api] attempt ##{ attempt } timed out on getting data from #{ url }"
          attempt += 1
          retry
        end
      end
    end
    private_class_method :timed_try

  end
end
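
To try the retry behaviour without Rails or a live Amazon call, here is a minimal standalone sketch. The block-only `timed_try` signature and the simulated slow call are assumptions made for illustration; they are not part of the gem.

```ruby
require 'timeout'

# Hypothetical helper: run the block up to `attempts` times, retrying only
# on Timeout::Error, and return nil once every attempt has timed out.
def timed_try(attempts)
  attempt = 1
  begin
    yield attempt
  rescue Timeout::Error
    return nil if attempt >= attempts
    attempt += 1
    retry
  end
end

# Stand-in for an external API: the first call is too slow, the second is fast.
calls = 0
result = timed_try(2) do
  calls += 1
  Timeout.timeout(0.1) do
    sleep(0.5) if calls == 1
    :ok
  end
end

# A call that never answers in time: both attempts time out, so we get nil.
gave_up = timed_try(2) { Timeout.timeout(0.05) { sleep(0.5) } }
```

Here `result` holds the value from the second attempt, while `gave_up` is `nil`, which is the case the modified code above maps to `EmptyResponse`.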

Reviewing Application Health with HAProxy Stats

Posted by val
on Thursday, March 27

One of the ways we check the health of our applications is with the stats collected from HAProxy. We use them to see how many requests are waiting for execution on our Mongrel instances. The graph is one indication of how our applications perform. When we launched the new version of the site three weeks ago, the graph for a single vertical (ReadingSocial) on a typical Tuesday looked like this:
So, between porting all verticals to Myspace, Orkut, and Bebo and enhancing the functionality, we spent some time on optimization. In addition to analyzing slow-query logs with mysqlsla, Aaron wrapped all external API calls (and we make a lot of them: to Amazon, Facebook, Myspace, etc.) in slow-call monitoring, so we could see where the latest external bottleneck was and fix them one by one. Three weeks later the graph became much more peaceful:
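
The monitoring wrapper around external calls mentioned above could look roughly like the following. This is only a sketch: the `monitor_slow` name, the threshold, and the log format are assumptions, not the code we actually run.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: time a block and log a warning when it takes longer
# than `threshold` seconds. The block's return value is passed through.
def monitor_slow(label, threshold, logger)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
ensure
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  logger.warn(format('[slow] %s took %.3fs', label, elapsed)) if elapsed > threshold
end

# Usage with an in-memory log so the example needs no files or network.
log_output = StringIO.new
logger = Logger.new(log_output)

slow_value = monitor_slow('amazon.item_search', 0.05, logger) { sleep(0.1); :done }
fast_value = monitor_slow('facebook.profile', 0.05, logger) { :quick }
```

Only the first call crosses the threshold, so only it appears in the log; grepping such entries by label is one way to find which external API is the current bottleneck.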