Reviewing Application Health with HAProxy Stats

Posted by val
on Thursday, March 27
One of the ways we check the health of our applications is with stats collected from HAProxy, which show how many requests are queued for execution on the mongrel instances. The resulting graph is one indication of how our applications perform. When we launched the new version of the site three weeks ago, the graph for a single vertical (ReadingSocial) on a typical Tuesday looked like this:
So, between porting all the verticals to Myspace, Orkut and Bebo and enhancing their functionality, we spent some time on optimization. In addition to analyzing slow-query logs with mysqlsla, Aaron wrapped all external API calls (and we make a lot of them: to Amazon, Facebook, Myspace, etc.) in slow-call monitoring, so we could see where the latest external bottleneck was and fix them one by one. Three weeks later the graph became much more peaceful:

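The post doesn't say how the queue numbers were pulled out of HAProxy. As a minimal sketch, assuming you fetch HAProxy's stats in CSV form (for example by appending ;csv to the stats page URI), the per-server queue lengths (the qcur column) for one backend could be extracted like this. The function name queued_requests is hypothetical, and backend and server names depend entirely on your own haproxy config.

```ruby
# Returns { server_name => current queue length } for one backend, parsed
# from HAProxy's CSV stats. Aggregate FRONTEND/BACKEND rows are skipped,
# so only individual servers (e.g. mongrel instances) are reported.
def queued_requests(csv_text, backend)
  lines = csv_text.lines.map(&:chomp).reject(&:empty?)
  # The header line starts with "# ", e.g. "# pxname,svname,qcur,qmax,...".
  header = lines.shift.sub(/\A#\s*/, '').split(',', -1)
  px, sv, qcur = %w[pxname svname qcur].map { |name| header.index(name) }

  lines.each_with_object({}) do |line, queues|
    fields = line.split(',', -1)
    next unless fields[px] == backend
    next if %w[FRONTEND BACKEND].include?(fields[sv])
    queues[fields[sv]] = fields[qcur].to_i
  end
end
```

Summing the values gives a single "requests waiting" number per backend per sample, which is the kind of series that can be graphed over a day.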
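Aaron's wrapper itself isn't shown in the post. As a hypothetical sketch of the idea, assuming a modern Ruby, a slow-call monitor can simply time a block and report it when it runs past a threshold. The module name SlowCallMonitor, the one-second default threshold and the message format are illustrative assumptions, not our production values.

```ruby
# Hypothetical slow-call monitor: times a block and writes a warning when
# it takes longer than the threshold. The block's return value is passed
# through unchanged, and slow calls are reported even if the block raises.
module SlowCallMonitor
  DEFAULT_THRESHOLD = 1.0 # seconds

  class << self
    # Where warnings go; any object with #puts (defaults to $stderr).
    attr_writer :output

    def output
      @output ||= $stderr
    end

    def watch(label, threshold: DEFAULT_THRESHOLD)
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      yield
    ensure
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      if elapsed > threshold
        output.puts(format('SLOW %s took %.2fs (threshold %.2fs)', label, elapsed, threshold))
      end
    end
  end
end
```

Each external call is then wrapped in place, e.g. SlowCallMonitor.watch('facebook.friends') { fetch_friends(uid) }, where the label and fetch_friends are placeholders. Grepping the output for SLOW shows which service is currently the bottleneck.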
Reconfiguring the whole Rails stack via a central YAML file

Posted by val
on Sunday, August 19

One challenge with hosting multiple Rails-based Facebook applications is that the number of users grows quickly. To address this we have been using EC2 nodes that we can add or remove as demand changes. The price/performance ratio isn’t quite what we first expected, though, so we are moving toward a few dedicated boxes instead. Another problem is that we add at least a couple of applications a week, and on each box that hosts them we need to reconfigure monit, haproxy, nginx, logrotate and nagios.

To deal with both issues on the dedicated boxes, we decided to keep a central configuration definition in svn, with the individual box configurations keyed on the local hostname. When run on a box, a Ruby script regenerates all of the configuration files mentioned above from ERB templates and then bounces the services. A sample config looks like this:
dedicated-1:
    description: "The dedicated box #1"
    ip: 64.233.167.99
    failover: dedicated-2

    apps:
        bookshelf:
            port: 5000
            instances: 20
            response: Book

        ljconnect:
            port: 6000
            instances: 7
            virtual: ljconnect.hungrymachine.com
            response: Journal

That definition would generate a monit config with 20 instances of the bookshelf application and 7 instances of the ljconnect application, plus all the other configurations (including nagios health checks that expect the response value). This works only because we follow a fixed application deployment file structure and port-numbering conventions (via offsets) on all servers.
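The regeneration script itself isn't included here. As a minimal sketch of the idea, assuming a modern Ruby with the bundled YAML (Psych) and ERB libraries, this shows how one box's entry could become a monit config: the entry is selected by hostname, and each app gets one check per instance, with ports assigned by offset from its base port. The template, the /var/www/<app>/... paths and the mongrel_rails flags used in it are hypothetical placeholders; the real script also writes the haproxy, nginx, logrotate and nagios files and bounces the services, which this sketch leaves out.

```ruby
require 'yaml'
require 'erb'
require 'socket'

# Hypothetical monit template: one "check process" per mongrel instance,
# port = base port + offset. All paths here are placeholders.
MONIT_TEMPLATE = <<~'ERB'
  <%- apps.each do |name, app| -%>
  <%- app['instances'].times do |offset| -%>
  <%- port = app['port'] + offset -%>
  <%- pid_file = "/var/www/#{name}/shared/pids/mongrel.#{port}.pid" -%>
  check process <%= name %>_<%= port %> with pidfile <%= pid_file %>
    start program = "/usr/bin/mongrel_rails start -d -e production -c /var/www/<%= name %>/current -p <%= port %> -P <%= pid_file %>"
    stop program = "/usr/bin/mongrel_rails stop -P <%= pid_file %>"
  <%- end -%>
  <%- end -%>
ERB

# Render the monit config for the box keyed by hostname (this machine by default).
def render_monit(config_yaml, hostname = Socket.gethostname)
  box = YAML.safe_load(config_yaml).fetch(hostname)
  ERB.new(MONIT_TEMPLATE, trim_mode: '-').result_with_hash(apps: box.fetch('apps'))
end
```

On a box, something like File.write(monit_path, render_monit(File.read(config_path))) would then regenerate the file (both paths hypothetical). Because every port is derived from the base port plus an offset, the same loop can drive the haproxy server list and the nagios checks without any per-instance entries in the YAML.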

Killing sneaky mongrels

Posted by val
on Thursday, August 16

We found that monit sometimes fails to restart all mongrel instances after a deployment, and some of them end up running with their pid file gone. Since there is no pid file, monit believes the instance is not running, so it tries to start a new one on the same port and, of course, fails. The result is stale mongrel instances still serving the old code. We’re investigating a long-term solution, but in the meantime we have wrapped the mongrel_rails start script with a replacement that finds and kills the stale mongrel instances before starting a new one.

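#!/usr/bin/env ruby
# Replacement start script for mongrel_rails: on "start", find and kill any
# stale mongrel still running with the same -P (pid file) argument, even if
# the pid file itself is gone, then hand off to the real script.

class MongrelController

  MONGREL_RAILS = '/bin/mongrel_rails'

  def self.run_mongrel(args)
    pid_file = extract_pid_file(args)
    kill_stale_processes(pid_file) if pid_file
    # Array form bypasses the shell, so arguments are passed through verbatim.
    system(MONGREL_RAILS, *args)
  end

  # The pid file path given with -P to a "start" command, or nil.
  def self.extract_pid_file(args)
    return nil unless args[0] == 'start'
    i = args.index('-P')
    i && args[i + 1]
  end

  def self.kill_stale_processes(pid_file)
    mongrel_pids(pid_file).each do |pid|
      begin
        Process.kill('KILL', pid)
      rescue Errno::ESRCH
        # Already exited between the ps listing and the kill.
      end
    end
  end

  # PIDs of mongrel_rails processes whose command line contains
  # "-P <pid_file>", skipping this wrapper's own process.
  def self.mongrel_pids(pid_file)
    pattern = %r{/bin/mongrel_rails\s.*\s-P\s#{Regexp.escape(pid_file)}(?=\s|\z)}
    `ps axww -o 'pid command'`.split("\n").each_with_object([]) do |line, pids|
      next unless line =~ pattern
      pid = line[/\A\s*(\d+)/, 1].to_i
      pids << pid unless pid == Process.pid
    end
  end

end

# Exit with the real script's status so monit can see failures.
exit(MongrelController.run_mongrel(ARGV) ? 0 : 1)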

Using a Ruby-based AIM notifier in Nagios

Posted by val
on Tuesday, August 14
If you use nagios to monitor your Rails instances, you might want notifications delivered not only by email or SMS but also to your AIM account when you are online. The script (libexec/aim_notifier.rb) uses the Net::TOC gem to send them:
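#!/usr/bin/env ruby
# libexec/aim_notifier.rb: send a Nagios notification over AIM via Net::TOC.

require 'rubygems'
require 'net/toc'

user = 'your_bot_name'
password = 'bot_password'

# The message is the last argument (so empty or non-empty $ARG1$/$ARG2$ do not
# shift it). Nagios passes literal "\n" sequences; turn them into real newlines.
msg = ARGV.last.to_s.gsub('\n', "\n")

client = Net::TOC.new(user, password)
client.connect

# Wait briefly so the buddy list and presence info can arrive after connecting.
sleep 3

# Notify every buddy in the bot's "Friends" group who is currently online.
buddies = []
client.buddy_list.each_group { |group, members| buddies = members if group == 'Friends' }
buddies.each { |buddy| buddy.send_im(msg) if buddy.available? }

# Give the messages time to go out before disconnecting.
sleep 3
client.disconnect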
You need to add every account you want notified to the bot’s Friends group (either by logging in to AIM with the bot account or by using Net::TOC’s ability to add buddies). The last piece is to define a new notification command in etc/objects/commands.cfg:
define command{
        command_name    notify-service-by-aim
        command_line    $USER1$/aim_notifier.rb $ARG1$ $ARG2$ "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$"
        }
Then append it to the list of notification commands defined for a contact template in etc/objects/commands.cfg:
        service_notification_commands   notify-service-by-email,notify-service-by-aim

Repeat the configuration (a host version of the command, added to host_notification_commands) if you want AIM notifications for hosts as well.