Reviewing Application Health with HAProxy Stats
The challenge with hosting multiple Rails-based Facebook applications is that the number of users grows quickly. To address this we have been using EC2 nodes, which we can add or remove as demand changes. The price/performance ratio isn’t quite what we first expected, so we are moving toward a few dedicated boxes instead. Another problem is that we add at least a couple of applications a week, and on each box that hosts them we need to reconfigure monit, haproxy, nginx, logrotate and nagios.
To mitigate both issues on dedicated boxes, we resolved to keep a central configuration definition in svn, with individual box configurations keyed on the local host name. When run on a box, a ruby script regenerates all of the aforementioned configuration files from ERB-processed templates and bounces the services. A sample config looks like:

dedicated-1:
  description: "The dedicated box #1"
  ip: 64.233.167.99
  failover: dedicated-2
  apps:
    bookshelf:
      port: 5000
      instances: 20
      response: Book
    ljconnect:
      port: 6000
      instances: 7
      virtual: ljconnect.hungrymachine.com
      response: Journal
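As a rough sketch of how a script might pick its own box out of such a definition: the inline copy of the YAML and the `box_config` helper below are illustrative assumptions, not the actual generator script, which would read the file checked out from svn.

```ruby
require 'yaml'
require 'socket'

# Hypothetical inline copy of the central definition; in practice this
# would be a file checked out from svn.
CONFIG_YAML = <<~YAML
  dedicated-1:
    description: "The dedicated box #1"
    ip: 64.233.167.99
    failover: dedicated-2
    apps:
      bookshelf:
        port: 5000
        instances: 20
        response: Book
      ljconnect:
        port: 6000
        instances: 7
        virtual: ljconnect.hungrymachine.com
        response: Journal
YAML

# Returns the configuration hash for the given host name (assumed helper).
# Only the short name before the first dot is used as the lookup key.
def box_config(all_boxes, hostname = Socket.gethostname)
  short_name = hostname.split('.').first
  all_boxes.fetch(short_name) do
    raise ArgumentError, "no configuration defined for host #{short_name}"
  end
end

boxes  = YAML.load(CONFIG_YAML)
config = box_config(boxes, 'dedicated-1')

config['apps'].each do |name, app|
  puts "#{name}: #{app['instances']} instances, base port #{app['port']}"
end
```

Failing loudly when the host has no entry seems safer than silently generating empty configuration files and then bouncing the services.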
That definition generates a monit config with 20 instances of the bookshelf application and 7 instances of the ljconnect application, plus all the other configurations (including nagios health checks that expect the response value). This is possible because we adopted a fixed application deployment file structure and port numbering conventions (via offsets) on all servers.
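To illustrate the idea, here is a minimal sketch of expanding one application entry into per-instance monit stanzas with ERB. It assumes that instance N listens on the base port plus N; the template text, the pid file location and the wrapper path are hypothetical placeholders rather than our real templates.

```ruby
require 'erb'

# One monit stanza per mongrel instance. All paths here are hypothetical.
STANZA = ERB.new(<<~'TEMPLATE')
  check process <%= name %>-<%= port %> with pidfile /var/run/mongrel/<%= name %>.<%= port %>.pid
    start program = "/usr/local/bin/mongrel_wrapper start -d -p <%= port %> -P /var/run/mongrel/<%= name %>.<%= port %>.pid"
TEMPLATE

# Assumed offset convention: instance 0 gets the base port, instance 1 the next, and so on.
def instance_ports(base_port, instances)
  (0...instances).map { |offset| base_port + offset }
end

# Renders the stanzas for every instance of one application.
def monit_config(name, app)
  instance_ports(app['port'], app['instances']).map do |port|
    STANZA.result_with_hash(name: name, port: port)
  end.join("\n")
end

puts monit_config('ljconnect', 'port' => 6000, 'instances' => 7)
```

The same `instance_ports` list could feed the haproxy and nginx templates, so that every generated file agrees on which ports belong to which application.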
We found that monit sometimes fails to restart all mongrel instances after a deployment, and some of them end up running with the pid file gone. Since there is no pid file, monit believes the instance is not running, so it tries to start a new one on the same port and, of course, fails. The result is stale mongrel instances running old code. We’re investigating a long-term solution, but in the meantime we have wrapped the mongrel_rails start script with a replacement that finds and kills the stale mongrel instances before starting a new one.
#!/usr/bin/env ruby

class MongrelController
  # Kill any stale instance tied to the pid file, then hand off to the real mongrel_rails.
  def self.run_mongrel(args)
    pid = extract_pid(args)
    kill_stale_process(pid) if pid
    system "/bin/mongrel_rails #{ args.join(' ') }"
  end

  # For "start" commands, returns the pid file path given after -P.
  def self.extract_pid(args)
    (args[0] == 'start') && (i = args.index('-P')) && args[i + 1]
  end

  def self.kill_stale_process(pid)
    mongrel_processes(pid).each { |p| process_running?(p) && Process.kill(9, p) }
  end

  # Process ids of running mongrel_rails instances that were started with this pid file.
  def self.mongrel_processes(pid)
    `ps axww -o 'pid command'`.split(/\n/).inject([]) do |mongrels, process|
      mongrels << process[/^\s*(\d+)/, 1].to_i if process.match(%r{/bin/mongrel_rails\s.*\s-P\s#{ Regexp.escape(pid) }\b})
      mongrels
    end
  end

  # `ps -p` prints a header line plus one process line when the process exists.
  def self.process_running?(pid)
    pid && (`ps -p #{ pid }`.split(/\n/).size == 2)
  end
end

MongrelController.run_mongrel(ARGV)
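The matching step can be tried out without touching real processes. The sketch below applies the same pattern to canned `ps` output; the sample process list, the pid file paths and the `stale_pids` helper name are made up for illustration.

```ruby
# Canned `ps axww -o 'pid command'` output (hypothetical processes and paths).
SAMPLE_PS = <<~PS
  PID COMMAND
  4211 ruby /bin/mongrel_rails start -d -p 5000 -P /var/run/mongrel/bookshelf.5000.pid
  4215 ruby /bin/mongrel_rails start -d -p 5001 -P /var/run/mongrel/bookshelf.5001.pid
  4302 /usr/sbin/nginx -c /etc/nginx/nginx.conf
PS

# Returns the process ids of mongrel_rails instances started with the given pid file.
def stale_pids(ps_output, pid_file)
  pattern = %r{/bin/mongrel_rails\s.*\s-P\s#{Regexp.escape(pid_file)}\b}
  ps_output.lines
           .select { |line| line =~ pattern }
           .map    { |line| line[/^\s*(\d+)/, 1].to_i }
end

p stale_pids(SAMPLE_PS, '/var/run/mongrel/bookshelf.5000.pid')
```

Because the pattern keys on the pid file path rather than the port, an instance keeps being found even after its pid file has been deleted, which is exactly the case monit misses.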
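To have Nagios send notifications over AIM, a small aim_notifier.rb script delivers the message to every available buddy in the bot's Friends group:

#!/usr/bin/env ruby
require 'rubygems'
require 'net/toc'

user = 'your_bot_name'
password = 'bot_password'
# The message arrives with literal "\n" sequences; turn them into real newlines.
msg = ARGV[0].to_s.gsub('\n', "\n")

client = Net::TOC.new(user, password)
client.connect
sleep 3

# Collect the buddies from the "Friends" group and message those who are online.
buddies = []
client.buddy_list.each_group { |g, b| buddies = b if g == 'Friends' }
buddies.each do |b|
  b.send_im(msg) if b.available?
end

sleep 3
client.disconnect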
define command{
command_name notify-service-by-aim
command_line $USER1$/aim_notifier.rb $ARG1$ $ARG2$ "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$"
}

Then add the new command to the contact definition:

service_notification_commands notify-service-by-email,notify-service-by-aim
Repeat the configuration if you want to use the AIM notification for hosts as well.
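One detail worth spelling out is why the notifier calls `gsub('\n', "\n")`. The command line carries the two-character sequence backslash + n rather than real line breaks, so the script has to convert them itself. A small illustration with a made-up, shortened message:

```ruby
# A shortened, made-up message as the script would receive it:
# single quotes keep the backslash-n sequences literal.
raw = '***** Nagios *****\n\nNotification Type: PROBLEM\n\nService: HTTP'

# A single-quoted '\n' is backslash + "n"; a double-quoted "\n" is a real newline.
msg = raw.gsub('\n', "\n")

puts msg
```

Without the conversion the buddy would receive the whole notification as one line with visible `\n` markers in it.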