2011-07-17

Tuning Resque worker memory usage

One of the problems we had to deal with was that we have only one server, with 2GB of memory, and we needed to spawn the largest possible number of workers.
To achieve that, we must avoid loading the full Rails environment in the Resque worker. Start by changing your Rakefile to look like this:


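# Normal rake runs load the full Rails application; Resque workers
# (RESQUE_WORKER=true) get a minimal environment without Rails.
unless ENV['RESQUE_WORKER'] == 'true'
  require File.expand_path('../config/application', __FILE__)
  require 'rake'

  MyApp::Application.load_tasks
else
  RAILS_ENV = ENV['RAILS_ENV'] || 'development'
  ROOT_PATH = File.expand_path('..', __FILE__)
  # Point Bundler at the Gemfile before loading it
  ENV['BUNDLE_GEMFILE'] = File.join(ROOT_PATH, 'Gemfile')
  require 'bundler'
  require 'rake'
  require File.join(ROOT_PATH, 'config/initializers/_config')
  # Require only the gems in the :worker group
  Bundler.require(:worker)
  # load_tasks is not run in this branch, so load the worker tasks explicitly
  # (assumption: nothing else in your setup already loads this file)
  load File.join(ROOT_PATH, 'lib/tasks/resque.rake')
end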
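
To start a worker with this Rakefile, the RESQUE_WORKER variable has to be set in the worker's environment. Here is a minimal sketch of building that command from Ruby. The helper name worker_command and the 'production' default are assumptions made for illustration; QUEUE and the resque:work task come from Resque itself.

```ruby
# Hypothetical helper: builds the environment and command for one worker.
def worker_command(queues, rails_env = 'production')
  env = {
    'RESQUE_WORKER' => 'true',   # selects the Rails-free branch of the Rakefile
    'RAILS_ENV'     => rails_env,
    'QUEUE'         => queues    # standard Resque variable, '*' means all queues
  }
  [env, 'rake', 'resque:work']
end

env, *command = worker_command('*')

# Print the equivalent command line
puts env.map { |key, value| "#{key}=#{value}" }.join(' ') + ' ' + command.join(' ')

# To actually start the worker from the application root:
#   pid = Process.spawn(env, *command)
#   Process.detach(pid)
```

Keeping the environment in a hash makes it easy to spawn several workers with different QUEUE values from the same script.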


In your Gemfile, create a group containing only the gems your worker uses.
Next, change the rake task definitions so they work in this minimal environment. Here is our lib/tasks/resque.rake as an example:

# -*- coding: UTF-8 -*-
require 'yaml'
require 'resque_scheduler'
require 'resque_scheduler/tasks'
require 'resque/tasks'

namespace :resque do
  # resque:work runs resque:setup first, so hook the minimal environment in there
  task :setup => :anemic_environment
end

# Minimal replacement for the Rails :environment task
task :anemic_environment do
  raise "Please set your RESQUE_WORKER variable to true" unless ENV['RESQUE_WORKER'] == "true"

  # Make models, consumers and lib files requirable by bare file name
  $:.unshift File.join(ROOT_PATH, "app/models")
  $:.unshift File.join(ROOT_PATH, "app/consumers")
  $:.unshift File.join(ROOT_PATH, "lib")

  # Rails is not loaded, so connect to the database by hand
  db = YAML.load_file File.join(ROOT_PATH, "config/database.yml")
  ActiveRecord::Base.establish_connection db[RAILS_ENV]

  # Register each file for lazy loading under its camelized name
  Dir[
      File.join(ROOT_PATH, "app/consumers/*.rb"),
      File.join(ROOT_PATH, "app/models/*.rb"),
      File.join(ROOT_PATH, "lib/*.rb")
  ].each do |f|
    name = File.basename(f, ".rb")
    autoload name.camelize, name
  end
end


As you can see in the "anemic_environment" rake task, we add our models, workers (which we call consumers) and the files from the lib directory to the load path. This is only a convenience; you don't need to do it. We also open the database connection manually, since we aren't loading the Rails environment (Rails really saves us a lot of work :P ).
We also used Ruby's autoload (with camelize from activesupport to derive the constant names), so we don't have to require our files manually. This approach is also convenient because the worker won't load a file until it is needed (very useful when memory is an issue but performance is not). If you have performance issues, it's better to require the files before starting the worker (in this "anemic_environment" task, for example).
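
To see the lazy loading in isolation, here is a small self-contained sketch. The Invoice class and the invoice.rb file are made-up names written to a temporary directory, not part of our application.

```ruby
require 'tmpdir'

# Hypothetical file standing in for something like app/models/invoice.rb
dir = Dir.mktmpdir
File.write(File.join(dir, 'invoice.rb'), <<~RUBY)
  class Invoice
    def self.total
      42
    end
  end
RUBY

$LOAD_PATH.unshift(dir)               # same idea as $:.unshift in the rake task
Object.autoload(:Invoice, 'invoice')  # registers the constant, loads nothing yet

loaded_before = $LOADED_FEATURES.any? { |path| path.end_with?('/invoice.rb') }
total         = Invoice.total         # first reference triggers the require
loaded_after  = $LOADED_FEATURES.any? { |path| path.end_with?('/invoice.rb') }

puts "loaded before first use: #{loaded_before}"
puts "loaded after first use:  #{loaded_after}"
puts "Invoice.total: #{total}"
```

Swapping the autoload call for a plain require 'invoice' gives the eager behaviour: the file is loaded up front, which costs memory immediately but avoids the load on first use.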
Doing this, we were able to reduce the memory usage of a single consumer from 90MB to 54MB (40% less memory).
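
If you want to check numbers like these on your own workers, one possible way is to ask ps for the resident set size of the worker process. This is only a sketch under the assumption of a Linux or macOS box with ps available, and not necessarily how the figures in this post were taken; rss_mb and percent_saved are hypothetical helper names.

```ruby
# Resident set size of a process in MB, or nil if it cannot be read.
def rss_mb(pid = Process.pid)
  kilobytes = `ps -o rss= -p #{Integer(pid)}`.to_i  # ps reports RSS in kilobytes
  kilobytes.zero? ? nil : (kilobytes / 1024.0).round(1)
rescue Errno::ENOENT
  nil  # ps is not installed
end

# Percentage of memory saved when going from before_mb to after_mb.
def percent_saved(before_mb, after_mb)
  ((before_mb - after_mb) * 100.0 / before_mb).round
end

puts "this process: #{rss_mb.inspect} MB"
puts "90MB -> 54MB saves #{percent_saved(90, 54)}%"
```

Pass the pid of a running worker to rss_mb to measure it instead of the current process.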
But we can still tune the worker a little more by not requiring the gems we depend on until we need them. This can be done by adding ":require => false" to the gem's entry in the Gemfile:

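# Loaded only by the full Rails application
group :default do
  gem 'rails', '3.0.9'
  gem 'authlogic'
end

# Shared by the Rails application and the workers
group :default, :worker do
  gem 'mysql2', '~> 0.2.7'
  gem 'attr_encrypted'

  gem 'resque'
  gem 'resque-scheduler'
end

# Worker only: Rails is not loaded, so pull in its pieces directly
group :worker do
  gem 'activerecord', :require => 'active_record'
  gem 'activesupport', :require => 'active_support/all'

  # Installed but not loaded at boot; require these where they are used
  gem 'actionmailer', :require => false
  gem 'httpclient', :require => false
  gem 'twitter', :require => false
end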
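
With ":require => false", Bundler no longer loads the gem for you, so the consumer that uses it has to require it itself. Here is a minimal sketch of that pattern. ChecksumConsumer and its queue name are hypothetical, and the standard library's zlib stands in for a gem such as httpclient so the example runs on its own.

```ruby
# Hypothetical consumer; the class, queue and payload are made up.
class ChecksumConsumer
  @queue = :checksums  # Resque reads the queue name from this instance variable

  def self.perform(text)
    # Loaded on the first job only; later calls to require are cheap no-ops.
    # In a real consumer this would be e.g. require 'httpclient'.
    require 'zlib'
    Zlib.crc32(text)
  end
end

puts ChecksumConsumer.perform('hello')
```

A worker that never processes this kind of job never pays the memory cost of the library.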


Finally, we were able to reduce the memory usage to 36MB (60% less than the default setup). That is a really great improvement, and now we can spawn twice the original number of workers for our application.
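
As a rough back-of-the-envelope check, the per-worker figures above translate into worker counts like this. The 512MB reserved for the operating system and other services is an assumption picked for illustration, not a measured value, and max_workers is a hypothetical helper.

```ruby
# How many workers of a given size fit on the box.
# total_mb is the 2GB server; reserved_mb is an assumed headroom.
def max_workers(per_worker_mb, total_mb: 2048, reserved_mb: 512)
  (total_mb - reserved_mb) / per_worker_mb  # integer division rounds down
end

{
  'full Rails environment' => 90,
  'anemic environment'     => 54,
  'lazy gem requires'      => 36
}.each do |setup, megabytes|
  puts "#{setup}: #{megabytes}MB per worker, about #{max_workers(megabytes)} workers"
end
```

The exact counts depend on what else runs on the server, but the ratio between the first and last setup stays above two whatever headroom you pick.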
