Partially bypass ActiveRecord instantiation when using Memcached

With the recent scaling woes and Blaine Cook's Scaling Twitter talk, I think slide 23 is a welcome, and bold, statement.

Memcached is generally used as an intermediate cache to lessen database load. It is even more crucial when using an ORM (e.g. ActiveRecord), as bypassing object instantiation gives you additional mileage in the app tier as well.

The context:

Three example models: Account, Customer and Booking (some associations omitted for simplicity).

class Account < ActiveRecord::Base
  acts_as_cached :version => 1, :include => [:omitted], :ttl => 2.hours

  has_many :customers
  has_many :bookings
  has_many :contacts
  has_many :addresses
end

class Customer < ActiveRecord::Base
  acts_as_cached :version => 1, :include => [:address, :contact, :bookings, :account], :ttl => 2.hours 

  belongs_to :account
  has_one :address, :as => :addressable, :dependent => :destroy  
  has_one :contact, :as => :contactable, :dependent => :destroy
end

class Booking < ActiveRecord::Base
  acts_as_cached :version => 1, :include => [:date, :progress, :user, :customer, :events, :booking_extras, :booking_products, :notes, :payment, :account, :coupon], :ttl => 2.hours

  belongs_to :customer
end

The problem

We are caching the Customer model, with its direct and most often used associations (Contact, Address and Bookings) and would like to maintain this integrity, without instantiating additional AR objects and related associations when referencing all Customers or Bookings for a given Account:

  account.customers.to_a #or account.customers(true)
  account.bookings.to_a #or account.bookings(true)

Rails does provide association ID reader methods:

  account.customer_ids # => [1]
  account.booking_ids # => [6, 1, 5, 2, 3]

... which isn't very helpful if we'd like a specific subset of customer or booking identifiers.

The solution

Extend ActiveRecord::Base with a find_ids singleton method that has exactly the same usage as AR::Base.find but never instantiates any objects. We only fetch the IDs from the raw connection:

module ActiveRecord
  class Base
    class << self
      # Same arguments as find, but returns primary keys instead of records.
      def find_ids(*args)
        options = extract_options_from_args!(args)
        logger.debug("Find by ID:" + options.inspect)
        validate_find_options(options)

        case args.first
          when :first then find_initial_id(options)
          when :all   then find_every_id(options)
        end
      end

      def find_by_sql_ids(sql)
        connection.select_all(sanitize_sql(sql), "#{name} Load").collect! { |record| record['id'] }
      end

      private

      def find_initial_id(options)
        options.update(:limit => 1) unless options[:include]
        find_every_id(options).first
      end

      def find_every_id(options)
        if scoped?(:find, :include) || options[:include]
          find_with_associations_ids(options)
        else
          find_by_sql_ids(construct_finder_sql(options))
        end
      end
    end
  end
end

module ActiveRecord
  module Associations
    module ClassMethods
      def find_with_associations_ids(options = {})
        catch :invalid_query do
          join_dependency = JoinDependency.new(self, merge_includes(scope(:find, :include), options[:include]), options[:joins])
          # Run the query once and reuse the rows for both logging and the result.
          rows = select_all_rows(options, join_dependency)
          logger.debug("All rows: " + rows.inspect)
          return rows.collect { |row| row[join_dependency.joins.first.aliased_primary_key] }
        end
        []
      end
    end  
  end
end
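
Stripped of the ActiveRecord internals, the idea is simply to ask the connection for rows and keep only the primary key column. Here is a minimal plain-Ruby sketch of that step. FakeConnection and ids_from are hypothetical stand-ins (not part of Rails), and the rows are made-up sample data:

```ruby
# Hypothetical stand-in for a database connection; not part of Rails.
# select_all returns rows as plain hashes, much like a raw adapter would.
class FakeConnection
  ROWS = [
    { 'id' => '2', 'status' => 'pending' },
    { 'id' => '1', 'status' => 'pending' },
    { 'id' => '3', 'status' => 'confirmed' }
  ].freeze

  # The SQL is ignored here; a real connection would execute it.
  def select_all(_sql)
    ROWS.map(&:dup)
  end
end

# Collect only the primary keys; no model objects are built.
def ids_from(connection, sql, primary_key = 'id')
  connection.select_all(sql).map { |row| row[primary_key] }
end

ids = ids_from(FakeConnection.new, 'SELECT id FROM bookings')
p ids # => ["2", "1", "3"]
```

The IDs come back as strings because nothing casts the raw column values, which matches the string IDs in the usage output further down.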

Usage examples

The following association extension illustrates compatibility with AR::Base.find:

module BookingsExtension
  def upcoming( page = 1) self.find_ids(:all, default_find_options(page)) end

  def recent( page = 1 ) self.find_ids(:all, default_find_options(page).merge!({:order => 'booking_dates.date_from DESC'})) end

  def by_status(status = 'pending', page = 1)
    self.find_ids(:all, default_find_options(page).merge!(:conditions => ['bookings.status = ?', status]))     
  end

  def since(date = Time.now.utc, page = 1)
    self.find_ids(:all, default_find_options(page).merge!(:conditions => ["bookings.status != ? AND booking_dates.date_from >= ?", 'in_progress', date.to_s(:db)]))      
  end  

  def until(date = Time.now.utc, page = 1)
    self.find_ids(:all, default_find_options(page).merge!(:conditions => ["bookings.status != ? AND booking_dates.date_from <= ?", 'in_progress', date.to_s(:db)]))        
  end

  def by_user(user, page = 1)
    self.find_ids(:all, default_find_options(page).merge!(:conditions => ["bookings.status != ? AND bookings.user_id = ?",'in_progress',user.id]))
  end

  def by_reference( reference, page = 1 )
    self.find_ids(:all, default_find_options(page).merge!(:conditions => ["bookings.status != ? AND bookings.reference = ?",'in_progress',reference]))
  end  

  def default_find_options(page)
    { :include => [:customer, :date], :conditions => ['bookings.status != ?', 'in_progress'], :order => 'bookings.created_at DESC', :page => { :size => 10, :current => page, :first => 1 } }
  end
end
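
One detail in the extension above is worth spelling out: Hash#merge! replaces the value of an existing key rather than combining the two, which is presumably why most finders restate the bookings.status != ? condition instead of relying on the default. A minimal sketch, using a trimmed copy of default_find_options (the :include option is left out so the snippet runs without Rails):

```ruby
# Trimmed copy of the extension's defaults, for illustration only.
def default_find_options(page)
  { :conditions => ['bookings.status != ?', 'in_progress'],
    :order => 'bookings.created_at DESC',
    :page => { :size => 10, :current => page, :first => 1 } }
end

# merge! overwrites :conditions wholesale; the default condition is gone.
options = default_find_options(2).merge!(:conditions => ['bookings.status = ?', 'pending'])

p options[:conditions]     # => ["bookings.status = ?", "pending"]
p options[:order]          # => "bookings.created_at DESC"
p options[:page][:current] # => 2
```

Keys that are not overridden, such as :order and :page, survive the merge untouched.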

Standalone examples:

  account.bookings.by_status(:pending).to_a  # => ["2", "1"]

  account.bookings.by_user( User.get_cache(1) ).to_a # => ["2", "1", "3"]

Memcached friendly examples

I use cache_fu to interface with memcache-client. The multi_get_cache extension method is particularly useful here:

  Booking.multi_get_cache( ["2", "1", "3"] ) # => lots of output

In the above example we are attempting to fetch Bookings with IDs 1 to 3 from Memcached. Should the objects already be cached, the only DB overhead is one relatively cheap query, and cache integrity is maintained without duplicating any processing or data.
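
The general pattern behind a multi-get can be sketched in plain Ruby: look up all keys, load only the misses, and return records in the order the IDs were requested. This is not how cache_fu implements it. The fetch_many helper and the "Booking:id" key format are hypothetical, a Hash stands in for Memcached, and the block stands in for the database load:

```ruby
# Hypothetical read-through multi-get; not the cache_fu implementation.
# cache: a Hash standing in for Memcached.
# The block receives the missing ids and returns [id, record] pairs.
def fetch_many(cache, prefix, ids)
  key_for = lambda { |id| "#{prefix}:#{id}" }
  misses = ids.reject { |id| cache.key?(key_for.call(id)) }

  # Load only what was not cached, then store it for next time.
  unless misses.empty?
    yield(misses).each { |id, record| cache[key_for.call(id)] = record }
  end

  # Preserve the order of the requested ids.
  ids.map { |id| cache[key_for.call(id)] }
end

cache = { 'Booking:2' => { :id => '2' }, 'Booking:1' => { :id => '1' } }
loaded = []

bookings = fetch_many(cache, 'Booking', %w[2 1 3]) do |missing_ids|
  loaded.concat(missing_ids)
  missing_ids.map { |id| [id, { :id => id }] }
end

p bookings.map { |b| b[:id] } # => ["2", "1", "3"]
p loaded                      # => ["3"]
```

Only booking 3 reaches the loader block; the other two are served from the cache.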

Conclusion

The above is a slight anti-pattern which, true to the 90/10 principle, would only ever be useful to those users of the framework who run Memcached in their production stack.

It's OK to break free from constraints and from religious DRY development that may shoot you in the foot later, and even to denormalize, as per Blaine's slide number 23, if and when the framework doesn't natively solve the problem at hand (performance? design constraints? production environment?).

