#rails

New in ActiveRecord: #in_batches

Kashyap

Rails recently introduced a new method, #in_batches, similar to the existing #find_in_batches method; the difference is the class of the object yielded to the block. With #find_in_batches the yielded collection is an Array, whereas with #in_batches it is an ActiveRecord::Relation. This post describes a use case we found for the new method.

In one of our client projects, there is code that looks like:

def fetch_payments_for(merchants)
  PaymentFetcher.where(merchant_id: merchants.map(&:id))
end

Mapping over a collection of ActiveRecord objects just to collect their ids is wasteful: the relation has to load every column of every matching row and instantiate a model object for each one before #map can read a single attribute. A detailed description of N+1 queries and how to avoid them is provided in this Rails guides article. To summarize that section, we should prefer #pluck, which works on any ActiveRecord collection object and selects only the requested columns, to #map.

I was looking to refactor the codebase to use the #pluck method, but it was not possible here because the argument is not always an ActiveRecord collection object. This method is also used by a report generation method, which in turn uses #find_in_batches to fetch a batch of merchant objects and operate on them, in order to reduce memory usage. The #find_in_batches method takes a block and yields an Array containing the batch of merchants. Something like this:

def generate_csv
  Merchant.find_in_batches do |merchants|
    payments = fetch_payments_for(merchants)
 
    # do some report generation
  end
end
 
# elsewhere in the controller
 
def show
  @merchants = Merchant.where(id: params[:id])
  @payments  = PaymentFetcher.fetch_payments_for(@merchants)
end

So using #pluck would fail with a NoMethodError when the app has to generate a report (at least as of Rails 4.1.x). To make the method work in both cases, the original author used #map, since it works on both Arrays and ActiveRecord collection objects. But this penalizes the case where the user loads the page in the browser, because the show action falls back to #map even though its argument is a relation that supports #pluck. And this is precisely where the new #in_batches method helps: each batch it yields is a relation, so #pluck works in both code paths. The generate_csv method can then be rewritten as:

def generate_csv
  Merchant.in_batches do |merchants|
    payments = fetch_payments_for(merchants)
 
    # do some report generation
  end
end
 
private
 
def fetch_payments_for(merchants)
  PaymentFetcher.where(merchant_id: merchants.pluck(:id))
end

Now for the bad news: the #in_batches method is only available in Rails master (that would mean version 5, as of today). As per Rails' maintenance policy, new features are not backported to older versions. This leaves us with some options:

Polymorphic style conditional

In the fetch_payments_for method, we check the class of the merchants object, and switch between #map and #pluck accordingly.

def fetch_payments_for(merchants)
  merchant_ids =
    if merchants.is_a?(Array)
      merchants.map(&:id)
    else
      merchants.pluck(:id)
    end
  PaymentFetcher.where(merchant_id: merchant_ids)
end

This works, but it is pretty ugly.
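A duck-typed variant of the same conditional asks whether the collection responds to #pluck instead of checking its class. The sketch below assumes plain Arrays do not respond to #pluck (true without ActiveSupport's Rails 5 core extension). The names merchant_ids_for, StubMerchant and StubRelation are hypothetical stand-ins so the example runs without Rails.

```ruby
# Hypothetical helper: prefer #pluck when the collection offers it.
def merchant_ids_for(merchants)
  if merchants.respond_to?(:pluck)
    merchants.pluck(:id) # relation-like object: ask it for the ids directly
  else
    merchants.map(&:id)  # plain Array of already-loaded records
  end
end

# Stand-ins (not Rails classes) that mimic just enough behaviour for the demo.
StubMerchant = Struct.new(:id)

class StubRelation
  def initialize(records)
    @records = records
  end

  def pluck(column)
    @records.map { |record| record.public_send(column) }
  end
end

records = [StubMerchant.new(1), StubMerchant.new(2)]
merchant_ids_for(records)                   # => [1, 2]
merchant_ids_for(StubRelation.new(records)) # => [1, 2]
```

It is still a conditional, so it is no prettier than the version above, but it does not hard-code Array as the only non-relation input.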

Monkeypatching Array

The other, equally ugly/hacky, way would be to monkeypatch the Array class and define a #pluck method:

def fetch_payments_for(merchants)
  PaymentFetcher.where(merchant_id: merchants.pluck(:id))
end
 
 
class Array
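  # The argument is named columns because this method is expected to be
  # used only on an Array of ActiveRecord items.
  def pluck(*columns)
    # Like ActiveRecord's #pluck, return a flat Array for a single column
    # and an Array of Arrays for several columns.
    if columns.one?
      map { |item| item.public_send(columns.first) }
    else
      map { |item| columns.map { |column| item.public_send(column) } }
    end
  end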
end

IMO, this implementation is incredibly hacky because the definition of a method on a general-purpose type (Array) is now coupled to the structure of ActiveRecord objects. Although duck typing suggests that this is the way, I tend to think twice before polluting a language-level class. That said, as of Rails 5.x, #pluck is defined on the Enumerable module via ActiveSupport core extensions, so, technically, we could use this.
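For comparison, the same behaviour can live in a small helper module that leaves Array untouched. This is only a sketch: Plucker and PluckItem are hypothetical names, not part of Rails or of the client codebase. It follows the return shape of ActiveRecord's #pluck, a flat Array for one column and nested Arrays for several.

```ruby
# Hypothetical helper module; nothing here reopens Array or Enumerable.
module Plucker
  module_function

  # Returns a flat Array for one column, an Array of Arrays for several.
  def pluck(items, *columns)
    if columns.one?
      items.map { |item| item.public_send(columns.first) }
    else
      items.map { |item| columns.map { |column| item.public_send(column) } }
    end
  end
end

# Stand-in for a loaded ActiveRecord object.
PluckItem = Struct.new(:id, :name)
items = [PluckItem.new(1, "a"), PluckItem.new(2, "b")]

Plucker.pluck(items, :id)        # => [1, 2]
Plucker.pluck(items, :id, :name) # => [[1, "a"], [2, "b"]]
```

The trade-off is that callers must write Plucker.pluck(merchants, :id) rather than merchants.pluck(:id), so the call site still differs between Arrays and relations.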

Monkeypatching ActiveRecord::Batches

The other option would be to backport the #in_batches method to the Rails version we use. This would be a bit more complicated than monkeypatching the Array class, but the change is restricted to the ActiveRecord::Batches module. This seemed like a nicer change, so we went with it for our case.
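To give a feel for what such a backport has to do, here is a heavily simplified sketch of the core idea: walk the table in primary-key order and yield each batch as a relation instead of an Array. It is not the Rails implementation, which handles more options and edge cases. So that it runs without a database, it is written against FakeRelation, a hypothetical in-memory stand-in that supports only where(id: ...), limit and pluck. It also assumes integer ids.

```ruby
# Hypothetical in-memory stand-in for a relation; NOT an ActiveRecord class.
Row = Struct.new(:id, :name)

class FakeRelation
  include Enumerable

  def initialize(rows, limit = nil)
    @rows = rows.sort_by(&:id) # always ordered by primary key
    @limit = limit
  end

  # Supports only `where(id: array_or_range)`.
  def where(id:)
    matcher = id.respond_to?(:cover?) ? id.method(:cover?) : id.method(:include?)
    FakeRelation.new(@rows.select { |row| matcher.call(row.id) }, @limit)
  end

  def limit(count)
    FakeRelation.new(@rows, count)
  end

  def pluck(column)
    to_a.map { |row| row.public_send(column) }
  end

  def each(&block)
    rows = @limit ? @rows.first(@limit) : @rows
    rows.each(&block)
  end
end

# Simplified sketch of the batching idea (assumes integer primary keys).
module InBatchesSketch
  def in_batches(of: 1000)
    ids = limit(of).pluck(:id)
    until ids.empty?
      yield where(id: ids) # the batch is a relation, so #pluck works on it
      break if ids.size < of
      ids = where(id: (ids.last + 1)..Float::INFINITY).limit(of).pluck(:id)
    end
  end
end

FakeRelation.include(InBatchesSketch)

merchants = FakeRelation.new((1..5).map { |i| Row.new(i, "merchant-#{i}") })
batches = []
merchants.in_batches(of: 2) { |batch| batches << batch.pluck(:id) }
batches # => [[1, 2], [3, 4], [5]]
```

Because each yielded batch responds to #pluck, a fetch_payments_for written with #pluck can serve both the report and the controller.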