Heading image for post: Rails Quick Tips: Easy ActiveRecord Optimizations

Ruby

Rails Quick Tips: Easy ActiveRecord Optimizations

Profile picture of Micah Woods

Recently, we had the opportunity to write an API endpoint that would "bulk" create thousands of users. The API needed to quickly return errors if any existed, so a background job wouldn't work in this case. Bulk insert was easy thanks to the activerecord-import gem. However, validation had to be done before bulk inserting users into the database.

We had to ensure that the screen_names were unique (amongst other things). The code we produced was a bit more involved, but here is a minimal version of what an screen_name validation might look like.

class BulkUserScreenNameValidator
  attr_reader :users
  def initialize(emails)
    # emails is an array ['jane@example.com','john@example.com']
    # it comes from the API endpoint and potentially has thousands of emails
    @users = User.where(email: emails) 
  end

  def valid?
    User.where(screen_name: users.map(&:screen_name)).blank?
  end
end

On the surface this looks like clean idiomatic Ruby. However, when you run the code you quickly see the performance problems.

  1. Thousands of users are instantiated and loaded into an array to check if it's blank?
  2. Thousands of users are instantiated and loaded into an array to map screen_name
  3. Multiple queries are made.

Ok, let's go down the list and fix these problems. Most of the time when you see blank? at the end of an active record relation, you should use empty? instead. Let's look at the difference.

# Using `blank?`
User.where(screen_name: ['user1','user2']).blank?

# 1. Queries database for all user data
#   SELECT "users".* FROM "users" WHERE "users"."screen_name" IN ('user1','user2')

# 2. Loads users into an array
#   [<#User:0x007fbf6413c510>,<#User:0x007fbf65ab1c70>]

# 3. Checks to see if the array size is zero
#   => true
# Using `empty?`
User.where(screen_name: ['user1','user2').empty?

# 1. Queries database for ONLY a count
#   SELECT COUNT(*) FROM "users" WHERE "users"."screen_name" IN ('user1','user2')

# 2. Checks to see if the count is zero
# => true

So, blank? will load the entire array, then check to see if the array is empty. On the other hand, empty? asks the database for a count, and checks to see if that count is zero or not. This might not make a difference in small datasets (like development), but it can make a big difference in databases with large datasets (like production). It will also make a huge difference in memory consumption when thousands of records are loaded vs a single integer.

Next, let's look at our mapping of the screen_names. When you notice that you're mapping an activerecord relation, you should consider using pluck instead. We will talk more about this later.

# Using `map?`
User.where(email: ['jane@example.com', 'john@example.com']).map(&:screen_name)

# 1. Queries database for all user data
#   SELECT "users".* FROM "users" WHERE "users"."email" IN ('jane@example.com','john@example.com')

# 2. Loads users into an array
#   [<#User:0x007fbf6413c510>,<#User:0x007fbf65ab1c70>]

# 3. Iterates over users to collect screen_names
#   ['user1','user2']
# Using `pluck?`
User.where(email: ['jane@example.com', 'john@example.com']).pluck(:screen_name)

# 1. Queries database for only screen_names
#   SELECT "users"."screen_name" FROM "users" WHERE "users"."email" IN ('jane@example.com','john@example.com')

# 2. Returns those screen_names in an array
#   ['user1','user2']

So, map will load the entire array, then iterate to collect the screen_names. Alternatively, pluck asks the database for exactly what it needs and returns an array of just those items. Once again, performance is gained and less memory is used for large datasets (like production).

Now, we have two fairly efficient queries. How do we transform it into just one efficient query. In the past, I would drop down into Arel, or just give up and write SQL. However, thanks to a recent blog post, (Rails: Don’t “pluck” Unnecessarily), I now have another option.

So the new rule is "think twice before you use pluck". Any time you use pluck you should consider using select instead. Lets look at select and how it works.

User.where(email: ['jane@example.com', 'john@example.com']).select(:screen_name)

# 1. Queries database for only screen_names
#   SELECT "users"."screen_names" FROM "users" WHERE "users"."email" IN ('jane@example.com','john@example.com')

# 2. Returns and array of User instances with only the "screen_name" populated
#   [<#User:0x007fbf6413c510>,<#User:0x007fbf65ab1c70>]

Ok, well that seems less efficient than pluck, and it is, until you pass it into a where clause.

# Using `pluck` with a `where`
emails =  ['jane@example.com', 'john@example.com']
User.where(screen_name: User.where(email: emails).pluck(:screen_name)).empty?

# 1. Queries for just the emails
#   SELECT "users"."email" FROM "users" WHERE "users"."email" IN ('jane@example.com','john@example.com')

# 2. Queries for the count
#  SELECT COUNT(*) FROM "users" WHERE "users"."screen_name" IN ('user1','user2')

# Checks if the count is zero
# => true
# Using `select` with a `where`
emails =  ['jane@example.com', 'john@example.com']
User.where(screen_name: User.where(email: emails).select(:screen_name)).empty?

# 1. Makes one query for count with subquery
#   SELECT COUNT(*) FROM "users" WHERE "users"."screen_name" IN (
#     SELECT "users"."screen_name" FROM "users" WHERE "users"."email" IN ('jane@example.com','john@example.com')
#   )

# 2. Checks if the count is zero
# => true

And that's it!! We just saved a ton of performance and memory. Let's look at the new code and how little it changed.

class BulkUserScreenNameValidator
  attr_reader :users
  def initialize(emails)
    @users = User.where(email: emails) 
  end

  def valid?
    User.where(screen_name: users.select(:screen_name)).empty?
  end
end

Here's a reminder of the rules.

  1. Use empty? or any? instead of blank? or present?.
  2. Never use map on active record relations, use pluck instead.
  3. If you're using pluck to pass values to a where use select instead.
  • Note, email validator was updated to screen name validator. User.where(email: emails).empty? would be enough to check if emails already existed. Thanks to Pavel for pointing this out. And even in the example of a screen name validator, the query could be User.where(screen_name: screen_names).empty? I tried to pull a small piece of code from a large application, and simplify it. Unfortunately, somethings get lost in translation. So my hope is that you take away some simple rules, that can improve your application. Of course if you are doing this in your own app, please write some bench mark tests, import production data, and ensure that you are actually optimizing your code.
  • Adobe logo
  • Barnes and noble logo
  • Aetna logo
  • Vanderbilt university logo
  • Ericsson logo

We're proud to have launched hundreds of products for clients such as LensRentals.com, Engine Yard, Verisign, ParkWhiz, and Regions Bank, to name a few.

Let's talk about your project