Ruby
Rails Quick Tips: Easy ActiveRecord Optimizations
Recently, we had the opportunity to write an API endpoint that would "bulk" create thousands of users. The API needed to quickly return errors if any existed, so a background job wouldn't work in this case. Bulk insert was easy thanks to the activerecord-import gem. However, validation had to be done before bulk inserting users into the database.
We had to ensure that the screen_names were unique (amongst other things). The code we produced was a bit more involved, but here is a minimal version of what an screen_name
validation might look like.
class BulkUserScreenNameValidator
attr_reader :users
def initialize(emails)
# emails is an array ['jane@example.com','john@example.com']
# it comes from the API endpoint and potentially has thousands of emails
@users = User.where(email: emails)
end
def valid?
User.where(screen_name: users.map(&:screen_name)).blank?
end
end
On the surface this looks like clean idiomatic Ruby. However, when you run the code you quickly see the performance problems.
- Thousands of users are instantiated and loaded into an array to check if it's
blank?
- Thousands of users are instantiated and loaded into an array to map
screen_name
- Multiple queries are made.
Ok, let's go down the list and fix these problems. Most of the time when you see blank?
at the end of an active record relation, you should use empty?
instead. Let's look at the difference.
# Using `blank?`
User.where(screen_name: ['user1','user2']).blank?
# 1. Queries database for all user data
# SELECT "users".* FROM "users" WHERE "users"."screen_name" IN ('user1','user2')
# 2. Loads users into an array
# [<#User:0x007fbf6413c510>,<#User:0x007fbf65ab1c70>]
# 3. Checks to see if the array size is zero
# => true
# Using `empty?`
User.where(screen_name: ['user1','user2').empty?
# 1. Queries database for ONLY a count
# SELECT COUNT(*) FROM "users" WHERE "users"."screen_name" IN ('user1','user2')
# 2. Checks to see if the count is zero
# => true
So, blank?
will load the entire array, then check to see if the array is empty. On the other hand, empty?
asks the database for a count, and checks to see if that count is zero or not. This might not make a difference in small datasets (like development), but it can make a big difference in databases with large datasets (like production). It will also make a huge difference in memory consumption when thousands of records are loaded vs a single integer.
Next, let's look at our mapping of the screen_names. When you notice that you're mapping an activerecord relation, you should consider using pluck
instead. We will talk more about this later.
# Using `map?`
User.where(email: ['jane@example.com', 'john@example.com']).map(&:screen_name)
# 1. Queries database for all user data
# SELECT "users".* FROM "users" WHERE "users"."email" IN ('jane@example.com','john@example.com')
# 2. Loads users into an array
# [<#User:0x007fbf6413c510>,<#User:0x007fbf65ab1c70>]
# 3. Iterates over users to collect screen_names
# ['user1','user2']
# Using `pluck?`
User.where(email: ['jane@example.com', 'john@example.com']).pluck(:screen_name)
# 1. Queries database for only screen_names
# SELECT "users"."screen_name" FROM "users" WHERE "users"."email" IN ('jane@example.com','john@example.com')
# 2. Returns those screen_names in an array
# ['user1','user2']
So, map
will load the entire array, then iterate to collect the screen_names. Alternatively, pluck
asks the database for exactly what it needs and returns an array of just those items. Once again, performance is gained and less memory is used for large datasets (like production).
Now, we have two fairly efficient queries. How do we transform it into just one efficient query. In the past, I would drop down into Arel, or just give up and write SQL. However, thanks to a recent blog post, (Rails: Don’t “pluck” Unnecessarily), I now have another option.
So the new rule is "think twice before you use pluck". Any time you use pluck
you should consider using select
instead. Lets look at select
and how it works.
User.where(email: ['jane@example.com', 'john@example.com']).select(:screen_name)
# 1. Queries database for only screen_names
# SELECT "users"."screen_names" FROM "users" WHERE "users"."email" IN ('jane@example.com','john@example.com')
# 2. Returns and array of User instances with only the "screen_name" populated
# [<#User:0x007fbf6413c510>,<#User:0x007fbf65ab1c70>]
Ok, well that seems less efficient than pluck
, and it is, until you pass it into a where
clause.
# Using `pluck` with a `where`
emails = ['jane@example.com', 'john@example.com']
User.where(screen_name: User.where(email: emails).pluck(:screen_name)).empty?
# 1. Queries for just the emails
# SELECT "users"."email" FROM "users" WHERE "users"."email" IN ('jane@example.com','john@example.com')
# 2. Queries for the count
# SELECT COUNT(*) FROM "users" WHERE "users"."screen_name" IN ('user1','user2')
# Checks if the count is zero
# => true
# Using `select` with a `where`
emails = ['jane@example.com', 'john@example.com']
User.where(screen_name: User.where(email: emails).select(:screen_name)).empty?
# 1. Makes one query for count with subquery
# SELECT COUNT(*) FROM "users" WHERE "users"."screen_name" IN (
# SELECT "users"."screen_name" FROM "users" WHERE "users"."email" IN ('jane@example.com','john@example.com')
# )
# 2. Checks if the count is zero
# => true
And that's it!! We just saved a ton of performance and memory. Let's look at the new code and how little it changed.
class BulkUserScreenNameValidator
attr_reader :users
def initialize(emails)
@users = User.where(email: emails)
end
def valid?
User.where(screen_name: users.select(:screen_name)).empty?
end
end
Here's a reminder of the rules.
- Use
empty?
orany?
instead ofblank?
orpresent?
. - Never use
map
on active record relations, usepluck
instead. - If you're using
pluck
to pass values to awhere
useselect
instead.
- Note, email validator was updated to screen name validator.
User.where(email: emails).empty?
would be enough to check if emails already existed. Thanks to Pavel for pointing this out. And even in the example of a screen name validator, the query could beUser.where(screen_name: screen_names).empty?
I tried to pull a small piece of code from a large application, and simplify it. Unfortunately, somethings get lost in translation. So my hope is that you take away some simple rules, that can improve your application. Of course if you are doing this in your own app, please write some bench mark tests, import production data, and ensure that you are actually optimizing your code.