Ruby
Advanced Queries with ActiveRecord's From Method
For those of us that write database-backed Rails apps,
ActiveRecord
is a
familiar beast. We know its various query building methods well: where
,
order
, limit
, and even group
. The most important SQL clause is
conspicuously missing from our repertoire: the from
clause.
Chances are you can't think of many, if any, times that you've used
ActiveRecord
's from
method. This is to be expected. The from
clause in
SQL specifies the target table of a query. In the context of ActiveRecord, the
target table is explicitly expressed by the model from which you query.
Consider the times you use one-liners such as Developer.first
,
Developer.all
, and Developer.count
. With each of these, the developers
table as the target of the query is expressed by the use of the Developer
model.
So, where does the from
method come into the picture?
The
docs
describe the from
method as such.
Specifies table from which the records will be fetched.
Specifying the table of a query against an ActiveRecord model is going to feel a bit redundant.
> Developer.from('developers').to_sql
=> "SELECT \"developers\".* FROM developers"
As you've probably noticed we get this same behavior with nothing more than the
all
method.
> Developer.all.to_sql
=> "SELECT \"developers\".* FROM \"developers\""
So, again, where does the from
method come into the picture?
Our from
clauses aren't always going to look as simple as from developers
.
We sometimes need the "table" from which we are fetching to be a collection of
computed values. The key insight here is that the result of a query is itself a
table. That means that we can query against the result of another query. This
is ostensibly how subqueries work.
The from
method comes into play when we are building up a complex query
in SQL and need to transition back to the land of ActiveRecord and Rails.
Let's work with the
developer_scores.sql
query from the hr-til
codebase.
select
developers.id,
username,
posts,
likes,
round(likes::numeric / posts, 2) as avg_likes,
round(log(2, (1022 * ((score - min(score) over ()) / ((max(score) over ()) - (min(score) over ()))) + 2)::numeric), 1) as hotness
from developers
join (
select developer_id as id, sum(score) as score
from hot_posts
group by developer_id
) developer_scores using(id)
join (
select
developer_id as id,
count(*) as posts,
sum(likes) as likes
from posts
group by developer_id
) stats using(id)
This query is used to compute both the average number of likes and the hotness score for each developer's posts. Because of the complexity of the query, we are better off writing it in SQL rather than using ActiveRecord's DSL. This query does a complex computation that results in a bunch of data. It would be nice if we could further refine the results as needed with ActiveRecord's DSL. And herein lies the challenge. How do we transition from this SQL query to something compatible with ActiveRecord?
The from
method -- the topic of this post -- is, of course, the answer. We
can treat the above query as a "table" that we can query against (i.e. a
subquery). This allows us to then use order
, limit
, group
, etc. to
further refine the result set.
We can create a class (app/models/developer_ranker.rb
) to see how this can work.
class DeveloperRanker
RANKED_DEV_SQL = <<-SQL
(select
developers.id,
username,
posts,
likes,
round(likes::numeric / posts, 2) as avg_likes,
round(log(2, (1022 * ((score - min(score) over ()) / ((max(score) over ()) - (min(score) over ()))) + 2)::numeric), 1) as hotness
from developers
join (
select developer_id as id, sum(score) as score
from hot_posts
group by developer_id
) developer_scores using(id)
join (
select
developer_id as id,
count(*) as posts,
sum(likes) as likes
from posts
group by developer_id
) stats using(id)
) as developers
SQL
def self.top_developers(count=1)
Developer.from(RANKED_DEV_SQL)
.order(hotness: :desc)
.limit(count)
end
def self.bottom_developers(count=1)
Developer.from(RANKED_DEV_SQL)
.order(hotness: :asc)
.limit(count)
end
end
We treat the complex query as a subquery by wrapping it in (...) as
developers
assigning it as a string to a constant. This constant is then
available to any methods that want to build on this query without duplicating
all the core query logic.
As you can see in DeveloperRanker.top_developers
, we hydrate an
ActiveRecord::Relation
with the results of the query. We can then chain those
familiar ActiveRecord query methods on the end. In this case, we order the
results with the developers most on fire at the top and then limiting by the
given count
. Note that because we are querying against the Developer
model, it is essential that the subquery's name matches the name that
ActiveRecord uses for that model's table.
We create DeveloperRanker.bottom_developers
to perform a similar query with
very little duplication.
An additional benefit of this approach is that the Developer
objects that are
being hydrated are decorated with the additional values returned from the
query. Not only do we know the id
and username
of each developer, we can
also access the avg_likes
and hotness
attributes. These hydrated
Developer
objects can be passed to the view layer like any other decorated
object would be.
This technique is great for generating reports and stats that require lots of computation or complex joins, but are ultimately tied to particular domain concepts (e.g. a developer).
Ultimately, this frees you to write some SQL without completely cutting
yourself off from the world of ActiveRecord
. Enjoy!
Cover photo by Samuel Zeller on Unsplash.com