Fun with Views and CTEs
A view is a stored query the results of which can be treated like a table. Note that it is the query that is saved and not the results of the query. Each time you use a view, its associated query is executed.
A related concept is that of the common table expression or CTE. A CTE can be thought of as a short-lived view; you can only use it within the query in which it appears (you can refer to it multiple times, however).
Let's say that you're the SQL-savvy owner of a store that sells kits for building robots. On your site, customers are guided through the process of selecting all of the components needed to build a quadrupedal robot, i.e. getting the right motors, microcontrollers, sensors, batteries, grippers, etc.
Robots have lots of component parts and you have to make sure that the correct components for each order arrive or your customers will leave mean comments on social media.
Right now you have the following tables:
And those tables contain the following data:
You want to see a list of all the purchases made and the components in those purchases. You might write a query like the following:
select p.id as purchase_id, array_agg(concat('[ ', cp.component_id::text, ', ', c.name, ', ', cp.num_components::text, ' ]')) as component_info, sum(cp.num_components * c.price) as purchase_total_price, customer_id, created_at from purchases as p inner join component_purchases as cp on p.id = cp.purchase_id inner join components as c on c.id = cp.component_id group by p.id order by p.created_at
Just a quick explanation in case anything in the query is unclear. We're joining the tables
components and then grouping the rows by
purchase_id. SQLFiddle here.
For each of the components in a purchase, the
sum function adds the multiple of the number of units of a component ordered and the price of a component. The
array_agg function composes an array of the details of the purchase. Notice that the
array_agg functions act across rows, but the
concat function acts across columns.
The result of our query is thus:
Check out the SQLFiddle here.
It's not an exceptionally long query, but you suspect that you'll be using it quite a lot and you'd prefer not to have to write the whole thing out all of the time.
That's where views come in:
create view purchase_information as select p.id as purchase_id, array_agg(concat('[ ', cp.component_id::text, ', ', c.name, ', ', cp.num_components::text, ' ]')) as component_info, sum(cp.num_components * c.price) as purchase_total_price, customer_id, created_at from purchases as p inner join component_purchases as cp on p.id = cp.purchase_id inner join components as c on c.id = cp.component_id group by p.id order by p.created_at;
Now you have a simple way to reuse this query. Let's say that you want to know all of the purchases made within the last week. Now you can just write the following:
select * from purchase_information where created_at > now() - interval '7 months';
From within the PostgreSQL interactive terminal, you can use the command
\dv to see a list of all the views in the database.
Dropping a view is as simple as dropping a table:
drop view purchase_information;
Note that if you drop a table using
cascade, any views dependent on that table will be dropped as well.
It's also possible to perform operations on the data in the underlying tables of some views. Updatable views are views where data can be edited, inserted, or deleted. To be updatable, a view must be really basic. Think something along the lines of:
create view customer_info as select first_name, last_name, email from customers where last_name > 'F';
Updatable views can only select rows from a single table (or updatable view). They also cannot contain 'group_by', 'limit', or 'with' clauses.
You can see all of the relevant restrictions in the Postgres docs here.
If you do have an updatable view, you can do something like
insert into customer_info values ('Anne', 'Droid', 'firstname.lastname@example.org');
A new row for 'Anne Droid' has been inserted into the customer table.
Note that the
customer_info view only includes customers whose names begin with letters after 'E'. If you were to run the following query:
select * from customer_info;
Anne's name would not appear in the results because her last name begins with a 'D'. A new row has been inserted into the
customers table using the
customer_info view, but that row is not accessible via the view. SQLFiddle here
Note that you cannot insert values for columns that are present in the underlying table, but not in the view. The following query would result in an error:
insert into customer_info (first_name, last_name, email, phone) values ('Simon', 'Borg', 'email@example.com', '555-777-9999');
So, that's views in a nutshell. Let's move on to CTEs.
To start with, CTE stands for 'common table expression'. They're quite similar to views except that once the query is executed, a CTE ceases to be.
In the view
purchase_information, we were able to get all of the components in a purchase, as well as the price of the purchase. Let's say I wanted a query that would provide me with purchase information, but also the customers to whom the purchases belonged. This can be achieved pretty easily using a common table expression.
You start a CTE using the keyword 'with' followed by the name of the CTE and the query itself.
with purchases as (select p.id as purchase_id, array_agg(concat('[ ', cp.component_id::text, ', ', c.name, ', ', cp.num_components::text, ' ]')) as component_info, sum(cp.num_components * c.price) as purchase_total_price, customer_id from purchases as p inner join component_purchases as cp on p.id = cp.purchase_id inner join components as c on c.id = cp.component_id group by p.id order by p.created_at) select purchases.purchase_id, purchases.purchase_total_price, concat(cu.last_name, ', ', cu.first_name) as customer_name, concat(cu.email, ', ', cu.phone) as customer_contact_info, concat(sa.street1, ', ', sa.street2, ', ', sa.postal_code, ', ', sa.city, ', ', sa.region, ', ', sa.country) as customer_address, component_info from purchases inner join customers as cu on purchases.customer_id = cu.id inner join shipping_addresses as sa on sa.customer_id = cu.id;
purchases CTE can be referred to like a table. Of course, once the query is executed,
purchases doesn't exist.
The results of this query are:
Phew! Now you can send the correct robot parts to your customers.
But wait! There's some complexity to the orders that you've forgotten. Some components need to be combined with other components in order to work. For the sake of simplicity, let's say that any given component requires one or fewer other components to function. Drawing out the relationship between our components, we see that we get a tree graph.
To describe this in the database, we'll need to add another column to the
component table. The
subcomponent_id field will point to the component upon which the current component depends.
Let's say that a customer orders the component with id 4. In order to get all of the parts the customer needs, we'll need traverse the tree all the way down.
Luckily, Postgres has just the solution for this - recursive common table expressions. Recursive CTEs provide a way to allow queries to perform some action with a table, and then to perform the same action on the results of the previous action.
That's probably kind of confusing.
Below is the structure of a recursive common table expression:
with recursive cte_name as ( query to produce initial working table union/union all recursive query ) do something with cte_name
The recursive CTE begins much like a non-recursive CTE. You'll notice that the modifier
recursive is required after
with. Immediately after
recursive is the name of the CTE.
union operator is a query that initializes the working table. The working table is the result set upon which the recursive term is currently acting. For instance, you might put
select id, name from robots where name='Cindi Mayweather'. Since there is one and only one of the illustrious Cindi Mayweather, this query will return one row.
Note that this query is not executed again.
We'll come back to what purpose the
union operator serves in a bit. For now, let's talk about the recursive query. The recursive query will make a reference to the CTE name. Wherever you see the CTE name in the query, that is a reference to the working table.
Given a single component that needs to be shipped, we need to fetch the ids of all of its subcomponents, and their subcomponents, etc. To get all of the subcomponents for the component with an id of 4, we use the following query:
with recursive purchase_subcomponents as ( select id, subcomponent_id from components where id = 4 union select components.id, components.subcomponent_id from components inner join purchase_subcomponents on purchase_subcomponents.subcomponent_id = components.id ) select id from purchase_subcomponents;
That's pretty gross-looking, so let's go through what's happening step by step.
Execution of the cte starts with the non-recursive query. Running that query returns the following:
This is the first working table.
We see that an inner join of the working table and the components table produces the following:
From this, we get the next working table.
The working table is substituted for
purchase_components in the recursive query, with the following result:
And so on...
The recursion ends when the working table has no rows, i.e. when the values for the rows are all null. [equivalent???]
Now we come to what
union is doing. The
union operator acts on all of the working tables that have been generated, combining them all into one table. To ensure that there are no duplicate rows, you use
Hence the result of running the query is
Boom! Nailed it.
If you recall, up top we have a list of purchases with the associated components and customer information. Well, that component list isn't complete; we have to include subcomponents as well.
with recursive purchases_made as ( select p.id as purchase_id, c.id as component_id, c.name as component_name, subcomponent_id, num_components from purchases as p inner join component_purchases as cp on p.id = cp.purchase_id inner join components as c on c.id = cp.component_id order by p.created_at ), purchase_components as ( select * from purchases_made union select purchase_id, components.id, components.name, components.subcomponent_id, num_components from components inner join purchase_components on purchase_components.subcomponent_id = components.id ) select purchase_id, array_agg(concat('[', num_components::text, ' x cid:', component_id, ', ', component_name, ']')) as purchase_items, sum(num_components * components.price) as purchase_total_price, concat(cu.last_name, ', ', cu.first_name) as customer_name, concat(cu.email, ', ', cu.phone) as customer_contact_info, concat(sa.street1, ', ', sa.street2, ', ', sa.postal_code, ', ', sa.city, ', ', sa.region, ', ', sa.country) as customer_address from purchase_components inner join components on purchase_components.component_id = components.id inner join purchases on purchases.id = purchase_components.purchase_id inner join customers as cu on purchases.customer_id = cu.id inner join shipping_addresses as sa on sa.customer_id = cu.id group by purchase_id, cu.last_name, cu.first_name, cu.email, cu.phone, sa.street1, sa.street2, sa.postal_code, sa.city, sa.region, sa.country;
In this query, we have two CTEs. When you want to use multiple CTEs in a query, you place
with at the start of the group and separate each CTE with a comma. Notice that, even though only one of the CTEs is recursive, the
recursive modifier is placed directly after
with, as opposed to directly before the recursive CTE. Each CTE is executed normally. Another nifty thing about using
recursive is that its presence means your CTEs can be in any order, i.e. one CTE can make reference to a CTE that is defined later. If you're wondering how the
component_id alias showed up in the final results, remember that aliases are preserved in the results of a union.
To understand how this query works, we first need to understand what the results of the
purchases_made query are.
This is our first working table.
Following a process similar to the one in the earlier recursive query, the first inner join results in:
The results of the
purchase_components CTE can be found in this SQLFiddle.
I would recommend drawing out tables in a notebook to really hammer home some of the ideas presented in this blog post. And feel free to play around in the SQLFiddle examples provided. It's a pretty great tool.