Over the lifetime of an application, the application's database undergoes many transformations. New tables are added, new relationships are formed, columns are added and subtracted, data is massaged, and so on.
We know this well. This constant change is what underlies our need for database migrations.
These migrations span a wide range. At one end are the textbook modifications to our schema; we can pull up Ecto's Getting Started guide to see how we might add a table or column. At the other end are more sophisticated migrations that may involve multiple steps or even migrating and massaging data.
In this post, we'll explore this range.
Adding A Table
Let's add a posts table for our blog app.
def change do
  create table(:posts) do
    add :title, :varchar
    add :content, :text

    timestamps()
  end
end
This gives us a primary key column id by default. This is a unique integer column starting at 1 and counting upwards using a sequence managed by the database. Every post ought to have a title and some content, so we've added columns for those. Ecto gives us the flexibility to use data types native to our database, so we've done just that with some of PostgreSQL's native data types (varchar and text). The timestamps() function will produce inserted_at and updated_at timestamp columns.

Running mix ecto.migrate will add the posts table and its id sequence to our database schema.
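For context, the change function above lives inside a migration module generated with mix ecto.gen.migration. A sketch of the full file might look like this (the module and app names are illustrative):

defmodule MyApp.Repo.Migrations.CreatePosts do
  use Ecto.Migration

  def change do
    create table(:posts) do
      add :title, :varchar
      add :content, :text

      timestamps()
    end
  end
end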
Altering A Table
We said that each post ought to have a title and some content, but we aren't actually enforcing that. Our database can enforce the presence of those columns when inserting and updating records. We just have to tell it to do so with the not null modifier.
Let's generate a new migration.
def up do
  alter table(:posts) do
    modify :title, :varchar, null: false
    modify :content, :text, null: false
  end
end
We use the alter function to target our posts table. The modify function declares how the named column is to be modified. We have to specify the data type of the column. If it is the same, nothing changes. This is, however, an opportunity to change a column's type. For our purposes though, we are just declaring that these columns cannot be null, hence the null: false. Notice that this time we wrote an up migration instead of change; an up migration requires a corresponding down migration. Because we are modifying existing columns, the down migration cannot be inferred, so we have to spell it out explicitly.
def down do
  alter table(:posts) do
    modify :title, :varchar, null: true
    modify :content, :text, null: true
  end
end
The down migration explicitly says that these columns can be null. This takes us back to where we were if we need to roll back.
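As a quick sanity check, assuming a hypothetical MyApp.Post schema backed by the posts table, an insert with a missing title now fails at the database rather than slipping through; roughly:

MyApp.Repo.insert!(%MyApp.Post{content: "body with no title"})
#=> ** (Postgrex.Error) ERROR 23502 (not_null_violation):
#=>    null value in column "title" violates not-null constraint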
Adding A Column With A Default
Posts take time to write and sometimes go through a number of iterations.
We'd like to know whether a particular post is in a draft or published state. We can achieve this by adding a published flag in a new migration.
def change do
  alter table(:posts) do
    add :published, :boolean, null: false, default: false
  end
end
A post is either published or it isn't, so a boolean published column will do the trick. This is a binary status, so a null value doesn't mean anything to us; we make the column not nullable. Lastly, we want to be explicit about publishing a post, so we default the published status of a post to false.
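To see the default in action, here is a small sketch, again assuming a hypothetical MyApp.Post schema; a freshly inserted post picks up the database default once reloaded:

post = MyApp.Repo.insert!(%MyApp.Post{title: "Just a draft", content: "..."})

# Reloading the record shows the database-applied default.
MyApp.Repo.get!(MyApp.Post, post.id).published
#=> false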
Referencing Another Table
Our posts don't appear out of nowhere. People write them and those people
People write them and those people would probably like attribution. We should add a table of people and then reference those people in the posts.¹
def change do
  create table(:people) do
    add :name, :varchar
  end

  alter table(:posts) do
    add :people_id, references(:people), null: false
  end
end
This first creates the people table, again with an implicit primary key id column. The second portion of the migration alters the posts table to add a people_id column that points at the people table. The references function adds a foreign key constraint to ensure the integrity of the relationship between people and their posts. In other words, the database makes sure that post records never get orphaned. With a foreign key constraint in place, we can be sure that if a post points to a person with an id of 6, that person will definitely be there. If we try to delete the person with an id of 6, our database will stop us, reminding us that we need to deal with the posts that depend on person 6 first. Perhaps the right thing to do is delete those posts as well.
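If deleting a person's posts along with the person is indeed the right behaviour for your app, the foreign key can declare that up front with the on_delete option. A sketch of how the column could have been added instead (not part of the migration above):

add :people_id, references(:people, on_delete: :delete_all), null: false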
It should be noted that by default the references function assumes the key it is referencing has a name of id. So, in our case, references(:people) means there will be a foreign key from our posts table to the id column of people. If you want to reference a differently-named column, you can include the :column option as part of the second argument to references, specifying the intended name of the column.
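For example, if the people table had used a primary key named person_id instead (a hypothetical), the reference might look like this:

add :people_id, references(:people, column: :person_id, type: :id), null: false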
Transitioning A Column
Let's imagine some time has passed and our app, now in production, has some people who are creating and publishing posts.
We'd like to be able to display in the UI the time at which a post was published. We consider utilizing the updated_at column when a post is in the published state, but a post can be updated after it is published, so this is not an accurate indicator.
Our next thought is to add a published_at timestamp column. If the post is transitioned to the published state, then we set published_at to the current time and then display that time as long as the post remains in the published state. If published is changed from true back to false for a post, then we can null out the published_at value. That'll work, but we've run into a data design smell.
With this second approach, we've introduced a denormalization of our data. The published and published_at fields have partially overlapping concerns and it is possible for them to get out of sync. We can achieve the same ends with a third, simplified option that only involves one column.
We can replace the published flag with the published_at timestamp. We maintain the binary indicator because published_at being null tells us the post is in a draft state whereas published_at being some timestamp tells us both that it is published and the time at which it was published.
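To make that concrete, here is a minimal sketch, assuming a hypothetical MyApp.Post schema and an imported Ecto.Query, of how the binary state falls out of the timestamp:

# A post is published exactly when it has a published_at timestamp.
def published?(%MyApp.Post{published_at: published_at}), do: not is_nil(published_at)

# Fetching published posts becomes a nil check in the query.
from(p in MyApp.Post, where: not is_nil(p.published_at)) |> MyApp.Repo.all()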
Let's add this migration.
def change do
  alter table(:posts) do
    add :published_at, :timestamp
    remove :published
  end
end
That will do the trick. But hold up a second. If we run this migration against our production database, we will be doing something very sad. We'll be throwing away a bunch of data and messing up the publish status of all our existing posts.
We need to transition some data before we can remove the published column. We need to set all existing unpublished posts to have a null published_at. All the existing published posts, however, need to be assigned a timestamp. We already know that our existing data model was not sufficient for representing when a post was published. We are going to have to do some approximation, and that's okay.

Let's say that, in terms of this data migration, setting the published_at value for all published posts to the same time that they were all last updated is sufficient. So, for each post with published set to true, we need to duplicate that post's updated_at value into the newly-created published_at column. That will require an update_all query as part of our migration.²
Let's modify our migration from above.
import Ecto.Query

def up do
  alter table(:posts) do
    add :published_at, :timestamp
  end

  from(p in "posts",
    update: [set: [published_at: p.updated_at]],
    where: p.published
  )
  |> MyApp.Repo.update_all([])

  alter table(:posts) do
    remove :published
  end
end
This migration looks like exactly what we want, but when we run mix ecto.migrate, we are going to see an odd error.
** (Postgrex.Error) ERROR (undefined_column): column p0.published_at does not exist
How can it not exist for our update_all function when we clearly added that column just a few lines above?
The issue is that the Ecto migrator is queueing up everything in our migration to be executed against the database. When it reaches the update_all statement, it tries to create a prepared statement that involves a column that has not yet been created.
To ensure that the first portion of our migration is executed before the update_all is reached, we can utilize Ecto.Migration's flush/0 function. Calling flush() after the first portion of the migration will force everything queued up so far to be executed against the database. This clears the way for the rest of our migration to work.
def up do
  alter table(:posts) do
    add :published_at, :timestamp
  end

  flush()

  from(p in "posts",
    update: [set: [published_at: p.updated_at]],
    where: p.published
  )
  |> MyApp.Repo.update_all([])

  alter table(:posts) do
    remove :published
  end
end
And like before, with any up migration, we need a corresponding down migration.
def down do
  alter table(:posts) do
    add :published, :boolean, null: false, default: false
  end

  flush()

  from(p in "posts",
    update: [set: [published: true]],
    where: not is_nil(p.published_at)
  )
  |> MyApp.Repo.update_all([])

  alter table(:posts) do
    remove :published_at
  end
end
This requires some of the same tricks as the up migration because we want to be able to put the data back in place as best as we can in the event of a rollback.
In this post, we've explored a range of simple to complex Ecto migrations involving both schema changes and data transformation. We looked at how you can create new tables as well as alter existing ones. We even explored in detail how you would go about massaging and migrating data in the event that you need to replace one column with another. Ecto is an abstraction on top of SQL and we may need to drop down to raw SQL migrations in some cases, but from what we've seen Ecto can get us pretty far.
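For reference, dropping to raw SQL from within a migration is done with Ecto.Migration's execute/1; a hedged sketch, with a purely illustrative index:

def up do
  execute "CREATE INDEX posts_published_at_index ON posts (published_at) WHERE published_at IS NOT NULL"
end

def down do
  execute "DROP INDEX posts_published_at_index"
end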
Photo credit for cover image: Paul Morris, unsplash.com
¹ The following migration assumes an empty database, as if we are in the midst of creating an initial MVP of our application. If the database already contained posts, then adding a non-nullable column would cause an error due to a violation of the not null constraint. ↩
² When using the update_all function, there are two ways of specifying the terms of the update. As we did above, the :update keyword can be used within our Ecto.Query to specify how to update the relevant records. Because we went this route, we leave the required updates argument to update_all as an empty list. If we don't specify the update as part of the query, then we need to include those details as part of the second argument to update_all. ↩
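For illustration, the two forms look like this side by side (assuming import Ecto.Query, as in the migrations above):

# Form 1: the update lives in the query, so update_all gets an empty updates list.
from(p in "posts",
  update: [set: [published: true]],
  where: not is_nil(p.published_at)
)
|> MyApp.Repo.update_all([])

# Form 2: the update is passed as the second argument to update_all instead.
from(p in "posts", where: not is_nil(p.published_at))
|> MyApp.Repo.update_all(set: [published: true])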