Elixir PostgreSQL
Ecto UPSERT for different Primary Keys in Elixir
Let's take a look into Ecto's implementation of PostgreSQL Upsert and how different types of primary keys could change the results of this command. Spoiler alert, you could be surprised by non expected returns, so bare with me.
A couple of days ago I found a strange behavior with the return of an "UPSERT" Ecto command. More specifically to the id. I was intrigued by that and I started to investigate if the type of the primary key would have a different behavior in this command.
Before jumping into that, there's no UPSERT
command in SQL nor in Ecto. Don't expect this feature to be present in all database systems, and don't expect that the databases that implement that will implement using the same syntax. This is just a common name that developers use to describe an atomic "INSERT OR UPDATE" SQL command.
PostgreSQL, for instance, introduced it on the 9.5 version and it takes the form of INSERT ... ON CONFLICT ... UPDATE ...
. As we can see this is a regular INSERT
command, with some kind of checking for existing data based on conflicts and then, if that happens, there's an UPDATE
section. All in a single SQL command. You can find out more information about [PostgreSQL UPSERT] in this great post.
MySQL as another example uses a slightly different command, it would be something like: INSERT ... ON DUPLICATE KEY UPDATE
. As we can infer, they are not the same SQL statement.
Ecto follows an approach and naming convention very similar to the PostgreSQL implementation. So everytime that I mention UPSERT in this post I mean that INSERT ... ON CONFLICT ... UPDATE...
PostgreSQL command.
Ecto UPSERT
In the Ecto land we can use "UPSERT" by calling a regular Ecto.Repo.insert/2
function and set some options that defines the upsert strategy. There are some ways to set this up, so please check the [Ecto UPSERT] for reference. In this post we're taking a look into an specific type of upserts: Insert a row into the database, or if that row exists, then we update a subset of the input.
I choose this strategy because I think that it's very powerful and useful command. It's a great fit to be used when importing or seeding data into a database, as this command can be executed many times for the same data without raising not unique constraints. Also if you want to seed some data, this is also a way to "reset" that data to the seeded one.
The main focus of this post is to discuss if the type of primary key affects the UPSERT command. So let's jump into the problem.
Project Setup
I'll start with a plain phoenix project:
mix phx.new hello_world
cd hello_world
mix ecto.create
To be honest this project could be a simple mix project with ecto setup, but I want to save some setup time and I want to use phoenix generators to create the schemas.
So the elixir and phoenix versions really don't matter for the scope of this, but "ecto" and "postgres" does, so here're the hex packages versions:
"ecto": "3.4.5"
"ecto_sql": "3.4.5"
"postgrex": "0.15.5"
And I am using PostgreSQL 11.
The Primary Keys
So here're the most used different setups for ids using Ecto in PostgreSQL that I can think of:
- Integer ID generated by PostgreSQL Sequence;
- UUID generated by Ecto;
- UUID generated by PostgreSQL (uuid-ossp);
Here are the phoenix commands to generate my schemas:
mix phx.gen.schema Ecommerce.Product1 products1 name:string:unique price:float status:string
mix phx.gen.schema Ecommerce.Product2 products2 name:string:unique price:float status:string --binary-id
mix phx.gen.schema Ecommerce.Product3 products3 name:string:unique price:float status:string --binary-id
The only code changes that I'll make are in the Product3
migration and ecto schema:
defmodule HelloWorld.Repo.Migrations.CreateEcommerceProducts3 do
use Ecto.Migration
def change do
+ execute("CREATE EXTENSION IF NOT EXISTS \"uuid-ossp\";")
+
create table(:products3, primary_key: false) do
- add :id, :binary_id, primary_key: true
+ add :id, :binary_id, primary_key: true, default: fragment("uuid_generate_v4()")
add :name, :string
add :price, :float
add :status, :string
timestamps()
end
create unique_index(:products3, [:name])
end
end
and:
defmodule HelloWorld.Ecommerce.Product3 do
use Ecto.Schema
import Ecto.Changeset
- @primary_key {:id, :binary_id, autogenerate: true}
+ @primary_key {:id, :binary_id, autogenerate: false}
@foreign_key_type :binary_id
schema "products3" do
field :name, :string
field :price, :float
field :status, :string
timestamps()
end
end
At this point we have this situation:
Product1
Integer ID generated by PostgreSQL Sequence;Product2
UUID generated by Ecto;Product3
UUID generated by PostgreSQL (uuid-ossp);
And we finish this section with:
mix ecto.migrate
Test Plan
My plan to test how Ecto and PostgreSQL will behave for the UPSERT command is simply open the iex
console:
iex -S mix
Then we'll alias the modules that we'll be using to avoid repeating ourselves too much:
import Ecto.Query, only: [from: 2]
alias HelloWorld.Ecommerce.Product1
alias HelloWorld.Ecommerce.Product2
alias HelloWorld.Ecommerce.Product3
alias HelloWorld.Repo
Finally for each Product version that we've just created I want to:
- Upsert the product with some attributes, this will cause an INSERT
- Update the same row
- Upsert the product with other attributes, this will cause an UPDATE
I added a regular UPDATE
SQL in between the UPSERT calls to simulate that an UPSERT would update only the fields that we are allowing in our strategy, and also, to simulate that the UPSERT would return fresh data.
insert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 550.0}
update_attrs = %{status: "OUT_OF_STOCK"}
upsert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 599.0}
I kept the changes as easy as it can be for the purpose of this comparison.
Next, there's an Ecto UPSERT sample to be used in this post:
Repo.insert(
product,
conflict_target: :name,
on_conflict: {:replace, [:price]},
)
Our UPSERT will try to INSERT
a product and if there's a UNIQUE CONSTRAINT violation on the field :name
then it will UPDATE
only the field :price
.
Let's see now what it will happen to each Product version.
Product1
with Integer ID
Let's start with the first UPSERT for the Product1 schema:
iex > insert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 550.0}
iex > {:ok, inserted_p1} = %Product1{} |>
... > Product1.changeset(insert_attrs) |>
... > Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name)
{:ok, %HelloWorld.Ecommerce.Product1{
id: 1,
name: "Flying V Guitar",
price: 550.0,
status: "ACTIVE",
}}
So far so good, all the data that I specified was returned, including the id generated by the PostgreSQL sequence. Let's move on:
iex > update_attrs = %{status: "OUT_OF_STOCK"}
iex > {:ok, _p1} = inserted_p1 |> Product1.changeset(update_attrs) |> Repo.update()
We just changed that product status to OUT_OF_STOCK
, and finally:
iex > upsert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 599.0}
iex > {:ok, updated_p1} = %Product1{} |>
... > Product1.changeset(upsert_attrs) |>
... > Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name)
{:ok, %HelloWorld.Ecommerce.Product1{
id: 1,
name: "Flying V Guitar",
price: 599.0,
status: "ACTIVE",
}}
At this point we can see that the price got successfully changed, but the status
was returned as ACTIVE
and I expected to be OUT_OF_STOCK
as this was an UPSERT that happens to update, and the strategy was to update only the :price
field. This might be stale data that comes from the inputed attrs, let's check that:
iex > Repo.get(Product1, updated_p1.id)
%HelloWorld.Ecommerce.Product1{
id: 1,
name: "Flying V Guitar",
price: 599.0,
status: "OUT_OF_STOCK",
}
Cool, the UPSERT worked well for insert and update with a tiny note that updates will return stale data. For me the best way to solve that is to add another option to the insert/2
call to return fresh data:
iex > upsert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 711.0}
iex > {:ok, updated_p1} = %Product1{} |>
... > Product1.changeset(upsert_attrs) |>
... > Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name, returning: true)
{:ok, %HelloWorld.Ecommerce.Product1{
id: 1,
name: "Flying V Guitar",
price: 711.0,
status: "OUT_OF_STOCK",
}}
That's great! This command behaves exactly how I wanted, the price got updated again, the status has not changed and now it returns fresh data. A quick look into the prepared statement created by Ecto:
INSERT INTO "products1" ("name","price","status","inserted_at","updated_at")
VALUES ($1,$2,$3,$4,$5)
ON CONFLICT ("name") DO
UPDATE SET "price" = EXCLUDED."price"
RETURNING "id"
["Flying V Guitar", 599.0, "ACTIVE", ~N[2020-07-31 15:28:37], ~N[2020-07-31 15:28:37]]
And next, the same call but with the option returning: true
on:
INSERT INTO "products1" ("name","price","status","inserted_at","updated_at")
VALUES ($1,$2,$3,$4,$5)
ON CONFLICT ("name") DO
UPDATE SET "price" = EXCLUDED."price"
RETURNING "id", "updated_at", "inserted_at", "status", "price", "name"
["Flying V Guitar", 599.0, "ACTIVE", ~N[2020-07-31 15:28:57], ~N[2020-07-31 15:28:57]]
As we can observe Ecto adds RETURNING "id"
by default, and if we want fresh data we basically need to tell postgres which columns that we are interested and Ecto will handle that for us.
Product2
with Ecto UUID
Product2
uses a binary id and this id is managed by Ecto instead of PostgreSQL. In order to do that Ecto generates a random uuid and adds it into the insert statement. Let's see what happens if we run the same flow:
iex > insert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 550.0}
iex > {:ok, inserted_p2} = %Product2{} |>
... > Product2.changeset(insert_attrs) |>
... > Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name)
{:ok, %HelloWorld.Ecommerce.Product2{
id: "5957ab17-4866-45f7-8228-dedee25be2e6",
name: "Flying V Guitar",
price: 550.0,
status: "ACTIVE",
}}
This was the insert and so far so good. Moving on:
iex > update_attrs = %{status: "OUT_OF_STOCK"}
iex > {:ok, _p2} = inserted_p2 |> Product2.changeset(update_attrs) |> Repo.update()
And the UPSERT again:
iex > upsert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 599.0}
iex > {:ok, updated_p2} = %Product2{} |>
... > Product2.changeset(upsert_attrs) |>
... > Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name)
{:ok, %HelloWorld.Ecommerce.Product2{
id: "9be668fd-8b28-4c46-a4f3-e24f6d799636",
name: "Flying V Guitar",
price: 599.0,
status: "ACTIVE",
}}
Here we can notice the same problemas as before, so the price got updated, the status didn't get an update but it returned a bad value as we are not using the returning: true
just yet. But the major thing that got my attention is that the id is a different value. This is a very confusing problem. In this case I am pretty sure that PostgreSQL didn't create a new role, as I have a database contraint that prevents that, but the id should be the same. The problem is seen clearly in the generated SQL:
INSERT INTO "products2" ("name","price","status","inserted_at","updated_at","id")
VALUES ($1,$2,$3,$4,$5,$6)
ON CONFLICT ("name") DO
UPDATE SET "price" = EXCLUDED."price"
["Flying V Guitar", 599.0, "ACTIVE", ~N[2020-07-31 15:32:29], ~N[2020-07-31 15:32:29], <<35, 236, 107, 115, 150, 61, 76, 102, 135, 9, 249, 201, 106, 106, 157, 77>>]
For some reason Ecto is not adding RETURNING "id"
into this statement.
The whole problem here is that we cannot trust that returned id anymore. The id will be there, but it won't be in the database. This was an open eyes event for me.
In any case let's try again but now with returning: true
:
iex > {:ok, updated_p2} = %Product2{} |>
... > Product2.changeset(upsert_attrs) |>
... > Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name, returning: true)
{:ok, %HelloWorld.Ecommerce.Product2{
id: "5957ab17-4866-45f7-8228-dedee25be2e6",
name: "Flying V Guitar",
price: 599.0,
status: "OUT_OF_STOCK",
}}
Fantastic! This worked the same way as in the Product1
and we can see that this id
is the same as the one returned in the first call of this type.
Product3
with PostgreSQL UUID
Product3
is the one that uses a binary UUID as the id field but this field is autogenerated by PostgreSQL through an Extension called uuid-ossp
. As this is not generated by Ecto I'd expect a similar behavior to the Product1
, so let's see:
iex > insert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 550.0}
iex > {:ok, inserted_p3} = %Product3{} |>
... > Product3.changeset(insert_attrs) |>
... > Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name)
{:ok, %HelloWorld.Ecommerce.Product3{
id: nil,
name: "Flying V Guitar",
price: 550.0,
status: "ACTIVE",
}}
Wow, id
is nil? I can't even tell by that if the command succeed or not. Luckly we have an unique constraint on :name
, let's use it:
iex > Repo.one(from p in Product3, where: p.name == "Flying V Guitar")
%HelloWorld.Ecommerce.Product3{
id: "725baa3a-d638-4ab5-af3c-b521a1400447",
name: "Flying V Guitar",
price: 550.0,
status: "ACTIVE",
}
Ok, so we could get the product created, that's good news. The problem might be again in the RETURNING "id"
part, so here's the generated prepared statement:
INSERT INTO "ecommerce_products3" ("name","price","status","inserted_at","updated_at")
VALUES ($1,$2,$3,$4,$5)
ON CONFLICT ("name") DO
UPDATE SET "price" = EXCLUDED."price"
["Flying V Guitar", 550.0, "ACTIVE", ~N[2020-08-01 03:00:53], ~N[2020-08-01 03:00:53]]
So apparently Ecto does not add RETURNING "id"
as default in this scenario as well. So at this point I removed manually the last row created because I want to try it again with returning: true
:
iex > insert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 550.0}
iex > {:ok, inserted_p3} = %Product3{} |>
... > Product3.changeset(insert_attrs) |>
... > Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name, returning: true)
{:ok, %HelloWorld.Ecommerce.Product3{
id: "2f1a629a-10e7-4c97-94dd-9c618341abaa",
name: "Flying V Guitar",
price: 550.0,
status: "ACTIVE",
}}
OK, this seems it have worked just as we expect. Let's continue:
iex > update_attrs = %{status: "OUT_OF_STOCK"}
iex > {:ok, _p3} = inserted_p3 |> Product3.changeset(update_attrs) |> Repo.update()
For the last UPSERT call I will call with returning: true
again as we know the problem already:
iex > upsert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 599.0}
iex > {:ok, updated_p3} = %Product3{} |>
... > Product3.changeset(upsert_attrs) |>
... > Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name, returning: true)
{:ok, %HelloWorld.Ecommerce.Product3{
id: "2f1a629a-10e7-4c97-94dd-9c618341abaa",
name: "Flying V Guitar",
price: 599.0,
status: "OUT_OF_STOCK",
}}
This command worked as expected.
Conclusion
In order to use UPSERT in Ecto with PostgreSQL we need to understand how this powerfull command will behave in our application. As I showed in these 3 scenarios using Ecto in the wrong way could be confusing for the developer and introduce bugs in the app.
My big take away from this is to start using returning: true
in every UPSERT call as default. This way we can avoid stale and wrong data and ids. The performance implications of using returning: true
seems to be insignificant and the benefits are beyond great.
That's all for today, I hope you have enjoyed and thanks for reading!
At Hashrocket, we love Elixir and Phoenix! Reach out if you need help with your Elixir projects!
[ecto upsert]: https://hexdocs.pm/ecto/Ecto.Repo.html#c:insert/2-upserts [postgresql upsert]: https://hashrocket.com/blog/posts/upsert-records-with-postgresql-9-5
Photo by Aperture Vintage on Unsplash