Elixir PostgreSQL

Ecto UPSERT for different Primary Keys in Elixir

by Vinicius Negrisolo on August 4, 2020

Let's take a look into Ecto's implementation of PostgreSQL Upsert and how different types of primary keys could change the results of this command. Spoiler alert, you could be surprised by non expected returns, so bare with me.

A couple of days ago I found a strange behavior with the return of an "UPSERT" Ecto command. More specifically to the id. I was intrigued by that and I started to investigate if the type of the primary key would have a different behavior in this command.

Before jumping into that, there's no UPSERT command in SQL nor in Ecto. Don't expect this feature to be present in all database systems, and don't expect that the databases that implement that will implement using the same syntax. This is just a common name that developers use to describe an atomic "INSERT OR UPDATE" SQL command.

PostgreSQL, for instance, introduced it on the 9.5 version and it takes the form of INSERT ... ON CONFLICT ... UPDATE .... As we can see this is a regular INSERT command, with some kind of checking for existing data based on conflicts and then, if that happens, there's an UPDATE section. All in a single SQL command. You can find out more information about [PostgreSQL UPSERT] in this great post.

MySQL as another example uses a slightly different command, it would be something like: INSERT ... ON DUPLICATE KEY UPDATE. As we can infer, they are not the same SQL statement.

Ecto follows an approach and naming convention very similar to the PostgreSQL implementation. So everytime that I mention UPSERT in this post I mean that INSERT ... ON CONFLICT ... UPDATE... PostgreSQL command.

Ecto UPSERT

In the Ecto land we can use "UPSERT" by calling a regular Ecto.Repo.insert/2 function and set some options that defines the upsert strategy. There are some ways to set this up, so please check the [Ecto UPSERT] for reference. In this post we're taking a look into an specific type of upserts: Insert a row into the database, or if that row exists, then we update a subset of the input.

I choose this strategy because I think that it's very powerful and useful command. It's a great fit to be used when importing or seeding data into a database, as this command can be executed many times for the same data without raising not unique constraints. Also if you want to seed some data, this is also a way to "reset" that data to the seeded one.

The main focus of this post is to discuss if the type of primary key affects the UPSERT command. So let's jump into the problem.

Project Setup

I'll start with a plain phoenix project:

mix phx.new hello_world
cd hello_world
mix ecto.create

To be honest this project could be a simple mix project with ecto setup, but I want to save some setup time and I want to use phoenix generators to create the schemas.

So the elixir and phoenix versions really don't matter for the scope of this, but "ecto" and "postgres" does, so here're the hex packages versions:

"ecto": "3.4.5"
"ecto_sql": "3.4.5"
"postgrex": "0.15.5"

And I am using PostgreSQL 11.

The Primary Keys

So here're the most used different setups for ids using Ecto in PostgreSQL that I can think of:

Integer ID generated by PostgreSQL Sequence;
UUID generated by Ecto;
UUID generated by PostgreSQL (uuid-ossp);

Here are the phoenix commands to generate my schemas:

mix phx.gen.schema Ecommerce.Product1 products1 name:string:unique price:float status:string

mix phx.gen.schema Ecommerce.Product2 products2 name:string:unique price:float status:string --binary-id

mix phx.gen.schema Ecommerce.Product3 products3 name:string:unique price:float status:string --binary-id

The only code changes that I'll make are in the Product3 migration and ecto schema:

defmodule HelloWorld.Repo.Migrations.CreateEcommerceProducts3 do
  use Ecto.Migration

  def change do
+   execute("CREATE EXTENSION IF NOT EXISTS \"uuid-ossp\";")
+
    create table(:products3, primary_key: false) do
-     add :id, :binary_id, primary_key: true
+     add :id, :binary_id, primary_key: true, default: fragment("uuid_generate_v4()")
      add :name, :string
      add :price, :float
      add :status, :string

      timestamps()
    end

    create unique_index(:products3, [:name])
  end
end

and:

defmodule HelloWorld.Ecommerce.Product3 do
  use Ecto.Schema
  import Ecto.Changeset

- @primary_key {:id, :binary_id, autogenerate: true}
+ @primary_key {:id, :binary_id, autogenerate: false}
  @foreign_key_type :binary_id
  schema "products3" do
    field :name, :string
    field :price, :float
    field :status, :string

    timestamps()
  end
end

At this point we have this situation:

Product1 Integer ID generated by PostgreSQL Sequence;
Product2 UUID generated by Ecto;
Product3 UUID generated by PostgreSQL (uuid-ossp);

And we finish this section with:

mix ecto.migrate

Test Plan

My plan to test how Ecto and PostgreSQL will behave for the UPSERT command is simply open the iex console:

iex -S mix

Then we'll alias the modules that we'll be using to avoid repeating ourselves too much:

import Ecto.Query, only: [from: 2]
alias HelloWorld.Ecommerce.Product1
alias HelloWorld.Ecommerce.Product2
alias HelloWorld.Ecommerce.Product3
alias HelloWorld.Repo

Finally for each Product version that we've just created I want to:

Upsert the product with some attributes, this will cause an INSERT
Update the same row
Upsert the product with other attributes, this will cause an UPDATE

I added a regular UPDATE SQL in between the UPSERT calls to simulate that an UPSERT would update only the fields that we are allowing in our strategy, and also, to simulate that the UPSERT would return fresh data.

insert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 550.0}
update_attrs = %{status: "OUT_OF_STOCK"}
upsert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 599.0}

I kept the changes as easy as it can be for the purpose of this comparison.

Next, there's an Ecto UPSERT sample to be used in this post:

Repo.insert(
  product,
  conflict_target: :name,
  on_conflict: {:replace, [:price]},
)

Our UPSERT will try to INSERT a product and if there's a UNIQUE CONSTRAINT violation on the field :name then it will UPDATE only the field :price.

Let's see now what it will happen to each Product version.

`Product1` with Integer ID

Let's start with the first UPSERT for the Product1 schema:

iex > insert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 550.0}

iex > {:ok, inserted_p1} = %Product1{} |>
... >   Product1.changeset(insert_attrs) |>
... >   Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name)

{:ok, %HelloWorld.Ecommerce.Product1{
  id: 1,
  name: "Flying V Guitar",
  price: 550.0,
  status: "ACTIVE",
}}

So far so good, all the data that I specified was returned, including the id generated by the PostgreSQL sequence. Let's move on:

iex > update_attrs = %{status: "OUT_OF_STOCK"}

iex > {:ok, _p1} = inserted_p1 |> Product1.changeset(update_attrs) |> Repo.update()

We just changed that product status to OUT_OF_STOCK, and finally:

iex > upsert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 599.0}

iex > {:ok, updated_p1} = %Product1{} |>
... >   Product1.changeset(upsert_attrs) |>
... >   Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name)

{:ok, %HelloWorld.Ecommerce.Product1{
  id: 1,
  name: "Flying V Guitar",
  price: 599.0,
  status: "ACTIVE",
}}

At this point we can see that the price got successfully changed, but the status was returned as ACTIVE and I expected to be OUT_OF_STOCK as this was an UPSERT that happens to update, and the strategy was to update only the :price field. This might be stale data that comes from the inputed attrs, let's check that:

iex > Repo.get(Product1, updated_p1.id)

%HelloWorld.Ecommerce.Product1{
  id: 1,
  name: "Flying V Guitar",
  price: 599.0,
  status: "OUT_OF_STOCK",
}

Cool, the UPSERT worked well for insert and update with a tiny note that updates will return stale data. For me the best way to solve that is to add another option to the insert/2 call to return fresh data:

iex > upsert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 711.0}

iex > {:ok, updated_p1} = %Product1{} |>
... >   Product1.changeset(upsert_attrs) |>
... >   Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name, returning: true)

{:ok, %HelloWorld.Ecommerce.Product1{
  id: 1,
  name: "Flying V Guitar",
  price: 711.0,
  status: "OUT_OF_STOCK",
}}

That's great! This command behaves exactly how I wanted, the price got updated again, the status has not changed and now it returns fresh data. A quick look into the prepared statement created by Ecto:

INSERT INTO "products1" ("name","price","status","inserted_at","updated_at")
VALUES ($1,$2,$3,$4,$5)
ON CONFLICT ("name") DO
UPDATE SET "price" = EXCLUDED."price"
RETURNING "id"
["Flying V Guitar", 599.0, "ACTIVE", ~N[2020-07-31 15:28:37], ~N[2020-07-31 15:28:37]]

And next, the same call but with the option returning: true on:

INSERT INTO "products1" ("name","price","status","inserted_at","updated_at")
VALUES ($1,$2,$3,$4,$5)
ON CONFLICT ("name") DO
UPDATE SET "price" = EXCLUDED."price"
RETURNING "id", "updated_at", "inserted_at", "status", "price", "name"
["Flying V Guitar", 599.0, "ACTIVE", ~N[2020-07-31 15:28:57], ~N[2020-07-31 15:28:57]]

As we can observe Ecto adds RETURNING "id" by default, and if we want fresh data we basically need to tell postgres which columns that we are interested and Ecto will handle that for us.

`Product2` with Ecto UUID

Product2 uses a binary id and this id is managed by Ecto instead of PostgreSQL. In order to do that Ecto generates a random uuid and adds it into the insert statement. Let's see what happens if we run the same flow:

iex > insert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 550.0}

iex > {:ok, inserted_p2} = %Product2{} |>
... >   Product2.changeset(insert_attrs) |>
... >   Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name)

{:ok, %HelloWorld.Ecommerce.Product2{
  id: "5957ab17-4866-45f7-8228-dedee25be2e6",
  name: "Flying V Guitar",
  price: 550.0,
  status: "ACTIVE",
}}

This was the insert and so far so good. Moving on:

iex > update_attrs = %{status: "OUT_OF_STOCK"}

iex > {:ok, _p2} = inserted_p2 |> Product2.changeset(update_attrs) |> Repo.update()

And the UPSERT again:

iex > upsert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 599.0}

iex > {:ok, updated_p2} = %Product2{} |>
... >   Product2.changeset(upsert_attrs) |>
... >   Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name)

{:ok, %HelloWorld.Ecommerce.Product2{
  id: "9be668fd-8b28-4c46-a4f3-e24f6d799636",
  name: "Flying V Guitar",
  price: 599.0,
  status: "ACTIVE",
}}

Here we can notice the same problemas as before, so the price got updated, the status didn't get an update but it returned a bad value as we are not using the returning: true just yet. But the major thing that got my attention is that the id is a different value. This is a very confusing problem. In this case I am pretty sure that PostgreSQL didn't create a new role, as I have a database contraint that prevents that, but the id should be the same. The problem is seen clearly in the generated SQL:

INSERT INTO "products2" ("name","price","status","inserted_at","updated_at","id")
VALUES ($1,$2,$3,$4,$5,$6)
ON CONFLICT ("name") DO
UPDATE SET "price" = EXCLUDED."price"
["Flying V Guitar", 599.0, "ACTIVE", ~N[2020-07-31 15:32:29], ~N[2020-07-31 15:32:29], <<35, 236, 107, 115, 150, 61, 76, 102, 135, 9, 249, 201, 106, 106, 157, 77>>]

For some reason Ecto is not adding RETURNING "id" into this statement.

The whole problem here is that we cannot trust that returned id anymore. The id will be there, but it won't be in the database. This was an open eyes event for me.

In any case let's try again but now with returning: true:

iex > {:ok, updated_p2} = %Product2{} |>
... >   Product2.changeset(upsert_attrs) |>
... >   Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name, returning: true)

{:ok, %HelloWorld.Ecommerce.Product2{
  id: "5957ab17-4866-45f7-8228-dedee25be2e6",
  name: "Flying V Guitar",
  price: 599.0,
  status: "OUT_OF_STOCK",
}}

Fantastic! This worked the same way as in the Product1 and we can see that this id is the same as the one returned in the first call of this type.

`Product3` with PostgreSQL UUID

Product3 is the one that uses a binary UUID as the id field but this field is autogenerated by PostgreSQL through an Extension called uuid-ossp. As this is not generated by Ecto I'd expect a similar behavior to the Product1, so let's see:

iex > insert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 550.0}

iex > {:ok, inserted_p3} = %Product3{} |>
... >   Product3.changeset(insert_attrs) |>
... >   Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name)

{:ok, %HelloWorld.Ecommerce.Product3{
  id: nil,
  name: "Flying V Guitar",
  price: 550.0,
  status: "ACTIVE",
}}

Wow, id is nil? I can't even tell by that if the command succeed or not. Luckly we have an unique constraint on :name, let's use it:

iex > Repo.one(from p in Product3, where: p.name == "Flying V Guitar")

%HelloWorld.Ecommerce.Product3{
  id: "725baa3a-d638-4ab5-af3c-b521a1400447",
  name: "Flying V Guitar",
  price: 550.0,
  status: "ACTIVE",
}

Ok, so we could get the product created, that's good news. The problem might be again in the RETURNING "id" part, so here's the generated prepared statement:

INSERT INTO "ecommerce_products3" ("name","price","status","inserted_at","updated_at")
VALUES ($1,$2,$3,$4,$5)
ON CONFLICT ("name") DO
UPDATE SET "price" = EXCLUDED."price"
["Flying V Guitar", 550.0, "ACTIVE", ~N[2020-08-01 03:00:53], ~N[2020-08-01 03:00:53]]

So apparently Ecto does not add RETURNING "id" as default in this scenario as well. So at this point I removed manually the last row created because I want to try it again with returning: true:

iex > insert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 550.0}

iex > {:ok, inserted_p3} = %Product3{} |>
... >   Product3.changeset(insert_attrs) |>
... >   Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name, returning: true)

{:ok, %HelloWorld.Ecommerce.Product3{
  id: "2f1a629a-10e7-4c97-94dd-9c618341abaa",
  name: "Flying V Guitar",
  price: 550.0,
  status: "ACTIVE",
}}

OK, this seems it have worked just as we expect. Let's continue:

iex > update_attrs = %{status: "OUT_OF_STOCK"}

iex > {:ok, _p3} = inserted_p3 |> Product3.changeset(update_attrs) |> Repo.update()

For the last UPSERT call I will call with returning: true again as we know the problem already:

iex > upsert_attrs = %{name: "Flying V Guitar", status: "ACTIVE", price: 599.0}

iex > {:ok, updated_p3} = %Product3{} |>
... >   Product3.changeset(upsert_attrs) |>
... >   Repo.insert(on_conflict: {:replace, [:price]}, conflict_target: :name, returning: true)

{:ok, %HelloWorld.Ecommerce.Product3{
  id: "2f1a629a-10e7-4c97-94dd-9c618341abaa",
  name: "Flying V Guitar",
  price: 599.0,
  status: "OUT_OF_STOCK",
}}

This command worked as expected.

Conclusion

In order to use UPSERT in Ecto with PostgreSQL we need to understand how this powerfull command will behave in our application. As I showed in these 3 scenarios using Ecto in the wrong way could be confusing for the developer and introduce bugs in the app.

My big take away from this is to start using returning: true in every UPSERT call as default. This way we can avoid stale and wrong data and ids. The performance implications of using returning: true seems to be insignificant and the benefits are beyond great.

That's all for today, I hope you have enjoyed and thanks for reading!

At Hashrocket, we love Elixir and Phoenix! Reach out if you need help with your Elixir projects!

[ecto upsert]: https://hexdocs.pm/ecto/Ecto.Repo.html#c:insert/2-upserts [postgresql upsert]: https://hashrocket.com/blog/posts/upsert-records-with-postgresql-9-5

Photo by Aperture Vintage on Unsplash