Heading image for post: Running Out Of IDs

PostgreSQL

Running Out Of IDs

Profile picture of Josh Branchaud

People have fingerprints. Products have bar codes. Cars have vehicle identification numbers (VINs). Internet connected devices have IP addresses. Anything that needs to be uniquely identifiable tends to have a mechanism for uniquely identifying it. It makes sense. This is particularly important for the data that we store as part of the applications we build. Every user, every product, every financial transaction should all have some kind of identifier that we can use to uniquely identify it.

The limits of these identifiers is the subject of this post.

Though we can utilize fancier schemes like UUIDs and composite keys, the most common approach to uniquely identifying records is an id -- an integer that we guarantee is unique. But where do these unique integer ids come from?

A Sequence

In PostgreSQL, we generally create an id column as a primary key using serial.

create table products (
  id serial primary key,
  ...
);

The serial numeric data type is what we call an auto-incrementing integer. Without going into all of the details of what serial does, I will point out that upon creating our table, we also create a sequence -- products_id_seq.

A sequence is a special kind of table with a single-row that we use to keep track of what unique integer we should assign next when adding a record to a table.

Let's take a look at the description of our new products_id_seq.

> \d products_id_seq
       Sequence "public.products_id_seq"
    Column     |  Type   |        Value
---------------+---------+---------------------
 sequence_name | name    | products_id_seq
 last_value    | bigint  | 1
 start_value   | bigint  | 1
 increment_by  | bigint  | 1
 max_value     | bigint  | 9223372036854775807
 min_value     | bigint  | 1
 cache_value   | bigint  | 1
 log_cnt       | bigint  | 0
 is_cycled     | boolean | f
 is_called     | boolean | f
Owned by: public.products.id

The sequence tells us a bunch of things. The parts that are interesting to us right now are last_value and increment_by. But before we can make any sense of those two values, we need to check out the nextval function.

Why is it that we care about the nextval function? Well, it is being used to generate the default value for our id column.

> \d products
                         Table "public.products"
 Column |  Type   |                       Modifiers
--------+---------+-------------------------------------------------------
 id     | integer | not null default nextval('products_id_seq'::regclass)
 ...

Each time we insert a record into our products table (without specifying the id), nextval is invoked to determine the next available value in our sequence. The name of the sequence to be used is specified as the argument. nextval looks at the last_value and increment_by columns for the specified sequence to compute the next value that it should return.

When nextval is invoked for the first time, the is_called column is still false, so the start_value (in this case, 1) is returned. The is_called column is then toggled to true. Now that is_called is true, all subsequent invocations of nextval on that sequence will add the value of the increment_by column to the value of the last_value column. The last_value column is updated accordingly and this value is returned to whoever called it -- in our case, it is used as the default value for our id column in products.

If we wanted to do something fancy and non-conventional with our sequence, we could alter parts of it. For example, if we only wanted to assign even numbered id values, we could set the start_value to be 2 and change the increment_by value to be 2. In this case, each invocation of nextval would yield an even number, monotonically increasing 2 at a time.

We could even setup a sequence that starts at 1,000,000 and counts backward by 1 until it reaches -1,000,000. I can't imagine why, but it is nice to know we have the flexibility.

Generally speaking, we can utilize our sequence and the default value of our id column as is.

Exceeding A Sequence

Another column that appears in the description of our sequence is max_value. A little curiosity may lead us to wonder, "What happens when we reach that max_value?"

Let's find out.

Counting to and exceeding 9223372036854775807 (a number I don't know how to pronounce) is going to take longer than you or I have time for. To make this experiment a bit more manageable, let's alter our sequence a bit.

> alter sequence products_id_seq maxvalue 3;
ALTER SEQUENCE

The max_value of our sequence is now 3. Let's do a couple inserts.

> insert into products default values;
INSERT 0 1

> insert into products default values;
INSERT 0 1

> insert into products default values;
INSERT 0 1

Three insertions means we've reached our max_value of 3, so what happens if we do one more insert?

> insert into products default values;
ERROR:  nextval: reached maximum value of sequence "products_id_seq" (3)

Reaching the "maximum value" of our sequence is a real problem in a production system. This means we will no longer be able to insert records into the relevant table until we've dealt with the issue.

How do we remedy this situation? Well, that is a topic for another blog post.

A better question for now is, how likely or practical is it that we do exceed our sequence?

9 Quintillion

I've since looked up the pronunciation. A bigint can support counting up to about 9.2 quintillion. That is a lot of records.

So, you are concerned about reaching 9.2 quintillion unique IDs for the records in your database? That is kind of like being concerned about reaching $9.2 quintillion in your bank account. Generally speaking, this is what you might call "a nice problem to have." In other words, running a business that is pushing the bounds of your software, hardware, etc. hopefully means business is good.

But what does this look like in practical terms?

Let's say business is so good that we are inserting 10,000 records per minute into our table. So, how long would it take to max out our sequence? Let's compute it in years.

> select 9200000000000000000 / (10000 * 365 * 24 * 60::bigint) as years;
   years
------------
 1750380517

That's a lot of years.

This means that the size of your storage, the write speed of your hardware, and the lifespan of both your business and the human race are all considerations before exceeding the sequence.

There is one minor issue to consider though.

When you generate a sequence as we did at the beginning of this post, using serial, a bigint (64 bit) is used by the sequence, but the id column only utilizes integer (32 bit). Let's do the same math, but compute it in weeks.

> select 2100000000 / (10000 * 7 * 24 * 60::bigint) as weeks;
 weeks
-------
    20

Twenty weeks is a much different story.

So, when creating tables using serial, we may not be in danger of exceeding our sequence, but if we have serious amounts of incoming data, we may max out our id column.

Closing Thoughts

Running out of IDs is probably not a concern in most web applications, but it is important to know the limitations of the software on which you depend. Keep on using serial for most use cases and keep bigserial in your back pocket if a real need arises.

Also, don't forget that the UUID type is a solid option for primary keys, even at scale.

More posts about sql PostgreSQL