PostgreSQL
Running Out Of IDs
People have fingerprints. Products have bar codes. Cars have vehicle identification numbers (VINs). Internet-connected devices have IP addresses. Anything that needs to be uniquely identifiable tends to have a mechanism for uniquely identifying it. It makes sense. This is particularly important for the data we store as part of the applications we build. Every user, every product, and every financial transaction should have some kind of identifier that we can use to uniquely identify it.
The limits of these identifiers is the subject of this post.
Though we can utilize fancier schemes like UUIDs and composite keys, the most common approach to uniquely identifying records is an id -- an integer that we guarantee is unique. But where do these unique integer ids come from?
A Sequence
In PostgreSQL, we generally create an id column as a primary key using serial.
create table products (
id serial primary key,
...
);
The serial numeric data type is what we call an auto-incrementing integer. Without going into all of the details of what serial does, I will point out that upon creating our table, we also create a sequence -- products_id_seq.
A sequence is a special kind of single-row table that we use to keep track of the next unique integer to assign when adding a record to a table.
Let's take a look at the description of our new products_id_seq.
> \d products_id_seq
Sequence "public.products_id_seq"
Column | Type | Value
---------------+---------+---------------------
sequence_name | name | products_id_seq
last_value | bigint | 1
start_value | bigint | 1
increment_by | bigint | 1
max_value | bigint | 9223372036854775807
min_value | bigint | 1
cache_value | bigint | 1
log_cnt | bigint | 0
is_cycled | boolean | f
is_called | boolean | f
Owned by: public.products.id
The sequence tells us a bunch of things. The parts that are interesting to us right now are last_value and increment_by. But before we can make any sense of those two values, we need to check out the nextval function.
Why is it that we care about the nextval function? Well, it is being used to generate the default value for our id column.
> \d products
Table "public.products"
Column | Type | Modifiers
--------+---------+-------------------------------------------------------
id | integer | not null default nextval('products_id_seq'::regclass)
...
Each time we insert a record into our products table (without specifying the id), nextval is invoked to determine the next available value in our sequence. The name of the sequence to be used is specified as the argument. nextval looks at the last_value and increment_by columns for the specified sequence to compute the next value that it should return.
When nextval is invoked for the first time, the is_called column is still false, so the start_value (in this case, 1) is returned. The is_called column is then toggled to true. Now that is_called is true, all subsequent invocations of nextval on that sequence will add the value of the increment_by column to the value of the last_value column. The last_value column is updated accordingly and this value is returned to whoever called it -- in our case, it is used as the default value for our id column in products.
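We can watch this bookkeeping happen by calling nextval by hand. Here is a minimal sketch using a throwaway sequence (demo_seq is a name made up for this example) so that products_id_seq is left alone:

```sql
-- Create a fresh sequence; is_called starts out false.
create sequence demo_seq;

select nextval('demo_seq');  -- returns 1 (the start_value), flips is_called to true
select nextval('demo_seq');  -- returns 2 (last_value + increment_by)
select currval('demo_seq');  -- returns 2, the last value handed out in this session
```

Note that currval only works after nextval has been called at least once in the current session.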
If we wanted to do something fancy and non-conventional with our sequence, we could alter parts of it. For example, if we only wanted to assign even-numbered id values, we could set the start_value to be 2 and change the increment_by value to be 2. In this case, each invocation of nextval would yield an even number, monotonically increasing 2 at a time.
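As a sketch of that idea (even_seq is a hypothetical standalone sequence, not attached to any table):

```sql
-- A sequence that only ever hands out even numbers.
create sequence even_seq start with 2 increment by 2;

select nextval('even_seq');  -- 2
select nextval('even_seq');  -- 4
```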
We could even set up a sequence that starts at 1,000,000 and counts backward by 1 until it reaches -1,000,000. I can't imagine why, but it is nice to know we have the flexibility.
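That countdown would look something like this (countdown_seq is another made-up name; note that a descending sequence needs an explicit maxvalue at or above its start):

```sql
-- Counts down from 1,000,000 to -1,000,000, one at a time.
create sequence countdown_seq
  increment by -1
  minvalue -1000000
  maxvalue 1000000
  start with 1000000;

select nextval('countdown_seq');  -- 1000000
select nextval('countdown_seq');  -- 999999
```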
Generally speaking, we can utilize our sequence and the default value of our id column as is.
Exceeding A Sequence
Another column that appears in the description of our sequence is max_value. A little curiosity may lead us to wonder, "What happens when we reach that max_value?"
Let's find out.
Counting to and exceeding 9223372036854775807 (a number I don't know how to pronounce) is going to take longer than you or I have time for. To make this experiment a bit more manageable, let's alter our sequence a bit.
> alter sequence products_id_seq maxvalue 3;
ALTER SEQUENCE
The max_value of our sequence is now 3. Let's do a few inserts.
> insert into products default values;
INSERT 0 1
> insert into products default values;
INSERT 0 1
> insert into products default values;
INSERT 0 1
Three insertions mean we've reached our max_value of 3, so what happens if we do one more insert?
> insert into products default values;
ERROR: nextval: reached maximum value of sequence "products_id_seq" (3)
Reaching the "maximum value" of our sequence is a real problem in a production system. This means we will no longer be able to insert records into the relevant table until we've dealt with the issue.
How do we remedy this situation? Well, that is a topic for another blog post.
A better question for now is, how likely or practical is it that we do exceed our sequence?
9 Quintillion
I've since looked up the pronunciation. A bigint can support counting up to about 9.2 quintillion. That is a lot of records.
So, you are concerned about reaching 9.2 quintillion unique IDs for the records in your database? That is kind of like being concerned about reaching $9.2 quintillion in your bank account. Generally speaking, this is what you might call "a nice problem to have." In other words, running a business that is pushing the bounds of your software, hardware, etc. hopefully means business is good.
But what does this look like in practical terms?
Let's say business is so good that we are inserting 10,000 records per minute into our table. So, how long would it take to max out our sequence? Let's compute it in years.
> select 9200000000000000000 / (10000 * 365 * 24 * 60::bigint) as years;
years
------------
1750380517
That's a lot of years.
This means that the size of your storage, the write speed of your hardware, and the lifespan of both your business and the human race are all considerations before exceeding the sequence.
There is one minor issue to consider though.
When you generate a sequence as we did at the beginning of this post, using serial, a bigint (64-bit) is used by the sequence, but the id column only utilizes integer (32-bit). Let's do the same math, but compute it in weeks.
> select 2100000000 / (10000 * 7 * 24 * 60::bigint) as weeks;
weeks
-------
20
Twenty weeks is a much different story.
So, when creating tables using serial, we may not be in danger of exceeding our sequence, but if we have serious amounts of incoming data, we may max out our id column.
Closing Thoughts
Running out of IDs is probably not a concern in most web applications, but
it is important to know the limitations of the software on which you depend.
Keep on using serial for most use cases and keep bigserial in your back pocket if a real need arises.
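A bigserial version of our earlier table would look like this -- the same create table as before, with the remaining columns elided as they were above:

```sql
-- bigserial gives the id column the full bigint (64-bit) range.
create table products (
    id bigserial primary key
    -- ...
);
```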
Also, don't forget that the UUID type is a solid option for primary keys, even at scale.
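For completeness, here is a minimal sketch of a UUID primary key (products_uuid is a made-up name to avoid clashing with the products table above). The gen_random_uuid() function is built into PostgreSQL 13 and later; on older versions it is provided by the pgcrypto extension:

```sql
create extension if not exists pgcrypto;  -- only needed on PostgreSQL < 13

create table products_uuid (
    id uuid primary key default gen_random_uuid()
);
```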