Running Out Of IDs
People have fingerprints. Products have bar codes. Cars have vehicle identification numbers (VINs). Internet-connected devices have IP addresses. Anything that needs to be uniquely identifiable tends to have a mechanism for uniquely identifying it. It makes sense. This is particularly important for the data that we store as part of the applications we build. Every user, every product, and every financial transaction should have some kind of identifier that we can use to uniquely identify it.
The limits of these identifiers are the subject of this post.
Though we can utilize fancier schemes like UUIDs and composite keys, the most common approach to uniquely identifying records is an id -- an integer that we guarantee is unique. But where do these unique integers come from?

In PostgreSQL, we generally create an id column as a primary key using the serial data type:

```
create table products (
  id serial primary key,
  ...
);
```
The serial data type is what we call an auto-incrementing integer. Without going into all of the details of what serial does, I will point out that upon creating our table, we also create a sequence -- in this case, products_id_seq. A sequence is a special kind of table with a single row that we use to keep track of what unique integer we should assign next when adding a record to a table.
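Behind the scenes, declaring a column as serial is roughly shorthand for the following (a sketch -- PostgreSQL also marks the sequence as owned by the column, so dropping the column drops the sequence):

```sql
create sequence products_id_seq;

create table products (
  id integer primary key default nextval('products_id_seq'),
  ...
);

alter sequence products_id_seq owned by products.id;
```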
Let's take a look at the description of our new sequence.
```
> \d products_id_seq
          Sequence "public.products_id_seq"
    Column     |  Type   |        Value
---------------+---------+---------------------
 sequence_name | name    | products_id_seq
 last_value    | bigint  | 1
 start_value   | bigint  | 1
 increment_by  | bigint  | 1
 max_value     | bigint  | 9223372036854775807
 min_value     | bigint  | 1
 cache_value   | bigint  | 1
 log_cnt       | bigint  | 0
 is_cycled     | boolean | f
 is_called     | boolean | f
Owned by: public.products.id
```
The sequence tells us a bunch of things. The parts that are interesting to us right now are last_value and increment_by. But before we can make any sense of those two values, we need to check out the nextval function.

Why is it that we care about the nextval function? Well, it is being used to generate the default value for our id column:

```
> \d products
                            Table "public.products"
 Column |  Type   |                       Modifiers
--------+---------+--------------------------------------------------------
 id     | integer | not null default nextval('products_id_seq'::regclass)
 ...
```
Each time we insert a record into our products table (without specifying an id value), nextval is invoked to determine the next available value in our sequence. The name of the sequence to be used is specified as the argument to nextval. nextval looks at the last_value and increment_by columns for the specified sequence to compute the next value that it should return.
When nextval is invoked for the first time, the is_called column is false, so the start_value (in this case, 1) is returned. The is_called column is then toggled to true. Now that is_called is true, all subsequent invocations of nextval on that sequence will add the value of the increment_by column to the value of the last_value column. The last_value column is updated accordingly and this value is returned to whoever called it -- in our case, it is used as the default value for our id column in the products table.
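We can watch this happen by invoking nextval by hand. As a sketch, using a scratch sequence (demo_seq is a made-up name, so we don't burn values from products_id_seq), the first two calls on a fresh sequence should look like this:

```
> create sequence demo_seq;
CREATE SEQUENCE

> select nextval('demo_seq');
 nextval
---------
       1

> select nextval('demo_seq');
 nextval
---------
       2

> select last_value, is_called from demo_seq;
 last_value | is_called
------------+-----------
          2 | t
```

The first call returns the start_value and flips is_called; the second adds increment_by to last_value.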
If we wanted to do something fancy and non-conventional with our sequence, we could alter parts of it. For example, if we only wanted to assign even id values, we could set the start_value to 2 and change the increment_by value to 2. In this case, each invocation of nextval would yield an even number, monotonically increasing by 2 each time. We could even set up a sequence that starts at 1,000,000 and counts down by 1 until it reaches -1,000,000. I can't imagine why, but it is nice to know we have the flexibility.
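As a sketch, both of those sequences can be declared directly (the sequence names here are made up for illustration):

```sql
-- Even ids only: start at 2 and step by 2.
create sequence evens_seq start 2 increment 2;

-- Count down from 1,000,000 to -1,000,000, one at a time.
-- A descending sequence needs an explicit maxvalue at or above its start.
create sequence countdown_seq
  start 1000000
  increment -1
  minvalue -1000000
  maxvalue 1000000;
```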
Generally speaking, we can utilize our sequence and the default value of our
id column as is.
Exceeding A Sequence
Another column that appears in the description of our sequence is max_value. A little curiosity may lead us to wonder, "What happens when we reach the max_value of our sequence?"

Let's find out.
Counting to and exceeding
9223372036854775807 (a number I don't know how
to pronounce) is going to take longer than you or I have time for. To make
this experiment a bit more manageable, let's alter our sequence a bit.
```
> alter sequence products_id_seq maxvalue 3;
ALTER SEQUENCE
```
The max_value of our sequence is now 3. Let's do a couple of inserts.
```
> insert into products default values;
INSERT 0 1
> insert into products default values;
INSERT 0 1
> insert into products default values;
INSERT 0 1
```
Three insertions means we've reached our max_value of 3, so what happens if we do one more?

```
> insert into products default values;
ERROR:  nextval: reached maximum value of sequence "products_id_seq" (3)
```
Reaching the "maximum value" of our sequence is a real problem in a production system. This means we will no longer be able to insert records into the relevant table until we've dealt with the issue.
How do we remedy this situation? Well, that is a topic for another blog post.
A better question for now is: how likely is it, practically speaking, that we exceed our sequence?
I've since looked up the pronunciation. A
bigint can support counting up
to about 9.2 quintillion. That is a lot of records.
So, you are concerned about reaching 9.2 quintillion unique IDs for the records in your database? That is kind of like being concerned about reaching $9.2 quintillion in your bank account. Generally speaking, this is what you might call "a nice problem to have." In other words, running a business that is pushing the bounds of your software, hardware, etc. hopefully means business is good.
But what does this look like in practical terms?
Let's say business is so good that we are inserting 10,000 records per minute into our table. So, how long would it take to max out our sequence? Let's compute it in years.
```
> select 9200000000000000000 / (10000 * 365 * 24 * 60::bigint) as years;
   years
------------
 1750380517
```
That's a lot of years.
At that point, the size of your storage, the write speed of your hardware, and the lifespan of both your business and the human race are all bigger concerns than exceeding the sequence.
There is one minor issue to consider though. When you generate a sequence as we did at the beginning of this post, the sequence itself uses bigint (64 bits), but the id column it feeds is an integer (32 bits), which tops out around 2.1 billion. Let's do the same math, but compute it in weeks.

```
> select 2100000000 / (10000 * 7 * 24 * 60::bigint) as weeks;
 weeks
-------
    20
```
Twenty weeks is a much different story.
So, when creating tables using serial, we may not be in danger of exceeding our sequence, but if we have serious amounts of incoming data, we may max out our integer id column.

Running out of IDs is probably not a concern in most web applications, but it is important to know the limitations of the software on which you depend. Keep on using serial for most use cases and keep bigserial in your back pocket if a real need arises.
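Reaching for bigserial is just a matter of swapping the type at creation time -- a sketch (an existing integer column can also be widened later with an alter table, though that rewrite can be expensive on a large table):

```sql
create table products (
  id bigserial primary key,  -- id is a bigint column instead of an integer
  ...
);
```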
Also, don't forget that the UUID type is a solid option for primary keys, even at scale.
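As a sketch, a UUID primary key might look like this (gen_random_uuid() is built in as of PostgreSQL 13; on earlier versions it comes from the pgcrypto extension):

```sql
create table products (
  id uuid primary key default gen_random_uuid(),
  ...
);
```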