PostgreSQL
Generate Dates in PostgreSQL
PostgreSQL has shipped with the generate_series()
function for quite some
time. This function is often used to generate series of numeric
data. For
instance, in my post, Understanding Common Table Expressions with
FizzBuzz,
I used it to generate a series of integers from 1 to 100.
As of PostgreSQL 8.4, support for generating series of timestamp data was added. I don't see this used often, so let's take a look at it.
We could take a peak at the online documentation to see the function
signature, or we could pull it up in psql
using \df
.
> \df generate_series()
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+-----------------+-----------------------------------+--------------------------------------------------------------------+--------
pg_catalog | generate_series | SETOF bigint | bigint, bigint | normal
pg_catalog | generate_series | SETOF bigint | bigint, bigint, bigint | normal
pg_catalog | generate_series | SETOF integer | integer, integer | normal
pg_catalog | generate_series | SETOF integer | integer, integer, integer | normal
pg_catalog | generate_series | SETOF numeric | numeric, numeric | normal
pg_catalog | generate_series | SETOF numeric | numeric, numeric, numeric | normal
pg_catalog | generate_series | SETOF timestamp with time zone | timestamp with time zone, timestamp with time zone, interval | normal
pg_catalog | generate_series | SETOF timestamp without time zone | timestamp without time zone, timestamp without time zone, interval | normal
The last two records are what we are looking for -- support for timestamps with and without time zones.
Notice that each requires three arguments. The first two are the lower and upper bound timestamps of the series to be generated. The interval specifies what amount of spacing to put between each timestamp when generating the series.
Let's try it out by generating the series of days in 2017.
> select generate_series(
(date '2017-01-01')::timestamp,
(date '2017-12-31')::timestamp,
interval '1 day'
);
generate_series
---------------------
2017-01-01 00:00:00
2017-01-02 00:00:00
2017-01-03 00:00:00
2017-01-04 00:00:00
2017-01-05 00:00:00
2017-01-06 00:00:00
2017-01-07 00:00:00
2017-01-08 00:00:00
2017-01-09 00:00:00
2017-01-10 00:00:00
2017-01-11 00:00:00
2017-01-12 00:00:00
...
Take note that we have to satisfy the function signature, so we create a
date using date
and then coerce it to a timestamp
. There are 365
results, so I've truncate them a bit.
PostgreSQL understands the calendar, so you can even count on proper handling of concepts like a leap year.
> select * from (
select generate_series(
(date '2020-01-01')::timestamp,
(date '2020-12-31')::timestamp,
interval '1 day'
)
) as twenty_twenty(d)
where date_part('month', twenty_twenty.d) = 2;
d
---------------------
2020-02-01 00:00:00
2020-02-02 00:00:00
2020-02-03 00:00:00
2020-02-04 00:00:00
2020-02-05 00:00:00
2020-02-06 00:00:00
2020-02-07 00:00:00
2020-02-08 00:00:00
2020-02-09 00:00:00
2020-02-10 00:00:00
2020-02-11 00:00:00
2020-02-12 00:00:00
2020-02-13 00:00:00
2020-02-14 00:00:00
2020-02-15 00:00:00
2020-02-16 00:00:00
2020-02-17 00:00:00
2020-02-18 00:00:00
2020-02-19 00:00:00
2020-02-20 00:00:00
2020-02-21 00:00:00
2020-02-22 00:00:00
2020-02-23 00:00:00
2020-02-24 00:00:00
2020-02-25 00:00:00
2020-02-26 00:00:00
2020-02-27 00:00:00
2020-02-28 00:00:00
2020-02-29 00:00:00
We repurposed our previous query for the year 2020. After wrapping it in a
subquery, we are able to filter what we are looking at, in this case, just
the month of February (i.e. when the month is 2
). Notice it has 29 days!
We've only look at intervals of '1 day'
so far, but we can do whatever
interval we want. How about every 3 days?
> select generate_series(
(date '2017-01-01')::timestamp,
(date '2017-12-31')::timestamp,
interval '3 days'
);
generate_series
---------------------
2017-01-01 00:00:00
2017-01-04 00:00:00
2017-01-07 00:00:00
2017-01-10 00:00:00
2017-01-13 00:00:00
2017-01-16 00:00:00
2017-01-19 00:00:00
2017-01-22 00:00:00
...
Or every month?
> select generate_series(
(date '2017-01-01')::timestamp,
(date '2017-12-31')::timestamp,
interval '1 month'
);
generate_series
---------------------
2017-01-01 00:00:00
2017-02-01 00:00:00
2017-03-01 00:00:00
2017-04-01 00:00:00
2017-05-01 00:00:00
2017-06-01 00:00:00
2017-07-01 00:00:00
2017-08-01 00:00:00
2017-09-01 00:00:00
2017-10-01 00:00:00
2017-11-01 00:00:00
2017-12-01 00:00:00
We can even use a really odd interval of time.
> select generate_series(
(date '2017-01-01')::timestamp,
(date '2017-12-31')::timestamp,
interval '1 month 1 day 1 hour'
);
generate_series
---------------------
2017-01-01 00:00:00
2017-02-02 01:00:00
2017-03-03 02:00:00
2017-04-04 03:00:00
2017-05-05 04:00:00
2017-06-06 05:00:00
2017-07-07 06:00:00
2017-08-08 07:00:00
2017-09-09 08:00:00
2017-10-10 09:00:00
2017-11-11 10:00:00
2017-12-12 11:00:00
PostgreSQL's date and timestamp capabilities are quite powerful. What we've
looked at above can be put to great use in report generation. For instance,
if you need aggregate data for certain intervals of time, you can join your
timestamped data against any generate_series
table. Instead of pulling
large amounts of data into application land for processing, it can be done
in the database. Your CPU and your users will thank you.
Cover image by bady qb on Unsplash.com