Hashrocket.com / blog

Faster JSON Generation with PostgreSQL

Written by Jack Christensen

A new feature in PostgreSQL 9.2 is JSON support. It includes a JSON data type and two JSON functions. These allow us to return JSON directly from the database server. This article covers how it is done and includes a benchmark comparing it with traditional Rails JSON generation techniques.

How To

The simplest way to return JSON is with the row_to_json() function. It accepts a row value and returns a JSON value.

    select row_to_json(words) from words;

This returns one result row, containing a single JSON column, for each row in the words table.

    {"id":6013,"text":"advancement","pronunciation":"advancement",...}

However, sometimes we only want to include some columns in the JSON instead of the entire row. In theory we could use the row constructor.

    select row_to_json(row(id, text)) from words;

While this does return only the id and text columns, it unfortunately loses the field names and replaces them with f1, f2, f3, etc.

    {"f1":6013,"f2":"advancement"}

To work around this we must either create a row type and cast the row to that type or use a subquery. A subquery will typically be easier.

    select row_to_json(t)
    from (
      select id, text from words
    ) t

This produces the JSON output we want:

    {"id":6013,"text":"advancement"}
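
The row type alternative mentioned above could look something like the following sketch. The type name word_summary is hypothetical, and the column types are assumptions, since the words schema is not shown in this article.

```sql
-- Hypothetical composite type; the column types are assumed, adjust to the real schema.
create type word_summary as (id integer, text text);

-- Casting the anonymous row to the named type restores the field names.
select row_to_json(row(id, text)::word_summary)
from words;
```

This keeps the query itself short, at the cost of maintaining a separate type for each JSON shape, which is why the subquery is usually the easier choice.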

The other commonly used technique is array_agg and array_to_json. array_agg is an aggregate function like sum or count. It aggregates its argument into a PostgreSQL array. array_to_json takes a PostgreSQL array and flattens it into a single JSON value.

    select array_to_json(array_agg(row_to_json(t)))
    from (
      select id, text from words
    ) t

This will result in a JSON array of objects:

    [{"id":6001,"text":"abaissed"},{"id":6002,"text":"abbatial"},{"id":6003,"text":"abelia"},...]
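
One caveat is worth a sketch. It comes from how PostgreSQL aggregates behave in general, not from the benchmark: array_agg returns NULL when no rows match, so the query above yields NULL rather than an empty JSON array. Wrapping it in coalesce is one possible way to handle that. The where clause below is a made-up filter, assumed to match nothing.

```sql
-- array_agg over zero rows is NULL, and array_to_json(NULL) is NULL as well,
-- so fall back to an empty JSON array.
select coalesce(
  array_to_json(array_agg(row_to_json(t))),
  '[]'::json
)
from (
  select id, text from words
  where text = 'no-such-word'  -- hypothetical filter, assumed to match no rows
) t;
```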

In exchange for a substantial jump in complexity, we can also use subqueries to return an entire object graph:

    select row_to_json(t)
    from (
      select text, pronunciation,
        (
          select array_to_json(array_agg(row_to_json(d)))
          from (
            select part_of_speech, body
            from definitions
            where word_id=words.id
            order by position asc
          ) d
        ) as definitions
      from words
      where text = 'autumn'
    ) t

This could return a result like the following:

    {
      "text": "autumn",
      "pronunciation": "autumn",
      "definitions": [
        {
          "part_of_speech": "noun",
          "body": "skilder wearifully uninfolded..."
        },
        {
          "part_of_speech": "verb",
          "body": "intrafissural fernbird kittly..."
        },
        {
          "part_of_speech": "adverb",
          "body": "infrugal lansquenet impolarizable..."
        }
      ]
    }

Obviously, the SQL to generate this JSON response is far more verbose than generating it in Ruby. Let's see what we get in exchange.

Benchmarks

I created a sample benchmark application to test multiple JSON generation approaches. The sample domain is a dictionary. The source is at https://github.com/JackC/json_api_bench.

The first test is of an extremely lightweight auto-complete search. The result set is simply an array of strings. I tested three approaches: loading the entire ActiveRecord domain model, using pluck, and using PostgreSQL (view source).

+-----------------------------+----------+
| Name                        | Reqs/Sec |
+-----------------------------+----------+
| Quick Search Domain         | 467.84   |
| Quick Search Pluck          | 496.89   |
| Quick Search PostgreSQL     | 540.54   |
+-----------------------------+----------+

Pluck should probably be the preferred approach in this case. While PostgreSQL is about 9% faster than pluck, the code is less clear.
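
For a sense of what the PostgreSQL side of such a search involves, here is a minimal sketch. It is not the benchmark's query, which lives in the linked repository and may differ. The 'aut%' prefix and the limit of 10 are made-up illustration values.

```sql
-- Sketch only: returns a JSON array of strings such as ["autumn", ...].
-- The search prefix and limit are assumptions, not taken from the benchmark.
select array_to_json(array_agg(t.text))
from (
  select text
  from words
  where text like 'aut%'
  order by text
  limit 10
) t;
```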

The next test is of a slightly richer word search. It returns an array of objects that each include text, pronunciation, part of speech, and definition (view source).

+-----------------------------+----------+
| Name                        | Reqs/Sec |
+-----------------------------+----------+
| Rich Search Domain          | 322.58   |
| Rich Search Select All      | 418.85   |
| Rich Search PostgreSQL      | 500.00   |
+-----------------------------+----------+

In this case, select_all should still usually be preferred over PostgreSQL. The loss of clarity is not worth the 19% performance increase.
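
A query for this kind of response might look like the sketch below. It assumes the words and definitions tables used in the How To section, joined on word_id, and it aliases body to definition. The search prefix and limit are hypothetical, and the benchmark's actual query may be structured differently.

```sql
-- Sketch only: one JSON object per word/definition pair.
-- The join, the 'aut%' prefix, and the limit are assumptions for illustration.
select array_to_json(array_agg(row_to_json(t)))
from (
  select w.text, w.pronunciation, d.part_of_speech, d.body as definition
  from words w
  join definitions d on d.word_id = w.id
  where w.text like 'aut%'
  order by w.text, d.position
  limit 10
) t;
```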

Now we get to a test of an entire object graph for a word. This returns an object with text, pronunciation, an array of definitions, an array of quotes, an array of synonyms, and an array of antonyms (view source).

+-----------------------------+----------+
| Name                        | Reqs/Sec |
+-----------------------------+----------+
| Definition Domain           | 130.72   |
| Definition PostgreSQL       | 457.14   |
+-----------------------------+----------+

Now things start to get more favorable for PostgreSQL. In exchange for a substantial block of SQL we get roughly a 3.5x throughput increase.

Finally, we run a test of returning multiple definitions per call. This is a more synthetic benchmark to exercise heavyweight API responses (view source).

+-----------------------------+----------+
| Name                        | Reqs/Sec |
+-----------------------------+----------+
| Many Definitions Domain     | 25.82    |
| Many Definitions PostgreSQL | 330.58   |
+-----------------------------+----------+

For a large JSON object graph, PostgreSQL JSON generation offered about 12.8x the throughput in this test.
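
The speedup figures can be recomputed from the tables above. The sketch below copies the Reqs/Sec values and divides the PostgreSQL result by the fastest Rails approach in each test. The column names and the rounding to two decimals are my own choices.

```sql
-- Throughput ratio of PostgreSQL over the fastest Rails approach in each test.
select name,
       round(postgresql / baseline, 2) as speedup
from (values
  ('Quick Search (vs. Pluck)',      496.89, 540.54),  -- 1.09
  ('Rich Search (vs. Select All)',  418.85, 500.00),  -- 1.19
  ('Definition (vs. Domain)',       130.72, 457.14),  -- 3.50
  ('Many Definitions (vs. Domain)',  25.82, 330.58)   -- 12.80
) as b(name, baseline, postgresql);
```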

Conclusions

In every test here, PostgreSQL was faster than traditional Rails JSON generation, but its code was also more verbose. For simple responses that do not involve nested objects, the performance gain is insufficient to warrant the loss in code clarity. As the object graph increases in size and complexity, the performance gains become more and more attractive.

Posted in Development and tagged with PostgreSQL