Hashrocket.com / blog

Large jack

16 Tips from the 2014 Winter Miniconf

posted on and written by in

Image 100x100 jack christensen

Hashrocket hosts an internal conference twice a year called Miniconf where we take a day to share ideas with each other. This last Miniconf I gave a quick 20 minute talk with a grab bag of Linux, shell, PostgreSQL, and Ruby tricks. This article is that talk converted to text format.

Autojump

Autojump is a quick way to navigate the file system. It keeps a database of where you have been in the file system and uses that database to jump back based on partial paths. For example, Go projects tend to have deep directory structures. With autojump it is easy to go to a favorite directory.

jack@hk-4~$ j pgx
/Users/jack/dev/go/src/github.com/jackc/pgx
jack@hk-4~/dev/go/src/github.com/jackc/pgx$

Autojump is available via apt-get and homebrew.

Etckeeper

Etckeeper tracks the /etc directory in Git. This eliminates the need to make manual backups of config files before editing. On Debian platforms it hooks into the apt-get process and automatically adds and commits any changes package installation or removal caused.

Shell Brace Expansion

It is often necessary to run a command with multiple arguments that only differ by a few letters. Brace expansion saves key strokes by eliminating the need to retype the duplicate portions. For example, to change the extension on a file:

jack@hk-4/tmp/example$ touch test.foo
jack@hk-4/tmp/example$ mv test.{foo,bar}
jack@hk-4/tmp/example$ ls
test.bar

Brace expansion is supported in bash and several other popular shells.

Shell Command Quick Substitution

It is common to need to run a command very similar to the previous command. Sometimes a sequence of commands are similar, and sometimes simple typo needs to be corrected. Quick substitution performs a find and replace on the previous command and reruns it.

jack@hk-4/tmp/example$ echo 'Hello, world'
Hello, world
jack@hk-4/tmp/example$ ^Hello^Goodbye
echo 'Goodbye, world'
Goodbye, world

Tail Multiple Files

I had used tail for years with the -f option to watch log files, but somehow never realized that it could tail multiple files simultaneously. This is invaluable when you expect something to be logged, but you do not know to what file it will be logged.

jack@hk-4/var/log$ tail -f *

The -n 0 option is sometimes also useful to avoid the initial wall of input from the last ten lines of every file.

Auto-formatting psql Output

The default output of psql can be hard to read when the output rows are wider than the terminal.

jack=# select * from information_schema.columns;
table_catalog |
table_schema    |              table_name               |
column_name             | ordinal_position |              column_default
| is_nullable |        data_type         | character_maximum_length |
character_octet_length | numeric_precision | numeric_precision_radix |
numeric_scale | datetime_precision | interval_type | interval_precision |
character_set_catalog | character_set_schema | character_set_name |
collation_catalog | collation_schema | collation_name | domain_catalog |
domain_schema    |   domain_name   | udt_catalog | udt_schema |   udt_name   |
scope_catalog | scope_schema | scope_name | maximum_cardinality | dtd_identifier
| is_self_referencing | is_identity | identity_generation | identity_start |
identity_increment | identity_maximum | identity_minimum | identity_cycle |
is_generated | generation_expression | is_updatable ---------------+------------
--------+---------------------------------------+-------------------------------
------+------------------+-------------------------------------------+----------
---+--------------------------+--------------------------+----------------------
--+-------------------+-------------------------+---------------+---------------
-----+---------------+--------------------+-----------------------+-------------
---------+--------------------+-------------------+-
<snip>

Fortunately, psql can automatically switch between vertical and horizontal layout based on the width of output.

jack=# \x auto
Expanded display is used automatically.
jack=# select * from information_schema.columns;
-[ RECORD 1 ]------------+------------------------------------------
table_catalog            | jack
table_schema             | public
table_name               | alarms
column_name              | id
ordinal_position         | 1
column_default           | nextval('alarms_id_seq'::regclass)
is_nullable              | NO
data_type                | integer
character_maximum_length |
<snip>

Timing Query Execution with psql

psql makes it easy to measure how long a query takes to run. Simple enable it with \timing.

jack=# \timing
Timing is on.
jack=# select count(*) from information_schema.columns;
 count
-------
  1566
(1 row)

Time: 10.577 ms

Saving psql Settings

It would be nice not to have to manually enable \timing and \x auto every time you load psql. The .psqlrc file is the solution. It is executed on startup by psql.

jack@hk-4~$ cat .psqlrc
\x auto
\timing
jack@hk-4~$ psql
Expanded display is used automatically.
Timing is on.
psql (9.3.5)
Type "help" for help.

jack=#

Date and Time Ranges

Time periods are frequently stored in databases for resource allocation, reservation, and logging. The typical way of storing this would be to use two columns, one for the start and one for the end of the range. While this works to some extent, the PostgreSQL range types offer a number of advantages. For example, there is nothing to stop an end_date column from being less than a start_date column. But with a PostgreSQL daterange that is impossible.

jack=# select daterange('2014-12-12', '2014-12-01');
ERROR:  range lower bound must be less than or equal to range upper bound

Ranges also have a number of operators to test inclusion of an element or if another range overlaps. For example, the code below uses the contains element operator (@>).

jack=# select daterange('2014-12-01', '2014-12-12') @> '2014-12-06'::date;
 ?column?
----------
 t
(1 row)

Exclusion Constraints

Ranges are especially valuable when combined with exclusion constraints. For example, with this combination PostgreSQL can prevent duplicate overlapping room reservations.

First, we need to install the btree_gist extension to make an exclusion constraint that includes an integer type.

jack=# create extension btree_gist;
CREATE EXTENSION

Now, we will create a table with the exclusion constraint. The constraint will exclude any rows from being inserted where an existing row matches based on room_name equality and date_range overlap.

jack=# create table reservations(
jack(#   id serial primary key,
jack(#   room_name text not null,
jack(#   guest_name text not null,
jack(#   date_range daterange not null,
jack(#   exclude using gist (room_name with =, date_range with &&)
jack(# );
CREATE TABLE

We will insert a record.

jack=# insert into reservations(room_name, guest_name, date_range)
jack-#   values('101', 'George', daterange('2014-12-01', '2014-12-05'));
INSERT 0 1

Finally, we will insert a record that overlaps, but PostgreSQL will stop it.

jack=# insert into reservations(room_name, guest_name, date_range)
jack-#   values('101', 'John', daterange('2014-12-03', '2014-12-07'));
ERROR:  conflicting key value violates exclusion constraint "reservations_room_name_date_range_excl"
DETAIL:  Key (room_name, date_range)=(101, [2014-12-03,2014-12-07)) conflicts with existing key (room_name, date_range)=(101, [2014-12-01,2014-12-05)).

Building JSON in SQL

PostgreSQL has many JSON functions and operators for JSON generation. One of the most useful is row_to_json. Continuing with the previous example of reservations, let's turn the row into JSON.

jack=# select row_to_json(r) from reservations r;
-[ RECORD 1 ]----------------------------------------------------------------------------------------
row_to_json | {"id":1,"room_name":"101","guest_name":"George","date_range":"[2014-12-01,2014-12-05)"}

Sometimes we want to group many rows into a single JSON array. This is where json_agg comes in handy.

Let's insert a few more reservations.

jack=# insert into reservations(room_name, guest_name, date_range)
jack-#   values('102', 'John', daterange('2014-12-03', '2014-12-07'));
INSERT 0 1
jack=# insert into reservations(room_name, guest_name, date_range)
jack-#   values('101', 'Thomas', daterange('2014-12-05', '2014-12-10'));
INSERT 0 1

Now lets get all the records as a single JSON array.

jack=# select json_agg(row_to_json(r)) from reservations r;
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
json_agg | [{"id":1,"room_name":"101","guest_name":"George","date_range":"[2014-12-01,2014-12-05)"}, {"id":3,"room_name":"102","guest_name":"John","date_range":"[2014-12-03,2014-12-07)"}, {"id":4,"room_name":"101","guest_name":"Thomas","date_range":"[2014-12-05,2014-12-10)"}]

See Faster JSON Generation with PostgreSQL for more information.

Extracting JSON in SQL

Sometimes you need to extract data from JSON and operate on it relationally. PostgreSQL has functions and operators for JSON extraction.

Let's start with a simple JSON object store.

jack=# create table objects(
jack(#   id serial primary key,
jack(#   body json not null
jack(# );
CREATE TABLE

Then add some records.

jack=# insert into objects(body) values
jack-#   ('{"name": "John", "age": 30, "weight": 180, "gender": "male"}'),
jack-#   ('{"name": "Emily", "gender": "female"}'),
jack-#   ('{"name": "George", "occupation": "developer"}');
INSERT 0 3

json_each is a set-returning function that returns a row for each key/value pair in the JSON object.

jack=# select *
jack-# from json_each((select body from objects where id=1));
  key   | value
--------+--------
 name   | "John"
 age    | 30
 weight | 180
 gender | "male"
(4 rows)

Lateral Joins

To continue the previous example, what if we wanted to count all the keys in all the rows in the objects table? Lateral joins are the solution. A lateral join lets a table expression refer to columns from previous from items.

jack=# select key, count(*)
jack-# from objects
jack-#   cross join lateral json_each(objects.body)
jack-# group by key;
    key     | count
------------+-------
 occupation |     1
 gender     |     2
 weight     |     1
 name       |     3
 age        |     1
(5 rows)

Note that in this particular case the lateral key word is optional -- it is included for clarity.

Generating Series

PostgreSQL makes it easy to generate a series of numbers.

jack=# select *
jack-# from generate_series(1, 5);
 generate_series
-----------------
               1
               2
               3
               4
               5
(5 rows)

It also can work with times.

jack=# select *
jack-# from generate_series(
jack(#   '2014-01-01'::timestamptz,
jack(#   '2014-01-03'::timestamptz,
jack(#   '8 hours'
jack(# );
    generate_series
------------------------
 2014-01-01 00:00:00-06
 2014-01-01 08:00:00-06
 2014-01-01 16:00:00-06
 2014-01-02 00:00:00-06
 2014-01-02 08:00:00-06
 2014-01-02 16:00:00-06
 2014-01-03 00:00:00-06
(7 rows)

Executing Javascript in PostgreSQL

Occasionally, it may be desirable to execute Javascript in the database. With the plv8 extension this is possible.

Here is a simple example of a name formatter function in JSON.

dev=# create extension plv8;
CREATE EXTENSION
dev=#
dev=# create function format_name(first_name text, middle_name text, last_name text)
dev-# returns text AS $$
dev$#
dev$#   var name = first_name
dev$#
dev$#   if(middle_name) {
dev$#     name += " " + middle_name
dev$#   }
dev$#
dev$#   name += " " + last_name
dev$#
dev$#   return name
dev$# $$ language plv8 immutable;
CREATE FUNCTION
dev=#
dev=# select format_name('John', NULL, 'Smith');
 format_name
-------------
 John Smith
(1 row)

dev=#
dev=# select format_name('John', 'Mark', 'Smith');
   format_name
-----------------
 John Mark Smith
(1 row)

plv8 is installable via apt-get and is part of Postgres.app. Unfortunately, it is not available via homebrew.

Extracting Data with the Ruby Flip-flop

Sometimes you may need to search a huge file for a starting delimiter and extract data until you get to a ending delimiter. For example, a SQL dump may include millions of lines and you may need to extract the lines for a single table.

Ruby has got it covered and it can do it in one line! This will involve a lot of Ruby shortcuts so we will build up one step at a time.

To start with, the ruby binary takes a -e option that will execute ruby code.

jack@hk-3~$ ruby -e 'puts 42'
42

Next we will create a test file to work with. Save the following as giant.log.

bunch of data we don't care about
bla
bla
bla

BEGIN
we care about this
END

adfkjl
adsf

The -n option tells Ruby to assume a while gets loop around the code we pass with -e. $_ is a magic variable that stores the last value returned by gets.

jack@hk-3~$ ruby -n -e 'puts $_' < giant.log
bunch of data we don't care about
bla
bla
bla

BEGIN
we care about this
END

adfkjl
adsf

At this point, all we have done is duplicate cat. But we can do more. The first idea would be to only print if we match a regular expression. If we use a suffix if with a regex then Ruby automatically compares it with $_.

jack@hk-3~$ ruby -n -e 'puts $_ if /BEGIN/' < giant.log
BEGIN

This is getting closer. But we want from BEGIN to END. Enter the flip-flop operator.

jack@hk-3~$ ruby -n -e 'puts $_ if /BEGIN/../END/' < giant.log
BEGIN
we care about this
END

The flip-flop operator looks like a range, but it isn't. Instead it evaluates to true when the first condition is met and stays true until the last condition is met.

Summary

I use some of these almost every day. Others, many months will go by between uses. But I have found all of them very useful. I hope you do too.

Posted in Development and tagged with postgresql shell linux ruby