Ruby on the Command Line

Jack Christensen

Ruby is strongly identified with Rails and web development. But Ruby is far more than a web language. Ruby has a rich set of tools for shell scripting and text processing.

Let's start by executing a Ruby command. The -e argument tells Ruby to run the provided code and exit.

$ ruby -e 'puts "Hello, world."'
Hello, world.

Multiple statements can be separated by a ;.

$ ruby -e 'puts "Hello, world."; puts "It is now #{Time.now}."'
Hello, world.
It is now 2017-08-08 16:43:56 -0500.

Or multiple -e arguments can be provided.

$ ruby -e 'puts "Hello, world."' -e 'puts "It is now #{Time.now}."'
Hello, world.
It is now 2017-08-08 16:45:10 -0500.

Reading Files

This becomes really valuable when processing text. Ruby is a Swiss Army knife of text-processing tools that can be used in place of sed and awk. For the next examples we will use the text of Ulysses by Alfred, Lord Tennyson.

We'll start by printing the nth line of a file. STDIN is an IO object. It has a readlines method that will return the entire file as an array of lines. Remember that arrays are zero-indexed in Ruby so this actually prints the 59th line.

$ ruby -e 'puts STDIN.readlines[58]' < ulysses.txt
‘Tis not too late to seek a newer world.
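
Reading every line just to pick one is fine for a poem, but a larger input can be scanned one line at a time instead. The following is a minimal sketch of that idea in script form. The helper name nth_line is hypothetical, and a StringIO stands in for STDIN so the example is self-contained.

```ruby
require "stringio"

# Return the nth line (1-based) of an IO without loading the whole input.
# nth_line is a hypothetical helper name, not part of Ruby.
def nth_line(io, n)
  io.each_line.with_index(1) do |line, number|
    return line if number == n
  end
  nil # the input had fewer than n lines
end

sample = StringIO.new("first\nsecond\nthird\n")
puts nth_line(sample, 2) # prints "second"
```

Because the method returns as soon as it reaches the requested line, the rest of the input is never read.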

Sometimes it would be useful if our script could read from STDIN or files passed as arguments. ARGF merges these concepts. We can read from ARGF instead and Ruby will actually read from STDIN or files passed on the command line as necessary.

$ ruby -e 'puts ARGF.readlines[58]' < ulysses.txt
‘Tis not too late to seek a newer world.

Note there is no redirection of STDIN below.

$ ruby -e 'puts ARGF.readlines[58]' ulysses.txt
‘Tis not too late to seek a newer world.

Extracting the First Lines of a File

We can duplicate the functionality of head by using first(n).

$ ruby -e 'puts ARGF.readlines.first(5)' ulysses.txt
It little profits that an idle king,
By this still hearth, among these barren crags,
Matched with an aged wife, I mete and dole
Unequal laws unto a savage race,
That hoard, and sleep, and feed, and know not me.
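
The one-liner above reads the whole file before taking the first five lines. As a rough sketch of an alternative, assuming we only want to read as much as we need, each_line can be combined with first(n). The helper name head_lines is hypothetical.

```ruby
require "stringio"

# Return the first n lines of an IO, stopping once n lines have been read.
# head_lines is a hypothetical helper name.
def head_lines(io, n)
  io.each_line.first(n)
end

poem = StringIO.new("one\ntwo\nthree\nfour\n")
puts head_lines(poem, 2)
```

Called without a block, each_line returns an enumerator, and first(n) stops iterating after n lines.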

Extracting the Last Lines of a File

tail can also be duplicated by using last(n).

$ ruby -e 'puts ARGF.readlines.last(5)' ulysses.txt
We are not now that strength which in old days
Moved earth and heaven, that which we are, we are,
One equal temper of heroic hearts,
Made weak by time and fate, but strong in will
To strive, to seek, to find, and not to yield.
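
For a large file, holding every line in memory just to keep the last few is wasteful. One possible approach, sketched below under the assumption that a small rolling buffer is acceptable, keeps only the most recent n lines. The helper name tail_lines is hypothetical.

```ruby
require "stringio"

# Return the last n lines of an IO while holding at most n lines in memory.
# tail_lines is a hypothetical helper name.
def tail_lines(io, n)
  buffer = []
  io.each_line do |line|
    buffer << line
    buffer.shift if buffer.size > n # drop the oldest line
  end
  buffer
end

poem = StringIO.new("one\ntwo\nthree\nfour\n")
puts tail_lines(poem, 2)
```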

Printing Matches to a Regular Expression

Often we need to process input line by line, as grep does. This is facilitated by the -n flag, which wraps a loop around your script like so: while gets; <your script here>; end. Ruby sets the magic global $_ to the last line returned from gets. Combining these gives us a one-liner similar to grep.

$ ruby -n -e 'puts $_ if $_ =~ /\bold\b/i' ulysses.txt
Free hearts, free foreheads—you and I are old;
Old age hath yet his honor and his toil.
We are not now that strength which in old days

The conditional can be shortened further by omitting $_ =~. This works because Ruby treats a lone regex literal in a conditional as a match against $_.

$ ruby -n -e 'puts $_ if /\bold\b/i' ulysses.txt
Free hearts, free foreheads—you and I are old;
Old age hath yet his honor and his toil.
We are not now that strength which in old days
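
Written out as a script, the loop that -n generates looks roughly like the sketch below. It is an approximation for illustration: it reads from an IO passed in rather than from the command-line files, it collects matches in an array instead of printing them, and the helper name matching_lines is hypothetical.

```ruby
require "stringio"

# Collect the lines of an IO that match a pattern, using the same
# "while gets" shape as the loop that -n wraps around a one-liner.
# matching_lines is a hypothetical helper name.
def matching_lines(io, pattern)
  matches = []
  while io.gets # gets stores the line it just read in $_
    matches << $_ if $_ =~ pattern
  end
  matches
end

text = StringIO.new("you and I are old;\nA bold plan\nOld age hath yet\n")
puts matching_lines(text, /\bold\b/i)
```

The loop ends when gets returns nil at the end of the input. The explicit $_ =~ is needed here because the pattern is held in a variable rather than written as a literal.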

Finding and Replacing Text

The -p flag works the same as -n except it also prints $_ at the end of each loop iteration. This can be used to find and replace. Note that gsub! is used because we need to mutate $_ in place.

$ ruby -p -e '$_.gsub!(/\bking\b/, "monarch")' ulysses.txt
It little profits that an idle monarch,
By this still hearth, among these barren crags,
...
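
As a rough script-form equivalent of the -p loop, the sketch below mutates $_ with gsub! and then collects the line at the point where -p would print it. The helper name replace_all is hypothetical, and a StringIO stands in for the file.

```ruby
require "stringio"

# Apply a substitution to every line of an IO, mirroring the -p loop.
# replace_all is a hypothetical helper name.
def replace_all(io, pattern, replacement)
  output = []
  while io.gets
    $_.gsub!(pattern, replacement) # modify the current line in place
    output << $_                   # -p would print $_ at this point
  end
  output
end

text = StringIO.new("an idle king,\nthe kingdom\n")
puts replace_all(text, /\bking\b/, "monarch")
```

The plain gsub method would return a modified copy and leave $_ untouched, so the original line would be the one printed.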

Processing Tabular Data

The -a flag can be used with -n or -p to auto-split each line. It runs $F = $_.split at the start of each loop iteration. This can be used to easily extract columnar data. In this example we will print the 3rd word of each line.

$ ruby -n -a -e 'puts $F[2]' ulysses.txt
profits
still
an
...
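
Tabular data is not always separated by whitespace. The sketch below shows the same column extraction in script form with a configurable separator. The helper name nth_field and the comma-separated sample data are made up for illustration. On the command line, the -F flag plays a similar role by setting the separator that -a uses.

```ruby
require "stringio"

# Return the field at a zero-based index from every line of an IO.
# A separator of " " splits on runs of whitespace, like $_.split.
# nth_field is a hypothetical helper name.
def nth_field(io, index, separator = " ")
  io.each_line.map { |line| line.chomp.split(separator)[index] }
end

words = StringIO.new("It little profits that\nBy this still hearth\n")
puts nth_field(words, 2)

csv = StringIO.new("alice,30,chicago\nbob,25,denver\n")
puts nth_field(csv, 2, ",")
```

A line with too few fields yields nil for that position, just as $F[2] would.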

Extracting Lines Between Starting and Ending Delimiters

Sometimes we need to extract lines from a file starting with one delimiter and ending with another. The little-known flip-flop operator is the solution. While it appears to be a range, the flip-flop is Ruby syntax that becomes true when the first condition is triggered and stays true until after the last condition is triggered.

$ ruby -n -e 'puts $_ if /^Old/../^Not/' ulysses.txt
Old age hath yet his honor and his toil.
Death closes all; but something ere the end,
Some work of noble note, may yet be done,
Not unbecoming men that strove with gods.
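
To make the flip-flop's behavior concrete, here is a sketch that spells out the same logic with an ordinary boolean. It is meant as an illustration of the two-dot form, not a replacement for it. The helper name lines_between is hypothetical.

```ruby
require "stringio"

# Collect lines from the first one matching start_pattern through the
# next one matching end_pattern, inclusive. The range can trigger again
# later in the input, as a flip-flop does.
# lines_between is a hypothetical helper name.
def lines_between(io, start_pattern, end_pattern)
  inside = false
  selected = []
  io.each_line do |line|
    inside = true if line =~ start_pattern # switch on
    next unless inside
    selected << line
    inside = false if line =~ end_pattern  # switch off after the end line
  end
  selected
end

text = StringIO.new("before\nOld age\nmiddle\nNot unbecoming\nafter\n")
puts lines_between(text, /^Old/, /^Not/)
```

Note that the end condition is checked on the same line that switched the state on, which matches how the two-dot flip-flop behaves.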

Counting Words

What if we wanted to do some sort of aggregate computation, such as counting the number of words? If the file is small we could read and process the entire file at once.

$ ruby -e 'puts ARGF.read.split.size' ulysses.txt
564

However, the previous solution would be impractical if the source file were exceptionally large. In that case we would need to process the file line by line. But we need to do some work before the first line and after the last line. This can be accomplished with BEGIN and END. BEGIN and END run the code contained in braces at the beginning and end of the program respectively.

$ ruby -n -a -e 'BEGIN { wc = 0 }' -e 'wc += $F.size' -e 'END { puts wc }' ulysses.txt
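
The three parts of that one-liner map onto an ordinary script, as the following sketch suggests. The comments show which -e argument each line corresponds to. The helper name count_words is hypothetical, and a StringIO stands in for the file.

```ruby
require "stringio"

# Count whitespace-separated words one line at a time.
# count_words is a hypothetical helper name.
def count_words(io)
  wc = 0                  # BEGIN { wc = 0 }
  io.each_line do |line|
    wc += line.split.size # wc += $F.size
  end
  wc                      # END { puts wc }
end

text = StringIO.new("It little profits\n\nthat an idle king,\n")
puts count_words(text) # prints 7
```

Only one line is held in memory at a time, so this scales to inputs that would be impractical to read in one go.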

Computing Word Frequency

Now let's try something even more advanced. Let's compute the frequency of the words used in Ulysses and print only the words that appear more than 10 times. In the BEGIN block we create a hash that will hold the count of words. We give it a default value of 0. In the body of the implicit loop we add one for each occurrence of a word. In the END block we filter, sort, and print the word frequency data.

$ ruby -n -a -e 'BEGIN { words = Hash.new(0) }' \
  -e '$F.each { |w| words[w] += 1 }' \
  -e 'END { words.select {|k,v| v > 10}.sort_by { |k,v| [-v, k] }.each { |k,v| puts "#{k} - #{v}" } }' \
  ulysses.txt
and - 27
the - 24
to - 17
I - 14
of - 12
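
The one-liner counts words exactly as they appear, so capitalized forms and words with attached punctuation are tallied separately. As one possible variation, the sketch below lowercases each line and extracts runs of ASCII letters and straight apostrophes before counting. The helper name word_frequencies, the threshold parameter, and the choice of pattern are all assumptions made for illustration.

```ruby
require "stringio"

# Count words case-insensitively and return [word, count] pairs for words
# that occur more than threshold times, most frequent first.
# word_frequencies is a hypothetical helper name.
def word_frequencies(io, threshold)
  counts = Hash.new(0)
  io.each_line do |line|
    # Assumption: a "word" is a run of ASCII letters and straight apostrophes.
    line.downcase.scan(/[a-z']+/) { |word| counts[word] += 1 }
  end
  counts.select { |_word, count| count > threshold }
        .sort_by { |word, count| [-count, word] }
end

text = StringIO.new("The the THE and.\nAnd the\n")
word_frequencies(text, 1).each { |word, count| puts "#{word} - #{count}" }
```

Text with accented letters or typographic apostrophes would need a broader pattern than the one assumed here.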

Conclusion

Ruby is an expressive, full-featured programming language, but it also has a lot of (Perl-inspired) shorthand that makes it convenient to use at the shell.