Bash scripting for the reluctant

October 25, 2014 (updated May 27, 2016)

Should I read this?

This is a consolidated reference to help those already familiar with bash become more comfortable with it and leverage its strengths. It links to informative references that explain their subject matter better than I can.

Concepts and techniques

Writing robust scripts

There are several options for configuring bash to behave more sanely in the presence of surprises. Some commands also have options for running in a more fail-friendly manner.

  • set -u or set -o nounset
  • set -e or set -o errexit
  • set -o pipefail
  • mkdir -p
  • rm -f
  • quoting variable references, as in: "$@"
  • find -print0 ... | xargs -0 ...
  • trap
  • set -o noclobber

If anything in this list falls outside of your comfort zone, read this before continuing.

Also, unless your script intentionally makes use of pathname expansion (aka globbing), you should disable it via set -f. If you do make use of globbing, you should use shopt -s failglob to produce errors for non-matching patterns.
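
Here is one way several of these pieces fit together in a script preamble (a sketch; the temporary-directory work exists only to give the options something to act on, and you should invoke it with a few arguments):

#!/bin/bash
# robust-example
set -eufo pipefail

# Clean up the working directory on exit, even after a failure.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

mkdir -p "$tmpdir/work"                    # -p: no error if it already exists
printf '%s\n' "$@" > "$tmpdir/work/args"   # quoting "$@" preserves arguments

# -print0/-0 handle filenames containing whitespace.
find "$tmpdir/work" -type f -print0 | xargs -0 wc -c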

Use printf instead of echo

There are serious portability concerns with using echo that can lead to nasty surprises. Use the basic feature set of printf instead:

printf '%s\n' "$var"
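
To see why, watch echo mishandle a value that happens to look like one of its options:

var='-n'
echo "$var"            # echo takes -n as an option and prints nothing
printf '%s\n' "$var"   # prints -n, followed by a newline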

Special parameters and variables

It's not important to memorize what $-, $_, etc. are for, but make sure this list doesn't contain any surprises. You should also be familiar with $PPID, which holds the process ID of the current shell's parent.
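
A quick way to see a few of them in action (a sketch you can run as a script):

#!/bin/bash
# special-params-example
printf 'option flags ($-): %s\n' "$-"
printf 'shell PID ($$): %s\n' "$$"
printf 'parent PID ($PPID): %s\n' "$PPID"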

Script-relative paths

If your script is bundled with accompanying files, you will want to reference the paths of these files in terms of the current script's location. Doing so allows you to both relocate the bundle and invoke the script from anywhere without breaking any of the relative file references. Read this for more options and detail.

#!/bin/bash
# dir-example
set -eufo pipefail

here="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

# refer to the 'data' file living in the script directory
... "$here/data" ...

Passing arguments by name

Passing arguments by name (or keyword), rather than position, makes them self-documenting. When passing by position, it's easier to mistakenly transpose two arguments.

A typical solution to processing script arguments is to use something like getopts.
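
For example, a minimal getopts loop looks like this (the -u and -p options are invented for illustration):

#!/bin/bash
# getopts-example
set -eufo pipefail

user='anonymous' port='22'
while getopts 'u:p:' opt; do
  case "$opt" in
    u) user="$OPTARG" ;;
    p) port="$OPTARG" ;;
    *) exit 2 ;;
  esac
done
shift $((OPTIND - 1))  # "$@" now holds only the remaining operands
printf 'user=%s port=%s\n' "$user" "$port"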

An alternative way to pass arguments by name is to assign them to variables.

#!/bin/bash
# var-arg-example
set -eufo pipefail

# ARG1 is required.
# ARG2 is optional; it will be assigned to 'default' if unset.
# Use := instead of = to also perform assignment when ARG2 is the empty string.
: "${ARG2=default}"
printf '%s %s\n' "$ARG1" "$ARG2"

# Invoke var-arg-example.

> ./var-arg-example
#./var-arg-example: line 8: ARG1: unbound variable

> export ARG1=foo

> ./var-arg-example
#foo default

> ARG2=bar ./var-arg-example
#foo bar

> ARG1=crow ARG2=bar ./var-arg-example
#crow bar

There is a trade-off here with regard to name clashes. The upside is that using global variables as parameters allows arguments either to be passed explicitly inline, or to be exported and reused across multiple invocations, with the option of overriding them. The downside is the potential for unintentional clashes with existing global variables.

Input and output

Manipulation of standard I/O in bash is somewhat indirect. Unless you use redirections, stdio is implicitly consumed and produced by subcommands. Here are some useful formulas.

  • Consume part of stdin, assigning to variables, using read.

    > read -n 3 threechars  # Your input ends on the same line as the next prompt.
    abc> printf '%s\n' "$threechars"
    abc
    > read line  # This time, the next prompt starts on its own line.  Why?
    #this input is read until you hit enter
    > printf '%s\n' "$line"
    #this input is read until you hit enter
  • Consume all of stdin using cat.

    ALL_OF_STDIN=$(cat)  # this also demonstrates command substitution
    ... use $ALL_OF_STDIN ...
  • Consume all of stdin, writing it to a file while also sending it to stdout using tee.
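
    # Basic use (some-command and another-command stand in for real commands):
    some-command | tee copy-of-output | another-command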

    # Here is another use of tee.
    printf '%s\n' 'important config' | sudo tee /etc/normally-cannot-write-here > /dev/null
    
    # Note that the following will *not* normally succeed.
    sudo printf '%s\n' 'important config' > /etc/normally-cannot-write-here
  • Capture the stdout of a command as a string with command substitution.

    printf '%s\n' "Today is $(date)"
  • Treat the I/O of a command like a file (more accurately, like a pipe) with process substitution.

    diff data <(uniq data)
  • Redirect the current script's standard I/O using exec. More generally, use exec to manipulate arbitrary file descriptors (see the sketch after this list).

    printf '%s\n' 'this goes to stdout'
    exec > some-file
    printf '%s\n' 'this goes to some-file'
  • Flexibly tie the I/O of processes together using named pipes.

    > mkfifo to-show
    > ls -l
    #prw-r--r-- 1 user user    0 ... to-show|
    
    > {
    >   printf '%s\n' show > to-show
    >   printf '%s\n' these > to-show
    >   printf '%s\n' lines > to-show
    > } &
    #[1] 1234
    
    > jobs
    #[1]+  Running { printf '%s\n' show > to-show;
    #                printf '%s\n' these > to-show;
    #                printf '%s\n' lines > to-show; } &
    
    > cat < to-show
    #show
    #these
    #lines
    #[1]+  Done    { printf '%s\n' show > to-show;
    #                printf '%s\n' these > to-show;
    #                printf '%s\n' lines > to-show; }
  • Describe input in-situ using 'here documents'.

    cat << EXAMPLEDOC
    All of these
    lines are
    treated as
    input
    EXAMPLEDOC
  • Manipulate an entire directory tree as a stream using tar.

    tar cpvf - -C ORIGIN DIR | ssh -T -e none REMOTE-HOST 'tar xvf - -C DESTINATION'
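
As promised above, exec can also open and close arbitrary file descriptors. A minimal sketch:

# Open fd 3 for writing; several commands can then share it.
exec 3> notes.log
printf '%s\n' 'first note' >&3
printf '%s\n' 'second note' >&3
exec 3>&-    # close fd 3

# Open fd 4 for reading.
exec 4< notes.log
read -r first_line <&4
printf '%s\n' "$first_line"
exec 4<&-    # close fd 4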

You should also read about relational text processing.

Concurrency

Append & to a command to run it concurrently with the remainder of the script's execution. Use wait to pause the script until the child processes terminate.

# With multiple processors, some of these may be able to run in parallel
for i in {1..10}; do
  slow_process < "input$i" > "output$i" &
done
wait
... # use output{1..10}
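
A bare wait always returns success, so under set -e a failing child goes unnoticed. To catch failures, wait on each PID individually (slow_process is the same stand-in command as above):

pids=()
for i in {1..10}; do
  slow_process < "input$i" > "output$i" &
  pids+=("$!")
done

for pid in "${pids[@]}"; do
  wait "$pid" || printf '%s\n' "child $pid failed" >&2
done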

Here is an example demonstrating more interaction with a child process.

# Though this example is a bit too simplistic, it's often a good idea to create
# a temporary working directory to store data related to each child process.
> CHILD_DIR=$(mktemp -d child.XXXXXXXXXX)
> printf '%s\n' "$CHILD_DIR"
#child.iu9Ncsshzc

# Set up an output channel to receive messages from the child process.
> CHILD_OUT="$CHILD_DIR/out"
> mkfifo "$CHILD_OUT"

# Launch a process that outputs the result of its "work" every 5 seconds.
> {
>   while true; do
>     sleep 5
>     printf '%s\n' 'work asynchronously' > "$CHILD_OUT"
>   done
> } &
#[1] 12345

# Remember its PID.
> CHILD_PID="$!"
> printf '%s\n' "$CHILD_PID"
#12345

> jobs
#[1]+  Running                 { while true; do
#    sleep 5; printf '%s\n' 'work asynchronously' > "$CHILD_OUT";
#done; } &

# Pull some results out of the channel.
> cat < "$CHILD_OUT"
#work asynchronously

# The child process blocks when writing to the channel until we try to read.
# At most one result will be queued up at a time.  If we immediately try
# reading a second time, we notice a pause due to the child sleep.
> cat < "$CHILD_OUT"
#(up to a 5-second pause)
#work asynchronously

# Thanks to the while loop, the child process will continue until we are ready
# to stop it.
> kill "$CHILD_PID"
#[1]+  Terminated              { while true; do
#    sleep 5; printf '%s\n' 'work asynchronously' > "$CHILD_OUT";
#done; }

# Clean up.
> rm -r "$CHILD_DIR"

Read more about these and other job control commands.

Avoid running multiple instances of a command at the same time by using flock.

# /etc/cron.d/special
... typical cron things ...
# Run our special process every 5 minutes.
# Run it with flock to prevent overlap if it runs for longer than 5 minutes.
*/5 * * * *  user  flock -n /tmp/special-process.lock /path/to/special-process
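
flock can also guard a section of a script by taking the lock on a file descriptor (a sketch; the lock path is arbitrary):

#!/bin/bash
set -eufo pipefail

# Hold an exclusive, non-blocking lock on fd 200 for the life of the script.
exec 200> /tmp/special-process.lock
flock -n 200 || { printf '%s\n' 'another instance is running' >&2; exit 1; }

sleep 10  # stands in for the work that must not overlap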

Effective use of ssh

The ssh command is more than just a way to interactively log into a remote host. It allows remote command execution, transferring files, and various forms of proxying. This explains more.
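
A few of these uses in one place (a sketch; REMOTE-HOST stands in for a real host):

# Remote command execution: run a command remotely and use its output locally.
ssh REMOTE-HOST 'df -h /' > remote-disk-usage

# File transfer (scp runs over ssh).
scp local-file REMOTE-HOST:/tmp/

# Local port forwarding: reach a remote service via localhost:8080.
ssh -N -L 8080:localhost:80 REMOTE-HOST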
