Shell scripts fail in quiet and expensive ways when error handling is missing. A script can copy half a backup, skip one command, and still exit with status 0 if the wrong pattern is used. For entry-level technicians, the goal is simple: when something breaks, the script should stop early, print a useful message, and leave the system in a clean state.
This guide focuses on Bash because it is available by default on Debian 13.3, Ubuntu 24.04.3 LTS, Ubuntu 25.10, Fedora 43, RHEL 10.1, and RHEL 9.7. The details below are practical patterns you can move into daily operations.
## why error handling matters in shell scripts
In production, shell scripts often run as cron jobs, deployment hooks, backup tasks, and health checks. A bad script usually does one of these:
- Continues after a failed command and writes incomplete data.
- Hides failures because output goes to `/dev/null`.
- Runs two copies at once and creates race conditions.
- Deletes temporary files from the wrong path when a variable is empty.
For beginners, this means longer troubleshooting time and unclear logs. For operators, this means downtime, corrupted files, and noisy on-call alerts. Good error handling does not make scripts fancy. It makes them predictable.
## start with safe defaults
Use a strict Bash header in most operational scripts:
```bash
#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'
trap 'rc=$?; echo "ERROR line $LINENO: $BASH_COMMAND (exit $rc)" >&2' ERR

backup_src="/srv/app/data"
backup_dst="/srv/backups/data.tar.gz"

tar -czf "$backup_dst" "$backup_src"
echo "Backup completed: $backup_dst"
```
What each option does:

- `-e`: stop on unhandled command errors.
- `-u`: fail when using an unset variable.
- `pipefail`: if a pipeline fails anywhere, the whole pipeline fails.
- `-E` with `trap ... ERR`: keeps the error trap active in functions and subshells.
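The effect of `pipefail` is easy to demonstrate from the command line:

```shell
# Without pipefail, a pipeline's exit status is the status of the LAST
# command, so an earlier failure is silently ignored.
bash -c 'false | true; echo "exit=$?"'                    # prints exit=0

# With pipefail, any failing stage makes the whole pipeline fail.
bash -c 'set -o pipefail; false | true; echo "exit=$?"'   # prints exit=1
```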
Do not assume `set -e` catches every case. A command that fails inside an `if` test, on the left side of `&&` or `||`, or in some subshell contexts does not abort the script. Test the script with real failure scenarios.
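These exceptions can be seen in a short sketch (the `check` function is a hypothetical helper, not part of any of the scripts above):

```shell
#!/usr/bin/env bash
set -e

check() { false; return 0; }   # hypothetical helper with an internal failure

# Inside an 'if' condition, set -e is suspended: the 'false' above
# does not abort the script, and check still returns 0.
if check; then
  echo "check passed"
fi

# A failure on the left of || is also exempt from set -e.
false || echo "handled explicitly"

echo "script reached the end despite two failed commands"
```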
Also, quote variables unless you need word splitting. Unquoted variables are a common source of data loss when paths contain spaces or wildcards.
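A quick illustration of the word-splitting risk, using a hypothetical path with a space (assuming the default IFS, not the strict-header IFS above):

```shell
path="backup dir/report.txt"   # hypothetical path containing a space

set -- $path                   # unquoted: word splitting yields 2 arguments
echo "$#"                      # prints 2

set -- "$path"                 # quoted: the path stays one argument
echo "$#"                      # prints 1
```

With `rm` or `cp`, those two stray arguments are how files in the wrong path get touched.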
## handle expected failures yourself
Some failures are expected, such as temporary network errors. Handle them explicitly instead of hoping strict mode solves everything.
```bash
#!/usr/bin/env bash
set -Eeuo pipefail

retry() {
    local attempts=$1
    local delay=$2
    shift 2
    local n=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$n" -ge "$attempts" ]; then
            echo "Command failed after $attempts attempts: $*" >&2
            return 1
        fi
        echo "Attempt $n failed. Retrying in $delay seconds..." >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

url="https://repo.example.internal/health"
retry 4 3 curl --fail --silent --show-error "$url" > /tmp/repo-health.json
```
This pattern is useful for package mirror checks and internal API calls. It avoids false alarms during short outages but still fails hard after the retry limit.
When you must allow one command to fail, do it in a visible way:
```bash
if ! rm -f /tmp/old-report.lock; then
    echo "Could not remove stale lock file" >&2
fi
```
That is safer than adding `|| true` everywhere, which can hide real bugs.
## debugging methods that work under pressure
When a script fails in production, use layered debugging. Start with syntax checks, then trace execution.
```bash
# 1) Syntax check only (no execution)
bash -n deploy.sh

# 2) Full command trace
bash -x deploy.sh 2> /tmp/deploy.trace

# 3) Add line-number tracing inside the script
export PS4='+${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]}: '
exec 3> /tmp/script-xtrace.log
export BASH_XTRACEFD=3
set -x
# sensitive commands should be outside traced blocks
set +x
```
`bash -n` catches missing `fi`, `done`, or quoting errors fast. `bash -x` shows exactly which command failed and with which expanded values. In incident response, this can reduce recovery time from hours to minutes.
Install and run ShellCheck in CI when possible. It catches many quoting, variable, and control-flow mistakes before deployment.
## cleanup and lock files prevent repeat incidents
Two production safeguards are often skipped: cleanup traps and file locks. They prevent stale temp files and duplicate concurrent runs.
```bash
#!/usr/bin/env bash
set -Eeuo pipefail

lock_file="/var/lock/inventory-sync.lock"
exec 9>"$lock_file"
if ! flock -n 9; then
    echo "inventory-sync is already running" >&2
    exit 1
fi

tmp_dir="$(mktemp -d /tmp/inventory-sync.XXXXXX)"
cleanup() {
    rm -rf "$tmp_dir"
}
trap cleanup EXIT

# Simulated workload
cp /srv/inventory/current.csv "$tmp_dir/"
awk -F, 'NR > 1 {print $1","$3}' "$tmp_dir/current.csv" > "$tmp_dir/summary.csv"
cp "$tmp_dir/summary.csv" /srv/inventory/summary.csv
```
If this script crashes, the `EXIT` trap still removes temporary files. If cron starts a second copy, `flock -n` exits early instead of creating write conflicts.
## compatibility notes for Debian, Ubuntu, Fedora, and RHEL
| Distribution | Error-handling and debugging notes |
|---|---|
| Debian 13.3 | `/bin/sh` points to dash. If your script needs `pipefail`, arrays, or `[[ ]]`, use a Bash shebang explicitly. |
| Ubuntu 24.04.3 LTS / 25.10 | Same dash behavior for `/bin/sh`. Cron scripts copied from Bash tutorials fail if they are started with `sh script.sh` by mistake. |
| Fedora 43 | `/bin/sh` is Bash in POSIX mode. Bash scripts are fine with `#!/usr/bin/env bash`. ShellCheck is available through `dnf install ShellCheck`. |
| RHEL 10.1 / RHEL 9.7 | Behavior is close between 10.1 and 9.7 for Bash operational scripts. Validate repo availability for ShellCheck in your environment; many teams provide it through internal mirrors. |
The key compatibility rule is simple: if you wrote Bash, run Bash. Do not rely on /bin/sh behavior being the same everywhere.
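If you still need a runtime sanity check, a subshell probe is one hedged way to do it; note that recent dash versions do accept `set -o pipefail`, so a pass means the option exists, not that every Bash feature is available:

```shell
# Probe whether the current shell accepts 'set -o pipefail'.
# Bash always does; /bin/sh (dash on Debian/Ubuntu) may not,
# depending on the dash version.
if (set -o pipefail) 2>/dev/null; then
  echo "pipefail supported by this shell"
else
  echo "pipefail NOT supported -- use a bash shebang" >&2
  exit 1
fi
```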
## summary
Reliable shell scripting is mostly disciplined basics: strict mode, explicit handling for expected failures, useful tracing, cleanup traps, and lock files. Beginners get faster troubleshooting because scripts fail loudly and at the right line. Operators get safer production runs with fewer partial updates and duplicate jobs. If you keep these patterns in every new script, debugging becomes routine instead of emergency work.