
Regex basics with grep, sed, and awk

Maximilian B.

Regular expressions, usually called regex, are small patterns used to match text. On Linux, you use them every day with grep, sed, and awk. They help you find bad log lines, rewrite config files, and filter structured output. For entry-level technicians, regex can feel abstract at first. The practical view is simpler: regex is a way to describe text rules so the shell can do repetitive work for you. If your pattern is too broad, you match the wrong lines and can push bad changes into production. If your pattern is too strict, you miss the error that caused the incident. This guide focuses on patterns that are safe and useful in real operations.

What regex does in grep, sed, and awk

[Figure: visual summary of the key concepts in this guide.]

All three tools support regex, but they use it for different jobs:

  • grep finds matching lines.
  • sed edits matching text, often in streams or files.
  • awk evaluates fields and can run logic on matching rows.

By default, grep and sed use basic regular expressions. In daily work, many operators prefer extended mode because it is easier to read. Use grep -E and sed -E for that mode. In awk, regex is already integrated in conditions like $0 ~ /pattern/. The engine details differ across tools, so test your expression on sample data before touching production files.
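The difference between basic and extended mode is easiest to see with alternation. A minimal sketch, using a throwaway sample file under /tmp (the path and contents are made up for the demo):

```shell
# Throwaway sample file for comparing regex modes
printf 'disk WARN high\ncpu ERROR spike\nmem OK\n' > /tmp/regex-mode-demo.txt

# Basic mode: | is a literal character here, so no line matches
grep -c 'WARN|ERROR' /tmp/regex-mode-demo.txt || true   # prints 0

# Extended mode: | means "or", so two lines match
grep -cE 'WARN|ERROR' /tmp/regex-mode-demo.txt          # prints 2

# awk applies the same extended-style regex inside a condition
awk '$0 ~ /WARN|ERROR/ {print $1}' /tmp/regex-mode-demo.txt
```

The same character can be literal in one mode and an operator in the other, which is exactly why testing on sample data matters before a production run.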

Core patterns to memorize first

You do not need the full regex language to be effective. A small set covers most troubleshooting and maintenance tasks.

# Build sample data for practice
cat > /tmp/regex-demo.log <<'LOG'
2026-02-25 10:44:11 INFO ssh login user=alex src=10.20.1.14
2026-02-25 10:45:20 WARN ssh failed user=root src=10.20.1.99
2026-02-25 10:45:59 ERROR nginx upstream timeout host=app01
2026-02-25 10:46:03 WARN sudo failed user=backup src=10.20.1.77
LOG

# Start and end of line
grep -E '^2026-02-25' /tmp/regex-demo.log
grep -E 'host=app01$' /tmp/regex-demo.log

# Character classes and repetition
grep -E 'src=10\.20\.1\.[0-9]+' /tmp/regex-demo.log

# Alternatives with |
grep -E 'WARN|ERROR' /tmp/regex-demo.log

# "Any character" with . and wildcard count with *
grep -E 'user=.* src=' /tmp/regex-demo.log

Important detail: an unescaped dot means "any character" in regex, so each dot in an IP address must be written as \. (a backslash before the dot) to match only a literal dot. Forgetting this is a common mistake and can match unintended data.
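The effect is easy to demonstrate with a throwaway file containing one real IP and one lookalike (both lines invented for the demo):

```shell
# One real IP, one lookalike where the dots are other characters
printf 'src=10.20.1.14\nsrc=10x20x1x14\n' > /tmp/dot-demo.txt

# Unescaped dots match ANY character, so both lines match
grep -cE 'src=10.20.1.14' /tmp/dot-demo.txt    # prints 2

# Escaped dots match only literal dots, so only the real IP matches
grep -cE 'src=10\.20\.1\.14' /tmp/dot-demo.txt # prints 1
```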

Using grep for fast and safe incident checks

grep is usually the first step during incident response. It is quick, read-only, and works well in pipelines. A good pattern can cut thousands of lines down to ten useful ones.

# Find failed SSH auth lines with line numbers
sudo grep -En 'Failed password|authentication failure' /var/log/auth.log

# On RHEL/Fedora family, auth messages are commonly in secure
sudo grep -En 'Failed password|authentication failure' /var/log/secure

# Recursively audit a config tree for uncommented PermitRootLogin
grep -REn '^[[:space:]]*PermitRootLogin[[:space:]]+yes' /etc/ssh

# Exclude noisy health checks from nginx access logs
grep -Ev '"GET /healthz|"GET /metrics' /var/log/nginx/access.log

Production consequence: if you forget anchors like ^, your search may match comments or old backup lines and produce wrong conclusions. For example, matching PermitRootLogin yes without checking line start can also hit commented lines such as # PermitRootLogin yes. During audits, that can lead to false compliance reports.
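The false positive can be reproduced with a small sketch (the file path and contents are made up for the demo):

```shell
# Config snippet with a commented-out directive above the live one
cat > /tmp/sshd-audit-demo.conf <<'CONF'
# PermitRootLogin yes
PermitRootLogin no
CONF

# Unanchored search: also hits the comment -> false compliance finding
grep -cE 'PermitRootLogin yes' /tmp/sshd-audit-demo.conf   # prints 1

# Anchored to line start (allowing leading whitespace): no false hit
grep -E '^[[:space:]]*PermitRootLogin[[:space:]]+yes' /tmp/sshd-audit-demo.conf \
  || echo 'no active PermitRootLogin yes'
```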

Using sed for controlled replacements

sed is powerful because it can edit many lines quickly. That is also why it can break files fast if your regex is loose. The safe pattern is preview first, then edit with a backup.

# Preview replacement only (no file change)
sed -E 's/(^[[:space:]]*MaxAuthTries[[:space:]]+)[0-9]+/\16/' /etc/ssh/sshd_config | head -n 20

# In-place edit with automatic backup copy
sudo sed -E -i.bak 's/(^[[:space:]]*MaxAuthTries[[:space:]]+)[0-9]+/\16/' /etc/ssh/sshd_config

# Verify result and validate daemon config
grep -En '^[[:space:]]*MaxAuthTries[[:space:]]+' /etc/ssh/sshd_config
sudo sshd -t

# Reload only after a clean validation
sudo systemctl reload sshd

The capture group (...) keeps the left side of the setting, and \1 reuses it in the replacement. This avoids rewriting spacing and comments more than needed. In production, that lowers diff noise and makes peer review easier.

Another safe habit is to limit edits to precise files, not broad recursive loops, until you have tested the command. One bad recursive sed -i can modify templates, examples, and live configs in one run.
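One low-risk preview habit is to diff the current file against sed's output before any -i run, so the exact change is visible first. A sketch, assuming bash (for process substitution) and a throwaway copy of the setting (file path and contents are invented for the demo):

```shell
# Throwaway config copy for a dry run
cat > /tmp/sed-preview-demo.conf <<'CONF'
MaxAuthTries 4
# MaxAuthTries 4 (old default, commented out)
CONF

# Show the exact diff the edit would produce; diff exits non-zero
# when files differ, so || true keeps the pipeline clean in scripts
diff -u /tmp/sed-preview-demo.conf \
  <(sed -E 's/(^[[:space:]]*MaxAuthTries[[:space:]]+)[0-9]+/\16/' /tmp/sed-preview-demo.conf) \
  || true
```

Only the active line changes in the diff; the commented line is untouched because the anchor rejects it. If the diff looks right, the same expression can be run with -i.bak.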

Using awk when fields matter more than full lines

awk is better than plain grep when data has columns. You can split fields and then apply regex to just one field. This reduces false matches.

# Show non-system users from /etc/passwd (UID >= 1000) using field logic
awk -F: '$3 >= 1000 {print $1, $3, $7}' /etc/passwd

# Match only failed sudo events and print selected tokens
awk '/sudo failed/ {print $1, $2, $6, $7}' /tmp/regex-demo.log

# Match source IP pattern and count hits per address
awk 'match($0, /src=10\.20\.1\.[0-9]+/) {print substr($0, RSTART, RLENGTH)}' /tmp/regex-demo.log | \
  sort | uniq -c | sort -nr
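Field-scoped matching itself can be sketched against the sample log from earlier (recreated here so the commands run standalone):

```shell
# Recreate the sample log so this block is self-contained
cat > /tmp/regex-demo.log <<'LOG'
2026-02-25 10:44:11 INFO ssh login user=alex src=10.20.1.14
2026-02-25 10:45:20 WARN ssh failed user=root src=10.20.1.99
2026-02-25 10:45:59 ERROR nginx upstream timeout host=app01
2026-02-25 10:46:03 WARN sudo failed user=backup src=10.20.1.77
LOG

# A whole-line match would fire on "root" anywhere in the line;
# $6 ~ ... restricts the regex to the sixth field only
awk '$6 ~ /^user=root$/ {print $2, $6, $7}' /tmp/regex-demo.log
```

Only the failed root login prints, because the regex is anchored inside one field instead of scanning the entire record.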

For operators, the value is precision: you can match the right field and then count, sum, or trigger conditions. That makes awk useful for capacity checks, login analysis, and quick one-off reports during outages.

Compatibility notes for Debian, Ubuntu, Fedora, and RHEL

These examples are compatible with current mainstream releases: Debian 13.3, Ubuntu 24.04.3 LTS, Ubuntu 25.10, Fedora 43, and RHEL 10.1. They also work on RHEL 9.7 with the same command style.

  • Prefer -E for extended regex in grep and sed. It is clearer and widely supported in these distributions.
  • Avoid depending on grep -P in automation unless you confirmed it in your environment. PCRE support can vary by build and policy.
  • On Debian and Ubuntu, awk may point to mawk by default, while Fedora and RHEL often use gawk. Basic regex usage is the same, but some advanced gawk-specific functions are not portable.
  • Locale affects character classes like [[:alpha:]]. For predictable script behavior, many teams run parsing commands with LC_ALL=C.

# Check which awk implementation is active
awk -W version 2>/dev/null || gawk --version | head -n 1

# Force predictable byte-based matching in scripts
LC_ALL=C grep -E '^[a-z0-9._-]+$' input.txt

Summary

Regex becomes manageable when you treat it as a small toolbox, not a giant theory topic. Start with anchors, classes, and repetition. Use grep to find, sed to replace with backups, and awk when fields matter. In production work, the biggest win is not clever syntax. It is careful scope, preview steps, and validation after every change. That approach scales from beginner labs to live systems on Debian 13.3, Ubuntu 24.04.3 LTS and 25.10, Fedora 43, RHEL 10.1, and RHEL 9.7.
