When a production system stops responding or connections behave unexpectedly, you need Linux network troubleshooting tools that show you exactly what is happening on the wire and in the kernel. This article covers the core network diagnostic toolkit: tcpdump for packet capture, ss for socket inspection, nmap for port scanning and service discovery, traceroute and mtr for path analysis, and supporting tools like dig, curl, iperf3, and ethtool. Mastering these tools is the difference between guessing and solving network problems systematically. All examples are tested on Debian 13.3, Ubuntu 24.04.3 LTS, Fedora 43, and RHEL 10.1. For foundational connectivity testing, see our network troubleshooting with ss, ip, ping, and tracepath guide.
tcpdump: Capturing and Analyzing Network Packets
tcpdump reads packets directly from a network interface (or a saved pcap file) and displays them in a readable format. It uses BPF (Berkeley Packet Filter) expressions to select which packets to capture. On a busy server, an unfiltered capture generates enormous output, so always apply filters.
Essential tcpdump capture filters
# Capture only TCP traffic on port 443 (HTTPS)
sudo tcpdump -i enp3s0 tcp port 443
# Capture DNS queries and responses (port 53, both TCP and UDP)
sudo tcpdump -i any port 53
# Capture traffic to/from a specific host
sudo tcpdump -i enp3s0 host 10.0.1.100
# Capture SYN packets only (connection initiations)
sudo tcpdump -i enp3s0 'tcp[tcpflags] & (tcp-syn) != 0 and tcp[tcpflags] & (tcp-ack) == 0'
# Capture ICMP (ping, unreachable, etc.)
sudo tcpdump -i enp3s0 icmp
# Save to file for Wireshark analysis (-w), limit to 1000 packets (-c)
sudo tcpdump -i enp3s0 -w /tmp/capture.pcap -c 1000 tcp port 3306
# Read from saved file with verbose output
tcpdump -r /tmp/capture.pcap -vvv | head -50
Practical tip: use -nn to disable DNS and port name resolution. On a system with slow DNS, tcpdump itself will lag without this flag. Add -tttt for human-readable timestamps. Modern tcpdump defaults to a snap length of 262144 bytes, which effectively captures full packets; on older versions that truncate at 68 bytes, add -s 0 to capture full packet contents.
Reading tcpdump output and TCP flags
A typical TCP line looks like this:
14:32:05.123456 IP 10.0.1.50.45678 > 10.0.1.100.443: Flags [S], seq 123456789, win 64240, length 0
The flags field tells you the TCP connection state: [S] is SYN, [S.] is SYN-ACK, [.] is ACK, [P.] is PSH-ACK (data), [F.] is FIN-ACK, [R] is RST. If you see a burst of [S] without corresponding [S.], the remote side is not responding. If you see [R] immediately after [S], the port is closed or a firewall is rejecting connections.
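The flag-to-state mapping above can be wrapped in a tiny helper for annotating capture lines by hand. This is purely illustrative; the function name and wording are made up, not part of tcpdump:

```shell
# flagname: translate a tcpdump Flags field into a readable description.
# (Illustrative helper, not part of tcpdump itself.)
flagname() {
  case "$1" in
    '[S]')        echo 'SYN - connection attempt' ;;
    '[S.]')       echo 'SYN-ACK - connection accepted' ;;
    '[.]')        echo 'ACK' ;;
    '[P.]')       echo 'PSH-ACK - data segment' ;;
    '[F.]')       echo 'FIN-ACK - orderly close' ;;
    '[R]'|'[R.]') echo 'RST - reset (closed port or firewall reject)' ;;
    *)            echo "other: $1" ;;
  esac
}
# Example: flagname '[S.]'
```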
Advanced tcpdump techniques for production debugging
Beyond basic filtering, tcpdump offers advanced techniques for isolating specific problems in production environments:
# Capture packets with specific TCP flags (RST only — find connection resets)
sudo tcpdump -nn -i enp3s0 'tcp[tcpflags] & (tcp-rst) != 0'
# Capture retransmissions by looking for duplicate SEQ numbers
# Save a full capture, then analyze with tshark
sudo tcpdump -nn -i enp3s0 -w /tmp/retrans.pcap tcp port 443
tshark -r /tmp/retrans.pcap -Y "tcp.analysis.retransmission" -T fields -e frame.number -e ip.src -e ip.dst
# Rotating capture files (new file every hour via -G 3600, keep at most 10 via -W)
sudo tcpdump -nn -i enp3s0 -w /tmp/capture-%Y%m%d-%H%M%S.pcap -G 3600 -W 10 tcp port 80
# Capture only packets larger than a threshold (find jumbo frames or MTU issues)
sudo tcpdump -nn -i enp3s0 'greater 1500'
# Capture ARP traffic to debug IP conflicts or MAC resolution failures
sudo tcpdump -nn -i enp3s0 arp
ss: Inspecting Linux Socket State and Connections
ss (socket statistics) replaced netstat and is faster because it reads directly from kernel data structures via netlink rather than parsing /proc. It shows TCP, UDP, UNIX, and raw sockets with filtering capabilities that netstat never had.
# Show all listening TCP sockets with process info
ss -tlnp
# Show all established TCP connections
ss -tnp state established
# Show connections to a specific destination port
ss -tnp dst :3306
# Show connections from a specific source
ss -tnp src 10.0.1.0/24
# Count sockets in TIME_WAIT state (-H suppresses the header line)
ss -Htan state time-wait | wc -l
# Show detailed socket info (timers, memory, congestion control)
ss -tinp dst :443
# Show UDP sockets
ss -ulnp
# Summary statistics
ss -s
The -i flag with TCP shows internal state: round-trip time (rtt), congestion window (cwnd), retransmissions, and the congestion control algorithm. This is invaluable when diagnosing throughput problems. If rtt is high or retrans keeps incrementing, the network path has issues. If cwnd stays small, congestion or packet loss is throttling the connection.
# Detailed TCP connection info
ss -tinp dst :443
# Example output:
# State Recv-Q Send-Q Local Address:Port Peer Address:Port
# ESTAB 0 0 10.0.1.50:45678 10.0.1.100:443
# cubic wscale:7,7 rto:204 rtt:1.5/0.5 ato:40 mss:1448
# cwnd:10 retrans:0/0 rcv_space:29200
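Those per-connection counters can also be extracted programmatically. A sketch that pulls the average rtt and the total retransmission count out of ss -ti detail lines with awk; it assumes the rtt:&lt;avg&gt;/&lt;var&gt; and retrans:&lt;current&gt;/&lt;total&gt; field layout shown above, which may vary slightly between iproute2 versions:

```shell
# parse_tcp_info: print avg rtt (ms) and total retransmissions from the
# detail lines that `ss -ti` emits (rtt:<avg>/<var>, retrans:<cur>/<total>).
parse_tcp_info() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^rtt:/)     { split($i, a, "[:/]"); print "rtt_ms " a[2] }
      if ($i ~ /^retrans:/) { split($i, a, "[:/]"); print "retrans_total " a[3] }
    }
  }'
}
# Usage: ss -tin dst :443 | parse_tcp_info
```

Note that ss omits the retrans field entirely on connections that have never retransmitted, so an empty result for retrans_total means zero.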
nmap: Port Scanning and Network Service Discovery
nmap scans hosts to discover open ports, identify running services, and detect operating systems. In production, use it to verify firewall rules, confirm that only expected ports are exposed, and audit server configurations. For firewall management details, see our nftables and firewalld guide.
# Basic TCP SYN scan of common ports (requires root for SYN scan)
sudo nmap -sS 10.0.1.100
# Scan specific ports
sudo nmap -sS -p 22,80,443,3306 10.0.1.100
# Full port range scan (takes longer)
sudo nmap -sS -p 1-65535 10.0.1.100
# Service version detection
sudo nmap -sV -p 22,80,443 10.0.1.100
# UDP scan (slow, but necessary for DNS, SNMP, NTP)
sudo nmap -sU -p 53,123,161 10.0.1.100
# OS detection
sudo nmap -O 10.0.1.100
# Scan an entire subnet for live hosts (ping sweep)
sudo nmap -sn 10.0.1.0/24
# Script scan for known vulnerabilities (use with permission only)
sudo nmap --script vuln -p 443 10.0.1.100
Only scan systems you own or have explicit authorization to test. Unauthorized port scanning is a policy violation in most organizations and may be illegal depending on jurisdiction.
A common production workflow: after deploying a new server, run nmap -sS -sV -p 1-65535 against it from outside the firewall to verify that only the intended services are reachable. Compare the output against your expected service list. Any unexpected open port is either a misconfiguration or a security concern. For a comprehensive approach to server security auditing, refer to our Linux server security with nftables, firewalld, and port scanning article.
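The comparison step of that workflow can be scripted with comm. A sketch, assuming a hypothetical expected-ports.txt with one port number per line; both lists must be sorted with plain (lexicographic) sort because comm requires it. The scan itself needs root and authorization, so it is shown as a comment:

```shell
# unexpected_ports: print ports present in the actual list but absent from
# the expected list (both files lexicographically sorted, one port per line).
unexpected_ports() {
  comm -13 "$1" "$2"
}
# Populate the actual list from a real scan (root + authorization required):
#   sudo nmap -sS -p- --open -oG - 10.0.1.100 |
#     grep -oE '[0-9]+/open' | cut -d/ -f1 | sort > /tmp/actual-ports.txt
#   unexpected_ports expected-ports.txt /tmp/actual-ports.txt
```

Any port this prints is either a misconfiguration or a security concern, per the workflow above.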
traceroute, tracepath, and mtr for Network Path Analysis
These tools map the network path between your system and a destination by sending packets with incrementing TTL values.
traceroute sends UDP packets (or ICMP/TCP with flags) and records each hop that returns a TTL-exceeded message. tracepath is similar but also discovers the path MTU. mtr (My Traceroute) combines traceroute with continuous ping, showing real-time loss and latency statistics per hop.
# Standard traceroute (UDP by default)
traceroute 10.50.1.1
# TCP traceroute to port 443 (useful when ICMP/UDP is blocked)
sudo traceroute -T -p 443 10.50.1.1
# tracepath with MTU discovery
tracepath 10.50.1.1
# mtr for continuous monitoring (report mode, 100 packets)
mtr -r -c 100 10.50.1.1
# mtr in real-time interactive mode
mtr 10.50.1.1
When reading mtr output, focus on two columns: Loss% and Avg (average latency). A single hop showing loss does not always mean a problem at that router. Many routers rate-limit ICMP responses (they deprioritize ICMP processing). The test is whether loss persists at subsequent hops. If hop 5 shows 30% loss but hops 6-10 show 0%, hop 5 is just rate-limiting ICMP. If hop 5 shows 30% and all subsequent hops also show 30%, the issue is at or before hop 5.
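The persistence test described above can be automated over mtr's report output. A sketch, assuming the default mtr -r layout where Loss% is the third whitespace-separated field and hop lines begin on the third line; older mtr versions format the header slightly differently:

```shell
# mtr_verdict: report whether loss in an mtr report persists to the
# final hop, and if so, at which hop it starts.
mtr_verdict() {
  awk 'NR > 2 { n++; loss[n] = $3 + 0; host[n] = $2 }
       END {
         if (n == 0)       { print "no hops parsed"; exit 1 }
         if (loss[n] == 0) { print "no end-to-end loss"; exit 0 }
         for (i = 1; i <= n; i++)
           if (loss[i] > 0) { print "loss starts at hop " i " (" host[i] ")"; break }
       }'
}
# Usage: mtr -r -c 100 10.50.1.1 | mtr_verdict
```

If the final hop shows zero loss, mid-path loss is almost certainly ICMP rate-limiting and can be ignored.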
DNS Diagnostics with dig and nslookup
# Query A record with full details
dig example.com A
# Short answer only
dig +short example.com A
# Query specific DNS server
dig @8.8.8.8 example.com A
# Trace delegation from root servers
dig +trace example.com A
# Reverse lookup
dig -x 93.184.216.34
# Query MX records
dig example.com MX +short
# Check DNSSEC validation
dig +dnssec example.com A
# Compare system resolver vs direct DNS query
getent hosts example.com
dig +short example.com A
When DNS resolution fails, first determine whether the problem is the local resolver stack or the upstream server. Run getent hosts <name> (uses the system resolver) and dig @<upstream> <name> (queries the upstream server directly). If getent fails but dig succeeds, the problem is local (systemd-resolved, /etc/resolv.conf, nsswitch.conf). If both fail, the problem is upstream or the record does not exist.
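That decision logic can be sketched as a small helper that takes the two exit statuses; the function name is illustrative. One subtlety: dig exits 0 even on NXDOMAIN, so the real invocation below checks for a non-empty answer instead of dig's exit code:

```shell
# classify_dns: given the exit status of `getent hosts` (system resolver)
# and of a direct query against the upstream server, name the likely fault.
classify_dns() {
  local getent_rc=$1 dig_rc=$2
  if [ "$getent_rc" -ne 0 ] && [ "$dig_rc" -eq 0 ]; then
    echo "local resolver stack (systemd-resolved, resolv.conf, nsswitch.conf)"
  elif [ "$getent_rc" -ne 0 ] && [ "$dig_rc" -ne 0 ]; then
    echo "upstream server, or the record does not exist"
  else
    echo "resolution working"
  fi
}
# Real usage (network required):
#   getent hosts example.com >/dev/null; g=$?
#   dig @8.8.8.8 +short example.com A | grep -q . ; d=$?
#   classify_dns "$g" "$d"
```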
Diagnosing common DNS failures
DNS issues are among the most frequent causes of application failures. Here is a systematic approach to diagnosing the most common DNS problems on Linux:
# Check which DNS server the system is actually using
resolvectl status 2>/dev/null || cat /etc/resolv.conf
# Test if the DNS server is reachable at all
dig @10.0.1.2 +timeout=2 +tries=1 example.com A
# Check for SERVFAIL (indicates upstream server error or DNSSEC failure)
dig example.com A | grep -i "status"
# NOERROR = success, NXDOMAIN = name doesn't exist, SERVFAIL = server error
# Verify resolution chain for internal domains
dig +trace internal.company.com A
# Check TTL values (low TTLs mean frequent re-queries)
dig example.com A | grep -A 1 "ANSWER SECTION"
# Flush local DNS cache (systemd-resolved)
sudo resolvectl flush-caches
resolvectl statistics
HTTP Testing and Latency Measurement with curl
# Fetch HTTP headers only
curl -I https://example.com
# Measure connection timing
curl -o /dev/null -s -w "DNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTLS: %{time_appconnect}s\nTotal: %{time_total}s\n" https://example.com
# Test with specific Host header (useful for vhosts behind load balancers)
curl -H "Host: app.example.com" http://10.0.1.100/
# Follow redirects and show headers
curl -L -v https://example.com 2>&1 | head -40
# POST with JSON data
curl -X POST -H "Content-Type: application/json" \
-d '{"key":"value"}' https://api.example.com/endpoint
# Download with wget, limit bandwidth
wget --limit-rate=1M https://releases.example.com/large-file.iso
The curl -w format string is extremely useful for diagnosing latency. It breaks down the total request time into DNS lookup, TCP connect, TLS handshake, and data transfer phases. If time_namelookup is high, the problem is DNS. If time_connect minus time_namelookup is high, the TCP path or server is slow. If time_appconnect minus time_connect is high, TLS negotiation is the bottleneck.
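The subtractions can be done inline with awk instead of by hand, since curl reports cumulative checkpoints rather than per-phase durations. A sketch, with example.com as a stand-in URL:

```shell
# phase_times: turn curl's cumulative timing checkpoints into per-phase
# deltas (each phase = difference between consecutive checkpoints).
phase_times() {
  awk '{ printf "DNS: %.3fs\nTCP connect: %.3fs\nTLS handshake: %.3fs\nTransfer: %.3fs\n",
         $1, $2 - $1, $3 - $2, $4 - $3 }'
}
# Usage (network required):
#   curl -o /dev/null -s \
#     -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_total}\n' \
#     https://example.com | phase_times
```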
Bandwidth Testing with iperf3 and Interface Diagnostics
# On the server side (listening)
iperf3 -s
# On the client side (testing TCP throughput)
iperf3 -c 10.0.1.100 -t 30
# Test UDP throughput at 1 Gbps target rate
iperf3 -c 10.0.1.100 -u -b 1G -t 10
# Reverse test (server sends, client receives)
iperf3 -c 10.0.1.100 -R -t 30
# Check interface statistics
cat /proc/net/dev
# Detailed NIC info and error counters
ethtool -S enp3s0 | head -30
# Check link speed and duplex
ethtool enp3s0 | grep -E 'Speed|Duplex|Link detected'
The /proc/net/dev file shows per-interface packet and byte counters, including drops, errors, and overruns. If the "drop" counter keeps incrementing, the kernel is discarding packets, typically because the receive buffer is full or the CPU cannot process interrupts fast enough. ethtool -S provides driver-level statistics that break down drops by cause: rx_missed, rx_fifo_errors, or tx_aborted_errors each point to different root causes.
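A quick way to tell whether drops are ongoing rather than historical is to sample the counter twice and compare. A sketch, assuming interface enp3s0 and the standard /proc/net/dev layout where the rx drop count is the fifth whitespace-separated field:

```shell
# read_drops: read the rx drop counter for one interface from /proc/net/dev
# (second argument lets you point at a saved copy of the file).
read_drops() {
  awk -v ifc="$1:" '$1 == ifc { print $5 }' "${2:-/proc/net/dev}"
}
iface=enp3s0   # assumed interface name; substitute your own
before=$(read_drops "$iface")
sleep 5
after=$(read_drops "$iface")
if [ -n "$before" ] && [ -n "$after" ]; then
  echo "rx drops on $iface in 5s: $((after - before))"
else
  echo "interface $iface not found in /proc/net/dev" >&2
fi
```

A steadily growing delta means the kernel is discarding packets right now, which points at buffer sizing or interrupt handling rather than a past incident.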
Systematic Network Troubleshooting Methodology
When something breaks, resist the urge to start randomly changing things. Follow this systematic troubleshooting sequence:
1. Define the symptom precisely. "Cannot reach the database" is different from "connections to port 3306 time out after 30 seconds" or "connections to port 3306 get RST immediately."
2. Check local interface and address. ip -br addr and ip link show. Is the interface up? Is the IP correct?
3. Check local routing. ip route get <destination>. Does the kernel know how to reach the target?
4. Check connectivity hop by hop. Ping the gateway, then the next hop, then the destination. Use mtr for continuous testing.
5. Check DNS if using hostnames. getent hosts <name> and dig @<server> <name>.
6. Check transport layer. ss -tlnp on the server to verify the service is listening; nmap -sS -p <port> from the client to verify the port is reachable.
7. Capture packets if needed. Run tcpdump on both ends to see whether SYN packets arrive and what the server responds with.
8. Check application layer. curl -v for HTTP, openssl s_client for TLS, protocol-specific clients for databases.
This bottom-up approach isolates the faulty layer. Most problems are found by step 4 (routing/connectivity) or step 6 (service not listening, firewall blocking). Only reach for tcpdump when the simpler tools give ambiguous results.
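The interface, route, connectivity, and transport checks can be chained into a first-pass script that stops at the first failing layer. A sketch using bash's /dev/tcp for the port probe; the host and port in the example are placeholders, and ping may require CAP_NET_RAW depending on system configuration:

```shell
#!/usr/bin/env bash
# check_path: walk the checklist bottom-up, stopping at the first failure.
check_path() {
  local host=$1 port=$2
  ip -br addr >/dev/null 2>&1 \
    || { echo "FAIL: cannot read interface state"; return 1; }
  ip route get "$host" >/dev/null 2>&1 \
    || { echo "FAIL: no route to $host"; return 2; }
  ping -c 1 -W 2 "$host" >/dev/null 2>&1 \
    || { echo "FAIL: no ICMP reply from $host (may also be filtered)"; return 3; }
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null \
    || { echo "FAIL: TCP port $port on $host not reachable"; return 4; }
  echo "PASS: $host:$port reachable through layer 4"
}
# Example: check_path 10.0.1.100 3306
```

The return code tells you which layer failed, so the script points you directly at the next tool to reach for (mtr for return 3, ss and tcpdump for return 4).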
Quick Reference Cheat Sheet
| Task | Command |
|---|---|
| Capture traffic on port 443 | tcpdump -nn -i enp3s0 tcp port 443 |
| Save capture to file | tcpdump -w /tmp/cap.pcap -c 5000 -i enp3s0 |
| Listening TCP sockets | ss -tlnp |
| Connections to a port | ss -tnp dst :3306 |
| TCP connection details | ss -tinp |
| Scan common ports | nmap -sS 10.0.1.100 |
| Service version detection | nmap -sV -p 22,80,443 10.0.1.100 |
| Continuous path analysis | mtr -r -c 100 10.50.1.1 |
| DNS query with trace | dig +trace example.com A |
| HTTP timing breakdown | curl -o /dev/null -s -w "DNS:%{time_namelookup} TCP:%{time_connect}\n" URL |
| Bandwidth test | iperf3 -c 10.0.1.100 -t 30 |
| NIC link speed/errors | ethtool enp3s0 / ethtool -S enp3s0 |
| Interface packet counters | cat /proc/net/dev |
Summary
Network troubleshooting is a skill that improves with a consistent method. Start from the bottom of the stack (link, address, route) and work up to the application layer. Use ss to understand socket state, tcpdump when you need to see actual packets, nmap to verify what is reachable from the outside, and mtr to identify where packets disappear on the path. The curl timing breakdown and iperf3 bandwidth tests quantify problems that "it feels slow" cannot. Every tool here ships in standard repositories on Debian 13.3, Ubuntu 24.04.3 LTS, Fedora 43, and RHEL 10.1. The key is knowing which tool answers which question, and reaching for the right one instead of guessing.