Getting NVIDIA drivers and CUDA working on a Linux server dedicated to AI workloads is not inherently difficult, but the wrong ways to do it vastly outnumber the right ones. Between distribution-specific packaging, kernel module choices, CUDA toolkit version matrices, and the ever-present Nouveau conflict, a straightforward driver install can spiral into hours of debugging. This guide eliminates that. We cover exactly what to install, in what order, on Ubuntu 24.04, RHEL 9, and Fedora 41 — the three distributions that dominate production AI server deployments in 2026.
The stakes are real. A misconfigured GPU driver means your LLM inference server sits idle, your training jobs fail silently, or worse — your system boots to a black screen. Whether you are setting up a single workstation with a GeForce RTX 4090 for local model experimentation or provisioning a rack of servers with A100s or H100s for production inference with Ollama or vLLM, the driver installation process follows the same fundamental steps. Get the driver right, get CUDA right, and everything else — from NVIDIA Container Toolkit to PyTorch — falls into place.
This guide is written for Linux system administrators who need working GPU compute, not desktop graphics. We skip X11/Wayland configuration entirely and focus exclusively on headless server deployment. Every command has been tested. Every failure mode listed comes from real production incidents.
The Decision Tree: Which Driver, Which Module, Which CUDA
Before touching a package manager, you need to make three decisions. Getting these wrong means reinstalling later, so take sixty seconds to think through each one.
Decision 1: Driver Version
NVIDIA maintains multiple driver branches simultaneously. For AI workloads in 2026, the relevant branches are:
- 570.x (latest production branch) — Supports all current GPUs from Hopper (H100/H200), Ada Lovelace (RTX 4000/5000 series), and Blackwell (B200). This is the default choice for new installations.
- 560.x (previous production branch) — Still receives security updates. Use this if you have validated a specific CUDA toolkit version against 560 and cannot risk a driver change.
- 550.x (legacy production) — Required for certain older Ampere configurations. End-of-life approaching; migrate when possible.
- 535.x (legacy) — Only if you are running Maxwell or Pascal GPUs for budget inference with quantized models, as covered in our Tesla P40 budget LLM server guide. (Kepler support ended with the 470 branch; no 535.x driver runs Kepler cards.)
Rule of thumb: Install the newest driver that supports your GPU. Newer drivers are backward-compatible with older CUDA toolkit versions, but older drivers cannot run newer CUDA toolkits. Check the CUDA Toolkit Release Notes for the minimum required driver version.
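The rule of thumb above can be scripted into provisioning checks. A minimal sketch — the function name is ours, and the 560.28 minimum is an illustrative value; confirm exact minimums in the CUDA Toolkit Release Notes:

```shell
# Succeeds when an installed driver version meets a required minimum.
# sort -V gives natural version ordering (GNU coreutils).
driver_at_least() {
    min="$1"; installed="$2"
    [ "$(printf '%s\n%s\n' "$min" "$installed" | sort -V | head -n1)" = "$min" ]
}

# Example minimum — verify the exact value in the CUDA Toolkit Release Notes:
if driver_at_least "560.28" "570.86.15"; then
    echo "driver 570.86.15 can run CUDA 12.6 builds"
fi

# On a live system, feed it the real driver version:
#   driver_at_least "560.28" "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -1)"
```

This catches the one direction that fails: a toolkit newer than the driver's supported maximum.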
Decision 2: Open vs Proprietary Kernel Module
Since driver version 515, NVIDIA ships two kernel modules:
- nvidia-open — Open-source kernel module (MIT/GPLv2 dual-licensed). Recommended for Turing (RTX 2000+), Ampere, Ada Lovelace, Hopper, and Blackwell GPUs. This is NVIDIA's default since driver 560 on supported hardware. Better integration with kernel updates, faster bug fixes from the community, and full feature parity with the proprietary module on supported GPUs.
- nvidia (proprietary) — Closed-source kernel module. Required for Pascal (GTX 1000 series) and older GPUs. Also the fallback if you encounter regressions with the open module on specific kernel versions.
Rule of thumb: If your GPU is Turing or newer (which includes every GPU you should be buying for AI in 2026), use the open kernel module. If you are repurposing older Tesla P40 or V100 cards, use the proprietary module.
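If you automate host setup, the module decision can be encoded as a lookup. This is a heuristic of our own — the function and the name patterns are illustrative, not an NVIDIA interface — so confirm the architecture against NVIDIA's supported-GPU list before installing:

```shell
# Heuristic: suggest a kernel module flavor from the GPU marketing name,
# following the rule of thumb above (Turing and newer -> open module).
suggest_module() {
    case "$1" in
        *"RTX 20"*|*"RTX 30"*|*"RTX 40"*|*"RTX 50"*|*T4*|*A100*|*H100*|*H200*|*B200*|*L40*)
            echo "nvidia-open" ;;           # Turing and newer
        *"GTX 10"*|*P40*|*P100*|*V100*|*M60*|*K80*)
            echo "nvidia (proprietary)" ;;  # pre-Turing
        *)
            echo "unknown" ;;
    esac
}

suggest_module "NVIDIA GeForce RTX 4090"   # nvidia-open
suggest_module "Tesla P40"                 # nvidia (proprietary)
```

On a live host you would feed it `nvidia-smi --query-gpu=name --format=csv,noheader` (or the `lspci` description before the driver is installed).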
Decision 3: CUDA Toolkit Version
The CUDA toolkit is separate from the driver. You need it for compiling CUDA code and for the runtime libraries that frameworks like PyTorch, TensorFlow, and llama.cpp depend on.
- CUDA 12.8 — Latest stable. Required for Blackwell (B200/B100) specific features. Compatible with driver 570+.
- CUDA 12.6 — Widely tested, broadest framework compatibility. Compatible with driver 560+. The safe choice for most deployments.
- CUDA 12.4 — Use only if a specific application pins this version.
Rule of thumb: Install CUDA 12.6 unless you have a specific reason to choose otherwise. Most AI frameworks target 12.6 as their primary build target. If you are using Ollama for model serving, it bundles its own CUDA runtime, so you may not need the full toolkit at all — but install it anyway for flexibility with other tools.
Step 1: Blacklist Nouveau
Nouveau is the open-source reverse-engineered NVIDIA driver that ships with most Linux distributions. It provides basic display output but zero compute capability. It also conflicts directly with the NVIDIA proprietary/open driver. If Nouveau is loaded when you try to install the NVIDIA driver, the installation will either fail or produce a broken setup.
On a headless server, Nouveau should not be loaded at all, but many distributions load it by default even without a display connected. Check first:
# Check if Nouveau is currently loaded
lsmod | grep nouveau
# If you see output, Nouveau is loaded and must be blacklisted
Create the blacklist configuration:
# Create blacklist file
sudo tee /etc/modprobe.d/blacklist-nouveau.conf <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
# Rebuild initramfs
# Ubuntu/Debian:
sudo update-initramfs -u
# RHEL/Fedora:
sudo dracut --force
# Reboot
sudo reboot
After reboot, verify Nouveau is gone:
lsmod | grep nouveau
# Should produce NO output
# Verify NVIDIA hardware is detected
lspci | grep -i nvidia
# Should show your GPU(s)
Step 2: Install NVIDIA Drivers on Ubuntu 24.04 LTS
Ubuntu 24.04 provides the cleanest NVIDIA driver experience of any distribution, largely because Canonical and NVIDIA collaborate on the packaging. There are two installation paths: the Ubuntu repository packages and the NVIDIA CUDA repository packages. We recommend the NVIDIA repository — it gets updates faster and gives you consistent package naming across distributions.
Method A: NVIDIA Official Repository (Recommended)
# Install prerequisites
sudo apt update
sudo apt install -y build-essential dkms linux-headers-$(uname -r)
# Add NVIDIA CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
# Install the driver (open kernel module — recommended for Turing+)
sudo apt install -y nvidia-open-570
# Or for the proprietary module (Pascal and older):
# sudo apt install -y cuda-drivers-570
# Reboot to load the new kernel module
sudo reboot
Method B: Ubuntu Repository
# List available drivers
sudo ubuntu-drivers list
# Install the recommended driver automatically
sudo ubuntu-drivers install
# Or install a specific version
sudo apt install -y nvidia-driver-570
# Reboot
sudo reboot
Post-Install Verification on Ubuntu
# Verify driver is loaded
nvidia-smi
You should see output similar to:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.15 Driver Version: 570.86.15 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:01:00.0 Off | Off |
| 0% 32C P8 9W / 450W | 1MiB / 24564MiB | 0% Default |
+-----------------------------------------+------------------------+----------------------+
Key things to verify: the driver version matches what you installed, CUDA version shows the maximum supported CUDA (not necessarily what you have installed), and your GPU is listed with the correct memory size. If nvidia-smi fails with "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver," see the troubleshooting section below.
Step 3: Install NVIDIA Drivers on RHEL 9
Red Hat Enterprise Linux 9 requires a few extra steps because of its conservative package policies and the need to work around Secure Boot and kernel module signing in enterprise environments. For organizations already running Ollama on RHEL, see our RHEL 9 and Rocky Linux enterprise setup guide.
Enable Required Repositories
# Enable EPEL (Extra Packages for Enterprise Linux)
sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
# Enable the CodeReady Builder repository (needed for DKMS)
sudo subscription-manager repos --enable codeready-builder-for-rhel-9-x86_64-rpms
# On Rocky Linux / AlmaLinux:
# sudo dnf config-manager --set-enabled crb
# Install kernel development packages
sudo dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc make dkms
Add NVIDIA Repository and Install Driver
# Add the official NVIDIA CUDA repository for RHEL 9
sudo dnf config-manager --add-repo \
https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
# Clean and refresh metadata
sudo dnf clean all
sudo dnf makecache
# Install the open kernel module driver (Turing+ GPUs)
sudo dnf module install -y nvidia-driver:570-open
# Or for proprietary module (older GPUs):
# sudo dnf module install -y nvidia-driver:570
# Reboot
sudo reboot
RHEL-Specific Considerations
Secure Boot: If Secure Boot is enabled, the NVIDIA kernel module must be signed. RHEL's DKMS integration can handle this automatically if you enroll a MOK (Machine Owner Key):
# Generate a signing key pair
sudo openssl req -new -x509 -newkey rsa:2048 -keyout /root/nvidia-signing-key.key \
-outform DER -out /root/nvidia-signing-key.der -nodes -days 36500 \
-subj "/CN=NVIDIA Module Signing Key/"
# Enroll the key with MOK
sudo mokutil --import /root/nvidia-signing-key.der
# You will be prompted to set a password — remember it for the next reboot
# Reboot and complete MOK enrollment in the UEFI interface
sudo reboot
SELinux: The NVIDIA driver packages include the correct SELinux contexts. If you encounter "Permission denied" errors related to GPU device nodes, check audit logs:
# Check for SELinux denials related to NVIDIA
sudo ausearch -m AVC -ts recent | grep nvidia
# If needed, generate and install a custom policy module
sudo ausearch -m AVC -ts recent | audit2allow -M nvidia-local
sudo semodule -i nvidia-local.pp
Step 4: Install NVIDIA Drivers on Fedora 41
Fedora runs closer to the bleeding edge than RHEL, which means newer kernels that sometimes require newer drivers. The RPM Fusion repository is the standard way to install NVIDIA drivers on Fedora.
Enable RPM Fusion and Install
# Enable RPM Fusion Free and Non-Free repositories
sudo dnf install -y \
https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
https://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm
# Install the NVIDIA driver with CUDA support
sudo dnf install -y akmod-nvidia xorg-x11-drv-nvidia-cuda
# IMPORTANT: Wait for the kernel module to build
# akmods builds the module asynchronously — check status:
sudo akmods --force
sudo dracut --force
# Reboot
sudo reboot
Alternative: NVIDIA Official Repository on Fedora
# Add NVIDIA's Fedora repository
sudo dnf config-manager --add-repo \
https://developer.download.nvidia.com/compute/cuda/repos/fedora41/x86_64/cuda-fedora41.repo
# Install driver
sudo dnf clean all
sudo dnf install -y nvidia-open-570
# Rebuild initramfs and reboot
sudo dracut --force
sudo reboot
Fedora Kernel Update Considerations
Fedora pushes kernel updates frequently. With the RPM Fusion akmods approach, kernel modules rebuild automatically on kernel update. With the NVIDIA repository approach, you may need to manually rebuild DKMS modules after a kernel update:
# After a kernel update, verify the module is built for the new kernel
dkms status
# Should show: nvidia/570.86.15, [kernel-version], x86_64: installed
# If not, trigger a rebuild
sudo dkms autoinstall
Step 5: Verify the Driver Installation with nvidia-smi
Regardless of distribution, the verification steps are identical. Run these commands after every driver installation or update:
# Basic GPU information
nvidia-smi
# Detailed query — useful for scripting and monitoring
nvidia-smi --query-gpu=name,driver_version,memory.total,memory.free,temperature.gpu,power.draw --format=csv
# Output:
# name, driver_version, memory.total [MiB], memory.free [MiB], temperature.gpu, power.draw [W]
# NVIDIA GeForce RTX 4090, 570.86.15, 24564 MiB, 24340 MiB, 32, 9.45 W
# Verify the kernel module loaded correctly
lsmod | grep nvidia
# Should show: nvidia, nvidia_modeset, nvidia_uvm, nvidia_drm
# Check driver messages in kernel log
dmesg | grep -i nvidia | tail -20
# List all GPUs with PCIe information
nvidia-smi topo -m
# Shows GPU topology — useful for multi-GPU setups
For multi-GPU servers, verify every GPU is visible. If you have 4 GPUs but nvidia-smi shows only 2, check PCIe slot seating and BIOS settings for bifurcation or Above 4G Decoding. For detailed GPU hardware planning, see our GPU buyer's guide for LLMs.
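That visibility check is worth automating across a fleet. A hedged sketch — `check_gpu_count` is our own helper, and the expected count is a per-host assumption you set for each chassis:

```shell
# Provisioning check: fail loudly when fewer GPUs are visible than the
# chassis should provide.
check_gpu_count() {
    expected="$1"; actual="$2"
    if [ "$actual" -lt "$expected" ]; then
        echo "ERROR: expected $expected GPUs, only $actual visible" >&2
        echo "Check PCIe seating, Above 4G Decoding, and dmesg for Xid errors" >&2
        return 1
    fi
    echo "OK: $actual GPUs visible"
}

# On a live host:
#   check_gpu_count 4 "$(nvidia-smi -L | wc -l)"
check_gpu_count 2 2
```

Run it from your configuration management or as a systemd oneshot so a half-visible server never silently enters the serving pool.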
Step 6: Install the CUDA Toolkit
The CUDA toolkit provides the compiler (nvcc), runtime libraries, and development headers needed to build GPU-accelerated applications. Many AI frameworks ship pre-compiled CUDA binaries, so you may not need nvcc — but the runtime libraries are essential.
Ubuntu 24.04
# If you already added the NVIDIA CUDA repository in Step 2, just install:
sudo apt install -y cuda-toolkit-12-6
# This installs CUDA 12.6 toolkit without changing your driver
# For the latest (12.8):
# sudo apt install -y cuda-toolkit-12-8
RHEL 9
# Using the repository added in Step 3:
sudo dnf install -y cuda-toolkit-12-6
Fedora 41
# Using the NVIDIA repository:
sudo dnf install -y cuda-toolkit-12-6
Configure Environment Variables
After installing the CUDA toolkit, add it to your PATH and library path. Create a profile script so it applies to all users:
# Create CUDA environment script
sudo tee /etc/profile.d/cuda.sh <<'EOF'
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
EOF
# Apply immediately in current session
source /etc/profile.d/cuda.sh
# Verify CUDA toolkit installation
nvcc --version
# Should show: Cuda compilation tools, release 12.6, V12.6.xxx
# Run a sample to verify GPU compute works
# (optional but recommended for first-time setup)
# The samples moved to GitHub as of CUDA 11.6 — the old
# cuda-install-samples-*.sh script no longer ships with the toolkit
git clone https://github.com/NVIDIA/cuda-samples.git ~/cuda-samples
cd ~/cuda-samples/Samples/1_Utilities/deviceQuery
make   # recent releases of the repo build with cmake from the repo root instead
./deviceQuery
# Should end with: Result = PASS
Multiple CUDA Toolkit Versions
You can install multiple CUDA toolkit versions side by side. Each installs to /usr/local/cuda-XX.Y/ and a symlink /usr/local/cuda points to the default version:
# Install multiple versions
sudo apt install -y cuda-toolkit-12-6 cuda-toolkit-12-8
# Switch between them by updating the symlink
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-12.6 /usr/local/cuda
# Verify
nvcc --version
Step 7: Docker GPU Runtime (NVIDIA Container Toolkit)
If you run AI workloads in containers — and you should for reproducibility and isolation — you need the NVIDIA Container Toolkit. This is the bridge between your host GPU driver and containerized applications. We have a comprehensive NVIDIA Container Toolkit guide covering advanced configuration, but here is the essential setup.
Install the NVIDIA Container Toolkit
# Add the NVIDIA container toolkit repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# For RHEL/Fedora:
# curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
# sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# sudo dnf install -y nvidia-container-toolkit
Configure Docker Runtime
# Configure Docker to use the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker
sudo systemctl restart docker
# Test GPU access inside a container
sudo docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi
# You should see the same nvidia-smi output as on the host
Docker Compose GPU Configuration
For multi-container AI stacks (e.g., Ollama with Docker Compose), configure GPU access in your compose file:
# docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all  # or a specific count: 1
              capabilities: [gpu]
    volumes:
      - ollama-data:/root/.ollama
    ports:
      - "11434:11434"
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama-data:
For production Docker GPU passthrough configurations, including device isolation and MPS (Multi-Process Service), see our dedicated guide.
Step 8: Enable Persistence Mode
By default, the NVIDIA driver unloads and reinitializes the GPU each time the last process using it exits. This adds 1-3 seconds of latency to the first CUDA call after an idle period. For AI servers that handle intermittent requests (like an Ollama API server), this latency is unacceptable.
Persistence mode keeps the driver initialized at all times, eliminating cold-start latency:
# Enable persistence mode (does not survive reboot)
sudo nvidia-smi -pm 1
# Verify
nvidia-smi --query-gpu=persistence_mode --format=csv,noheader
# Should show: Enabled
Make Persistence Mode Permanent with systemd
NVIDIA provides a persistence daemon that is the recommended way to maintain persistence mode across reboots:
# The nvidia-persistenced service should already be installed with the driver
# Enable and start it
sudo systemctl enable nvidia-persistenced
sudo systemctl start nvidia-persistenced
# Verify it is running
systemctl status nvidia-persistenced
# Alternative: If nvidia-persistenced is not available, create a systemd service
sudo tee /etc/systemd/system/nvidia-persistence.service <<'EOF'
[Unit]
Description=NVIDIA Persistence Mode
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable nvidia-persistence.service
sudo systemctl start nvidia-persistence.service
For complete Ollama production hardening including persistence mode integration, see our Ollama systemd hardening guide.
Step 9: Power Management for AI Servers
GPU power management directly affects both performance and electricity costs. A single NVIDIA A100 can draw 400W under full load. A server with 8 GPUs can pull 3.2kW from GPUs alone. For real electricity cost calculations, see our LLM power consumption analysis.
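To put numbers on that, a quick back-of-envelope calculation — the $0.15/kWh rate is an assumed example; substitute your own tariff:

```shell
# Monthly electricity figure for the GPUs alone (integer math, so ~exact).
gpus=8; watts_per_gpu=400; rate_cents_per_kwh=15
kwh_per_month=$(( gpus * watts_per_gpu * 24 * 30 / 1000 ))    # 8*400W * 720h = 2304 kWh
cost_dollars=$(( kwh_per_month * rate_cents_per_kwh / 100 ))  # truncated to whole dollars
echo "${kwh_per_month} kWh/month, roughly \$${cost_dollars}/month at \$0.15/kWh"
```

At full load, the 8-GPU example above burns on the order of $345 of electricity per month before counting CPUs, fans, or cooling — which is why the power-limit tuning below pays for itself.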
Query Current Power Settings
# Check current power limit and default limit
nvidia-smi --query-gpu=power.limit,power.default_limit,power.max_limit --format=csv
# Output: 450.00 W, 450.00 W, 500.00 W
# Check current performance state
nvidia-smi --query-gpu=pstate --format=csv,noheader
# P0 = maximum performance, P8 = idle
Set Power Limits
Reducing the power limit by 10-20% typically reduces performance by only 3-5% while significantly reducing heat and electricity costs. This is a common optimization for 24/7 inference servers:
# Set power limit for GPU 0 to 380W (from 450W default on RTX 4090)
sudo nvidia-smi -i 0 -pl 380
# Set for all GPUs
sudo nvidia-smi -pl 380
# Make permanent via systemd service
sudo tee /etc/systemd/system/nvidia-power.service <<'EOF'
[Unit]
Description=Set NVIDIA GPU Power Limits
After=nvidia-persistenced.service
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pl 380
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable nvidia-power.service
GPU Clock Management
# Lock GPU clocks for consistent inference latency
# This prevents the GPU from dynamically scaling clocks, which causes variable latency
sudo nvidia-smi -lgc 1200,2100 # min,max MHz for RTX 4090
# Lock memory clocks
sudo nvidia-smi -lmc 10501 # MHz for GDDR6X on RTX 4090
# Reset to default (dynamic scaling)
sudo nvidia-smi -rgc
sudo nvidia-smi -rmc
# Check current clock speeds
nvidia-smi --query-gpu=clocks.current.graphics,clocks.current.memory --format=csv
Thermal Management
For rack-mounted servers without adequate cooling, you can set a temperature target. The GPU will throttle if it exceeds this temperature:
# Query thermal throttle status
nvidia-smi --query-gpu=temperature.gpu,temperature.gpu.tlimit --format=csv
# Set temperature threshold (GPU will throttle above this)
# Note: Not all GPUs support user-configurable thermal limits
# Monitor temperature continuously
watch -n 1 nvidia-smi --query-gpu=temperature.gpu,power.draw,utilization.gpu --format=csv
For comprehensive GPU monitoring dashboards with Prometheus and Grafana, see our GPU monitoring guide.
The 5 Most Common Failures and Fixes
These five issues account for the vast majority of NVIDIA driver problems on Linux servers. We have seen each one multiple times in production environments.
Failure 1: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver"
This is the single most common error. It means the NVIDIA kernel module is not loaded.
# Diagnose
lsmod | grep nvidia # Is the module loaded?
dmesg | grep -i nvidia # Any errors in kernel log?
cat /proc/driver/nvidia/version # Driver file version
# Common causes and fixes:
# Cause A: Nouveau is still loaded
lsmod | grep nouveau
# Fix: Follow the Nouveau blacklisting steps above, then reboot
# Cause B: Kernel update broke DKMS module
dkms status
# If nvidia module shows as "added" but not "installed":
sudo dkms install nvidia/570.86.15 -k $(uname -r)
# Cause C: Secure Boot blocking unsigned module
sudo mokutil --sb-state
# If SecureBoot enabled, sign the module or disable SecureBoot in BIOS
# Cause D: Wrong driver version for your GPU
# Check if your GPU is supported by the installed driver version
lspci -nn | grep -i nvidia
# Compare the PCI ID against NVIDIA's supported GPU list
# Nuclear option: Remove everything and start fresh
sudo apt purge -y 'nvidia-*' 'cuda-*' 'libnvidia-*' # Ubuntu
# sudo dnf remove -y 'nvidia-*' 'cuda-*' # RHEL/Fedora
sudo reboot
# Then reinstall from scratch
Failure 2: "No devices were found" in nvidia-smi
The driver is loaded but cannot see any GPUs.
# Check if the GPU is visible on the PCIe bus
lspci | grep -i nvidia
# If lspci shows nothing:
# - Physical issue: reseat the GPU card
# - BIOS issue: enable Above 4G Decoding and MMIO in BIOS
# - PCIe issue: try a different slot
# If lspci shows the GPU but nvidia-smi does not:
# Check whether the GPU has fallen off the bus
dmesg | grep -i "fallen off the bus"
dmesg | grep -i "Xid"
# Xid errors indicate GPU hardware or driver issues
# Check IOMMU settings (can interfere with GPU access)
dmesg | grep -i iommu
# If IOMMU is grabbing the GPU, add to kernel cmdline:
# intel_iommu=on iommu=pt (for Intel)
# amd_iommu=on iommu=pt (for AMD)
Failure 3: CUDA Version Mismatch Errors
Applications report CUDA errors even though nvidia-smi shows a CUDA version.
# Understanding the version mismatch:
# nvidia-smi shows the MAXIMUM CUDA version supported by your DRIVER
# nvcc --version shows the CUDA TOOLKIT version you have INSTALLED
# These are different things!
# Check both versions
nvidia-smi | head -3 # Driver's max CUDA support
nvcc --version # Installed toolkit version
# The toolkit version must be <= the driver's max CUDA version
# Driver 570 supports CUDA up to 12.8
# If you installed CUDA 12.8 toolkit but have driver 560 (max CUDA 12.6):
# Either upgrade the driver or downgrade the toolkit
# Also check for conflicting CUDA installations
which nvcc
ls -la /usr/local/cuda*
echo $LD_LIBRARY_PATH
# Fix: Ensure /usr/local/cuda symlink points to the right version
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-12.6 /usr/local/cuda
Failure 4: Driver Breaks After Kernel Update
This happens when the distribution updates the kernel and the NVIDIA DKMS module fails to rebuild for the new kernel.
# Check DKMS status for all kernels
dkms status
# If the module is missing for the current kernel:
sudo dkms install nvidia/570.86.15 -k $(uname -r)
# If DKMS itself fails (missing kernel headers):
# Ubuntu:
sudo apt install -y linux-headers-$(uname -r)
# RHEL:
sudo dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
# Fedora:
sudo dnf install -y kernel-devel-$(uname -r)
# Rebuild
sudo dkms autoinstall
sudo reboot
# Prevention: Pin the kernel version if stability is critical
# Ubuntu:
sudo apt-mark hold linux-image-$(uname -r) linux-headers-$(uname -r)
# RHEL/Fedora:
sudo dnf versionlock add kernel kernel-devel kernel-headers
Failure 5: GPU Memory Allocation Failures
Applications crash with "CUDA out of memory" even though nvidia-smi shows free memory. This is covered extensively in our GPU memory troubleshooting guide, but here are the quick fixes:
# Check actual GPU memory usage
nvidia-smi
# Check for zombie processes holding GPU memory
sudo fuser -v /dev/nvidia*
# Kill zombie processes
sudo fuser -k /dev/nvidia*
# Check for memory fragmentation (common with PyTorch)
# Set environment variable to help with fragmentation:
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# For Ollama specifically:
# Reduce model context size or use a smaller quantization
# See: /article.php?slug=ollama-gpu-memory-troubleshooting-linux
# Monitor memory usage in real time
watch -n 1 nvidia-smi --query-gpu=memory.used,memory.free,memory.total --format=csv
Multi-GPU Configuration
For servers with multiple GPUs, additional configuration ensures optimal performance. This is critical for multi-GPU LLM inference workloads.
# List all GPUs and their topology
nvidia-smi topo -m
# Check NVLink status (if applicable)
nvidia-smi nvlink -s
# Set specific GPUs visible to an application
export CUDA_VISIBLE_DEVICES=0,1 # Only GPUs 0 and 1
# Check PCIe bandwidth between GPUs
nvidia-smi topo -p2p r
# N/A = no direct path, OK = direct path available
# For Ollama multi-GPU:
# Ollama automatically uses all visible GPUs for large models
# Set CUDA_VISIBLE_DEVICES to restrict which GPUs Ollama uses
CUDA_VISIBLE_DEVICES=0,1 ollama serve
Verifying the Complete Stack
After completing all installation steps, run this comprehensive verification script to confirm everything is working:
#!/bin/bash
echo "=== NVIDIA Driver and CUDA Verification ==="
echo ""
echo "1. Driver Version:"
nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -1
echo ""
echo "2. GPU(s) Detected:"
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
echo ""
echo "3. CUDA Toolkit Version:"
nvcc --version 2>/dev/null | grep "release" || echo "CUDA toolkit not installed (may not be needed)"
echo ""
echo "4. Kernel Module:"
lsmod | grep nvidia | awk '{print $1}' | sort
echo ""
echo "5. Persistence Mode:"
nvidia-smi --query-gpu=persistence_mode --format=csv,noheader
echo ""
echo "6. Power Limit:"
nvidia-smi --query-gpu=power.limit --format=csv,noheader
echo ""
echo "7. Nouveau Blacklisted:"
if lsmod | grep -q nouveau; then
echo "WARNING: Nouveau is still loaded!"
else
echo "OK - Nouveau not loaded"
fi
echo ""
echo "8. Docker GPU Access:"
if command -v docker &>/dev/null; then
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null || echo "Docker GPU test failed"
else
echo "Docker not installed"
fi
echo ""
echo "=== Verification Complete ==="
FAQ
Do I need to install CUDA separately if I only use Ollama?
No. Ollama bundles its own CUDA runtime libraries, so you only need the NVIDIA driver installed on the host. The CUDA toolkit (with nvcc) is only required if you compile CUDA code yourself or use frameworks like PyTorch that need the full toolkit. That said, installing the CUDA toolkit does not conflict with Ollama and gives you flexibility for other AI tools later.
Can I run different CUDA versions for different applications?
Yes. Install multiple CUDA toolkit versions side by side (e.g., cuda-toolkit-12-6 and cuda-toolkit-12-8). Each installs to its own directory under /usr/local/. Set CUDA_HOME and update the /usr/local/cuda symlink to switch between them. Alternatively, use Docker containers with different CUDA base images — this is the cleanest approach for production. See our NVIDIA Container Toolkit guide for container-based CUDA management.
Should I use the open or proprietary NVIDIA kernel module?
Use the open kernel module (nvidia-open) if your GPU is Turing architecture or newer (RTX 2000+, T4, A100, H100, RTX 4000/5000 series, B200). The open module is NVIDIA's default recommendation for these GPUs since driver 560. It has full feature parity with the proprietary module on supported hardware and benefits from better kernel integration and community contributions. Use the proprietary module only for Pascal (GTX 1000, P40, P100) or older GPUs.
How do I prevent kernel updates from breaking my NVIDIA driver?
The DKMS (Dynamic Kernel Module Support) system automatically rebuilds the NVIDIA kernel module when a new kernel is installed. Ensure dkms, kernel-headers, and kernel-devel packages are installed. Run dkms status after every kernel update to verify the module rebuilt successfully. For maximum stability on production servers, pin the kernel version and only update it during planned maintenance windows when you can immediately verify the GPU driver still works.
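That post-update verification can itself be scripted. This sketch parses the line format newer dkms versions emit; `dkms_nvidia_ok` is our own helper, and the parsing may need adjusting for your dkms version:

```shell
# Verify dkms reports the nvidia module as "installed" for a given kernel.
# Expected line shape: nvidia/570.86.15, 6.8.0-51-generic, x86_64: installed
dkms_nvidia_ok() {
    echo "$1" | grep "nvidia/" | grep -F "$2" | grep -q ": installed"
}

status_line="nvidia/570.86.15, 6.8.0-51-generic, x86_64: installed"
dkms_nvidia_ok "$status_line" "6.8.0-51-generic" && echo "module built for this kernel"

# Live check after a kernel update:
#   dkms_nvidia_ok "$(dkms status)" "$(uname -r)" || sudo dkms autoinstall
```

Hook it into your update pipeline so a kernel that booted without a rebuilt module raises an alert instead of a silent nvidia-smi failure.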
What is the minimum NVIDIA driver version I need for AI workloads in 2026?
For current AI frameworks and models, driver 560+ with CUDA 12.6 support is the practical minimum. Driver 570 is recommended as it supports all current GPU architectures including Blackwell and includes performance optimizations for inference workloads. If you are using Ollama, vLLM, or PyTorch with models released in 2025-2026, driver 560 is the floor — anything older will likely cause compatibility issues with quantized model formats like GGUF Q4 variants or newer attention mechanisms.
Related Articles
- NVIDIA Container Toolkit on Linux: GPU Setup for Docker AI Workloads
- Docker GPU Passthrough on Linux for AI Workloads
- Best GPU for Running LLMs Locally on Linux: 2026 Buyer's Guide
- Multi-GPU LLM Inference on Linux: Setup, Load Balancing, and Scaling
Further Reading
- NVIDIA CUDA Installation Guide for Linux — Official documentation
- NVIDIA Tesla Driver Installation Notes — Data center GPU specific guidance
- NVIDIA Driver Persistence — Deep dive into persistence mode
- NVIDIA CUDA Toolkit Downloads — Latest toolkit versions