CVE-2025-23266 - An update

Introduction

In my previous post about CVE-2025-23266, I wrote about the following aspects of this vulnerability:

  • Dynamic linker behavior (necessary to understand how LD_PRELOAD works)
  • Alternate runtimes (necessary to understand what the NVIDIA Container Toolkit is, after all)
  • CUDA Forward Compatibility (necessary to understand why the NVIDIA Container Toolkit has a createContainer hook)
  • Container Device Interface (what createContainer hooks are)

But one thing that was missing from my initial writeup was this (from the original Wiz writeup):

While prestart hooks run in a clean, isolated context, createContainer hooks have a critical property: they inherit environment variables from the container image unless explicitly configured not to.

So I set out to find out exactly what was going on in there.

NVIDIA Container Runtime Architecture

The NVIDIA Container Runtime is a shim for runc. According to GitHub (https://github.com/NVIDIA/nvidia-container-toolkit/tree/main/cmd/nvidia-container-runtime):

When a create command is detected, the incoming OCI runtime specification is modified in place and the command is forwarded to the low-level runtime.

NVIDIA Container Toolkit architecture

(Originally from https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.9.0/arch-overview.html.)

In my last post I mentioned the newNVIDIAContainerRuntime function, which does exactly that. So how does runc handle this runtime spec, and how important is the createContainer hook?

Runc

Here is where everything made sense to me.

	// In case we have any StartContainer hooks to run, and they don't
	// have environment configured explicitly, make sure they will be run
	// with the same environment as container's init.
	//
	// NOTE the above described behavior is not part of runtime-spec, but
	// rather a de facto historical thing we afraid to change.
	if h := l.config.Config.Hooks[configs.StartContainer]; len(h) > 0 {
		h.SetDefaultEnv(l.config.Env)
	}

(From https://github.com/opencontainers/runc/blob/main/libcontainer/standard_init_linux.go#L201)

For whatever historical reason, runc ended up treating the startContainer hook in a special way. For this type of hook only, it uses the environment configuration that comes from the spec (l.config.Env).

Then, in SetDefaultEnv:

https://github.com/opencontainers/runc/blob/main/libcontainer/configs/config.go#L536

func (hooks HookList) SetDefaultEnv(env []string) {
	for _, h := range hooks {
		if ch, ok := h.(CommandHook); ok && len(ch.Env) == 0 {
			ch.Env = env
		}
	}
}

Pay close attention to len(ch.Env) == 0. That conditional means “if the current command hook environment length is zero”, or in other words “if the current command hook has not defined its own environment variables”.
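To make that rule concrete, here is a minimal, self-contained sketch of the same defaulting logic. It is not runc's code: the `hook` type and the `setDefaultEnv` function are simplified, hypothetical stand-ins that keep only what is needed to show which hooks end up with the container's environment.

```go
package main

import "fmt"

// hook is a hypothetical, simplified stand-in for runc's CommandHook.
// It keeps only the fields needed to illustrate the defaulting rule.
type hook struct {
	Path string
	Env  []string
}

// setDefaultEnv mirrors the rule quoted above: a hook whose Env is empty
// receives the container's environment, while a hook that declares any
// Env of its own is left untouched.
func setDefaultEnv(hooks []*hook, containerEnv []string) {
	for _, h := range hooks {
		if len(h.Env) == 0 {
			h.Env = containerEnv
		}
	}
}

func main() {
	containerEnv := []string{"PATH=/usr/bin:/bin", "LD_PRELOAD=./poc.so"}
	hooks := []*hook{
		{Path: "/usr/bin/hook-without-env"},
		{Path: "/usr/bin/hook-with-env", Env: []string{"LANG=C"}},
	}

	setDefaultEnv(hooks, containerEnv)

	// The first hook now carries LD_PRELOAD; the second keeps LANG=C only.
	for _, h := range hooks {
		fmt.Println(h.Path, h.Env)
	}
}
```

The point of the sketch is that the check is all-or-nothing: a single entry in a hook's env is enough to opt that hook out of the fallback.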

Which finally explains why a spec like this:

{
  "ociVersion": "1.2.1",
  "process": {
    "user": {
      "uid": 0,
      "gid": 0,
      "additionalGids": [0, 10]
    },
    "args": [
      "sh"
    ],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "LD_PRELOAD=./poc.so",
      "NVIDIA_VISIBLE_DEVICES=all"
    ],
    "cwd": "/",
    "capabilities": {
      "bounding": [
        "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
        "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
      ],
      "effective": [
        "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
        "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
      ],
      "permitted": [
        "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD", "CAP_NET_RAW",
        "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE"
      ]
    },
    "apparmorProfile": "docker-default",
    "oomScoreAdj": 0
  },
  "root": {
    "path": "/var/lib/docker/overlay2/b29b8b924bfcb60a5b2adadd98c935fe659480f24be017259f56cf49ead0d3ed/merged"
  },
  "hostname": "e078abd14abe",
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc",
      "options": ["nosuid", "noexec", "nodev"]
    },
  ],
  "hooks": {
    "prestart": [
      {
        "path": "/usr/bin/nvidia-container-runtime-hook",
        "args": [
          "nvidia-container-runtime-hook",
          "prestart"
        ],
        "env": [
          "LANG=en_US.UTF-8",
          "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin",
          "NOTIFY_SOCKET=/run/systemd/notify",
          "INVOCATION_ID=6a71d6bbe6634f0baebf6075a9ece8b2",
          "JOURNAL_STREAM=8:22608",
          "SYSTEMD_EXEC_PID=2089",
          "OTEL_SERVICE_NAME=dockerd",
          "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf",
          "OTEL_EXPORTER_OTLP_METRICS_PROTOCOL=http/protobuf",
          "TMPDIR=/var/lib/docker/tmp"
        ]
      }
    ],
    "createContainer": [
      {
        "path": "/usr/bin/nvidia-ctk",
        "args": [
          "nvidia-ctk",
          "hook",
          "enable-cuda-compat",
          "--host-driver-version=575.64.03"
        ]
      },
      {
        "path": "/usr/bin/nvidia-ctk",
        "args": [
          "nvidia-ctk",
          "hook",
          "update-ldcache"
        ]
      }
    ]
  },

is vulnerable:

  • LD_PRELOAD is set at the container spec level (this corresponds to l.config.Env)
  • Since the createContainer hook has not defined its own environment variables, it ends up inheriting the ones from l.config.Env: that is the argument passed to SetDefaultEnv, which does ch.Env = env whenever the command hook has no environment variables of its own.
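The two conditions above can be checked mechanically. The following is a rough sketch of such a check, assuming a spec shaped like the one shown earlier. The `spec` and `specHook` types, the helper names, and the choice of which hook kinds to inspect are all assumptions made for this illustration; they are not part of runc or the NVIDIA Container Toolkit.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// specHook and spec model a small, hypothetical subset of the OCI
// runtime spec: just enough to reason about hook environments.
type specHook struct {
	Path string   `json:"path"`
	Args []string `json:"args"`
	Env  []string `json:"env"`
}

type spec struct {
	Process struct {
		Env []string `json:"env"`
	} `json:"process"`
	Hooks map[string][]specHook `json:"hooks"`
}

// parseSpec decodes a JSON document into the reduced spec type.
func parseSpec(doc string) (spec, error) {
	var s spec
	err := json.Unmarshal([]byte(doc), &s)
	return s, err
}

// riskyVars returns the env entries that steer the dynamic linker.
func riskyVars(env []string) []string {
	var out []string
	for _, kv := range env {
		if strings.HasPrefix(kv, "LD_PRELOAD=") || strings.HasPrefix(kv, "LD_LIBRARY_PATH=") {
			out = append(out, kv)
		}
	}
	return out
}

// exposedHooks lists "kind:path" for every hook of the given kinds that
// declares no env and would therefore fall back to the container's env.
func exposedHooks(s spec, kinds ...string) []string {
	var out []string
	for _, k := range kinds {
		for _, h := range s.Hooks[k] {
			if len(h.Env) == 0 {
				out = append(out, k+":"+h.Path)
			}
		}
	}
	return out
}

func main() {
	// Trimmed-down version of the spec discussed above.
	const doc = `{
	  "process": {"env": ["PATH=/usr/bin:/bin", "LD_PRELOAD=./poc.so"]},
	  "hooks": {
	    "prestart": [
	      {"path": "/usr/bin/nvidia-container-runtime-hook", "env": ["LANG=en_US.UTF-8"]}
	    ],
	    "createContainer": [
	      {"path": "/usr/bin/nvidia-ctk", "args": ["nvidia-ctk", "hook", "update-ldcache"]}
	    ]
	  }
	}`

	s, err := parseSpec(doc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	risky := riskyVars(s.Process.Env)
	for _, h := range exposedHooks(s, "createContainer", "startContainer") {
		fmt.Printf("%s has no env of its own and would run with %v\n", h, risky)
	}
}
```

The prestart hook is not reported because it carries its own env, which matches the difference between the two hook entries in the spec.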

The fix adds an "NVIDIA_CTK_DEBUG=false" entry to the env of each createContainer hook, which prevents this “historical behavior” from triggering:

    "createContainer": [
      {
        "path": "/usr/bin/nvidia-ctk",
        "args": [
          "nvidia-ctk",
          "hook",
          "enable-cuda-compat",
          "--host-driver-version=575.64.03"
        ],
        "env": [
          "NVIDIA_CTK_DEBUG=false"
        ]
      },
      {
        "path": "/usr/bin/nvidia-ctk",
        "args": [
          "nvidia-ctk",
          "hook",
          "update-ldcache"
        ],
        "env": [
          "NVIDIA_CTK_DEBUG=false"
        ]
      }
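Why a single, harmless-looking variable is enough can be sketched in a few lines. This is an illustration of the idea only, not the actual NVIDIA patch: the `hook` type, `pinEnv`, and `effectiveEnv` are hypothetical names, and `effectiveEnv` simply restates the `len(Env) == 0` rule discussed earlier.

```go
package main

import "fmt"

// hook is a hypothetical, simplified model of a hook entry in the spec.
type hook struct {
	Path string
	Args []string
	Env  []string
}

// pinEnv returns a copy of hooks in which every hook without an env gets
// a one-entry placeholder env, so the empty-env fallback no longer applies.
func pinEnv(hooks []hook, placeholder string) []hook {
	out := make([]hook, len(hooks))
	for i, h := range hooks {
		if len(h.Env) == 0 {
			h.Env = []string{placeholder}
		}
		out[i] = h
	}
	return out
}

// effectiveEnv models the defaulting rule: an empty hook env falls back
// to the container's env, anything else is used as declared.
func effectiveEnv(h hook, containerEnv []string) []string {
	if len(h.Env) == 0 {
		return containerEnv
	}
	return h.Env
}

func main() {
	containerEnv := []string{"PATH=/usr/bin:/bin", "LD_PRELOAD=./poc.so"}
	hooks := []hook{
		{Path: "/usr/bin/nvidia-ctk", Args: []string{"nvidia-ctk", "hook", "update-ldcache"}},
	}

	fmt.Println("before:", effectiveEnv(hooks[0], containerEnv))

	pinned := pinEnv(hooks, "NVIDIA_CTK_DEBUG=false")
	fmt.Println("after: ", effectiveEnv(pinned[0], containerEnv))
}
```

Before pinning, the hook's effective environment is the container's, LD_PRELOAD included; after pinning, it contains only the placeholder entry.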

Final thoughts

Reading the comment in standard_init_linux.go about this behavior was surprising to me. Not because of the behavior itself (I understand that sometimes we need to compromise in the name of “backwards compatibility”), but because I think this is too much of a compromise, one that can lead to other vulnerabilities in the future.

I wonder if I could open a pull request to runc suggesting that this behavior be hidden behind a CLI flag or a configuration option. Let’s see.

This post is licensed under CC BY 4.0 by the author.