CVE-2025-23319

NVIDIA · NVIDIA Multiple Products

A high-severity vulnerability exists in the NVIDIA Triton Inference Server, affecting both Windows and Linux versions.

Executive summary

A high-severity vulnerability exists in the NVIDIA Triton Inference Server, affecting both Windows and Linux versions. An unauthenticated attacker can send a specially crafted network request to the server's Python backend and cause an out-of-bounds write. The resulting memory corruption could crash the server or allow the attacker to execute arbitrary code and take control of the affected system.

Vulnerability

The vulnerability lies within the Python backend component of the NVIDIA Triton Inference Server. By sending a specially crafted network request, a remote attacker can trigger an out-of-bounds write, a memory corruption flaw in which data is written outside the allocated buffer. An attacker can use this to overwrite critical application data, which may crash the server and cause a denial of service (DoS), or potentially achieve arbitrary code execution with the permissions of the Triton server process.

Business impact

This vulnerability is rated as High severity with a CVSS score of 8.1. Successful exploitation could lead to a complete compromise of the AI inference server. This could result in the theft of sensitive data or intellectual property being processed by AI models, disruption of critical AI-driven business operations leading to service outages, and significant reputational damage. A compromised server could also be used as a foothold for an attacker to move laterally within the corporate network, escalating the incident's overall impact.

Remediation

Immediate Action: The primary remediation is to apply the security updates provided by NVIDIA to all affected Triton Inference Servers immediately. Following the update, administrators should monitor for any signs of post-patch exploitation attempts and review historical access logs for unusual requests targeting the Python backend that may indicate a past compromise.
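Applying the update to "all affected" servers presupposes knowing which hosts are still behind. The following is a minimal sketch of that triage step, not an NVIDIA tool. It assumes you already have an inventory mapping host names to dotted numeric version strings, and that you take the fixed version from NVIDIA's security bulletin. The function names, the inventory shape, and any version values you pass in are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '2.58.0' into (2, 58, 0).

    Raises ValueError for non-numeric parts (for example a '-py3' suffix),
    so strip such suffixes before calling.
    """
    return tuple(int(part) for part in text.strip().split("."))


def is_older(installed: str, fixed: str) -> bool:
    """Return True if `installed` sorts strictly before `fixed`."""
    a, b = parse_version(installed), parse_version(fixed)
    # Pad the shorter tuple with zeros so that '2.58' equals '2.58.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def hosts_needing_patch(inventory: Dict[str, str], fixed_version: str) -> List[str]:
    """List hosts whose reported version is older than `fixed_version`.

    `fixed_version` is an input: read it from the vendor bulletin,
    it is deliberately not hard-coded here.
    """
    return sorted(
        host for host, version in inventory.items()
        if is_older(version, fixed_version)
    )
```

Comparing parsed integer tuples rather than raw strings avoids the common mistake of ranking "2.9" above "2.10".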

Proactive Monitoring: Implement enhanced logging on Triton servers to capture detailed request information. Security teams should monitor for malformed or unusually large requests, unexpected server crashes, and anomalous outbound network traffic. System-level monitoring should be configured to alert on unauthorized processes or command execution on the server.
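The monitoring guidance above can be prototyped with a small log filter. This is a sketch under stated assumptions: it presumes request logs are exported as JSON lines with integer `bytes` and `status` fields, which is a hypothetical format and not something Triton is known to emit by default. The size threshold is likewise a placeholder to tune against your largest legitimate inference payload.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical threshold; tune it to the largest legitimate inference payload.
MAX_BODY_BYTES = 10 * 1024 * 1024


def flag_suspicious(lines: Iterable[str],
                    max_bytes: int = MAX_BODY_BYTES) -> List[Dict]:
    """Return one finding per log line that is malformed, oversized, or a 5xx."""
    findings = []
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            findings.append({"line": number, "reason": "unparseable"})
            continue
        if not isinstance(entry, dict):
            # Valid JSON, but not the object shape this sketch assumes.
            findings.append({"line": number, "reason": "unparseable"})
            continue
        size = entry.get("bytes")
        status = entry.get("status")
        if isinstance(size, int) and size > max_bytes:
            findings.append({"line": number, "reason": "oversized"})
        elif isinstance(status, int) and status >= 500:
            # Server errors can accompany a crash or failed exploit attempt.
            findings.append({"line": number, "reason": "server_error"})
    return findings
```

In practice the findings would feed an alerting pipeline; the filter only narrows what an analyst has to review and does not by itself identify exploitation.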

Compensating Controls: If patching cannot be immediately deployed, implement network-level controls to mitigate risk. Restrict access to the Triton Inference Server to only trusted, authorized IP addresses and systems. Place the server behind a Web Application Firewall (WAF) or Intrusion Prevention System (IPS) with rulesets designed to detect and block memory corruption exploits and malformed requests.
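The allowlisting control described above is normally enforced in a firewall, security group, or reverse proxy rather than in application code. The sketch below only illustrates the decision logic, using Python's standard `ipaddress` module. The function names and the example networks are assumptions for illustration.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_allowlist(cidrs: Iterable[str]) -> List[Network]:
    """Parse CIDR strings such as '10.0.0.0/16' into network objects."""
    # strict=False tolerates host bits being set, e.g. '10.0.0.5/16'.
    return [ipaddress.ip_network(cidr, strict=False) for cidr in cidrs]


def is_allowed(client_ip: str, allowlist: List[Network]) -> bool:
    """Return True only if client_ip parses and falls inside an allowed network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # fail closed on anything that is not a valid IP
    return any(address in network for network in allowlist)


# Hypothetical trusted ranges; replace with your own inference clients.
TRUSTED = build_allowlist(["10.20.0.0/16", "192.168.50.0/24"])
```

Failing closed on unparseable addresses matters here: a default-deny posture is the point of the control, so malformed input should never be treated as trusted.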

Exploitation status

Public Exploit Available: false

Analyst recommendation

Given the high CVSS score of 8.1 and the potential for remote code execution, this vulnerability represents a significant risk to the organization. Although it is not currently listed in the CISA KEV catalog and no public exploit is known, the attack requires only an unauthenticated network request, which increases the likelihood of future attacks. We strongly recommend that all organizations prioritize applying NVIDIA's security updates to all vulnerable systems. If patching is delayed, implement compensating controls such as network segmentation and access control lists as a temporary measure until the updates are deployed.