CVE-2025-23328
NVIDIA · NVIDIA Triton Inference Server for Windows and Linux
Executive summary
A high-severity vulnerability has been identified in the NVIDIA Triton Inference Server, affecting both Windows and Linux versions. An attacker can send a specially crafted input to the server, causing it to write data outside of its intended memory buffer, which could lead to a system crash or allow the attacker to execute arbitrary code and take control of the affected system.
Vulnerability
The vulnerability is an out-of-bounds write. This occurs when the software attempts to write data to a memory location that is outside the boundaries of the allocated buffer. An unauthenticated remote attacker can trigger this flaw by sending a specially crafted input request to the Triton Inference Server, which fails to properly validate the input size, leading to memory corruption. Successful exploitation could result in a denial-of-service (DoS) condition by crashing the server process or could be leveraged to achieve arbitrary code execution with the permissions of the server process.
Business impact
This vulnerability is rated as High severity with a CVSS score of 7.5. Successful exploitation could have a significant business impact by compromising the integrity, availability, and confidentiality of AI/ML services. If an attacker achieves arbitrary code execution, they could gain control of the inference server, potentially stealing sensitive proprietary models, accessing the data being processed, or using the compromised server as a pivot point to attack other systems on the network. A denial-of-service attack would disrupt critical business applications that rely on the inference server, leading to service outages, reputational damage, and financial loss.
Remediation
Immediate Action: Apply the security updates released by NVIDIA to all affected Triton Inference Servers without delay. After patching, monitor systems for signs of attempted exploitation and review server access and application logs for anomalous activity that may have occurred prior to patch deployment.
Proactive Monitoring: Implement enhanced monitoring on affected servers. Look for server process crashes, unexpected resource consumption (CPU/memory spikes), or error logs indicating memory corruption. Network monitoring should be configured to detect and alert on unusually large or malformed requests sent to the inference server's API endpoints.
Compensating Controls: If immediate patching is not feasible, implement the following controls to reduce risk:
- Place the Triton Inference Server behind a Web Application Firewall (WAF) or an Intrusion Prevention System (IPS) with rules to inspect and block malformed or malicious inputs.
- Restrict network access to the server, ensuring it is only accessible from trusted, authorized application front-ends and not directly exposed to the internet.
- Run the server process with the lowest possible user privileges to limit the impact of a potential code execution exploit.
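The last two controls (network restriction and least privilege) can be sketched as a systemd unit override. The unit name and values are illustrative assumptions, not an NVIDIA-supplied configuration; IPAddressAllow/IPAddressDeny require a systemd version with cgroup v2 BPF support.

```ini
# Hypothetical override, e.g. /etc/systemd/system/triton.service.d/hardening.conf
[Service]
User=triton                  # unprivileged service account, not root
NoNewPrivileges=true         # block privilege escalation via setuid binaries
ProtectSystem=strict         # mount most of the filesystem read-only
IPAddressDeny=any            # default-deny network peers
IPAddressAllow=10.0.0.0/24   # only the trusted front-end subnet (example range)
```

These settings limit both who can reach the server and what a compromised server process can do, which directly reduces the blast radius of a code-execution exploit.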
Exploitation status
Public exploit available: No
Analyst recommendation
Due to the high severity rating (CVSS 7.5) and the potential for remote code execution, this vulnerability poses a significant risk to the organization. We strongly recommend that all system owners identify affected NVIDIA Triton Inference Servers and apply the vendor-supplied security updates on an emergency basis. While there is no current evidence of active exploitation, the risk of compromise is substantial. Organizations should prioritize patching and implement the suggested monitoring and compensating controls to protect critical AI/ML infrastructure.