CVE-2025-23321
NVIDIA · NVIDIA Triton Inference Server for Windows and Linux
Executive summary
A high-severity vulnerability has been identified in the NVIDIA Triton Inference Server, affecting both Windows and Linux versions. An unauthenticated attacker can send a specially crafted request to the server, causing a "divide by zero" error that results in a denial of service, crashing the application and disrupting critical AI/ML operations. Organizations using the affected software should prioritize immediate patching to prevent service outages.
Vulnerability
This vulnerability is a flaw in the input validation logic of the NVIDIA Triton Inference Server. An unauthenticated, remote attacker can exploit it by sending a specially crafted request containing a value that the server uses as a divisor in a mathematical operation. Because the server fails to sanitize this input and check for a zero value before the division occurs, the request triggers an unhandled exception, causing the server process to terminate abruptly and resulting in a Denial of Service (DoS) condition.
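To illustrate the class of bug, the sketch below shows the missing check in a generic request handler. This is not Triton's actual code; the parameter names (`value`, `divisor`) and the handler itself are hypothetical, standing in for whatever request field ultimately reaches the division.

```python
def validate_divisor(params: dict) -> int:
    """Reject any request whose divisor field would trigger a divide-by-zero.

    This is the sanitization step whose absence defines this class of flaw:
    attacker-controlled input reaches an arithmetic operation unchecked.
    """
    divisor = params.get("divisor")
    if not isinstance(divisor, int) or divisor == 0:
        raise ValueError("invalid divisor: must be a nonzero integer")
    return divisor


def handle_request(params: dict) -> float:
    # With validation, a malicious request is rejected with a controlled
    # error; without it, the division raises an unhandled exception and
    # the process crashes (denial of service).
    divisor = validate_divisor(params)
    return params["value"] / divisor
```

A vulnerable handler is identical except that it divides without calling `validate_divisor`, so a request with `divisor: 0` takes the process down instead of returning an error.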
Business impact
This vulnerability is rated High severity with a CVSS score of 7.5. The primary business impact is the loss of availability for critical services that rely on the Triton Inference Server for AI and machine learning model inference. Successful exploitation would render these services inoperable, potentially halting production workflows, disrupting customer-facing applications, and interrupting data analysis pipelines. This can lead to direct financial loss, reputational damage, and a loss of confidence in the organization's services. The low complexity of the attack means that even a low-skilled attacker could disrupt key business functions.
Remediation
Immediate Action: Organizations must apply the security updates provided by NVIDIA immediately to all affected Triton Inference Server instances. After patching, system administrators should verify that the service is running correctly. It is also critical to monitor for any signs of exploitation attempts by reviewing server and application access logs for anomalous requests or crash events.
Proactive Monitoring: Implement monitoring to detect potential exploitation attempts. This includes configuring alerts for unexpected server crashes or restarts, monitoring application logs for arithmetic exception errors or stack traces, and analyzing network traffic for malformed or unusual requests targeting the Triton Inference Server's API endpoints.
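A minimal log-scanning sketch of the monitoring described above is shown here. The crash signatures are illustrative assumptions, common strings that accompany unhandled arithmetic exceptions on Linux, not patterns confirmed from Triton's logs; tune them against crash output observed in your own environment.

```python
import re

# Hypothetical signatures of an unhandled arithmetic exception; adjust to
# match the crash output actually seen in your environment.
CRASH_PATTERNS = [
    re.compile(r"floating point exception", re.IGNORECASE),
    re.compile(r"SIGFPE"),
    re.compile(r"division or modulo by zero", re.IGNORECASE),
]


def scan_log_lines(lines: list[str]) -> list[str]:
    """Return log lines that look like arithmetic-exception crash evidence,
    suitable for feeding into an alerting pipeline."""
    return [ln for ln in lines if any(p.search(ln) for p in CRASH_PATTERNS)]
```

In practice this logic would live in a log shipper or SIEM rule rather than a standalone script, paired with alerts on unexpected process restarts.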
Compensating Controls: If immediate patching is not feasible, implement compensating controls to reduce the risk. Restrict network access to the Triton Inference Server to only trusted, internal systems. If the server must be exposed, place it behind a Web Application Firewall (WAF) or an Intrusion Prevention System (IPS) with rules configured to inspect and block malformed requests that could trigger the vulnerability.
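The source-address restriction above is normally enforced at a firewall or reverse proxy, but the underlying check is simple, as this sketch shows. The trusted ranges are placeholder assumptions; substitute your own internal CIDRs.

```python
import ipaddress

# Hypothetical allowlist of trusted internal ranges -- replace with the
# CIDRs of the systems that legitimately call your Triton endpoints.
TRUSTED_NETS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "192.168.0.0/16")]


def is_trusted(client_ip: str) -> bool:
    """Return True if the client address falls inside a trusted range --
    the source filtering a firewall, WAF, or reverse proxy should enforce
    in front of an unpatched Triton instance."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in TRUSTED_NETS)
```

Enforcing this at the network edge (rather than in application code) means malformed requests from untrusted sources never reach the vulnerable code path at all.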
Exploitation status
Public exploit available: No
Analyst recommendation
Given the high severity score (CVSS 7.5) and the critical role of the Triton Inference Server in AI/ML infrastructure, this vulnerability poses a significant risk of service disruption. Although it is not currently listed on the CISA KEV list, its potential impact warrants immediate action. We strongly recommend that all organizations prioritize applying the vendor-supplied patches to all vulnerable systems without delay. In parallel, implement the proactive monitoring and compensating controls detailed above to create a defense-in-depth security posture and mitigate risk for systems awaiting a maintenance window.