CVE-2025-23322

NVIDIA · NVIDIA Multiple Products

Executive summary

A high-severity vulnerability has been identified in the NVIDIA Triton Inference Server, a key component for serving AI and machine learning models. An attacker can exploit the flaw by sending multiple requests and then cancelling them, which can crash the server or potentially allow the attacker to run unauthorized code. This poses a significant risk of service disruption to critical applications and could lead to a complete system compromise.

Vulnerability

The vulnerability is a double-free memory corruption error within the NVIDIA Triton Inference Server. An unauthenticated remote attacker can trigger it by sending multiple requests to the server and then cancelling them within a specific timing window, before the server has fully processed them. This race condition causes the application to release the same memory block twice, corrupting memory. An attacker can use the corruption to crash the server, causing a denial of service (DoS), or, in a more advanced attack, potentially achieve arbitrary code execution (ACE) with the privileges of the server process.
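
The bug class described above can be illustrated with a toy model. This is a minimal sketch, not Triton's code: ToyAllocator, StreamRequest, and run are hypothetical names, and the two release calls are run back to back in a single thread to stand in for the cancellation and completion paths racing on the same request.

```python
class ToyAllocator:
    """Toy stand-in for a native heap; tracks which block ids are live."""

    def __init__(self):
        self._live = set()
        self._next_id = 0

    def alloc(self):
        block = self._next_id
        self._next_id += 1
        self._live.add(block)
        return block

    def free(self, block):
        if block not in self._live:
            # A real C/C++ heap would not raise; it would corrupt its metadata.
            raise RuntimeError(f"double free of block {block}")
        self._live.remove(block)


class StreamRequest:
    """Hypothetical request whose buffer is released on completion or on cancel."""

    def __init__(self, heap):
        self.heap = heap
        self.buffer = heap.alloc()
        self.released = False

    def release_unsafe(self):
        # Flawed pattern: neither path checks whether the other already freed.
        self.heap.free(self.buffer)

    def release_safe(self):
        # Guarded pattern: only the first caller frees. Real concurrent code
        # would need a lock or an atomic flag around this check.
        if not self.released:
            self.released = True
            self.heap.free(self.buffer)


def run(release_name):
    """Invoke the chosen release method twice, as the two racing paths would."""
    heap = ToyAllocator()
    req = StreamRequest(heap)
    release = getattr(req, release_name)
    try:
        release()  # cancellation path
        release()  # completion path fires for the same request
        return "ok"
    except RuntimeError as exc:
        return str(exc)
```

Calling run("release_unsafe") reports the second free, while run("release_safe") completes cleanly because ownership of the buffer is checked before it is released.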

Business impact

This vulnerability is rated as High severity with a CVSS score of 7.5. Exploitation could have a significant business impact by disrupting services that rely on AI/ML models served by the Triton Inference Server. A denial-of-service attack would render these applications unavailable, halting business processes and potentially causing financial loss. If an attacker achieves arbitrary code execution, they could gain control of the underlying server, leading to sensitive data exfiltration, lateral movement across the network, or the deployment of ransomware, posing a critical risk to the organization's data integrity and security posture.

Remediation

Immediate Action: Apply the security updates released by NVIDIA to all affected Triton Inference Server instances without delay. In parallel, security teams should monitor for signs of exploitation attempts and review system and application access logs for anomalous activity, paying particular attention to the period before the patch was deployed.

Proactive Monitoring: Security teams should monitor for an unusual volume of cancelled requests in Triton server logs, unexpected server process crashes or restarts, and abnormal memory consumption spikes on the host system. On the network level, monitor for rapid, repeated connection and cancellation patterns originating from a single source IP address.
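
As one way to act on the cancellation-volume signal, the following minimal sketch counts cancellations per source address over a sliding time window. It assumes log records have already been normalized into (timestamp_seconds, source_ip, kind) tuples sorted by time; that record shape, the function name flag_cancel_bursts, and the default window and threshold are illustrative assumptions, not a Triton log format or a recommended setting.

```python
from collections import defaultdict, deque


def flag_cancel_bursts(events, window_s=10.0, threshold=20):
    """Return source IPs with more than `threshold` cancels in any `window_s` span.

    events: iterable of (timestamp_seconds, source_ip, kind), sorted by timestamp.
    The tuple layout is an assumed, pre-normalized form of the server logs.
    """
    recent = defaultdict(deque)  # source_ip -> timestamps of recent cancels
    flagged = set()
    for ts, src, kind in events:
        if kind != "cancel":
            continue
        window = recent[src]
        window.append(ts)
        # Drop cancels that have aged out of the sliding window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) > threshold:
            flagged.add(src)
    return flagged
```

The threshold and window should be tuned against normal traffic, since legitimate clients also cancel requests (for example on timeouts).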

Compensating Controls: If patching cannot be performed immediately, implement the following controls:

  • Restrict network access to the Triton Inference Server to only trusted, internal application sources.
  • Implement strict rate-limiting on incoming requests to make it harder for an attacker to send the rapid bursts of requests and cancellations needed to trigger the race condition.
  • Place the server behind a Web Application Firewall (WAF) or reverse proxy capable of detecting and blocking suspicious request patterns, such as rapid request-and-cancel sequences.
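
The rate-limiting control can be sketched as a token bucket placed in front of the server. This is a generic illustration under stated assumptions, not a Triton or WAF feature: the class name TokenBucket, the rate and capacity values, and the injectable clock parameter are all hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True and consume a token if the request may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice a proxy would keep one bucket per client (for example, in a dictionary keyed by source IP) and reject or delay requests when allow() returns False. Rate limiting narrows the attacker's opportunity but does not remove the underlying flaw, so it is a stopgap until the patch is applied.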

Exploitation status

Public Exploit Available: false

Analyst recommendation

Given the high severity score (CVSS 7.5) and the potential for both service disruption and complete system compromise, it is strongly recommended that organizations prioritize the immediate deployment of the vendor-supplied security patches. The affected software is often a core component of critical business infrastructure, which increases the urgency. Although this vulnerability is not currently listed in the CISA Known Exploited Vulnerabilities (KEV) catalog, its potential impact warrants immediate and decisive action to mitigate risk.