CVE-2025-23325
NVIDIA · Multiple Products
Executive summary
A high-severity vulnerability has been identified in the NVIDIA Triton Inference Server, affecting both Windows and Linux versions. An unauthenticated attacker can send a specially crafted request over the network, causing the server to crash and become unavailable. This presents a significant risk of Denial of Service (DoS), which could disrupt critical AI and machine learning operations that rely on this platform.
Vulnerability
The vulnerability exists in the input processing module of the NVIDIA Triton Inference Server. An attacker can send a malformed request to the server's API endpoint. The crafted input triggers a function that calls itself recursively without a proper termination condition, a weakness known as uncontrolled recursion (CWE-674). The recursion rapidly consumes all available stack memory, causing a stack overflow that immediately terminates the server process and results in a Denial of Service.
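The failure mode can be illustrated with a short, hypothetical sketch. The parser and payload below are illustrative only, not Triton's actual code: a recursive handler with no depth guard exhausts the stack when given deeply nested input.

```python
def parse_nested(payload, depth=0):
    # Hypothetical recursive parser with no depth guard: every nesting
    # level of the input consumes one more stack frame.
    if isinstance(payload, list) and payload:
        return parse_nested(payload[0], depth + 1)
    return depth

# Build an input nested far deeper than the interpreter's stack allows,
# analogous to a crafted request with pathological nesting.
bomb = []
node = bomb
for _ in range(100_000):
    node.append([])
    node = node[0]

result = None
try:
    parse_nested(bomb)
except RecursionError:
    # Python raises a catchable RecursionError; an unguarded C++ service
    # would instead overflow the native stack and crash the whole process.
    result = "stack exhausted"
print(result)
```

The fix pattern is the inverse: enforce an explicit depth limit (or parse iteratively) so that malicious nesting is rejected before the stack is consumed.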
Business impact
This vulnerability is rated as High severity with a CVSS score of 7.5. Successful exploitation results in a complete Denial of Service (DoS) of the Triton Inference Server. For any organization leveraging this server for real-time AI/ML model inference, the impact is direct and severe. This can lead to significant operational disruption, outage of customer-facing applications, failure of internal automated processes, and potential financial loss associated with service downtime. The primary risk is to service availability and business continuity.
Remediation
Immediate Action: Apply the security updates released by NVIDIA to patch the vulnerability across all affected systems. Concurrently, security teams should monitor for signs of exploitation attempts by closely reviewing server access logs for anomalous or malformed requests targeting the inference server.
Proactive Monitoring:
- Log Analysis: Monitor Triton server logs and system-level event logs (Windows Event Viewer, Linux syslog) for crash reports, stack overflow errors, or unexpected process terminations.
- Network Traffic Analysis: Inspect network traffic for unusual patterns or malformed API requests directed at the Triton server. A sudden increase in failed requests from a specific source IP could indicate an attack attempt.
- System Performance: Utilize system monitoring tools to alert on sudden restarts of the Triton server process or spikes in CPU and memory usage that precede a crash, as these can be indicators of an ongoing attack.
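The log-analysis step above can be sketched as a simple pattern scan. The log lines and patterns here are hypothetical placeholders; real Triton and syslog formats will differ and should be adapted to your environment.

```python
import re

# Hypothetical sample log lines; actual Triton/syslog output will differ.
LOG_LINES = [
    "Jan 10 12:00:01 host tritonserver[412]: request handled OK",
    "Jan 10 12:00:02 host kernel: tritonserver[412]: segfault at 00007f3a",
    "Jan 10 12:00:03 host systemd[1]: triton.service: Main process exited",
]

# Indicators of a crash or unexpected process termination worth alerting on.
CRASH_PATTERNS = re.compile(
    r"segfault|stack overflow|core dumped|Main process exited",
    re.IGNORECASE,
)

alerts = [line for line in LOG_LINES if CRASH_PATTERNS.search(line)]
for line in alerts:
    print("ALERT:", line)
```

In practice this logic would run inside an existing log pipeline (e.g. a SIEM rule) rather than a standalone script, with alerts correlated against source IPs from the access logs.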
Compensating Controls:
- Access Control Lists (ACLs): If patching is delayed, restrict network access to the Triton Inference Server to only trusted, authorized IP addresses and subnets.
- Web Application Firewall (WAF): Deploy a WAF in front of the server to inspect and filter malicious requests. Custom rules may be required to block the specific attack pattern once it is better understood.
- Rate Limiting: Implement rate limiting on the server's API endpoints to slow down and mitigate automated attack attempts from a single source.
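As a sketch of the rate-limiting control, a token-bucket limiter can be placed in front of the inference endpoint, one bucket per source IP. The class and parameters below are illustrative assumptions, not part of Triton; production deployments would normally use a reverse proxy's built-in rate limiting instead.

```python
import time

class TokenBucket:
    """Minimal per-source limiter: admits `rate` requests/second with a
    burst allowance of `capacity` requests."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per client IP in front of the inference API (illustrative values).
bucket = TokenBucket(rate=5.0, capacity=10)
decisions = [bucket.allow() for _ in range(20)]  # rapid burst of 20 requests
print(decisions.count(True))  # roughly the burst capacity is admitted
```

A tight burst mostly drains the bucket, so excess requests from a single source are rejected rather than reaching the vulnerable input-processing path.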
Exploitation status
Public Exploit Available: False
Analyst recommendation
This High severity vulnerability poses a direct threat to the availability of critical AI/ML services. Given the potential for significant operational disruption, organizations must treat this as a high-priority issue. We strongly recommend that all system owners identify affected Triton Inference Servers and apply the vendor-supplied patches immediately. Although this CVE is not currently on the CISA Known Exploited Vulnerabilities (KEV) list, the risk of service interruption is substantial. Until patches can be fully deployed, organizations should implement compensating controls such as network segmentation, rate limiting, and enhanced monitoring to mitigate risk.