CVE-2025-23311

NVIDIA · NVIDIA Triton Inference Server (Note: This component may be bundled within other NVIDIA software suites and products).

Executive summary

A critical vulnerability has been identified in the NVIDIA Triton Inference Server, a platform used for deploying artificial intelligence models. An unauthenticated attacker can send a specially crafted network request to trigger a stack-based buffer overflow, which could allow them to execute arbitrary code and take full control of the affected server. This vulnerability poses a severe risk of data breach, system compromise, and disruption of critical AI-powered services.

Vulnerability

The vulnerability is a stack-based buffer overflow within the HTTP request handling component of the NVIDIA Triton Inference Server. An unauthenticated remote attacker can exploit it by sending a crafted HTTP request containing an overly long value in a specific field (e.g., a header or parameter). The oversized input overflows a fixed-size buffer, overwriting the program's call stack and potentially allowing the attacker to hijack execution flow and run arbitrary code with the permissions of the Triton server process, or, at minimum, crash the server and cause a denial of service.
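To illustrate the class of bug: overflows like this typically stem from copying attacker-controlled input into a fixed-size buffer without first checking its length. The following minimal Python sketch shows the kind of bounds check whose absence enables such flaws; the limit and function name are illustrative, not Triton's actual implementation.

```python
MAX_HEADER_VALUE_LEN = 8192  # illustrative bound; not Triton's actual limit

def headers_within_bounds(headers):
    """Return True only if every header value fits the fixed bound.

    Stack-based buffer overflows of this class arise when input of
    unbounded length is copied into a fixed-size buffer without a
    length check like this one.
    """
    return all(len(value) <= MAX_HEADER_VALUE_LEN for value in headers.values())
```

A native-code server would perform the equivalent check before copying the header value into any stack-allocated buffer.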

Business impact

This vulnerability is rated as critical severity with a CVSS score of 9.8. A successful exploit could lead to complete system compromise, allowing an attacker to execute remote code on the server hosting the Triton Inference Server. This poses a significant business risk, including the theft of proprietary AI/ML models and sensitive training data, unauthorized access to internal networks, and the potential for deploying ransomware. A denial-of-service attack would disrupt critical AI-driven applications, leading to operational downtime, reputational damage, and financial loss.

Remediation

Immediate Action: Organizations must immediately apply the security updates provided by NVIDIA. Prioritize patching internet-facing Triton Inference Servers and then internal instances to mitigate the risk of exploitation.

Proactive Monitoring: Security teams should actively monitor for signs of exploitation. Review web server and application logs for unusually long or malformed HTTP requests directed at the Triton server. Monitor system behavior for unexpected crashes or restarts of the Triton server process and look for anomalous outbound network connections from the server, which could indicate a successful compromise.
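The log review above can be partly automated. The sketch below scans access-log lines in common/combined log format and flags any whose quoted request field is unusually long, a coarse indicator of overflow probing; the threshold is an assumption to be tuned per environment.

```python
import re

MAX_REQUEST_LEN = 4096  # illustrative threshold; tune to your environment

# Matches the quoted request field of common/combined log format,
# e.g. "POST /v2/models/example/infer HTTP/1.1"
_REQUEST_RE = re.compile(r'"([^"]*)"')

def flag_long_requests(log_lines, limit=MAX_REQUEST_LEN):
    """Yield log lines whose quoted request field exceeds `limit` characters."""
    for line in log_lines:
        match = _REQUEST_RE.search(line)
        if match and len(match.group(1)) > limit:
            yield line
```

Flagged lines warrant correlation with Triton process crashes or restarts around the same timestamps.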

Compensating Controls: If immediate patching is not feasible, implement a Web Application Firewall (WAF) with rules to inspect and block malformed HTTP requests or those with excessively long headers/parameters. Additionally, restrict network access to the Triton Inference Server's HTTP port, allowing connections only from trusted, authorized IP addresses to reduce the attack surface.
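As one illustrative way to combine both controls, an nginx reverse proxy in front of Triton can cap header sizes and restrict access to trusted networks. This is a hedged sketch: the listen port, upstream address (Triton's default HTTP port 8000), and allowed network are assumptions to adapt to your deployment.

```nginx
# Illustrative nginx hardening fragment in front of Triton (HTTP port 8000 assumed).
server {
    listen 8080;

    # Tighten header buffers so oversized headers are rejected early.
    large_client_header_buffers 2 4k;
    client_max_body_size 10m;

    location / {
        # Allow only a trusted internal network; deny everything else.
        allow 10.0.0.0/8;
        deny  all;

        proxy_pass http://127.0.0.1:8000;
    }
}
```

A network firewall rule restricting inbound traffic to the same trusted ranges provides equivalent coverage where no proxy is in place.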

Exploitation status

Public Exploit Available: False

Analyst recommendation

Given the critical severity (CVSS 9.8) and the high potential for remote code execution, this vulnerability requires immediate attention. We strongly recommend that all organizations using the NVIDIA Triton Inference Server apply the vendor-supplied patches without delay, prioritizing internet-exposed systems. Although not yet listed in the CISA Known Exploited Vulnerabilities (KEV) catalog, its severity makes it a prime target for exploitation. If patching is delayed, the compensating controls outlined above should be implemented as an urgent temporary measure.