CVE-2025-23323

NVIDIA · NVIDIA Triton Inference Server for Windows and Linux

Executive summary

A high-severity vulnerability has been identified in the NVIDIA Triton Inference Server, a platform used for deploying AI models. An unauthenticated attacker can send a specially crafted request to the server, causing an integer overflow that leads to a server crash. This results in a denial-of-service (DoS) condition, rendering AI-powered applications and services that depend on the server unavailable.

Vulnerability

The vulnerability is an integer overflow within the NVIDIA Triton Inference Server. An attacker can exploit this by sending a request containing an invalid, maliciously large numerical value for a specific parameter. When the server processes this value, the integer data type cannot accommodate the large number, causing it to "wrap around" to a small or negative number. This incorrect value is then used in subsequent operations, such as memory allocation, leading to a memory access error and a segmentation fault, which terminates the server process and causes a denial of service.
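The wrap-around described above can be illustrated in miniature. The sketch below is not Triton's actual code; it simulates C-style truncation of an arbitrary integer to a signed 32-bit value, showing how a maliciously large parameter (e.g., a tensor dimension) can silently become a small number that is then trusted by downstream logic such as an allocation size.

```python
INT32_MAX = 2**31 - 1

def to_int32(n: int) -> int:
    """Simulate C-style truncation of an arbitrary integer to signed 32-bit."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n > INT32_MAX else n

# A maliciously large parameter value, far beyond what an int32 can hold:
requested = 2**32 + 8
wrapped = to_int32(requested)   # wraps around to 8

# A value just past INT32_MAX wraps to a negative number instead:
negative = to_int32(2**31)      # becomes -2147483648
```

If the wrapped value feeds a buffer-size calculation, the server may allocate far less memory than the request implies (or pass a negative size), and subsequent reads or writes fault with a segmentation violation, killing the process.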

Business impact

This vulnerability is rated as High severity with a CVSS score of 7.5. The primary business impact is a Denial of Service (DoS). Successful exploitation would crash the Triton Inference Server, disrupting all AI/ML models it serves. This can halt critical business functions that rely on real-time inference, such as recommendation engines, fraud detection systems, or automated customer support, leading to potential revenue loss, operational downtime, and damage to the organization's reputation. The risk is significant for any organization leveraging NVIDIA Triton for production AI workloads.

Remediation

Immediate Action: The primary remediation is to apply the security updates provided by NVIDIA to the affected Triton Inference Server instances immediately. After patching, system administrators should monitor server logs and performance to ensure the patch has been applied successfully and has not introduced any instability.

Proactive Monitoring: Organizations should configure monitoring to detect potential exploitation attempts. This includes setting up alerts for unexpected crashes or restarts of the Triton server process. Review server access logs for malformed requests, particularly those containing unusually large numerical values in their parameters. Network traffic should be monitored for patterns indicative of a DoS attack, such as a high volume of invalid requests from a single source.
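The log-review step above can be approximated with a simple scan. This is a hedged sketch, not a production detector: the threshold, the log format, and the assumption that suspicious parameters appear as long decimal literals are all illustrative.

```python
import re

# Values at or above the signed 32-bit range are candidates for wrap-around abuse.
SUSPICIOUS_THRESHOLD = 2**31

def flag_requests(log_lines):
    """Return log lines containing numeric literals large enough to overflow int32."""
    flagged = []
    for line in log_lines:
        # Only numbers with 10+ digits can reach the int32 boundary.
        for num in re.findall(r"\d{10,}", line):
            if int(num) >= SUSPICIOUS_THRESHOLD:
                flagged.append(line)
                break
    return flagged
```

Feeding access logs through a filter like this (or an equivalent SIEM rule) gives an early signal of probing, even before a crash occurs.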

Compensating Controls: If patching cannot be performed immediately, implement compensating controls to reduce the risk. Place the Triton Inference Server behind a Web Application Firewall (WAF) or an API gateway configured to inspect and block requests with out-of-range or malicious parameter values. Additionally, restrict network access to the server, allowing connections only from trusted and authorized client systems to limit the attack surface.
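A WAF or API-gateway rule of the kind described above amounts to range-checking request parameters before they reach Triton. The following is a minimal sketch of that idea; the ceiling and the notion of validating a "shape" list are assumptions for illustration, not Triton's actual request schema.

```python
# Assumed sane ceiling for any single dimension; tune to your models' real needs.
MAX_DIM = 2**31 - 1

def shape_is_valid(shape):
    """Reject empty, non-integer, zero, negative, or out-of-range dimensions
    before forwarding the request to the inference server."""
    return bool(shape) and all(
        isinstance(d, int) and 0 < d <= MAX_DIM for d in shape
    )
```

Pairing a check like this with network-level allow-listing keeps malformed requests away from the vulnerable parser entirely, buying time until the patch can be applied.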

Exploitation status

Public Exploit Available: False

Analyst recommendation

Given the high CVSS score of 7.5 and the critical role of the Triton Inference Server in AI/ML operations, we strongly recommend that all organizations using this software treat this vulnerability with high urgency. The primary course of action is to apply the vendor-provided security updates across all affected systems without delay. Although this CVE is not currently on the CISA KEV list, its potential for causing significant service disruption warrants immediate attention. Implementing proactive monitoring and compensating controls like a WAF will provide an essential layer of defense-in-depth against potential exploitation attempts.