Verified Leadership: Analyzing Microsoft Azure ND GB200 v6 VM Inference Performance

As the focus of generative AI for enterprises moves from experimental deployments to production inference deployments, efficiency is critical to achieving business value. In this environment, infrastructure decisions require rigorous, transparent, and comparable performance data.
Against that backdrop, the latest MLPerf Inference v5.1 results provide a vendor-neutral view of how platforms perform under standardized workloads. In this paper, we focus on the Microsoft Azure ND GB200 v6 virtual machines (VMs) accelerated by the NVIDIA GB200 NVL72. We analyze Azure’s MLPerf results on the Llama 2 70B and Llama 3.1 405B benchmarks, explain why these tests matter for real deployments, and translate raw data into actionable insights.
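One common way to translate raw MLPerf throughput into a comparable figure is to normalize it per accelerator, since submitted systems differ in GPU count. The short Python sketch below illustrates that normalization; the system names and throughput numbers are placeholders for illustration only, not actual MLPerf Inference v5.1 submissions.

```python
# Illustrative sketch only: normalize MLPerf-style Offline throughput to a
# per-GPU figure so systems with different accelerator counts can be compared.
# System names and numbers below are hypothetical placeholders, not real
# MLPerf Inference v5.1 results.

results = [
    # (system, accelerator_count, offline_tokens_per_second)
    ("example-system-a", 72, 1_000_000),  # hypothetical values
    ("example-system-b", 8, 120_000),     # hypothetical values
]

for system, gpu_count, tokens_per_sec in results:
    per_gpu = tokens_per_sec / gpu_count
    print(f"{system}: {per_gpu:,.0f} tokens/s per GPU")
```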
Microsoft Azure is a leading provider of AI infrastructure in the public cloud. The Azure ND GB200 v6 is a generally available accelerated instance designed with a rack-scale NVIDIA NVLink fabric and a platform software stack that targets both training and inference.

This report analyzes the latest MLPerf Inference v5.1 results, along with relevant v5.0 results, focusing on the performance of Microsoft Azure’s ND GB200 v6 virtual machines, which are accelerated by the NVIDIA GB200 NVL72. The analysis reveals a clear and verifiable leadership position for the Azure platform.

Research commissioned by: