Improving AI Inference with AMD EPYC Host CPUs
Mitch Lewis
AI workloads typically leverage GPU-based computation, and as a result, performance considerations for AI are often heavily focused on GPUs. While GPUs are often the primary driver of AI performance, other infrastructure components can become bottlenecks. In particular, the host CPU plays a critical role in AI inference, even when GPUs handle the bulk of the workload. In this context, the host CPU refers to the CPU inside the GPU server that is responsible for request handling, scheduling, and data movement between the application layer and the GPUs.
This report explores the important role of the host CPU and evaluates its impact on AI inference performance. To isolate that impact, Signal65 conducted hands-on AI inference testing on two nearly identical systems. Both systems were configured with the same GPUs and technical specifications; the key differentiator was the CPU: one server was configured with AMD EPYC CPUs and the other with Intel Xeon CPUs.
Key Highlights (AMD EPYC vs. Intel Xeon)
- Up to 14.64% greater request throughput
- Up to 14.38% greater output throughput
- Up to 46.54% faster time to first token
- Up to 11.46% lower latency
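To make the metrics above concrete, the following is a minimal sketch of how time to first token (TTFT), end-to-end latency, and output throughput can be measured for a single streamed inference response. The function names and the simulated token stream are hypothetical illustrations, not part of Signal65's actual test harness.

```python
import time

def measure_inference_metrics(token_stream):
    """Measure TTFT, end-to-end latency, and output throughput
    (tokens/sec) for one streamed response. `token_stream` is any
    iterable that yields output tokens as they are generated."""
    start = time.perf_counter()
    ttft = None
    n_tokens = 0
    for _ in token_stream:
        now = time.perf_counter()
        if ttft is None:
            # Latency until the first token arrives back at the client
            ttft = now - start
        n_tokens += 1
    latency = time.perf_counter() - start
    throughput = n_tokens / latency if latency > 0 else 0.0
    return {"ttft_s": ttft, "latency_s": latency, "output_tok_per_s": throughput}

# Hypothetical stand-in for a model: yields tokens with a fixed delay
def fake_model(num_tokens=5, delay_s=0.01):
    for i in range(num_tokens):
        time.sleep(delay_s)
        yield f"token{i}"

metrics = measure_inference_metrics(fake_model())
```

Request throughput, the remaining highlight metric, is the aggregate of many such requests completed per second across the whole system; the per-request measurements above are its building blocks.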


