Improving AI Inference with AMD EPYC Host CPUs
Mitch Lewis
AI workloads typically leverage GPU-based computation, and as a result, performance considerations for AI are often heavily focused on GPUs. While GPUs are often the primary driver of AI performance, other infrastructure components can become bottlenecks. In particular, the host CPU plays a critical role in AI inference, even when GPUs handle the bulk of the workload. In this context, the host CPU refers to the CPU inside the GPU server that is responsible for request handling, scheduling, and data movement between the application layer and the GPUs.
This report explores the important role of the host CPU and evaluates its impact on AI inference performance. To isolate that impact, Signal65 conducted hands-on AI inference testing on two nearly identical systems. Both systems were configured with the same GPUs and technical specifications; the key differentiator was the CPU: one server was configured with AMD EPYC CPUs and the other with Intel Xeon CPUs.
Key Highlights (AMD EPYC vs. Intel Xeon)
- Up to 14.64% greater request throughput
- Up to 14.38% greater output throughput
- Up to 46.54% faster time to first token
- Up to 11.46% lower latency
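To make the metrics above concrete, the following is a minimal sketch of how time to first token (TTFT), end-to-end latency, and output throughput can be measured for a single streamed inference response. The function names and the simulated token stream are hypothetical illustrations, not part of Signal65's actual test harness.

```python
import time

def measure_inference_metrics(token_stream):
    """Measure TTFT, end-to-end latency, and output throughput
    (tokens/sec) for one streamed response. `token_stream` is any
    iterable that yields output tokens as they are generated."""
    start = time.perf_counter()
    ttft = None
    n_tokens = 0
    for _ in token_stream:
        now = time.perf_counter()
        if ttft is None:
            # Latency until the first token arrives back at the client
            ttft = now - start
        n_tokens += 1
    latency = time.perf_counter() - start
    throughput = n_tokens / latency if latency > 0 else 0.0
    return {"ttft_s": ttft, "latency_s": latency, "output_tok_per_s": throughput}

# Hypothetical stand-in for a model: yields tokens with a fixed delay
def fake_model(num_tokens=5, delay_s=0.01):
    for i in range(num_tokens):
        time.sleep(delay_s)
        yield f"token{i}"

metrics = measure_inference_metrics(fake_model())
```

Request throughput, the remaining highlight metric, is the aggregate of many such requests completed per second across the whole system; the per-request measurements above are its building blocks.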


