Dell POC for Scalable and Heterogeneous Gen-AI Platform

Russ Fellows
April 15, 2024

Currently, most AI offerings are highly customized and designed to operate with specific hardware, either a particular vendor's CPUs or a specialized hardware accelerator such as a GPU. Although the software stacks in use vary across operating environments, they share a similar core that is adapted to the requirements of each hardware platform.

Today, the conversation around generative AI LLMs often centers on how they are trained and how their capabilities can be enhanced. However, the true value of AI comes to light when we deploy it in production. This Proof of Concept focuses on applying generative AI models to produce useful results. Here, the term 'inferencing' describes the process of extracting results from an AI application.

Our latest Lab Insight Report, Dell POC for Scalable and Heterogeneous Gen-AI Platform, outlines a Proof of Concept that investigates two capabilities: performing scale-out inferencing for production, and using a similar inferencing software stack across heterogeneous CPU and GPU systems to accommodate different production requirements.

This Lab Insight Report highlights the following:

A single CPU-based system can support multiple simultaneous, real-time sessions
GPU-augmented clusters can support hundreds of simultaneous, real-time sessions
A common AI inferencing software architecture is used across heterogeneous hardware

Designed to be industry-agnostic, this POC shows how to build a general-purpose generative AI solution that can use a variety of hardware options to meet specific Gen-AI application requirements. If you are interested in learning more, download your copy of Dell POC for Scalable and Heterogeneous Gen-AI Platform below. You can also download our Executive Summary and Infographic.

Download our reports:

Lab Insight

Executive Summary

Infographic