In its MLPerf Inference debut, the NVIDIA Blackwell GPU delivered remarkable results, running up to 4x faster on Llama 2 70B than the previous generation. The NVIDIA Hopper architecture also posted substantial gains across a range of industry-standard AI benchmarks. Together, these results underscore the potential of NVIDIA Blackwell and Hopper to accelerate AI workloads and drive innovation across domains.
Enterprises are rapidly adopting generative AI to create innovative services, but this puts immense strain on data center infrastructure. Training these large language models is resource-intensive, and providing real-time LLM-powered services adds another layer of complexity.
NVIDIA platforms continue to excel in MLPerf Inference benchmarks. The new NVIDIA Blackwell platform, featuring a second-generation Transformer Engine and FP4 Tensor Cores, demonstrated up to 4x faster performance than the H100 on Llama 2 70B. The NVIDIA H200 also delivered strong results across every data center benchmark, including Mixtral 8x7B, a mixture-of-experts (MoE) LLM. These results showcase the power of NVIDIA's AI hardware in accelerating a wide variety of AI workloads.
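To give a sense of what FP4 means in practice, here is a minimal sketch of "fake" quantization to the FP4 E2M1 format (1 sign bit, 2 exponent bits, 1 mantissa bit), whose representable magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6}. This is a conceptual example with a single per-tensor scale, not the production path: the actual Blackwell stack applies finer-grained scaling through the Transformer Engine.

```python
import numpy as np

# Representable magnitudes of the FP4 E2M1 format (1 sign, 2 exponent, 1 mantissa bit).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_fp4(x: np.ndarray) -> np.ndarray:
    """Quantize-dequantize x to FP4 E2M1 with a per-tensor scale (illustrative only)."""
    scale = np.max(np.abs(x)) / FP4_GRID[-1]        # map the largest magnitude to 6.0
    scaled = np.abs(x) / scale
    # Round each value to the nearest representable FP4 magnitude.
    idx = np.argmin(np.abs(scaled[..., None] - FP4_GRID), axis=-1)
    return np.sign(x) * FP4_GRID[idx] * scale

weights = np.random.randn(4, 4).astype(np.float32)
print(np.abs(weights - fake_quant_fp4(weights)).mean())  # mean quantization error
```

Storing weights and activations in 4 bits halves memory traffic again relative to FP8, which is a large part of how the FP4 Tensor Cores raise inference throughput.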
MoE models are emerging as a powerful way to broaden the versatility and capability of LLM deployments. By routing each query to a small set of specialized expert networks rather than activating one monolithic model, MoE architectures can tackle a wider range of tasks and return more comprehensive, accurate responses to diverse queries, making them a valuable asset in many AI applications.
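As a concrete sketch of the routing idea, the snippet below implements a Mixtral-style top-2 gate in plain NumPy: each token is sent to its two highest-scoring experts, and their outputs are blended using the renormalized router weights. The shapes and toy experts are illustrative assumptions, not the fused production kernels.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(x, gate_w, experts, top_k=2):
    """Route each token to its top_k experts and mix their outputs (sketch).

    x:       (tokens, d_model) activations
    gate_w:  (d_model, n_experts) router weights
    experts: list of callables, one feed-forward network per expert
    """
    logits = x @ gate_w                              # router score per token per expert
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # indices of the top_k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        weights = softmax(logits[t, top[t]])         # renormalize over selected experts
        for w, e in zip(weights, top[t]):
            out[t] += w * experts[e](x[t])           # weighted sum of expert outputs
    return out

# Toy usage: 8 random linear "experts" over 4 tokens of width 16.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
gate_w = rng.standard_normal((16, 8))
experts = [lambda v, W=rng.standard_normal((16, 16)): v @ W for _ in range(8)]
print(moe_layer(x, gate_w, experts).shape)  # (4, 16)
```

Because only top_k of the experts run per token, an MoE model like Mixtral 8x7B touches a fraction of its total parameters on each step, which is what makes the architecture attractive for inference.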
The escalating demand for LLMs requires substantial computational power to handle inference requests promptly. To meet stringent real-time latency requirements and serve a vast user base, multi-GPU computing is indispensable. NVIDIA NVLink and NVSwitch, integral components of the NVIDIA Hopper architecture, provide high-bandwidth communication between GPUs, delivering significant gains for real-time, cost-effective large model inference, as the back-of-envelope estimate below illustrates. The Blackwell platform will extend these advantages further, expanding NVLink Switch to larger NVLink domains of 72 GPUs.

Beyond NVIDIA's own submissions, 10 NVIDIA partners, including ASUSTek, Cisco, Dell Technologies, Fujitsu, Giga Computing, Hewlett Packard Enterprise (HPE), Juniper Networks, Lenovo, Quanta Cloud Technology, and Supermicro, made strong MLPerf Inference submissions, underscoring the broad availability of NVIDIA platforms.
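Each transformer layer in a tensor-parallel deployment typically performs two all-reduces over the hidden activations per generated token, so interconnect traffic scales with model depth, hidden size, and aggregate token rate. The figures below are assumed, illustrative values for a Llama 2 70B-class model, not measured results.

```python
# Back-of-envelope: tensor-parallel communication per generated token for a
# Llama 2 70B-class model (illustrative figures, not measured values).
hidden_size = 8192        # Llama 2 70B hidden dimension
num_layers = 80           # transformer layers
bytes_per_elem = 2        # FP16 activations
allreduce_per_layer = 2   # one after attention, one after the MLP block

# Activation bytes that must be all-reduced across GPUs for each token.
per_token = num_layers * allreduce_per_layer * hidden_size * bytes_per_elem
print(f"{per_token / 1e6:.1f} MB all-reduced per token")      # ~2.6 MB

# At, say, 10,000 aggregate tokens/s across concurrent users, the interconnect
# must carry ~26 GB/s of activation traffic on top of everything else.
print(f"{per_token * 10_000 / 1e9:.1f} GB/s at 10k tokens/s")
```

Numbers at this scale are comfortable for NVLink-class bandwidth but would quickly become a bottleneck over slower links, which is why the interconnect is central to these multi-GPU results.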
Relentless Software Innovation
NVIDIA continually refines its platforms, delivering regular software updates that raise performance and add new features.
In the latest round, NVIDIA platforms, including Hopper, Jetson, and Triton, all posted faster inference results. The H200 GPU, for example, delivered 27% more generative AI inference performance than in the previous round, a sign that NVIDIA customers get more value from their platform investment over time.
Triton Inference Server is a versatile open-source tool that simplifies deploying and managing AI models in production environments. It helps organizations reduce costs and cut AI model deployment times from months to minutes, making it easier for businesses to tap the power of AI and realize its benefits quickly.
In this round of MLPerf, Triton Inference Server delivered near-equal performance to NVIDIA’s bare-metal submissions, showing that organizations no longer have to choose between using a feature-rich production-grade AI inference server and achieving peak throughput performance.
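For a sense of what client code looks like, here is a minimal request against a running Triton server using the official tritonclient Python package. The model name and tensor names (my_model, INPUT0, OUTPUT0) are placeholders; substitute the names from your own model configuration.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton instance (default HTTP port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request tensor; name, shape, and dtype must match the model config.
inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

result = client.infer(
    model_name="my_model",
    inputs=[inp],
    outputs=[httpclient.InferRequestedOutput("OUTPUT0")],
)
print(result.as_numpy("OUTPUT0"))
```

The same server also exposes a gRPC endpoint and handles batching, scheduling, and multi-model serving behind this interface, which is what makes the near-bare-metal MLPerf result notable.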
Improved AI at the Edge
NVIDIA Jetson AGX Orin, a powerful edge AI platform, posted significant improvements running large language models (LLMs) like GPT-J. Models of this class can now run on devices at the edge, such as cameras and sensors, extracting real-time insights from data like images and videos, with applications ranging from autonomous vehicles to smart factories.
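As a simple point of reference, the snippet below generates text with GPT-J through Hugging Face Transformers. It is illustrative only: the MLPerf Jetson submissions run optimized TensorRT engines rather than this pipeline, but it conveys the scale of model that now fits within Jetson AGX Orin's unified memory (GPT-J's FP16 weights occupy roughly 12 GB).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: MLPerf edge submissions use optimized TensorRT engines,
# not this Hugging Face pipeline.
model_id = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("Summarize today's camera feed:", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Running generation locally like this avoids the round trip to a data center, which is what enables the real-time edge insights described above.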
Performance Leadership All Around
NVIDIA has shown that its AI platforms are versatile and perform well across a wide range of tasks, from the data center to the edge, powering the most innovative AI applications and services. You can read more about these results in our technical blog.
H200 GPU-powered systems are available today from CoreWeave, the first cloud service provider to offer them, and from server makers ASUS, Dell Technologies, HPE, QCT, and Supermicro.