Nvidia came out on top in a benchmark designed to measure the performance of machine-learning model training. MLCommons, a group that develops benchmarks for AI training systems, released results for a new test that measures how quickly systems can train the kind of models used to build chatbots like ChatGPT.

MLPerf Training 3.0 is meant to provide an industry-standard set of benchmarks for evaluating ML model training. Model training can be a lengthy process, taking weeks or even months depending on the size of the data set. That consumes a great deal of power, so training can get expensive.

The MLPerf Training benchmark suite is a full series of tests that stress machine-learning models, software, and hardware for a broad range of applications. The latest results showed performance gains of up to 1.54x compared to just six months ago and between 33x and 49x compared to the first round in 2018.

MLCommons has been updating its MLPerf Training benchmarks to keep pace with the rapid growth of AI and ML. The latest revision, Training version 3.0, adds testing for training large language models (LLMs), specifically GPT-3, the LLM on which ChatGPT is based. This is the first revision of the benchmark to include such testing.

All told, the test yielded 250 performance results from 16 vendors’ hardware, including systems from Intel, Lenovo and Microsoft Azure. Notably absent from the test was AMD, which has a highly competitive AI accelerator in its Instinct line. (AMD did not respond to queries as of press time.)

Also notable is that Intel did not submit its Xeon or GPU Max processors and instead opted to test its Gaudi 2 dedicated AI processor from Habana Labs. Intel told me it chose Gaudi 2 because it is purpose-designed for high-performance, high-efficiency deep learning training and inference, and is particularly well suited to generative AI and large language models, including GPT-3.
Using a cluster of 3,584 H100 GPUs built in partnership with AI cloud startup CoreWeave, Nvidia posted a training time of 10.94 minutes. Habana Labs took 311.945 minutes but with a much smaller system equipped with 384 Gaudi2 chips. The question then becomes which is the cheaper option when you factor in both acquisition costs and operational costs? MLCommons didn’t go into that. The faster benchmarks are a reflection of faster silicon, naturally, but also optimizations in algorithms and software. Optimized models mean faster development of models for everyone. The benchmark results show how various configurations performed, so you can decide based on configuration and price whether the performance is a fit for your application.
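One rough way to put the two GPT-3 results on a common footing is to multiply chip count by training time. The Python sketch below does that arithmetic using only the figures reported above. The function names are illustrative, and accelerator-minutes are not a cost figure: MLCommons published no pricing, so the per-chip-hour rate is left as a parameter you would have to supply yourself.

```python
def accelerator_minutes(chips: int, minutes: float) -> float:
    """Total chip-time consumed by a run: number of chips x wall-clock minutes."""
    return chips * minutes


def run_cost(chips: int, minutes: float, price_per_chip_hour: float) -> float:
    """Cost of one run at a given per-chip hourly rate.

    price_per_chip_hour is a hypothetical input; the benchmark reports no prices.
    """
    return accelerator_minutes(chips, minutes) / 60.0 * price_per_chip_hour


# Figures reported for the GPT-3 training benchmark.
nvidia = accelerator_minutes(3584, 10.94)    # 3,584 H100 GPUs
habana = accelerator_minutes(384, 311.945)   # 384 Gaudi2 chips

# How many times more chip-time the Gaudi2 run used than the H100 run.
ratio = habana / nvidia

print(f"H100 run:   {nvidia:,.2f} accelerator-minutes")
print(f"Gaudi2 run: {habana:,.2f} accelerator-minutes")
print(f"Gaudi2 / H100 chip-time ratio: {ratio:.2f}")
```

By this measure the Gaudi2 run used about three times as much total chip-time as the H100 run, so on chip-time alone it would cost less only if a Gaudi2 chip-hour were priced at under roughly a third of an H100 chip-hour. That ignores differences in system size, networking, power draw and software configuration between the two submissions, all of which affect the real cost.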