AMD Wants to Improve AI, HPC Efficiency 30x by 2025

2023-12-06

AMD has announced a major new efficiency initiative that builds on its previous 25×20 project. The company now aims to deliver a 30x improvement in energy efficiency for AI and high-performance computing, relative to today’s CPU and GPU accelerators.

AMD’s blog post doesn’t go into much detail on how the company intends to achieve these savings, beyond references to a pressing need to lower the cost of compute in data centers and to the rapid growth of AI. Many of the companies building large AI clusters have said they have halted or slowed their buildouts because of power and cooling requirements. Across the industry, there is a focus on improving the computational efficiency of AI through a variety of methods, in both hardware and software.

We can hazard a few guesses about how AMD will hit these goals based on its known IP development. First, it would help to know which GPU architecture AMD is comparing against. The blog post and press release only mention “Radeon Instinct,” but the Radeon Instinct family spans multiple GPU architectures. If AMD is using one of its older GCN-based parts as the baseline, the 30x target will be easier to hit.

There are rumors that AMD’s Zen 4 architecture will support AVX-512, which suggests another avenue by which AMD might boost its AI performance and overall efficiency. AMD has a decades-long history of adopting Intel’s instruction-set extensions roughly one generation behind Intel, typically after those extensions have been exclusive to Intel products for a significant period of time.

By the time Zen 4 presumably appears in late 2022 with rumored AVX-512 support, Intel should have Sapphire Rapids on the market with built-in support for AMX (Advanced Matrix Extensions). AMD might have added AMX support by 2025, or be preparing to add it. It’s not clear exactly how much efficiency AMD would gain from adopting these new instruction sets, but we can assume that a fair share of the company’s total improvement will come from new instruction support, via AVX-512 if nothing else.

Next up, there’s the potential performance advantage of AMD’s V-Cache. Caching generally improves the performance of many workloads, but AMD may have specific plans for leveraging large L3 caches to boost AI power efficiency. Today, CPUs can spend as much power moving data as they do computing on it, or more. Larger caches and better caching algorithms could improve AI execution efficiency by reducing the amount of data that has to move on and off a given CPU. Improvements to AMD’s ROCm software stack could also yield significant gains in AI power efficiency.

By 2025, we should be seeing the fruits of AMD’s Xilinx acquisition, and manufacturers like TSMC should be pushing into 2nm and beyond. While manufacturing and lithography improvements no longer cut power consumption as much as they once did, we’re still talking about several generations of successive improvements relative to 7nm. AMD tends to trail the leading edge by a couple of years these days, but 2nm isn’t out of the question by the end of 2025. The cumulative improvements from three node shrinks (5nm, 3nm, and presumably 2nm) should be at least as large as the gains from 16nm to 7nm, and might be somewhat larger.

What makes AMD’s claim a bit eyebrow-raising is the company’s position relative to its previous 25×20 plan. When AMD set its 25×20 goal, it was targeting a 25x improvement in energy efficiency over six years, starting from where the company stood in 2014. That was the Bulldozer era, when power efficiency wasn’t exactly AMD’s strong suit. AMD’s power efficiency in 2020 was much stronger, even if the company’s baseline is Zen 2 + Vega rather than Zen 3 + CDNA. Delivering a 30x gain from that much stronger starting point, in less time, is going to be tricky.
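To put the difference in pace into numbers, here is a minimal sketch that converts each goal into the constant year-over-year efficiency multiplier it implies. It assumes smooth compound growth (real gains arrive in product-generation steps, not evenly each year) and assumes a 2020 baseline for the 30x goal, which is what the Zen 2 + Vega comparison suggests. The function name `annual_rate` is hypothetical, chosen for illustration.

```python
def annual_rate(total_gain: float, years: float) -> float:
    """Constant per-year multiplier that compounds to total_gain over `years` years."""
    return total_gain ** (1.0 / years)


# 25x20: a 25x efficiency gain from a 2014 baseline to 2020 (six years).
rate_25x20 = annual_rate(25, 2020 - 2014)

# 30x25: a 30x efficiency gain by 2025.
# ASSUMPTION: 2020 baseline, i.e. five years of improvement.
rate_30x25 = annual_rate(30, 2025 - 2020)

print(f"25x20 needed about {rate_25x20:.2f}x per year")
print(f"30x25 needs about {rate_30x25:.2f}x per year")
```

Run as written, it prints roughly 1.71x per year for 25x20 and 1.97x per year for 30x25: under these assumptions, the new goal asks for close to a doubling of efficiency every year, starting from a much more efficient base.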

AMD is undoubtedly serious about its target, but keep in mind that such targets haven’t stopped the absolute amount of power consumed by computing from trending steadily upward. One of the most fundamental ways to improve performance, regardless of any underlying efficiency trend, is to throw more transistors and electricity at a problem.

Ultimately, the question for AMD isn’t whether it can deliver a 2x, 5x, or 30x increase in energy efficiency by 2025; it’s how well the company’s CPUs will compete against the ARM and x86 CPUs that will be on the market by then.
