Migrating to exascale and, eventually, zettascale computing requires software optimised to make highly efficient use of multi-core, heterogeneous systems. The Open Zettascale Lab at the University of Cambridge is tackling software portability challenges as it transitions from peta- to exascale systems and beyond. The transition will open doors to new scientific and engineering breakthroughs.
The applications that run on HPC systems must be designed to scale alongside compute platforms in ways that optimise operation and results. Fine-tuning software to take full advantage of emerging HPC systems, component by component, is one of the projects underway at the Cambridge Open Zettascale Lab, a testing ground for next-generation technologies and applications. This involves examining all aspects of an application's design, such as how often it performs I/O and how efficiently it uses memory resources, to ensure that it can exploit the full performance potential of high-end hardware.
‘It’s harder to write code that makes efficient use of 1,000 nodes at the same time,’ Chris Edsall, Principal Zettascale Software Engineer, explains.
‘Scaling the code up to those nodes requires that you use those nodes efficiently and communicate with those nodes efficiently,’ adds Stefanie Reuter, Zettascale Software Engineer in the Lab.
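The quotes above describe the challenge in general terms; one common way to use and communicate with many nodes efficiently is to overlap messages with computation. The sketch below is a minimal illustration of that idea, assuming a one-dimensional domain split across MPI ranks arranged in a ring; it is not the Lab's code, and the buffer sizes and neighbour layout are purely illustrative.

```cpp
// Illustrative sketch: overlap halo exchange with interior computation using
// non-blocking MPI, so nodes keep working while messages are in flight.
#include <mpi.h>
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1 << 20;                 // local slab of a larger 1D domain
    std::vector<double> u(N + 2, 1.0);     // two extra ghost cells
    int left  = (rank - 1 + size) % size;  // ring neighbours, for illustration
    int right = (rank + 1) % size;

    MPI_Request reqs[4];
    // Post the boundary (ghost-cell) communication first ...
    MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&u[N + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(&u[N],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

    // ... then do interior work that does not need the ghost cells,
    // so communication and computation overlap.
    double sum = 0.0;
    for (int i = 2; i < N; ++i) sum += u[i];

    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    // Boundary work that needs the freshly received ghost values.
    sum += 0.5 * (u[0] + u[1]) + 0.5 * (u[N] + u[N + 1]);

    if (rank == 0) std::cout << "checksum: " << sum << "\n";
    MPI_Finalize();
    return 0;
}
```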
Programming heterogeneous HPC systems broadly requires a way to recognise, at runtime, all the devices available to a given application and the ability to run the application's work on them. The application needs to manage data sharing in a way that is tuned to the device(s) being used, for example by avoiding excessive data movement and by making effective use of each device's memory and memory bandwidth. The application workload also needs to specify which computations are offloaded to selected GPUs or other devices. Such open approaches to HPC development will help increase application portability and reduce unnecessary barriers to using and supporting new hardware innovations.
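As a concrete illustration of discovering devices at runtime, the short sketch below uses the open SYCL standard (the basis of the oneAPI tooling discussed next) to list every device visible to an application, together with its memory capacity and compute units. It is a minimal example under those assumptions, not code from the Lab.

```cpp
// Minimal sketch: enumerate every CPU, GPU, and accelerator the SYCL runtime
// can see, so an application can tune itself to what is actually available.
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    for (const auto& dev : sycl::device::get_devices()) {
        std::cout << dev.get_info<sycl::info::device::name>()
                  << " | global memory (MiB): "
                  << dev.get_info<sycl::info::device::global_mem_size>() / (1 << 20)
                  << " | compute units: "
                  << dev.get_info<sycl::info::device::max_compute_units>()
                  << "\n";
    }
    return 0;
}
```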
The Intel® oneAPI industry initiative and software library support the complex programming efforts of HPC developers with cross-architecture programming so code can be targeted to CPUs, GPUs, and other specialised accelerators transparently and portably. The oneAPI scalable programming model targets heterogeneous systems, which will be a common foundation of exascale computers, and eases the task of programming and running code on different compute architectures.
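To make that concrete, here is a hedged sketch of the single-source model: the vector-addition kernel below is written once and runs on a CPU, a GPU, or another accelerator depending only on which device the queue selects, with no change to the kernel source. The problem size and the kernel itself are illustrative.

```cpp
// Hedged sketch of cross-architecture, single-source offload with SYCL/DPC++.
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    // default_selector_v picks the best available device; implementations such
    // as oneAPI's DPC++ also let the choice be steered at runtime without
    // recompiling.
    sycl::queue q{sycl::default_selector_v};
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    constexpr size_t N = 1 << 20;
    std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);
    {
        sycl::buffer<float> bufA{a}, bufB{b}, bufC{c};
        q.submit([&](sycl::handler& h) {
            sycl::accessor A{bufA, h, sycl::read_only};
            sycl::accessor B{bufB, h, sycl::read_only};
            sycl::accessor C{bufC, h, sycl::write_only, sycl::no_init};
            h.parallel_for(sycl::range<1>{N}, [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];  // identical kernel on every architecture
            });
        });
    }  // buffers go out of scope here, copying results back to the host
    std::cout << "c[0] = " << c[0] << "\n";
    return 0;
}
```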
‘The key is that there’s no source code to change,’ said Edsall. ‘If you can use your code on [any processor in the exascale system], you get triple the use out of it.’
Striking the right balance between processor clock speeds and HPC application performance is also critical for managing the power requirements and carbon footprint of emerging HPC systems. At the Lab, the software engineering team is benchmarking the application performance impact of scaling the clock frequency of GPU accelerators up and down. This can help conserve power while only negligibly affecting operations.
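Such a frequency sweep can be driven by a very simple harness: cap the accelerator clock externally with the vendor's management tool, run a fixed workload, and record the time (and the measured node power) at each setting. The sketch below is an illustrative stand-in for the Lab's tooling, and run_workload is a hypothetical placeholder for the real application kernel.

```cpp
// Illustrative timing harness: run the same workload repeatedly and report
// the mean time per iteration; re-run the binary at each externally capped
// clock frequency and compare the results.
#include <chrono>
#include <cstdlib>
#include <iostream>

// Hypothetical stand-in for the real application kernel under test.
double run_workload() {
    double acc = 0.0;
    for (int i = 0; i < 50'000'000; ++i) acc += 1.0 / (1.0 + i);
    return acc;
}

int main(int argc, char** argv) {
    const int iterations = (argc > 1) ? std::atoi(argv[1]) : 10;

    double checksum = run_workload();  // warm-up so clocks and caches settle

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) checksum += run_workload();
    auto stop = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(stop - start).count();
    std::cout << "mean time per iteration: " << seconds / iterations << " s"
              << " (checksum " << checksum << ")\n";
    return 0;
}
```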
For Edsall, the energy challenge of growing such large compute systems is an issue close to his heart. His group has experimented with operational settings, reducing the power drawn while computing the same problem from 250 kW to 150 kW.
‘There was a performance penalty, so this was not an optimum sweet spot. But it does show that by looking at the operational characteristics of these massive-scale HPC systems, we can be mindful about the amount of energy we use,’ he said.
For his part, Edsall is also looking at how coding changes can affect power efficiency, such as minimising data movement for computation. ‘For example, don’t store things on disk; recompute them inside the node,’ he said. These tunings ‘could save money and the planet.’
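As a toy illustration of that trade-off (an assumption for illustration, not the Lab's code), the sketch below recomputes a cheap derived quantity where it is needed rather than writing it to disk in one phase and reading it back in another; on large systems the recomputation typically costs far less energy than the extra disk traffic.

```cpp
// Toy sketch: recompute a derived quantity on the fly instead of storing an
// intermediate array on disk and reloading it in a later phase.
#include <cmath>
#include <iostream>
#include <vector>

inline double derived(double x) { return std::sin(x) * std::sin(x); }

int main() {
    const int N = 1'000'000;
    std::vector<double> field(N);
    for (int i = 0; i < N; ++i) field[i] = 0.001 * i;

    // Instead of writing derived(field[i]) to a file here and reading it back
    // later, the derived values are recomputed exactly where they are used.
    double total = 0.0;
    for (int i = 0; i < N; ++i) total += derived(field[i]);
    std::cout << total << "\n";
    return 0;
}
```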