In my role as a Software Engineer at the Zettascale Lab, I’ve been experimenting with GPU clock speeds to determine where it’s possible to achieve substantial energy savings without significantly impacting the performance of applications. This research aims to improve the sustainability of emerging zettascale and exascale systems.
Many of the world’s datacenters each consume megawatts of power. This situation isn’t sustainable in the long term due to resource availability and environmental impact. Already, some researchers estimate that the annual aggregate HPC energy load worldwide is about 5.2 terawatt-hours, collectively generating about 2 million metric tons of CO2 each year.1
Clearly, the HPC community must find ways to balance power consumption with the performance of the compute-intensive applications researchers and scientists rely on to make world-changing discoveries.
At the Zettascale Lab, I’ve been conducting experiments with one of the world’s fastest academic supercomputers, the University of Cambridge’s Wilkes-3. I’ve found that by adjusting the clock speeds of the Graphics Processing Units (GPUs) that accelerate some high-performance computing (HPC) applications, energy consumption can be reduced without appreciably affecting the performance of these applications.
These GPUs are highly power-efficient processors optimised for handling thousands of tasks in parallel. In supercomputing applications, they are used to accelerate compute-heavy functions in multiple application domains, including chemical research, fluid dynamics analysis, environmental modeling, visualization/image processing, AI and data analytics.
In recent years, GPU frequency adjustments have enabled Cambridge’s Wilkes-3 to earn a top place on the Green500. The Green500 ranks the TOP500 supercomputers (the world’s 500 fastest systems) in terms of their energy efficiency, measuring performance per watt.
In my current research, I’ve looked at a sample of HPC applications to study their energy profiles. It’s not enough simply to reduce the clock speed on the GPU by a standard amount for all applications. Different applications have different behaviours. Adjustments to GPU frequencies can deliver varying benefits depending on whether the application is memory-bound or compute-bound. A memory-bound application, for example, typically spends much of its time waiting on data, so a lower core clock tends to cost it less performance than it would cost a compute-bound one. In other words, making the same, standard adjustments won’t necessarily affect the power draw and performance in the same way from application to application. That’s one reason this research is necessary.
Turning the clock speed down can also increase the time it takes to complete a job. If the job then takes longer, does this actually lead to overall energy savings? Energy is power multiplied by time, so a lower power draw only pays off if the runtime doesn’t grow by a larger factor. The aim of this work is to help researchers save more energy per job than is lost to the longer time to solution. The idea is to discover the optimal GPU frequency per application, the one that gives the best balance between performance and energy consumption.
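To make the trade-off concrete, here is a minimal sketch of how the frequency choice for a single application could be framed. It assumes that for each clock setting you have already measured an average power draw and a time to solution. The `Run` class, the `best_frequency` helper, the 10% slowdown limit and all of the numbers are hypothetical, chosen purely for illustration; they are not measurements from Wilkes-3 and not the method used in this research.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark run of an application at a fixed GPU clock (hypothetical)."""
    freq_mhz: int        # GPU clock setting used for the run
    avg_power_w: float   # mean power draw over the run, in watts
    runtime_s: float     # time to solution, in seconds

    @property
    def energy_j(self) -> float:
        # Energy to solution = average power x runtime, in joules.
        return self.avg_power_w * self.runtime_s


def best_frequency(runs, max_slowdown=1.10):
    """Return the lowest-energy run whose runtime is within
    max_slowdown times the runtime of the fastest run."""
    fastest = min(r.runtime_s for r in runs)
    allowed = [r for r in runs if r.runtime_s <= fastest * max_slowdown]
    return min(allowed, key=lambda r: r.energy_j)


# Made-up numbers for illustration only.
runs = [
    Run(1410, 400.0, 100.0),
    Run(1200, 320.0, 105.0),
    Run(1000, 270.0, 118.0),
    Run(800, 240.0, 140.0),
]

best = best_frequency(runs)
print(best.freq_mhz, best.energy_j)
```

In this made-up data, the lowest clock does not give the lowest energy: the power saved at 800 MHz is outweighed by the longer runtime. The slowdown limit reflects the point that an energy saving is only useful if the performance cost stays acceptable; relaxing it (for example `max_slowdown=1.5`) lets the helper pick a slower, lower-energy setting.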