
Computing Server Architecture with Joint Power and Cooling Integration at the Nanoscale

Periodic Reporting for period 2 - COMPUSAPIEN (Computing Server Architecture with Joint Power and Cooling Integration at the Nanoscale)

Reporting period: 2018-12-01 to 2020-05-31

In recent years, the demand for computing power has grown faster than semiconductor technology evolution can sustain, and has produced, as an undesirable side effect, a surge in power consumption and heat density in computing servers. Although computing servers are the foundations of the digital revolution, their current designs require 30-40% of the energy supplied to be spent on cooling. The remaining energy is used for computation, but their complex many-core designs produce very high operating temperatures. Thus, operating all the cores continuously at maximum performance levels results in system overheating and failures. This situation is limiting the benefits of technology scaling.

The COMPUSAPIEN project aims to completely revise the current server architecture design. In particular, inspired by the mammalian brain, COMPUSAPIEN aims to design a disruptive three-dimensional (3D) computing server architecture that overcomes the prevailing worst-case power and cooling provisioning paradigm for servers. This new 3D server design will be composed of heterogeneous many-core architectures integrated with on-chip microfluidic Fuel Cell Arrays (FCAs) able to provide joint cooling delivery and power generation. Furthermore, COMPUSAPIEN proposes novel predictive controllers based on holistic power-temperature models, which exploit the server software stack to achieve energy-scalable computing capabilities. COMPUSAPIEN is a high-risk, high-reward proposal that will bring drastic energy savings with respect to current server design approaches, and will guarantee energy scalability in future server architectures. To realize this vision, COMPUSAPIEN will develop and integrate breakthrough innovations in heterogeneous computing architectures, cooling-power subsystem design, and combined microfluidic power delivery and temperature management in computers.
Since the beginning of the project, and in accordance with the objectives of the COMPUSAPIEN project, the main results achieved can be divided into three areas:
(1) The design of energy-minimal 3D many-core server architectures that enable energy-proportional computing by leveraging heterogeneity, parallel computing and in-memory computing (IMC). In this respect, we have proposed novel IMC architectures by developing a novel in-cache computing accelerator, named BLADE. This accelerator has been integrated into ARM32 and ARM64 architectures. We have also incorporated High Bandwidth Memories (HBMs) into 3D many-core designs, and developed a novel simulation framework, named gem5-X, to evaluate new server architectures. Our server architectures were assessed on a wide range of novel HPC benchmarks, such as Convolutional Neural Networks, genome sequencing, video streaming transcoding, and Artificial Intelligence (AI) analytics.

(2) The design of an integrated power and microfluidic cooling delivery subsystem able to overcome the limits of current cooling strategies in 3D MPSoCs. By imitating the dual role of blood in the brain, the FCA technology is able to extract heat and generate power at the same time, thanks to the temperature-driven electrochemical reaction of FCAs. In this respect, in COMPUSAPIEN we enhanced the PowerCool framework and integrated it with the 3D-ICE simulator to enable the exploration of FCAs and assess their benefits for a wide range of architectures. Furthermore, we proved that FCAs can provide enough power to sustain the caches of an HPC processor, or 50% of the power needed by a low-power accelerator layer in a 3D MPSoC. We have also analyzed the impact of FCAs on the power delivery network of 3D MPSoCs to reduce the voltage drop and the number of power TSVs for the 3D MPSoC, and to increase chip bandwidth with more signal TSVs.

(3) The system-level multi-objective management of 3D stack computing resources by trading off energy consumption, temperature and performance. Specifically, we have proposed joint cooling and workload management techniques at the server level. These techniques use as control knobs the cooling, the workload allocation (i.e. task mapping to cores and accelerators) and the frequency setting of cores and other hardware accelerators. To cope with the large dynamism of the problem and the huge design space, we have proposed reinforcement-learning-based techniques, both single-agent and multi-agent, to tackle the problem and improve all metrics jointly.
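As a rough illustration of how a single-agent reinforcement learning controller could use frequency and cooling as joint control knobs, the following minimal sketch applies tabular Q-learning to a toy thermal model. It is not the project's method: the thermal dynamics, the reward weights, the discretization, and all names (step, train, greedy_action, HOT_BIN, and so on) are hypothetical assumptions chosen only to keep the example small.

```python
import itertools
import random

# Hypothetical toy model: every constant below is an illustrative assumption,
# not a value taken from the COMPUSAPIEN project.
FREQ_LEVELS = [0, 1, 2]      # core frequency setting (low, mid, high)
COOL_LEVELS = [0, 1, 2]      # coolant flow setting (low, mid, high)
ACTIONS = list(itertools.product(FREQ_LEVELS, COOL_LEVELS))
N_TEMP_BINS = 6              # discretized temperature states 0..5
HOT_BIN = 5                  # bin treated as overheating


def step(temp, action):
    """Toy thermal dynamics: frequency heats the chip, coolant flow cools it."""
    freq, cool = action
    next_temp = max(0, min(N_TEMP_BINS - 1, temp + freq - cool))
    # Reward trades off performance against compute and cooling energy.
    reward = 2.0 * freq - 0.5 * freq - 0.3 * cool
    if next_temp >= HOT_BIN:
        reward -= 10.0       # assumed penalty for reaching the hot bin
    return next_temp, reward


def train(episodes=2000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_TEMP_BINS)]
    for _ in range(episodes):
        temp = rng.randrange(N_TEMP_BINS)
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))          # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[temp][i])
            next_temp, reward = step(temp, ACTIONS[a])
            # Standard Q-learning update towards the bootstrapped target.
            target = reward + gamma * max(q[next_temp])
            q[temp][a] += alpha * (target - q[temp][a])
            temp = next_temp
    return q


def greedy_action(q, temp):
    """Return the (frequency, cooling) pair with the highest learned value."""
    best = max(range(len(ACTIONS)), key=lambda i: q[temp][i])
    return ACTIONS[best]


if __name__ == "__main__":
    q_table = train()
    for t in range(N_TEMP_BINS):
        print("temperature bin", t, "-> (freq, cool) =", greedy_action(q_table, t))
```

In this sketch a single agent picks both knobs at once, so the action space grows multiplicatively with each added knob. That growth is one reason a multi-agent decomposition becomes attractive when the design space is large.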
The major outcomes of COMPUSAPIEN can be summarized as follows:

(1) Novel architectures for energy-minimal 3D many-core servers:
a. Design of in-memory computing architectures, such as the in-cache computing accelerator BLADE. BLADE has been designed in 28nm CMOS technology and integrated into ARM32 and ARM64 architectures.
b. Architectural exploration of ARM32 and ARM64 architectures equipped with HBMs for the execution of novel HPC applications with stringent timing constraints.
c. Analysis of Non-Volatile Memory (NVM) architectures to drastically reduce the energy consumption of MPSoCs. In particular, we have integrated Resistive RAMs (RRAMs) into the cache hierarchy, proposing the use of hybrid cache systems.

(2) Joint power generation and cooling control:
a. Architectural exploration of the main benefits of FCAs in a 3D MPSoC. Assessment of the architectures that would benefit the most from this technology, and optimization of FCA-equipped 3D MPSoCs at design-time, to find the most suitable 3D MPSoC for a particular workload.
b. Exploration of the changes that FCA incorporation implies in the power grid design of 3D MPSoCs. In particular, we have shown how FCAs can minimize the voltage drop, thus reducing the number of power TSVs and increasing chip bandwidth by using the free space for signal TSVs.

(3) Multi-objective hierarchical resource management:
a. Proposal of multi-objective (thermal, power, performance) control policies based on reinforcement learning. Our techniques jointly combine cooling control, task allocation, and frequency setting to provide improvements in performance and energy, together with better thermal reliability. Furthermore, we show the scalability of our techniques by proposing multi-agent based approaches when the design space is too large to make use of single-agent approaches.

We expect novel results along these three axes until the end of the project. Specifically, we will continue the research on IMC architectures, integrating BLADE into RISC-V architectures and taping out a proof-of-concept version of the accelerator. Furthermore, we will keep enhancing our architectures by integrating NVMs (in particular RRAMs) from the technology level up to the system level. From the FCA perspective, we will propose 3D stacks specifically designed for FCAs, which would not be feasible to manufacture today without this technology. From the software and resource management perspective, we will equip hierarchical control policies with novel control knobs to exploit FCAs and the heterogeneity of the new 3D architecture.

Overall, the final goal of COMPUSAPIEN is to propose the integration of 3D heterogeneous architectures with microfluidic power/cooling delivery, removing dark silicon by going beyond the limits of power delivery and heat dissipation in server designs, and opening new perspectives for energy-proportional server architectures.