The initial project plan for developing compute capability combined with 3D-integration technology was to procure chiplets (Systems-on-Chip to be stacked on a silicon interposer) that met ExaNoDe’s requirements. At project commencement it became clear that the chiplet market was not mature enough to allow procurement of chiplets meeting the project’s needs, while an in-project development was infeasible for budget reasons. Nevertheless, recent developments confirm the business relevance of the approach, as illustrated by recent announcements from AMD of computing modules that include chiplets.
The consortium therefore used an FPGA compute unit with embedded ARMv8 processors. The FPGA configurable logic was used to support UNIMEM. The 3D integration was maintained as an architectural vision and prototyped in a 3D-Integrated-Circuit (3D-IC) through the design, manufacture and assembly of chiplets onto a silicon interposer.
The two ExaNoDe prototypes allowed focused work on two fronts: software development on FPGA-based compute nodes, and the development and manufacturing of 3D integration technology. Both prototypes were realised as daughter boards compatible with the ExaNeSt project prototype. They use different variants of the Multi-Chip-Module (MCM): variant 1 (MCM-1), without 3D integration but with the software stack needed to support the various programming models and runtimes and to execute the mini-applications; and variant 2 (MCM-2), with 3D integration, comprising two chiplets stacked on one silicon interposer, with the functionality needed in the FPGA to facilitate evaluation of the 3D-IC.
ExaNoDe’s main achievements cover all aspects of heterogeneous integration including:
• Architecture and design:
o Innovative, high-speed and low-power interconnect for chiplets via a silicon interposer.
o A Convolutional Neural Network (CNN) accelerator hardware IP within the chiplet.
o A chiplet System-on-Chip (SoC) in a 28FDSOI technology node.
• Advanced integration:
o 3D integration of chiplets on an active silicon interposer with approximately 50,000 high-density (20 µm pitch) connections.
o Advanced package integration with two FPGA bare dies including ARMv8 cores, one interposer and 43 decoupling capacitors in a 68.5 mm × 55 mm Multi-Chip-Module (MCM).
o Integration of two MCMs on a 260 mm × 120 mm daughter board.
• System software:
o Development of a complete software stack including: UNIMEM-based system software and middleware; runtime libraries optimised for the UNIMEM architecture (OmpSs, MPI, OpenStream, GPI); checkpointing technology for virtualisation; and a set of mini-applications for benchmarking purposes.
• Benchmarking and evaluation:
o Porting and evaluation of representative mini-applications on the daughter board implementing the FPGA-based MCM-1 variant.
o A performance projection of ExaNoDe technologies into a strawman architecture representative of upcoming HPC processors.
The project also generated a number of “lessons learned” that are of significant value for the project participants and potentially for the European Commission’s HPC programme. Planning for innovative, leading-edge hardware developments poses a challenge for R&D projects: despite continuous risk monitoring and the use of mitigation plans, the original project timelines could not be met. Key aspects were:
• Advanced packaging technologies generated significant challenges for assembly and integration, not all of which could be resolved within the extended project period.
• Providing the related full software stack required a significant effort in debugging and performance tuning, which exceeded the original project plans.
For the core hardware developments, especially advanced integration, highlighted aspects are:
• The project validated a process for 3D assembly (microbumps of 10 µm diameter with a 20 µm pitch) more advanced than current industrial standards.
• Since a single component failure can impact overall system functionality, mitigation approaches should be defined at the design stage.
• Known Good Die (KGD) tests performed on the active interposer could have improved pre-assembly selection.
• Additional board-level simulation should be performed so that errors are caught before bring-up rather than during it.
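To put the interconnect figures quoted in this summary in perspective (microbumps of 10 µm diameter on a 20 µm pitch, approximately 50,000 connections), the following is a back-of-envelope sketch only. It assumes the bumps sit on a uniform square grid, which is an assumption made here for illustration; the actual bump layout is not described in this summary. The function names are hypothetical.

```python
import math

def bumps_per_mm2(pitch_um: float) -> float:
    """Bump density for an ASSUMED uniform square grid with the given pitch."""
    bumps_per_mm = 1000.0 / pitch_um  # 1 mm = 1000 µm
    return bumps_per_mm ** 2

def array_area_mm2(n_bumps: int, pitch_um: float) -> float:
    """Area needed to place n_bumps on that assumed square grid."""
    return n_bumps / bumps_per_mm2(pitch_um)

def bump_fill_fraction(diameter_um: float, pitch_um: float) -> float:
    """Fraction of each grid cell covered by one circular bump."""
    bump_area = math.pi * (diameter_um / 2.0) ** 2
    return bump_area / pitch_um ** 2

if __name__ == "__main__":
    pitch_um = 20.0      # from the summary
    diameter_um = 10.0   # from the summary
    n_bumps = 50_000     # "approximately 50,000" in the summary

    print(f"density: {bumps_per_mm2(pitch_um):.0f} bumps/mm^2")
    print(f"area for {n_bumps} bumps: {array_area_mm2(n_bumps, pitch_um):.1f} mm^2")
    print(f"bump fill fraction: {bump_fill_fraction(diameter_um, pitch_um):.3f}")
```

Under the square-grid assumption, a 20 µm pitch corresponds to 2,500 connections per mm², so roughly 50,000 connections would occupy on the order of 20 mm² of bump field.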