"We are proud not only of having developed new solutions to these problems, but also of having tested them on a real hardware prototype that runs entire, real HPC application programs.
The compute nodes of this testbed are interconnected in a 3-dimensional torus topology, and software can communicate across nodes in less than 1 microsecond. It does so by bypassing several layers of overhead, and hence unneeded energy consumption, such as operating system calls and repeated copying of data from one memory buffer to another.
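To make the 3-dimensional torus concrete, here is a minimal Python sketch of how neighbours and hop counts work in such a topology. The function names and the 4 x 4 x 4 size are illustrative assumptions; they do not describe the actual testbed dimensions or the ExaNeSt routing logic.

```python
def torus_neighbors(node, dims):
    """Return the six direct neighbours of `node` in a 3D torus.

    `dims` is the (hypothetical) number of nodes along each axis;
    coordinates wrap around at the edges, which is what makes it a torus.
    """
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x - 1) % nx, y, z), ((x + 1) % nx, y, z),
        (x, (y - 1) % ny, z), (x, (y + 1) % ny, z),
        (x, y, (z - 1) % nz), (x, y, (z + 1) % nz),
    ]


def torus_hops(a, b, dims):
    """Minimal hop count between nodes `a` and `b` in a 3D torus.

    Along each axis the shorter of the two directions around the ring is used.
    """
    return sum(min((p - q) % n, (q - p) % n) for p, q, n in zip(a, b, dims))


if __name__ == "__main__":
    dims = (4, 4, 4)  # assumed size, for illustration only
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (3, 3, 3), dims))
```

Because of the wrap-around links, opposite corners of the grid are only a few hops apart, which keeps worst-case distances short as the node count grows.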
For storage, we have extended the popular open parallel file system BeeGFS, and we provide on-demand temporary storage locally, as opposed to the traditional centralized organization: distributed local storage reduces communication overhead and energy, and avoids the bottlenecks of centralized storage.
Our prototype packs compute nodes very close to each other and very close to main memory (DRAM) and persistent storage (SSD), thus reducing communication time and energy; to cool these very densely packaged nodes, we immerse them, using novel techniques, in a fluid that conducts heat but not electricity.
We have ported real, full applications from materials science, climate forecasting, computational fluid dynamics, astrophysics, neuroscience, and a database to this ARM plus FPGA platform, optimized them for this new environment, and evaluated them.
Our results show that the energy consumed to solve a given problem on the new ExaNeSt platform is 3 to 10 times lower than on traditional HPC processors of the same time frame (2016). For applications with compute-intensive kernels, such as the N-body problem, we used reconfigurable hardware (FPGA) accelerators; there we achieved a time-to-solution 2 times faster than competitive traditional HPC processors of the same time frame, and 6 times faster than a popular gaming GPU. Our gains in this case are even more pronounced in terms of energy-delay product (EDP, a metric that rewards both saving energy and reaching the solution quickly): our FPGA accelerator is two and a half orders of magnitude better than competitive traditional HPC processors, one order of magnitude better than a popular gaming GPU, and two times better than the most powerful current GPU.
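As a small illustration of the EDP metric, the following Python sketch computes EDP as energy multiplied by time-to-solution, and shows that an EDP gain is the product of the energy gain and the speedup. All numbers are hypothetical placeholders chosen for easy arithmetic; they are not ExaNeSt measurements.

```python
def edp(energy_joules, time_seconds):
    """Energy-delay product: energy consumed times time-to-solution."""
    return energy_joules * time_seconds


def edp_gain(baseline, candidate):
    """How many times lower the candidate's EDP is than the baseline's.

    Each argument is an (energy_joules, time_seconds) pair.
    """
    return edp(*baseline) / edp(*candidate)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    baseline = (100.0, 10.0)    # 100 J over 10 s
    accelerator = (50.0, 5.0)   # half the energy, half the time

    energy_gain = baseline[0] / accelerator[0]
    speedup = baseline[1] / accelerator[1]

    # The EDP gain equals the energy gain multiplied by the speedup.
    print(edp_gain(baseline, accelerator), energy_gain * speedup)
```

Because the two factors multiply, a platform that is both faster and more frugal improves its EDP by more than either factor alone, which is why EDP gains can look much larger than the corresponding time-to-solution gains.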
The experience and new technologies from ExaNeSt, together with complementary ones from the 'sister' projects ExaNoDe and EcoScale, are now being used in the follow-on projects EuroEXA and EPI (European Processor Initiative), contributing to the EuroHPC JU strategy; they have been extensively disseminated through conferences, workshops, trade shows, and patents; and some of them form the basis for new commercial products of the industrial partners."