CORDIS - EU research results

The MareNostrum Experimental Exascale Platform

Periodic Reporting for period 1 - MEEP (The MareNostrum Experimental Exascale Platform)

Reporting period: 2020-01-01 to 2021-06-30

The MEEP (MareNostrum Experimental Exascale Platform) project is developing a flexible FPGA-based emulation platform to explore hardware/software co-design for Exascale supercomputers and other hardware targets, based on European-developed IPs. The project is a technology demonstrator that can validate the architecture and design point for an Exascale-class machine and its derivatives, while also addressing both established and nascent markets. The evaluation delivered at the end of the project will serve as a proof of concept for industrial exploitation.

By experimenting with RISC-V-based hardware designs through FPGA emulation, it is possible to fine-tune architectures before committing to silicon. The emulation platform will also make it possible to test every layer of the full stack, from software applications down to hardware. Furthermore, MEEP will make its open-source IPs available for academic purposes and HPC applications, thus contributing to improving quality of life, advancing science, boosting industrial competitiveness and ensuring Europe’s technological autonomy. Exascale computing capability, in turn, will transform our ability to answer some of the world’s toughest and most important questions in many fields.

In particular, the project’s ambition is to play two important roles within the Exascale computing paradigm:

1. Become an evaluation platform of pre-silicon IP and ideas, capable of balancing speed and scalability.
2. Become a software development and experimentation platform to enable software readiness for new hardware designs. Compared with the limitations of software simulation approaches, MEEP will accelerate software maturity: IPs will be tested and validated before moving to silicon, with a realistic characterization of their components and execution in their targeted contexts, which saves time and money.

Currently, we have identified a suite of workloads to accelerate on the MEEP platform, including traditional HPC workloads, High Performance Data Analytics and COMPSs-based HPC workflows. The project must ensure that these workloads execute correctly in the proposed RISC-V environments. We have tested the MPICH library, the OpenMP runtime, and the BLIS library on the Fedora distribution, and we have started adapting the BLIS library to the offload execution mode. Finally, we are exploring running Spark and TensorFlow on RISC-V.

We are developing a benchmarking methodology that defines a set of metrics capturing the coverage and efficiency of vector usage in applications: average vector length, instruction mix, and floating-point arithmetic operations per cycle.
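The three metrics can be made concrete with a small calculation. The sketch below is a minimal illustration, assuming that per-run counters (dynamic instruction counts per class, the active vector length of each executed vector instruction, a floating-point operation count, and a cycle count) are already available from a trace or hardware counters. The `ExecutionProfile` type, its field names and the instruction classes are hypothetical and are not part of MEEP's published methodology.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ExecutionProfile:
    """Hypothetical per-run counters; field names are illustrative only."""
    cycles: int
    # Instruction class -> number of dynamic instructions executed.
    instr_counts: Counter = field(default_factory=Counter)
    # Active vector length (elements) of each executed vector instruction.
    vector_lengths: list = field(default_factory=list)
    # Floating-point arithmetic operations (counted per element for vectors).
    fp_ops: int = 0


def average_vector_length(p: ExecutionProfile) -> float:
    """Mean number of elements processed per executed vector instruction."""
    if not p.vector_lengths:
        return 0.0
    return sum(p.vector_lengths) / len(p.vector_lengths)


def instruction_mix(p: ExecutionProfile) -> dict:
    """Fraction of dynamic instructions that falls in each class."""
    total = sum(p.instr_counts.values())
    return {cls: n / total for cls, n in p.instr_counts.items()} if total else {}


def flops_per_cycle(p: ExecutionProfile) -> float:
    """Floating-point arithmetic operations per cycle."""
    return p.fp_ops / p.cycles if p.cycles else 0.0


if __name__ == "__main__":
    profile = ExecutionProfile(
        cycles=1000,
        instr_counts=Counter(scalar=600, vector=300, memory=100),
        vector_lengths=[16] * 200 + [8] * 100,  # 300 vector instructions
        fp_ops=4000,
    )
    print(average_vector_length(profile))  # about 13.33
    print(instruction_mix(profile))        # {'scalar': 0.6, 'vector': 0.3, 'memory': 0.1}
    print(flops_per_cycle(profile))        # 4.0
```

Average vector length is computed per vector instruction rather than per cycle, so it reflects how well the vector registers are filled independently of how often vector code runs; the instruction mix captures the latter.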

All software development has targeted emulated environments, RISC-V platforms from different vendors, and RISC-V-based FPGA boards. The LLVM compiler infrastructure is used to generate code for the RISC-V vector extension and for systolic custom instructions. The offload mode will be supported through the "target spread" construct, which is currently being validated on a multi-GPU system. For the operating system, we have defined the boot components and the packages to be installed. We have selected Fedora as the base distribution and have implemented an early networking mechanism based on tun-on-mmap. We have also built and run base OS container images under Debian and Fedora.

FPGA-based Emulated Platform:
On the one hand, all MEEP Shell IP components have been developed to the point of having working demo designs, which validates the roadmap chosen for each of them. Fine-tuning and full integration of these components into the MEEP Shell concept are left for the second half of the project. In addition, different approaches to a fully automated FPGA design-generation process have been explored and implemented, paving the way for a robust FPGA flow that can easily be integrated into a CI/CD pipeline or used to create new FPGA tools of interest to the community.

On the other hand, regarding the implementation of an emulated accelerator, simplified versions of two ACME components have already been ported to the platform: 1) the VAS Tile core, and 2) the VAS Tile (a multi-core system with four cores). The first experiments have used a single FPGA and have focused on booting the OS.

MEEP targeted Accelerator:
For the targeted accelerator design, a first version of the VAS Tile core component has been released; it integrates a scalar core connected to a vector processing unit (VPU) through the Open Vector Interface (OVI). In parallel, as a step toward a matrix of VAS Tiles in the near future, work is progressing on the integration of a many-core system with a shared L2 data cache, built from the same core used in the VAS Tile core.

To obtain results on the accelerator faster, without waiting for the whole design to be completed, an architectural modeling simulator (Coyote) has been developed and extended. Coyote currently models most of the features needed to represent the ACME accelerator and extracts several kinds of metrics through simulation, such as throughput, latencies, and memory hit/miss rates. Coyote's Technology Readiness Level (TRL) has risen from concept to demonstrator.
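As an illustration of the kind of metrics listed above, the following sketch derives cache hit/miss rates, throughput and average latency from raw simulation counters. It assumes generic counters (hits, misses, completed operations, cycles, and per-request issue/completion cycles); `CacheStats`, `throughput` and `average_latency` are hypothetical names and do not reflect Coyote's actual interface or output format.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Hypothetical hit/miss counters for one cache level (e.g. a shared L2)."""
    hits: int
    misses: int

    @property
    def accesses(self) -> int:
        return self.hits + self.misses

    def hit_rate(self) -> float:
        # Fraction of accesses served by this cache level.
        return self.hits / self.accesses if self.accesses else 0.0

    def miss_rate(self) -> float:
        # Fraction of accesses that had to go to the next level.
        return self.misses / self.accesses if self.accesses else 0.0


def throughput(completed_ops: int, cycles: int) -> float:
    """Operations completed per simulated cycle."""
    return completed_ops / cycles if cycles else 0.0


def average_latency(requests: list[tuple[int, int]]) -> float:
    """Mean latency in cycles over (issue_cycle, completion_cycle) pairs."""
    if not requests:
        return 0.0
    return sum(done - issued for issued, done in requests) / len(requests)


if __name__ == "__main__":
    l2 = CacheStats(hits=900, misses=100)
    print(l2.hit_rate(), l2.miss_rate())                  # 0.9 0.1
    print(throughput(completed_ops=5000, cycles=2000))    # 2.5
    print(average_latency([(0, 10), (5, 25), (10, 40)]))  # 20.0
```

Latency is averaged per request rather than per cycle, so a few long-latency misses raise it noticeably even when the hit rate is high.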

The results of the project have been disseminated through scientific papers and events. In addition, press releases were issued and technical news items were published regularly on the website, resulting in 54 press clippings and over 16,969 page views. Several demo videos were produced to describe and demonstrate the use of different hardware and software components developed in the project.

Regarding SME competitiveness, the consortium is fully aware of the barriers that SMEs can encounter in funding their technology platforms, and of the high cost of developing and producing IPs. To address this, the MEEP ecosystem will enable cost-effective, low-energy distributed computing solutions that provide a substantial reduction in Total Cost of Ownership (TCO), i.e. the cost to buy, own, operate, and manage a system, compared with systems currently on the market. SMEs stand to benefit significantly from such cost reductions. In addition, open-source software provides a low-cost entry point for SMEs and startups, allowing them to lower risk and increase the speed of European business innovation.

We aim for MEEP outcomes (such as the MEEP Shell and ACME components) to reach a Technology Readiness Level (TRL) of demonstrator, moving them closer to market, and for some of them to become improved products or services offered as open source to enrich the RISC-V ecosystem. For the software stack, we expect to include a set of components (such as the OS, compiler, and containerization support) adapted or ported to run on the RISC-V architecture.

The work carried out in MEEP is beginning to influence a large number of European research projects, relevant standardization bodies, and diverse academic programmes. Three upcoming European projects will continue developing MEEP's results and/or will leverage IP coming from MEEP:
- eProcessor
- The European Pilot
- EPI SGA2

Figure: MEEP structure