We have identified a suite of workloads to accelerate on the MEEP platform, including traditional HPC workloads, High Performance Data Analytics, and COMPSs-based HPC workflows. The project demonstrates that these applications execute correctly in the proposed RISC-V environments. We have tested the MPICH library, the OpenMP runtime, and the BLIS library on the Fedora distribution, and we have started adapting BLIS to the offload execution mode. Finally, we are exploring how to run Spark and TensorFlow on RISC-V.
We have developed a benchmarking methodology that defines a set of metrics to capture the coverage and efficiency of vector usage in applications: average vector length, instruction mix, and floating-point arithmetic operations per cycle.
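As an illustration of how the three metrics could be derived from raw counters, the following is a minimal sketch. The counter names (`cycles`, `vector_elements`, and so on) and the sample values are hypothetical and not taken from the MEEP tooling. It also assumes that instruction mix means the scalar/vector split, which is only one possible breakdown.

```python
from dataclasses import dataclass


@dataclass
class VectorCounters:
    """Hypothetical counter snapshot; all field names are assumptions."""
    cycles: int
    scalar_instructions: int
    vector_instructions: int
    vector_elements: int  # active vector length summed over all vector instructions
    flops: int            # floating-point arithmetic operations executed


def average_vector_length(c: VectorCounters) -> float:
    """Mean number of elements processed per vector instruction."""
    if c.vector_instructions == 0:
        return 0.0
    return c.vector_elements / c.vector_instructions


def instruction_mix(c: VectorCounters) -> dict:
    """Fraction of scalar and vector instructions among all counted instructions."""
    total = c.scalar_instructions + c.vector_instructions
    if total == 0:
        return {"scalar": 0.0, "vector": 0.0}
    return {
        "scalar": c.scalar_instructions / total,
        "vector": c.vector_instructions / total,
    }


def flops_per_cycle(c: VectorCounters) -> float:
    """Floating-point arithmetic operations completed per cycle."""
    if c.cycles == 0:
        return 0.0
    return c.flops / c.cycles


# Made-up sample values, for illustration only.
sample = VectorCounters(
    cycles=1000,
    scalar_instructions=600,
    vector_instructions=200,
    vector_elements=12800,
    flops=6400,
)
```

With these made-up values, `average_vector_length(sample)` is 64 elements per vector instruction and a quarter of the counted instructions are vector instructions. A real measurement would take the counters from hardware performance counters or an emulator trace.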
Software development has focused on emulated environments, RISC-V platforms from different vendors, and FPGA-based RISC-V boards. The LLVM compiler infrastructure is used to generate code for the RISC-V vector extension and the custom systolic instructions. The offload mode will be supported with the OpenMP "target spread" construct, which is currently being validated on a multi-GPU system. On the OS side, we have defined the boot components and the packages to be installed. We have selected Fedora as the base distribution and implemented an early networking mechanism based on tun-on-mmap. We have also built and run OS base container images under Debian and Fedora.
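The general idea of spreading one loop across several devices can be sketched as a static, round-robin assignment of iteration chunks. This is only a conceptual illustration: the function `spread_chunks` and its parameters are hypothetical, and the sketch does not reproduce the syntax or the exact scheduling semantics of the "target spread" construct.

```python
def spread_chunks(n_iterations: int, n_devices: int, chunk_size: int) -> dict:
    """Assign contiguous chunks of [0, n_iterations) to devices in round-robin order.

    Returns a mapping from device index to a list of half-open (start, end) ranges.
    """
    if n_devices <= 0 or chunk_size <= 0:
        raise ValueError("n_devices and chunk_size must be positive")
    assignment = {device: [] for device in range(n_devices)}
    for index, start in enumerate(range(0, n_iterations, chunk_size)):
        end = min(start + chunk_size, n_iterations)  # last chunk may be shorter
        assignment[index % n_devices].append((start, end))
    return assignment


# Ten iterations, two devices, chunks of four iterations.
plan = spread_chunks(n_iterations=10, n_devices=2, chunk_size=4)
```

Here device 0 receives the ranges `(0, 4)` and `(8, 10)`, and device 1 receives `(4, 8)`. In an actual offload runtime, each range would also determine which slice of the data is transferred to the corresponding device.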
FPGA-based Emulated Platform: a) All the MEEP Shell IP components have been developed and demonstrated in working demo designs, which validates the roadmap chosen for each of them. Fine-tuning and full integration of these components into the MEEP Shell concept are left for the second half of the project. Different approaches to a fully automated FPGA design generation process have been explored and implemented, paving the way to a robust open-source FPGA flow that can easily be used in a CI/CD flow or serve as the basis for new FPGA tools useful to the community. b) Regarding the implementation of an emulated accelerator, simplified versions of two ACME components have already been ported to the platform: 1) the VAS Tile core, and 2) the VAS Tile (a multi-core system with 4 cores). The first experiments have involved only one FPGA and have focused on OS boot capabilities.
MEEP targeted Accelerator: A first version of the VAS Tile core component of the targeted accelerator design has been released; it integrates a scalar core connected to a VPU through an OVI interface. In parallel, as a step towards a matrix of VAS Tiles in the near future, work is progressing on the integration of a many-core system with a shared L2 data cache, using the same core as the VAS Tile core.