CORDIS - EU research results

dEsign enVironmEnt foR Extreme-Scale big data analytics on heterogeneous platforms

Periodic Reporting for period 1 - EVEREST (dEsign enVironmEnt foR Extreme-Scale big data analytics on heterogeneous platforms)

Reporting period: 2020-10-01 to 2022-03-31

Big Data analytics is one of the major challenges for today’s computing systems: extracting valuable knowledge and information from an exponentially increasing amount of data. Energy-efficient computing on heterogeneous architectures using domain-optimized accelerators, together with distributed computing for scalability, are two key ingredients of future Big Data computing systems. Such systems must be re-designed to manage, process, and move data, and to enable safe, privacy-preserving data exchange among the different nodes.

EVEREST (dEsign enVironmEnt foR Extreme-Scale big data analyTics on heterogeneous platforms) enables the use of complex and distributed architectures with an increased level of heterogeneity at both the node and the system level. The focus of EVEREST is on nodes equipped with CPUs and one or more field-programmable gate arrays (FPGAs). The overall organization of the nodes allows a broad range of different solutions, including but not limited to CPU-only, bus-attached FPGAs, and network-attached FPGAs. Due to the versatility of FPGAs, the EVEREST approach serves as a blueprint for future extensions to a broader set of accelerators, including GPUs and custom ASICs.
The EVEREST approach will be driven by and validated on three industry-relevant use cases: 1) prediction of renewable-energy production for the energy-trading market, based on weather modeling supported by the assimilation of measurement data from IoT devices such as weather stations; 2) air-quality monitoring of industrial sites; and 3) traffic modeling for intelligent transportation in smart cities.

EVEREST extends a traditional hardware/software co-design approach with novel methods for efficient data management, processing, and protection. The characteristics of the data sets drive the customization of the entire system. This increased heterogeneity is coupled with efficient hardware memory architectures based on application characteristics and customized on the given FPGAs. All contributions are designed to be interoperable with state-of-the-art or commercial alternatives to broaden their use.

The design environment developed in EVEREST has the objective to
- improve the productivity of Big Data application development by reducing development cost and programming effort,
- improve the security of Big Data systems by reducing the time needed to identify anomalies in the data streams,
- reduce the energy cost and improve execution performance thanks to heterogeneous computing on, e.g. FPGA devices.

The goals of the EVEREST project for the three use-cases are to improve
- the performance of simulations for renewable energies prediction,
- the response time of the air-quality predictions,
- the overall performance of the traffic model framework.
EVEREST proposes an integrated design environment, called the EVEREST System Development Kit (SDK), that aims at
- supporting data-driven policies to ensure efficient data allocation, computation, communication, and protection. All these techniques will be abstracted from the designers and, whenever possible, automatically inferred from the data sets or the application descriptions,
- simplifying the programming of heterogeneous and distributed computing systems for Big Data applications through a combination of domain-specific abstractions, compiler optimizations, high-level synthesis, hardware memory generation, and runtime extensions. These technologies are interoperable with existing solutions thanks to the creation of a modular framework that will be released as open source.

The current version of the EVEREST SDK includes an automated DSL-to-FPGA compilation flow that allows the creation of accelerators exploiting novel high-bandwidth memory (HBM). This compilation flow is based on the open-source multi-level intermediate representation (MLIR) infrastructure, which serves as a convergence point for multiple language front ends, e.g. Rust, Python, and ML frameworks such as TensorFlow.
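The details of the MLIR-based flow are beyond the scope of this summary, but the core idea of a pass pipeline that progressively lowers a high-level kernel can be sketched in plain Python. Everything below (the dict-based IR, the pass names, the tile size) is invented for illustration and is not EVEREST's actual toolchain.

```python
# Toy "pass pipeline" sketch (hypothetical, for illustration only):
# each pass rewrites a dict-based IR, loosely mimicking how an MLIR
# pipeline progressively lowers a DSL kernel towards an FPGA backend.

def tile_pass(op, tile=64):
    """Split each loop dimension into an outer tile loop and an inner loop."""
    if op["kind"] != "loop_nest":
        return op
    tiled = []
    for dim, extent in op["dims"]:
        tiled.append((dim + ".outer", extent // tile))
        tiled.append((dim + ".inner", tile))
    return {"kind": "loop_nest", "dims": tiled, "body": op["body"]}

def buffer_pass(op):
    """Mark the intra-tile dimensions for placement in on-chip buffers."""
    op = dict(op)
    op["buffers"] = [dim for dim, _ in op["dims"] if dim.endswith(".inner")]
    return op

def run_pipeline(op, passes):
    """Apply the passes in order, each consuming the previous result."""
    for p in passes:
        op = p(op)
    return op

matmul = {"kind": "loop_nest",
          "dims": [("i", 1024), ("j", 1024), ("k", 1024)],
          "body": "C[i][j] += A[i][k] * B[k][j]"}

lowered = run_pipeline(matmul, [tile_pass, buffer_pass])
print([d for d, _ in lowered["dims"]])
```

The point of the sketch is the structure, not the arithmetic: because every pass consumes and produces the same IR shape, front ends and backends can be mixed and matched, which is what a shared IR such as MLIR buys the project.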

Other achievements include the
- development of the reference implementations of the three use cases on IT4I's reference infrastructure,
- definition and prototype of a coherent design framework, integrating multiple software components,
- definition of data-oriented policies for efficient allocation, computation, communication, and data protection,
- release of two open data sets through Zenodo,
- definition of the runtime environment (dynamic autotuning framework, lightweight runtime layer, and virtualization extensions for FPGA),
- definition of the EVEREST target system for bus-attached and network-attached FPGAs and the release of the cloudFPGA platform in combination with the respective programming environments,
- first version of an end-to-end machine-learning compilation flow deployed on the cloudFPGA platform,
- creation of the project website, social-media channels, and identity package,
- publication of 2 journal and 7 conference papers, including a position paper with contributions from all partners,
- organization of events and presentations to disseminate the results.

EVEREST proposes a data-centric programming framework that covers the optimization of memory accesses through the entire software stack. Domain-specific languages (DSLs) express information about the computation and memory access patterns in the target applications. This information is used to implement fully automated and transparent memory management at both compile time and runtime.
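To make the idea concrete, here is a minimal hypothetical sketch of a DSL-style annotation: a kernel declares its memory access pattern, and a mock memory manager uses that declaration to choose a data-movement strategy. None of these names or strategies are EVEREST's actual interfaces.

```python
# Hypothetical sketch: a decorator records each kernel's declared memory
# access pattern, which a mock memory manager then uses to choose a
# data-transfer strategy. Not EVEREST's actual API.

KERNEL_REGISTRY = {}

def access_pattern(pattern):
    """DSL-style annotation declaring how a kernel touches memory."""
    def wrap(fn):
        KERNEL_REGISTRY[fn.__name__] = pattern
        return fn
    return wrap

@access_pattern("sequential-stream")
def moving_average(xs, w=3):
    # Reads memory in order: a streaming access pattern.
    return [sum(xs[i:i + w]) / w for i in range(len(xs) - w + 1)]

@access_pattern("random")
def gather(xs, idx):
    # Reads memory at arbitrary indices: a random access pattern.
    return [xs[i] for i in idx]

def plan_transfer(kernel_name):
    """Pick a transfer strategy from the declared pattern (illustrative)."""
    pattern = KERNEL_REGISTRY[kernel_name]
    return "burst-dma" if pattern == "sequential-stream" else "scatter-gather"

print(plan_transfer("moving_average"))  # burst-dma
print(plan_transfer("gather"))          # scatter-gather
```

The design point being illustrated: the programmer states *what* the kernel does with memory once, at the DSL level, and the toolchain, not the programmer, decides *how* the data is laid out and moved on each target.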

EVEREST adopts the open-source multi-level intermediate representation (MLIR) as a unified IR, which acts as an integration layer between different DSLs and heterogeneous target system architectures. This leads to better reusability, interoperability, and extensibility beyond EVEREST.

Hardware acceleration is achieved with productivity-enhancing high-level synthesis (HLS)-based solutions that optimize both the memory accesses and the computation. This combined solution will enable the exploitation of more spatial parallelism while limiting resource and energy consumption.

In EVEREST, the workflow pipelines of the use cases are described with a novel API, and the resulting runtime infrastructure supports resource management within virtualized multi-node environments. EVEREST features runtime adaptivity to match the application’s requirements and the characteristics of the memories. The runtime dynamically selects the executable from a set of pre-compiled code variants and hardware configurations based on configurable optimization goals (e.g. performance vs. energy). The runtime infrastructure is enhanced with virtualization support so that applications can interact with the hardware accelerators concurrently and in isolated software partitions.
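The variant-selection step can be sketched in a few lines of Python. The variant names, profiled numbers, and function names below are invented for illustration; the real runtime works on actual profiling data and pre-compiled binaries.

```python
# Hypothetical sketch of runtime variant selection: pick the pre-compiled
# code variant that best matches a configurable optimization goal.
# All names and numbers are illustrative, not EVEREST's actual API.

from dataclasses import dataclass

@dataclass
class Variant:
    name: str            # e.g. "cpu", "fpga_bus", "fpga_network"
    exec_time_ms: float  # profiled execution time per invocation
    energy_mj: float     # profiled energy per invocation

def select_variant(variants, goal="performance"):
    """Return the variant minimizing the metric named by the goal."""
    if goal == "performance":
        return min(variants, key=lambda v: v.exec_time_ms)
    if goal == "energy":
        return min(variants, key=lambda v: v.energy_mj)
    raise ValueError(f"unknown optimization goal: {goal}")

variants = [
    Variant("cpu", exec_time_ms=120.0, energy_mj=900.0),
    Variant("fpga_bus", exec_time_ms=35.0, energy_mj=300.0),
    Variant("fpga_network", exec_time_ms=50.0, energy_mj=220.0),
]

print(select_variant(variants, "performance").name)  # fpga_bus
print(select_variant(variants, "energy").name)       # fpga_network
```

Note that the two goals pick different variants from the same pool, which is exactly why the goal is kept configurable rather than fixed at compile time.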

EVEREST is expected to have the following impacts:
- Increased productivity and quality of system design and software development thanks to better methods, architectures and tools for complex federated/distributed systems handling extremely large volumes and streams of data.
- Significant increase of data throughput.
- Demonstrated adoption of results of the extreme-scale analysis and prediction in decision-making, including AI (in industry and/or society).
- Adoption by end-user communities, innovation agendas, and standardization committees.