CORDIS - EU research results

dEsign enVironmEnt foR Extreme-Scale big data analytics on heterogeneous platforms

Periodic Reporting for period 2 - EVEREST (dEsign enVironmEnt foR Extreme-Scale big data analytics on heterogeneous platforms)

Reporting period: 2022-04-01 to 2024-03-31

Big Data analytics and the training/inference of large artificial intelligence (AI) models are the cornerstones on which companies and the open-source community build today's IT applications. Extracting knowledge and information from massive data sets creates value for our digital society, but it also poses grand societal challenges regarding the computational capabilities, energy efficiency, and security of today's computing systems. The energy-efficiency challenge is increasingly being addressed by developing and using specialized hardware on field-programmable gate array (FPGA) devices. However, specialized hardware leads to complex heterogeneous systems, which are difficult to program and operate.

EVEREST (dEsign enVironmEnt foR Extreme-Scale big data analyTics on heterogeneous platforms) provides a System Development Kit (SDK) to deploy applications and accelerate performance-relevant kernels on complex heterogeneous systems. To develop and test the SDK, EVEREST built and operated an experimental heterogeneous hardware platform that combines CPUs and FPGA accelerators (bus-attached and network-attached). Due to the versatility of FPGAs, the EVEREST approach serves as a blueprint for future extensions to a broader set of accelerators, including GPUs and ASICs.

The development of the EVEREST SDK has been driven by and validated on three industry-relevant use cases:
1) renewable energy production forecasting,
2) air-quality monitoring of industrial sites, and
3) intelligent transportation in smart cities.
EVEREST extends a traditional hardware/software co-design approach with novel methods for efficient data management, processing, and protection. It offers a unified hardware generation flow that can support multiple application descriptions. All SDK tools are modular, making them interoperable with alternative solutions to broaden their use.

In conclusion, the EVEREST SDK enables application developers to
- improve productivity when creating Big Data applications by reducing development costs and programming effort,
- improve the security of Big Data systems by reducing the time needed to identify anomalies in the data streams,
- reduce energy costs and improve execution performance by exploiting FPGA devices.

The application of the SDK allowed the use case providers to deploy and accelerate their applications on the EVEREST target platform and improve
- the quality of the renewable energy prediction,
- the response time of the air-quality predictions,
- the overall performance of the traffic model framework.

EVEREST proposed an integrated design environment, called the EVEREST System Development Kit (SDK), which aims at
- supporting data-driven policies to ensure efficient data allocation, computation, communication, and protection. All these techniques are abstracted away from the designers and are either described in domain-specific languages or derived from the application descriptions. Whenever possible, data properties (e.g. value ranges) are automatically inferred from the data sets.
- simplifying the programming of FPGA-based computing systems for Big Data applications through domain-specific abstractions (based on the novel MLIR compiler infrastructure), compiler optimizations, high-level synthesis, hardware memory generation, automated system-level integration, and virtualized runtime extensions. These technologies are interoperable with existing solutions thanks to a modular framework.
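
As a rough illustration of what inferring data properties such as value ranges can mean, the sketch below scans a sample data set for its minimum and maximum and derives the narrowest signed (two's-complement) integer width that covers them. A hardware generation flow could use this kind of information to size datapaths and memories. The function names `infer_range` and `min_signed_bits` are hypothetical; this is not the EVEREST SDK's API, and the SDK's actual inference mechanism is not described in this report.

```python
from typing import Iterable, Tuple


def infer_range(values: Iterable[int]) -> Tuple[int, int]:
    """Return (min, max) of a non-empty data set in a single pass (hypothetical helper)."""
    it = iter(values)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError("cannot infer a range from an empty data set")
    for v in it:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
    return lo, hi


def min_signed_bits(lo: int, hi: int) -> int:
    """Smallest two's-complement width whose range [-2^(n-1), 2^(n-1)-1] covers [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


if __name__ == "__main__":
    samples = [-3, 17, 120, -128, 45]  # illustrative data, not from the EVEREST use cases
    lo, hi = infer_range(samples)
    print(lo, hi, min_signed_bits(lo, hi))  # -128 120 8
```

Here a datapath that would default to 32-bit integers could be narrowed to 8 bits. A real flow would also need to guard against run-time values that fall outside the range observed in the sample data.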

The EVEREST SDK was released as open source and includes tools for compilation, virtualization, run-time management, and data security of big data and AI applications. The compilation flow leverages domain-specific languages and the open-source MLIR infrastructure to support multiple popular languages and frameworks (including Python, Rust, TensorFlow, and PyTorch) or to modernize traditional HPC codes. The runtime environment supports dynamic adaptivity, for example by selecting among pre-compiled code variants at run time.

Other achievements include the
- use of the SDK to develop end-to-end implementations of the three use cases with accelerated kernels,
- definition, building, and operation of the EVEREST FPGA-based target system for bus-attached and network-attached FPGAs,
- definition of data-oriented policies for efficient allocation, computation, communication, and data protection and their implementation in the EVEREST tools,
- implementation of an MLIR-based compilation flow to converge multiple input languages and unify the hardware generation for multiple target devices,
- definition of a virtualized runtime environment for dynamic autotuning,
- release of eight open data sets through Zenodo,
- dissemination of the project results through the organization of workshops, tutorials, webinars, and conferences; scientific publications (six journal papers, 25 conference papers, and two book chapters); the creation of the project website; social media presence; and an identity package (e.g. logo, templates, stickers).

EVEREST developed the EVEREST SDK, a data-centric programming framework for end-to-end data and accelerator management. The SDK builds on existing tools wherever possible, but it had to go far beyond the state of the art to achieve its goals:
- New and existing domain-specific languages (DSLs) are used to express information about computation and data access patterns, which is used to implement fully automated and transparent compilation flows and memory management at compile time and runtime.
- EVEREST leverages and expands MLIR to integrate different DSLs, AI-model representations, and heterogeneous target system architectures. This leads to better reusability, interoperability, and extensibility beyond EVEREST.
- Hardware acceleration is achieved by combining productivity-enhancing high-level synthesis (HLS) solutions for optimizing computation, communication, and storage at both the kernel and system levels. This combined solution enables the creation of energy-efficient, high-performance systems.
- The workflow pipelines of the use cases are described with a novel API, and the resulting runtime infrastructure supports resource management within a virtualized multi-node environment. EVEREST features runtime adaptivity to dynamically select from pre-compiled code variants and hardware configurations based on configurable optimization goals (e.g. performance vs. energy). The runtime infrastructure is enhanced with virtualization support so that applications can interact seamlessly and concurrently with the hardware accelerators, each within an isolated software partition.
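
The runtime adaptivity described above, selecting among pre-compiled code variants according to a configurable goal such as performance versus energy, can be sketched as a weighted cost minimization. The `Variant` record, the `select_variant` function, the normalization scheme, and all numbers in the example are illustrative assumptions; they are not the EVEREST runtime's actual policy, interface, or measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    """A pre-compiled implementation of one kernel (hypothetical record)."""
    name: str
    latency_ms: float  # profiled execution time per invocation
    energy_j: float    # profiled energy per invocation


def select_variant(variants: List[Variant], weight_perf: float) -> Variant:
    """Pick the variant with the lowest weighted, normalized cost.

    weight_perf = 1.0 optimizes purely for latency, 0.0 purely for energy.
    """
    if not variants:
        raise ValueError("no variants available")
    if not 0.0 <= weight_perf <= 1.0:
        raise ValueError("weight_perf must be in [0, 1]")
    best_latency = min(v.latency_ms for v in variants)
    best_energy = min(v.energy_j for v in variants)
    if best_latency <= 0 or best_energy <= 0:
        raise ValueError("latency and energy must be positive")

    def cost(v: Variant) -> float:
        # Divide each metric by the best available value so both terms are
        # dimensionless and comparable despite their different units.
        return (weight_perf * v.latency_ms / best_latency
                + (1.0 - weight_perf) * v.energy_j / best_energy)

    return min(variants, key=cost)


if __name__ == "__main__":
    variants = [
        Variant("cpu", 40.0, 8.0),
        Variant("fpga_fast", 5.0, 6.0),
        Variant("fpga_low_power", 12.0, 2.0),
    ]
    print(select_variant(variants, 1.0).name)  # fpga_fast
    print(select_variant(variants, 0.0).name)  # fpga_low_power
```

Normalizing by the best value keeps the single weight meaningful across metrics with different units. A production runtime would likely also weigh reconfiguration overhead and current device availability before switching variants.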

The impact that EVEREST generated with the SDK includes
- increased productivity and quality of system design and software development thanks to better methods, architectures, and tools for complex systems that handle extremely large volumes and streams of data across several FPGA-enhanced nodes,
- a significant increase in data throughput thanks to multiple devices and data parallelism,
- demonstrated adoption of extreme-scale analysis and prediction results, including AI, in decision-making in industry and/or society,
- adoption by end-user communities, innovation agendas, and standardization committees.

The broader socio-economic impact of EVEREST will be driven by the people who have researched and learned solutions to the energy-efficiency, productivity, and security issues that our society faces when employing heterogeneous systems. These include six postdocs, 12 PhD students, and numerous master's students.