
Application Aware, Life-Cycle Oriented Model-Hardware Co-Design Framework for Sustainable, Energy Efficient ML Systems

Periodic Reporting for period 1 - SustainML (Application Aware, Life-Cycle Oriented Model-Hardware Co-Design Framework for Sustainable, Energy Efficient ML Systems)

Reporting period: 2022-10-01 to 2024-03-31

The SustainML project is dedicated to reducing the environmental impact of machine learning (ML) models by minimizing their carbon and resource footprints. Its goal is to offer multiple pathways for avoiding AI waste from the earliest stages of the AI life-cycle, making power-aware applications as easy to develop as standard ML systems.

SustainML envisions a sustainable, interactive ML framework that prioritizes energy efficiency across the entire application lifecycle. Developers can describe their tasks, and the framework will analyze and encode the problem into an abstract functional semantic catalog. It will suggest several ML models, leveraging knowledge transfer and recycling from its collection of neural network functional knowledge cores. Developers can reconfigure models, use optional pre-trained parameters, or design their own models with popular neural network languages.

To reach this aim and address these challenges, the following specific objectives (SOs) are investigated:

- SO1: Modeling the requirements of specific ML applications
- SO2: Resource-aware optimization methods based on models from SO1
- SO3: Footprint and AI-waste transparent interactive design assistant
- SO4: Collection of efficient methods and cores as catalogues and libraries
- SO5: Dedicated toolchain implementation and validation

Significant progress was made in parameterizing ML tasks. This includes developing a taxonomy from AI publications and expert brainstorming sessions, creating an abstract structure for task modeling, and incorporating knowledge-embedding techniques to parameterize tasks into a semantic space. This work enables the description of ML tasks as graphs of connected nodes, enhancing flexibility and efficiency in task modeling. Additionally, a comprehensive knowledge graph database using RDF and Turtle syntax has been created to store these relationships, facilitating efficient problem-solving and knowledge recycling. To help build the knowledge graph, we are also using an LLM in combination with public model-hosting sites (e.g. Hugging Face); the results are checked and edited by human experts. Finally, to provide viable ML solutions, we are using knowledge-graph-based retrieval-augmented generation (Graph-RAG) to avoid hallucinations and provide multiple solution candidates.
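
As an illustration of the graph-of-connected-nodes idea, the sketch below stores a toy ML task and a suggested model as RDF triples and serializes them to Turtle using Python's rdflib. The sml: vocabulary, node names, and the energy figure are hypothetical placeholders, not the project's actual schema.

```python
# Minimal sketch: an ML task modeled as RDF nodes and edges, serialized to
# Turtle. Vocabulary and values are illustrative only.
from rdflib import Graph, Namespace, Literal, RDF, RDFS

SML = Namespace("http://example.org/sustainml#")  # placeholder namespace
g = Graph()
g.bind("sml", SML)

# A task node linked by typed edges to its input modality, a suggested
# model, and that model's reported energy footprint.
task = SML.KeywordSpotting
g.add((task, RDF.type, SML.MLTask))
g.add((task, RDFS.label, Literal("Keyword spotting on audio")))
g.add((task, SML.hasInputModality, SML.Audio1D))
g.add((task, SML.suggestedModel, SML.DepthwiseSeparable1DCNN))
g.add((SML.DepthwiseSeparable1DCNN,
       SML.energyPerInferenceMilliJoule, Literal(0.8)))

print(g.serialize(format="turtle"))
```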

Efforts have also been made to identify the cross-layer optimizations suitable for improving the energy efficiency of deep neural network (DNN) accelerators. The large design space is explored via Neural Architecture Search (NAS), guided by optimization objectives targeting both application and hardware requirements.
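As a rough illustration of hardware-aware NAS, the sketch below performs a random search over a toy architecture space and keeps the Pareto front between a proxy accuracy and a proxy energy cost. The search space, both proxy functions, and the budget are invented for illustration; the project's actual NAS objectives and cost models are defined in its deliverables.

```python
# Minimal sketch of multi-objective, hardware-aware NAS via random search.
import random

SEARCH_SPACE = {
    "depth": [2, 4, 6, 8],        # number of conv layers
    "width": [16, 32, 64, 128],   # channels per layer
    "kernel": [3, 5, 7],          # 1D kernel size
}

def sample_architecture():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def proxy_accuracy(arch):
    # Hypothetical proxy: deeper/wider helps, with diminishing returns.
    return 1.0 - 1.0 / (arch["depth"] * arch["width"] ** 0.5)

def proxy_energy(arch):
    # Hypothetical hardware cost model: energy grows with MAC count.
    return arch["depth"] * arch["width"] ** 2 * arch["kernel"]

def dominates(a, b):
    # a dominates b if no worse on both objectives and better on one.
    return (a["acc"] >= b["acc"] and a["energy"] <= b["energy"]
            and (a["acc"] > b["acc"] or a["energy"] < b["energy"]))

pareto = []
for _ in range(200):
    arch = sample_architecture()
    cand = {"arch": arch,
            "acc": proxy_accuracy(arch),
            "energy": proxy_energy(arch)}
    if any(dominates(p, cand) for p in pareto):
        continue  # dominated by an existing point
    pareto = [p for p in pareto if not dominates(cand, p)] + [cand]

for p in sorted(pareto, key=lambda p: p["energy"]):
    print(f"acc={p['acc']:.3f}  energy={p['energy']:>8}  {p['arch']}")
```

A real search would replace the random sampler with evolutionary or gradient-based strategies and the proxies with measured or predicted accuracy and energy.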

Among the main achievements, we can present a flexible HLS hardware library of custom hardware architectures that can accommodate various DNN topologies. As a first step, we presented custom hardware architectures for standard one-dimensional convolutional neural networks (1D-CNNs), depth-wise separable 1D-CNNs, and various other DNN layers and components suitable for unidimensional signal processing.
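To make the depth-wise separable structure concrete, here is a minimal PyTorch sketch of such a 1D block: it splits a standard convolution into a per-channel (depth-wise) filter followed by a 1x1 (point-wise) channel mixer, which cuts the MAC count substantially and maps well onto resource-constrained accelerators. This is an illustrative software model, not the HLS library itself.

```python
# Illustrative depth-wise separable 1D convolution block (software model).
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=5):
        super().__init__()
        # Depth-wise: one filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # Point-wise: 1x1 convolution mixes channels.
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

x = torch.randn(1, 16, 128)    # (batch, channels, time steps)
block = DepthwiseSeparableConv1d(16, 32)
print(block(x).shape)          # torch.Size([1, 32, 128])
```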

Regarding processing-in-memory (PIM) solutions, we also conducted research to identify the specific ML tasks and DNN layers that can be offloaded to UPMEM PIM to improve energy efficiency, and identified the implementation challenges of UPMEM PIM for these workloads. We have now established an automated flow to integrate new and/or configurable instructions into our current DPU processors, making it easy to extend and adapt the DPU design for DNN acceleration. We have also implemented a solution on FPGA that increases the computing capacity of UPMEM's PIM DRAMs by moving some operations (typically MAC operations) from the DPU to the sense amplifiers (SAs). This implementation is now fully operational.
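One common heuristic for choosing PIM offload candidates is arithmetic intensity: layers that perform few MACs per byte of data moved are memory-bound and tend to benefit most from computing inside DRAM. The sketch below applies this rule; the layer statistics and threshold are illustrative placeholders, not UPMEM measurements, and the project's actual selection criteria are described in its deliverables.

```python
# Hedged sketch: flag memory-bound DNN layers as PIM offload candidates
# based on arithmetic intensity (MACs per byte of data traffic).

def arithmetic_intensity(macs, bytes_moved):
    """MAC operations performed per byte of weights/activations moved."""
    return macs / bytes_moved

# (layer name, MACs, bytes moved) -- hypothetical numbers for illustration
layers = [
    ("conv1", 90e6, 1.2e6),   # convolution: heavy reuse, compute-bound
    ("fc1",   16e6, 16e6),    # fully connected: ~1 MAC per weight byte
    ("fc2",    4e6,  4e6),
]

PIM_THRESHOLD = 10.0  # intensities below this are assumed memory-bound

for name, macs, traffic in layers:
    ai = arithmetic_intensity(macs, traffic)
    target = "PIM (DPU)" if ai < PIM_THRESHOLD else "host/accelerator"
    print(f"{name}: intensity={ai:.1f} MAC/byte -> {target}")
```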

We also addressed the resource costs associated with the training phase of ML model development. We introduced three novel methodologies that significantly enhance the training efficiency of ML models by optimizing the number of training examples required, minimizing the need for labeled data, and reducing memory consumption. These optimizations in turn reduce the computational demand, energy usage, and carbon footprint of training ML models.
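The publications describe the three methodologies in detail; as a generic illustration of the labeled-data angle only, the sketch below uses pool-based active learning with uncertainty sampling, querying labels for the points the current model is least confident about. The dataset, model choice, and query budget are all invented for the example.

```python
# Generic active-learning sketch: label only the most uncertain examples.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic ground truth ("oracle")

# Seed with 5 labeled examples of each class so the first model can be fit.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(500) if i not in labeled]

for _ in range(5):  # 5 labeling rounds of 10 queries each
    model = LogisticRegression().fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])[:, 1]
    uncertainty = -np.abs(proba - 0.5)        # closest to 0.5 = least sure
    queries = np.argsort(uncertainty)[-10:]   # most uncertain pool entries
    labeled += [pool[q] for q in queries]     # "ask the oracle" for labels
    pool = [i for i in pool if i not in labeled]

model = LogisticRegression().fit(X[labeled], y[labeled])
print(f"labels used: {len(labeled)}, accuracy: {model.score(X, y):.3f}")
```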

In the field of human-computer interaction (HCI), we conducted qualitative studies to better understand the awareness of ML and HCI experts of their impact on sustainability, as well as their existing workflows. We also presented a framework that structures the different intersections of sustainability with ML and HCI and describes the resulting research areas based on recent work.

Finally, we presented a design of the SustainML framework that integrates the results of the project. We also developed the SustainML library that the consortium partners will use to integrate the different modules of the framework. In addition, we created a first proof of concept of the framework's front-end, so that both the project partners and early adopters of the SustainML framework can test and validate it.
To date, there have been 17 scientific publications (and 4 under review) within the scope of the SustainML project, which demonstrates the project's strong progress and scientific impact. Moreover, the necessary code and documentation infrastructure is in place to begin integrating the project results and to deliver reliable, feasible results during the second reporting period.

Some model optimization methodologies detailed in SustainML project deliverables and scientific publications contribute substantially to the field of Sustainable ML by addressing critical resource constraints in the training cycle. These approaches not only enhance training efficiency but also reduce the environmental impact of ML model development. Each method has been rigorously evaluated and disseminated through reputable conferences and publications, underscoring their significance and potential for broader application in the ML community.

We are also developing an ultra-power-efficient AI LLM chip, building on UPMEM's proven on-device DRAM scheduling, that will also offer significant performance enhancements, with up to 32 TOPS for a single chip. To date, we have developed a simulator to evaluate its performance. This simulator allows us to compare different hardware in terms of execution time, energy consumption, and power consumption for different workloads, which plays a crucial role in the SustainML framework.
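To give a flavor of the kind of comparison such a simulator enables, the sketch below estimates execution time and energy for a workload on two hardware profiles using a crude throughput/power model. The GPU baseline numbers, utilization factor, and workload size are hypothetical placeholders; only the 32 TOPS figure comes from the text above.

```python
# Hedged sketch: compare hardware profiles on time, energy, and power.
from dataclasses import dataclass

@dataclass
class Hardware:
    name: str
    tops: float    # peak throughput, tera-operations per second
    watts: float   # average power draw under load

def estimate(hw: Hardware, workload_tops: float, utilization: float = 0.5):
    """Return (seconds, joules) for a workload under a simple linear model."""
    seconds = workload_tops / (hw.tops * utilization)
    joules = seconds * hw.watts
    return seconds, joules

profiles = [
    Hardware("PIM chip (target)", tops=32.0, watts=15.0),   # 32 TOPS from text
    Hardware("GPU (baseline)", tops=60.0, watts=300.0),     # invented baseline
]

WORKLOAD = 500.0  # tera-operations for one hypothetical LLM inference batch
for hw in profiles:
    t, e = estimate(hw, WORKLOAD)
    print(f"{hw.name}: {t:.1f} s, {e:.0f} J, {hw.watts:.0f} W")
```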

There is currently a first test version of the SustainML framework that features the segmentation of an ML problem into tasks and the subsequent processing of these tasks to provide an optimal ML model. These preliminary results do not yet include the extensive database of ML models, source code, and hardware that the project intends to build, but they offer a first taste of the framework's ability to provide energy-efficient ML methodologies to every AI researcher and developer throughout an AI application's entire life-cycle.
[Figure: SustainML Framework Architecture]