Periodic Reporting for period 1 - SustainML (Application Aware, Life-Cycle Oriented Model-Hardware Co-Design Framework for Sustainable, Energy Efficient ML Systems)
Reporting period: 2022-10-01 to 2024-03-31
SustainML envisions a sustainable, interactive ML framework that prioritizes energy efficiency across the entire application lifecycle. Developers can describe their tasks, and the framework will analyze and encode the problem into an abstract functional semantic catalog. It will suggest several ML models, leveraging knowledge transfer and recycling from its collection of neural network functional knowledge cores. Developers can reconfigure models, use optional pre-trained parameters, or design their own models with popular neural network languages.
To reach this aim and address the challenges, the following specific objectives (SO) are investigated:
- SO1: Modeling the requirements of specific ML applications
- SO2: Resource-aware optimization methods based on models from SO1
- SO3: Footprint and AI-waste transparent interactive design assistant
- SO4: Collection of efficient methods and cores as catalogues and libraries
- SO5: Dedicated toolchain implementation and validation
Efforts have also been made to identify the cross-layer optimizations suitable for improving the energy efficiency of deep neural network (DNN) accelerators. The large design space is explored by Neural Architecture Search (NAS), guided by optimization objectives targeting both application and hardware requirements.
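The idea of a hardware-aware NAS objective can be illustrated with a minimal sketch. Everything below is an invented toy (the search space, the proxy functions, the weighted-sum scoring), not the project's actual search procedure; it only shows how application accuracy and hardware cost can be folded into one search objective.

```python
import random

# Hypothetical search space: each candidate is (num_layers, channels).
SEARCH_SPACE = [(l, c) for l in (2, 4, 8) for c in (16, 32, 64)]

def accuracy_proxy(layers, channels):
    """Illustrative stand-in for a trained-and-evaluated accuracy score."""
    return 1.0 - 1.0 / (layers * channels)  # bigger models score higher

def energy_proxy(layers, channels):
    """Illustrative stand-in for a measured or estimated energy cost."""
    return layers * channels * 1e-3  # bigger models cost more

def nas_score(layers, channels, alpha=0.5):
    """Weighted-sum multi-objective score: reward accuracy, penalize energy."""
    return (alpha * accuracy_proxy(layers, channels)
            - (1 - alpha) * energy_proxy(layers, channels))

def random_search(n_trials=50, seed=0):
    """Simplest possible NAS: random sampling plus argmax over the score."""
    rng = random.Random(seed)
    candidates = [rng.choice(SEARCH_SPACE) for _ in range(n_trials)]
    return max(candidates, key=lambda c: nas_score(*c))

best = random_search()
print("best architecture (layers, channels):", best)
```

Tuning `alpha` shifts the search between accuracy-first and energy-first designs, which is the knob an application-and-hardware-aware NAS exposes.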
Among the main achievements is a flexible HLS hardware library of custom hardware architectures that supports various DNN topologies. As a first step, we presented custom hardware architectures for standard one-dimensional convolutional neural networks (1D-CNNs), depth-wise separable 1D-CNNs, and various other DNN layers and components suited to unidimensional signal processing.
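The efficiency argument for depth-wise separable 1D convolutions can be seen directly in the multiply-accumulate (MAC) counts. The formulas below are the standard textbook counts; the layer dimensions are made-up example values, not figures from the library.

```python
def macs_standard_conv1d(l_out, k, c_in, c_out):
    """MACs for a standard 1D convolution: every output position combines
    all input channels over a kernel of length k, for every output channel."""
    return l_out * k * c_in * c_out

def macs_depthwise_separable_conv1d(l_out, k, c_in, c_out):
    """MACs for a depthwise conv (one k-tap filter per channel) followed
    by a pointwise (1x1) conv mixing channels."""
    depthwise = l_out * k * c_in
    pointwise = l_out * c_in * c_out
    return depthwise + pointwise

# Made-up example layer: 1000-sample output, kernel 9, 64 -> 128 channels.
std = macs_standard_conv1d(1000, 9, 64, 128)
sep = macs_depthwise_separable_conv1d(1000, 9, 64, 128)
print(f"standard: {std:,} MACs, separable: {sep:,} MACs, ratio: {std / sep:.1f}x")
```

For these sizes the separable variant needs roughly an eighth of the MACs, which is why such layers are attractive building blocks for energy-constrained hardware.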
Regarding processing-in-memory (PIM) solutions, we also conducted research to identify the specific ML tasks and DNN layers that can be offloaded to UPMEM PIM to improve energy efficiency, and we identified the implementation challenges of UPMEM PIM for these workloads. We have now established an automated flow for integrating new and/or configurable instructions into our current DPU processors, so that the DPU design can easily be extended and adapted for DNN acceleration. We have also implemented an FPGA-based solution that increases the computing capacity of UPMEM’s PIM DRAMs by moving some operations (typically MAC operations) from the DPU to the SAs. This implementation is now fully operational.
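One common heuristic for deciding which DNN layers benefit from PIM offload is arithmetic intensity (operations per byte of memory traffic): memory-bound layers gain the most, since PIM eliminates the DRAM-to-host transfer. The sketch below illustrates that generic heuristic only; the threshold and the layer figures are assumptions, not UPMEM's actual selection flow.

```python
def arithmetic_intensity(ops, bytes_moved):
    """Operations per byte of DRAM traffic."""
    return ops / bytes_moved

def good_pim_candidate(ops, bytes_moved, threshold=10.0):
    """Memory-bound layers (few ops per byte) tend to profit most from
    processing-in-memory; compute-bound layers are better kept on the host."""
    return arithmetic_intensity(ops, bytes_moved) < threshold

# Made-up layer profiles: (name, MAC ops, bytes of weights + activations moved)
layers = [
    ("fully_connected", 8_192_000, 16_400_000),   # weights streamed once: memory-bound
    ("conv_reuse_heavy", 73_728_000, 1_200_000),  # high weight reuse: compute-bound
]
for name, ops, nbytes in layers:
    verdict = "offload to PIM" if good_pim_candidate(ops, nbytes) else "keep on host"
    print(f"{name}: {arithmetic_intensity(ops, nbytes):.1f} ops/byte -> {verdict}")
```

Under this heuristic, fully connected layers, whose weights are read once per inference, are typical PIM candidates, while reuse-heavy convolutions are not.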
We also addressed the resource costs associated with the training phase of machine learning (ML) model development. We introduced three novel methodologies that significantly enhance the training efficiency of ML models by optimizing the number of training examples required, minimizing the need for labeled data, and reducing memory consumption. These optimizations in turn reduce the computational demand, energy usage, and carbon footprint of training ML models.
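As one hedged illustration of reducing labeled-data requirements, the sketch below shows uncertainty sampling, a standard active-learning strategy: only the examples the current model is least sure about are sent for labeling. This is a generic textbook technique used here for illustration, not the project's specific methodology; the toy model is invented.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = less sure)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(unlabeled, predict_proba, budget):
    """Pick the `budget` unlabeled examples with the most uncertain predictions."""
    scored = sorted(unlabeled, key=lambda x: entropy(predict_proba(x)), reverse=True)
    return scored[:budget]

# Toy model: confidence grows with distance from a decision point at x = 5.
def toy_predict_proba(x):
    p = min(max(0.5 + 0.1 * (x - 5), 0.01), 0.99)
    return [p, 1 - p]

pool = list(range(11))           # unlabeled pool: 0..10
picked = select_for_labeling(pool, toy_predict_proba, budget=3)
print("label these first:", picked)  # the points nearest the decision boundary
```

The labeling budget is spent where it is most informative, so a target accuracy is reached with far fewer labeled examples than labeling the pool uniformly.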
In the field of human-computer interaction (HCI), we have conducted qualitative studies to better understand how aware ML and HCI experts are of their impact on sustainability, as well as their existing workflows. We also presented a framework that structures the different intersections of sustainability with ML and HCI and describes the resulting research areas based on recent work.
Finally, we have presented a design of the SustainML framework that integrates the results of the project. We have also developed the SustainML library that the consortium partners will use to integrate the different modules of the framework. In addition, we have created a first proof of concept of the framework's front end, so that both the project partners and early adopters of the SustainML framework can test and validate it.
Some model optimization methodologies detailed in SustainML project deliverables and scientific publications contribute substantially to the field of Sustainable ML by addressing critical resource constraints in the training cycle. These approaches not only enhance training efficiency but also reduce the environmental impact of ML model development. Each method has been rigorously evaluated and disseminated through reputable conferences and publications, underscoring their significance and potential for broader application in the ML community.
We are also developing an ultra-power-efficient AI LLM chip, building on UPMEM’s proven on-device DRAM scheduling, that will also offer significant performance gains of up to 32 TOPS per chip. To date, we have developed a simulator to evaluate its performance. This simulator allows us to compare different hardware in terms of execution time, energy consumption, and power consumption across workloads, which plays a crucial role in the SustainML framework.
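The analytical core of such a comparison can be reduced to a first-order model: execution time from effective throughput, energy as average power times time. The sketch below is a hypothetical simplification with entirely made-up device figures, not UPMEM's actual simulator.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    tops: float      # peak throughput, tera-ops per second
    watts: float     # average power draw under load

def estimate(device, workload_tera_ops, utilization=0.5):
    """First-order estimate: time from effective (derated) throughput,
    energy as power * time."""
    seconds = workload_tera_ops / (device.tops * utilization)
    joules = device.watts * seconds
    return seconds, joules

# Made-up devices for illustration only.
gpu = Device("generic_gpu", tops=120.0, watts=300.0)
pim = Device("hypothetical_pim_chip", tops=32.0, watts=40.0)

for dev in (gpu, pim):
    s, j = estimate(dev, workload_tera_ops=10.0)
    print(f"{dev.name}: {s * 1000:.1f} ms, {j:.2f} J")
```

Even this toy model surfaces the trade-off such a simulator is built to expose: a slower, low-power chip can finish a workload using less total energy than a faster, power-hungry one.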
A first test version of the SustainML framework is currently available; it features the segmentation of an ML problem into tasks and the subsequent processing of these tasks to provide an optimal ML model. These preliminary results do not yet include the extensive database of ML models, source code, and hardware that is intended to be built up over the course of the project, but they offer a taste of the framework's ability to provide energy-efficient ML methodologies to every AI researcher and developer across the entire life-cycle of an AI application.
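The segmentation-then-suggestion flow can be pictured as a small pipeline: a task description is split into sub-tasks, each sub-task is matched against a model catalogue, and the cheapest model that meets an accuracy floor is returned. Everything below (the catalogue entries, the naive matching rule) is an invented illustration of the idea, not the SustainML API.

```python
# Hypothetical catalogue: task -> list of (model_name, accuracy, energy_joules)
CATALOGUE = {
    "classification": [("small_cnn", 0.91, 1.2), ("big_transformer", 0.95, 9.8)],
    "segmentation":   [("unet_lite", 0.88, 2.5), ("unet_full", 0.93, 7.1)],
}

def split_into_tasks(description):
    """Naive task segmentation: keep the words we have catalogue entries for."""
    return [w for w in description.lower().split() if w in CATALOGUE]

def suggest(task, min_accuracy=0.85):
    """Pick the lowest-energy catalogue model that meets the accuracy floor."""
    ok = [m for m in CATALOGUE[task] if m[1] >= min_accuracy]
    return min(ok, key=lambda m: m[2])

plan = {t: suggest(t) for t in split_into_tasks("image classification then segmentation")}
print(plan)
```

Raising `min_accuracy` pushes the suggestion toward larger, costlier models, which is the accuracy-versus-energy dial an interactive design assistant would surface to the developer.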