CORDIS - EU research results

Seamless design of smart edge processors

Periodic Reporting for period 1 - CONVOLVE (Seamless design of smart edge processors)

Reporting period: 2022-11-01 to 2024-04-30

As the world braces for smart applications powered by Artificial Intelligence (AI) in almost every edge device, there is an urgent need for ultra-low-power (ULP) AI Smart Edge Processors (SEPs). The SEP hardware market is expected to grow by about 40% per year, exceeding 70 billion USD by 2026. SEP hardware needs to support high-throughput, reliable, and secure AI processing at ultra-low power, with a very short time to market. With its strong legacy in edge solutions and open processing platforms, the EU is well positioned to lead in the SEP market for edge devices. However, this can only be realized if the EU makes AI edge processing at least 100 times more energy-efficient, while offering sufficient flexibility and scalability to deal with AI as a fast-moving target. The CONVOLVE project addresses these roadblocks and thereby enables EU leadership in edge AI by making SEPs more efficient, while ensuring security by design that protects the data and privacy of European society. It takes a holistic approach, with innovations at all levels of the design stack. The CONVOLVE project is led by Eindhoven University of Technology, and the consortium includes 18 partners from 8 countries across the European continent, with a good mixture of academic partners, large enterprises, and SMEs.
Below, we summarize our progress towards achieving the key objectives.

Objective 1: Achieve 100x improvement in energy efficiency.
To achieve a 100x improvement in energy efficiency, we have developed an Ultra-Low-Power (ULP) library of novel architectural and micro-architectural accelerator building blocks (ULP blocks, for short) with common, standard interfaces, optimized at the micro-architecture, circuit, and device levels. The ULP blocks use different architectural paradigms, such as Compute-in-Memory (CIM), Compute-Near-Memory (CNM), and Coarse-Grained Reconfigurable Arrays (CGRA), each tailored to a specific use case to achieve the highest energy efficiency. So far we have demonstrated the functional operation of the ULP accelerators at simulation level, and we are currently working towards a hardware prototype for energy-efficiency measurements.
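The CIM paradigm mentioned above performs the multiply-accumulate inside the memory array itself, instead of shuttling weights back and forth to a digital datapath. As a purely illustrative sketch (not CONVOLVE code), an idealized analog crossbar can be modelled in a few lines of NumPy: weights become cell conductances, inputs become word-line voltages, and each bit line sums its cell currents into one entry of the matrix-vector product.

```python
import numpy as np

# Hypothetical illustration (not CONVOLVE code): in an idealized CIM crossbar,
# weights are stored as cell conductances G and inputs are applied as word-line
# voltages V. By Kirchhoff's current law, each bit line sums its cell currents,
# so I = G @ V is computed in one analog step rather than many fetch-compute
# cycles -- the source of CIM's energy-efficiency advantage.

def cim_crossbar_mac(conductances, voltages):
    """Idealized crossbar: bit-line currents = conductance matrix @ input voltages."""
    return conductances @ voltages

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(4, 8))   # 4 bit lines x 8 word lines
V = rng.uniform(0.0, 0.5, size=8)        # input voltages (activations)

currents = cim_crossbar_mac(G, V)
# The analog readout matches the digital matrix-vector product.
assert np.allclose(currents, G @ V)
```

Real CIM hardware must additionally cope with device variation, limited conductance precision, and ADC overheads, which is where the circuit- and device-level optimization mentioned above comes in.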

Objective 2: Reduce design time by 10x.
To reduce design time by 10x, we developed (1) a transparent and compositional programming flow and (2) a compositional architecture Design-Space Exploration (DSE) and SoC generation tool flow. We have built the front-end compilation chain, targeting both MLIR and LLVM IR. On the code-generation front, our domain-specific compiler lowers operations to custom RISC-V ISA accelerator extensions. We have also contributed to the IREE open-source ML compiler framework, having developed an ONNX importer that converts ONNX modules to Torch MLIR for IREE compilation. We have introduced a scalable cache model for ML (affine-heavy) programs, significantly reducing analysis time and enabling efficient updates after program modifications. For secure compilation, we have facilitated peephole rewrite validation at the intermediate-representation level by automating integration with interactive theorem provers. For automated DSE and SoC generation, we have developed a multi-accelerator architecture simulator, Stream, to analytically study optimal SoC architectures.
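The peephole rewrite validation mentioned above checks that a local instruction rewrite preserves program semantics. CONVOLVE automates this via interactive theorem provers; the toy sketch below (hypothetical, not project code) conveys the idea for a single rewrite on 8-bit integers, using exhaustive checking in place of a formal proof.

```python
# Hypothetical sketch (not the CONVOLVE tooling): a peephole rewrite replaces a
# small instruction pattern with a cheaper equivalent, e.g. "x * 2" -> "x << 1"
# on fixed-width integers. A validator must show the two agree on ALL inputs.
# Here we simply enumerate every 8-bit value; a theorem prover generalizes this
# to widths where enumeration is infeasible.

WIDTH = 8
MASK = (1 << WIDTH) - 1  # wrap-around arithmetic modulo 2**WIDTH

def original(x):
    return (x * 2) & MASK

def rewritten(x):
    return (x << 1) & MASK

def validate_rewrite(f, g, width=WIDTH):
    """Exhaustively check that two operations agree on every width-bit input."""
    return all(f(x) == g(x) for x in range(1 << width))

assert validate_rewrite(original, rewritten)                      # sound rewrite
assert not validate_rewrite(original, lambda x: (x + 2) & MASK)   # unsound one is caught
```

The same agree-on-all-inputs obligation is what gets discharged mechanically once the rewrite is stated at the intermediate-representation level.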

Objective 3: Provide hardware security.
We identified relevant attack scenarios, including side-channel attacks based on physical access to the device, mutually distrustful applications on a single device, and even powerful attackers with access to large-scale quantum computers. To cover this wide range of attack scenarios, we have been working on three interrelated fronts: Trusted Execution Environments (TEE), post-quantum cryptography (PQC), and security of compute-in-memory (CIM). We have a working prototype of a TEE based on Keystone and the Rocket core. We have conducted a careful exploration of PQC schemes to determine the best fit for the specific needs of CONVOLVE. On the CIM front, we are able to extract neural network weights from the crossbars via power side channels and are working on effective countermeasures.
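To illustrate why CIM power side channels matter, here is a deliberately simplified toy model (not the actual attack): if total power draw is approximately a weighted sum of the applied inputs, then an attacker who can choose inputs and measure power can recover the stored weights by ordinary least squares.

```python
import numpy as np

# Hypothetical toy model (not the CONVOLVE attack code): in an analog CIM
# crossbar, instantaneous power is roughly proportional to the weighted sum of
# input currents, so it leaks the stored weights. Given many known inputs and
# noisy power measurements, the weights fall out of a least-squares fit.

rng = np.random.default_rng(1)
n_weights, n_traces = 8, 500

secret_w = rng.uniform(-1, 1, size=n_weights)            # weights stored on-chip
inputs = rng.uniform(0, 1, size=(n_traces, n_weights))   # attacker-chosen inputs
power = inputs @ secret_w + rng.normal(0, 0.01, size=n_traces)  # noisy leakage

# The attacker sees only (inputs, power), never secret_w.
recovered_w, *_ = np.linalg.lstsq(inputs, power, rcond=None)
assert np.allclose(recovered_w, secret_w, atol=0.05)
```

Countermeasures aim to break the linear relationship between inputs and measurable power, e.g. by masking or randomizing the computation order.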

Objective 4: Enable smart edge applications.
We have laid the groundwork for the edge application use cases by defining requirements and benchmarks for smart edge processors. Initial point demos were prepared to guide target adjustments, supporting the technical work package topics with an application-focused approach. Key achievements include delivering application code bases and advancing state-of-the-art research on quantization and on the efficient deployment of neural networks for resource-efficient speech quality prediction, acoustic scene analysis, and image processing applications.
Some of the CONVOLVE key achievements so far include:
> Energy-efficient ULP accelerator blocks with state-of-the-art performance: our SNN accelerator achieves an energy efficiency of 0.29 pJ/synaptic operation, the SRAM-based CIM accelerator 111.3 TOPS/W, and the RRAM-based CIM a massive 2 POPS/W.
> Security mechanisms, such as TEE, PQC and security of compute-in-memory (CIM) were prototyped.
> Multi-core AI accelerator modelling tool for design space exploration, called Stream.
> Extended ZigZag tool to model compute-in-memory architectures.
> Standard interface for accelerator integration called the SNAX framework, to easily allow the integration of multiple accelerators in a common processing system.
> Optimized compute efficiency in traditional deep neural networks by limiting operand precision (quantization) and limiting connectivity between neurons (pruning), all with minimal loss of performance and classification accuracy.
> Control flow techniques (called Dynamic Neural Networks) that can detect when sufficient processing has been carried out on an input, avoiding unneeded computation.
> New, more efficient learning rules were proposed to replace the costly backpropagation algorithm.
> Early exiting speech enhancement network that provides several levels of accuracy and resource savings by halting computations at different stages.
> Tailored mixed-precision sub-8-bit quantization scheme using genetic algorithms: a modular integer quantization scheme for gated recurrent units has been proposed, where the bit width of each operator can be selected independently.
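As a purely illustrative sketch of the quantization and pruning techniques listed above (not CONVOLVE's implementation), magnitude pruning and symmetric uniform integer quantization can be expressed compactly in NumPy:

```python
import numpy as np

# Hypothetical sketch (not project code): pruning limits connectivity by
# zeroing the smallest-magnitude weights; quantization limits operand
# precision by mapping the survivors onto a few signed integer levels.

def prune_by_magnitude(w, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

def quantize_uniform(w, bits):
    """Symmetric uniform quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    wmax = np.max(np.abs(w))
    scale = wmax / qmax if wmax > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(2)
w = rng.normal(0, 1, size=256)

w_pruned = prune_by_magnitude(w, sparsity=0.5)        # ~50% of weights zeroed
q, scale = quantize_uniform(w_pruned, bits=4)         # 4-bit integer weights
w_deq = q.astype(np.float32) * scale                  # dequantized approximation

assert np.mean(w_pruned == 0) >= 0.5                  # sparsity achieved
assert np.abs(w_deq - w_pruned).max() <= scale / 2 + 1e-6  # bounded rounding error
```

The mixed-precision scheme in the last bullet generalizes this by letting a search (here, a genetic algorithm) pick `bits` independently per operator rather than using one global value.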

Furthermore, we have published state-of-the-art research on quantization and on the efficient deployment of neural networks for resource-efficient speech quality prediction, acoustic scene analysis, and image processing. Additionally, student supervision and dissemination efforts, such as at TinyML and Danish Digitalization, Data Science, DATE, DAC, HiPEAC and AI 2.0, amplify the project’s societal impact, ensuring a robust foundation for advancing Europe’s edge computing capabilities. CONVOLVE partners have made 22 publications in various journals and conference proceedings and produced 4 open-source tools.

The project has fostered collaborations between academic institutions, industries, and SMEs by hosting or participating in various events, spanning meetings, talks at conferences, educational and training sessions, and workshops. These events contribute to the ongoing aim of educating and training experts across various levels, strengthening the EU's human capital in research and innovation for ULP processors and edge AI.
CONVOLVE vision