CORDIS - EU research results
CoCoUnit: An Energy-Efficient Processing Unit for Cognitive Computing

Periodic Reporting for period 4 - CoCoUnit (CoCoUnit: An Energy-Efficient Processing Unit for Cognitive Computing)

Reporting period: 2024-03-01 to 2025-08-31

There is fast-growing interest in extending the capabilities of many systems around us with cognitive functions such as speech recognition, machine translation, speech synthesis, image classification and object recognition, which will replace, extend and/or enhance human tasks in all kinds of environments (work, entertainment, transportation, health care, etc.). The CoCoUnit project has investigated the design of new computing system architectures that are highly energy-efficient, especially for systems that make intensive use of these cognitive functionalities.

We have followed a disruptive approach by researching unconventional architectures that dramatically improve energy efficiency while delivering substantial performance gains. These platforms use various types of units specialized for certain domains, and we place special emphasis on brain-inspired architectures (e.g. neural networks) and graphics processors due to their potential to exploit massive parallelism and their high energy efficiency. We have proposed extensions to existing architectures combined with novel accelerators and functional units.

The end result of this project has been the design of novel platforms that provide new experiences to users in the areas of cognitive computing and computational intelligence on mobile devices, embedded systems, servers and data centers. These new user experiences have been made possible by innovative architectures that provide dramatic benefits in terms of energy efficiency, and are exposed to the programmer by means of minimalist, programmer-friendly extensions.
Some of the most relevant results of this project include:

The design of an accelerator for neural networks that includes novel techniques to reduce energy consumption, such as computation reuse, pruning of neurons and connections, dynamic selection of the precision used in calculations, increasing locality in memory accesses, a new workload scheduling mechanism for recurrent neural networks, and a novel data encoding and approximate computing scheme for binary neural networks.
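As a software analogy for one of these techniques, computation reuse, the sketch below caches dot products so that identical input vectors are evaluated only once. This is a toy Python illustration under our own naming (the actual accelerator implements reuse detection in hardware and over partial computations, not whole vectors):

```python
import numpy as np

def neuron_outputs_with_reuse(inputs, weights, cache=None):
    """Compute w.x for a batch of input vectors, reusing results for
    repeated inputs (a toy illustration of computation reuse)."""
    if cache is None:
        cache = {}
    outputs = []
    hits = 0
    for x in inputs:
        key = x.tobytes()          # identical inputs hash to the same key
        if key in cache:
            outputs.append(cache[key])
            hits += 1              # the dot product is skipped entirely
        else:
            y = float(np.dot(weights, x))
            cache[key] = y
            outputs.append(y)
    return np.array(outputs), hits
```

The `hits` counter corresponds to the multiply-accumulate work that reuse eliminates.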

The design of a “system-on-chip” that includes a general-purpose processor and various accelerators for automatic speech recognition, and achieves real-time performance with very low energy consumption.

The design of a new unit to improve the performance of graphics processors for graph algorithms by reordering, merging and filtering out redundant memory accesses and related activity.

A microarchitecture for graphics processors based on exploiting coherence between successive frames to reduce computations and substantially improve their energy efficiency, as well as a new organization of its memory hierarchy to better exploit locality in accesses, and a new approach to render multiple tiles in parallel.

A detailed characterization of the performance and energy consumption of computing systems for autonomous vehicles, and the proposal of an accelerator to optimize one of their main bottlenecks: simultaneous localization and mapping (SLAM).

A programmable accelerator for automatic speech recognition targeted to edge devices that can be easily adapted to implement alternative/future models while providing high performance and low energy consumption.

A novel high-performance and energy-efficient architecture extension to exploit Sliding Window Processing in conventional CPU cores, and its detailed evaluation for autonomous driving workloads.

A new approach to exponentially quantize DNN tensors with an adaptive scheme that achieves the best trade-off between numerical precision and accuracy loss.
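The core idea of exponential quantization can be sketched in a few lines: each value is rounded to the nearest power of two in magnitude, within a clamped exponent range. This is a toy, fixed-range sketch under our own naming; the project's adaptive scheme additionally tunes the exponent range per tensor to balance precision against accuracy loss:

```python
import numpy as np

def pow2_quantize(x, min_exp=-8, max_exp=0):
    """Round each value to the nearest power of two in magnitude,
    clamping exponents to [min_exp, max_exp]; zeros stay zero."""
    mag = np.abs(x)
    safe = np.maximum(mag, 2.0 ** min_exp)              # avoid log2(0)
    exp = np.clip(np.round(np.log2(safe)), min_exp, max_exp)
    return np.where(mag == 0, 0.0, np.sign(x) * 2.0 ** exp)
```

Storing only the sign and the small integer exponent is what makes this encoding far cheaper than a fixed-point mantissa of the same range.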

A new near-data processing architecture that leverages a 3D-stacked memory for weight storage and computation that takes advantage of a logarithmic quantization of activations to reduce memory access overheads.
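A key payoff of logarithmic activation quantization is that multiplication by an activation 2**e degenerates into a bit shift of the weight, so the near-data logic needs no multipliers. A minimal sketch of that multiplier-free trick (function and parameter names are illustrative, not the project's):

```python
def shift_multiply(weight_int, act_exp):
    """Multiply an integer weight by an activation quantized to
    2 ** act_exp using only a bit shift. A negative exponent becomes a
    right shift, which approximates division by 2 ** |act_exp|."""
    if act_exp >= 0:
        return weight_int << act_exp
    return weight_int >> (-act_exp)
```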

A ReRAM-based accelerator for DNNs that leverages dynamic quantization and smart scheduling of tasks for energy-efficiency and a novel approximate computing technique to extend the lifespan of the accelerator.

An improved simulator for GPGPUs with enhanced accuracy and simulation speed.

A novel core microarchitecture for GPGPUs that includes a simple out-of-order execution approach, a novel control-flow management scheme and an energy-efficient register file caching mechanism.

New ISA extensions for the vector processing unit of CPUs and a novel data compression scheme that optimize neighbor search in point cloud processing tasks, commonly used in computer vision applications.

An innovative approach that allows the front-end and back-end of the pipeline of continuous vision systems to cooperate, improving their performance and energy efficiency.

A processing-using-memory architecture based on lookup tables to efficiently execute SIMD operations by supporting independent column accesses within each mat of a DRAM subarray.
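The lookup-table idea can be illustrated in software: instead of computing each product with an ALU, a table indexed by the two operands is precomputed, and every SIMD lane is answered with a table read. A toy Python sketch, where the 4-bit operand width and all names are our own assumptions:

```python
import numpy as np

def lut_multiply(a, b, bits=4):
    """Element-wise multiply of unsigned `bits`-bit integer vectors via a
    precomputed lookup table, mimicking how LUT-based
    processing-using-memory replaces arithmetic with table reads."""
    n = 1 << bits
    lut = np.outer(np.arange(n), np.arange(n))   # table held "in memory"
    return lut[a, b]                              # one read per SIMD lane
```

In the actual architecture the table lives inside the DRAM subarray, and independent column accesses within each mat let many lanes read it in parallel.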

The main results of this project have been published in the top publication venues of the area of computer architecture, such as ISCA, HPCA and MICRO symposia, and a number of IEEE and ACM journals. We are in touch with several companies interested in the exploitation of some of these results, especially in the area of speech recognition accelerators, DNN accelerators, GPU architectures and autonomous driving hardware platforms.
This project advances the state of the art in a number of ways.

(1) We are the first to identify the potential of reusing computations in DNNs in a number of innovative ways to avoid ineffectual activity. We also propose a novel dynamic adaptive quantization scheme for RNNs to reduce compute and memory activity.

(2) We debunk conventional neuron pruning schemes by showing that they behave close to a random policy, where the only parameter that matters is the degree of pruning, and propose a highly effective pruning scheme that overcomes the huge overhead of traditional schemes.
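For context, the conventional baseline being debunked is simple magnitude pruning, sketched below: zero out the given fraction of weights with the smallest magnitude. This is a toy illustration of the baseline only (names are ours); the project's own scheme is different and is not reproduced here:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest
    magnitude. Ties at the threshold may prune slightly more than
    requested."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask
```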

(3) We take a disruptive approach to deal with the extremely low efficiency of graphics processors for graph-based algorithms based on an additional programmable unit that optimizes the memory locality, and identifies and removes redundant activity.

(4) We leverage the temporal coherence existing in consecutive frames of graphics workload to devise several mechanisms that reduce the number of computations without compromising the quality of the rendered image.

(5) We have proposed a novel platform for automatic speech recognition that achieves human quality and can be deployed on edge devices that have very stringent power and cost budgets. The proposed platform leverages several accelerators and at the same time is programmable with a simple API, which makes it suitable for a variety of current and future algorithms.

(6) We have designed a novel CPU microarchitecture and associated ISA extensions to optimize the processing of sliding windows, a very common programming approach in autonomous driving and many other applications that make use of image processing.
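The data reuse that such sliding-window support exploits can be seen in the simplest case, a 1-D sliding-window sum: each window's result is derived incrementally from the previous one by adding the entering element and subtracting the leaving one, instead of recomputing the whole window. A minimal sketch with our own naming (the ISA extensions generalize this reuse pattern in hardware):

```python
import numpy as np

def sliding_window_sums(signal, k):
    """Sums over every length-k window, computed incrementally: O(n)
    total work instead of O(n * k) for naive recomputation."""
    n = len(signal)
    out = np.empty(n - k + 1)
    s = float(np.sum(signal[:k]))     # only the first window is summed fully
    out[0] = s
    for i in range(1, n - k + 1):
        s += signal[i + k - 1] - signal[i - 1]   # slide: add new, drop old
        out[i] = s
    return out
```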
(7) We have developed a number of innovative architectures that leverage near data processing technologies to improve the energy efficiency of cognitive computing systems.

(8) We have designed innovative architecture extensions and accelerators for autonomous driving systems.

(9) We have proposed an innovative approach to exploit synergies between the front-end and back-end of the computer vision pipeline of cognitive computing platforms to improve its efficiency.