The most relevant results of this project include:
The design of an accelerator for neural networks that includes novel techniques to reduce energy consumption, such as computation reuse, pruning of neurons and connections, dynamic selection of the precision used in calculations, increasing locality in memory accesses, a new workload scheduling mechanism for recurrent neural networks, and a novel data encoding and approximate computing scheme for binary neural networks.
The design of a system-on-chip that includes a general-purpose processor and various accelerators for automatic speech recognition, and achieves real-time performance with very low energy consumption.
The design of a new unit to improve the performance of graphics processors for graph algorithms by reordering, merging and filtering out redundant memory accesses and related activity.
A microarchitecture for graphics processors that exploits coherence between successive frames to reduce computations and substantially improve energy efficiency, together with a new organization of the memory hierarchy to better exploit locality in accesses, and a new approach to render multiple tiles in parallel.
A detailed characterization of the performance and energy consumption of computing systems for autonomous vehicles and the proposal of an accelerator to optimize one of their main bottlenecks, simultaneous localization and mapping.
A programmable accelerator for automatic speech recognition targeted at edge devices that can be easily adapted to implement alternative or future models while providing high performance and low energy consumption.
A novel high-performance and energy-efficient architecture extension to exploit Sliding Window Processing in conventional CPU cores, and its detailed evaluation for autonomous driving workloads.
A new approach to exponentially quantize DNN tensors with an adaptive scheme that achieves the best trade-off between numerical precision and accuracy loss.
A new near-data processing architecture that uses 3D-stacked memory for weight storage and computation, and exploits logarithmic quantization of activations to reduce memory access overheads.
A ReRAM-based accelerator for DNNs that leverages dynamic quantization and smart scheduling of tasks for energy efficiency, and a novel approximate computing technique to extend the lifespan of the accelerator.
An improved GPGPU simulator with better accuracy and speed.
A novel core microarchitecture for GPGPUs that includes a simple out-of-order execution approach, a novel control-flow management scheme and an energy-efficient register file caching mechanism.
New ISA extensions for the vector processing unit of CPUs and a novel data compression scheme that optimize neighbor search in point cloud processing tasks, commonly used in computer vision applications.
An innovative approach that allows the front-end and back-end of the pipeline of continuous vision systems to cooperate, improving their performance and energy efficiency.
A processing-using-memory architecture based on lookup tables to efficiently execute SIMD operations by supporting independent column accesses within each mat of a DRAM subarray.
The main results of this project have been published in the top venues in computer architecture, such as the ISCA, HPCA and MICRO symposia, and in a number of IEEE and ACM journals. We are in touch with several companies interested in exploiting some of these results, especially in the areas of speech recognition accelerators, DNN accelerators, GPU architectures and autonomous driving hardware platforms.