NEUral computing aRchitectures in Advanced Monolithic 3D-VLSI nano-technologies

Periodic Reporting for period 1 - NeuRAM3 (NEUral computing aRchitectures in Advanced Monolithic 3D-VLSI nano-technologies)

Reporting period: 2016-01-01 to 2017-06-30

Over the last 50 years, our societies have seen a continuous revolution in the way our lives are influenced by Information and Communication Technologies (ICTs). This revolution is underpinned by machines based on the von Neumann architecture for data processing, and by the remarkable matched progress of Very Large Scale Integration (VLSI) technology, which made it possible to map this computational architecture onto an adequate electronic computing substrate.
This combination is no longer enough. We cannot continue to rely solely on the exponential increase in digital transistor density in VLSI technology to solve the technological challenges triggered by the evolution of society. With billions of interconnected devices generating very large amounts of unstructured data, it is no longer possible to scale the computational power of classical von Neumann computing systems by mere brute-force approaches.
The objective of this project is to define the technology that will match these new requirements for data processing, and hence to develop a new generation of computing architectures that are co-designed from the bottom up, combining the theory of computing, neuromorphic processor architecture design, nanotechnology, and 3D VLSI integration efforts.
Specifically, we propose to fabricate a chip implementing a neuromorphic architecture that supports state-of-the-art machine learning algorithms and spike-based learning mechanisms. With respect to its physical architecture this chip will feature:
1) an ultra-low-power, scalable and highly configurable neural architecture, delivering a 50x reduction in power consumption on selected applications compared to conventional digital solutions;
2) a monolithically integrated 3D implementation in Fully-Depleted Silicon-on-Insulator (FDSOI) technology at 28nm design rules, with integrated Resistive Random Access Memory (RRAM) synaptic elements.
We will complete this vision and develop complementary technologies that will make it possible to address the full spectrum of applications, from mobile/autonomous objects to high-performance computing co-processing, by realizing:
a) a technology to implement on-chip learning, using native adaptive characteristics of electronic synaptic elements;
b) a scalable ultra-low energy platform based on a novel segmented bus architecture exploiting TFT device technology to interconnect multiple neuromorphic processor chips to build large neural processing systems.
The neuromorphic computing system will be developed jointly with advanced neural algorithms and computational architectures for online adaptation, learning, and high-throughput online signal processing, delivering:
1) an ultra-low-power, massively parallel, non-von Neumann computing platform with non-volatile nanoscale devices that support online learning mechanisms;
2) a programming toolbox of algorithms and data structures tailored to the specific constraints and opportunities of the physical architecture;
3) an array of fundamental application demonstrations instantiating the basic classes of signal processing tasks.
Our approach will provide a scalable solution that can be used in multiple domains from small ultra-low power data processing coupled to sensors in autonomous systems (Internet of Things) to energy efficient large data processing in servers and networks.
These ambitious objectives leverage the work pursued in a number of other projects in the FP7, H2020 and ECSEL frameworks. The project therefore focuses on specific building blocks and algorithmic solutions lacking in concurrent projects, and will enable a first EU implementation of a neural chip in an advanced technology with a wide spectrum of practical uses.
All major objectives of the first 18 months have been attained on time:
1) The first FDSOI 28nm version of the neuromorphic chip (without RRAM) was taped out on time (D2.2), and the silicon was completed just at the end of the reporting period (D4.2). It should be delivered packaged by the review meeting and, if no issues appear during testing, this will allow carrying out most of the work of WP5 related to architectural and algorithmic studies.
2) The extensive review of the theoretical basis for new network approaches was completed by JacobsUni (D2.1)
3) The TFT technology was validated (D2.6 and D3.2) and the interconnect work is progressing
4) Individual design and technology modules for RRAM and 3D integration have been validated (D2.3, D2.4, D3.1, D3.3) and development is on schedule.
UZH successfully completed the design and tape-out of a mixed-signal analog/digital multi-core spiking neural network physical architecture with configurable routing tables, for implementing arbitrary neural network computational architectures in standard FDSOI 28nm technology. This design, named Dynap-sel, was based on bulk 180nm CMOS VLSI devices that had already been designed and tested at UZH. CEA provided support for the ST design kit and interfaced with the CMP organisation for the logistics of the transfer to fabrication.
Each Dynap-sel chip has 4 programmable TCAM cores and 1 plastic core. Each TCAM core has 16x16 units, each of which contains one neuron and one synapse block. The synapse block comprises a linear integrator circuit (DPI) which integrates input events from 64 16-bit (11-bit TCAM + 5-bit SRAM) programmable TCAM cells. The plastic core has 64 neurons with two crossbar synapse arrays, one plastic and one configurable.
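The hierarchy above (chip → cores → 16x16 units → 64 TCAM cells per synapse block) can be summarized as a data-structure sketch. This is an illustrative Python model of the reported organization only, not a description of the actual hardware interfaces; all class and field names are our own.

```python
from dataclasses import dataclass, field
from typing import List

TCAM_CELLS_PER_SYNAPSE = 64   # programmable input cells per synapse block
TAG_BITS, SRAM_BITS = 11, 5   # 11-bit TCAM tag + 5-bit SRAM = 16 bits/cell

@dataclass
class TcamCell:
    tag: int = 0   # 11-bit match tag compared against incoming spike addresses
    sram: int = 0  # 5-bit per-cell configuration word

@dataclass
class Unit:
    """One neuron plus its synapse block (a linear DPI integrator)."""
    cells: List[TcamCell] = field(
        default_factory=lambda: [TcamCell() for _ in range(TCAM_CELLS_PER_SYNAPSE)])

@dataclass
class TcamCore:
    """A 16x16 grid of neuron/synapse units."""
    units: List[List[Unit]] = field(
        default_factory=lambda: [[Unit() for _ in range(16)] for _ in range(16)])

@dataclass
class DynapSelChip:
    """4 programmable TCAM cores plus 1 plastic core of 64 neurons."""
    tcam_cores: List[TcamCore] = field(
        default_factory=lambda: [TcamCore() for _ in range(4)])
    plastic_neurons: int = 64

chip = DynapSelChip()
```

Walking this structure recovers the headline counts: 4 x 256 = 1024 TCAM-core neurons plus 64 plastic-core neurons per chip.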
The chip employs the Address-Event Representation (AER) protocol to route spikes among neurons within a core, across cores, and across chip boundaries; the neuromorphic analog neuron and synapse circuits implement biophysically realistic temporal dynamics using log-domain DPI filters and adaptive exponential I&F neuron circuits. Output events generated by the neurons can be routed to the same core via a Level-1 router, to other cores on the same chip via a Level-2 router, or to cores on different chips via a Level-3 router. The memory used by the routers to store post-synaptic destination addresses is implemented using 8.5k 18-bit SRAM blocks distributed among the Level-1, -2, and -3 router circuits. Thanks to the scalable architecture and the on-chip programmable routers, a multi-chip system (for example a 4x4 grid) can easily be built to implement a wide range of connection schemes, without requiring external mapping, memory, or computing support.
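The three-tier routing decision described above reduces to a simple address comparison. The following is a minimal sketch of that selection logic, assuming integer chip and core identifiers; the function name and signature are illustrative, not part of the Dynap-sel interface.

```python
def route_level(src_chip: int, src_core: int,
                dst_chip: int, dst_core: int) -> int:
    """Select which router tier carries an output event, following the
    three-tier Level-1/2/3 scheme (names here are illustrative)."""
    if src_chip != dst_chip:
        return 3  # Level-3 router: event crosses a chip boundary
    if src_core != dst_core:
        return 2  # Level-2 router: different core on the same chip
    return 1      # Level-1 router: destination within the source core
```

Because each tier only needs to compare the relevant address field, the decision can be made locally at every hop without any global routing table.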
The static power consumption of the main blocks was estimated, together with the total power consumption under a workload in which all neurons fire at a mean rate of 100Hz and each spike event is broadcast to 4 cores with 25% connectivity per event. The resulting overall power consumption estimate (without IOs) is 168uW, well beyond the state of the art for spiking circuits.
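To illustrate how the dynamic part of such an estimate is assembled, here is a first-order power model for the workload above. The neuron count and the per-delivery energy in the example call are placeholders we chose for illustration; they are not figures from the report, and the model deliberately omits the static term.

```python
def dynamic_power_w(n_neurons: int, rate_hz: float, cores_per_event: int,
                    connectivity: float, units_per_core: int,
                    energy_per_delivery_j: float) -> float:
    """First-order dynamic-power model: every neuron fires at rate_hz,
    each spike is broadcast to cores_per_event cores, and a fraction
    `connectivity` of each target core's units receives it."""
    deliveries_per_spike = cores_per_event * connectivity * units_per_core
    return n_neurons * rate_hz * deliveries_per_spike * energy_per_delivery_j

# Workload from the report: 100 Hz mean rate, 4 target cores, 25% connectivity.
# The neuron count and per-delivery energy are ASSUMED placeholder values.
p = dynamic_power_w(n_neurons=1024, rate_hz=100.0, cores_per_event=4,
                    connectivity=0.25, units_per_core=256,
                    energy_per_delivery_j=1e-12)
```

Multiplying these terms out gives the total synaptic-event rate, so the estimate is dominated by the assumed energy per delivered event.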
Figure: Layout and characteristics of the 28nm FDSOI multicore spiking circuit