Periodic Reporting for period 2 - MNEMOSENE (Computation-in-memory architecture based on resistive devices)
Reporting period: 2019-07-01 to 2021-06-30
• Objective 1: Develop new algorithmic solutions for targeted applications for CIM architecture.
• Objective 2: Develop and design new mapping methods integrated in a framework for efficient compilation of the new algorithms into CIM macro-level operations; each of these is mapped to a group of CIM tiles.
• Objective 3: Develop a macro-architecture based on the integration of a group of CIM tiles, including the overall scheduling of the macro-level operations, data accesses, inter-tile communication, and the partitioning of the crossbar.
• Objective 4: Develop and demonstrate the micro-architecture level of CIM tiles and their models, including primitive logic and arithmetic operators, the mapping of such operators on the crossbar, different circuit choices and the associated design trade-offs.
• Objective 5: Design a simulator (based on calibrated models of memristor devices and building blocks) and an FPGA emulator for the new architecture (CIM device combined with a conventional CPU) in order to demonstrate its superiority.
Objective 1
Six applications were selected and investigated in order to derive the operations (kernels) that could be performed in the CIM core, which not only reduces the communication between CPU and memory but can also reduce energy consumption and improve overall performance. The applications were selected from three domains: data analytics, signal processing, and machine learning. Simulations and comparative studies have shown that, depending on the application, energy benefits of 5-10x are estimated compared with conventional digital CMOS ASICs, and of 10-1000x compared with traditional general-purpose processors.
Objective 2
The compiler makes use of the CIM architectures developed in WP2 and the applications identified in WP1. First, we built a micro-instruction compiler for the TTA-based CIM micro-architecture developed in WP2. The micro-compiler can offload the user-annotated CIM micro/nano-operations to CIM units integrated into CGRA/TTA architectures, taking into account the computation and communication resources in the architecture and the available parallelism in the user program. Second, we designed a macro-programming interface, following a top-down approach, to support the programming of multiple CIM units at system level. Third, we created a full end-to-end compilation flow from the high-level representation to code that executes on the macro CIM architecture proposed in WP3, and demonstrated the flow on the full-system simulators developed in WP5 with the CIM applications identified in WP1.
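To make the idea of mapping a macro-level operation onto a group of CIM tiles more concrete, the following is a minimal sketch, not the project's compiler. It assumes a hypothetical fixed tile size and a simple block partitioning of a matrix-vector product. The function names `tile_matrix` and `tiled_matvec` and the tile dimensions are illustrative assumptions that do not appear in the project deliverables.

```python
def tile_matrix(n_rows, n_cols, tile_rows, tile_cols):
    """Split an n_rows x n_cols matrix into tile-sized blocks.

    Returns (tile_id, row_start, row_end, col_start, col_end) tuples,
    with end indices exclusive. The tile size is a hypothetical parameter.
    """
    tiles = []
    tile_id = 0
    for r in range(0, n_rows, tile_rows):
        for c in range(0, n_cols, tile_cols):
            tiles.append((tile_id, r, min(r + tile_rows, n_rows),
                          c, min(c + tile_cols, n_cols)))
            tile_id += 1
    return tiles


def tiled_matvec(matrix, vector, tile_rows, tile_cols):
    """Compute y = M x by summing the partial products of each tile."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    result = [0] * n_rows
    for _, r0, r1, c0, c1 in tile_matrix(n_rows, n_cols, tile_rows, tile_cols):
        # Each tile holds one block of M and sees only its slice of x.
        for r in range(r0, r1):
            result[r] += sum(matrix[r][c] * vector[c] for c in range(c0, c1))
    return result


matrix = [[1, 2, 3, 4, 5],
          [0, 1, 0, 1, 0],
          [2, 0, 2, 0, 2]]
vector = [1, 2, 3, 4, 5]
tiles = tile_matrix(3, 5, 2, 2)
y = tiled_matvec(matrix, vector, 2, 2)
```

In this sketch the partial results of tiles that share the same rows are accumulated outside the tiles, which is one place where inter-tile communication and scheduling would enter a real mapping.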
Objective 3
First, an initial macro CIM architecture was designed. Based on the CIM tile (micro-architecture level) developed in WP4, a multi-tile CIM accelerator (i.e. the macro-architecture) and its instruction set architecture (ISA) were developed. This CIM accelerator was integrated as a functional unit into TTA, a data-path-aware architecture. Second, the CIM core (based on CGRC) was integrated into a computing cluster. Here, we made use of the multi-core PULP system (developed by ETHZ) as the computing cluster. An efficient communication fabric for tight integration of the CIM core as an in-memory accelerator was developed as part of this work. Third, the key kernels derived from the applications selected in WP1 were mapped onto the CIM macro-architecture. Work concentrated on Hyper-Dimensional (HD) computing (see D1.2), which makes intensive use of small instructions on ultra-wide words (e.g. bitwise XOR). HD computing is essentially about manipulating and comparing large patterns stored in memory (e.g. hypervectors of 10,000 bits), which plays to the strengths of CIM accelerator tiles. Our results indicate up to 60x improvement in peak throughput and 40x improvement in energy efficiency for GEMM-type workloads.
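The kind of HD computing kernel described above can be illustrated in software. The following is a minimal sketch, assuming binary hypervectors stored as Python integers, XOR as the binding operation, and Hamming distance as the comparison. The helper names are illustrative, and the sketch says nothing about how the project maps these operations onto CIM tiles.

```python
import random

DIM = 10_000  # hypervector width in bits, matching the example size in the text


def random_hypervector(rng, dim=DIM):
    """Return a random dim-bit hypervector stored as a Python int."""
    return rng.getrandbits(dim)


def bind(a, b):
    """Bind two hypervectors with a single bitwise XOR over all bits."""
    return a ^ b


def hamming_distance(a, b):
    """Count the bit positions in which two hypervectors differ."""
    return bin(a ^ b).count("1")


rng = random.Random(0)  # fixed seed so the sketch is reproducible
key = random_hypervector(rng)
value = random_hypervector(rng)
unrelated = random_hypervector(rng)

bound = bind(key, value)
recovered = bind(bound, key)  # XOR is its own inverse, so this returns value

dist_same = hamming_distance(recovered, value)
dist_random = hamming_distance(value, unrelated)
```

Two unrelated random hypervectors differ in roughly half of their bits, whereas the recovered vector matches the original exactly. Each operation touches all 10,000 bits with one very simple instruction, which is the access pattern the text identifies as well suited to CIM tiles.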
Objective 4
First, small crossbars were characterised and models were developed. Second, memristor-based primitive logic and arithmetic circuits were designed and validated using SPICE simulation. Third, a complete first version of the CIM tile micro-architecture was developed and designed (using C code), including an ISA (instruction set architecture) and a compiler able to translate the macro-instructions defined in WP3 into nano-instructions that can be understood by the CIM tile. Fourth, at the lowest hardware level, the impact of the memristor array architecture and ADC design choices on the performance of MAC operations for ReRAM-based memristor devices was investigated. Next, CIM tiles based on an STT-RAM memristor array were optimized for performing binary logic operations as well as Matrix-Matrix Multiplication (MMM) operations.
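As a rough illustration of why the ADC matters for crossbar MAC operations, the sketch below models an idealised crossbar in which each column current is the sum of input voltages multiplied by cell conductances, followed by a uniform ADC. It is a simplified behavioural model under stated assumptions: no wire resistance, no device variation, arbitrary units, and a hypothetical full-scale current and ADC resolution. It is not one of the project's calibrated models.

```python
def crossbar_mac(conductances, voltages):
    """Ideal analog MAC: column current I_j = sum_i V_i * G[i][j]."""
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]


def adc_quantize(current, full_scale, bits):
    """Uniform ADC: clip to [0, full_scale], then map to an integer code."""
    levels = (1 << bits) - 1
    clipped = min(max(current, 0.0), full_scale)
    return round(clipped / full_scale * levels)


# 3 rows (inputs) x 2 columns (outputs); all values are in arbitrary units.
G = [[1.0, 0.0],
     [0.5, 1.0],
     [0.0, 0.5]]
V = [1.0, 0.5, 0.0]

currents = crossbar_mac(G, V)
codes = [adc_quantize(i, full_scale=2.0, bits=4) for i in currents]
```

With a 4-bit ADC the two column currents are reduced to one of 16 codes each, so the ADC resolution and full-scale range bound the precision of every MAC result, and any current above full scale is clipped.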
Objective 5
The framework supporting the characterization and study of the presented CIM architectures was realized in the refined versions of the simulator (D5.6) and emulator (D5.8) packages. These key milestones highlight the quality of the knowledge generated in all the technical work packages and bring together, in a coherent simulation/emulation platform, an end-to-end approach for benchmarking: the applications studied in WP1 were compiled with the compilers provided by WP2 and accelerated by the architecture proposed in WP3, with the models of the CIM submodules accurately calibrated against the circuits engineered in WP4.
To ensure impacts beyond the project timeframe, the MNEMOSENE partners have initiated several national and EU project proposals - involving additional industrial partners - to follow up on the interesting results realised in this project.
Expected Impact 2. Helping to double economic value of semiconductor component production in Europe within 10 years
MNEMOSENE has been a cutting-edge, high-gain research project that has delivered scientific and experimental foundations for a fundamental computing paradigm using radically new concepts and theories. The technological impact of the project will greatly influence the development and design of new energy efficient computing engines. The demonstration of energy efficiencies of the order of fJ/operation indicates clearly the huge potential of CIM technologies not only to contribute to ultra-low power computing engines, but also to enable the deployment of computing at the edge, where the data is being generated.
Expected Impact 3. Impact on technology and innovation
MNEMOSENE has delivered and demonstrated the scientific and experimental foundations of a fundamentally new computing paradigm based on a radically new concept of tightly integrated computation-in-memory (CIM).
Expected Impact 4. Impact on future leadership
MNEMOSENE has significantly contributed to Europe taking leadership in new and emerging technology areas that promise to renew the basis for European competitiveness and growth.