The OPRECOMP work plan spanned five directions.
1) Transprecision-boosted applications.
OPRECOMP identified twelve micro-benchmarks in three major areas of computing. For these micro-benchmarks, OPRECOMP has developed reference baseline scalar, parallel, and GPU implementations. Time, power, and energy costs have been measured under different workloads (e.g. varying input size) to characterize current state-of-the-art systems. A first set of transprecision algorithms that accelerate the micro-benchmarks and reduce their energy cost has been developed. In particular, OPRECOMP has so far demonstrated considerable acceleration in PageRank, BLSTM, CG, SpMV, SVD and others.
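To make the idea of a transprecision variant of a micro-benchmark concrete, the following is a minimal sketch, not the project's implementation. It runs a PageRank power iteration twice on a hypothetical four-node graph: once with ranks kept in double precision, and once with every stored rank rounded to IEEE 754 binary16 (emulated in software through Python's `struct` module). The graph, the function names, and the choice of binary16 are assumptions made for illustration only.

```python
import struct

def to_half(x):
    """Round a Python float to IEEE 754 binary16 and back (storage emulation)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def pagerank(links, damping=0.85, iters=50, store=float):
    """Power iteration; `store` rounds each rank before it is written back."""
    n = len(links)
    rank = [store(1.0 / n)] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for src, outs in enumerate(links):
            share = damping * rank[src] / len(outs)
            for dst in outs:
                new[dst] += share
        rank = [store(v) for v in new]
    return rank

# Hypothetical 4-node graph: LINKS[i] lists the pages that page i points to.
# Every node has at least one outgoing link, so no dangling-node handling is needed.
LINKS = [[1, 2], [2], [0], [0, 2]]

full = pagerank(LINKS)                  # ranks stored as binary64
half = pagerank(LINKS, store=to_half)   # ranks stored as binary16
max_err = max(abs(a - b) for a, b in zip(full, half))
print("max abs difference between binary64 and binary16 ranks:", max_err)
```

The sketch only emulates reduced-precision storage, so it says nothing about speed or energy; it merely shows how the quality loss of a lower-precision variant can be quantified against the reference baseline.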
2) Establishment of a full transprecision framework for computing.
OPRECOMP has laid the groundwork for the theoretical and experimental (quality-metric) analysis of the effects of transprecision on the micro-benchmarks identified in the project. Tools that emulate the effects of transprecision (accuracy and error bounds) through an intuitive software framework have been developed. To design and build a transprecision software stack, OPRECOMP also explored application characteristics (including an automated precision tuning tool), a programming model, and an initial version of a transprecision compiler.
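The combination of precision emulation and automated precision tuning described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the OPRECOMP tools: `round_mantissa`, `dot`, and `tune` are hypothetical names, the exponent range is left unbounded, and the tuning strategy is a plain linear search over mantissa widths for a single dot-product kernel.

```python
import math

def round_mantissa(x, bits):
    """Round x to `bits` explicit mantissa bits (exponent range left unbounded)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)             # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (bits + 1)        # implicit leading bit plus `bits` stored bits
    return math.ldexp(round(m * scale) / scale, e)

def dot(xs, ys, bits):
    """Dot product in which inputs, products, and partial sums are all rounded."""
    acc = 0.0
    for x, y in zip(xs, ys):
        prod = round_mantissa(round_mantissa(x, bits) * round_mantissa(y, bits), bits)
        acc = round_mantissa(acc + prod, bits)
    return acc

def tune(xs, ys, rel_tol, max_bits=52):
    """Return the smallest mantissa width whose relative error is within rel_tol."""
    exact = sum(x * y for x, y in zip(xs, ys))
    for bits in range(1, max_bits + 1):
        if abs(dot(xs, ys, bits) - exact) <= rel_tol * abs(exact):
            return bits
    return max_bits

# Illustrative inputs (not project data).
XS = [0.1 * (i + 1) for i in range(16)]
YS = [1.0 / (i + 3) for i in range(16)]
for tol in (1e-2, 1e-4, 1e-6):
    print("rel_tol", tol, "-> mantissa bits", tune(XS, YS, tol))
```

A realistic tuner would assign a separate precision to each variable and would also bound the exponent range; the single global width used here keeps the sketch short.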
3) Sustainable HPC to Exascale and beyond.
OPRECOMP is building a kW demonstrator for transprecision computing. The project has developed a testing environment that attaches PULP to an OpenPOWER-based system through CAPI. OPRECOMP has developed the library needed to establish this connection, alongside sample applications that form the baseline templates for porting OPRECOMP's micro-benchmarks. For early prototyping and debugging, an emulator of the kW system, which couples the PULP virtual platform to an OpenPOWER-based system, has been developed. On the PULP side, OPRECOMP has developed new functional units, processing elements, and memory hierarchy structures that exploit transprecision characteristics.
4) Energy-neutral near-sensor processing.
OPRECOMP has been actively working on two IoT platforms (PULP and GAP8). These platforms already include some early transprecision support and will be made available to the full OPRECOMP consortium to develop and test benchmark applications. The project has also worked on alternative short bit-width floating-point representations (8-bit and 16-bit), which have already been implemented and benchmarked in hardware. A further improvement is a complete floating-point unit that supports not only the basic ADD and MUL instructions but also DIV and SQRT.
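As an illustration of what a short bit-width floating-point representation looks like, the sketch below decodes a bit pattern with an IEEE 754-style layout (sign, biased exponent, mantissa, with subnormals, infinities, and NaN). The text does not state the field widths used by the project, so the 8-bit split into 1 sign, 5 exponent, and 2 mantissa bits is an assumption, and `decode` is a hypothetical helper name. The 16-bit case uses the standard binary16 layout (5 exponent and 10 mantissa bits).

```python
import math

def decode(bits, exp_bits, man_bits):
    """Decode an IEEE 754-style bit pattern with the given field widths."""
    bias = (1 << (exp_bits - 1)) - 1
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    if exp == (1 << exp_bits) - 1:                 # all-ones exponent: inf or NaN
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:                                   # zero or subnormal
        return sign * math.ldexp(man, 1 - bias - man_bits)
    # Normal number: restore the implicit leading one.
    return sign * math.ldexp((1 << man_bits) + man, exp - bias - man_bits)

# Assumed 8-bit layout (1 sign, 5 exponent, 2 mantissa bits).
print(decode(0x3C, 5, 2))     # bit pattern 0 01111 00
print(decode(0x7B, 5, 2))     # largest finite value in this layout
# Standard binary16 layout (1 sign, 5 exponent, 10 mantissa bits).
print(decode(0x3C00, 5, 10))
```

With only two mantissa bits, the assumed 8-bit format keeps the dynamic range of binary16 but resolves just four values per power of two, which is why the choice of format has to be driven by the accuracy needs of the application.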
5) Pathfinding for disruptive technologies.
First investigations into transprecision memories, for example approximate DRAM and variation analysis of resistive RAMs, have been carried out. OPRECOMP explored DRAM power-down modes in a full-system simulator and quantified their impact, which is critical for all kinds of DRAM subsystems. The project also investigated the refresh penalty of DRAMs in two flavors and used insights into vendor-specific DRAM architectures to optimize the error correction capabilities.
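A first-order view of the refresh penalty mentioned above can be sketched with a back-of-the-envelope calculation. This is not the project's methodology or data: it assumes the simple model in which a rank is unavailable for one refresh cycle time (tRFC) in every refresh interval (tREFI), and the timing values are illustrative figures of the order found in DDR4 datasheets. The function name `refresh_overhead` is hypothetical.

```python
def refresh_overhead(t_rfc_ns, t_refi_ns):
    """Fraction of time a rank is blocked by refresh in a simple periodic model."""
    if t_refi_ns <= 0 or t_rfc_ns < 0:
        raise ValueError("timings must be positive")
    return t_rfc_ns / t_refi_ns

# Illustrative timings (assumptions, not measurements from the project).
T_RFC_NS = 350.0     # refresh cycle time
T_REFI_NS = 7800.0   # average refresh interval at normal temperature

base = refresh_overhead(T_RFC_NS, T_REFI_NS)
hot = refresh_overhead(T_RFC_NS, T_REFI_NS / 2)   # interval halved, e.g. at high temperature
print(f"refresh overhead: {base:.2%} (normal), {hot:.2%} (halved interval)")
```

Relaxing the refresh interval, as approximate DRAM does, lowers this overhead at the cost of retention errors, which is where the error correction capabilities come into play.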