Periodic Reporting for period 1 - TPANN (Tensor Processing on FPGAs for Artificial Neural Networks)
Reporting period: 2017-05-01 to 2018-04-30
The successful quantization of neural network inference is highly relevant as it simplifies the underlying arithmetic. Programmable hardware devices, such as those made by Xilinx, are the platforms able to extract a benefit from every single saved bit. These reprogrammable physical electrical circuits translate simpler operations directly into greater operational density and concurrency. Quantization thus allows small, power-efficient devices to deploy capable neural networks. Their use becomes a green option and is enabled in more challenging application environments, such as embedded or remote contexts or cyberphysical systems.
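As an illustration of how far quantization can simplify the arithmetic, consider the extreme case of fully binarized operands. The sketch below (illustrative only, not project code) shows the well-known popcount formulation of a binary dot product, in which {-1, +1} values are encoded as single bits and a whole vector of multiplies and adds collapses into one XOR and one population count:

```cpp
#include <bitset>
#include <cstdint>

// Binarized dot product: encode -1 as bit 0 and +1 as bit 1. The dot
// product of two {-1,+1} vectors of length n then equals
//   n - 2 * popcount(a XOR b),
// replacing n multiplications and additions by one XOR and one popcount.
int binary_dot(std::uint64_t a, std::uint64_t b, int n) {
    std::uint64_t mask = (n == 64) ? ~0ULL : ((1ULL << n) - 1);
    int differing = static_cast<int>(std::bitset<64>((a ^ b) & mask).count());
    return n - 2 * differing;
}
```

On an FPGA, the XOR and popcount map directly onto lookup tables, which is why every saved operand bit translates into density and concurrency gains.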
The goal of TPANN was the rigorous optimization of neural network inference on programmable hardware devices. Particularly in the ubiquitous convolutional networks, it is the computation of a vast number of dot products that poses the critical challenge. The strong background in digital design and the specialization in computer arithmetic of the fellow, Thomas Preußer, were key in this effort. One illustrative result of the work was an object detection demo operating on a live video stream on a small embedded heterogeneous all-programmable device. The work also yielded two invention disclosures that are currently undergoing internal patent review.
The conducted work included the design and characterization of highly optimized arithmetic kernels for various low-precision quantization schemes on different abstraction levels. These results allowed the team to identify the points in the system design space that offer the most interesting trade-offs between the hardware resource investment and the achieved network accuracy.
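To give a flavor of such a low-precision kernel at a high abstraction level, the following stand-alone sketch (illustrative, not one of the project's kernels) computes a dot product of multi-bit unsigned operands stored as transposed bit planes. The computation decomposes into shifted popcounts of ANDed planes, so only single-bit logic, shifts, and counts remain:

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// Bit-serial dot product of two vectors of unsigned quantized values.
// Each vector is stored transposed as bit planes: plane p packs bit p of
// every element into one machine word. The dot product then becomes a
// weighted sum of popcounts of ANDed planes -- the kind of operation that
// maps densely onto FPGA lookup tables. (Names are hypothetical.)
int bitserial_dot(const std::vector<std::uint64_t>& x_planes,
                  const std::vector<std::uint64_t>& w_planes) {
    int acc = 0;
    for (std::size_t p = 0; p < x_planes.size(); ++p)
        for (std::size_t q = 0; q < w_planes.size(); ++q)
            acc += static_cast<int>(
                       std::bitset<64>(x_planes[p] & w_planes[q]).count())
                   << (p + q);
    return acc;
}
```

The precision of either operand is set simply by the number of planes supplied, which makes the resource cost of each additional quantization bit directly visible: one extra plane per operand adds one more row or column of AND/popcount terms.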
The TPANN project produced a host of reference implementations, particularly for the dot-product computation that is so central to convolution. The diversity of implementations was motivated by the desire to exploit the full capabilities of Xilinx's modern all-programmable devices and by different application goals.

A software solution leveraging the NEON vector extension of the ARM CPU cores on Zynq devices helped to conquer the computational challenge of those layers in an object detection network that were sensitive to a quantization below 8 bits. This implementation became part of a demo for object detection in live video using a network derived from the TinyYOLO topology. The demo was first publicly presented by Thomas at the FPL conference in 2017, went viral within the company thereafter, and has since been used for conferences and job fairs alike.

The convolutional layer library backing the FINN framework was also refurbished and published open-source as part of the BNN-PYNQ project. These high-level C++ models, implemented for the high-level synthesis flow, were not further optimized for performance but made more flexible: the templated rewrite of the processing engine now allows the full customization of the operand quantization. This provides a quick way to explore and analyze the design space of an application and to derive a balanced implementation easily.

Finally, systolic dot-product engines were designed and evaluated at the very low structural level of the programmable hardware fabric. Thomas developed new techniques to map quantized multi-bit operations efficiently to the DSP compute cores found on Xilinx devices. The resulting reference implementation was adopted in the development processes of two Xilinx products. Two related invention disclosures are currently being reviewed internally to decide between patent filing and immediate publication.
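The idea behind the templated processing engine can be sketched as follows. This is a minimal stand-in, not the actual BNN-PYNQ code: in the real HLS library the operand types would be arbitrary-precision `ap_int<W>`/`ap_uint<W>` types and the loop would be unrolled by the synthesis tool, while plain fixed-width integers and names such as `ProcessingEngine` and `mac` are used here purely for illustration:

```cpp
#include <cstdint>

// Sketch of a processing engine whose activation, weight, and accumulator
// types are template parameters, so the same engine can be instantiated
// for any operand quantization. SIMD is the number of multiply-accumulate
// lanes processed together. (Illustrative only; not the FINN/BNN-PYNQ API.)
template <typename TAct, typename TWeight, typename TAcc, int SIMD>
struct ProcessingEngine {
    TAcc mac(const TAct (&act)[SIMD], const TWeight (&wgt)[SIMD]) {
        TAcc acc = 0;
        for (int i = 0; i < SIMD; ++i)   // fully unrolled in the HLS flow
            acc += static_cast<TAcc>(act[i]) * static_cast<TAcc>(wgt[i]);
        return acc;
    }
};
```

Because the quantization is a compile-time parameter, sweeping over operand widths and comparing the resulting resource usage and accuracy reduces to re-instantiating the template, which is what enables the quick design-space exploration described above.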
Beyond the technical dimension, Thomas was able to develop his leadership skills in the course of the project. He supervised interns and coordinated publications and invention disclosures as main author. He also represented Xilinx with invited talks at three conferences (FPL'17, FPGA'18, and DATE'18) and supported Xilinx outreach activities by staffing the company booth at FPL'17 and by presenting to transition-year students during an in-house orientation week in spring 2018. Finally, he engaged in non-technical dissemination activities promoting the Marie Skłodowska-Curie instruments at an event organized by the Graduate Academy of TU Dresden.
The results of TPANN have been exploited (a) for building an object detection demo on live video that was first shown by Thomas at FPL'17 and has been used for conferences and fairs throughout the company since then, (b) for an invited keynote talk presented by Thomas at FPL'17, (c) for publications at ICCD'17 and DATE'18, and (d) for integration in two Xilinx products related to accelerated neural network inference. Two techniques devised in the course of the project are currently being reviewed internally to decide between patent filing and immediate publication.