## Final Report Summary - OPPORTUNISTIC-DSP (Opportunistic Approximations to Break the Traditional Efficiency Limits of Flexible DSP Implementations)

Over two years, this project, entitled “Opportunistic Approximations to Break the Traditional Efficiency Limits of Flexible DSP Implementations”, investigated an unconventional method of implementing Digital Signal Processing (DSP) applications that require extreme energy efficiency. We have shown significant energy savings in the implementation of digital wireless communication modems, and similar advantages can be expected in other application domains that experience highly dynamic execution conditions, such as health monitoring, video/audio processing and sensor networks. The energy savings can reduce the size and increase the autonomy of many existing products that make heavy use of signal processing (e.g. smartphones) and could even enable new products that are not yet feasible because of the energy inefficiency of traditional static implementations.

Typically, DSP implementations are dimensioned to withstand the most adverse possible execution conditions. Robustness is achieved at the expense of energy efficiency: the processing precision of the static implementation is significantly higher than what most execution conditions actually require, so extra operations and extra bits per operation needlessly consume energy. The execution conditions of DSP implementations are very often remarkably more permissive than the worst case (e.g. a benign wireless channel in wireless communications). For those cases, a more energy-efficient implementation can be realized by contextualizing the finite precision refinement (i.e. the implementation step that transforms the infinite-precision specification into a bit-true specification that is maintained throughout the rest of the implementation flow). This project introduced the concept of opportunistic approximations to depart fundamentally from the notion of a single worst-case design point. Opportunistic approximations move in a new practical direction by (1) generating a multiplicity of context-dependent relaxed specifications, (2) creating an optimized implementation of each specification by adapting its operation type, number and precision, and (3) continuously monitoring the execution conditions. The cheapest implementation that applies to the actual execution context can thereby be selected opportunistically without compromising the quality experienced by the user. Opportunistic run-time approximations do not modify the algorithmic functionality, only its processing precision. By relaxing the processing load whenever possible, flexible DSP implementations can achieve significant efficiency gains while continuing to meet the same requirements at the I/O interface (e.g. bit-error rate in a communication link).
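The selection principle described above can be illustrated with a minimal sketch. It is not the project's implementation: the `Variant` type, the scalar `severity` reported by the monitor, the validity limits and the relative energy figures are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    max_severity: float  # hypothetical: harshest context in which this variant still meets the quality target
    energy: float        # hypothetical: relative energy per processed block


# Illustrative variants only; the last one plays the role of the static worst-case design point.
VARIANTS = [
    Variant("relaxed", max_severity=0.3, energy=0.4),
    Variant("medium", max_severity=0.7, energy=0.7),
    Variant("worst_case", max_severity=float("inf"), energy=1.0),
]


def select_variant(severity, variants=VARIANTS):
    """Return the cheapest variant that is valid for the monitored severity."""
    valid = [v for v in variants if severity <= v.max_severity]
    return min(valid, key=lambda v: v.energy)


def average_energy(severities, variants=VARIANTS):
    """Mean relative energy when a variant is re-selected for every monitored context."""
    return sum(select_variant(s, variants).energy for s in severities) / len(severities)
```

Because the worst-case variant is valid in every context, a selection always exists, and the average energy can only fall when the monitor reports benign conditions.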

During the project, we focused on two aspects of system design to enable opportunistic approximations. On the one hand, we built a detailed motivational example to formalize the design methodology and illustrate the potential of the proposed technique. On the other hand, we developed new design automation technology to facilitate the use of our technique. The contributions of the project to both aspects are detailed next.

The first type of contribution is in design methodology. We chose an advanced Multiple Input, Multiple Output (MIMO) 3GPP LTE receiver as our motivational example to formalize the design methodology and illustrate the potential benefits of the proposed opportunistic approximations. We divide the process of finite precision refinement into three consecutive steps. First, algorithmic approximations transform the original algorithm into a different one of lower precision. For example, an estimation algorithm can replace the ideal 2-norm distance computation with a less complex 1-norm distance at the expense of some loss in convergence optimality. Then, once the algorithmic approximations have fixed the algorithm, algebraic approximations replace the ideal operators with approximate ones, for example a Taylor approximation of a reciprocal square root. Finally, signal approximations assign a finite number of bits to the inputs and outputs of each operator in the algorithm. The result of applying the three types of approximate transformations is a bit-true specification of the targeted DSP implementation. The main novelty of the proposed opportunistic approximations is that these transformations are applied with consideration of multiple diverse execution contexts, deriving multiple different bit-true specifications that are each valid only for a given execution context.

Accordingly, it is essential to design a run-time monitor able to track the execution dynamics and select the appropriate bit-true implementation for the given execution context. In the case of our MIMO receiver, we proposed an estimate of the orthogonality defect as the run-time monitor. We showed that the orthogonality defect is a property of a MIMO channel that is correlated with the effort required to decode, and thus can be used to identify execution contexts where a lower-precision implementation would perform as well as the high-precision worst-case one. The design of such a monitor requires detailed knowledge of the application, as the monitor likely needs to exploit complex correlations that are different for each application. Moreover, these correlations may not even be explicit in the application code. An application expert should identify which external conditions stress the system in order to define an effective monitor. Applying the proposed opportunistic approximations to our motivational example resulted in a 40% reduction in energy consumption compared to the original optimized static implementation. These energy savings are achieved at the expense of a slight increase in overall chip area. Importantly, we demonstrated that despite such savings our implementation met the same bit-error rate requirements as the reference static implementation, meaning that the energy savings are achieved without degrading the user experience.
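The three approximation types and the monitored quantity can be made concrete with a short sketch. The function names are hypothetical, the Taylor expansion point (x = 1) is an assumption, and the orthogonality defect uses a common textbook definition for a square matrix (product of column norms divided by the absolute determinant), which may differ from the estimator used in the project.

```python
import math


def dist_l2(a, b):
    """Reference operator: Euclidean (2-norm) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dist_l1(a, b):
    """Algorithmic approximation: 1-norm distance, with no squares and no square root."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rsqrt_taylor(x):
    """Algebraic approximation: first-order Taylor expansion of x**-0.5 around x = 1 (assumed point)."""
    return 1.0 - 0.5 * (x - 1.0)


def quantize(x, frac_bits):
    """Signal approximation: round x to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def orthogonality_defect(h):
    """Product of column norms over |det| for a 2x2 channel matrix (real or complex entries).

    The value is 1 for orthogonal columns and grows as the columns become more aligned.
    """
    (a, b), (c, d) = h
    det = abs(a * d - b * c)
    if det == 0:
        return math.inf
    col0 = math.hypot(abs(a), abs(c))
    col1 = math.hypot(abs(b), abs(d))
    return col0 * col1 / det
```

A monitor built on such a quantity would compare it against thresholds calibrated offline and pick the lowest-precision variant that is valid for the observed channel; those thresholds are application specific and are not given here.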

The second type of contribution is in design automation technology. Opportunistic approximations can achieve significant energy savings at the expense of an increase in design complexity. Accordingly, it is essential to increase the level of automation in order to facilitate the adoption of opportunistic approximations. In the context of this project, we made three contributions in design automation technology.

First, we implemented, in the LLVM compiler infrastructure, a novel analytical approach to estimate the quantization noise power at the output of the targeted DSP implementation. With this, a designer can obtain quick feedback on the amount of error introduced by a given bit-width configuration. Our approach proved to be more than 10 times faster than a Monte Carlo simulation configured to achieve similar estimation accuracy.
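The contrast between analytical estimation and Monte Carlo simulation can be sketched on a toy case. This is not the project's LLVM-based analysis: it assumes the textbook model in which rounding error is white and uniform with variance q²/12, and in which a linear filter scales that variance by the sum of its squared taps. The filter, function names and sample count are illustrative.

```python
import random


def analytical_noise_power(taps, frac_bits):
    """Output noise power when the filter input is rounded to `frac_bits` fractional bits.

    Assumed model: white, uniform rounding error with variance q**2 / 12,
    scaled by the sum of squared filter taps.
    """
    q = 2.0 ** -frac_bits
    return (q * q / 12.0) * sum(t * t for t in taps)


def fir(taps, samples):
    """Direct-form FIR filter; the output has len(samples) - len(taps) + 1 points."""
    n = len(taps)
    return [sum(taps[k] * samples[i + n - 1 - k] for k in range(n))
            for i in range(len(samples) - n + 1)]


def monte_carlo_noise_power(taps, frac_bits, n_samples=20000, seed=0):
    """Estimate the same quantity by filtering exact and rounded inputs and averaging the squared error."""
    rng = random.Random(seed)
    scale = 1 << frac_bits
    x = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    xq = [round(v * scale) / scale for v in x]
    y, yq = fir(taps, x), fir(taps, xq)
    return sum((a - b) ** 2 for a, b in zip(y, yq)) / len(y)
```

The analytical estimate costs a handful of multiplications regardless of the required accuracy, whereas the simulated estimate needs many samples to converge, which is the kind of gap an analytical method exploits.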

Second, we proposed a novel methodology to derive an adequate fixed-point specification from a floating-point reference. Existing methods cannot scale to complex systems, which include many signals, without a substantial loss in signal sizing optimality. To achieve scalability, we introduced a new divide-and-conquer method that approaches the quality of global methods in significantly less time. First, our method sorts the signals into multiple groups, considering the propagation path of the signals to the global application metric (i.e. bit-error rate). Then, the fixed-point configurations of the groups are resolved with fast local simulations. Finally, the global fixed-point configuration is composed from the group configurations using slow global simulations. The method was applied to the fixed-point refinement of an advanced wireless algorithm, achieving a speedup of close to 9 times with respect to a reference statistical method without affecting the quality of the result.
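The local-then-global structure of such a divide-and-conquer refinement can be sketched as follows. This is a simplified illustration, not the project's method: a closed-form noise model with hypothetical per-signal gains stands in for both the fast local and the slow global simulations, the groups are given rather than derived from propagation paths, and the noise budget is split equally between groups by assumption.

```python
def noise(bits, gains):
    """Toy stand-in for a simulation: output noise power of a bit-width assignment."""
    return sum(gains[s] * 2.0 ** (-2 * bits[s]) / 12.0 for s in bits)


def shrink(bits, names, budget, evaluate):
    """Greedily remove one bit at a time from the signals in `names` while `evaluate` stays within budget."""
    bits = dict(bits)
    changed = True
    while changed:
        changed = False
        for s in names:
            if bits[s] > 1:
                bits[s] -= 1
                if evaluate(bits) <= budget:
                    changed = True
                else:
                    bits[s] += 1  # undo: this bit is still needed
    return bits


def divide_and_conquer(groups, gains, budget, start_bits=16):
    """Size each group against its share of the budget, then refine the merged result globally."""
    share = budget / len(groups)  # assumption: equal split of the noise budget
    bits = {}
    for group in groups:
        # Local phase: only the signals of this group are evaluated.
        local = {s: start_bits for s in group}
        bits.update(shrink(local, group, share, lambda b: noise(b, gains)))
    # Global phase: all signals are evaluated together against the full budget.
    all_names = [s for g in groups for s in g]
    return shrink(bits, all_names, budget, lambda b: noise(b, gains))
```

In this sketch the expensive global evaluation is called only during the final pass, after the cheap local passes have already removed most of the excess bits.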

Finally, we explored the use of Domain-Specific Languages (DSLs) to enable smarter compiler transformations that can rely on domain-specific information. In particular, we used properties of linear algebra to show that clever matrix optimizations can be elegantly encoded in a DSL to generate an implementation that outperforms traditional (C-based) implementation flows.
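One way to picture a linear-algebra property that a DSL can exploit is the associativity of matrix products. The toy embedded DSL below is a hypothetical illustration, not the language developed in the project: it records a product as an unparenthesized chain and uses the classic matrix-chain dynamic program to find the cheapest evaluation order. Shapes are assumed to be compatible and are not checked.

```python
from functools import lru_cache


class Mat:
    """Leaf of a tiny, hypothetical matrix DSL: only the name and shape are tracked."""

    def __init__(self, name, rows, cols):
        self.name, self.rows, self.cols = name, rows, cols

    def __matmul__(self, other):
        return Chain(flatten(self) + flatten(other))


class Chain:
    """A product of matrices whose parenthesization is left undecided."""

    def __init__(self, factors):
        self.factors = factors

    def __matmul__(self, other):
        return Chain(flatten(self) + flatten(other))


def flatten(expr):
    """Return the list of leaf matrices of an expression."""
    return expr.factors if isinstance(expr, Chain) else [expr]


def naive_cost(chain):
    """Multiply-accumulate count for strict left-to-right evaluation."""
    f = chain.factors
    rows, cols, cost = f[0].rows, f[0].cols, 0
    for m in f[1:]:
        cost += rows * cols * m.cols
        cols = m.cols
    return cost


def best_cost(chain):
    """Minimum multiply-accumulate count over all parenthesizations (matrix-chain DP)."""
    f = chain.factors

    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == j:
            return 0
        return min(solve(i, k) + solve(k + 1, j) + f[i].rows * f[k].cols * f[j].cols
                   for k in range(i, j))

    return solve(0, len(f) - 1)
```

For a product `A @ B @ v` with two 64×64 matrices and a 64×1 vector, left-to-right evaluation forms the full matrix product first, while evaluating `B @ v` first needs only two matrix-vector products. A C compiler that sees nested loops generally cannot make that rewrite, whereas a DSL that knows the operands are matrices can.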

In summary, during this project we have achieved significant progress towards the realization of opportunistic approximations as a viable design alternative. The most important achievement was to show that opportunistic approximations applied to a real-world system deliver significant energy savings without degrading the user experience, and that these energy savings are complementary to any other traditional energy-saving optimization.
