CORDIS - EU research results

Sustainable Performance for High-Performance Embedded Computing Systems

Periodic Reporting for period 3 - SuPerCom (Sustainable Performance for High-Performance Embedded Computing Systems)

Reporting period: 2021-06-01 to 2022-11-30

SuPerCom addresses the challenge of providing "high and sustainable performance" (hsperf): meeting the highest-ever computing performance needs of critical software while giving strong guarantees that this performance can be sustained for safe operation. To reach its goals, SuPerCom proposes a radically new approach that combines performance analysis, hardware design, and statistical and machine learning analysis. Its solutions push the limits of current approaches to sustainable performance.

SuPerCom solutions can become an integral part of the ecosystem of next-generation embedded computing systems by allowing them to use increasingly complex high-performance hardware features on which strong guarantees of sustainable performance can be soundly placed a priori. This will allow developers to deploy performance-demanding functionalities with sound guarantees, such as complex algorithms that control CO2 emissions or provide driving assistance in cars, or medical devices with increasing performance requirements such as pacemakers and infusion pumps. Hence, the SuPerCom breakthrough can have a significant economic and societal impact.

1. SuPerCom will provide sustainable performance with minimum impact on the performance of complex resources (e.g. accelerators) and with a small impact on overall hardware complexity.
2. For hard-to-predict resources SuPerCom shifts away from performance-capping solutions and instead enables high-performance features by adding novel hardware sensing techniques that implement Key Performance Indicators (KPIs).
3. SuPerCom will leverage statistical analysis to manage the data coming from the proposed advanced KPIs and the hardware sensors that make them visible.
4. To enable incremental software verification, SuPerCom characterizes performance requirements for individual applications in isolation and develops an automatic framework that produces benchmarks to create the controlled load scenarios needed for application profiling.
5. SuPerCom introduces a hsperf in-field feedback-loop mechanism that maintains the sensing active during system operation to collect measurements for each system instance.

Requirements. We defined the main project requirements during the first months of the project: hardware support for predictability and observability. We also defined the main statistical and machine learning techniques to be used in the project.

Case Studies/Benchmarks/Kernels. We proposed a benchmarking approach for state-of-the-art autonomous driving (AD) platforms that follows the key modules, structural design and functions of AD systems and builds on several industry-level AD systems. In addition, we ported a space case study to an embedded GPU, showing that accelerating existing space algorithms on GPUs is feasible and effective.

Toolchain. We developed a baseline simulation infrastructure featuring state-of-the-art architectural support and industry-level accuracy. We are actively keeping these tools up to date and adding features as they are required.

Modelling. We developed timing models for crossbar interconnects that yield tighter bounds by exploiting the ability of crossbars to process several requests in parallel. We also presented improved modelling approaches for the different parameters of a network-on-chip interconnect. For buses, we proposed an ILP formulation for computing the worst-case contention delay suffered by a task due to interference on a shared bus. We also developed a technique to infer the internal operations of the GPU system software, which improves the understanding of its observed behavior and of how resources are managed internally.
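
As a rough illustration of what a bus contention bound expresses, the sketch below computes a simple closed-form upper bound, not the ILP formulation mentioned above. It assumes round-robin arbitration, non-split transactions and a single worst-case access latency for every request, so each request of the task under analysis can be delayed by at most one request from each contender. The function name and parameters are hypothetical and are not part of the SuPerCom tooling.

```python
from typing import Sequence


def round_robin_contention_bound(
    task_requests: int,
    contender_requests: Sequence[int],
    max_latency_cycles: int,
) -> int:
    """Upper-bound the contention delay (in cycles) of a task on a shared bus.

    Assumptions (illustrative only): round-robin arbitration, non-split
    transactions, and one worst-case latency shared by all request types.
    """
    if task_requests < 0 or max_latency_cycles < 0:
        raise ValueError("request counts and latency must be non-negative")
    if any(n < 0 for n in contender_requests):
        raise ValueError("contender request counts must be non-negative")

    # A contender can delay the task at most once per task request, and it
    # cannot cause more delays than the number of requests it issues.
    return sum(
        min(task_requests, n) * max_latency_cycles for n in contender_requests
    )


# Task issues 100 bus requests; three contenders issue 250, 40 and 0 requests;
# each bus access takes at most 10 cycles.
bound = round_robin_contention_bound(100, [250, 40, 0], 10)
print(bound)  # (100 + 40 + 0) * 10 = 1400
```

An ILP formulation would typically tighten such a bound by distinguishing request types and their individual latencies, at the cost of solving an optimization problem.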

Analysis. We developed a technique to handle the variability in the values of hardware event monitors across repeated runs of the same experiment. For probabilistic WCET analysis, we showed how survivability-analysis theory can help produce tighter bounds, and we provided models to test whether the exponential assumption holds for probabilistic WCET estimates. We accounted for timing anomalies as part of probabilistic WCET analysis for the first time. We assessed how to manage task dependencies when scheduling tasks with probabilistic WCET estimates. We identified the main gaps in the analysis of AD applications that hinder their adoption in critical systems. We showed how statistical analysis can be used to model the timing behavior of AD software, and how AD applications can be adapted to fully exploit the performance of the different computing elements in advanced hardware. We produced the first survey on the use of probabilistic worst-case analysis in the literature. The statistical analysis used in the project also allowed us to model other metrics of interest, such as worst-case energy consumption, power peaks, and hardware faults. Moreover, we developed a methodology that, used together with software randomization (a probabilistic WCET enabler), allows computing resource allocations in terms of memory and timing budgets.
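
To make the idea of testing the exponential assumption more concrete, the following minimal sketch applies one common check, the residual coefficient of variation (CV) of the largest observations, which equals 1 for an exponential tail. This check is an assumed stand-in chosen for illustration; the models developed in the project may differ. The function names, the tail size and the synthetic execution times are all hypothetical.

```python
import math
import random
import statistics


def exponential_tail_check(samples, tail_size):
    """Return (threshold, mean_excess, cv, looks_exponential) for the tail.

    The tail is the `tail_size` largest samples; excesses are measured over
    the next-largest sample, which acts as the threshold.
    """
    if not 2 <= tail_size < len(samples):
        raise ValueError("tail_size must be in [2, len(samples) - 1]")
    ordered = sorted(samples)
    threshold = ordered[-tail_size - 1]
    excesses = [x - threshold for x in ordered[-tail_size:]]
    mean_excess = statistics.fmean(excesses)
    if mean_excess == 0:
        raise ValueError("tail has no variability above the threshold")
    cv = statistics.stdev(excesses) / mean_excess
    # Asymptotic 95% band around CV = 1 under the exponential hypothesis.
    looks_exponential = abs(cv - 1.0) <= 1.96 / math.sqrt(tail_size)
    return threshold, mean_excess, cv, looks_exponential


def pwcet_estimate(samples, tail_size, exceedance_probability):
    """Extrapolate an execution-time bound assuming an exponential tail."""
    threshold, mean_excess, _, _ = exponential_tail_check(samples, tail_size)
    tail_fraction = tail_size / len(samples)
    return threshold + mean_excess * math.log(
        tail_fraction / exceedance_probability
    )


# Synthetic execution times (cycles): a fixed cost plus exponential jitter.
rng = random.Random(0)
times = [1000.0 + rng.expovariate(1 / 50.0) for _ in range(1000)]

threshold, mean_excess, cv, ok = exponential_tail_check(times, tail_size=100)
print(f"threshold={threshold:.1f} cv={cv:.2f} exponential tail plausible={ok}")
print(f"pWCET at 1e-9: {pwcet_estimate(times, 100, 1e-9):.1f} cycles")
```

If the CV falls outside the band, the exponential extrapolation should not be trusted for that sample, and a different tail size or tail model would need to be considered.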

Characterization and Observability. We showed the main challenges in characterizing complex AD applications to derive metrics such as execution time and memory usage. We also showed how micro-benchmarks can be used to derive bounds for space applications on boards representative of that domain. At the hardware level, we investigated part of the uncertainty in the readings of hardware event monitors, which can be subject to unexpected behavior, and proposed a methodology to increase confidence in their correct behavior. We also addressed the need for a standardized PMC interface in the embedded domain, especially with a view to supporting the timing characterization of embedded platforms.

Hardware Support. We identified GPU configurations that are appropriate for automotive setups by modelling an automotive GPU. We proposed a hardware technique that tracks contention delay rather than events, which improves the accuracy of the stall cycles attributed to a task due to contention. We proposed a cache write policy that reconciles the benefits of high-performance and real-time policies. Moreover, we argued for the necessity of high-integrity hardware performance monitoring units in automotive systems. In addition, we proposed a performance monitoring unit for safety-critical systems.

The techniques proposed for modelling, statistical/big-data analysis, hardware and software are novel. They include the use of risk analysis instead of survivability analysis, and pattern-matching techniques for modelling parallel interconnects. The project produced the first study on how to use statistical analysis for AD software, how AD applications can better exploit different compute elements to reduce WCET, how to exploit request sequences to take advantage of interconnect parallelism, and how to exploit information about the way GPU memory allocations are managed, obtained by reverse engineering these black-box components. Further contributions are the analysis of the predictability of hardware event monitors and the statistical approach to handle it; the hardware techniques that improve predictability, or its modelling, without affecting average performance; the formalization of the requirements and proposals for emerging observability requirements on complex processors; and the resource allocation approach for software-randomized systems with probabilistic WCET.