Dependable Performance on Many-Thread Processors

Final Report Summary - DPMP (Dependable Performance on Many-Thread Processors)

Contemporary microprocessors seek to improve performance through thread-level parallelism by co-executing multiple threads on a single chip. Current processors support tens to hundreds of threads and are hence called many-thread processors. Many-thread processors, however, lead to non-dependable performance: co-executing threads affect each other’s performance in unpredictable ways because they share hardware resources. Failure to deliver dependable performance leads to missed deadlines, priority inversion, unbalanced parallel execution, etc., which severely impacts the usage model and the performance growth path for many important future and emerging application domains (e.g. media, medical, datacenter).

This project developed a cycle accounting architecture to track per-thread performance, which enables system software to deliver dependable performance by assigning hardware resources to threads depending on their relative progress. Through this cooperative hardware-software approach, the project addressed a fundamental problem in multi-threaded and multi/many-core processing.
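To make the idea of progress-driven resource assignment concrete, here is a minimal sketch in Python. It is not the project's actual mechanism. It assumes, hypothetically, that the accounting hardware reports for each thread its total cycles and the cycles lost to interference from co-running threads, and that system software divides a shared resource in proportion to each thread's slowdown. The names relative_progress and allocate_shares, and the proportional policy itself, are illustrative assumptions.

```python
def relative_progress(shared_cycles, interference_cycles):
    """Per-thread progress: estimated isolated cycles / measured shared cycles.

    Both arguments are dicts keyed by thread id. The counter values are
    hypothetical inputs standing in for what accounting hardware might report.
    """
    progress = {}
    for tid, total in shared_cycles.items():
        isolated = total - interference_cycles[tid]
        if total <= 0 or isolated <= 0:
            raise ValueError(f"invalid cycle counts for thread {tid!r}")
        progress[tid] = isolated / total
    return progress


def allocate_shares(progress, total_units):
    """Split total_units of a shared resource in proportion to slowdown.

    Threads with lower progress (higher slowdown) receive a larger share.
    This proportional policy is an assumption chosen for illustration.
    """
    slowdown = {tid: 1.0 / p for tid, p in progress.items()}
    norm = sum(slowdown.values())
    return {tid: total_units * s / norm for tid, s in slowdown.items()}


if __name__ == "__main__":
    shared = {"A": 1000, "B": 1000}
    interference = {"A": 500, "B": 200}
    prog = relative_progress(shared, interference)
    print(prog)
    print(allocate_shares(prog, total_units=13))
```

In this sketch thread A has lost more cycles to interference than thread B, so it receives the larger share of the resource. A real policy would also have to respect priorities, deadlines and the granularity at which the hardware can partition a resource.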

More specifically, we made several important contributions through this project. (1) We designed novel cycle accounting architectures, called criticality stacks and bottle graphs, to monitor per-thread performance in multi-threaded and managed language workloads. (2) We leveraged per-thread progress to steer hardware/software cooperative scheduling and resource management to optimize (heterogeneous) multicore performance under bandwidth, power and reliability constraints. (3) To evaluate these ideas, we developed Sniper, a parallel, hardware-validated, multi/many-core simulator that runs at a simulation speed of up to 2 MIPS on current hardware. Its key feature is the ability to model core performance at a high level of abstraction using analytical models, which reduces both simulator development and evaluation time. The overarching (meta) conclusion from the project is that simple white-box analytical models are extremely powerful tools for comprehensively monitoring workload execution characteristics, steering application scheduling and resource management, and devising simulation infrastructures for increasingly complex multicore processor architectures.
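As an illustration of what a criticality stack in contribution (1) can look like, the sketch below assumes the time-share definition of thread criticality: execution time is cut into intervals during which the set of running threads is constant, and each interval's duration is divided evenly among the threads running in it. This summary does not spell out the definition, so treat it as an assumption. The function name criticality_stack and the input format are hypothetical.

```python
def criticality_stack(intervals, threads):
    """Accumulate a per-thread criticality component.

    intervals: list of (duration, running) pairs, where running is the set
    of thread ids that are actively running during that interval.
    threads: iterable of all thread ids in the program.

    Each interval's duration is split evenly over its running threads, so a
    thread that runs alone while others wait accumulates criticality fastest.
    """
    crit = {tid: 0.0 for tid in threads}
    for duration, running in intervals:
        if not running:
            continue  # no thread running: nothing is attributed
        share = duration / len(running)
        for tid in running:
            crit[tid] += share
    return crit


if __name__ == "__main__":
    # Two threads run together for 4 time units, then t0 runs alone for 6.
    trace = [(4.0, {"t0", "t1"}), (6.0, {"t0"})]
    print(criticality_stack(trace, ["t0", "t1"]))
```

When every interval has at least one running thread, the components sum to the total execution time, which is what allows them to be drawn as a single stacked bar. In the example, t0 accounts for most of the 10 time units, marking it as the thread to speed up first.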

This project has led to more than 60 publications in high-profile journals and conferences; a publicly released architecture simulator called Sniper that is now widely used in academia and industry (http://www.snipersim.org/); a spin-off called CoScale in datacenter monitoring (http://www.coscale.com/); and two related ERC Proof-of-Concept projects.