DAL: Defying Amdahl's Law

Final Report Summary - DAL (DAL: Defying Amdahl's Law)

While multicores featuring hundreds or even thousands of cores will become feasible around 2020, the sequential programming style will continue to be dominant. Even future mainstream parallel applications will exhibit large sequential sections. Amdahl’s law indicates that high performance on these sequential sections is needed to achieve high performance on the application as a whole. For most applications, the effective performance of future computer systems built around a 1000-core processor chip will depend significantly on how well they execute both sequential code sections and single-threaded code. Around 2020, many processor chips will feature a few complex cores and many (perhaps thousands of) simpler cores that are more silicon- and power-efficient.
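To make the Amdahl's law argument concrete, the following minimal sketch computes the speedup of an application over a single baseline core. It is an illustration under stated assumptions, not the performance model developed in the project: the function name, the `sequential_speedup` parameter (a stand-in for a complex core that runs the sequential sections faster), and the 95% parallel fraction in the usage note are all hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int,
                   sequential_speedup: float = 1.0) -> float:
    """Speedup over one baseline core according to Amdahl's law.

    parallel_fraction: share of baseline execution time that parallelizes
        perfectly across `cores` simple cores (illustrative assumption).
    sequential_speedup: hypothetical factor by which a complex core
        accelerates the remaining sequential share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1 or sequential_speedup <= 0.0:
        raise ValueError("cores must be >= 1 and sequential_speedup > 0")

    # Time is normalized so that the baseline single-core run takes 1.0.
    sequential_time = (1.0 - parallel_fraction) / sequential_speedup
    parallel_time = parallel_fraction / cores
    return 1.0 / (sequential_time + parallel_time)
```

With these assumptions, `amdahl_speedup(0.95, 1000)` is just under 20, because the 5% sequential share dominates the run time. Doubling sequential performance with `amdahl_speedup(0.95, 1000, sequential_speedup=2.0)` nearly doubles the overall speedup, which is the motivation for keeping a few fast complex cores on the chip.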

In the DAL research project, we have explored several microarchitecture techniques that will be needed to enable high performance on sequential programs, and on sequential sections of parallel programs, on such heterogeneous processor chips.

In particular, we have introduced a simple model to evaluate the potential performance of a parallel application on a manycore. We have made significant progress in branch prediction, thus limiting the performance loss associated with control-flow misprediction. We have also presented models that use the simple cores as helpers to accelerate sequential programs, particularly through pre-execution of a skeleton of the application, which enables data to be prefetched from memory. We have also made substantial contributions to hardware management of the memory hierarchy, proposing a new hardware prefetcher and a new cache replacement policy, as well as new compressed cache structures.
However, the main contribution of the DAL project has been on the microarchitecture of the execution core of the processor itself. In particular, we have shown that value prediction, i.e. predicting the results of individual instructions, is a technique that can both provide substantial performance improvement on sequential code and be leveraged to reduce the overall complexity of an out-of-order execution core.
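As a rough illustration of what predicting the result of an instruction means, the sketch below implements a textbook last-value predictor with a saturating confidence counter. It is not the predictor designed in the project: the class name, table size, and confidence threshold are hypothetical choices for illustration, and a software model like this ignores all timing and pipeline-recovery concerns.

```python
class LastValuePredictor:
    """Toy last-value predictor indexed by instruction address (PC).

    Each entry stores the last result produced by an instruction and a
    saturating confidence counter. A prediction is only offered once the
    same value has repeated often enough (illustrative policy).
    """

    def __init__(self, entries: int = 1024, threshold: int = 3) -> None:
        self.entries = entries
        self.threshold = threshold  # counter value needed to predict
        self.table = {}             # index -> [last_value, confidence]

    def _index(self, pc: int) -> int:
        # Direct-mapped and untagged: distinct PCs may alias.
        return pc % self.entries

    def predict(self, pc: int):
        """Return the predicted result, or None if confidence is too low."""
        entry = self.table.get(self._index(pc))
        if entry is None:
            return None
        value, confidence = entry
        return value if confidence >= self.threshold else None

    def update(self, pc: int, actual: int) -> None:
        """Train the predictor with the result the instruction produced."""
        index = self._index(pc)
        entry = self.table.get(index)
        if entry is None or entry[0] != actual:
            # New or changed value: store it and reset confidence.
            self.table[index] = [actual, 0]
        else:
            # Same value again: raise confidence, saturating at threshold.
            entry[1] = min(entry[1] + 1, self.threshold)
```

In this sketch an instruction must produce the same value on four consecutive executions before `predict` returns it, and a single differing result resets the entry. Requiring high confidence before using a prediction is a common design choice, since a wrong value prediction forces the core to recover.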