Periodic Reporting for period 4 - CC-MEM (Coordination and Composability: The Keys to Efficient Memory System Design)
Reporting period: 2021-09-01 to 2022-08-31
Important: A significant amount of our computing energy goes into moving data. As computer systems are power-limited (batteries on mobile devices, cooling on all devices), decreasing the energy spent on moving data will allow us to increase performance and/or battery life.
Overall objectives: Improve data movement efficiency by coordinating data movement across the different parts of the system.
Interaction between instructions and scheduling: We have analyzed the behavior of memory instructions and their interactions with other instructions in the processor. This has given us insights into how to efficiently construct hardware schedulers that let instructions execute nearly as well as they do under expensive out-of-order schedulers, at far lower cost. The results are significantly increased scheduling efficiency and decreased complexity.
Complex memory systems: We have analyzed the behavior of applications that execute on many distributed processors (both graphics workloads on GPUs and large-scale NUMA workloads) to determine how best to optimize memory system behavior. In both cases we found that combining knowledge of the hardware and the software allowed us to significantly improve performance, but doing so required clever techniques for exploring and understanding how to configure the applications and hardware.
Traditional memory systems: We have analyzed the interaction between memory requests and the existing processor pipeline and found that we can take advantage of existing structures in the processor to improve efficiency with essentially no overhead. This has allowed us to transform both the store buffer and the register file into caches, thereby significantly reducing the energy spent accessing the first-level cache. In addition to working within the processor, we have improved the interactions between the processor and the OS through the virtual memory system. This has resulted in improvements to the allocation of large pages in fragmented systems and a redesign of the 40-year-old design choices still used in today's virtual memory paging systems. The latter has resulted in a design that is both sufficiently better and simple enough that it is being included in the future design of most mobile processors.