Memory Hierarchy Studies on Many-Core CMPs with Large On-Die Storage

Investigators: Jih-Kwon Peir

Sponsor: Intel

Abstract:

Chip Multiprocessors (CMPs) have become an industry standard for achieving higher chip-level IPC (Instructions Per Cycle). Recently, Intel's Tera-scale computing project pushed the number of on-die cores to tens or even hundreds. In addition, many new memory technologies are looming on the horizon that may reshape the memory hierarchy organization of many-core CMPs. Among these, evolving high-density memories such as Thyristor-RAM, Ferroelectric-RAM, and Resistive-RAM could potentially be embedded in the CPU die to provide much larger on-chip storage. Key related design issues in future many-core CMPs are an intelligent on-die memory hierarchy organization along with efficient data communication and coherence mechanisms among the many cores and storage modules. Furthermore, it is essential to incorporate new memory technologies into future CMPs to build more reliable and scalable memory systems.

In this project, our first proposed research topic is to investigate solutions for using large on-die storage as caches and/or as addressable memory. In addition, designing a scalable cache coherence mechanism with large on-die storage is very challenging and opens many new research fronts that will have substantial impact on CMP performance. Our second proposed topic is to integrate new memory technology and study its use as main memory in future CMPs. It is well known that main memory built with these new memory technologies incurs substantially longer latency. We will study data prefetching techniques to hide this memory latency. To be effective, any prefetching method must overcome four serious challenges: accuracy, miss coverage, timeliness, and space overhead. Existing prefetching methods are based on two general behaviors of the missing block addresses: regularity and correlation. We will look into solutions in both directions and investigate which are more suitable for the new memory technologies. In this project, we will evaluate different data prefetching methods using the MARSS whole-system simulation environment.
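To make the two behaviors concrete, the sketch below shows minimal versions of the two classic predictor families the abstract refers to: a stride predictor that exploits regularity in the miss stream, and a Markov-style table that exploits correlation between consecutive misses. This is an illustrative sketch only, not the project's design; all class names and the single-entry history are simplifying assumptions.

```python
class StridePrefetcher:
    """Regularity-based sketch: once the same non-zero stride between
    consecutive miss addresses repeats, predict the next address."""

    def __init__(self):
        self.last_addr = None    # previous miss address
        self.last_stride = None  # previous observed stride

    def observe(self, addr):
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Stride confirmed twice in a row -> predict one step ahead.
            if stride != 0 and stride == self.last_stride:
                prediction = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prediction


class CorrelationPrefetcher:
    """Correlation-based sketch: remember which miss address last
    followed each miss, and predict that successor on a repeat miss."""

    def __init__(self):
        self.table = {}   # miss address -> most recently seen successor
        self.prev = None  # previous miss address

    def observe(self, addr):
        if self.prev is not None:
            self.table[self.prev] = addr   # learn the pair (prev -> addr)
        prediction = self.table.get(addr)  # replay the last-seen successor
        self.prev = addr
        return prediction
```

A real prefetcher would add confirmation counters, per-PC or per-region tables, and a prefetch queue; the timeliness and space-overhead challenges mentioned above show up precisely in how far ahead such predictions are issued and how large the correlation table is allowed to grow.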

 

Related Publications:

1.   Xi Tao, Qi Zeng, Jih-Kwon Peir, and Shih-Lien Lu, "Small Cache Lookaside Table for Fast DRAM Cache Access," 35th IEEE International Performance Computing and Communications Conference (IPCCC), Las Vegas, Nevada, Dec. 2016.

2.   Xi Tao, Qi Zeng, Jih-Kwon Peir, and Shih-Lien Lu, "Runahead Cache Misses Using Bloom Filter," 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Guangzhou, China, Dec. 2016.

3.   Xi Tao, Qi Zeng, and Jih-Kwon Peir, "Hot Row Identification of DRAM Memory in a Multicore System," 2016 High Performance Computing and Cluster Technologies Conference (HPCCT), Chengdu, China, Dec. 2016.

4.   Xudong Shi, Feiqi Su, and Jih-Kwon Peir, "Directory Lookaside Table: Enabling Scalable, Low-Conflict, Many-Core Cache Coherence Directory," 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS), Hsinchu, Taiwan, Dec. 2014.

5.   Jih-Kwon Peir, S. Lai, S. Lu, J. Stark, and K. Lai, "Author Retrospective: Bloom Filtering Cache Misses for Accurate Data Speculation and Prefetching," 25 Years of the ACM International Conference on Supercomputing (ICS), DOI: http://dx.doi.org/10.1145/514191.514219, Dec. 2014.

6.   Jianmin Chen, Xi Tao, Zhen Yang, Jih-Kwon Peir, Xiaoyuan Li, and Shih-Lien Lu, "Guided Region-Based GPU Scheduling: Utilizing Multi-thread Parallelism to Hide Memory Latency," 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Boston, MA, May 2013.

7.   Gang Liu, Jih-Kwon Peir, and Victor Lee, "Miss-Correlation Folding: Encoding Per-Block Miss Correlations in Compressed DRAM for Data Prefetching," 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012.