Predictive Multiprocessor Caching Techniques Based on Cache Interference and Working Set Change

Investigators: Jih-Kwon Peir

Sponsor: NSF/EIA

Abstract:

High-performance computer servers based on shared-memory multiprocessing technology continue to receive great attention, driven by demand from the booming Internet market. Parallel programs running in cache-coherent, shared-memory multiprocessor environments, such as transaction processing workloads, incur performance penalties due to cache interference caused by data sharing. This interference forces cache lines to be relinquished involuntarily, before an LRU replacement policy would evict them. The problem is especially serious for modified lines, which not only account for a large percentage of total cache misses but also incur a higher miss penalty. As the number of processors increases, cache interference can become dominant and hinder any performance improvement.
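The effect described above can be illustrated with a minimal sketch: a toy fully associative LRU cache in which a remote invalidation removes a line before LRU replacement would, so a later reuse that would have hit becomes a miss. All class and variable names here are illustrative, not part of the project's design.

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully associative LRU cache; counts misses caused by invalidation."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()          # address -> present
        self.invalidated = set()            # lines given up involuntarily
        self.misses = 0
        self.sharing_misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)    # LRU update on a hit
            return "hit"
        self.misses += 1
        if addr in self.invalidated:        # would have hit without interference
            self.sharing_misses += 1
            self.invalidated.discard(addr)
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # normal (voluntary) LRU eviction
        self.lines[addr] = True
        return "miss"

    def invalidate(self, addr):
        """Coherence invalidation from a remote writer: involuntary give-up."""
        if addr in self.lines:
            del self.lines[addr]
            self.invalidated.add(addr)

cache = LRUCache(capacity=4)
cache.access(0xA)
cache.access(0xB)
cache.invalidate(0xA)                       # remote processor writes line 0xA
cache.access(0xA)                           # sharing miss: reuse after invalidation
print(cache.misses, cache.sharing_misses)   # -> 3 1
```

The `sharing_misses` counter isolates exactly the misses the project targets: reuses of lines that were lost to interference rather than to capacity or conflict.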

Therefore, the main objective of this project is to investigate and evaluate innovative hardware-based approaches to reducing sharing misses in multiprocessor caches. The fundamental idea is to record the lines that a cache has given up involuntarily. Such lines become potential prefetching targets because, by locality of reference, they are likely to be used again in the near future. There are two general approaches to prefetching these early-invalidated lines. The first and more intuitive way is to take advantage of normal coherence transactions. This opportunity arises when the modified copy of a line is transferred from the owner's cache to the requester in response to a read miss, or when the line is evicted from the owner's cache. The second and more aggressive approach is to give up ownership earlier by predicting the owner's last modification of the line. The early-relinquished line can then be selectively broadcast to the processors where it was recently invalidated.
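The first approach can be sketched as a simple snooping simulation: each cache remembers the addresses it lost to invalidation, and when a later coherence transaction carries data for such an address across the bus, the cache "snarfs" a copy off the transfer. This is only a toy model under assumed semantics (no states, timing, or write-backs); the names `SnarfingCache`, `Bus`, and `snoop` are illustrative.

```python
class SnarfingCache:
    """Toy cache that prefetches (snarfs) lines it lost to invalidation
    when those lines later appear on the shared bus."""
    def __init__(self, cpu_id, bus):
        self.cpu_id = cpu_id
        self.lines = set()
        self.early_invalidated = set()   # prefetch candidates
        self.snarfed = 0
        bus.caches.append(self)

    def invalidate(self, addr):
        """Involuntary give-up: record the line as a prefetch target."""
        if addr in self.lines:
            self.lines.discard(addr)
            self.early_invalidated.add(addr)

    def snoop(self, addr):
        """Observe a coherence transaction carrying data for addr."""
        if addr in self.early_invalidated:
            self.lines.add(addr)          # snarf the line off the bus
            self.early_invalidated.discard(addr)
            self.snarfed += 1

class Bus:
    def __init__(self):
        self.caches = []

    def transfer(self, owner, addr):
        """Owner supplies addr (e.g., a cache-to-cache transfer on a read
        miss); every other cache snoops the transaction."""
        for c in self.caches:
            if c is not owner:
                c.snoop(addr)

bus = Bus()
p0, p1, p2 = (SnarfingCache(i, bus) for i in range(3))
p0.lines.add(0x40)
p2.lines.add(0x40)
# p1 writes line 0x40: p0 and p2 are invalidated involuntarily.
p0.invalidate(0x40)
p2.invalidate(0x40)
p1.lines.add(0x40)
# Later, p2 read-misses on 0x40; p1 supplies it over the bus, and p0
# snarfs the same transfer, avoiding a future sharing miss of its own.
bus.transfer(p1, 0x40)
print(0x40 in p0.lines, p0.snarfed)   # -> True 1
```

A single bus transaction thus refills every cache that had recently lost the line, at no extra bus traffic, which is what makes the coherence-transaction approach attractive.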

Papers and Presentations:

  1. L. Peng, J-K. Peir, and K. Lai, A New Address-Free Memory Hierarchy Layer for Zero-Cycle Load, Journal of Instruction-Level Parallelism, Vol. 6, Sep. 2004.
  2. L. Peng, J-K. Peir, and K. Lai, Signature Buffer: Bridging Performance Gap between Registers and Caches, 10th Int'l Symp. on High Performance Computer Architecture, (HPCA-10), Feb. 2004.
  3. L. Peng, J-K. Peir, Q. Ma, and K. Lai, Address-Free Memory Access Based on Program Syntax Correlation of Loads and Stores, IEEE Transactions on VLSI Systems, Vol. 11(3), June 2003.
  4. J-K. Peir, S. Lai, S. Lu, J. Stark, and K. Lai, Bloom Filtering Cache Miss for Accurate Data Speculation and Prefetching, Int'l Conf. on Supercomputing, New York, NY, June 2002.
  5. S. Lai, S. Lu, K. Lai, and J-K. Peir, Ditto Processor, Int'l Conf. on Dependable Systems and Networks, Washington, DC, June 2002.
  6. B. Chung, J. Zhang, J-K. Peir, S. Lai, K. Lai, Direct Load: Dependence-Linked Dataflow Resolution of Load Address and Cache Coordinate, 34th Int'l Symp. on Microarchitecture, Austin, TX, Nov. 2001.
  7. Q. Ma, J-K. Peir, L. Peng, and K. Lai, Symbolic Cache: Fast Memory Access Based on Program Syntax Correlation of Loads and Stores, Best Paper Award, IEEE 2001 Int'l Conf. on Computer Design, Austin, TX, Sep. 2001.
  8. B. Chung, Y. Lee, J.-K. Peir, and K. Lai, Two-Phase Write-Posting on Symmetric Multiprocessors, 2001 Int'l Conf. on Parallel and Distributed Processing Techniques and Applications, June 2001.
  9. J-K. Peir, J. Zhang, S. Zhang, S. Robinson, K. Lai, and W. Wang, Predictive Multiprocessor Caching: Read/Write Snarfing, Preown, and Selective Write Broadcast, 9th Workshop on Scalable Shared Memory Multiprocessors, Vancouver, Canada, June 2000.
  10. J-K. Peir, W. W. Hsu, H. Young, and S. Ong, Improving Cache Performance with Full-Map Block Directory, Journal of Systems Architecture, Vol. 46, 2000, pp. 439-454.