Abstract:
Future multicore systems will execute massive, memory-intensive applications with significant data
sharing. On-chip memory latency grows as more cores are added, because the diameter of most on-chip
networks increases with the number of cores. This makes it difficult to implement caches with a single,
uniform access latency and has led to non-uniform cache architectures (NUCA). Moving data and managing
its placement further increase memory access latency and consume power. We observe that previous
dynamic NUCA (D-NUCA) designs rely on costly data access schemes to search for data in the NUCA cache
in order to obtain significant performance benefits. In this paper, we propose an efficient and
implementable data access algorithm for D-NUCA designs that uses a set of pointers associated with each
bank. Our scheme relies on low-overhead, highly accurate in-hardware pointers to reduce miss latency and
on-chip network contention. Using simulations of an 8-core multicore system, we show that the proposed
data search mechanism reduces the dynamic energy consumed per memory request by 40% and outperforms
the multicast access policy with an average speedup of 6%.
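The abstract does not describe how the per-bank pointers are organized. As a rough illustration of why pointer-directed search can reduce bank probes compared with multicast search, the toy model below assumes, hypothetically, that every block has a static home bank (chosen here by low-order address bits) that stores a pointer to the bank currently holding the block. The class `BankSet` and its methods are invented for this sketch and are not the paper's design; the pointer table is unbounded and evictions are not modeled, and the number of banks probed is only a crude proxy for dynamic energy.

```python
class BankSet:
    """Toy D-NUCA bankset: a block may migrate to any of num_banks banks."""

    def __init__(self, num_banks: int):
        self.num_banks = num_banks
        self.banks = [set() for _ in range(num_banks)]
        # Hypothetical per-bank pointer table: block -> bank currently holding it.
        self.pointers = [dict() for _ in range(num_banks)]

    def home(self, block: int) -> int:
        # Assumption: the static home bank is selected by low-order address bits.
        return block % self.num_banks

    def place(self, block: int, bank: int) -> None:
        # Move the block to `bank` and update the pointer kept at its home bank.
        for b in self.banks:
            b.discard(block)
        self.banks[bank].add(block)
        self.pointers[self.home(block)][block] = bank

    def multicast_lookup(self, block: int):
        """Probe every bank in the bankset; return (hit bank or None, banks probed)."""
        hit = None
        for i, b in enumerate(self.banks):
            if block in b:
                hit = i
        return hit, self.num_banks

    def pointer_lookup(self, block: int):
        """Consult the home bank's pointer, then probe at most one more bank."""
        h = self.home(block)
        target = self.pointers[h].get(block)
        if target is None:
            return None, 1  # miss is known after probing only the home bank
        return target, (1 if target == h else 2)


bs = BankSet(8)
bs.place(13, 2)                  # block 13 (home bank 5) has migrated to bank 2
print(bs.multicast_lookup(13))   # (2, 8): all eight banks probed
print(bs.pointer_lookup(13))     # (2, 2): home bank, then bank 2
print(bs.pointer_lookup(99))     # (None, 1): miss detected at the home bank
```

In this simplified model a multicast search always touches every bank in the bankset, while the pointer-directed search touches at most two banks on a hit and one on a miss. A real design would also have to bound the pointer storage and keep pointers consistent as blocks migrate and are evicted, which this sketch ignores.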