Department of Electrical and Electronics Engineering
Permanent URI for this collection: http://localhost:4000/handle/123456789/1925
An adaptive coherence protocol with adaptive cache for multi-core architectures (IEEE, 2013)
Chaturvedi, Nitin
Next-generation multicore processors and their applications will process massive amounts of data with significant sharing. Data movement between the cores and the shared cache hierarchy, and the management of that movement, affect memory access latency and consume power. The efficiency of high-performance shared-memory multicore processors depends on the design of the on-chip cache hierarchy and the coherence protocol. Current multicore cache hierarchies use a fixed cache block size, both in the cache organization and in the design of the coherence protocol. This fixed block size is chosen to match the average spatial locality requirement across a range of applications, but it also wastes bandwidth through unnecessary coherence traffic for shared data. The additional bandwidth directly increases overall energy consumption. In this paper, we present a new adaptable and implementable cache design, together with a novel cache coherence protocol, that eliminates unnecessary coherence traffic and matches data movement to an application's spatial locality.

An Adaptive Block Pinning Cache for Reducing Network Traffic in Multi-core Architectures (IEEE, 2013)
Chaturvedi, Nitin
With the advent of new technologies, multi-core processor (CMP) cache sizes are increasing exponentially; together with growing on-chip wire delays, this makes it difficult to implement traditional caches with a single, uniform access latency. Non-Uniform Cache Architecture (NUCA) designs have been proposed to address this issue. A NUCA partitions the cache into multiple smaller banks, so that banks near a processor core have lower access latencies than those farther away, thereby reducing the effect of the cache's internal wire delays. Traditionally, NUCA organizations have been classified as static (S-NUCA) or dynamic (D-NUCA).
While S-NUCA maps each data block to a unique bank in the NUCA cache, D-NUCA allows a data block to be mapped to multiple banks. In D-NUCA designs, data blocks can migrate towards the processor core that accesses them most frequently, and this migration increases network traffic. The short lifetime of data blocks and the low spatial locality of many applications cause blocks to be evicted while some of their words were never used. This effectively increases the miss rate and wastes on-chip network bandwidth. Transferring unused words also wastes a large fraction of on-chip energy. In this paper, we present an efficient and implementable cache design that eliminates unnecessary coherence traffic and matches data movement to an application's spatial locality. The paper also presents one way to scale on-chip coherence with cost-effective techniques such as shared caches augmented to track cached copies, explicit eviction notification, and hierarchical design. Based on our scalability analysis of this cache design, we predict that it consistently reduces the miss rate and improves the fraction of transmitted data that the application actually uses.

Selective Cache Line Replication Scheme in Shared Last Level Cache (Elsevier, 2015)
Chaturvedi, Nitin
In current multi-core systems, where the shared last level cache (LLC) is physically distributed across all the cores, both the initial placement of data and its subsequent placement close to the requesting core can reduce memory access latency and power consumption. This paper extends a replication scheme that balances access latency against cache capacity in shared NUCA designs by selectively replicating frequently used cache lines close to the requesting cores.
Our scheme reduces completion time by 15% and energy consumption by 27% compared to the Static-NUCA (S-NUCA) management scheme, when simulated on an eight-core system.

An Efficient Data Access Policy in shared Last Level Cache (World Scientific, 2015)
Chaturvedi, Nitin
Future multi-core systems will execute massive, memory-intensive applications with significant data sharing. On-chip memory latency increases further as more cores are added, because the diameter of most on-chip networks grows with the number of cores. This makes it difficult to implement caches with a single, uniform access latency and leads to non-uniform cache architectures (NUCA). Data movement and its management further affect memory access latency and consume power. We observed that previous D-NUCA designs have used a costly data access scheme to search for data in the NUCA cache in order to obtain significant performance benefits. In this paper, we propose an efficient and implementable data access algorithm for D-NUCA designs that uses a set of pointers with each bank. Our scheme relies on low-overhead and highly accurate in-hardware pointers to reduce miss latency and on-chip network contention. Using simulations of an 8-core multicore system, we show that our proposed data search mechanism for D-NUCA reduces the dynamic energy consumed per memory request by 40% and outperforms the multicast access policy with an average speedup of 6%.
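The S-NUCA/D-NUCA distinction and the contrast between multicast search and pointer-directed search described in the abstracts above can be illustrated with a toy model. The Python sketch below is not the author's design: the bank count, bankset size, modulo mapping, placement of new blocks in the farthest bank, one-step promotion on a hit, unlimited bank capacity, and the `pointers` table are all hypothetical simplifications. It only counts bank probes per lookup and does not model latency, energy, or coherence.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical parameters, not taken from the papers above.
NUM_BANKS = 8      # total NUCA banks
BANKSET_SIZE = 4   # banks a D-NUCA block may occupy (index 0 = closest to the core)


def snuca_bank(addr: int) -> int:
    """S-NUCA: each block maps statically to exactly one bank (modulo interleaving)."""
    return addr % NUM_BANKS


def dnuca_bankset(addr: int) -> list[int]:
    """D-NUCA: a block may reside in any bank of its bankset (contiguous groups assumed)."""
    first = (addr % (NUM_BANKS // BANKSET_SIZE)) * BANKSET_SIZE
    return list(range(first, first + BANKSET_SIZE))


@dataclass
class DNuca:
    # banks[b] holds the block addresses resident in bank b (capacity is not modelled)
    banks: list[set[int]] = field(default_factory=lambda: [set() for _ in range(NUM_BANKS)])
    # hypothetical pointer table: block address -> bank currently holding the block
    pointers: dict[int, int] = field(default_factory=dict)

    def insert(self, addr: int) -> None:
        """Place a new block in the farthest bank of its bankset (an assumed policy)."""
        bank = dnuca_bankset(addr)[-1]
        self.banks[bank].add(addr)
        self.pointers[addr] = bank

    def multicast_lookup(self, addr: int) -> tuple[Optional[int], int]:
        """Probe every bank in the bankset; return (hit bank or None, probes issued)."""
        hit, probes = None, 0
        for bank in dnuca_bankset(addr):
            probes += 1  # no early exit: a multicast probes all banks in parallel
            if addr in self.banks[bank]:
                hit = bank
        return hit, probes

    def pointer_lookup(self, addr: int) -> tuple[Optional[int], int]:
        """Follow the pointer and probe only that bank; no pointer means a miss."""
        bank = self.pointers.get(addr)
        if bank is None:
            return None, 0
        return (bank if addr in self.banks[bank] else None), 1

    def promote(self, addr: int) -> None:
        """On a hit, migrate the block one bank closer to the core and update its pointer."""
        bankset = dnuca_bankset(addr)
        bank = self.pointers[addr]
        pos = bankset.index(bank)
        if pos > 0:
            closer = bankset[pos - 1]
            self.banks[bank].remove(addr)
            self.banks[closer].add(addr)
            self.pointers[addr] = closer


if __name__ == "__main__":
    nuca = DNuca()
    nuca.insert(42)                   # bankset [0, 1, 2, 3]; placed in bank 3
    print(snuca_bank(42))             # 2 (the single S-NUCA home bank)
    print(nuca.multicast_lookup(42))  # (3, 4)
    print(nuca.pointer_lookup(42))    # (3, 1)
    nuca.promote(42)
    print(nuca.pointer_lookup(42))    # (2, 1)
```

In this model a multicast lookup always costs one probe per bank in the bankset, while an accurate pointer reduces a hit to a single probe and lets a miss be detected without probing any bank. Real in-hardware pointers are finite and may be stale, which this sketch deliberately ignores.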