BITS Faculty Publications
Permanent URI for this community: http://localhost:4000/handle/123456789/1867
4 results
Search Results
Item: Adaptive Zone-Aware Multi-bank on Chip last level L2 Cache Partitioning for Chip Multiprocessors (IJCA, 2010) Chaturvedi, Nitin
This paper proposes a novel, efficient Non-Uniform Cache Architecture (NUCA) scheme for the Last-Level Cache (LLC) to reduce the average on-chip access latency and improve core isolation in Chip Multiprocessors (CMPs). The proposed architecture is expected to improve upon earlier NUCA schemes such as S-NUCA, D-NUCA and SP-NUCA [9][10][5] in terms of average access latency without a significant reduction in the hit rate. The complete set of L2 banks is divided into zones. Each core belongs to the zone closest to it, so adjacent cores are grouped into the same zone. Each zone individually follows the SP-NUCA scheme [5] for maintaining core isolation and sharing common blocks. However, blocks that need to be shared by cores belonging to different zones are replicated. This scheme is much more scalable than the SP-NUCA scheme and bounds the maximum on-chip access latency to a lower value as the number of cores increases.
Item: Study of Various Factors Affecting Performance of Multi-Core Processors (IJDPS, 2013-07) Chaturvedi, Nitin
Advances in integrated circuit processing allow for more microprocessor design options. As Chip Multiprocessor (CMP) systems become the predominant topology for leading microprocessors, critical components of the system are now integrated on a single chip. This enables sharing of computation resources that was not previously possible. In addition, the virtualization of these computation resources exposes the system to a mix of diverse and competing workloads. On-chip cache memory is a resource of primary concern, as it can be dominant in controlling overall throughput.
This paper analyzes various parameters affecting the performance of multi-core architectures, such as the number of cores and the L2 cache size. We have further varied the directory size from 64 to 2048 entries on 4-node, 8-node, 16-node and 64-node chip multiprocessors. This in turn presents an open area of research on multicore processors with private/shared last-level caches, as the future trend seems to be towards tiled architectures executing multiple parallel applications with optimized silicon area utilization and excellent performance.
Item: An adaptive migration–replication scheme (AMR) for shared cache in chip multiprocessors (Springer, 2015-07) Chaturvedi, Nitin
Most of today’s chip multiprocessors implement last-level shared caches as non-uniform cache architectures. A major problem faced by such multicore architectures is cache line placement, especially in scenarios where multiple cores compete for line usage in the single non-uniform shared L2 cache. Block migration has been suggested to overcome the problem of optimum placement of cache blocks. Previous research, however, shows that an uncontrolled block migration scheme leads to scenarios where a cache line ‘ping-pongs’ between two requesting cores, resulting in higher access latency for both requestors and greater power dissipation. To address this problem, this paper first proposes a mechanism to dynamically profile data block usage from different cores on the chip. We then propose an adaptive migration–replication scheme for shared last-level non-uniform cache architectures that adapts between selectively replicating frequently used cache lines near the requesting cores and migrating cache lines towards the requesting core when requests are fewer. AMR eliminates ‘ping-ponging’ of cache lines between the banks of the requesting cores.
However, any mechanism that dynamically adapts between migration and replication at runtime is bound to have a complex search scheme to locate data blocks. To simplify the data lookup policy, this work also presents an efficient data access mechanism for non-uniform cache architectures. Our proposal relies on low-overhead and highly accurate in-hardware pointers to keep track of the on-chip location of each cache block. We show that our proposed scheme reduces completion time by 12.25, 8.1 and 3 % on average, and energy consumption by 11.65, 8.5 and 2.1 %, when compared to the state-of-the-art last-level cache management schemes S-NUCA, D-NUCA and HK-NUCA, respectively. SPEC and PARSEC benchmarks were used to thoroughly evaluate our proposal.
Item: An efficient adaptive block pinning for multicore architectures (Elsevier, 2015-05) Chaturvedi, Nitin
Most of today’s multi-core processors feature last-level shared L2 caches. A major problem faced by such multi-core architectures is cache contention, where multiple cores compete for usage of the single shared L2 cache. Previous research shows that uncontrolled sharing leads to scenarios where one core evicts useful L2 cache content belonging to another core. To address this problem, the paper first presents a cache miss classification scheme (CII: Compulsory, Inter-processor and Intra-processor misses) for CMPs with shared caches and compares it to the 3C miss classification for a traditional uniprocessor, to provide a better understanding of the interactions between memory references of different processors at the level of the shared cache in a CMP. We then propose a novel approach, called block pinning, for eliminating inter-processor misses and reducing intra-processor misses in a shared cache. Furthermore, we show that an adaptive block pinning scheme improves on the benefits obtained by the block pinning and set pinning schemes by significantly reducing the number of off-chip accesses.
This work also proposes two different schemes for relinquishing the ownership of a block, to avoid domination of ownership by a few active cores in the multi-core system, which results in performance degradation. Extensive analysis of these approaches with SPEC and PARSEC benchmarks is performed using a full-system simulator.
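The CII classification named in the abstract above can be illustrated with a small sketch. The abstract does not define the three categories, so the definitions used here are an assumption: a miss is compulsory on the first reference to a block, intra-processor if the block was last evicted by the same core that now misses on it, and inter-processor if it was evicted by a different core. The function name `classify_misses`, the fully associative LRU cache model and the trace format are hypothetical and are not taken from the paper.

```python
from collections import OrderedDict

def classify_misses(trace, capacity):
    """Label each (core, block) access on a shared, fully associative LRU cache.

    Assumed definitions (not stated in the abstract):
      compulsory      - first reference to the block by any core
      intra-processor - block was last evicted by a miss from the same core
      inter-processor - block was last evicted by a miss from another core
    """
    cache = OrderedDict()  # resident blocks, least recently used first
    evicted_by = {}        # block -> core whose miss evicted it
    seen = set()           # blocks referenced at least once
    labels = []
    for core, block in trace:
        if block in cache:
            cache.move_to_end(block)  # refresh LRU position on a hit
            labels.append("hit")
            continue
        if block not in seen:
            labels.append("compulsory")
        elif evicted_by[block] == core:
            labels.append("intra-processor")
        else:
            labels.append("inter-processor")
        seen.add(block)
        if len(cache) >= capacity:
            victim, _ = cache.popitem(last=False)  # evict the LRU block
            evicted_by[victim] = core
        cache[block] = None
    return labels

# Two cores sharing a two-block cache: core 1 pushes out core 0's block A.
shared = classify_misses([(0, "A"), (1, "B"), (1, "C"), (0, "A")], capacity=2)
# One core thrashing its own blocks.
single = classify_misses([(0, "A"), (0, "B"), (0, "C"), (0, "A")], capacity=2)
```

Under these assumptions, the last access in `shared` is an inter-processor miss and the last access in `single` is an intra-processor miss. Inter-processor misses are the kind that block pinning is described as eliminating.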