CnC-PRAC: Coalesce, not Cache, Per Row Activation Counts for an Efficient in-DRAM Rowhammer Mitigation
- URL: http://arxiv.org/abs/2506.11970v1
- Date: Fri, 13 Jun 2025 17:28:38 GMT
- Title: CnC-PRAC: Coalesce, not Cache, Per Row Activation Counts for an Efficient in-DRAM Rowhammer Mitigation
- Authors: Chris S. Lin, Jeonghyun Woo, Prashant J. Nair, Gururaj Saileshwar
- Abstract summary: JEDEC has introduced the Per Row Activation Counting (PRAC) framework for DDR5 and future DRAMs. We propose CnC-PRAC, a PRAC implementation that addresses both performance and energy overheads.
- Score: 4.040475373859059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: JEDEC has introduced the Per Row Activation Counting (PRAC) framework for DDR5 and future DRAMs to enable precise counting of DRAM row activations using per-row activation counts. While recent PRAC implementations enable holistic mitigation of Rowhammer attacks, they impose slowdowns of up to 10% due to the increased DRAM timings for performing a read-modify-write of the counter. Alternatively, recent work, Chronus, addresses these slowdowns, but incurs energy overheads due to the additional DRAM activations for counters. In this paper, we propose CnC-PRAC, a PRAC implementation that addresses both performance and energy overheads. Unlike prior works focusing on caching activation counts to reduce their overheads, our key idea is to reorder and coalesce accesses to activation counts located in the same physical row. Our design achieves this by decoupling counter access from the critical path of data accesses. This enables optimizations such as buffering counter read-modify-write requests and coalescing requests to the same row. Together, these enable a reduction in row activations for counter accesses by almost 75%-83% compared to state-of-the-art solutions like Chronus and enable a PRAC implementation with negligible slowdown and a minimal dynamic energy overhead of 0.84%-1% compared to insecure DDR5 DRAM.
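The abstract's key idea, reordering and coalescing counter accesses that map to the same physical counter row, can be illustrated with a minimal simulation. The sketch below is a hypothetical model, not the paper's implementation: the buffer size and the number of activation counters packed per counter row are assumed values chosen for illustration, and the reduction it reports depends entirely on the synthetic trace, not on the 75%-83% figure from the paper.

```python
from collections import defaultdict, deque

# Hypothetical sketch of the coalescing idea: rather than issuing one
# counter read-modify-write per data-row activation, buffer pending
# counter updates and coalesce those whose counters reside in the same
# physical counter row, so a single counter-row activation services
# many updates. COUNTERS_PER_ROW and buffer_capacity are illustrative
# assumptions, not values from the paper.

COUNTERS_PER_ROW = 128  # assumed: counters for 128 data rows share one counter row

def counter_row_activations(trace, buffer_capacity=64):
    """Compare counter-row activations with and without coalescing.

    trace: sequence of activated data-row indices.
    Returns (naive, coalesced) counter-row activation counts.
    """
    naive = len(trace)  # one counter RMW per data-row activation
    coalesced = 0
    pending = deque(trace)
    while pending:
        # Drain up to buffer_capacity buffered updates at once.
        n = min(buffer_capacity, len(pending))
        batch = [pending.popleft() for _ in range(n)]
        # Group updates by the counter row their counter lives in.
        groups = defaultdict(int)
        for data_row in batch:
            groups[data_row // COUNTERS_PER_ROW] += 1
        # One activation per distinct counter row touched by the batch.
        coalesced += len(groups)
    return naive, coalesced

# Synthetic trace: 1024 activations cycling over 256 data rows, i.e.,
# heavy locality within a few counter rows.
trace = [r % 256 for r in range(1024)]
naive, coalesced = counter_row_activations(trace)
```

On this locality-heavy synthetic trace, each 64-update batch falls entirely within one counter row, so 1024 counter RMWs collapse into 16 counter-row activations. Real savings depend on the workload's row locality and the buffering policy.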
Related papers
- Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction [58.044803442346115]
Diffusion Large Language Models (dLLMs) enable breakthroughs in reasoning and parallel decoding but suffer from prohibitive computational complexity and memory overhead during inference. We propose Sparse-dLLM, the first training-free framework integrating dynamic cache eviction with sparse attention via delayed bidirectional sparse caching.
arXiv Detail & Related papers (2025-08-04T16:14:03Z) - Spark Transformer: Reactivating Sparsity in FFN and Attention [63.20677098823873]
We introduce Spark Transformer, a novel architecture that achieves a high level of activation sparsity in both the FFN and the attention mechanism. This sparsity translates to a 2.5x reduction in FLOPs, leading to decoding wall-time speedups of up to 1.79x on CPU and 1.40x on GPU.
arXiv Detail & Related papers (2025-06-07T03:51:13Z) - Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers [58.98923344096319]
REFORM is a novel inference framework that efficiently handles long contexts through a two-phase approach. It achieves over 50% and 27% performance gains on RULER and BABILong respectively at 1M context length. It also outperforms baselines on Infinite-Bench and MM-NIAH, demonstrating flexibility across diverse tasks and domains.
arXiv Detail & Related papers (2025-06-01T23:49:14Z) - When Mitigations Backfire: Timing Channel Attacks and Defense for PRAC-Based RowHammer Mitigations [4.040475373859059]
We present Timing-Safe PRAC (TPRAC), a defense that eliminates PRAC-induced timing channels without compromising RH mitigation efficacy. Our evaluations demonstrate that TPRAC closes timing channels while incurring only 3.4% performance overhead at the RH threshold of 1024.
arXiv Detail & Related papers (2025-05-15T09:28:46Z) - Chronus: Understanding and Securing the Cutting-Edge Industry Solutions to DRAM Read Disturbance [6.220002579079846]
We present the first rigorous security, performance, energy, and cost analyses of the state-of-the-art on-DRAM-die read disturbance mitigation method. We propose a new on-DRAM-die RowHammer mitigation mechanism, Chronus, to address PRAC's two major weaknesses.
arXiv Detail & Related papers (2025-02-18T08:54:49Z) - APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs [81.5049387116454]
We introduce APB, an efficient long-context inference framework. APB uses multi-host approximate attention to enhance prefill speed. APB achieves speeds of up to 9.2x, 4.2x, and 1.6x compared with FlashAttn, RingAttn, and StarAttn, respectively.
arXiv Detail & Related papers (2025-02-17T17:59:56Z) - QPRAC: Towards Secure and Practical PRAC-based Rowhammer Mitigation using Priority Queues [4.3423142741332255]
JEDEC has introduced the Per Row Activation Counting (PRAC) framework for DDR5 and future DRAMs. PRAC enables a holistic mitigation of Rowhammer attacks even at ultra-low Rowhammer thresholds. This paper provides the first secure, scalable, and practical RowHammer solution using the PRAC framework.
arXiv Detail & Related papers (2025-01-31T02:48:20Z) - DAPPER: A Performance-Attack-Resilient Tracker for RowHammer Defense [1.1816942730023883]
RowHammer vulnerabilities pose a significant threat to modern DRAM-based systems. Perf-Attacks exploit shared structures to reduce DRAM bandwidth for co-running benign applications. We propose secure hashing mechanisms to thwart adversarial attempts to capture the mapping of shared structures.
arXiv Detail & Related papers (2025-01-31T02:38:53Z) - Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks [60.54852710216738]
We introduce a novel digital twin-assisted optimization framework, called D-REC, to ensure reliable caching in nextG wireless networks.
By incorporating reliability modules into a constrained decision process, D-REC can adaptively adjust actions, rewards, and states to comply with advantageous constraints.
arXiv Detail & Related papers (2024-06-29T02:40:28Z) - RelayAttention for Efficient Large Language Model Serving with Long System Prompts [59.50256661158862]
This paper aims to improve the efficiency of LLM services that involve long system prompts.
handling these system prompts requires heavily redundant memory accesses in existing causal attention algorithms.
We propose RelayAttention, an attention algorithm that allows reading hidden states from DRAM exactly once for a batch of input tokens.
arXiv Detail & Related papers (2024-02-22T18:58:28Z) - MAC-DO: An Efficient Output-Stationary GEMM Accelerator for CNNs Using DRAM Technology [2.918940961856197]
This paper presents MAC-DO, an efficient and low-power DRAM-based in-situ accelerator.
It supports a multi-bit multiply-accumulate (MAC) operation within a single cycle.
A MAC-DO array can efficiently accelerate matrix multiplications using output-stationary mapping, supporting the majority of computations performed in deep neural networks (DNNs).
arXiv Detail & Related papers (2022-07-16T07:33:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.