DRAMatic Speedup: Accelerating HE Operations on a Processing-in-Memory System
- URL: http://arxiv.org/abs/2602.12433v1
- Date: Thu, 12 Feb 2026 21:45:15 GMT
- Title: DRAMatic Speedup: Accelerating HE Operations on a Processing-in-Memory System
- Authors: Niklas Klinger, Jonas Sander, Peterson Yuhala, Pascal Felber, Thomas Eisenbarth,
- Abstract summary: Homomorphic encryption (HE) is a promising technology for confidential cloud computing.<n> processing-in-Memory (PIM) is an alternative hardware architecture that integrates processing units and memory on the same chip or memory module.<n>We present DRAMatic, which implements operations foundational to HE on UPMEM's programmable, general-purpose PIM system.
- Score: 4.464102544889846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Homomorphic encryption (HE) is a promising technology for confidential cloud computing, as it allows computations on encrypted data. However, HE is computationally expensive and often memory-bound on conventional computer architectures. Processing-in-Memory (PIM) is an alternative hardware architecture that integrates processing units and memory on the same chip or memory module. PIM enables higher memory bandwidth than conventional architectures and could thus be suitable for accelerating HE. In this work, we present DRAMatic, which implements operations foundational to HE on UPMEM's programmable, general-purpose PIM system, and evaluate its performance. DRAMatic incorporates many arithmetic optimizations, including residue number system and number-theoretic transform techniques, and can support the large parameters required for secure homomorphic evaluations. To compare performance, we evaluate DRAMatic against Microsoft SEAL, a popular open-source HE library, regarding both runtime and energy efficiency. The results show that DRAMatic significantly closes the gap between UPMEM PIM and Microsoft SEAL. However, we also show that DRAMatic is currently constrained by UPMEM PIM's multiplication performance and data transfer overhead. Finally, we discuss potential hardware extensions to UPMEM PIM.
Related papers
- From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents [78.30630000529133]
We propose MM-Mem, a pyramidal multimodal memory architecture grounded in Fuzzy-Trace Theory.<n> MM-Mem memory structures hierarchically into a Sensory Buffer, Episodic Stream, and Symbolic.<n>Experiments confirm the effectiveness of MM-Mem on both offline and streaming tasks.
arXiv Detail & Related papers (2026-03-02T05:12:45Z) - Memory Caching: RNNs with Growing Memory [56.25483647131372]
We introduce Memory Caching (MC), a technique that enhances recurrent models by caching checkpoints of memory states (a.k.a. hidden states)<n>We propose four variants of MC, including gated aggregation and sparse selective mechanisms, and discuss their implications on both linear and deep memory modules.<n>The results indicate that while Transformers achieve the best accuracy, our MC variants show competitive performance, close the gap with Transformers, and performs better than state-of-the-art recurrent models.
arXiv Detail & Related papers (2026-02-27T18:53:41Z) - FAME: FPGA Acceleration of Secure Matrix Multiplication with Homomorphic Encryption [11.342625695057281]
Homomorphic Encryption (HE) enables secure computation on encrypted data, addressing privacy concerns in cloud computing.<n>Accelerating homomorphic encrypted MM (HE MM) is therefore crucial for applications such as privacy-preserving machine learning.<n>We present a bandwidth-efficient FPGA implementation of HE MM.<n>FAME is the first FPGA-based accelerator specifically tailored for HE MM.
arXiv Detail & Related papers (2025-12-17T15:09:36Z) - Sangam: Chiplet-Based DRAM-PIM Accelerator with CXL Integration for LLM Inferencing [2.9665163298601342]
Inference, particularly the decoding phase, is dominated by memory-bound GEMV or flat GEMM operations.<n>Existing in/near-memory solutions face critical limitations such as reduced memory capacity.<n>This work presents a chiplet-based memory module that addresses these limitations.
arXiv Detail & Related papers (2025-11-15T16:39:51Z) - Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation [108.0657508755532]
We introduce the Gated Memory Unit (GMU), a simple yet effective mechanism for efficient memory sharing across layers.<n>We apply it to create SambaY, a decoder-hybrid-decoder architecture that incorporates GMUs to share memory readout states from a Samba-based self-decoder.
arXiv Detail & Related papers (2025-07-09T07:27:00Z) - MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [84.62985963113245]
We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks.<n>At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning.<n>We show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task.
arXiv Detail & Related papers (2025-06-18T19:44:46Z) - EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE that surpasses the existing parallelism schemes.<n>Our results demonstrate at most 52.4% improvement in prefill throughput compared to existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z) - B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within an composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z) - Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z) - DAISM: Digital Approximate In-SRAM Multiplier-based Accelerator for DNN
Training and Inference [4.718504401468233]
PIM solutions rely either on novel memory technologies that have yet to mature or bit-serial computations that have significant performance overhead and scalability issues.
Our work proposes an in-SRAM digital multiplier, that uses a conventional memory to perform bit-parallel computations, leveraging multiple wordlines activation.
We then introduce DAISM, an architecture leveraging this multiplier, which achieves up to two orders of magnitude higher area efficiency compared to the SOTA counterparts, with competitive energy efficiency.
arXiv Detail & Related papers (2023-05-12T10:58:21Z) - PIM-DRAM:Accelerating Machine Learning Workloads using Processing in
Memory based on DRAM Technology [2.6168147530506958]
We propose a processing-in-memory (PIM) multiplication primitive to accelerate matrix vector operations in ML workloads.
We show that the proposed architecture, mapping, and data flow can provide up to 23x and 6.5x benefits over a GPU.
arXiv Detail & Related papers (2021-05-08T16:39:24Z) - GradPIM: A Practical Processing-in-DRAM Architecture for Gradient
Descent [17.798991516056454]
We present GradPIM, a processing-in-memory architecture which accelerates parameter updates of deep neural networks training.
Extending DDR4 SDRAM to utilize bank-group parallelism makes our operation designs in processing-in-memory (PIM) module efficient in terms of hardware cost and performance.
arXiv Detail & Related papers (2021-02-15T12:25:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.