Related papers: PhD Forum: Efficient Privacy-Preserving Processing via Memory-Centric Computing

PhD Forum: Efficient Privacy-Preserving Processing via Memory-Centric Computing

URL: http://arxiv.org/abs/2409.16777v1
Date: Wed, 25 Sep 2024 09:37:50 GMT
Title: PhD Forum: Efficient Privacy-Preserving Processing via Memory-Centric Computing
Authors: Mpoki Mwaisela,
Abstract summary: Homomorphic encryption (HE) and secure multi-party computation (SMPC) enhance data security by enabling processing on encrypted data. Existing approaches focus on improving computational overhead using specialized hardware. We propose a framework that uses recently available PIM hardware to achieve efficient privacy-preserving computation.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Privacy-preserving computation techniques like homomorphic encryption (HE) and secure multi-party computation (SMPC) enhance data security by enabling processing on encrypted data. However, the significant computational and CPU-DRAM data movement overhead resulting from the underlying cryptographic algorithms impedes the adoption of these techniques in practice. Existing approaches focus on improving computational overhead using specialized hardware like GPUs and FPGAs, but these methods still suffer from the same processor-DRAM overhead. Novel hardware technologies that support in-memory processing have the potential to address this problem. Memory-centric computing, or processing-in-memory (PIM), brings computation closer to data by introducing low-power processors called data processing units (DPUs) into memory. Besides its in-memory computation capability, PIM provides extensive parallelism, resulting in significant performance improvement over state-of-the-art approaches. We propose a framework that uses recently available PIM hardware to achieve efficient privacy-preserving computation. Our design consists of a four-layer architecture: (1) an application layer that decouples privacy-preserving applications from the underlying protocols and hardware; (2) a protocol layer that implements existing secure computation protocols (HE and MPC); (3) a data orchestration layer that leverages data compression techniques to mitigate the data transfer overhead between DPUs and host memory; (4) a computation layer which implements DPU kernels on which secure computation algorithms are built.

Related papers

Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems [54.045712360156024]
racetrack memory is a non-volatile technology that allows high data density fabrication.<n>In-memory arithmetic circuits with memory cells affects both the memory density and power efficiency.<n>We present an efficient in-memory convolutional neural network (CNN) accelerator optimized for use with racetrack memory.
arXiv Detail & Related papers (2025-07-02T07:29:53Z)
CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing [8.114331115730021]
Homomorphic encryption (HE) allows secure computation on encrypted data without revealing the original data. Many cloud computing applications (e.g., DNA read mapping, biometric matching, web search) use exact string matching as a key operation. Prior string matching algorithms that use homomorphic encryption are limited by high computational latency.
arXiv Detail & Related papers (2025-03-12T00:25:58Z)
Enabling Low-Cost Secure Computing on Untrusted In-Memory Architectures [5.565715369147691]
Processing-in-Memory (PIM) promises to substantially improve performance by moving processing closer to the data. Integrating PIM modules within a secure computing system raises an interesting challenge: unencrypted data has to move off-chip to the PIM, exposing the data to attackers and breaking assumptions on Trusted Computing Bases (TCBs) This paperleverages multi-party computation (MPC) techniques, specifically arithmetic secret sharing and Yao's garbled circuits, to outsource bandwidth-intensive computation securely to PIM.
arXiv Detail & Related papers (2025-01-28T20:48:14Z)
Evaluating the Potential of In-Memory Processing to Accelerate Homomorphic Encryption [1.5707609236065612]
homomorphic encryption (HE) allows computation without the need for decryption. The high computational and memory overhead associated with the underlying cryptographic operations has hindered the practicality of HE-based solutions. processing in-memory (PIM) presents a promising solution to this problem by bringing computation closer to data, thereby reducing the overhead resulting from processor-memory data movements.
arXiv Detail & Related papers (2024-12-12T10:28:58Z)
Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation [7.204881999658682]
Inference for Large Language Models (LLMs) is computationally demanding. To reduce the cost of auto-regressive decoding, Key-Value ( KV) caching is used to store intermediate activations. The memory required for KV caching grows rapidly, often exceeding the capacity of GPU memory. A cost-effective alternative is to offload KV cache to CPU memory, which alleviates GPU memory pressure but shifts the bottleneck to the limited bandwidth of the PCIe connection between the CPU and GPU.
arXiv Detail & Related papers (2024-11-26T04:03:14Z)
CHIME: Energy-Efficient STT-RAM-based Concurrent Hierarchical In-Memory Processing [1.5566524830295307]
This paper introduces a novel PiC/PiM architecture, Concurrent Hierarchical In-Memory Processing (CHIME) CHIME strategically incorporates heterogeneous compute units across multiple levels of the memory hierarchy. Experiments reveal that, compared to the state-of-the-art bit-line computing approaches, CHIME achieves significant speedup and energy savings of 57.95% and 78.23%.
arXiv Detail & Related papers (2024-07-29T01:17:54Z)
Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges. We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs. This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption [9.884698447131374]
Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption. FHE is significantly slower than computation on plain data due to the increase in data size after encryption. We propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing in-memory architecture.
arXiv Detail & Related papers (2023-11-27T20:11:38Z)
SOCI^+: An Enhanced Toolkit for Secure OutsourcedComputation on Integers [50.608828039206365]
We propose SOCI+ which significantly improves the performance of SOCI. SOCI+ employs a novel (2, 2)-threshold Paillier cryptosystem with fast encryption and decryption as its cryptographic primitive. Compared with SOCI, our experimental evaluation shows that SOCI+ is up to 5.4 times more efficient in computation and 40% less in communication overhead.
arXiv Detail & Related papers (2023-09-27T05:19:32Z)
Constant Memory Attention Block [74.38724530521277]
Constant Memory Attention Block (CMAB) is a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation. We show our proposed methods achieve results competitive with state-of-the-art while being significantly more memory efficient.
arXiv Detail & Related papers (2023-06-21T22:41:58Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose the kernel development in two steps: 1) Expressing the computational core using Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases [9.927754948343326]
processing-in-memory (PIM) is a promising execution paradigm that alleviates the data movement bottleneck in modern applications. In this paper, we show how to take advantage of the PIM paradigm for two modern data-intensive applications.
arXiv Detail & Related papers (2022-05-29T13:43:17Z)
Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms. We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
In-memory Implementation of On-chip Trainable and Scalable ANN for AI/ML Applications [0.0]
This paper presents an in-memory computing architecture for ANN enabling artificial intelligence (AI) and machine learning (ML) applications. Our novel on-chip training and inference in-memory architecture reduces energy cost and enhances throughput by simultaneously accessing the multiple rows of array per precharge cycle. The proposed architecture was trained and tested on the IRIS dataset which exhibits $46times$ more energy efficient per MAC (multiply and accumulate) operation compared to earlier classifiers.
arXiv Detail & Related papers (2020-05-19T15:36:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.