IVE: An Accelerator for Single-Server Private Information Retrieval Using Versatile Processing Elements
- URL: http://arxiv.org/abs/2512.01574v1
- Date: Mon, 01 Dec 2025 11:47:26 GMT
- Title: IVE: An Accelerator for Single-Server Private Information Retrieval Using Versatile Processing Elements
- Authors: Sangpyo Kim, Hyesung Ji, Jongmin Kim, Wonseok Choi, Jaiyoung Park, Jung Ho Ahn
- Abstract summary: IVE is an accelerator for single-server PIR with a systematic extension that enables retrieval from large databases using DRAM. IVE achieves up to 1,275x higher throughput compared to prior PIR hardware solutions.
- Score: 4.085063539485129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Private information retrieval (PIR) is an essential cryptographic protocol for privacy-preserving applications, enabling a client to retrieve a record from a server's database without revealing which record was requested. Single-server PIR based on homomorphic encryption has particularly gained immense attention for its ease of deployment and reduced trust assumptions. However, single-server PIR remains impractical due to its high computational and memory bandwidth demands. Specifically, reading the entirety of large databases from storage, such as SSDs, severely limits its performance. To address this, we propose IVE, an accelerator for single-server PIR with a systematic extension that enables practical retrieval from large databases using DRAM. Recent advances in DRAM capacity allow PIR for large databases to be served entirely from DRAM, removing its dependence on storage bandwidth. Although the memory bandwidth bottleneck remains, multi-client batching effectively amortizes database access costs across concurrent requests to improve throughput. However, client-specific data remains a bottleneck, whose bandwidth requirements ultimately limit performance. IVE overcomes this by employing a large on-chip scratchpad with an operation scheduling algorithm that maximizes data reuse, further boosting throughput. Additionally, we introduce sysNTTU, a versatile functional unit that enhances area efficiency without sacrificing performance. We also propose a heterogeneous memory system architecture, which enables linear scaling of database sizes without throughput degradation. Consequently, IVE achieves up to 1,275x higher throughput compared to prior PIR hardware solutions.
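The batching argument in the abstract can be illustrated with a small sketch. This is a toy, plaintext model and not the IVE design or any real PIR scheme: the selection vectors are unencrypted, and the names `answer_batch`, `one_hot`, and the read counter are hypothetical, introduced only for illustration. It shows that answering a query touches every record, and that serving several clients in one pass lets each fetched record be reused across the batch.

```python
# Toy, plaintext illustration only. Real single-server PIR sends the selection
# vector encrypted under homomorphic encryption; here it is in the clear.
# All names below are hypothetical and not taken from the paper.

def answer_batch(database, queries):
    """Answer several one-hot selection queries in one pass over the database.

    database: list of integer records
    queries:  list of one-hot lists, one per client
    Returns (answers, record_reads).
    """
    answers = [0] * len(queries)
    record_reads = 0
    for i, record in enumerate(database):
        record_reads += 1                # each record is fetched once per pass
        for c, q in enumerate(queries):
            answers[c] += q[i] * record  # the fetched record is reused per client
    return answers, record_reads


def one_hot(index, size):
    """Build a selection vector with a single 1 at the wanted index."""
    return [1 if i == index else 0 for i in range(size)]


database = [17, 42, 99, 7]
wanted = [2, 0, 3]
queries = [one_hot(i, len(database)) for i in wanted]

# Unbatched: one full database scan per client.
unbatched_answers = []
unbatched_reads = 0
for q in queries:
    ans, reads = answer_batch(database, [q])
    unbatched_answers.append(ans[0])
    unbatched_reads += reads

# Batched: a single scan shared by all clients.
batched_answers, batched_reads = answer_batch(database, queries)

print(unbatched_reads, batched_reads)
```

In this sketch the database traffic stays constant as the batch grows, while the per-client state (`queries` and `answers`) grows with the number of clients, which loosely mirrors the abstract's point that client-specific data becomes the remaining bottleneck.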
Related papers
- SafeLoad: Efficient Admission Control Framework for Identifying Memory-Overloading Queries in Cloud Data Warehouses [59.68732483257323]
Memory overload is a common form of resource exhaustion in cloud data warehouses. We propose SafeLoad, the first query admission control framework specifically designed to identify memory-overloading (MO) queries. We show that SafeLoad achieves state-of-the-art prediction performance with low online and offline time overhead.
arXiv Detail & Related papers (2026-01-05T08:29:51Z)
- Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing [9.984481065465028]
Large Language Models (LLMs) on edge devices are crucial for reducing latency, improving real-time processing, and enhancing privacy. Implementing LLMs on edge devices presents challenges, particularly with managing key-value caches. We propose eDRAM as the primary storage for LLM serving on edge devices, which offers higher density compared to storage.
arXiv Detail & Related papers (2025-10-16T07:12:08Z)
- AlDBaran: Towards Blazingly Fast State Commitments for Blockchains [52.39305978984572]
AlDBaran is an authenticated data structure capable of handling state updates efficiently at a network throughput of 50 Gbps. AlDBaran provides support for historical state proofs, which facilitates a wide array of novel applications. On consumer-level portable hardware, it achieves approximately 8 million updates/s in an in-memory setting and 5 million updates/s with snapshots at sub-second intervals.
arXiv Detail & Related papers (2025-08-14T09:52:15Z)
- REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing [8.574396262432522]
Large Language Models (LLMs) face an inherent challenge: their knowledge is confined to the data that they have been trained on. Retrieval-Augmented Generation (RAG) complements the static training-derived knowledge of LLMs with an external knowledge repository. We propose REIS, the first ISP system tailored for RAG that addresses these limitations with three key mechanisms.
arXiv Detail & Related papers (2025-06-19T16:26:51Z)
- Toward a Lightweight, Scalable, and Parallel Secure Encryption Engine [0.0]
SPiME is a lightweight, scalable, and FPGA-compatible Secure Processor-in-Memory Encryption architecture. It integrates the Advanced Encryption Standard (AES-128) directly into a Processing-in-Memory framework. It delivers over 25 Gbps in sustained encryption throughput with predictable, low-latency performance.
arXiv Detail & Related papers (2025-06-18T02:25:04Z)
- Versatile and Fast Location-Based Private Information Retrieval with Fully Homomorphic Encryption over the Torus [4.021179028452984]
We present VeLoPIR, a versatile location-based private information retrieval (PIR) system designed to preserve user privacy. VeLoPIR introduces three operational modes (interval validation, coordinate validation, and identifier matching) that support a broad range of real-world applications. We provide formal security and privacy proofs, confirming the system's robustness under standard cryptographic assumptions.
arXiv Detail & Related papers (2025-06-15T08:01:35Z)
- PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts [59.5243730853157]
Large language models (LLMs) hosted on cloud servers alleviate the computational and storage burdens on local devices but raise privacy concerns. Small language models (SLMs) running locally enhance privacy but suffer from limited performance on complex tasks. We propose a privacy-aware wireless collaborative mixture of experts (PWC-MoE) framework to balance computational cost, performance, and privacy protection under bandwidth constraints.
arXiv Detail & Related papers (2025-05-13T16:27:07Z)
- SCRec: A Scalable Computational Storage System with Statistical Sharding and Tensor-train Decomposition for Recommendation Models [17.602518628415776]
Deep Learning Recommendation Models (DLRMs) play a crucial role in delivering personalized content across web applications such as social networking and video streaming. With improvements in performance, the parameter size of DLRMs has grown to terabyte (TB) scales, accompanied by memory bandwidth demands exceeding TB/s levels. We propose SCRec, a scalable computational storage recommendation system that can handle TB-scale industrial DLRMs.
arXiv Detail & Related papers (2025-04-01T08:12:45Z)
- A Universal Framework for Compressing Embeddings in CTR Prediction [68.27582084015044]
We introduce a Model-agnostic Embedding Compression (MEC) framework that compresses embedding tables by quantizing pre-trained embeddings. Our approach consists of two stages: first, we apply popularity-weighted regularization to balance code distribution between high- and low-frequency features. Experiments on three datasets reveal that our method reduces memory usage by over 50x while maintaining or improving recommendation performance.
arXiv Detail & Related papers (2025-02-21T10:12:34Z)
- Optimizing Cross-Client Domain Coverage for Federated Instruction Tuning of Large Language Models [87.49293964617128]
Federated domain-specific instruction tuning (FedDIT) for large language models (LLMs) aims to enhance performance in specialized domains using distributed private and limited data. We empirically establish that cross-client domain coverage, rather than data heterogeneity, is the pivotal factor. We introduce FedDCA, an algorithm that explicitly maximizes this coverage through diversity-oriented client center selection and retrieval-based augmentation.
arXiv Detail & Related papers (2024-09-30T09:34:31Z)
- Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks [60.54852710216738]
We introduce a novel digital twin-assisted optimization framework, called D-REC, to ensure reliable caching in nextG wireless networks. By incorporating reliability modules into a constrained decision process, D-REC can adaptively adjust actions, rewards, and states to comply with advantageous constraints.
arXiv Detail & Related papers (2024-06-29T02:40:28Z)
- RelayAttention for Efficient Large Language Model Serving with Long System Prompts [59.50256661158862]
This paper aims to improve the efficiency of LLM services that involve long system prompts. Handling these system prompts requires heavily redundant memory accesses in existing causal attention algorithms. We propose RelayAttention, an attention algorithm that allows reading hidden states from DRAM exactly once for a batch of input tokens.
arXiv Detail & Related papers (2024-02-22T18:58:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.