Palermo: Improving the Performance of Oblivious Memory using Protocol-Hardware Co-Design
- URL: http://arxiv.org/abs/2411.05400v1
- Date: Fri, 08 Nov 2024 08:31:12 GMT
- Title: Palermo: Improving the Performance of Oblivious Memory using Protocol-Hardware Co-Design
- Authors: Haojie Ye, Yuchen Xia, Yuhan Chen, Kuan-Yu Chen, Yichao Yuan, Shuwen Deng, Baris Kasikci, Trevor Mudge, Nishil Talati,
- Abstract summary: Oblivious RAM (ORAM) hides the memory access patterns, enhancing data privacy by preventing attackers from discovering sensitive information.
The performance of ORAM is often limited by its inherent trade-off between security and efficiency.
This paper presents Palermo: a protocol-hardware co-design to improve ORAM performance.
- Score: 13.353250150074066
- Abstract: Oblivious RAM (ORAM) hides memory access patterns, enhancing data privacy by preventing attackers from discovering sensitive information based on the sequence of memory accesses. The performance of ORAM is often limited by its inherent trade-off between security and efficiency, as concealing memory access patterns imposes significant computational and memory overhead. While prior works focus on improving ORAM performance by prefetching and eliminating ORAM requests, we find that their performance is very sensitive to workload locality behavior and that they incur additional management overhead caused by ORAM stash pressure. This paper presents Palermo: a protocol-hardware co-design to improve ORAM performance. The key observation in Palermo is that classical ORAM protocols enforce restrictive dependencies between memory operations that result in low memory bandwidth utilization. Palermo introduces a new protocol that overlaps large portions of memory operations, within a single and between multiple ORAM requests, without breaking correctness and security guarantees. Subsequently, we propose an ORAM controller architecture that executes the proposed protocol to service ORAM requests. The hardware is responsible for concurrently issuing memory requests as well as imposing the necessary dependencies to ensure a consistent view of the ORAM tree across requests. Using a rich workload mix, we demonstrate that Palermo outperforms the RingORAM baseline by 2.8x on average, incurring a negligible area overhead of 5.78mm^2 (less than 2% of a 12th-generation Intel CPU after technology scaling) and a 2.14W power overhead, without sacrificing security. We further show that Palermo also outperforms the state-of-the-art works PageORAM, PrORAM, and IR-ORAM.
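To make the dependency bottleneck concrete, here is a minimal Python sketch of a classical tree-ORAM access in the style of Path ORAM (a simplification, not Palermo's RingORAM baseline: bucket capacities, real memory timing, and the hardware controller are all omitted). Each request serializes its path read, stash update, and path write-back; these are exactly the memory operations Palermo's protocol overlaps within and across requests.

```python
import random

LEVELS = 4                       # tree depth; 2**LEVELS leaves
LEAVES = 2 ** LEVELS

tree = {b: {} for b in range(2 * LEAVES - 1)}   # bucket -> {block id: data}
position = {}                                   # block id -> assigned leaf
stash = {}                                      # block id -> data

def path(leaf):
    """Bucket indices on the root-to-leaf path of the binary tree."""
    node = leaf + LEAVES - 1     # leaves are the last level in heap layout
    nodes = [node]
    while node > 0:
        node = (node - 1) // 2
        nodes.append(node)
    return nodes[::-1]           # root first

def access(block_id):
    leaf = position.setdefault(block_id, random.randrange(LEAVES))
    position[block_id] = random.randrange(LEAVES)   # remap to a fresh leaf
    # Read the whole path into the stash. In classical protocols this read,
    # the stash update, and the write-back below form one serialized chain
    # per request -- the restrictive dependency structure Palermo relaxes.
    for bucket in path(leaf):
        stash.update(tree[bucket])
        tree[bucket].clear()
    data = stash.setdefault(block_id, f"data-{block_id}")
    # Write the path back, evicting stash blocks whose assigned path
    # shares this bucket (deepest buckets first; capacity ignored here).
    for bucket in reversed(path(leaf)):
        for b in list(stash):
            if bucket in path(position[b]):
                tree[bucket][b] = stash.pop(b)
    return data

access("a")
access("b")
print(access("a"))   # data-a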
Related papers
- Optimal Offline ORAM with Perfect Security via Simple Oblivious Priority Queues [0.0]
We study the so-called offline ORAM in which the sequence of memory access locations to be hidden is known in advance.
We obtain the first optimal offline ORAM with perfect security from oblivious priority queues via time-forward processing.
Building on our construction, we additionally present efficient external-memory instantiations of our oblivious, perfectly-secure construction.
arXiv Detail & Related papers (2024-09-18T14:31:33Z)
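As a rough illustration of the time-forward processing used above, the sketch below forwards each value to the (known-in-advance) time of its next access through a priority queue. Python's heapq is a non-oblivious stand-in for the oblivious priority queue the paper actually constructs; only the data flow, not the security, is modeled.

```python
import heapq

def offline_oram(accesses, writes):
    """accesses: addresses in access order (known in advance, offline setting).
    writes: dict addr -> initial value. Returns the values read, in order."""
    # Precompute, for every access, when the same address is used next.
    next_use = {}
    following = [None] * len(accesses)
    for t in range(len(accesses) - 1, -1, -1):
        following[t] = next_use.get(accesses[t])
        next_use[accesses[t]] = t
    pq = []  # entries: (time of next access, address, value)
    for addr, first_t in next_use.items():
        heapq.heappush(pq, (first_t, addr, writes[addr]))
    reads = []
    for t, addr in enumerate(accesses):
        when, a, value = heapq.heappop(pq)   # value surfaces exactly at time t
        assert when == t and a == addr
        reads.append(value)
        if following[t] is not None:         # forward to its next use time
            heapq.heappush(pq, (following[t], addr, value))
    return reads

print(offline_oram(["x", "y", "x"], {"x": 1, "y": 2}))  # [1, 2, 1]
```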
- H$_2$O$_2$RAM: A High-Performance Hierarchical Doubly Oblivious RAM [14.803814604985957]
Oblivious RAM (ORAM) with Trusted Execution Environments (TEE) has found numerous real-world applications due to their complementary nature.
We introduce several new efficient oblivious components to build a high-performance hierarchical doubly oblivious RAM (H$_2$O$_2$RAM).
The results indicate that H$_2$O$_2$RAM reduces execution time by up to $\sim 10^3$ times and saves memory usage by $5\sim44$ times compared to state-of-the-art solutions.
arXiv Detail & Related papers (2024-09-11T10:31:14Z)
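For intuition about the hierarchical layout above, the toy Python sketch below keeps geometrically growing levels and merges a full level downward. Plain dicts stand in for the paper's oblivious hash tables, and the dummy lookups and oblivious rebuilds that provide the actual doubly-oblivious security are deliberately omitted.

```python
class HierarchicalORAM:
    """Toy level structure only: real constructions use oblivious hash
    tables, continue with dummy lookups after a hit, and rebuild levels
    with an oblivious shuffle. None of that is modeled here."""

    def __init__(self, num_levels=8):
        # Level i holds up to 2**i entries; small levels are rebuilt often.
        self.levels = [dict() for _ in range(num_levels)]

    def access(self, key, value=None):
        found = None
        for table in self.levels:          # scan top (small) to bottom
            hit = table.pop(key, None)     # real ORAM: one oblivious lookup
            if hit is not None and found is None:
                found = hit                # keep scanning to hide the hit level
        if value is not None:
            found = value
        self._insert_top(key, found)
        return found

    def _insert_top(self, key, value):
        self.levels[0][key] = value
        for i, table in enumerate(self.levels[:-1]):
            if len(table) > 2 ** i:        # level full: merge downward
                self.levels[i + 1].update(table)  # real ORAM: oblivious rebuild
                table.clear()

oram = HierarchicalORAM()
oram.access("a", value=1)
print(oram.access("a"))  # 1
```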
- Understanding the Security Benefits and Overheads of Emerging Industry Solutions to DRAM Read Disturbance [6.637143975465625]
The paper analyzes the Per Row Activation Counting (PRAC) mitigation method described in the April 2024 update of the JEDEC DDR5 specification.
A back-off signal propagates from the DRAM chip to the memory controller.
RFM commands are issued when needed as opposed to periodically, reducing RFM's overheads.
arXiv Detail & Related papers (2024-06-27T11:22:46Z)
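The PRAC control loop is easy to sketch: each DRAM row keeps an activation counter, and crossing a threshold asserts a back-off so the controller issues a refresh-management (RFM) command only on demand. The Python below is a hedged toy model; the threshold value and method names are illustrative rather than taken from the JEDEC specification.

```python
BACKOFF_THRESHOLD = 512   # illustrative; real thresholds are vendor-defined

class PracDram:
    """Toy model of Per Row Activation Counting (PRAC) with back-off."""

    def __init__(self):
        self.counters = {}            # row -> activations since last mitigation

    def activate(self, row):
        self.counters[row] = self.counters.get(row, 0) + 1
        # The chip asserts back-off only when a row is actually at risk,
        # so RFM is issued on demand instead of periodically.
        return self.counters[row] >= BACKOFF_THRESHOLD

    def rfm(self):
        """Refresh management: mitigate the hottest row, reset its counter."""
        hottest = max(self.counters, key=self.counters.get)
        self.counters[hottest] = 0

dram = PracDram()
for _ in range(600):
    if dram.activate(row=42):     # back-off observed by the memory controller
        dram.rfm()
```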
- PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference [57.53291046180288]
Large Language Models (LLMs) have shown remarkable comprehension abilities but face challenges in GPU memory usage during inference.
We propose PyramidInfer, a method that compresses the KV cache by retaining only the crucial context layer by layer.
PyramidInfer improves throughput by 2.2x over Accelerate while reducing KV cache GPU memory by over 54%.
arXiv Detail & Related papers (2024-05-21T06:46:37Z)
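A hedged sketch of the general mechanism above: prune the per-layer KV cache down to the most-attended positions with a keep ratio that shrinks with depth, giving the pyramid shape. The top-k selection rule here is illustrative, not PyramidInfer's exact criterion.

```python
import numpy as np

def prune_kv_cache(keys, values, attn_weights, keep_ratio):
    """keys/values: (seq, dim) cached tensors for one layer.
    attn_weights: (queries, seq) recent attention onto the cached positions.
    Keep only the most-attended positions; deeper layers use a smaller
    keep_ratio, so the cache narrows layer by layer."""
    scores = attn_weights.sum(axis=0)            # importance of each position
    k = max(1, int(round(len(scores) * keep_ratio)))
    keep = np.sort(np.argsort(scores)[-k:])      # top-k, keep original order
    return keys[keep], values[keep], attn_weights[:, keep]

rng = np.random.default_rng(0)
keys = rng.normal(size=(128, 64))
values = rng.normal(size=(128, 64))
attn = rng.random((4, 128))
for layer, ratio in enumerate([0.9, 0.6, 0.3]):  # shrink cache with depth
    keys, values, attn = prune_kv_cache(keys, values, attn, ratio)
    print(f"layer {layer}: cache length {keys.shape[0]}")
```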
- RelayAttention for Efficient Large Language Model Serving with Long System Prompts [59.50256661158862]
This paper aims to improve the efficiency of LLM services that involve long system prompts.
Handling these system prompts requires heavily redundant memory accesses in existing causal attention algorithms.
We propose RelayAttention, an attention algorithm that allows reading hidden states from DRAM exactly once for a batch of input tokens.
arXiv Detail & Related papers (2024-02-22T18:58:28Z)
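The fact enabling the single DRAM read above is pure attention algebra: attention over [shared system prompt; request-specific tokens] decomposes into two partial attentions recombined by their log-sum-exp weights, so the prompt's KV needs to be read once per batch rather than once per request. A numpy sketch of that decomposition (my illustration, not the paper's kernel):

```python
import numpy as np

def partial_attention(q, k, v):
    """Attention over one KV segment; also return log-sum-exp weights."""
    logits = q @ k.T / np.sqrt(q.shape[-1])          # (nq, nk)
    m = logits.max(axis=-1, keepdims=True)
    w = np.exp(logits - m)
    out = w @ v / w.sum(axis=-1, keepdims=True)
    lse = m.squeeze(-1) + np.log(w.sum(axis=-1))     # (nq,)
    return out, lse

def relay_style_attention(q, k_prefix, v_prefix, k_suffix, v_suffix):
    # The prefix pass is shared: its KV is read once for the whole batch.
    o1, l1 = partial_attention(q, k_prefix, v_prefix)
    o2, l2 = partial_attention(q, k_suffix, v_suffix)
    a = 1.0 / (1.0 + np.exp(l2 - l1))                # softmax over (l1, l2)
    return a[:, None] * o1 + (1 - a)[:, None] * o2

rng = np.random.default_rng(0)
d = 16
q = rng.normal(size=(3, d))
kp, vp = rng.normal(size=(32, d)), rng.normal(size=(32, d))
ks, vs = rng.normal(size=(8, d)), rng.normal(size=(8, d))
ref, _ = partial_attention(q, np.vstack([kp, ks]), np.vstack([vp, vs]))
assert np.allclose(relay_style_attention(q, kp, vp, ks, vs), ref)
```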
- Constant Memory Attention Block [74.38724530521277]
Constant Memory Attention Block (CMAB) is a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation.
We show our proposed methods achieve results competitive with state-of-the-art while being significantly more memory efficient.
arXiv Detail & Related papers (2023-06-21T22:41:58Z)
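The constant-memory property above can be seen from the online-softmax recurrence: attention for a fixed set of queries can absorb one key/value pair at a time while keeping only running statistics. The numpy sketch below shows that principle only, not the CMAB architecture itself.

```python
import numpy as np

def streaming_attention(q, kv_stream):
    """Attention of queries q over a stream of (key, value) pairs using
    O(1) memory in the stream length: only a running max, weight sum, and
    weighted value sum are kept (the standard online-softmax recurrence)."""
    nq, d = q.shape
    m = np.full(nq, -np.inf)          # running max of logits
    s = np.zeros(nq)                  # running sum of exp(logit - m)
    acc = np.zeros((nq, d))           # running sum of exp(logit - m) * value
    for k, v in kv_stream:            # one key/value pair at a time
        logit = q @ k / np.sqrt(d)    # (nq,)
        m_new = np.maximum(m, logit)
        scale = np.exp(m - m_new)     # rescale the old accumulators
        w = np.exp(logit - m_new)
        s = s * scale + w
        acc = acc * scale[:, None] + w[:, None] * v[None, :]
        m = m_new
    return acc / s[:, None]

rng = np.random.default_rng(0)
q = rng.normal(size=(2, 8))
ks, vs = rng.normal(size=(100, 8)), rng.normal(size=(100, 8))
out = streaming_attention(q, zip(ks, vs))
logits = q @ ks.T / np.sqrt(8)
ref = np.exp(logits - logits.max(-1, keepdims=True))
ref = ref / ref.sum(-1, keepdims=True) @ vs
assert np.allclose(out, ref)
```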
- Efficient and Error-Resilient Data Access Protocols for a Limited-Sized Quantum Random Access Memory [7.304498344470287]
We focus on accessing larger data sizes without continually increasing the size of the QRAM.
We propose a novel protocol for loading data with larger word lengths $k$ without increasing the number of QRAM levels $n$.
By exploiting the parallelism in the data query process, our protocol achieves a time complexity of $O(n+k)$ and improves error scaling performance.
arXiv Detail & Related papers (2023-03-09T12:21:18Z)
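The stated complexity follows from pipelining the per-bit queries through the $n$-level routing tree; a back-of-the-envelope reading of the abstract (not the paper's derivation):

```latex
T_{\text{naive}} = k \cdot O(n) = O(nk), \qquad
T_{\text{pipelined}} = \underbrace{O(n)}_{\text{fill the pipeline}}
                     + \underbrace{O(k)}_{\text{one bit per step thereafter}}
                     = O(n + k).
```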
- Single Round-trip Hierarchical ORAM via Succinct Indices [5.437298646956505]
Rank ORAM can retrieve data with a single round-trip of communication.
A compressed client-side data structure implicitly stores the location of each element at the server.
arXiv Detail & Related papers (2022-08-16T01:15:26Z)
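Hierarchical ORAMs classically spend one round-trip per level to locate an element; keeping a compressed client-side index of server locations, as above, collapses the lookup into a single fetch. The Python sketch below shows only that client/server split: a plain dict stands in for the paper's succinct index, and the re-randomization and dummy accesses that make the scheme oblivious are omitted.

```python
class RankOramClient:
    """Toy single-round-trip ORAM client. A dict models the compressed
    index; the real construction stores it succinctly and re-randomizes
    server locations on every access, neither of which is modeled here."""

    def __init__(self, server):
        self.server = server      # server-side storage: slot -> block
        self.index = {}           # block id -> server slot (client-side)
        self.next_free = 0

    def write(self, block_id, data):
        slot = self.next_free
        self.next_free += 1
        self.server[slot] = data
        self.index[block_id] = slot

    def read(self, block_id):
        slot = self.index[block_id]     # resolved locally: no extra round-trip
        return self.server[slot]        # exactly one server access

server = [None] * 1024
client = RankOramClient(server)
client.write("secret", b"\x01\x02")
print(client.read("secret"))
```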
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness [80.3586155104237]
FlashAttention is an IO-aware exact attention algorithm for Transformers.
It reduces the number of memory reads/writes between GPU high-bandwidth memory (HBM) and GPU on-chip SRAM.
FlashAttention and block-sparse FlashAttention enable longer context in Transformers.
arXiv Detail & Related papers (2022-05-27T17:53:09Z)
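The IO-awareness is the key structure: Q, K, and V stay in HBM, tiles sized for on-chip SRAM are streamed through, and the softmax is renormalized across tiles so the result stays exact. A hedged numpy sketch of that tiling (block size and names are illustrative):

```python
import numpy as np

def flash_attention_like(Q, K, V, block=32):
    """Exact attention computed tile-by-tile: each step touches only one
    Q tile and one K/V tile ('SRAM-sized'), never the full logits matrix."""
    n, d = Q.shape
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)            # running row max
    s = np.zeros(n)                    # running softmax denominator
    for j in range(0, K.shape[0], block):          # stream K/V tiles from HBM
        Kj, Vj = K[j:j + block], V[j:j + block]
        for i in range(0, n, block):               # per Q tile
            sl = slice(i, i + block)
            logits = Q[sl] @ Kj.T / np.sqrt(d)
            m_new = np.maximum(m[sl], logits.max(-1))
            scale = np.exp(m[sl] - m_new)           # renormalize old partials
            w = np.exp(logits - m_new[:, None])
            out[sl] = out[sl] * scale[:, None] + w @ Vj
            s[sl] = s[sl] * scale + w.sum(-1)
            m[sl] = m_new
    return out / s[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(128, 64)) for _ in range(3))
logits = Q @ K.T / np.sqrt(64)
p = np.exp(logits - logits.max(-1, keepdims=True))
assert np.allclose(flash_attention_like(Q, K, V), p / p.sum(-1, keepdims=True) @ V)
```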
- Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes the spatio-temporal aggregation module (SAM) more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z)
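A constant-size memory bank implies a recurrent update rather than unbounded appending. The sketch below illustrates only that idea, with a fixed exponential blend standing in for the paper's learned RDE module:

```python
import numpy as np

def update_memory_bank(bank, frame_embeddings, rate=0.1):
    """Fold a new frame's embeddings into a fixed-size bank.
    bank: (slots, dim); frame_embeddings: (pixels, dim).
    Each slot absorbs the frame features most similar to it, so the bank
    stays (slots, dim) no matter how many frames are processed."""
    sim = bank @ frame_embeddings.T                  # (slots, pixels)
    assign = sim.argmax(axis=0)                      # pixel -> nearest slot
    for slot in range(bank.shape[0]):
        picked = frame_embeddings[assign == slot]
        if len(picked):
            bank[slot] = (1 - rate) * bank[slot] + rate * picked.mean(axis=0)
    return bank

rng = np.random.default_rng(0)
bank = rng.normal(size=(16, 32))                     # constant-size memory
for _ in range(100):                                 # a "long video"
    bank = update_memory_bank(bank, rng.normal(size=(500, 32)))
print(bank.shape)                                    # still (16, 32)
```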
- Parallelising the Queries in Bucket Brigade Quantum RAM [69.43216268165402]
Quantum algorithms often use quantum RAMs (QRAM) for accessing information stored in a database-like manner.
We show a systematic method to significantly reduce the effective query time by using Clifford+T gate parallelism.
We conclude that, in theory, fault-tolerant bucket brigade quantum RAM queries can be performed at approximately the speed of classical RAM.
arXiv Detail & Related papers (2020-02-21T14:50:03Z)