H$_2$O$_2$RAM: A High-Performance Hierarchical Doubly Oblivious RAM
- URL: http://arxiv.org/abs/2409.07167v1
- Date: Wed, 11 Sep 2024 10:31:14 GMT
- Title: H$_2$O$_2$RAM: A High-Performance Hierarchical Doubly Oblivious RAM
- Authors: Leqian Zheng, Zheng Zhang, Wentao Dong, Yao Zhang, Ye Wu, Cong Wang
- Abstract summary: Oblivious RAM (ORAM) with Trusted Execution Environments (TEE) has found numerous real-world applications due to their complementary nature.
We introduce several new efficient oblivious components to build a high-performance hierarchical O$_2$RAM (H$_2$O$_2$RAM).
The results indicate that H$_2$O$_2$RAM reduces execution time by up to $\sim 10^3$ times and saves memory usage by $5\sim44$ times compared to state-of-the-art solutions.
- Score: 14.803814604985957
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The combination of Oblivious RAM (ORAM) with Trusted Execution Environments (TEE) has found numerous real-world applications due to their complementary nature. TEEs alleviate the performance bottlenecks of ORAM, such as network bandwidth and roundtrip latency, and ORAM provides general-purpose protection for TEE applications against attacks exploiting memory access patterns. The defining property of this combination, which sets it apart from traditional ORAM designs, is its ability to ensure that memory accesses, both inside and outside of TEEs, are made oblivious, thus termed doubly oblivious RAM (O$_2$RAM). Efforts to develop O$_2$RAM with enhanced performance are ongoing. In this work, we propose H$_2$O$_2$RAM, a high-performance doubly oblivious RAM construction. The distinguishing feature of our approach, compared to the existing tree-based doubly oblivious designs, is its first adoption of the hierarchical framework that enjoys inherently better data locality and parallelization. While the latest hierarchical solution, FutORAMa, achieves concrete efficiency in the classic client-server model by leveraging a relaxed assumption of sublinear-sized client-side private memory, adapting it to our scenario poses challenges due to the conflict between this relaxed assumption and our doubly oblivious requirement. To this end, we introduce several new efficient oblivious components to build a high-performance hierarchical O$_2$RAM (H$_2$O$_2$RAM). We implement our design and evaluate it on various scenarios. The results indicate that H$_2$O$_2$RAM reduces execution time by up to $\sim 10^3$ times and saves memory usage by $5\sim44$ times compared to state-of-the-art solutions.
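To make the doubly oblivious requirement concrete, here is a minimal Python sketch, assumed for illustration rather than taken from the paper, of the most basic oblivious building block: reading one slot of a small table by touching every slot with a branch-free select, so the physical access sequence reveals nothing about the secret index. (Python cannot enforce constant-time execution; real TEE code does this with constant-time conditional moves.)

```python
# Illustrative sketch only (assumed, not from the paper): an oblivious read
# touches every slot of the table so the access pattern leaks nothing about
# which index was requested. Production TEE code would use constant-time
# conditional moves; the arithmetic below just mimics them.

def oblivious_read(table, secret_index):
    """Return table[secret_index] while accessing every slot exactly once."""
    result = 0
    for i, value in enumerate(table):
        match = int(i == secret_index)                 # 1 for the target slot, else 0
        result = match * value + (1 - match) * result  # branch-free select
    return result

assert oblivious_read([10, 20, 30, 40], 2) == 30  # four touches, one answer
```

Hierarchical constructions such as H$_2$O$_2$RAM use a full scan like this only for the smallest level; larger levels are typically organized as oblivious hash tables, so a lookup touches a bounded number of slots per level.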
Related papers
- FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training [51.39495282347475]
We introduce $\texttt{FRUGAL}$ ($\textbf{F}$ull-$\textbf{R}$ank $\textbf{U}$pdates with $\textbf{G}$r$\textbf{A}$dient sp$\textbf{L}$itting), a new memory-efficient optimization framework.
Our framework can be integrated with various low-rank update selection techniques, including GaLore and BAdam (see the sketch below).
arXiv Detail & Related papers (2024-11-12T14:41:07Z)
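The following is a rough, hypothetical sketch of the gradient-splitting idea (the subset choice, signSGD fallback, and all hyperparameters are assumptions for illustration, not FRUGAL's actual algorithm): maintain Adam-style optimizer state for only a small coordinate subset and update the remaining coordinates state-free, so optimizer memory scales with the subset rather than the full parameter count.

```python
import numpy as np

# Hypothetical sketch of "gradient splitting" for optimizer-memory savings
# (not FRUGAL's actual algorithm): a small coordinate subset gets stateful
# Adam-style moments; all remaining coordinates take a state-free signSGD step.

def split_step(params, grad, state, state_idx, lr=1e-3,
               beta1=0.9, beta2=0.999, eps=1e-8):
    m, v, t = state
    t += 1
    g_s = grad[state_idx]
    m = beta1 * m + (1 - beta1) * g_s
    v = beta2 * v + (1 - beta2) * g_s**2
    m_hat, v_hat = m / (1 - beta1**t), v / (1 - beta2**t)
    params[state_idx] -= lr * m_hat / (np.sqrt(v_hat) + eps)  # stateful subset
    stateless = np.ones(len(params), dtype=bool)
    stateless[state_idx] = False
    params[stateless] -= lr * np.sign(grad[stateless])        # state-free rest
    return params, (m, v, t)

# Usage: moments are kept for only 2 of 6 coordinates, cutting state memory.
params = np.zeros(6)
idx = np.array([0, 3])
state = (np.zeros(2), np.zeros(2), 0)
params, state = split_step(params, np.ones(6), state, idx)
```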
- Palermo: Improving the Performance of Oblivious Memory using Protocol-Hardware Co-Design [13.353250150074066]
Oblivious RAM (ORAM) hides the memory access patterns, enhancing data privacy by preventing attackers from discovering sensitive information.
The performance of ORAM is often limited by its inherent trade-off between security and efficiency.
This paper presents Palermo: a protocol-hardware co-design to improve ORAM performance.
arXiv Detail & Related papers (2024-11-08T08:31:12Z)
- Optimal Offline ORAM with Perfect Security via Simple Oblivious Priority Queues [0.0]
We study the so-called offline ORAM in which the sequence of memory access locations to be hidden is known in advance.
We obtain the first optimal offline ORAM with perfect security from oblivious priority queues via time-forward processing.
Building on our construction, we additionally present efficient external-memory instantiations of our oblivious, perfectly secure construction (see the sketch below).
arXiv Detail & Related papers (2024-09-18T14:31:33Z)
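As a hedged illustration of time-forward processing (a standard preprocessing view, not the paper's construction, and shown non-obliviously): since the access sequence is known in advance, a backward scan can tag each access with the next time its address is used; an oblivious priority queue keyed on these tags can then deliver each value to the future step that needs it.

```python
# Illustrative preprocessing for offline ORAM (assumed, shown non-obliviously):
# tag each access with the next time the same address is accessed again.

def next_use_times(addresses):
    """next_use[i] = smallest j > i with addresses[j] == addresses[i], else None."""
    next_use = [None] * len(addresses)
    last_seen = {}
    for i in range(len(addresses) - 1, -1, -1):  # backward scan
        next_use[i] = last_seen.get(addresses[i])
        last_seen[addresses[i]] = i
    return next_use

print(next_use_times(["a", "b", "a", "c", "b"]))  # [2, 4, None, None, None]
```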
- B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences, tested up to 32K tokens (see the toy sketch below).
arXiv Detail & Related papers (2024-07-08T18:41:01Z)
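For intuition only, here is a hypothetical toy combining the two memory types (not B'MOJO's state-space realization): an exact sliding window provides eidetic memory, and an exponential moving average provides fading memory.

```python
from collections import deque
import numpy as np

# Toy hybrid memory (assumed, not B'MOJO's actual module): an exact sliding
# window ("eidetic") plus an exponentially decaying summary ("fading").

class HybridMemory:
    def __init__(self, dim, window=4, decay=0.9):
        self.eidetic = deque(maxlen=window)  # lossless, but finite span
        self.fading = np.zeros(dim)          # lossy, but unbounded span
        self.decay = decay

    def update(self, x):
        self.eidetic.append(x)
        self.fading = self.decay * self.fading + (1 - self.decay) * x

    def read(self):
        recent = np.stack(list(self.eidetic)).mean(axis=0)  # exact-window summary
        return np.concatenate([recent, self.fading])        # expose both memories

mem = HybridMemory(dim=3)
for t in range(10):
    mem.update(np.full(3, float(t)))
print(mem.read())  # first half reflects recent tokens, second half the decayed past
```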
- HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference [68.59839755875252]
HiRE comprises two novel components: (i) a compression scheme to cheaply predict top-$k$ rows/columns with high recall, followed by full computation restricted to the predicted subset, and (ii) DA-TOP-$k$: an efficient multi-device approximate top-$k$ operator.
We demonstrate that, on a one-billion-parameter model, HiRE applied to both the softmax and the feedforward layers achieves almost matching pretraining and downstream accuracy, and speeds up inference latency by $1.47\times$ on a single TPUv5e device (see the sketch below).
arXiv Detail & Related papers (2024-02-14T18:04:36Z)
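A minimal sketch of the two-stage pattern described above, with assumed details (random-projection compression and a fixed over-selection factor; HiRE's actual compression scheme may differ): score everything cheaply in a compressed space, keep a candidate set larger than $k$ for high recall, and run the full computation only on the candidates.

```python
import numpy as np

# Hypothetical two-stage top-k (not HiRE's exact scheme): cheap scores from a
# random low-dimensional projection select candidates; exact scores are then
# computed only for that small subset.

rng = np.random.default_rng(0)
W = rng.normal(size=(4096, 512))               # stand-in weight matrix
P = rng.normal(size=(512, 64)) / np.sqrt(64)   # shared random projection
W_small = W @ P                                # precomputed compressed weights

def approx_topk(x, k=8, overselect=4):
    cheap = W_small @ (P.T @ x)                          # low-cost proxy scores
    cand = np.argpartition(cheap, -overselect * k)[-overselect * k:]
    exact = W[cand] @ x                                  # full compute, subset only
    return cand[np.argsort(exact)[-k:]]                  # final top-k indices

x = rng.normal(size=512)
print(approx_topk(x))
```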
- Single Round-trip Hierarchical ORAM via Succinct Indices [5.437298646956505]
Rank ORAM can retrieve data with a single round-trip of communication.
A compressed client-side data structure stores, implicitly, the location of each element at the server.
arXiv Detail & Related papers (2022-08-16T01:15:26Z)
- Efficient Deep Learning Using Non-Volatile Memory Technology [12.866655564742889]
We present DeepNVM++, a comprehensive framework to characterize, model, and analyze NVM-based caches in architectures for deep learning (DL) applications.
In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to 3.8x and 4.7x energy-delay product (EDP) reduction and 2.4x and 2.8x area reduction compared to conventional caches (see the EDP illustration below).
DeepNVM++ is demonstrated on STT-/SOT-MRAM technologies and can be used for the characterization, modeling, and analysis of any NVM technology for last-level caches in DL applications.
arXiv Detail & Related papers (2022-06-27T19:27:57Z)
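For readers unfamiliar with the metric, a quick numeric illustration of the energy-delay product (energy multiplied by delay); the numbers below are made up and merely chosen to land near the reported 3.8x scale.

```python
# Hypothetical numbers illustrating the energy-delay product (EDP) metric:
# EDP = energy * delay, so halving both energy and delay cuts EDP by 4x.

def edp(energy_nj, delay_ns):
    return energy_nj * delay_ns

sram = edp(energy_nj=2.0, delay_ns=1.0)    # baseline cache (made-up values)
stt = edp(energy_nj=0.8, delay_ns=0.66)    # NVM-based cache (made-up values)
print(f"EDP reduction: {sram / stt:.1f}x")  # ~3.8x, matching the reported scale
```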
- Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z)
- DeepNVM++: Cross-Layer Modeling and Optimization Framework of Non-Volatile Memories for Deep Learning [11.228806840123084]
Non-volatile memory (NVM) technologies such as spin-transfer torque magnetic random access memory (STT-MRAM) and spin-orbit torque magnetic random access memory (SOT-MRAM) have significant advantages compared to conventional technologies.
In this work we present DeepNVM++, a framework to characterize, model, and analyze NVM-based caches in deep learning (DL) applications.
arXiv Detail & Related papers (2020-12-08T16:53:25Z)
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences (see the sketch below).
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
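A toy sketch of the constant-memory recurrence (illustrative assumptions throughout, not Memformer's actual architecture): process the sequence in segments and refresh a fixed set of memory slots after each one, so space stays constant no matter how long the sequence grows.

```python
import numpy as np

# Toy fixed-size recurrent memory (assumed, not Memformer's actual design):
# memory slots are refreshed per segment via attention-like weights, so space
# stays O(slots * dim) regardless of sequence length.

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def update_memory(memory, segment, gate=0.5):
    # Each slot attends over the segment's token vectors, then blends in the result.
    attn = softmax(memory @ segment.T)  # (slots, seg_len)
    summary = attn @ segment            # (slots, dim)
    return (1 - gate) * memory + gate * summary

rng = np.random.default_rng(1)
memory = rng.normal(size=(4, 16))       # 4 slots, never grows
for _ in range(100):                    # 100 segments, constant memory
    memory = update_memory(memory, rng.normal(size=(8, 16)))
print(memory.shape)                     # (4, 16)
```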
- Parallelising the Queries in Bucket Brigade Quantum RAM [69.43216268165402]
Quantum algorithms often use quantum RAMs (QRAM) for accessing information stored in a database-like manner.
We show a systematic method to significantly reduce the effective query time by using Clifford+T gate parallelism.
We conclude that, in theory, fault-tolerant bucket brigade quantum RAM queries can be performed approximately with the speed of classical RAM.
arXiv Detail & Related papers (2020-02-21T14:50:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.