How Would Oblivious Memory Boost Graph Analytics on Trusted Processors?
- URL: http://arxiv.org/abs/2512.24255v1
- Date: Tue, 30 Dec 2025 14:28:29 GMT
- Title: How Would Oblivious Memory Boost Graph Analytics on Trusted Processors?
- Authors: Jiping Yu, Xiaowei Zhu, Kun Chen, Guanyu Feng, Yunyi Chen, Xiaoyu Fan, Wenguang Chen,
- Abstract summary: We focus on graph analytics, an important application vulnerable to access-pattern attacks.<n>With a co-design between storage structure and algorithms, our prototype system is 100x faster than baselines given an OM sized around the per-core cache.<n>This gives insights into equipping trusted processors with OM.
- Score: 8.661898399197089
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Trusted processors provide a way to perform joint computations while preserving data privacy. To overcome the performance degradation caused by data-oblivious algorithms to prevent information leakage, we explore the benefits of oblivious memory (OM) integrated in processors, to which the accesses are unobservable by adversaries. We focus on graph analytics, an important application vulnerable to access-pattern attacks. With a co-design between storage structure and algorithms, our prototype system is 100x faster than baselines given an OM sized around the per-core cache which can be implemented on existing processors with negligible overhead. This gives insights into equipping trusted processors with OM.
Related papers
- HyperOffload: Graph-Driven Hierarchical Memory Management for Large Language Models on SuperNode Architectures [20.525243835887558]
SuperNode represents data movement using cache operators within the compiler.<n>We implement SuperNode within the production deep learning framework MindSpore.<n>We show that SuperNode reduces peak device memory usage by up to 26% for inference while maintaining end-to-end performance.
arXiv Detail & Related papers (2026-01-31T14:29:13Z) - Fast SAM2 with Text-Driven Token Pruning [52.8350457627401]
Segment Anything Model 2 (SAM2), a vision computation model has significantly advanced in prompt-driven video object segmentation.<n>SAM2 pipelines propagate all visual tokens produced by the image encoder through downstream temporal reasoning modules, regardless of their relevance to the target object.<n>We introduce a text-guided token pruning framework that improves inference efficiency by selectively reducing token density prior to temporal propagation.
arXiv Detail & Related papers (2025-12-24T18:59:05Z) - Optimized Memory Tagging on AmpereOne Processors [0.0]
The Memory Tagging Extension (MTE) to the ARM AArch64 Instruction Set Architecture is a valuable tool to address memory-safety escapes.<n>This paper analyzes the complete hardware-software stack, identifying application memory management as the primary remaining source of overhead.
arXiv Detail & Related papers (2025-11-21T20:39:31Z) - Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction [72.27673320976933]
Diffusion Large Language Models (dLLMs) enable breakthroughs in reasoning and parallel decoding.<n>Current caching techniques accelerate decoding by storing full-layer states, yet impose substantial memory usage.<n>We propose Sparse-dLLM, the first training-free framework integrating dynamic cache eviction with sparse attention.
arXiv Detail & Related papers (2025-08-04T16:14:03Z) - Enabling Low-Cost Secure Computing on Untrusted In-Memory Architectures [5.565715369147691]
Processing-in-Memory (PIM) promises to substantially improve performance by moving processing closer to the data.<n>Integrating PIM modules within a secure computing system raises an interesting challenge: unencrypted data has to move off-chip to the PIM, exposing the data to attackers and breaking assumptions on Trusted Computing Bases (TCBs)<n>This paperleverages multi-party computation (MPC) techniques, specifically arithmetic secret sharing and Yao's garbled circuits, to outsource bandwidth-intensive computation securely to PIM.
arXiv Detail & Related papers (2025-01-28T20:48:14Z) - PhD Forum: Efficient Privacy-Preserving Processing via Memory-Centric Computing [0.0]
Homomorphic encryption (HE) and secure multi-party computation (SMPC) enhance data security by enabling processing on encrypted data.
Existing approaches focus on improving computational overhead using specialized hardware.
We propose a framework that uses recently available PIM hardware to achieve efficient privacy-preserving computation.
arXiv Detail & Related papers (2024-09-25T09:37:50Z) - vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z) - Revisiting Main Memory-Based Covert and Side Channel Attacks in the Context of Processing-in-Memory [6.709670986126109]
IMPACT is a set of main memory-based timing attacks that leverage characteristics of processing-in-memory (PiM) architectures.<n>We build two covert channels that leverage different PiM approaches.<n>Our covert channels achieve 8.2 Mb/s and 14.8 Mb/s communication throughput, respectively, which is 3.6x and 6.5x higher than the state-of-the-art main memory-based covert channel.
arXiv Detail & Related papers (2024-04-17T11:48:14Z) - Constant Memory Attention Block [74.38724530521277]
Constant Memory Attention Block (CMAB) is a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation.
We show our proposed methods achieve results competitive with state-of-the-art while being significantly more memory efficient.
arXiv Detail & Related papers (2023-06-21T22:41:58Z) - Memory Safe Computations with XLA Compiler [14.510796427699459]
XLA compiler extension adjusts the representation of an algorithm according to a user-specified memory limit.
We show that k-nearest neighbour and sparse Gaussian process regression methods can be run at a much larger scale on a single device.
arXiv Detail & Related papers (2022-06-28T16:59:28Z) - Providing Meaningful Data Summarizations Using Examplar-based Clustering
in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z) - Faster Secure Data Mining via Distributed Homomorphic Encryption [108.77460689459247]
Homomorphic Encryption (HE) is receiving more and more attention recently for its capability to do computations over the encrypted field.
We propose a novel general distributed HE-based data mining framework towards one step of solving the scaling problem.
We verify the efficiency and effectiveness of our new framework by testing over various data mining algorithms and benchmark data-sets.
arXiv Detail & Related papers (2020-06-17T18:14:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.