Amplifying Main Memory-Based Timing Covert and Side Channels using Processing-in-Memory Operations
- URL: http://arxiv.org/abs/2404.11284v3
- Date: Thu, 10 Oct 2024 16:57:34 GMT
- Title: Amplifying Main Memory-Based Timing Covert and Side Channels using Processing-in-Memory Operations
- Authors: Konstantinos Kanellopoulos, F. Nisa Bostanci, Ataberk Olgun, A. Giray Yaglikci, Ismail Emir Yuksel, Nika Mansouri Ghiasi, Zulal Bingol, Mohammad Sadrosadati, Onur Mutlu
- Abstract summary: We show that processing-in-memory (PiM) solutions provide a new way to directly access main memory, which malicious user applications can exploit.
We introduce IMPACT, a set of high-throughput main memory-based timing attacks that leverage characteristics of PiM architectures to establish covert and side channels.
Our results demonstrate that our two covert channels achieve 12.87 Mb/s and 14.16 Mb/s communication throughput, respectively, which is up to 4.91x and 5.41x faster than state-of-the-art main memory-based covert channels.
- Score: 6.709670986126109
- Abstract: The adoption of processing-in-memory (PiM) architectures has been gaining momentum because they provide high performance and low energy consumption by alleviating the data movement bottleneck. Yet, the security of such architectures has not been thoroughly explored. The adoption of PiM solutions provides a new way to directly access main memory, which malicious user applications can exploit. We show that this new way to access main memory opens opportunities for high-throughput timing attacks that are hard to mitigate without significant performance overhead. We introduce IMPACT, a set of high-throughput main memory-based timing attacks that leverage characteristics of PiM architectures to establish covert and side channels. IMPACT enables high-throughput communication and private information leakage by exploiting the shared DRAM row buffer. To achieve high throughput, IMPACT (i) eliminates cache bypassing steps required by processor-centric main memory and cache-based timing attacks and (ii) leverages the intrinsic parallelism of PiM operations. We showcase two applications of IMPACT. First, we build two covert-channel attacks that run on the host CPU and leverage different PiM approaches to gain direct and fast access to main memory and establish high-throughput communication covert channels. Second, we showcase a side-channel attack that leaks private information of concurrently running victim applications that are accelerated with PiM. Our results demonstrate that (i) our covert channels achieve 12.87 Mb/s and 14.16 Mb/s communication throughput, respectively, which is up to 4.91x and 5.41x faster than the state-of-the-art main memory-based covert channels, and (ii) our side-channel attack allows the attacker to leak secrets with a low error rate. To avoid such covert and side channels in emerging PiM systems, we propose and evaluate three defenses.
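The timing primitive IMPACT amplifies is the DRAM row-buffer hit/conflict latency difference. As a point of reference, here is a minimal sketch (in C) of a conventional, processor-centric row-buffer covert-channel receiver; note the explicit cache-flush step, which is exactly the cache-bypassing overhead the abstract says PiM-based direct memory access eliminates. The address argument and latency threshold are placeholders (assumptions), not values from the paper.

```c
/* Illustrative sketch of the row-buffer timing principle, not the
 * paper's implementation. Requires an x86 CPU; compile with gcc. */
#include <stdint.h>
#include <x86intrin.h>  /* _mm_clflush, _mm_mfence, __rdtscp */

/* Time one DRAM access to `addr`, forcing the load past the caches. */
static uint64_t time_dram_access(volatile uint8_t *addr) {
    unsigned aux;
    _mm_clflush((const void *)addr);  /* evict so the load reaches DRAM */
    _mm_mfence();
    uint64_t start = __rdtscp(&aux);
    (void)*addr;                      /* load served by DRAM */
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

/* Receiver: the sender encodes '1' by opening a different row in the
 * same bank (row-buffer conflict -> slow access) and '0' by leaving the
 * receiver's row open (row hit -> fast access). `same_bank_addr` and
 * `threshold` are hypothetical and would need calibration. */
int receive_bit(volatile uint8_t *same_bank_addr, uint64_t threshold) {
    return time_dram_access(same_bank_addr) > threshold;  /* slow == 1 */
}
```

In IMPACT, PiM operations reach DRAM directly, so the flush/fence steps above disappear and the parallelism of PiM operations raises the channel throughput.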
Related papers
- LiVOS: Light Video Object Segmentation with Gated Linear Matching [116.58237547253935]
LiVOS is a lightweight memory network that employs linear matching via linear attention.
For longer and higher-resolution videos, it matches STM-based methods with 53% less GPU memory and supports 4096p inference on a 32GB consumer-grade GPU.
arXiv Detail & Related papers (2024-11-05T05:36:17Z)
- MeMoir: A Software-Driven Covert Channel based on Memory Usage [7.424928818440549]
MeMoir is a novel software-driven covert channel that, for the first time, utilizes memory usage as the medium for the channel.
We implement a machine learning-based detector that can predict whether an attack is present in the system with an accuracy of more than 95%.
We introduce a noise-based countermeasure that effectively mitigates the attack while inducing a low power overhead in the system.
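To make "memory usage as the medium" concrete, here is a hypothetical sender sketch in C; the footprint, time slot, and on/off encoding are assumptions for illustration, not MeMoir's actual protocol or parameters.

```c
/* Hypothetical memory-usage covert-channel sender (illustrative only). */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FOOTPRINT (256UL * 1024 * 1024)  /* 256 MiB usage swing (assumed) */
#define BIT_PERIOD_US 100000             /* one bit per 100 ms slot (assumed) */

static void send_bit(int bit) {
    char *buf = NULL;
    if (bit) {
        buf = malloc(FOOTPRINT);
        if (buf) memset(buf, 1, FOOTPRINT);  /* touch pages so usage rises */
    }
    usleep(BIT_PERIOD_US);  /* hold the memory-usage level for one slot */
    free(buf);              /* free(NULL) is a no-op for bit 0 */
}

int main(void) {
    const char *msg = "1011001";  /* bits to transmit */
    for (const char *p = msg; *p; p++)
        send_bit(*p == '1');
    return 0;
}
```

A matching receiver could poll system memory availability once per slot (e.g., MemAvailable in /proc/meminfo on Linux) and threshold it; the detector and noise-based countermeasure described above would operate on that same signal.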
arXiv Detail & Related papers (2024-09-20T08:10:36Z)
- vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z)
- SMOF: Streaming Modern CNNs on FPGAs with Smart Off-Chip Eviction [6.800641017055453]
This paper introduces weight and activation eviction mechanisms to off-chip memory along the computational pipeline.
The proposed mechanism is incorporated into an existing toolflow, expanding the design space by utilising off-chip memory as a buffer.
SMOF has demonstrated the capacity to deliver competitive and, in some cases, state-of-the-art performance across a spectrum of computer vision tasks.
arXiv Detail & Related papers (2024-03-27T18:12:24Z)
- Constant Memory Attention Block [74.38724530521277]
Constant Memory Attention Block (CMAB) is a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation.
We show our proposed methods achieve results competitive with state-of-the-art while being significantly more memory efficient.
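This summary does not specify CMAB's construction; one standard way to compute attention output in constant extra memory is the streaming (online-softmax) formulation sketched below, offered as a generic illustration of the idea rather than the paper's block.

```c
/* Streaming attention for a single query in O(d) extra memory: one pass
 * over the keys/values with an online-softmax running max and normalizer.
 * Generic constant-memory attention; not CMAB's specific design. */
#include <math.h>

void attention_stream(const float *q,   /* query, length d */
                      const float *K,   /* n x d keys, row-major */
                      const float *V,   /* n x d values, row-major */
                      int n, int d,
                      float *out) {     /* output, length d */
    float m = -INFINITY, z = 0.0f;      /* running max and normalizer */
    for (int j = 0; j < d; j++) out[j] = 0.0f;
    for (int i = 0; i < n; i++) {
        float s = 0.0f;                 /* score s = q . k_i */
        for (int j = 0; j < d; j++) s += q[j] * K[i * d + j];
        float m_new = s > m ? s : m;
        float c = expf(m - m_new);      /* rescale old accumulator */
        float w = expf(s - m_new);      /* weight of the new item */
        z = z * c + w;
        for (int j = 0; j < d; j++)
            out[j] = out[j] * c + w * V[i * d + j];
        m = m_new;
    }
    for (int j = 0; j < d; j++) out[j] /= z;  /* normalize at the end */
}
```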
arXiv Detail & Related papers (2023-06-21T22:41:58Z)
- Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution [11.336229510791481]
We discuss a novel execution paradigm for microcontroller deep learning.
It modifies the execution of neural networks to avoid materialising full buffers in memory.
This is achieved by exploiting the properties of operators, which can consume/produce a fraction of their input/output at a time.
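As a concrete illustration of partial execution (assumed semantics, not Pex's actual API), the sketch below streams a two-operator chain through a small chunk buffer instead of materializing the full intermediate tensor between the operators.

```c
/* Partial execution sketch: y = relu(x * w) computed chunk by chunk, so
 * only CHUNK intermediate elements are live instead of an N-element
 * buffer between the two operators. CHUNK size is an assumption. */
#include <stddef.h>

#define CHUNK 32  /* fraction of the tensor materialized at a time */

void fused_scale_relu(const float *x, float w, float *y, size_t n) {
    float tmp[CHUNK];                    /* the only intermediate buffer */
    for (size_t base = 0; base < n; base += CHUNK) {
        size_t len = (n - base < CHUNK) ? (n - base) : CHUNK;
        for (size_t i = 0; i < len; i++) /* operator 1: scale */
            tmp[i] = x[base + i] * w;
        for (size_t i = 0; i < len; i++) /* operator 2: ReLU */
            y[base + i] = tmp[i] > 0.0f ? tmp[i] : 0.0f;
    }
}
```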
arXiv Detail & Related papers (2022-11-30T18:47:30Z)
- Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes SAM (the Spatio-temporal Aggregation Module) more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
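A minimal sketch of the patch-by-patch idea follows (illustrative only; MCUNetV2's scheduler, halo handling, and receptive-field analysis are more involved, and the patch size here is an assumption). The intermediate feature map exists only one patch at a time, which is what cuts peak memory.

```c
/* One 3x3 valid convolution followed by ReLU, executed per output patch
 * so the intermediate conv result is only ever PH x PW elements instead
 * of the full (h-2) x (w-2) feature map. */
#include <stddef.h>

#define PH 8
#define PW 8  /* output patch size (assumed) */

void conv3x3_relu_patched(const float *in, int h, int w,
                          const float k[3][3], float *out) {
    int oh = h - 2, ow = w - 2;          /* valid-conv output size */
    float patch[PH][PW];                 /* the only live intermediate */
    for (int py = 0; py < oh; py += PH)
        for (int px = 0; px < ow; px += PW) {
            int ph = (oh - py < PH) ? oh - py : PH;
            int pw = (ow - px < PW) ? ow - px : PW;
            for (int y = 0; y < ph; y++)       /* operator 1: conv */
                for (int x = 0; x < pw; x++) {
                    float acc = 0.0f;
                    for (int ky = 0; ky < 3; ky++)
                        for (int kx = 0; kx < 3; kx++)
                            acc += k[ky][kx] *
                                   in[(py + y + ky) * w + (px + x + kx)];
                    patch[y][x] = acc;
                }
            for (int y = 0; y < ph; y++)       /* operator 2: ReLU */
                for (int x = 0; x < pw; x++)
                    out[(py + y) * ow + (px + x)] =
                        patch[y][x] > 0.0f ? patch[y][x] : 0.0f;
        }
}
```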
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- PIM-DRAM: Accelerating Machine Learning Workloads using Processing in Memory based on DRAM Technology [2.6168147530506958]
We propose a processing-in-memory (PIM) multiplication primitive to accelerate matrix vector operations in ML workloads.
We show that the proposed architecture, mapping, and data flow can provide up to 23x and 6.5x benefits over a GPU.
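The primitive itself is hardware; the sketch below is only a host-side software analogue (bank count and row partitioning are assumptions) of how a matrix-vector product can be partitioned across DRAM banks that, in PIM hardware, would compute their row slices concurrently.

```c
/* Host-side illustration of bank-parallel matvec mapping; not the
 * paper's hardware design. */
#include <stddef.h>

#define NUM_BANKS 16  /* illustrative bank count (assumed) */

/* Work assigned to one bank: rows [r0, r1) of y = A x. */
static void bank_matvec(const float *A, const float *x, float *y,
                        size_t n_cols, size_t r0, size_t r1) {
    for (size_t r = r0; r < r1; r++) {
        float acc = 0.0f;
        for (size_t c = 0; c < n_cols; c++)
            acc += A[r * n_cols + c] * x[c];
        y[r] = acc;
    }
}

void pim_style_matvec(const float *A, const float *x, float *y,
                      size_t n_rows, size_t n_cols) {
    size_t per_bank = (n_rows + NUM_BANKS - 1) / NUM_BANKS;
    for (int b = 0; b < NUM_BANKS; b++) {  /* in hardware, banks run   */
        size_t r0 = (size_t)b * per_bank;  /* these slices in parallel */
        size_t r1 = r0 + per_bank;
        if (r0 >= n_rows) break;
        if (r1 > n_rows) r1 = n_rows;
        bank_matvec(A, x, y, n_cols, r0, r1);
    }
}
```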
arXiv Detail & Related papers (2021-05-08T16:39:24Z)
- Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections, but still suffer from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps.
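A minimal sketch of channel-wise gating in the spirit of CME (generic; how the paper derives the gate from motion cues is not modeled here):

```c
/* Channel-wise gating: each of C channels of a feature map is scaled
 * by a gate in (0, 1) derived from a learned logit. Illustrative only. */
#include <math.h>
#include <stddef.h>

void channel_gate(float *feat, const float *gate_logits,
                  size_t channels, size_t hw) {  /* feat is C x (H*W) */
    for (size_t c = 0; c < channels; c++) {
        float g = 1.0f / (1.0f + expf(-gate_logits[c]));  /* sigmoid */
        for (size_t i = 0; i < hw; i++)
            feat[c * hw + i] *= g;  /* emphasize or suppress the channel */
    }
}
```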
arXiv Detail & Related papers (2021-03-23T03:06:26Z)
- GradPIM: A Practical Processing-in-DRAM Architecture for Gradient Descent [17.798991516056454]
We present GradPIM, a processing-in-memory architecture which accelerates parameter updates of deep neural networks training.
Extending DDR4 SDRAM to utilize bank-group parallelism makes our operation designs in the processing-in-memory (PIM) module efficient in terms of hardware cost and performance.
arXiv Detail & Related papers (2021-02-15T12:25:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.