Related papers: Cancellable Memory Requests: A transparent, lightweight Spectre mitigation

Cancellable Memory Requests: A transparent, lightweight Spectre mitigation

URL: http://arxiv.org/abs/2406.12110v1
Date: Mon, 17 Jun 2024 21:43:39 GMT
Title: Cancellable Memory Requests: A transparent, lightweight Spectre mitigation
Authors: Hossam ElAtali, N. Asokan,
Abstract summary: Speculation is fundamental to achieving high CPU performance, yet it enables vulnerabilities such as Spectre attacks. We propose a novel mitigation technique, Cancellable Memory Requests (CMR) that cancels mis-speculated memory requests. We show that CMR can completely thwart Spectre attacks in four real-world processors with realistic system configurations.
Score: 11.499924192220274
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Speculation is fundamental to achieving high CPU performance, yet it enables vulnerabilities such as Spectre attacks, which remain a significant challenge to mitigate without incurring substantial performance overheads. These attacks typically unfold in three steps: they speculatively access sensitive data (access), alter the cache state (transmit), and then utilize a cache timing attack (e.g., Flush+Reload) to extract the secret (receive). Most Spectre attacks exploit a cache timing side channel during the transmit and receive steps. Our key observation is that Spectre attacks do not require the transmit instruction to complete before mis-prediction is detected and mis-speculated instructions are squashed. Instead, it suffices for the instruction to execute and dispatch a request to the memory hierarchy. Responses from memory that arrive after squashing occurs still alter the cache state, including those related to mis-speculated memory accesses. We therefore propose a novel mitigation technique, Cancellable Memory Requests (CMR), that cancels mis-speculated memory requests. Immediately upon squashing, a cancellation is sent to the cache hierarchy, propagating downstream and preventing any changes to caches that have not yet received a response. This reduces the likelihood of cache state changes, thereby reducing the likelihood of Spectre attacks succeeding. We implement CMR on gem5 and show that it thwarts practical Spectre attacks, and has near-zero performance overheads. We show that CMR can completely thwart Spectre attacks in four real-world processors with realistic system configurations.

Related papers

EXAM: Exploiting Exclusive System-Level Cache in Apple M-Series SoCs for Enhanced Cache Occupancy Attacks [2.198430261120653]
Cache occupancy attacks exploit the shared nature of cache hierarchies to infer a victim's activities by monitoring overall cache usage. We propose a suite of SLC-cache occupancy attacks, the first of its kind, where an adversary can monitor GPU and other CPU cluster activities from their own CPU cluster.
arXiv Detail & Related papers (2025-04-18T00:21:00Z)
Shield Bash: Abusing Defensive Coherence State Retrieval to Break Timing Obfuscation [2.03921019862868]
We study an interaction between two state-of-the art defenses in this paper. TORC mitigates cache-hit based attacks and DSRC mitigates speculative coherence state change attacks. We demonstrate a new covert channel attack is possible using this vulnerability.
arXiv Detail & Related papers (2025-04-14T15:27:32Z)
QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation [84.91431271257437]
Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation. DiTs come with significant drawbacks, including increased computational and memory costs. We propose QuantCache, a novel training-free inference acceleration framework.
arXiv Detail & Related papers (2025-03-09T10:31:51Z)
Auditing Prompt Caching in Language Model APIs [77.02079451561718]
We investigate the privacy leakage caused by prompt caching in large language models (LLMs) We detect global cache sharing across users in seven API providers, including OpenAI. We find evidence that OpenAI's embedding model is a decoder-only Transformer, which was previously not publicly known.
arXiv Detail & Related papers (2025-02-11T18:58:04Z)
SMaCk: Efficient Instruction Cache Attacks via Self-Modifying Code Conflicts [5.942801930997087]
Self-modifying code (SMC) allows programs to alter their own instructions. SMC introduces unique microarchitectural behaviors that can be exploited for malicious purposes.
arXiv Detail & Related papers (2025-02-08T03:35:55Z)
Deliberation in Latent Space via Differentiable Cache Augmentation [48.228222586655484]
We show that a frozen large language model can be augmented with an offline coprocessor that operates on the model's key-value (kv) cache. This coprocessor augments the cache with a set of latent embeddings designed to improve the fidelity of subsequent decoding. We show experimentally that when a cache is augmented, the decoder achieves lower perplexity on numerous subsequent tokens.
arXiv Detail & Related papers (2024-12-23T18:02:25Z)
InstCache: A Predictive Cache for LLM Serving [9.878166964839512]
We propose to predict user-instructions by an instruction-aligned LLM and store them in a predictive cache, so-called InstCache. Experimental results show that InstCache can achieve up to 51.34% hit rate on LMSys dataset, which corresponds to a 2x speedup, at a memory cost of only 4.5GB.
arXiv Detail & Related papers (2024-11-21T03:52:41Z)
Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models. We propose an importance-driven cache merging strategy to prune redundancy caches. For instruction encoding, we utilize the frequency to evaluate the importance of caches. Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z)
Training-Free Exponential Context Extension via Cascading KV Cache [49.608367376911694]
We introduce a novel mechanism that leverages cascading sub-cache buffers to selectively retain the most relevant tokens. Our method reduces prefill stage latency by a factor of 6.8 when compared to flash attention on 1M tokens.
arXiv Detail & Related papers (2024-06-24T03:59:17Z)
PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference [57.53291046180288]
Large Language Models (LLMs) have shown remarkable comprehension abilities but face challenges in GPU memory usage during inference. We propose PyramidInfer, a method that compresses the KV cache by layer-wise retaining crucial context. PyramidInfer improves 2.2x throughput compared to Accelerate with over 54% GPU memory reduction in KV cache.
arXiv Detail & Related papers (2024-05-21T06:46:37Z)
EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection [53.25863925815954]
Federated self-supervised learning (FSSL) has emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data. While FSSL offers advantages, its susceptibility to backdoor attacks has not been investigated. We propose the Embedding Inspector (EmInspector) that detects malicious clients by inspecting the embedding space of local models.
arXiv Detail & Related papers (2024-05-21T06:14:49Z)
Prime+Retouch: When Cache is Locked and Leaked [8.332926136722296]
Caches on modern commodity CPUs have become one of the major sources of side-channel leakages. To thwart the cache-based side-channel attacks, two types of countermeasures have been proposed. We present the Prime+Retouch attack that completely bypasses these defense schemes.
arXiv Detail & Related papers (2024-02-23T16:34:49Z)
On the Amplification of Cache Occupancy Attacks in Randomized Cache Architectures [11.018866935621045]
We show that MIRAGE, touted to be resilient against eviction-based attacks, amplifies the chances of cache occupancy attack. We leverage MIRAGE's global eviction property to demonstrate covert channel with byte-level granularity. We extend our attack vectors to include side-channel, template-based fingerprinting of workloads in a cross-core setting.
arXiv Detail & Related papers (2023-10-08T14:06:06Z)
Random and Safe Cache Architecture to Defeat Cache Timing Attacks [5.142233612851766]
Caches have been exploited to leak secret information due to the different times they take to handle memory accesses. We present a systematic view of the attack and defense space and show that no existing defense has addressed all cache timing attacks. We propose Random and Safe (RaS) cache architectures to decorrelate cache state changes from memory requests.
arXiv Detail & Related papers (2023-09-28T05:08:16Z)
BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions [7.46215723037597]
L1 data cache attacks pose a significant privacy and confidentiality threat. BackCache always achieves cache hits instead of cache misses to mitigate contention-based cache timing attacks on the L1 data cache. BackCache places the evicted cache lines from the L1 data cache into a fully-associative backup cache to hide the evictions.
arXiv Detail & Related papers (2023-04-20T12:47:11Z)
Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size. We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos. We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z)
Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm, that we named approximate-key caching. While approximate cache hits alleviate DL inference workload and increase the system throughput, they however introduce an approximation error. We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.