Cancellable Memory Requests: A transparent, lightweight Spectre mitigation
- URL: http://arxiv.org/abs/2406.12110v1
- Date: Mon, 17 Jun 2024 21:43:39 GMT
- Title: Cancellable Memory Requests: A transparent, lightweight Spectre mitigation
- Authors: Hossam ElAtali, N. Asokan
- Abstract summary: Speculation is fundamental to achieving high CPU performance, yet it enables vulnerabilities such as Spectre attacks.
We propose a novel mitigation technique, Cancellable Memory Requests (CMR) that cancels mis-speculated memory requests.
We show that CMR can completely thwart Spectre attacks in four real-world processors with realistic system configurations.
- Score: 11.499924192220274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speculation is fundamental to achieving high CPU performance, yet it enables vulnerabilities such as Spectre attacks, which remain a significant challenge to mitigate without incurring substantial performance overheads. These attacks typically unfold in three steps: they speculatively access sensitive data (access), alter the cache state (transmit), and then utilize a cache timing attack (e.g., Flush+Reload) to extract the secret (receive). Most Spectre attacks exploit a cache timing side channel during the transmit and receive steps. Our key observation is that Spectre attacks do not require the transmit instruction to complete before mis-prediction is detected and mis-speculated instructions are squashed. Instead, it suffices for the instruction to execute and dispatch a request to the memory hierarchy. Responses from memory that arrive after squashing still alter the cache state, including those related to mis-speculated memory accesses. We therefore propose a novel mitigation technique, Cancellable Memory Requests (CMR), that cancels mis-speculated memory requests. Immediately upon squashing, a cancellation is sent to the cache hierarchy, propagating downstream and preventing any changes to caches that have not yet received a response. This reduces the likelihood of cache state changes, thereby reducing the likelihood of Spectre attacks succeeding. We implement CMR on gem5 and show that it thwarts practical Spectre attacks and has near-zero performance overhead. We also show that CMR can completely thwart Spectre attacks in four real-world processors with realistic system configurations.
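The access/transmit/receive sequence and the cancellation mechanism described in the abstract can be illustrated with a toy, cycle-stepped model. This is a sketch under stated assumptions, not the authors' gem5 implementation. The three-level hierarchy, the fill latencies in `FILL_CYCLE`, the probe-array layout, and the names `ToyHierarchy` and `spectre_toy` are all hypothetical. Cancellation is assumed to reach every level instantly, and the receive step inspects cache contents directly instead of timing reloads.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical cycles (after issue) at which a miss response reaches each level.
# The response travels up from memory, so the LLC is filled first and L1 last.
FILL_CYCLE = {"LLC": 18, "L2": 22, "L1": 25}


@dataclass
class MemRequest:
    addr: int
    issued_at: int
    cancelled_at: Optional[int] = None  # squash cycle, set only when CMR cancels


@dataclass
class ToyHierarchy:
    cmr_enabled: bool
    lines: Dict[str, Set[int]] = field(
        default_factory=lambda: {level: set() for level in FILL_CYCLE})
    in_flight: List[MemRequest] = field(default_factory=list)

    def issue_load(self, addr: int, now: int) -> MemRequest:
        """A load that misses at every level dispatches a request downstream."""
        req = MemRequest(addr, now)
        self.in_flight.append(req)
        return req

    def squash(self, reqs: List[MemRequest], now: int) -> None:
        """Mis-prediction detected. Without CMR the requests stay live; with CMR
        a cancellation is sent for each (assumed to reach all levels at once)."""
        if self.cmr_enabled:
            for req in reqs:
                req.cancelled_at = now

    def tick(self, now: int) -> None:
        """Deliver responses arriving this cycle. A level is filled unless the
        cancellation reached it no later than the response did."""
        for req in self.in_flight:
            for level, delay in FILL_CYCLE.items():
                arrives = req.issued_at + delay
                if arrives != now:
                    continue
                if req.cancelled_at is not None and req.cancelled_at <= arrives:
                    continue  # cancelled in time: this level's state is unchanged
                self.lines[level].add(req.addr)
        last = max(FILL_CYCLE.values())
        self.in_flight = [r for r in self.in_flight if now < r.issued_at + last]


def spectre_toy(secret: int, cmr_enabled: bool, squash_cycle: int) -> Dict[str, List[int]]:
    """Run access/transmit/receive; return, per level, the probe entries that
    a receiver would observe as cached."""
    probe_base, line_size, n_values = 0x10000, 64, 16  # hypothetical probe array
    h = ToyHierarchy(cmr_enabled)  # receiver flushed the probe array beforehand
    # Access + transmit: a mis-speculated load indexed by the secret, issued at cycle 0.
    transmit = h.issue_load(probe_base + secret * line_size, now=0)
    for now in range(1, 40):
        if now == squash_cycle:
            h.squash([transmit], now)
        h.tick(now)
    # Receive: probe every candidate line (stand-in for Flush+Reload timing).
    return {level: [v for v in range(n_values)
                    if probe_base + v * line_size in h.lines[level]]
            for level in FILL_CYCLE}


print(spectre_toy(secret=11, cmr_enabled=False, squash_cycle=5))
# {'LLC': [11], 'L2': [11], 'L1': [11]}
print(spectre_toy(secret=11, cmr_enabled=True, squash_cycle=5))
# {'LLC': [], 'L2': [], 'L1': []}
print(spectre_toy(secret=11, cmr_enabled=True, squash_cycle=20))
# {'LLC': [11], 'L2': [], 'L1': []}
```

With CMR enabled, squashing at cycle 5 leaves no trace. Squashing at cycle 20, after the response has reached the LLC but before it reaches L2 and L1, leaves the secret-dependent line only in the LLC. This matches the abstract's point that cancellation protects only the caches that have not yet received a response. How often the squash wins that race on real hardware depends on actual latencies, which this toy does not model.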
Related papers
- Auditing Prompt Caching in Language Model APIs [77.02079451561718]
We investigate the privacy leakage caused by prompt caching in large language models (LLMs).
We detect global cache sharing across users in seven API providers, including OpenAI.
We find evidence that OpenAI's embedding model is a decoder-only Transformer, which was previously not publicly known.
arXiv Detail & Related papers (2025-02-11T18:58:04Z)
- SMaCk: Efficient Instruction Cache Attacks via Self-Modifying Code Conflicts [5.942801930997087]
Self-modifying code (SMC) allows programs to alter their own instructions.
SMC introduces unique microarchitectural behaviors that can be exploited for malicious purposes.
arXiv Detail & Related papers (2025-02-08T03:35:55Z)
- Deliberation in Latent Space via Differentiable Cache Augmentation [48.228222586655484]
We show that a frozen large language model can be augmented with an offline coprocessor that operates on the model's key-value (kv) cache.
This coprocessor augments the cache with a set of latent embeddings designed to improve the fidelity of subsequent decoding.
We show experimentally that when a cache is augmented, the decoder achieves lower perplexity on numerous subsequent tokens.
arXiv Detail & Related papers (2024-12-23T18:02:25Z)
- InstCache: A Predictive Cache for LLM Serving [9.878166964839512]
We propose to predict user-instructions by an instruction-aligned LLM and store them in a predictive cache, so-called InstCache.
Experimental results show that InstCache can achieve up to 51.34% hit rate on the LMSys dataset, which corresponds to a 2x speedup, at a memory cost of only 4.5GB.
arXiv Detail & Related papers (2024-11-21T03:52:41Z)
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models.
We propose an importance-driven cache merging strategy to prune redundant caches.
For instruction encoding, we utilize the frequency to evaluate the importance of caches.
Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z)
- Training-Free Exponential Context Extension via Cascading KV Cache [49.608367376911694]
We introduce a novel mechanism that leverages cascading sub-cache buffers to selectively retain the most relevant tokens.
Our method reduces prefill stage latency by a factor of 6.8 when compared to flash attention on 1M tokens.
arXiv Detail & Related papers (2024-06-24T03:59:17Z)
- EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection [53.25863925815954]
Federated self-supervised learning (FSSL) has emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data.
While FSSL offers advantages, its susceptibility to backdoor attacks has not been investigated.
We propose the Embedding Inspector (EmInspector) that detects malicious clients by inspecting the embedding space of local models.
arXiv Detail & Related papers (2024-05-21T06:14:49Z)
- Prime+Retouch: When Cache is Locked and Leaked [8.332926136722296]
Caches on modern commodity CPUs have become one of the major sources of side-channel leakages.
To thwart the cache-based side-channel attacks, two types of countermeasures have been proposed.
We present the Prime+Retouch attack that completely bypasses these defense schemes.
arXiv Detail & Related papers (2024-02-23T16:34:49Z)
- Random and Safe Cache Architecture to Defeat Cache Timing Attacks [5.142233612851766]
Caches have been exploited to leak secret information due to the different times they take to handle memory accesses.
We present a systematic view of the attack and defense space and show that no existing defense has addressed all cache timing attacks.
We propose Random and Safe (RaS) cache architectures to decorrelate cache state changes from memory requests.
arXiv Detail & Related papers (2023-09-28T05:08:16Z)
- BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions [7.46215723037597]
L1 data cache attacks pose a significant privacy and confidentiality threat.
BackCache always achieves cache hits instead of cache misses to mitigate contention-based cache timing attacks on the L1 data cache.
BackCache places the evicted cache lines from the L1 data cache into a fully-associative backup cache to hide the evictions.
arXiv Detail & Related papers (2023-04-20T12:47:11Z)
- Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z)