CacheSquash: Making caches speculation-aware
- URL: http://arxiv.org/abs/2406.12110v2
- Date: Thu, 08 May 2025 07:55:38 GMT
- Title: CacheSquash: Making caches speculation-aware
- Authors: Hossam ElAtali, N. Asokan
- Abstract summary: Speculation is key to achieving high CPU performance, yet it enables risks like Spectre attacks. We propose a novel mitigation, CacheSquash, that cancels mis-speculated memory accesses. We implement CacheSquash on gem5 and show that it thwarts practical Spectre attacks, with near-zero performance overheads.
- Score: 11.499924192220274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speculation is key to achieving high CPU performance, yet it enables risks like Spectre attacks, which remain a significant challenge to mitigate without incurring substantial performance overheads. These attacks typically unfold in three stages: access, transmit, and receive. They usually exploit a cache timing side channel during the transmit and receive phases: speculatively accessing sensitive data (access), altering cache state (transmit), and then using a cache timing attack (e.g., Flush+Reload) to extract the secret (receive). Our key observation is that Spectre attacks only require the transmit instruction to execute and dispatch a request to the cache hierarchy; it need not complete before a misprediction is detected (and mis-speculated instructions squashed), because responses from memory that arrive at the cache after squashing still alter cache state. We propose a novel mitigation, CacheSquash, that cancels mis-speculated memory accesses. Immediately upon squashing, a cancellation is sent to the cache hierarchy, propagating downstream and preventing any changes to caches that have not yet received a response. This minimizes cache state changes, thereby reducing the likelihood of Spectre attacks succeeding. We implement CacheSquash on gem5 and show that it thwarts practical Spectre attacks, with near-zero performance overheads.
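The cancellation idea lends itself to a compact illustration. Below is a minimal, self-contained sketch assuming a toy single-level cache model; the names CacheLevel, dispatch, onResponse, and cancel are illustrative inventions, not the paper's gem5 implementation. Each in-flight request carries an ID, and a squash cancels outstanding requests so that late memory responses no longer install cache lines.

```cpp
// Minimal sketch (illustrative, not the paper's gem5 code): in-flight
// cache requests carry IDs; a squash cancels them so that responses
// arriving afterwards no longer change cache state.
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <unordered_set>

struct CacheLevel {
    std::unordered_set<uint64_t> lines;                // resident line addresses
    std::unordered_map<uint64_t, uint64_t> inflight;   // request ID -> address

    // The core dispatches a (possibly speculative) load to this level.
    void dispatch(uint64_t reqId, uint64_t addr) { inflight[reqId] = addr; }

    // A response arrives from the next level; only still-valid requests
    // are allowed to install a line (i.e., to change cache state).
    void onResponse(uint64_t reqId) {
        auto it = inflight.find(reqId);
        if (it == inflight.end()) return;   // cancelled: drop the fill
        lines.insert(it->second);
        inflight.erase(it);
    }

    // CacheSquash-style cancellation, sent immediately upon a squash and
    // propagated downstream in the real design.
    void cancel(uint64_t reqId) { inflight.erase(reqId); }
};

int main() {
    CacheLevel l1;
    l1.dispatch(1, 0x1000);  // transmit instruction dispatches a request
    l1.cancel(1);            // misprediction detected; request squashed
    l1.onResponse(1);        // late memory response: no line is installed
    std::printf("line cached? %zu\n", l1.lines.count(0x1000));  // prints 0
}
```

In the paper's actual design the cancellation propagates downstream through the cache hierarchy, stopping fills at every level that has not yet received a response; this toy model collapses that into a single level.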
Related papers
- EXAM: Exploiting Exclusive System-Level Cache in Apple M-Series SoCs for Enhanced Cache Occupancy Attacks [2.198430261120653]
Cache occupancy attacks exploit the shared nature of cache hierarchies to infer a victim's activities by monitoring overall cache usage.
We propose a suite of SLC-cache occupancy attacks, the first of its kind, where an adversary can monitor GPU and other CPU cluster activities from their own CPU cluster.
arXiv Detail & Related papers (2025-04-18T00:21:00Z) - Shield Bash: Abusing Defensive Coherence State Retrieval to Break Timing Obfuscation [2.03921019862868]
We study an interaction between two state-of-the-art defenses in this paper.
TORC mitigates cache-hit-based attacks, and DSRC mitigates speculative coherence state change attacks.
We demonstrate a new covert channel attack is possible using this vulnerability.
arXiv Detail & Related papers (2025-04-14T15:27:32Z) - QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation [84.91431271257437]
Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation.
DiTs come with significant drawbacks, including increased computational and memory costs.
We propose QuantCache, a novel training-free inference acceleration framework.
arXiv Detail & Related papers (2025-03-09T10:31:51Z) - Auditing Prompt Caching in Language Model APIs [77.02079451561718]
We investigate the privacy leakage caused by prompt caching in large language models (LLMs).
We detect global cache sharing across users in seven API providers, including OpenAI.
We find evidence that OpenAI's embedding model is a decoder-only Transformer, which was previously not publicly known.
arXiv Detail & Related papers (2025-02-11T18:58:04Z) - SMaCk: Efficient Instruction Cache Attacks via Self-Modifying Code Conflicts [5.942801930997087]
Self-modifying code (SMC) allows programs to alter their own instructions.
SMC introduces unique microarchitectural behaviors that can be exploited for malicious purposes.
arXiv Detail & Related papers (2025-02-08T03:35:55Z) - vCache: Verified Semantic Prompt Caching [75.87215136638828]
This paper proposes vCache, the first verified semantic cache with user-defined error rate guarantees. It employs an online learning algorithm to estimate an optimal threshold for each cached prompt, enabling reliable cache responses without additional training. Our experiments show that vCache consistently meets the specified error bounds while outperforming state-of-the-art static-threshold and fine-tuned embedding baselines.
arXiv Detail & Related papers (2025-02-06T04:16:20Z) - Deliberation in Latent Space via Differentiable Cache Augmentation [48.228222586655484]
We show that a frozen large language model can be augmented with an offline coprocessor that operates on the model's key-value (kv) cache.
This coprocessor augments the cache with a set of latent embeddings designed to improve the fidelity of subsequent decoding.
We show experimentally that when a cache is augmented, the decoder achieves lower perplexity on numerous subsequent tokens.
arXiv Detail & Related papers (2024-12-23T18:02:25Z) - InstCache: A Predictive Cache for LLM Serving [9.878166964839512]
We propose to predict user instructions with an instruction-aligned LLM and store them in a predictive cache, called InstCache.
Experimental results show that InstCache can achieve up to a 51.34% hit rate on the LMSys dataset, which corresponds to a 2x speedup, at a memory cost of only 4.5GB.
arXiv Detail & Related papers (2024-11-21T03:52:41Z) - RollingCache: Using Runtime Behavior to Defend Against Cache Side Channel Attacks [2.9221371172659616]
We present RollingCache, a cache design that defends against contention attacks by dynamically changing the set of addresses contending for cache sets.
RollingCache does not rely on address encryption/decryption, data relocation, or cache partitioning.
Our solution does not depend on having defined security domains, and can defend against an attacker running on the same or another core.
arXiv Detail & Related papers (2024-08-16T15:11:12Z) - Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models.
We propose an importance-driven cache merging strategy to prune redundant caches.
For instruction encoding, we utilize the frequency to evaluate the importance of caches.
Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z) - Hidden Web Caches Discovery [3.9272151228741716]
This paper presents a novel methodology for cache detection using timing analysis.
Our approach eliminates the dependency on cache status headers, making it applicable to any web server.
arXiv Detail & Related papers (2024-07-23T08:58:06Z) - Training-Free Exponential Context Extension via Cascading KV Cache [49.608367376911694]
We introduce a novel mechanism that leverages cascading sub-cache buffers to selectively retain the most relevant tokens.
Our method reduces prefill stage latency by a factor of 6.8 when compared to flash attention on 1M tokens.
arXiv Detail & Related papers (2024-06-24T03:59:17Z) - PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference [57.53291046180288]
Large Language Models (LLMs) have shown remarkable comprehension abilities but face challenges in GPU memory usage during inference.
We propose PyramidInfer, a method that compresses the KV cache by layer-wise retaining crucial context.
PyramidInfer improves 2.2x throughput compared to Accelerate with over 54% GPU memory reduction in KV cache.
arXiv Detail & Related papers (2024-05-21T06:46:37Z) - EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection [53.25863925815954]
Federated self-supervised learning (FSSL) has emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data.
While FSSL offers advantages, its susceptibility to backdoor attacks has not been investigated.
We propose the Embedding Inspector (EmInspector) that detects malicious clients by inspecting the embedding space of local models.
arXiv Detail & Related papers (2024-05-21T06:14:49Z) - Prime+Retouch: When Cache is Locked and Leaked [8.332926136722296]
Caches on modern commodity CPUs have become one of the major sources of side-channel leakages.
To thwart the cache-based side-channel attacks, two types of countermeasures have been proposed.
We present the Prime+Retouch attack that completely bypasses these defense schemes.
arXiv Detail & Related papers (2024-02-23T16:34:49Z) - On the Amplification of Cache Occupancy Attacks in Randomized Cache Architectures [11.018866935621045]
We show that MIRAGE, touted to be resilient against eviction-based attacks, amplifies the chances of cache occupancy attacks.
We leverage MIRAGE's global eviction property to demonstrate a covert channel with byte-level granularity.
We extend our attack vectors to include side-channel, template-based fingerprinting of workloads in a cross-core setting.
arXiv Detail & Related papers (2023-10-08T14:06:06Z) - Random and Safe Cache Architecture to Defeat Cache Timing Attacks [5.142233612851766]
Caches have been exploited to leak secret information due to the different times they take to handle memory accesses.
We present a systematic view of the attack and defense space and show that no existing defense has addressed all cache timing attacks.
We propose Random and Safe (RaS) cache architectures to decorrelate cache state changes from memory requests.
arXiv Detail & Related papers (2023-09-28T05:08:16Z) - BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions [7.46215723037597]
L1 data cache attacks pose a significant privacy and confidentiality threat.
BackCache always achieves cache hits instead of cache misses to mitigate contention-based cache timing attacks on the L1 data cache.
BackCache places the evicted cache lines from the L1 data cache into a fully-associative backup cache to hide the evictions.
arXiv Detail & Related papers (2023-04-20T12:47:11Z) - Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z) - Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm, that we named approximate-key caching.
While approximate cache hits alleviate the DL inference workload and increase system throughput, they introduce an approximation error.
We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z) - Reinforcement Learning for Caching with Space-Time Popularity Dynamics [61.55827760294755]
Caching is envisioned to play a critical role in next-generation networks.
To intelligently prefetch and store contents, a cache node should be able to learn what and when to cache.
This chapter presents a versatile reinforcement learning based approach for near-optimal caching policy design.
arXiv Detail & Related papers (2020-05-19T01:23:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.