Related papers: Breaking Diffusion with Cache: Exploiting Approximate Caches in Diffusion Models

URL: http://arxiv.org/abs/2508.20424v1
Date: Thu, 28 Aug 2025 04:46:44 GMT
Title: Breaking Diffusion with Cache: Exploiting Approximate Caches in Diffusion Models
Authors: Desen Sun, Shuncheng Jie, Sihang Liu
Abstract summary: We introduce a prompt stealing attack using the cache, where an attacker can recover existing cached prompts based on cache hit prompts. We introduce a poisoning attack that embeds the attacker's logos into the previously stolen prompt, to render them in future user prompts that hit the cache. These attacks are all performed remotely through the serving system, which indicates severe security vulnerabilities in approximate caching.
Score: 1.399348653165494
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion models are a powerful class of generative models that produce content, such as images, from user prompts, but they are computationally intensive. To mitigate this cost, recent academic and industry work has adopted approximate caching, which reuses intermediate states from similar prompts in a cache. While efficient, this optimization introduces new security risks by breaking isolation among users. This work aims to comprehensively assess new security vulnerabilities arising from approximate caching. First, we demonstrate a remote covert channel established with the cache, where a sender injects prompts with special keywords into the cache and a receiver can recover that even after days, to exchange information. Second, we introduce a prompt stealing attack using the cache, where an attacker can recover existing cached prompts based on cache hit prompts. Finally, we introduce a poisoning attack that embeds the attacker's logos into the previously stolen prompt, to render them in future user prompts that hit the cache. These attacks are all performed remotely through the serving system, which indicates severe security vulnerabilities in approximate caching.
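To make the mechanism in the abstract concrete, here is a minimal sketch of an approximate cache lookup, not the paper's implementation. It assumes a toy bag-of-words `embed` function in place of a real text encoder, a hypothetical `ApproximateCache` class, an assumed similarity threshold of 0.8, and a plain string standing in for the cached intermediate state. It only illustrates how a similar prompt from one user can hit an entry created by another user, which is the shared state the attacks rely on.

```python
import math
from collections import Counter


def embed(prompt):
    """Toy stand-in for a text encoder: a bag-of-words count vector (assumption)."""
    return Counter(prompt.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ApproximateCache:
    """Hypothetical cache shared by every user of a serving system."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cutoff for a cache hit
        self.entries = []           # list of (embedding, prompt, intermediate_state)

    def insert(self, prompt, state):
        """Store a prompt together with its intermediate state."""
        self.entries.append((embed(prompt), prompt, state))

    def lookup(self, prompt):
        """Return the most similar (prompt, state) at or above the threshold, else None."""
        query = embed(prompt)
        best, best_sim = None, self.threshold
        for emb, cached_prompt, state in self.entries:
            sim = cosine(query, emb)
            if sim >= best_sim:
                best, best_sim = (cached_prompt, state), sim
        return best


cache = ApproximateCache(threshold=0.8)
# User A's request populates the shared cache.
cache.insert("a red sports car parked on a beach at sunset", state="latents-from-user-A")

# User B's similar prompt reuses user A's state; an unrelated prompt does not.
hit = cache.lookup("a red sports car parked on a beach at sunrise")
miss = cache.lookup("a bowl of fruit on a wooden table")
```

In this sketch a hit is observable only through the returned value; in a remote serving system the signal would have to come from something visible to the client, such as response behavior, which the sketch does not model.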

Related papers

SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching [75.02865981328509]
Caching reduces computation by reusing previously computed model outputs across timesteps. We propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis. SenCache achieves better visual quality than existing caching methods under similar computational budgets.
arXiv Detail & Related papers (2026-02-27T17:36:09Z)
DiCache: Let Diffusion Model Determine Its Own Cache [63.73224201922458]
We present DiCache, a training-free adaptive caching strategy for accelerating diffusion models at runtime. Online Probe Profiling Scheme leverages a shallow-layer online probe to obtain a stable prior for the caching error in real time. Dynamic Cache Trajectory Alignment combines multi-step caches based on shallow-layer probe feature trajectory to better approximate the current feature.
arXiv Detail & Related papers (2025-08-24T13:30:00Z)
Auditing Prompt Caching in Language Model APIs [77.02079451561718]
We investigate the privacy leakage caused by prompt caching in large language models (LLMs). We detect global cache sharing across users in seven API providers, including OpenAI. We find evidence that OpenAI's embedding model is a decoder-only Transformer, which was previously not publicly known.
arXiv Detail & Related papers (2025-02-11T18:58:04Z)
vCache: Verified Semantic Prompt Caching [75.87215136638828]
This paper proposes vCache, the first verified semantic cache with user-defined error rate guarantees. It employs an online learning algorithm to estimate an optimal threshold for each cached prompt, enabling reliable cache responses without additional training. Our experiments show that vCache consistently meets the specified error bounds while outperforming state-of-the-art static-threshold and fine-tuned embedding baselines.
arXiv Detail & Related papers (2025-02-06T04:16:20Z)
RollingCache: Using Runtime Behavior to Defend Against Cache Side Channel Attacks [2.9221371172659616]
We present RollingCache, a cache design that defends against contention attacks by dynamically changing the set of addresses contending for cache sets. RollingCache does not rely on address encryption/decryption, data relocation, or cache partitioning. Our solution does not depend on having defined security domains, and can defend against an attacker running on the same or another core.
arXiv Detail & Related papers (2024-08-16T15:11:12Z)
Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models. We propose an importance-driven cache merging strategy to prune redundancy caches. For instruction encoding, we utilize the frequency to evaluate the importance of caches. Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z)
Hidden Web Caches Discovery [3.9272151228741716]
This paper presents a novel methodology for cache detection using timing analysis. Our approach eliminates the dependency on cache status headers, making it applicable to any web server.
arXiv Detail & Related papers (2024-07-23T08:58:06Z)
CacheSquash: Making caches speculation-aware [11.499924192220274]
Speculation is key to achieving high CPU performance, yet it enables risks like Spectre attacks. We propose a novel mitigation, CacheSquash, that cancels mis-speculated memory accesses. We implement CacheSquash on gem5 and show that it thwarts practical Spectre attacks, with near-zero performance overheads.
arXiv Detail & Related papers (2024-06-17T21:43:39Z)
Systematic Evaluation of Randomized Cache Designs against Cache Occupancy [11.018866935621045]
This work fills in a crucial gap in current literature on randomized caches. Most randomized cache designs defend only contention-based attacks, and leave out considerations of cache occupancy. Our results establish the need to also consider cache occupancy side-channel in randomized cache design considerations.
arXiv Detail & Related papers (2023-10-08T14:06:06Z)
Random and Safe Cache Architecture to Defeat Cache Timing Attacks [5.142233612851766]
Caches have been exploited to leak secret information due to the different times they take to handle memory accesses. We present a systematic view of the attack and defense space and show that no existing defense has addressed all cache timing attacks. We propose Random and Safe (RaS) cache architectures to decorrelate cache state changes from memory requests.
arXiv Detail & Related papers (2023-09-28T05:08:16Z)
BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions [7.46215723037597]
L1 data cache attacks pose a significant privacy and confidentiality threat. BackCache always achieves cache hits instead of cache misses to mitigate contention-based cache timing attacks on the L1 data cache. BackCache places the evicted cache lines from the L1 data cache into a fully-associative backup cache to hide the evictions.
arXiv Detail & Related papers (2023-04-20T12:47:11Z)
Reinforcement Learning for Caching with Space-Time Popularity Dynamics [61.55827760294755]
Caching is envisioned to play a critical role in next-generation networks. To intelligently prefetch and store contents, a cache node should be able to learn what and when to cache. This chapter presents a versatile reinforcement learning based approach for near-optimal caching policy design.
arXiv Detail & Related papers (2020-05-19T01:23:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.