Related papers: InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks

URL: http://arxiv.org/abs/2411.18191v2
Date: Fri, 29 Nov 2024 08:33:49 GMT
Title: InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks
Authors: Xinyao Zheng, Husheng Han, Shangyi Shi, Qiyan Fang, Zidong Du, Xing Hu, Qi Guo
Abstract summary: Large language models (LLMs) possess extensive knowledge and question-answering capabilities. Cache-sharing methods are commonly employed to enhance efficiency by reusing cached states or responses for the same or similar inference requests. We propose a novel timing-based side-channel attack to execute input theft in LLM inference.
Score: 9.748438507132207
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) possess extensive knowledge and question-answering capabilities and have been widely deployed in privacy-sensitive domains such as finance and medical consultation. During LLM inference, cache-sharing methods are commonly employed to enhance efficiency by reusing cached states or responses for the same or similar inference requests. However, we identify that these cache mechanisms pose a risk of private input leakage, as caching can result in observable variations in response times, making them a strong candidate signal for a timing-based attack. In this study, we propose a novel timing-based side-channel attack to execute input theft in LLM inference. The cache-based attack faces the challenge of constructing candidate inputs in a large search space to hit and steal cached user queries. To address this challenge, we propose two primary components. The input constructor employs machine learning techniques and LLM-based approaches for vocabulary correlation learning while implementing optimized search mechanisms for generalized input construction. The time analyzer implements statistical time fitting with outlier elimination to identify cache hit patterns, continuously providing feedback to refine the constructor's search strategy. We conduct experiments across two cache mechanisms, and the results demonstrate that our approach consistently attains high attack success rates in various applications. Our work highlights the security vulnerabilities associated with performance optimizations, underscoring the necessity of prioritizing privacy and security alongside enhancements in LLM inference.
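Illustration: the abstract describes a time analyzer that eliminates outliers before deciding whether a response time indicates a cache hit. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes latency samples in milliseconds have already been collected, uses Tukey fences (an interquartile-range rule) as the outlier filter, and compares medians against a known cache-miss baseline. The function names, the fence factor `k`, and the `min_gap_ratio` threshold are all hypothetical choices made for this example.

```python
import statistics


def trim_outliers(samples, k=1.5):
    """Drop samples outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in samples if low <= s <= high]


def looks_like_cache_hit(candidate_ms, miss_baseline_ms, min_gap_ratio=0.2):
    """Return True if the candidate's median latency is clearly below the
    median latency of requests known to miss the cache.

    min_gap_ratio is an assumed threshold: the relative speed-up that is
    treated as evidence of a cache hit.
    """
    candidate = statistics.median(trim_outliers(candidate_ms))
    baseline = statistics.median(trim_outliers(miss_baseline_ms))
    return (baseline - candidate) / baseline >= min_gap_ratio


# Synthetic latencies (ms); each list contains one network spike as an outlier.
miss_samples = [100, 102, 98, 101, 99, 400]
hit_samples = [60, 62, 58, 61, 59, 300]

print(looks_like_cache_hit(hit_samples, miss_samples))
print(looks_like_cache_hit(miss_samples, miss_samples))
```

Medians are used here because they remain stable even if a stray sample survives trimming; a real analyzer would likely need many more samples per candidate and a threshold calibrated to the target service.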

Related papers

How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance. We propose the DSC benchmark: Diverse, Simple, and Categorized, an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv Detail & Related papers (2025-03-20T19:52:30Z)
Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [61.02719787737867]
Large language models (LLMs) are increasingly deployed and democratized on edge devices. One promising solution is uncertainty-based SLM routing, which offloads high-stakes queries to stronger LLMs when the SLM produces low-confidence responses. We conduct a comprehensive investigation into benchmarking and generalization of uncertainty-driven routing strategies from SLMs to LLMs over 1500+ settings.
arXiv Detail & Related papers (2025-02-06T18:59:11Z)
The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems [26.528288876732617]
A set of new timing side channels can be exploited to infer confidential system prompts and those issued by other users. These vulnerabilities echo security challenges observed in traditional computing systems. We propose a token-by-token search algorithm to efficiently recover shared prompt prefixes in the caches.
arXiv Detail & Related papers (2024-09-30T06:55:00Z)
Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models. We propose an importance-driven cache merging strategy to prune redundant caches. For instruction encoding, we use frequency to evaluate the importance of caches. Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z)
Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses. C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z)
LLMs for Test Input Generation for Semantic Caches [1.8628177380024746]
Large language models (LLMs) enable state-of-the-art semantic capabilities to be added to software systems. At scale, the cost of serving thousands of users increases massively, which also affects user experience. We present VaryGen, an approach for using LLMs for test input generation that produces similar questions from unstructured text documents.
arXiv Detail & Related papers (2024-01-16T06:16:33Z)
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks. This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs. We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases. We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets. Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
BufferSearch: Generating Black-Box Adversarial Texts With Lower Queries [29.52075716869515]
Black-box adversarial attacks suffer from high model-querying complexity. How to eliminate redundant model queries is rarely explored. We propose BufferSearch, a query-efficient approach to effectively attack general intelligent NLP systems.
arXiv Detail & Related papers (2023-10-14T19:49:02Z)
Temporal-aware Hierarchical Mask Classification for Video Semantic Segmentation [62.275143240798236]
Video semantic segmentation datasets have limited categories per video. Fewer than 10% of queries could be matched to receive meaningful gradient updates during VSS training. Our method achieves state-of-the-art performance on the latest challenging VSS benchmark VSPW without bells and whistles.
arXiv Detail & Related papers (2023-09-14T20:31:06Z)
Leakage-Abuse Attacks Against Forward and Backward Private Searchable Symmetric Encryption [13.057964839510596]
Dynamic searchable encryption (DSSE) enables a server to efficiently search and update over encrypted files. To minimize the leakage during updates, a security notion named forward and backward privacy is expected for newly proposed DSSE schemes. It remains underexplored whether forward and backward private DSSE is resilient against practical leakage-abuse attacks (LAAs).
arXiv Detail & Related papers (2023-09-09T06:39:35Z)
Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm that we name approximate-key caching. While approximate cache hits alleviate the DL inference workload and increase system throughput, they also introduce an approximation error. We analytically model our caching system's performance for classic LRU and ideal caches, perform a trace-driven evaluation of the expected performance, and compare the benefits of our proposed approach with state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.