Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions
- URL: http://arxiv.org/abs/2404.16251v2
- Date: Fri, 26 Apr 2024 07:47:49 GMT
- Title: Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions
- Authors: Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Ben Risher, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu,
- Abstract summary: leakage in large language models (LLMs) poses a significant security and privacy threat.
leakage in multi-turn LLM interactions along with mitigation strategies has not been studied in a standardized manner.
This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and open-source LLMs.
- Score: 125.21418304558948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt leakage in large language models (LLMs) poses a significant security and privacy threat, particularly in retrieval-augmented generation (RAG) systems. However, leakage in multi-turn LLM interactions along with mitigation strategies has not been studied in a standardized manner. This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and open-source LLMs. Our unique multi-turn threat model leverages the LLM's sycophancy effect and our analysis dissects task instruction and knowledge leakage in the LLM response. In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3. We find that some black-box LLMs like Gemini show variable susceptibility to leakage across domains - they are more likely to leak contextual knowledge in the news domain compared to the medical domain. Our experiments measure specific effects of 6 black-box defense strategies, including a query-rewriter in the RAG scenario. Our proposed multi-tier combination of defenses still has an ASR of 5.3% for black-box LLMs, indicating room for enhancement and future direction for LLM security research.
Related papers
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks.
The vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage.
In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks.
arXiv Detail & Related papers (2024-07-10T06:57:58Z) - CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference [29.55937864144965]
This study is the first to study safety in multi-turn dialogue coreference in large language models (LLMs)
We created a dataset of 1,400 questions across 14 categories, each featuring multi-turn coreference safety attacks.
The highest attack success rate was 56% with the LLaMA2-Chat-7b model, while the lowest was 13.9% with the Mistral-7B-Instruct model.
arXiv Detail & Related papers (2024-06-25T15:13:02Z) - Increased LLM Vulnerabilities from Fine-tuning and Quantization [0.0]
Large Language Models (LLMs) have become very popular and have found use cases in many domains.
LLMs are vulnerable to different types of attacks, such as jailbreaking, prompt injection attacks, and privacy leakage attacks.
We show that fine-tuning and quantization reduces jailbreak resistance significantly, leading to increased LLM vulnerabilities.
arXiv Detail & Related papers (2024-04-05T20:31:45Z) - Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation [98.02846901473697]
We propose ECSO (Eyes Closed, Safety On), a training-free protecting approach that exploits the inherent safety awareness of MLLMs.
ECSO generates safer responses via adaptively transforming unsafe images into texts to activate the intrinsic safety mechanism of pre-aligned LLMs.
arXiv Detail & Related papers (2024-03-14T17:03:04Z) - A New Era in LLM Security: Exploring Security Concerns in Real-World
LLM-based Systems [47.18371401090435]
We analyze the security of Large Language Model (LLM) systems, instead of focusing on the individual LLMs.
We propose a multi-layer and multi-step approach and apply it to the state-of-art OpenAI GPT4.
We found that although the OpenAI GPT4 has designed numerous safety constraints to improve its safety features, these safety constraints are still vulnerable to attackers.
arXiv Detail & Related papers (2024-02-28T19:00:12Z) - Speak Out of Turn: Safety Vulnerability of Large Language Models in
Multi-turn Dialogue [10.703193963273128]
Large Language Models (LLMs) have been demonstrated to generate illegal or unethical responses.
This paper argues that humans could exploit multi-turn dialogue to induce LLMs into generating harmful information.
arXiv Detail & Related papers (2024-02-27T07:11:59Z) - LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing
LLMs' Vulnerability Reasoning [18.025174693883788]
Large language models (LLMs) have demonstrated significant poten- tial for many downstream tasks, including vulnerability detection.
Recent attempts to use LLMs for vulnerability detection are prelim- inary, as they lack an in-depth understanding of a subject LLM's vulnerability reasoning capability.
We propose a unified evaluation framework named LLM4Vuln, which separates LLMs' vulnerability reasoning from their other capabilities.
arXiv Detail & Related papers (2024-01-29T14:32:27Z) - SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks [99.23352758320945]
We propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on large language models (LLMs)
Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs.
arXiv Detail & Related papers (2023-10-05T17:01:53Z) - On the Risk of Misinformation Pollution with Large Language Models [127.1107824751703]
We investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation.
Our study reveals that LLMs can act as effective misinformation generators, leading to a significant degradation in the performance of Open-Domain Question Answering (ODQA) systems.
arXiv Detail & Related papers (2023-05-23T04:10:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.