Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
- URL: http://arxiv.org/abs/2403.04786v2
- Date: Sat, 23 Mar 2024 05:21:49 GMT
- Title: Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
- Authors: Arijit Ghosh Chowdhury, Md Mofijul Islam, Vaibhav Kumar, Faysal Hossain Shezan, Vaibhav Kumar, Vinija Jain, Aman Chadha,
- Abstract summary: Large Language Models (LLMs) have become a cornerstone in the field of Natural Language Processing (NLP)
This paper presents a comprehensive survey of the various forms of attacks targeting LLMs.
We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation.
- Score: 18.624280305864804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have become a cornerstone in the field of Natural Language Processing (NLP), offering transformative capabilities in understanding and generating human-like text. However, with their rising prominence, the security and vulnerability aspects of these models have garnered significant attention. This paper presents a comprehensive survey of the various forms of attacks targeting LLMs, discussing the nature and mechanisms of these attacks, their potential impacts, and current defense strategies. We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation. The paper also explores the effectiveness of different attack methodologies, the resilience of LLMs against these attacks, and the implications for model integrity and user trust. By examining the latest research, we provide insights into the current landscape of LLM vulnerabilities and defense mechanisms. Our objective is to offer a nuanced understanding of LLM attacks, foster awareness within the AI community, and inspire robust solutions to mitigate these risks in future developments.
Related papers
- HarmLevelBench: Evaluating Harm-Level Compliance and the Impact of Quantization on Model Alignment [1.8843687952462742]
This paper aims to address gaps in the current literature on jailbreaking techniques and the evaluation of LLM vulnerabilities.
Our contributions include the creation of a novel dataset designed to assess the harmfulness of model outputs across multiple harm levels.
We provide a comprehensive benchmark of state-of-the-art jailbreaking attacks, specifically targeting the Vicuna 13B v1.5 model.
arXiv Detail & Related papers (2024-11-11T10:02:49Z) - Recent Advances in Attack and Defense Approaches of Large Language Models [27.271665614205034]
Large Language Models (LLMs) have revolutionized artificial intelligence and machine learning through their advanced text processing and generating capabilities.
Their widespread deployment has raised significant safety and reliability concerns.
This paper reviews current research on LLM vulnerabilities and threats, and evaluates the effectiveness of contemporary defense mechanisms.
arXiv Detail & Related papers (2024-09-05T06:31:37Z) - Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability [44.99833362998488]
Large Language Models (LLMs) have shown impressive performance across a wide range of tasks.
LLMs in particular are known to be vulnerable to adversarial attacks, where an imperceptible change to the input can mislead the output of the model.
We propose a method, based on Mechanistic Interpretability (MI) techniques, to guide this process.
arXiv Detail & Related papers (2024-07-29T09:55:34Z) - A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks.
The vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage.
In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks.
arXiv Detail & Related papers (2024-07-10T06:57:58Z) - MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z) - Defending Large Language Models Against Attacks With Residual Stream Activation Analysis [0.0]
Large Language Models (LLMs) are vulnerable to adversarial threats.
This paper presents an innovative defensive strategy, given white box access to an LLM.
We apply a novel methodology for analyzing distinctive activation patterns in the residual streams for attack prompt classification.
arXiv Detail & Related papers (2024-06-05T13:06:33Z) - Data Poisoning for In-context Learning [49.77204165250528]
In-context learning (ICL) has been recognized for its innovative ability to adapt to new tasks.
This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks.
We introduce ICLPoison, a specialized attacking framework conceived to exploit the learning mechanisms of ICL.
arXiv Detail & Related papers (2024-02-03T14:20:20Z) - A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models [0.0]
This article explores two attack categories: attacks on models themselves and attacks on model applications.
The former requires expertise, access to model data, and significant implementation time.
The latter is more accessible to attackers and has seen increased attention.
arXiv Detail & Related papers (2023-12-18T07:07:32Z) - SA-Attack: Improving Adversarial Transferability of Vision-Language
Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z) - Privacy in Large Language Models: Attacks, Defenses and Future Directions [84.73301039987128]
We analyze the current privacy attacks targeting large language models (LLMs) and categorize them according to the adversary's assumed capabilities.
We present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks.
arXiv Detail & Related papers (2023-10-16T13:23:54Z) - ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine
Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.