Breaking to Build: A Threat Model of Prompt-Based Attacks for Securing LLMs
- URL: http://arxiv.org/abs/2509.04615v1
- Date: Thu, 04 Sep 2025 18:59:07 GMT
- Title: Breaking to Build: A Threat Model of Prompt-Based Attacks for Securing LLMs
- Authors: Brennen Hill, Surendra Parla, Venkata Abhijeeth Balabhadruni, Atharv Prajod Padmalayam, Sujay Chandra Shekara Sharma,
- Abstract summary: Large Language Models (LLMs) have introduced critical security challenges.<n> adversarial actors can manipulate input prompts to cause significant harm and circumvent safety alignments.<n>This survey aims to inform the research community's efforts in building the next generation of secure LLMs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of Large Language Models (LLMs) has introduced critical security challenges, where adversarial actors can manipulate input prompts to cause significant harm and circumvent safety alignments. These prompt-based attacks exploit vulnerabilities in a model's design, training, and contextual understanding, leading to intellectual property theft, misinformation generation, and erosion of user trust. A systematic understanding of these attack vectors is the foundational step toward developing robust countermeasures. This paper presents a comprehensive literature survey of prompt-based attack methodologies, categorizing them to provide a clear threat model. By detailing the mechanisms and impacts of these exploits, this survey aims to inform the research community's efforts in building the next generation of secure LLMs that are inherently resistant to unauthorized distillation, fine-tuning, and editing.
Related papers
- Exploiting Web Search Tools of AI Agents for Data Exfiltration [0.46664938579243564]
Large language models (LLMs) are now routinely used to execute complex tasks, from natural language processing to dynamic like web searches.<n>The usage of tool-calling and Retrieval Augmented Generation (RAG) allows LLMs to process and retrieve sensitive corporate data, amplifying both their functionality and vulnerability to abuse.<n>We analyze how susceptible current LLMs are to indirect prompt injection attacks, which parameters, including model size and manufacturer, shape their vulnerability, and which attack methods remain most effective.
arXiv Detail & Related papers (2025-10-10T07:39:01Z) - A Survey on Model Extraction Attacks and Defenses for Large Language Models [55.60375624503877]
Model extraction attacks pose significant security threats to deployed language models.<n>This survey provides a comprehensive taxonomy of extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks.<n>We examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios.
arXiv Detail & Related papers (2025-06-26T22:02:01Z) - Security Concerns for Large Language Models: A Survey [4.1824815480811806]
Large Language Models (LLMs) have caused a revolution in natural language processing, but their capabilities also introduce new security vulnerabilities.<n>This survey provides a comprehensive overview of these emerging concerns, categorizing threats into several key areas.<n>We conclude by emphasizing the importance of advancing robust, multi-layered security strategies to ensure LLMs are safe and beneficial.
arXiv Detail & Related papers (2025-05-24T22:22:43Z) - LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures [49.1574468325115]
This survey seeks to define and categorize the various attacks targeting large language models (LLMs)<n>A thorough analysis of these attacks is presented, alongside an exploration of defense mechanisms designed to mitigate such threats.
arXiv Detail & Related papers (2025-05-02T10:35:26Z) - Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks [88.84977282952602]
A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs)<n>In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents.<n>We conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities.
arXiv Detail & Related papers (2025-02-12T17:19:36Z) - Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation [4.241100280846233]
AI agents, powered by large language models (LLMs), have transformed human-computer interactions by enabling seamless, natural, and context-aware communication.<n>This paper investigates a critical vulnerability: adversarial attacks targeting the LLM core within AI agents.
arXiv Detail & Related papers (2024-12-05T18:38:30Z) - Global Challenge for Safe and Secure LLMs Track 1 [57.08717321907755]
The Global Challenge for Safe and Secure Large Language Models (LLMs) is a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO)
This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks.
arXiv Detail & Related papers (2024-11-21T08:20:31Z) - Recent Advances in Attack and Defense Approaches of Large Language Models [27.271665614205034]
Large Language Models (LLMs) have revolutionized artificial intelligence and machine learning through their advanced text processing and generating capabilities.<n>Their widespread deployment has raised significant safety and reliability concerns.<n>This paper reviews current research on LLM vulnerabilities and threats, and evaluates the effectiveness of contemporary defense mechanisms.
arXiv Detail & Related papers (2024-09-05T06:31:37Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models [18.624280305864804]
Large Language Models (LLMs) have become a cornerstone in the field of Natural Language Processing (NLP)
This paper presents a comprehensive survey of the various forms of attacks targeting LLMs.
We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation.
arXiv Detail & Related papers (2024-03-03T04:46:21Z) - Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and
Vulnerabilities [14.684194175806203]
Large language models (LLMs) can be misused for fraud, impersonation, and the generation of malware.
We present a taxonomy describing the relationship between threats caused by the generative capabilities of LLMs, prevention measures intended to address such threats, and vulnerabilities arising from imperfect prevention measures.
arXiv Detail & Related papers (2023-08-24T14:45:50Z) - Evaluating the Instruction-Following Robustness of Large Language Models
to Prompt Injection [70.28425745910711]
Large Language Models (LLMs) have demonstrated exceptional proficiency in instruction-following.
This capability brings with it the risk of prompt injection attacks.
We evaluate the robustness of instruction-following LLMs against such attacks.
arXiv Detail & Related papers (2023-08-17T06:21:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.