Related papers: Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications

Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications

URL: http://arxiv.org/abs/2311.16153v2
Date: Wed, 29 Nov 2023 03:43:03 GMT
Title: Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
Authors: Fengqing Jiang, Zhangchen Xu, Luyao Niu, Boxin Wang, Jinyuan Jia, Bo Li, Radha Poovendran
Abstract summary: Large language models (LLMs) are increasingly deployed as the service backend for LLM-integrated applications. In this work, we consider a setup where the user and LLM interact via an LLM-integrated application in the middle. We identify potential vulnerabilities that can originate from the malicious application developer or from an outsider threat. We develop a lightweight, threat-agnostic defense that mitigates both insider and outsider threats.
Score: 37.316238236750415
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are increasingly deployed as the service backend for LLM-integrated applications such as code completion and AI-powered search. LLM-integrated applications serve as middleware to refine users' queries with domain-specific knowledge to better inform LLMs and enhance the responses. Despite numerous opportunities and benefits, LLM-integrated applications also introduce new attack surfaces. Understanding, minimizing, and eliminating these emerging attack surfaces is a new area of research. In this work, we consider a setup where the user and LLM interact via an LLM-integrated application in the middle. We focus on the communication rounds that begin with user's queries and end with LLM-integrated application returning responses to the queries, powered by LLMs at the service backend. For this query-response protocol, we identify potential vulnerabilities that can originate from the malicious application developer or from an outsider threat initiator that is able to control the database access, manipulate and poison data that are high-risk for the user. Successful exploits of the identified vulnerabilities result in the users receiving responses tailored to the intent of a threat initiator. We assess such threats against LLM-integrated applications empowered by OpenAI GPT-3.5 and GPT-4. Our empirical results show that the threats can effectively bypass the restrictions and moderation policies of OpenAI, resulting in users receiving responses that contain bias, toxic content, privacy risk, and disinformation. To mitigate those threats, we identify and define four key properties, namely integrity, source identification, attack detectability, and utility preservation, that need to be satisfied by a safe LLM-integrated application. Based on these properties, we develop a lightweight, threat-agnostic defense that mitigates both insider and outsider threats.

Related papers

Security Steerability is All You Need [3.475823664889679]
We show that while LLMs cannot protect against ad-hoc application specific threats, they can provide the framework for applications to protect themselves against such threats. Our first contribution is defining Security Steerability - a novel security measure for LLMs, assessing the model's capability to adhere to strict guardrails that are defined in the system prompt. Our second contribution is a methodology to measure the security steerability of LLMs, utilizing two newly-developed datasets.
arXiv Detail & Related papers (2025-04-28T06:40:01Z)
Do LLMs Consider Security? An Empirical Study on Responses to Programming Questions [10.69738882390809]
ChatGPT can volunteer context-specific information to developers, promoting safe coding practices. We evaluate the degree of security awareness exhibited by three prominent LLMs: Claude 3, GPT-4, and Llama 3. Our findings show that all three models struggle to accurately detect and warn users about vulnerabilities, achieving a detection rate of only 12.6% to 40% across our datasets.
arXiv Detail & Related papers (2025-02-20T02:20:06Z)
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks [88.84977282952602]
A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs) In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents. We conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities.
arXiv Detail & Related papers (2025-02-12T17:19:36Z)
Global Challenge for Safe and Secure LLMs Track 1 [57.08717321907755]
The Global Challenge for Safe and Secure Large Language Models (LLMs) is a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks.
arXiv Detail & Related papers (2024-11-21T08:20:31Z)
Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications [0.0]
Large Language Models (LLMs) have revolutionized various applications by providing advanced natural language processing capabilities. This paper explores the threat modeling and risk analysis specifically tailored for LLM-powered applications.
arXiv Detail & Related papers (2024-06-16T16:43:58Z)
Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications [10.06789804722156]
We reveal a new threat to LLM-powered applications, termed retrieval poisoning, where attackers can guide the application to yield malicious responses during the RAG process. Our preliminary experiments indicate that attackers can mislead LLMs with an 88.33% success rate, and achieve a 66.67% success rate in the real-world application.
arXiv Detail & Related papers (2024-04-26T07:11:18Z)
Prompt Leakage effect and defense strategies for multi-turn LLM interactions [95.33778028192593]
Leakage of system prompts may compromise intellectual property and act as adversarial reconnaissance for an attacker. We design a unique threat model which leverages the LLM sycophancy effect and elevates the average attack success rate (ASR) from 17.7% to 86.2% in a multi-turn setting. We measure the mitigation effect of 7 black-box defense strategies, along with finetuning an open-source model to defend against leakage attempts.
arXiv Detail & Related papers (2024-04-24T23:39:58Z)
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models [6.931433424951554]
Large language models (LLMs) introduce new security risks, but there are few comprehensive evaluation suites to measure and reduce these risks. We present BenchmarkName, a novel benchmark to quantify LLM security risks and capabilities. We evaluate multiple state-of-the-art (SOTA) LLMs, including GPT-4, Mistral, Meta Llama 3 70B-Instruct, and Code Llama.
arXiv Detail & Related papers (2024-04-19T20:11:12Z)
Mapping LLM Security Landscapes: A Comprehensive Stakeholder Risk Assessment Proposal [0.0]
We propose a risk assessment process using tools like the risk rating methodology which is used for traditional systems. We conduct scenario analysis to identify potential threat agents and map the dependent system components against vulnerability factors. We also map threats against three key stakeholder groups.
arXiv Detail & Related papers (2024-03-20T05:17:22Z)
The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative [55.08395463562242]
Multimodal Large Language Models (MLLMs) are constantly defining the new boundary of Artificial General Intelligence (AGI) Our paper explores a novel vulnerability in MLLM societies - the indirect propagation of malicious content.
arXiv Detail & Related papers (2024-02-20T23:08:21Z)
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models [79.0183835295533]
We introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to assess the risk of such vulnerabilities. Our analysis identifies two key factors contributing to their success: LLMs' inability to distinguish between informational context and actionable instructions, and their lack of awareness in avoiding the execution of instructions within external content. We propose two novel defense mechanisms-boundary awareness and explicit reminder-to address these vulnerabilities in both black-box and white-box settings.
arXiv Detail & Related papers (2023-12-21T01:08:39Z)
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models. We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z)
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications. We show how attackers can override original instructions and employed controls using Prompt Injection attacks. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.