Whispers in the Machine: Confidentiality in LLM-integrated Systems
- URL: http://arxiv.org/abs/2402.06922v1
- Date: Sat, 10 Feb 2024 11:07:24 GMT
- Title: Whispers in the Machine: Confidentiality in LLM-integrated Systems
- Authors: Jonathan Evertz, Merlin Chlosta, Lea Sch\"onherr, Thorsten Eisenhofer
- Abstract summary: Large Language Models (LLMs) are increasingly integrated with external tools.
Malicious tools can exploit vulnerabilities in the LLM itself to manipulate the model and compromise the data of other services.
We provide a systematic way of evaluating confidentiality in LLM-integrated systems.
- Score: 5.500627268249088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are increasingly integrated with external tools.
While these integrations can significantly improve the functionality of LLMs,
they also create a new attack surface where confidential data may be disclosed
between different components. Specifically, malicious tools can exploit
vulnerabilities in the LLM itself to manipulate the model and compromise the
data of other services, raising the question of how private data can be
protected in the context of LLM integrations.
In this work, we provide a systematic way of evaluating confidentiality in
LLM-integrated systems. For this, we formalize a "secret key" game that can
capture the ability of a model to conceal private information. This enables us
to compare the vulnerability of a model against confidentiality attacks and
also the effectiveness of different defense strategies. In this framework, we
evaluate eight previously published attacks and four defenses. We find that
current defenses lack generalization across attack strategies. Building on this
analysis, we propose a method for robustness fine-tuning, inspired by
adversarial training. This approach is effective in lowering the success rate
of attackers and in improving the system's resilience against unknown attacks.
Related papers
- LLM4MEA: Data-free Model Extraction Attacks on Sequential Recommenders via Large Language Models [50.794651919028965]
Recent studies have demonstrated the vulnerability of sequential recommender systems to Model Extraction Attacks (MEAs)<n>Black-box attacks in prior MEAs are ineffective at exposing recommender system vulnerabilities due to random sampling in data selection.<n>We propose LLM4MEA, a novel model extraction method that leverages Large Language Models (LLMs) as human-like rankers to generate data.
arXiv Detail & Related papers (2025-07-22T19:20:23Z) - Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning [54.26807397329468]
This work explores a previously overlooked vulnerability in distributed deep learning systems.<n>An adversary who intercepts the intermediate features transmitted between them can still pose a serious threat.<n>We propose an exploitation strategy specifically designed for distributed settings.
arXiv Detail & Related papers (2025-07-09T20:09:00Z) - Evaluating Query Efficiency and Accuracy of Transfer Learning-based Model Extraction Attack in Federated Learning [4.275908952997288]
Federated Learning (FL) is a collaborative learning framework designed to protect client data.<n>Despite FL's privacy-preserving goals, its distributed nature makes it particularly susceptible to model extraction attacks.<n>This paper examines the vulnerability of FL-based victim models to two types of model extraction attacks.
arXiv Detail & Related papers (2025-05-25T22:40:10Z) - Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms [0.9091225937132784]
We reveal a critical control-plane attack surface to traditional data-plane vulnerabilities.
We introduce Constrained Decoding Attack, a novel jailbreak class that weaponizes structured output constraints to bypass safety mechanisms.
Our findings identify a critical security blind spot in current LLM architectures and urge a paradigm shift in LLM safety to address control-plane vulnerabilities.
arXiv Detail & Related papers (2025-03-31T15:08:06Z) - EM-MIAs: Enhancing Membership Inference Attacks in Large Language Models through Ensemble Modeling [2.494935495983421]
This paper proposes a novel ensemble attack method that integrates several existing MIAs techniques into an XGBoost-based model to enhance overall attack performance (EM-MIAs)
Experimental results demonstrate that the ensemble model significantly improves both AUC-ROC and accuracy compared to individual attack methods across various large language models and datasets.
arXiv Detail & Related papers (2024-12-23T03:47:54Z) - "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs)
In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Assessing Privacy Risks in Language Models: A Case Study on
Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z) - A Blackbox Model Is All You Need to Breach Privacy: Smart Grid
Forecasting Models as a Use Case [0.7714988183435832]
We show that a black box access to an LSTM model can reveal a significant amount of information equivalent to having access to the data itself.
This highlights the importance of protecting forecasting models at the same level as the data.
arXiv Detail & Related papers (2023-09-04T11:07:37Z) - On the Evaluation of User Privacy in Deep Neural Networks using Timing
Side Channel [14.350301915592027]
We identify and report a novel data-dependent timing side-channel leakage (termed Class Leakage) in Deep Learning (DL) implementations.
We demonstrate a practical inference-time attack where an adversary with user privilege and hard-label blackbox access to an ML can exploit Class Leakage.
We develop an easy-to-implement countermeasure by making a constant-time branching operation that alleviates the Class Leakage.
arXiv Detail & Related papers (2022-08-01T19:38:16Z) - ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine
Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z) - Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks,
and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z) - Risk Management Framework for Machine Learning Security [7.678455181587705]
Adversarial attacks for machine learning models have become a highly studied topic both in academia and industry.
In this paper, we outline a novel framework to guide the risk management process for organizations reliant on machine learning models.
arXiv Detail & Related papers (2020-12-09T06:21:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.