Related papers: Information Leakage from Embedding in Large Language Models

Information Leakage from Embedding in Large Language Models

URL: http://arxiv.org/abs/2405.11916v3
Date: Wed, 22 May 2024 04:04:17 GMT
Title: Information Leakage from Embedding in Large Language Models
Authors: Zhipeng Wan, Anda Cheng, Yinggui Wang, Lei Wang,
Abstract summary: This study aims to investigate the potential for privacy invasion through input reconstruction attacks. We first propose two base methods to reconstruct original texts from a model's hidden states. We then present Embed Parrot, a Transformer-based method, to reconstruct input from embeddings in deep layers.
Score: 5.475800773759642
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The widespread adoption of large language models (LLMs) has raised concerns regarding data privacy. This study aims to investigate the potential for privacy invasion through input reconstruction attacks, in which a malicious model provider could potentially recover user inputs from embeddings. We first propose two base methods to reconstruct original texts from a model's hidden states. We find that these two methods are effective in attacking the embeddings from shallow layers, but their effectiveness decreases when attacking embeddings from deeper layers. To address this issue, we then present Embed Parrot, a Transformer-based method, to reconstruct input from embeddings in deep layers. Our analysis reveals that Embed Parrot effectively reconstructs original inputs from the hidden states of ChatGLM-6B and Llama2-7B, showcasing stable performance across various token lengths and data distributions. To mitigate the risk of privacy breaches, we introduce a defense mechanism to deter exploitation of the embedding reconstruction process. Our findings emphasize the importance of safeguarding user privacy in distributed learning systems and contribute valuable insights to enhance the security protocols within such environments.

Related papers

DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective [59.66984417026933]
We introduce a novel taxonomy, classifying existing methods based on their reliance on internal features (IF) (inherent to the data) versus external features (EF) (artificially introduced for auditing)<n>We formulate two primary attack types: evasion attacks, designed to conceal the use of a dataset, and forgery attacks, intending to falsely implicate an unused dataset.<n>Building on the understanding of existing methods and attack objectives, we further propose systematic attack strategies: decoupling, removal, and detection for evasion; adversarial example-based methods for forgery.<n>Our benchmark, DATABench, comprises 17 evasion attacks, 5 forgery attacks, and 9
arXiv Detail & Related papers (2025-07-08T03:07:15Z)
Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models [9.551704901720726]
The privacy violation attack (PVA) was revealed by Staab et al., introduces serious personal privacy issues.<n>Existing defense methods mainly leverage large language models (LLMs) to anonymize the input query.<n>We propose a novel defense paradigm based on retrieval-confused generation (RCG) of LLMs, which can efficiently and covertly defend the PVA.
arXiv Detail & Related papers (2025-06-24T07:28:29Z)
PrivacyScalpel: Enhancing LLM Privacy via Interpretable Feature Intervention with Sparse Autoencoders [8.483679748399037]
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing but pose privacy risks by memorizing and leaking Personally Identifiable Information (PII) Existing mitigation strategies, such as differential privacy and neuron-level interventions, often degrade model utility or fail to effectively prevent leakage. We introduce PrivacyScalpel, a novel privacy-preserving framework that leverages interpretability techniques to identify and mitigate PII leakage while maintaining performance.
arXiv Detail & Related papers (2025-03-14T09:31:01Z)
PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models [51.458089902581456]
We introduce PersGuard, a novel backdoor-based approach that prevents malicious personalization of specific images. Our method significantly outperforms existing techniques, offering a more robust solution for privacy and copyright protection.
arXiv Detail & Related papers (2025-02-22T09:47:55Z)
Hidden Data Privacy Breaches in Federated Learning [24.47236055167954]
Federated Learning (FL) emerged as a paradigm for conducting machine learning across broad and decentralized datasets. Recent studies show that attackers can steal private data through model manipulation or gradient analysis. We propose a novel data-reconstruction attack leveraging malicious code injection, supported by two key techniques.
arXiv Detail & Related papers (2024-11-27T12:04:37Z)
Subword Embedding from Bytes Gains Privacy without Sacrificing Accuracy and Complexity [5.7601856226895665]
We propose Subword Embedding from Bytes (SEB) and encode subwords to byte sequences using deep neural networks. Our solution outperforms conventional approaches by preserving privacy without sacrificing efficiency or accuracy. We verify SEB obtains comparable and even better results over standard subword embedding methods in machine translation, sentiment analysis, and language modeling.
arXiv Detail & Related papers (2024-10-21T18:25:24Z)
Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal. Unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths. Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
Defending against Data Poisoning Attacks in Federated Learning via User Elimination [0.0]
This paper introduces a novel framework focused on the strategic elimination of adversarial users within a federated model. We detect anomalies in the aggregation phase of the Federated Algorithm, by integrating metadata gathered by the local training instances with Differential Privacy techniques. Our experiments demonstrate the efficacy of our methods, significantly mitigating the risk of data poisoning while maintaining user privacy and model performance.
arXiv Detail & Related papers (2024-04-19T10:36:00Z)
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack. When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
Visual Privacy Auditing with Diffusion Models [52.866433097406656]
We propose a reconstruction attack based on diffusion models (DMs) that assumes adversary access to real-world image priors. We show that (1) real-world data priors significantly influence reconstruction success, (2) current reconstruction bounds do not model the risk posed by data priors well, and (3) DMs can serve as effective auditing tools for visualizing privacy leakage.
arXiv Detail & Related papers (2024-03-12T12:18:55Z)
Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems [28.175920880194223]
We investigate factors related to embedding models that may impact text recoverability via Vec2Text. We propose a simple embedding transformation fix that guarantees equal ranking effectiveness while mitigating the recoverability risk.
arXiv Detail & Related papers (2024-02-20T07:49:30Z)
Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks. We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively. Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z)
Defending against Reconstruction Attacks with R\'enyi Differential Privacy [72.1188520352079]
Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model. Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget. We show that, for a same mechanism, we can derive privacy guarantees for reconstruction attacks that are better than the traditional ones from the literature.
arXiv Detail & Related papers (2022-02-15T18:09:30Z)
Semantics-Preserved Distortion for Personal Privacy Protection in Information Management [65.08939490413037]
This paper suggests a linguistically-grounded approach to distort texts while maintaining semantic integrity. We present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach. We also explore privacy protection in a specific medical information management scenario, showing our method effectively limits sensitive data memorization.
arXiv Detail & Related papers (2022-01-04T04:01:05Z)
Federated Deep Learning with Bayesian Privacy [28.99404058773532]
Federated learning (FL) aims to protect data privacy by cooperatively learning a model without sharing private data among users. Homomorphic encryption (HE) based methods provide secure privacy protections but suffer from extremely high computational and communication overheads. Deep learning with Differential Privacy (DP) was implemented as a practical learning algorithm at a manageable cost in complexity.
arXiv Detail & Related papers (2021-09-27T12:48:40Z)
Rethinking Privacy Preserving Deep Learning: How to Evaluate and Thwart Privacy Attacks [31.34410250008759]
This paper measures the trade-off between model accuracy and privacy losses incurred by reconstruction, tracing and membership attacks. Experiments show that model accuracies are improved on average by 5-20% compared with baseline mechanisms.
arXiv Detail & Related papers (2020-06-20T15:48:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.