Information Leakage of Sentence Embeddings via Generative Embedding Inversion Attacks
- URL: http://arxiv.org/abs/2504.16609v1
- Date: Wed, 23 Apr 2025 10:50:23 GMT
- Title: Information Leakage of Sentence Embeddings via Generative Embedding Inversion Attacks
- Authors: Antonios Tragoudaras, Theofanis Aslanidis, Emmanouil Georgios Lionis, Marina Orozco González, Panagiotis Eustratiadis
- Abstract summary: In this study, we reproduce GEIA's findings across various neural sentence embedding models. We propose a simple yet effective method without any modification to the attacker's architecture proposed in GEIA. Our findings indicate that, following our approach, an adversarial party can recover meaningful sensitive information related to the pre-training knowledge of the popular models used for creating sentence embeddings.
- Score: 1.6427658855248815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text data are often encoded as dense vectors, known as embeddings, which capture semantic, syntactic, contextual, and domain-specific information. These embeddings, widely adopted in various applications, inherently contain rich information that may be susceptible to leakage under certain attacks. The GEIA framework highlights vulnerabilities in sentence embeddings, demonstrating that they can reveal the original sentences they represent. In this study, we reproduce GEIA's findings across various neural sentence embedding models. Additionally, we contribute new analysis to examine whether these models leak sensitive information from their training datasets. We propose a simple yet effective method that requires no modification to the attacker architecture proposed in GEIA. The key idea is to compare the log-likelihoods, computed in the attacker's embedding space, of masked and original variants of data that the sentence embedding models were pre-trained on. Our findings indicate that, following our approach, an adversarial party can recover meaningful sensitive information related to the pre-training knowledge of popular sentence embedding models, seriously undermining their security. Our code is available at: https://github.com/taslanidis/GEIA
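The leakage probe described in the abstract boils down to scoring an original sentence and a masked variant of it with the attacker's embedding-conditioned decoder and comparing the resulting log-likelihoods. Below is a minimal, hedged sketch of that comparison, not the authors' implementation (that lives in the linked repository): the victim model `all-MiniLM-L6-v2`, the GPT-2 attacker, the randomly initialised projection layer, and the masking pattern are all illustrative assumptions, and a real GEIA attacker would first be trained on (embedding, sentence) pairs before such scores become meaningful.

```python
# Hedged sketch of the masked-vs-original log-likelihood comparison described above.
# Model names, the projection layer, and the masking pattern are illustrative
# assumptions, not the authors' setup (see https://github.com/taslanidis/GEIA).
import torch
from torch import nn
from sentence_transformers import SentenceTransformer
from transformers import GPT2LMHeadModel, GPT2Tokenizer

victim = SentenceTransformer("all-MiniLM-L6-v2")    # assumed victim embedding model
tok = GPT2Tokenizer.from_pretrained("gpt2")
attacker = GPT2LMHeadModel.from_pretrained("gpt2")  # GEIA-style generative attacker
attacker.eval()

# Project the sentence embedding to the decoder's hidden size so it can be prepended
# as a soft prefix. In GEIA the attacker is trained on (embedding, sentence) pairs;
# here the projection is randomly initialised, purely for illustration.
proj = nn.Linear(victim.get_sentence_embedding_dimension(), attacker.config.n_embd)

def conditional_log_likelihood(sentence: str, text: str) -> float:
    """Mean per-token log-likelihood of `text` under the attacker decoder,
    conditioned on the victim embedding of `sentence` via a prefix embedding."""
    with torch.no_grad():
        emb = victim.encode(sentence, convert_to_tensor=True).cpu().unsqueeze(0)  # (1, d)
        prefix = proj(emb).unsqueeze(1)                                            # (1, 1, h)
        ids = tok(text, return_tensors="pt").input_ids                             # (1, T)
        tok_emb = attacker.transformer.wte(ids)                                    # (1, T, h)
        logits = attacker(inputs_embeds=torch.cat([prefix, tok_emb], dim=1)).logits
        # Position 0 (the prefix) predicts token 0, position t predicts token t.
        log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
        token_lls = log_probs.gather(-1, ids.unsqueeze(-1)).squeeze(-1)
        return token_lls.mean().item()  # length-normalised for comparability

original = "Alice Smith lives at 42 Baker Street."     # hypothetical pre-training sentence
masked = "Alice Smith lives at [MASK] [MASK] Street."  # illustrative masking of a span

# With a trained attacker, a large gap suggests the embedding pipeline still encodes
# the masked-out (potentially sensitive) pre-training content.
gap = (conditional_log_likelihood(original, original)
       - conditional_log_likelihood(original, masked))
print(f"log-likelihood gap (original - masked): {gap:.3f}")
```

The same embedding-conditioned decoder is, once trained, what GEIA uses generatively to reconstruct the original sentence from its embedding; the log-likelihood gap simply reuses that decoder as a scoring function over candidate pre-training data.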
Related papers
- ALGEN: Few-shot Inversion Attacks on Textual Embeddings using Alignment and Generation [9.220337458064765]
We present a Few-shot Textual Embedding Inversion Attack using ALignment and GENeration (ALGEN).
We find that ALGEN attacks can be effectively transferred across domains and languages, revealing key information.
We establish a new textual embedding inversion paradigm with broader applications for embedding alignment in NLP.
arXiv Detail & Related papers (2025-02-16T23:11:13Z) - REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space [35.61862064581971]
Language models (LMs) risk inadvertently memorizing and divulging sensitive or personally identifiable information (PII) seen in training data, causing privacy concerns.
We propose REVS, a novel non-gradient-based method for unlearning sensitive information from LMs.
arXiv Detail & Related papers (2024-06-13T17:02:32Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Adaptive Domain Inference Attack with Concept Hierarchy [4.772368796656325]
Most known model-targeted attacks assume attackers have learned the application domain or training data distribution. Can removing the domain information from model APIs protect models from these attacks? We show that the proposed adaptive domain inference attack (ADI) can still successfully estimate relevant subsets of training data.
arXiv Detail & Related papers (2023-12-22T22:04:13Z) - Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks [73.53327403684676]
We propose an attack-and-defense framework for studying the task of deleting sensitive information directly from model weights.
We study direct edits to model weights because this approach should guarantee that particular deleted information is never extracted by future prompt attacks.
We show that even state-of-the-art model editing methods such as ROME struggle to truly delete factual information from models like GPT-J, as our whitebox and blackbox attacks can recover "deleted" information from an edited model 38% of the time.
arXiv Detail & Related papers (2023-09-29T17:12:43Z) - Substance or Style: What Does Your Image Embedding Know? [55.676463077772866]
Image foundation models have primarily been evaluated for semantic content.
We measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations.
We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE).
arXiv Detail & Related papers (2023-07-10T22:40:10Z) - Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data may instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models [27.100909068228813]
Recent studies have revealed a security threat to natural language processing (NLP) models, called the Backdoor Attack.
In this paper, we find that it is possible to hack the model in a data-free way by modifying one single word embedding vector.
Experimental results on sentiment analysis and sentence-pair classification tasks show that our method is more efficient and stealthier.
arXiv Detail & Related papers (2021-03-29T12:19:45Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - Information Leakage in Embedding Models [19.497371893593918]
We demonstrate that embeddings, in addition to encoding generic semantics, often also leak sensitive information about the input data.
We develop three classes of attacks to systematically study information that might be leaked by embeddings.
arXiv Detail & Related papers (2020-03-31T18:33:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.