Related papers: Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models

URL: http://arxiv.org/abs/2310.15007v2
Date: Mon, 15 Jul 2024 19:12:43 GMT
Title: Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models
Authors: Matthieu Meeus, Shubham Jain, Marek Rei, Yves-Alexandre de Montjoye
Abstract summary: We propose a black-box method to predict document-level membership and instantiate it on OpenLLaMA-7B. We show our approach to outperform the sentence-level membership inference attacks used in the privacy literature for the document-level membership task.
Score: 17.993892458845124
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With large language models (LLMs) poised to become embedded in our daily lives, questions are starting to be raised about the data they learned from. These questions range from potential bias or misinformation LLMs could retain from their training data to questions of copyright and fair use of human-generated text. However, while these questions emerge, developers of the recent state-of-the-art LLMs become increasingly reluctant to disclose details on their training corpus. We here introduce the task of document-level membership inference for real-world LLMs, i.e. inferring whether the LLM has seen a given document during training or not. First, we propose a procedure for the development and evaluation of document-level membership inference for LLMs by leveraging commonly used data sources for training and the model release date. We then propose a practical, black-box method to predict document-level membership and instantiate it on OpenLLaMA-7B with both books and academic papers. We show our methodology to perform very well, reaching an AUC of 0.856 for books and 0.678 for papers. We then show our approach to outperform the sentence-level membership inference attacks used in the privacy literature for the document-level membership task. We further evaluate whether smaller models might be less sensitive to document-level inference and show OpenLLaMA-3B to be approximately as sensitive as OpenLLaMA-7B to our approach. Finally, we consider two mitigation strategies and find the AUC to slowly decrease when only partial documents are considered but to remain fairly high when the model precision is reduced. Taken together, our results show that accurate document-level membership can be inferred for LLMs, increasing the transparency of technology poised to change our lives.
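The abstract reports results as AUC (0.856 for books, 0.678 for papers). As a reading aid, here is a minimal sketch of how an AUC can be computed from per-document membership scores. It is not the authors' method: the function name `roc_auc` and the score values are hypothetical, and the sketch only assumes that some attack assigns each document a score where higher means "more likely seen during training".

```python
def roc_auc(member_scores, non_member_scores):
    """Return the ROC AUC: the probability that a randomly chosen member
    document scores higher than a randomly chosen non-member document.
    Ties count as half a win."""
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


# Hypothetical scores, invented purely for illustration (not from the paper).
members = [0.9, 0.8, 0.6, 0.4]
non_members = [0.7, 0.5, 0.3, 0.2]

auc = roc_auc(members, non_members)
print(auc)  # 0.8125
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of members from non-members, which is the scale on which the reported values should be read.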

Related papers

Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models [52.439289085318634]
We show how to identify training data known to proprietary large language models (LLMs) by using information-guided probes. Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes.
arXiv Detail & Related papers (2025-03-15T10:19:15Z)
Training Plug-n-Play Knowledge Modules with Deep Context Distillation [52.94830874557649]
In this paper, we propose a way of modularizing knowledge by training document-level Knowledge Modules (KMs). KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents. Our method outperforms standard next-token prediction and pre-instruction training techniques across two datasets.
arXiv Detail & Related papers (2025-03-11T01:07:57Z)
LitLLMs, LLMs for Literature Review: Are we there yet? [15.785989492351684]
This paper explores the zero-shot abilities of recent Large Language Models in assisting with the writing of literature reviews based on an abstract. For retrieval, we introduce a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper. In the generation phase, we propose a two-step approach that first outlines a plan for the review and then executes steps in the plan to generate the actual review.
arXiv Detail & Related papers (2024-12-15T01:12:26Z)
Towards Robust Evaluation of Unlearning in LLMs via Data Transformations [17.927224387698903]
Large Language Models (LLMs) have been shown to be a great success in a wide range of applications, from regular NLP-based use cases to AI agents. Research in the area of Machine Unlearning (MUL) has recently become active. The main idea is to force LLMs to forget (unlearn) certain information (e.g., PII) without suffering performance loss on regular tasks.
arXiv Detail & Related papers (2024-11-23T07:20:36Z)
Membership Inference Attack against Long-Context Large Language Models [8.788010048413188]
We argue that integrating all information into the long context makes it a repository of sensitive information. We propose six membership inference attack strategies tailored for LCLMs. We examine the underlying reasons why LCLMs are susceptible to revealing such membership information.
arXiv Detail & Related papers (2024-11-18T09:50:54Z)
Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference [39.29939437034823]
We propose a novel unlearning framework called Unlearning from Logit Difference (ULD). Our method efficiently achieves the intended forgetting while preserving the LLM's overall capabilities, reducing training time by more than threefold.
arXiv Detail & Related papers (2024-06-12T19:26:35Z)
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series [86.31735321970481]
We open-source MAP-Neo, a bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens. Our MAP-Neo is the first fully open-sourced bilingual LLM with performance comparable to existing state-of-the-art LLMs.
arXiv Detail & Related papers (2024-05-29T17:57:16Z)
ReMoDetect: Reward Models Recognize Aligned LLM's Generations [55.06804460642062]
Large language models (LLMs) generate human-preferable texts. In this paper, we identify the common characteristics shared by these models. We propose two training schemes to further improve the detection ability of the reward model.
arXiv Detail & Related papers (2024-05-27T17:38:33Z)
Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models [1.443696537295348]
Privacy leakage and copyright violation are still underexplored. Our unlearning algorithms are not only data-agnostic/model-agnostic but also proven to be robust in terms of utility preservation or privacy guarantee.
arXiv Detail & Related papers (2024-03-13T18:57:30Z)
Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG. InFO-RAG is low-cost and general across various tasks. It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z)
Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction [36.40833517478628]
Large language models require updates to remain up-to-date or adapt to new domains. One key is memorizing the latest information in a way that the memorized information is extractable with a query prompt. Despite minimizing document perplexity during fine-tuning, LLMs struggle to extract information through a prompt sentence.
arXiv Detail & Related papers (2024-02-16T06:29:16Z)
Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks. LLMs' ability for general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or factually incorrect information. This survey presents a comprehensive overview of these alignment technologies, including the following aspects.
arXiv Detail & Related papers (2023-07-24T17:44:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.