Related papers: Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding

Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding

URL: http://arxiv.org/abs/2409.03363v2
Date: Wed, 15 Jan 2025 01:51:55 GMT
Title: Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
Authors: Cheng Wang, Yiwei Wang, Bryan Hooi, Yujun Cai, Nanyun Peng, Kai-Wei Chang,
Abstract summary: Existing methods typically analyze target text in isolation or solely with non-member contexts.<n>We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
Score: 118.75567341513897
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The training data in large language models is key to their success, but it also presents privacy and security risks, as it may contain sensitive information. Detecting pre-training data is crucial for mitigating these concerns. Existing methods typically analyze target text in isolation or solely with non-member contexts, overlooking potential insights from simultaneously considering both member and non-member contexts. While previous work suggested that member contexts provide little information due to the minor distributional shift they induce, our analysis reveals that these subtle shifts can be effectively leveraged when contrasted with non-member contexts. In this paper, we propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts through contrastive decoding, amplifying subtle differences to enhance membership inference. Extensive empirical evaluations demonstrate that Con-ReCall achieves state-of-the-art performance on the WikiMIA benchmark and is robust against various text manipulation techniques.

Related papers

Auditing Language Model Unlearning via Information Decomposition [68.48660428111593]
We introduce an interpretable, information-theoretic framework for auditing unlearning using Partial Information Decomposition (PID)<n>By comparing model representations before and after unlearning, we decompose the mutual information with the forgotten data into distinct components, formalizing the notions of unlearned and residual knowledge.<n>Our work introduces a principled, representation-level audit for unlearning, offering theoretical insight and actionable tools for safer deployment of language models.
arXiv Detail & Related papers (2026-01-21T15:51:19Z)
ContextLeak: Auditing Leakage in Private In-Context Learning Methods [24.89856411893133]
We introduce ContextLeak, the first framework to empirically measure the worst-case information leakage in ICL.<n>We find that ContextLeak tightly correlates with the theoretical privacy budget and reliably detects leakage.<n>Our results further reveal that existing methods often strike poor privacy-utility trade-offs, either leaking sensitive information or severely degrading performance.
arXiv Detail & Related papers (2025-12-18T00:53:19Z)
Understanding Textual Capability Degradation in Speech LLMs via Parameter Importance Analysis [54.53152524778821]
integration of speech into Large Language Models (LLMs) has substantially expanded their capabilities, but often at the cost of weakening their core textual competence.<n>We propose an analytical framework based on parameter importance estimation, which reveals that fine-tuning for speech introduces a textual importance distribution shift.<n>We investigate two mitigation strategies: layer-wise learning rate scheduling and Low-Rank Adaptation (LoRA)<n> Experimental results show that both approaches better maintain textual competence than full fine-tuning, while also improving downstream spoken question answering performance.
arXiv Detail & Related papers (2025-09-28T09:04:40Z)
Hierarchical Interaction Summarization and Contrastive Prompting for Explainable Recommendations [9.082885521130617]
We propose a novel approach combining profile generation via hierarchical interaction summarization (PGHIS) with contrastive prompting for explanation generation (CPEG)<n>Our approach outperforms existing state-of-the-art methods, achieving a great improvement on metrics about explainability (e.g., 5% on GPTScore) and text quality.
arXiv Detail & Related papers (2025-07-08T14:45:47Z)
Semantic Consistency Regularization with Large Language Models for Semi-supervised Sentiment Analysis [20.503153899462323]
We propose a framework for semi-supervised sentiment analysis. We introduce two prompting strategies to semantically enhance unlabeled text. Experiments show our method achieves remarkable performance over prior semi-supervised methods.
arXiv Detail & Related papers (2025-01-29T12:03:11Z)
On the loss of context-awareness in general instruction fine-tuning [101.03941308894191]
Post-training methods such as supervised fine-tuning (SFT) on instruction-response pairs can harm existing capabilities learned during pretraining. We propose two methods to mitigate the loss of context awareness in instruct models: post-hoc attention steering on user prompts and conditional instruction fine-tuning with a context-dependency indicator.
arXiv Detail & Related papers (2024-11-05T00:16:01Z)
Annotator in the Loop: A Case Study of In-Depth Rater Engagement to Create a Bridging Benchmark Dataset [1.825224193230824]
We describe a novel, collaborative, and iterative annotator-in-the-loop methodology for annotation. Our findings indicate that collaborative engagement with annotators can enhance annotation methods.
arXiv Detail & Related papers (2024-08-01T19:11:08Z)
ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods [56.073335779595475]
We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA) ReCaLL examines the relative change in conditional log-likelihoods when prefixing target data points with non-member context. We conduct comprehensive experiments and show that ReCaLL achieves state-of-the-art performance on the WikiMIA dataset.
arXiv Detail & Related papers (2024-06-23T00:23:13Z)
READ: Improving Relation Extraction from an ADversarial Perspective [33.44949503459933]
We propose an adversarial training method specifically designed for relation extraction (RE) Our approach introduces both sequence- and token-level perturbations to the sample and uses a separate perturbation vocabulary to improve the search for entity and context perturbations.
arXiv Detail & Related papers (2024-04-02T16:42:44Z)
C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations. Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
Prompt-based Logical Semantics Enhancement for Implicit Discourse Relation Recognition [4.7938839332508945]
We propose a Prompt-based Logical Semantics Enhancement (PLSE) method for Implicit Discourse Relation Recognition (IDRR) Our method seamlessly injects knowledge relevant to discourse relation into pre-trained language models through prompt-based connective prediction. Experimental results on PDTB 2.0 and CoNLL16 datasets demonstrate that our method achieves outstanding and consistent performance against the current state-of-the-art models.
arXiv Detail & Related papers (2023-11-01T08:38:08Z)
Context-faithful Prompting for Large Language Models [51.194410884263135]
Large language models (LLMs) encode parametric knowledge about world facts. Their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks. We assess and enhance LLMs' contextual faithfulness in two aspects: knowledge conflict and prediction with abstention.
arXiv Detail & Related papers (2023-03-20T17:54:58Z)
Semantic Interactive Learning for Text Classification: A Constructive Approach for Contextual Interactions [0.0]
We propose a novel interaction framework called Semantic Interactive Learning for the text domain. We frame the problem of incorporating constructive and contextual feedback into the learner as a task to find an architecture that enables more semantic alignment between humans and machines. We introduce a technique called SemanticPush that is effective for translating conceptual corrections of humans to non-extrapolating training examples.
arXiv Detail & Related papers (2022-09-07T08:13:45Z)
Semantics-Preserved Distortion for Personal Privacy Protection in Information Management [65.08939490413037]
This paper suggests a linguistically-grounded approach to distort texts while maintaining semantic integrity. We present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach. We also explore privacy protection in a specific medical information management scenario, showing our method effectively limits sensitive data memorization.
arXiv Detail & Related papers (2022-01-04T04:01:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.