The Hidden Cost of Modeling P(X): Vulnerability to Membership Inference Attacks in Generative Text Classifiers
- URL: http://arxiv.org/abs/2510.16122v1
- Date: Fri, 17 Oct 2025 18:09:33 GMT
- Title: The Hidden Cost of Modeling P(X): Vulnerability to Membership Inference Attacks in Generative Text Classifiers
- Authors: Owais Makroo, Siva Rajesh Kasa, Sumegh Roychowdhury, Karan Gupta, Nikhil Pattisapu, Santhosh Kasa, Sumit Negi
- Abstract summary: Membership Inference Attacks (MIAs) pose a critical privacy threat by enabling adversaries to determine whether a specific sample was included in a model's training dataset. We show that fully generative classifiers which explicitly model the joint likelihood $P(X,Y)$ are most vulnerable to membership leakage.
- Score: 6.542294761666199
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Membership Inference Attacks (MIAs) pose a critical privacy threat by enabling adversaries to determine whether a specific sample was included in a model's training dataset. Despite extensive research on MIAs, systematic comparisons between generative and discriminative classifiers remain limited. This work addresses this gap by first providing theoretical motivation for why generative classifiers exhibit heightened susceptibility to MIAs, then validating these insights through comprehensive empirical evaluation. Our study encompasses discriminative, generative, and pseudo-generative text classifiers across varying training data volumes, evaluated on nine benchmark datasets. Employing a diverse array of MIA strategies, we consistently demonstrate that fully generative classifiers which explicitly model the joint likelihood $P(X,Y)$ are most vulnerable to membership leakage. Furthermore, we observe that the canonical inference approach commonly used in generative classifiers significantly amplifies this privacy risk. These findings reveal a fundamental utility-privacy trade-off inherent in classifier design, underscoring the critical need for caution when deploying generative classifiers in privacy-sensitive applications. Our results motivate future research directions in developing privacy-preserving generative classifiers that can maintain utility while mitigating membership inference vulnerabilities.
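To make the threat model concrete: a generative classifier predicts via $\hat{y} = \arg\max_y P(X, Y=y)$, and the same joint likelihood it is trained to maximize on its training data is exactly the quantity a loss-threshold MIA inspects. Below is a minimal sketch of such an attack against a label-prefixed causal LM; the `gpt2` checkpoint, the `label: text` serialization of $P(X,Y)$, and the threshold calibration are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch of a loss-threshold MIA against a generative text classifier.
# Assumes the HuggingFace `transformers` library; "gpt2" stands in for
# whatever causal LM underlies the classifier.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def joint_log_likelihood(text: str, label: str) -> float:
    """Total log P(X, Y) of the label-prefixed sequence under the LM."""
    ids = tok(f"{label}: {text}", return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)  # .loss = mean token-level NLL
    n_predicted = ids.shape[1] - 1    # the first token has no prediction
    return -out.loss.item() * n_predicted

def is_member(text: str, label: str, threshold: float) -> bool:
    """Guess membership: training samples tend to score a higher joint
    likelihood. The threshold would be calibrated on shadow data."""
    return joint_log_likelihood(text, label) > threshold
```

Nothing beyond ordinary likelihood evaluation is needed once the model directly exposes $P(X,Y)$, which is the sense in which the canonical generative inference rule amplifies leakage.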
Related papers
- SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation [3.797867929356259]
Retrieval-Augmented Generation (RAG) has attracted significant attention due to its ability to combine the generative capabilities of Large Language Models (LLMs) with knowledge obtained through efficient retrieval mechanisms over large-scale data collections. Currently, the majority of existing approaches overlook the risks associated with exposing sensitive or access-controlled information directly to the generation model. We propose a novel approach to Selective Disclosure in Retrieval-Augmented Generation, called SD-RAG, which decouples the enforcement of security and privacy constraints from the generation process itself.
arXiv Detail & Related papers (2026-01-16T11:22:02Z)
- Generative Classifiers Avoid Shortcut Solutions [84.23247217037134]
Discriminative approaches to classification often learn shortcuts that hold in-distribution but fail under minor distribution shift. We show that generative classifiers can avoid this issue by modeling all features, both core and spurious, instead of mainly spurious ones. We find that diffusion-based and autoregressive generative classifiers achieve state-of-the-art performance on five standard image and text distribution shift benchmarks.
arXiv Detail & Related papers (2025-12-31T18:31:46Z)
- Neural Breadcrumbs: Membership Inference Attacks on LLMs Through Hidden State and Attention Pattern Analysis [9.529147118376464]
Membership inference attacks (MIAs) reveal whether specific data was used to train machine learning models. Our work explores how examining internal representations, rather than just their outputs, may provide additional insights into potential membership inference signals. Our findings suggest that internal model behaviors can reveal aspects of training data exposure even when output-based signals appear protected.
arXiv Detail & Related papers (2025-09-05T19:05:49Z)
- On the MIA Vulnerability Gap Between Private GANs and Diffusion Models [51.53790101362898]
Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis. We present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models.
arXiv Detail & Related papers (2025-09-03T14:18:22Z)
- Preference Learning for AI Alignment: a Causal Perspective [55.2480439325792]
We frame this problem in a causal paradigm, providing the rich toolbox of causality to identify persistent challenges. Inheriting from the literature of causal inference, we identify key assumptions necessary for reliable generalisation. We illustrate failure modes of naive reward models and demonstrate how causally-inspired approaches can improve model robustness.
arXiv Detail & Related papers (2025-06-06T10:45:42Z)
- Center-Based Relaxed Learning Against Membership Inference Attacks [3.301728339780329]
We propose a new architecture-agnostic training paradigm called center-based relaxed learning (CRL).
CRL is adaptive to any classification model and provides privacy preservation by sacrificing a minimal or no loss of model generalizability.
We empirically show that this approach exhibits comparable performance without requiring additional model capacity or data costs.
arXiv Detail & Related papers (2024-04-26T19:41:08Z)
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship between real and generated samples.
Second, we design a self-supervised metric learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different real-world scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z)
- Generative Models with Information-Theoretic Protection Against Membership Inference Attacks [6.840474688871695]
Deep generative models, such as Generative Adversarial Networks (GANs), synthesize diverse high-fidelity data samples.
GANs may disclose private information from the data they are trained on, making them susceptible to adversarial attacks.
We propose an information-theoretically motivated regularization term that prevents the generative model from overfitting to training data and encourages generalizability.
arXiv Detail & Related papers (2022-05-31T19:29:55Z)
- Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
- Systematic Evaluation of Privacy Risks of Machine Learning Models [41.017707772150835]
We show that prior work on membership inference attacks may severely underestimate the privacy risks.
We first propose to benchmark membership inference privacy risks by improving existing non-neural network based inference attacks.
We then introduce a new approach for fine-grained privacy analysis by formulating and deriving a new metric called the privacy risk score (see the sketch after this entry).
arXiv Detail & Related papers (2020-03-24T00:53:53Z)
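The privacy risk score from the last entry has a simple Bayesian reading: it is the posterior probability that a sample was a training member given an observation of the model's behavior on it, such as its loss. Below is a minimal sketch of estimating such a per-sample score from member and non-member loss distributions, assuming shadow-model losses are available; the histogram density estimator and the synthetic data are illustrative choices, not the paper's exact derivation.

```python
# Sketch: per-sample privacy risk score as the posterior probability
# of training-set membership given an observed loss, via Bayes' rule
# with histogram density estimates (an illustrative estimator).
import numpy as np

def privacy_risk_score(obs_loss, member_losses, nonmember_losses,
                       prior_member=0.5, bins=50):
    """Estimate P(member | loss = obs_loss)."""
    lo = min(member_losses.min(), nonmember_losses.min())
    hi = max(member_losses.max(), nonmember_losses.max())
    edges = np.linspace(lo, hi, bins + 1)
    dens_m, _ = np.histogram(member_losses, bins=edges, density=True)
    dens_n, _ = np.histogram(nonmember_losses, bins=edges, density=True)
    idx = np.clip(np.searchsorted(edges, obs_loss) - 1, 0, bins - 1)
    num = dens_m[idx] * prior_member
    den = num + dens_n[idx] * (1.0 - prior_member)
    return float(num / den) if den > 0 else prior_member

# Synthetic shadow-model losses: members overfit to a lower loss.
rng = np.random.default_rng(0)
members = rng.normal(0.2, 0.1, 10_000)
nonmembers = rng.normal(0.8, 0.3, 10_000)
print(privacy_risk_score(0.25, members, nonmembers))  # well above 0.5
```

Unlike a binary attack decision, a calibrated per-sample score indicates which individual training points are most exposed, which is what enables the fine-grained analysis the entry describes.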
This list is automatically generated from the titles and abstracts of the papers on this site.